Sunday, February 23, 2020

Case study: Design twitter

Feb. 23, 2020

Introduction


It was my mock interview as a system design interviewer. I have to push myself hard to perform better as an interviewer. I had chance to interview an engineer with strong coding skills, since I am not sure how good he is in terms of system design, I also asked a coding question to double check. I like to write a short case study on this mock interview.

Case study


I will add more detail later.

Design twitter, 1000 users,
- users : tweet, follow people, and have followers
users:
userId(twitterId) : Name, location, email address
tweets table:
tweetId -> userId, content of tweet(text), time
Likedtweets
tweetId -> userId, time
Retweets
tweetId -> retrweeUserId, time
FollowTable
user1->user2
user2->user3
user3->user1
POST CreateUser(email, name, password) OK (200)
UpdateUser()
Login(email, password) -> return auth token OK (200)
POST WriteTweet() -> OK(200)
DeleteTweet(tweetId)
Retweet(tweetId)
LikeTweet(tweetId)
GET GetTweets(int GetNext) JSON {tweets}
Post FollowUser
TwitterService
CAP theorom
push message onto serach service queue with tweetID
- analysis of words and hastags that are in tweet
- go ahead and perform some sort of ML model on tweet
- store searchable word -> tweetId
- intwerviewing.io -> 50 tweets
- system desing - > 30 tweets
- rolling average of the last 60 minutes of trending stuff
- we can start fresh at the start of every hour and count things that are trending
Search Service, reads the tweet, parses out hashtags, words and word/hashtag -> tweetId
trending -> count
celebrity, topic, hashtag
- availability over consistency
show case your design:
celebrity with milions of followers, tweet, notification to millions - bottleneck, network, cache, push/pull, search, ...
- user A has 20M followers
we're doing push? then we end up with 20M requests sent down
- push down to users who phave been active in last 24 hours
- periodically prepopulate feed for users
Database scaling
- shard based on userID consistent hashing
- cluser of databases (read replicas) (master slave replication, where we're opting mostly for availaiblity) - async replication
cache
- search service (trending)
- tweet itself
- tweetID does exist in chace? yes, get from cache, no, hit database
- cache users and their tweets
database, storage - build index,
- transaction
deposit
withdraw
- consistency
storage design - cloud storage
- balance binary search tree - B-tree
- transaction - log -
log-structure merge tree -
CA P
-
- large data stream, integer,
1, 2, 3, 4, 5, 6, -> six, 3, 4,
3 + 4/ 2
1, 2,3,4,5 , odd number, median -
assume that you store all integer into a few data structure, order can be lost, it is ok.
Add(int, ) -> do not require ->
findMedian() -> O(1) time complexity
minheap, maxheap, keep size of two heap max diff < 1
determine how to get median,
LSM-tree -> video, Microsoft, merge, Apple employee ->
Log -> append -> merge <- backward ->
client-> load balanace-> reverse proxy -> twitter, socail profile servcice -> cache -> storage -> database
long polling -> ...
booking in United statemet
diagram
- hand
https://gist.github.com/jianminchen/ff21b0a1cad67216376999512dd1b696
booking
design twitter ->
URL -
view raw Design twitter hosted with ❤ by GitHub
Here is the link.


Actionable Items


The interviewee is Microsoft SDE II. I should learn from his performance. Good analysis and very good communication skills.

No comments:

Post a Comment