Sunday, March 22, 2020

Case study: system design of Twitter



Introduction


It is the best mock interview I have had so far as an interviewer. The interviewee performed so well that I felt I was sitting in a classroom, taking a class on how to design Twitter. After the mock interview, he told me that he was preparing for Facebook and Google onsite interviews. It is definitely a good showcase of how well a top performer can do as of this day, March 22, 2020.


Case study


I will go over the notes later and add more detail.

Here is the link.


Use cases
Functional requirements:
We'll scope the problem to handle only the following use cases
i) User posts a tweet
avg size of a tweet with video or audio is 10 KB
ii) Service pushes tweets to followers, sending push notifications and emails
latency is less than 2 seconds
iii) User views the user timeline (activity from the user) - chronological order
50 tweets
iv) User views the home timeline (activity from people the user is following) - chronological order
50 tweets
v) User searches keywords
tweets related to keywords -> max 20 tweets per search result (sorted by timestamp)
vi) Performance metrics, monitoring or logging
Non functional requirements:
i) Service has high availability
ii) Latency should be low
Layout:
Estimates: 100 million daily active users, 500 million tweets per day -> 500 million / 24 = about 21 million per hour => peak hours (2x) => 42 million per hour / 3600 = about 12,000 requests/sec
RPS: about 12,000 req/sec at peak
Tweet storage: 500 million * 365 * 2 = 1 billion * 365 => 365 billion tweets held -> 365 billion * 10 KB => 3.65 PB, roughly 4 PB of data
User profile info: 200 KB per user -> 200 KB * 500 million total users => 100 TB of data
Timeline: 50 tweets per user -> 50 * 8 bytes = 400 bytes per user -> 400 bytes * 500 million => 200 GB of data
Followers: 10 followers -> 8 bytes * 10 = 80 bytes per user -> 80 bytes * 500 million => 40 GB of data
Storage estimates - about 3.75 PB of data (tweet info + user profile + timeline + followers)
Memory estimates - 20 % rule -> 200 GB * 20 % => 40 GB of data
Bandwidth estimates -> 12,000 req/sec * 50 tweets * 10 KB => 6 GB/sec
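
The arithmetic behind these estimates can be rechecked with a short script. This is only a sketch built from the inputs stated in the notes (500 million tweets per day, 500 million total users, 10 KB per tweet, 200 KB per profile, 8-byte IDs). It assumes decimal units, a peak factor of 2, that the "* 2" in the tweet storage line means two years of retention, and that the 20 % rule is applied to the timeline data. All constant and function names are made up for illustration.

```python
# Back-of-envelope capacity estimates (decimal units: 1 KB = 1,000 bytes).
TWEETS_PER_DAY = 500_000_000
TOTAL_USERS = 500_000_000
TWEET_SIZE = 10_000        # 10 KB per tweet, media included
PROFILE_SIZE = 200_000     # 200 KB per user profile
ID_SIZE = 8                # bytes per tweet ID or user ID
TIMELINE_LEN = 50          # tweets kept per timeline
FOLLOWERS_PER_USER = 10
PEAK_FACTOR = 2            # assumption: peak traffic is twice the average
RETENTION_YEARS = 2        # assumption: reading of the "* 2" in the notes
CACHE_PERCENT = 20         # the "20 % rule"


def estimates() -> dict:
    """Return the raw estimates in requests/sec and bytes."""
    peak_rps = TWEETS_PER_DAY / 86_400 * PEAK_FACTOR
    tweet_bytes = TWEETS_PER_DAY * 365 * RETENTION_YEARS * TWEET_SIZE
    profile_bytes = PROFILE_SIZE * TOTAL_USERS
    timeline_bytes = TIMELINE_LEN * ID_SIZE * TOTAL_USERS
    follower_bytes = FOLLOWERS_PER_USER * ID_SIZE * TOTAL_USERS
    return {
        "peak_rps": peak_rps,
        "tweet_bytes": tweet_bytes,
        "profile_bytes": profile_bytes,
        "timeline_bytes": timeline_bytes,
        "follower_bytes": follower_bytes,
        "total_bytes": tweet_bytes + profile_bytes + timeline_bytes + follower_bytes,
        "cache_bytes": timeline_bytes * CACHE_PERCENT // 100,
        "bandwidth_bytes_per_sec": peak_rps * TIMELINE_LEN * TWEET_SIZE,
    }


if __name__ == "__main__":
    for name, value in estimates().items():
        print(f"{name}: {value:,.0f}")
```

Tweet storage dominates the total, so the retention period and the average tweet size are the two inputs worth challenging first.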
API design:
i) postTweet(userId, tweetText, location, timestamp, byte[] media)
ii) getHomeTimeline(userId, timestamp, size)
iii) getUserTimeline(userId (whose timeline is viewed), visitorId, size)
iv) search(userId, location, timestamp, searchText)
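
To make the four calls concrete, here is a minimal in-memory sketch. It is not the design discussed in the interview: the class name `InMemoryTwitter`, the `follow` helper, the simplified parameter lists (no location, media, or visitor ID), and the newest-first ordering are all assumptions for illustration. The home timeline is assembled at read time by merging the tweets of followed users, and search is a plain substring scan.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    text: str
    timestamp: float


class InMemoryTwitter:
    """Toy single-process stand-in for the API in the notes."""

    def __init__(self) -> None:
        self._tweets_by_user = defaultdict(list)  # user_id -> list of Tweet
        self._following = defaultdict(set)        # user_id -> followed user_ids
        self._next_id = 1

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Hypothetical helper; the notes do not list a follow call.
        self._following[follower_id].add(followee_id)

    def post_tweet(self, user_id: int, tweet_text: str, timestamp: float) -> Tweet:
        tweet = Tweet(self._next_id, user_id, tweet_text, timestamp)
        self._next_id += 1
        self._tweets_by_user[user_id].append(tweet)
        return tweet

    def get_user_timeline(self, user_id: int, size: int = 50) -> list:
        # Newest first (assumption), capped at `size` tweets.
        tweets = self._tweets_by_user.get(user_id, [])
        return heapq.nlargest(size, tweets, key=lambda t: t.timestamp)

    def get_home_timeline(self, user_id: int, size: int = 50) -> list:
        # Merge the tweets of everyone this user follows at read time.
        candidates = [
            tweet
            for followee in self._following.get(user_id, ())
            for tweet in self._tweets_by_user.get(followee, [])
        ]
        return heapq.nlargest(size, candidates, key=lambda t: t.timestamp)

    def search(self, search_text: str, size: int = 20) -> list:
        # Case-insensitive substring match, newest first, at most `size` hits.
        needle = search_text.lower()
        matches = [
            tweet
            for tweets in self._tweets_by_user.values()
            for tweet in tweets
            if needle in tweet.text.lower()
        ]
        return heapq.nlargest(size, matches, key=lambda t: t.timestamp)
```

Merging at read time keeps the sketch short. A real service at this scale would more likely precompute home timelines or use a search index, which is where the caching and partitioning topics come in.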
Database design:
User:
UserID, location, creationdate, password, lastlogindatetime, thumbnail, profiledesc
UserTweet:
UserID (row key)
tweetId, creation date
Tweet
TweetId, Text, mediaid (unique identifier), comments, retweetCount, likeCount
Media
MediaId,blobpath, creationdate
Followers: (who is following me)
UserId
FollowerId
Following: (who I am following)
UserId
FollowingIds
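
The tables above can be written down as plain record types. This is a sketch only: the field names follow the notes, but the Python types, the `password_hash` name (storing a hash rather than the raw password), and modelling comments as a list of strings are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class User:
    user_id: int
    location: str
    creation_date: datetime
    password_hash: str          # assumption: a hash, not the raw password
    last_login: datetime
    thumbnail: str              # e.g. a path or URL to the image
    profile_desc: str


@dataclass
class UserTweet:
    user_id: int                # row key
    tweet_id: int
    creation_date: datetime


@dataclass
class Tweet:
    tweet_id: int
    text: str
    media_id: Optional[int] = None
    comments: list = field(default_factory=list)
    retweet_count: int = 0
    like_count: int = 0


@dataclass
class Media:
    media_id: int
    blob_path: str
    creation_date: datetime


@dataclass
class Followers:                # who is following this user
    user_id: int
    follower_ids: list = field(default_factory=list)


@dataclass
class Following:                # who this user is following
    user_id: int
    following_ids: list = field(default_factory=list)
```

Keeping `UserTweet` keyed by `user_id` means that a user timeline read is a single-row lookup followed by fetching the referenced tweets.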
High level design:
Caching
Partitioning
Scalability
Availability
Latency
Bottlenecks
Extra stuff
Drawing of system design


My notes taken in the mock interview as an interviewer

I also learned from the interviewee. I took notes at every step, and I felt that I was sitting in a classroom while the interviewee taught the class.




