Introduction
This is the best mock interview I have had so far as an interviewer. The interviewee performed so well that I felt I was sitting in a classroom taking a class on how to design Twitter. After the mock interview, he told me that he was preparing for Facebook and Google onsite interviews. It is a good showcase of how well a top performer can do these days, March 22, 2020.
Case study
I will go over the notes later and add more detail.
Here is the link.
Use cases
Functional requirements:
We'll scope the problem to handle only the following use cases.
i) User posts a tweet
avg tweet, video, or audio combo size is 10 KB
ii) Service pushes tweets to followers, sending push notifications and emails
latency is less than 2 seconds (see the fan-out sketch after the requirements)
iii) User views the user timeline (activity from the user) - chronological order
50 tweets
iv) User views the home timeline (activity from people the user is following) - chronological order
50 tweets
v) User searches keywords
tweets related to keywords -> max search result of 20 tweets (sorted by timestamp)
vi) Performance metrics, monitoring, or logging
Non-functional requirements:
i) Service has high availability
ii) Latency should be low
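Use cases ii and iv do not spell out how tweets reach follower home timelines. Here is a minimal fan-out-on-write sketch, assuming in-memory follower sets and a 50-entry home timeline per user; all class and method names are my own, not from the notes:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical fan-out-on-write sketch: when a user posts, the tweet id is
// pushed onto each follower's home timeline, which is capped at 50 entries.
public class FanOutSketch {
    static final int HOME_TIMELINE_SIZE = 50;

    // Follower sets and home timelines (newest first), in memory for illustration only.
    private final Map<Long, Set<Long>> followers = new ConcurrentHashMap<>();
    private final Map<Long, Deque<Long>> homeTimelines = new ConcurrentHashMap<>();

    public void follow(long userId, long followerId) {
        followers.computeIfAbsent(userId, k -> ConcurrentHashMap.newKeySet()).add(followerId);
    }

    // Called by the tweet service after the tweet row has been persisted.
    public void fanOut(long authorId, long tweetId) {
        for (long followerId : followers.getOrDefault(authorId, Set.of())) {
            Deque<Long> timeline =
                homeTimelines.computeIfAbsent(followerId, k -> new ConcurrentLinkedDeque<>());
            timeline.addFirst(tweetId);              // newest tweet goes to the front
            while (timeline.size() > HOME_TIMELINE_SIZE) {
                timeline.pollLast();                 // keep only the latest 50 tweet ids
            }
            // push notification / email would be enqueued here
        }
    }

    // Home timeline read becomes a simple lookup of the precomputed list.
    public List<Long> getHomeTimeline(long userId) {
        return new ArrayList<>(homeTimelines.getOrDefault(userId, new ArrayDeque<>()));
    }
}
```

For accounts with millions of followers, a read-time merge is usually preferred over a full fan-out; that trade-off would come up under the bottlenecks bullet in the high level design.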
Layout:
Estimates: 100 million daily active users, 500 million tweets per day -> 500 million / 24 ≈ 20 million per hour => peak hours roughly double => 40 million per hour / 3600 ≈ 11,000 requests/sec
RPS: ~11,000 req/sec at peak
tweet storage: 500 million per day * 2 years * 365 days ≈ 365 billion tweets to hold -> 365 billion * 10 KB ≈ 4 PB data
user profile info: 200 KB per user -> 200 KB * 500 million total users ≈ 100 TB data
Timeline: 50 tweet ids per user -> 50 * 8 bytes = 400 bytes per user -> 400 bytes * 500 million ≈ 200 GB data
Followers: 10 followers on average -> 8 bytes * 10 = 80 bytes per user -> 80 bytes * 500 million ≈ 40 GB data
Storage estimates: ≈ 4.1 PB data (tweet info + user profiles + timelines + followers)
Memory estimates: 80/20 rule -> cache 20% of the daily tweet volume -> 500 million * 10 KB * 20% ≈ 1 TB data
Bandwidth estimates: ~11,000 req/sec * 50 tweets * 10 KB ≈ 5.5 GB/sec
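To make the arithmetic above easy to re-check, here is a small sketch that recomputes the same estimates; the constants simply mirror the assumptions in the notes:

```java
// Back-of-envelope recomputation of the capacity estimates above.
public class CapacityEstimates {
    public static void main(String[] args) {
        double tweetsPerDay = 500e6;           // 500 million tweets/day
        double totalUsers   = 500e6;           // 500 million total users
        double tweetSizeKB  = 10;              // avg tweet + media combo size

        // Peak write rate: roughly 2x the hourly average
        double perHour = tweetsPerDay / 24;                      // ~20 million per hour
        double peakRps = (perHour * 2) / 3600;                   // ~11,000-12,000 req/sec
        System.out.printf("peak RPS       ~ %.0f req/sec%n", peakRps);

        // Tweet storage for 2 years
        double tweetsStored = tweetsPerDay * 365 * 2;             // ~365 billion tweets
        double tweetPB      = tweetsStored * tweetSizeKB / 1e12;  // KB -> PB, ~4 PB
        System.out.printf("tweet storage  ~ %.1f PB%n", tweetPB);

        // Profiles (200 KB), timelines (50 ids * 8 bytes), followers (10 ids * 8 bytes)
        double profileTB  = totalUsers * 200 / 1e9;               // ~100 TB
        double timelineGB = totalUsers * 50 * 8 / 1e9;            // ~200 GB
        double followerGB = totalUsers * 10 * 8 / 1e9;            // ~40 GB
        System.out.printf("profiles ~ %.0f TB, timelines ~ %.0f GB, followers ~ %.0f GB%n",
                profileTB, timelineGB, followerGB);

        // Cache 20% of the daily tweet volume; read bandwidth at peak
        double cacheTB      = tweetsPerDay * tweetSizeKB * 0.2 / 1e9;   // ~1 TB
        double bandwidthGBs = peakRps * 50 * tweetSizeKB / 1e6;         // ~5.5 GB/sec
        System.out.printf("cache ~ %.0f TB, bandwidth ~ %.1f GB/sec%n", cacheTB, bandwidthGBs);
    }
}
```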
API design:
i) postTweet(userId, tweetText, location, timestamp, byte[] media)
ii) getHomeTimeline(userId, timestamp, size)
iii) getUserTimeline(userId (timeline owner), visitorId, size)
iv) search(userId, location, timestamp, searchText)
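One possible Java shape for these four calls; the parameter and return types are assumptions, since the notes only list the names:

```java
import java.util.List;

// Hypothetical service interface mirroring the four API calls in the notes.
// Ids and timestamps are assumed to be longs; Tweet is a simple value object.
public interface TwitterService {

    // i) Post a tweet; media is the raw photo/video/audio payload (avg ~10 KB).
    //    Returns the new tweet id (assumption).
    long postTweet(long userId, String tweetText, String location, long timestamp, byte[] media);

    // ii) Home timeline: newest tweets from people the user follows, older than `timestamp`.
    List<Tweet> getHomeTimeline(long userId, long timestamp, int size);

    // iii) User timeline: the owner's own tweets, as seen by `visitorId`.
    List<Tweet> getUserTimeline(long userId, long visitorId, int size);

    // iv) Keyword search near a location, capped at 20 results sorted by timestamp.
    List<Tweet> search(long userId, String location, long timestamp, String searchText);
}

// Minimal value object assumed by the interface above.
record Tweet(long tweetId, long userId, String text, long timestamp) {}
```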
Database design:
User:
UserID, location, creationdate, password, lastlogindatetime, thumbnail, profiledesc
UserTweet:
UserID (row key)
tweetId, creation date
Tweet:
TweetId, Text, mediaId (unique identifier), comments, retweetCount, likeCount
Media:
MediaId, blobpath, creationdate
Followers: (who is following me)
UserId
FollowerId
Following: (whom I am following)
UserId
FollowingIds
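The same tables written out as plain Java records so the fields and keys are easier to scan; the field types are assumptions, since the notes only list column names:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical record types mirroring the tables above; all types are assumed.
record User(long userId, String location, Instant creationDate, String passwordHash,
            Instant lastLoginDateTime, String thumbnailUrl, String profileDesc) {}

// Wide row keyed by UserID: one entry per tweet the user has posted.
record UserTweet(long userId, long tweetId, Instant creationDate) {}

record TweetRow(long tweetId, String text, long mediaId, int comments,
                int retweetCount, int likeCount) {}

// Media metadata; the actual bytes live in blob storage at blobPath.
record Media(long mediaId, String blobPath, Instant creationDate) {}

// Followers: who is following me.  Following: whom I am following.
record Follower(long userId, long followerId) {}
record Following(long userId, List<Long> followingIds) {}
```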
High level design:
Caching (see the sketch after this list)
Partitioning
Scalability
Availability
Latency
Bottlenecks
Extra stuff
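The notes stop at these topic headings. As one illustration of the caching item, here is a minimal in-process LRU sketch for the 50-tweet home timelines built on java.util.LinkedHashMap; the capacity and names are my own assumptions:

```java
import java.util.*;

// Minimal LRU cache sketch for home timelines: userId -> latest tweet ids.
// LinkedHashMap with accessOrder=true evicts the least recently read user
// once the cache holds more than MAX_USERS entries.
public class TimelineCache {
    private static final int MAX_USERS = 1_000_000;  // illustrative cap; real sizing comes from the memory estimate

    private final Map<Long, List<Long>> cache =
        Collections.synchronizedMap(new LinkedHashMap<Long, List<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, List<Long>> eldest) {
                return size() > MAX_USERS;           // evict the least recently used timeline
            }
        });

    public void put(long userId, List<Long> latestTweetIds) {
        cache.put(userId, List.copyOf(latestTweetIds));
    }

    // Returns the cached timeline, or null on a miss (caller falls back to the database).
    public List<Long> get(long userId) {
        return cache.get(userId);
    }
}
```

In a real deployment this would typically be an external cache partitioned by userId, which is where the partitioning and scalability bullets come in.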
My notes taken in the mock interview as an interviewer
I also learned a lot as the interviewer: I took notes at every step, and it felt like sitting in a classroom while the interviewee taught the class.