Introduction
It is tough for me to be a system design interviewer. As an interviewer I could not probe the candidate's technical depth, and I got the lowest rating the first time. I would like to do a short case study on this interview.
Case study
I will add more detail later.
Design a simple version of Twitter with a RESTful API, where one person has at most 100 followers. Later on: celebrities with millions of followers, hashtags, and search by hashtag.
Accounts, 100 followers max
Searching: hashtag, text
Feed generation
SCHEMA
------
Account
- handle (PK)
- display name
- followers
- feed (points to head of linked list)
- last_login
- src IP address
Follow table
- src
- dst
Message
- message_id (PK)
- author (FK into account)
- content (str, max 140 chars)
- hashtags
- time
Hashtags
- hashtag (PK, together with message_id)
- message_id (FK into message)
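A minimal sketch of the schema above in SQLite, with hashtags pulled out of the message text when a message is posted. Column types, the `#\w+` hashtag rule, and the helper names `extract_hashtags`, `post_message` and `search_hashtag` are assumptions for illustration. The hashtag table uses a composite key on (hashtag, message_id), also an assumption: a hashtag alone as the primary key would allow only one message per hashtag. The followers count, feed pointer and per-message hashtags column are left out because they can be derived from the other tables.

```python
import re
import sqlite3

# Sketch of the notes' schema; types and the composite hashtag key are assumptions.
SCHEMA = """
CREATE TABLE account (
    handle       TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    last_login   TEXT,
    src_ip       TEXT
);
CREATE TABLE follow (
    src TEXT NOT NULL REFERENCES account(handle),  -- the follower
    dst TEXT NOT NULL REFERENCES account(handle),  -- the account being followed
    PRIMARY KEY (src, dst)
);
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    author     TEXT NOT NULL REFERENCES account(handle),
    content    TEXT NOT NULL CHECK (length(content) <= 140),
    time       TEXT NOT NULL
);
-- One row per (hashtag, message) pair, so a hashtag can map to many messages.
CREATE TABLE hashtag (
    hashtag    TEXT NOT NULL,
    message_id INTEGER NOT NULL REFERENCES message(message_id),
    PRIMARY KEY (hashtag, message_id)
);
"""

HASHTAG_RE = re.compile(r"#(\w+)")  # hypothetical hashtag rule

def extract_hashtags(content: str) -> set[str]:
    """Return the lowercase hashtags in a message, without the leading '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(content)}

def post_message(db: sqlite3.Connection, author: str, content: str, time: str) -> int:
    """Insert a message and one hashtag row per distinct tag; return its id."""
    cur = db.execute(
        "INSERT INTO message (author, content, time) VALUES (?, ?, ?)",
        (author, content, time),
    )
    message_id = cur.lastrowid
    db.executemany(
        "INSERT INTO hashtag (hashtag, message_id) VALUES (?, ?)",
        [(tag, message_id) for tag in extract_hashtags(content)],
    )
    return message_id

def search_hashtag(db: sqlite3.Connection, hashtag: str) -> list[int]:
    """Return ids of messages carrying the hashtag, newest first."""
    rows = db.execute(
        "SELECT m.message_id FROM hashtag AS h "
        "JOIN message AS m ON h.message_id = m.message_id "
        "WHERE h.hashtag = ? ORDER BY m.time DESC",
        (hashtag.lower(),),
    )
    return [row[0] for row in rows]
```

For example, after posting "Morning run #fitness #Health" and later "Leg day #fitness", `search_hashtag(db, "fitness")` returns the second message id before the first.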
API
---
Different endpoints for different functions.
Use cases:
- post message
- delete message?
- search_text
- search_hashtag
- request_feed
- follow
post_message(handle, message) -> POST
delete_message(handle, message_id) -> DELETE
search_text(text) -> GET  # needs the requesting user's handle so we can show private tweets
search_hashtag(hashtag) -> GET
request_feed(handle) -> GET
follow(handle_src, handle_dst) -> POST
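One way to map these calls onto RESTful resources, sketched as a route table. The URL paths, the `ROUTES` list and the `resolve` helper are hypothetical; the notes only fix the function names and HTTP verbs. Free-text search would carry the query and, per the comment on `search_text`, the requesting user's handle as query-string parameters, which this sketch does not parse.

```python
import re

# Hypothetical URL layout: (HTTP verb, path regex, handler name from the notes).
ROUTES: list[tuple[str, str, str]] = [
    ("POST",   r"/accounts/(?P<handle>\w+)/messages", "post_message"),
    ("DELETE", r"/accounts/(?P<handle>\w+)/messages/(?P<message_id>\d+)", "delete_message"),
    ("GET",    r"/search/text", "search_text"),
    ("GET",    r"/search/hashtags/(?P<hashtag>\w+)", "search_hashtag"),
    ("GET",    r"/accounts/(?P<handle>\w+)/feed", "request_feed"),
    ("POST",   r"/accounts/(?P<handle_src>\w+)/following/(?P<handle_dst>\w+)", "follow"),
]

def resolve(method: str, path: str) -> tuple[str, dict[str, str]]:
    """Return (handler name, path parameters) for a request, or raise LookupError."""
    for route_method, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method and match:
            return handler, match.groupdict()
    raise LookupError(f"no route for {method} {path}")
```

Keeping the handle in the path makes each account's messages, feed and follow list a resource of its own, which is the usual RESTful shape; a real service would also check that the authenticated user matches `handle` before a POST or DELETE.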
SQL database + cache (write-through to Elasticsearch)
Elasticsearch (for text queries only)
Text search service  # maybe backed by an alternative database suited for querying, e.g. Elasticsearch or another document store
Hashtag search service
Feed generation service
Fraud detection service
Notification service -> which protocol? Long polling
Follow service (checks whether the follow limit is exceeded, raises an event to regenerate the feed)
When user A posts a tweet -> raise an event -> queue it to recompute the feeds of every user B who follows A
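A minimal in-process sketch of the follow limit and the post -> event -> queue -> feed-recompute flow (fan-out on write). The `deque` stands in for a real message queue, and `MAX_FOLLOWERS`, `FEED_LENGTH`, `follow`, `on_post` and `feed_worker` are hypothetical names. With at most 100 followers, each post costs at most 100 feed writes; the celebrity case (millions of followers) is where this push model gets expensive.

```python
from collections import defaultdict, deque

MAX_FOLLOWERS = 100  # limit from the requirements
FEED_LENGTH = 800    # hypothetical cap on cached feed entries per user

followers: dict[str, set[str]] = defaultdict(set)  # author -> handles following them
feeds: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_LENGTH))  # newest first
events: deque = deque()  # stand-in for a real message queue

def follow(src: str, dst: str) -> bool:
    """Follow service: record src -> dst unless dst already has MAX_FOLLOWERS."""
    if src in followers[dst]:
        return True  # already following; nothing to do
    if len(followers[dst]) >= MAX_FOLLOWERS:
        return False  # limit reached, reject the follow
    followers[dst].add(src)
    return True

def on_post(author: str, message_id: int) -> None:
    """Raise a 'new post' event instead of updating feeds inline."""
    events.append((author, message_id))

def feed_worker() -> None:
    """Drain the queue and push each new message onto every follower's feed."""
    while events:
        author, message_id = events.popleft()
        for handle in followers[author]:
            feeds[handle].appendleft(message_id)
```

Because `on_post` only enqueues, posting stays fast even if feed workers fall behind; the workers can be scaled separately from the API servers.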
Elasticsearch
words -> documents (tweets)
"the dog went to the park"  Tweet #23
Index: inverted index
"the" -> [23]
"dog" -> [23]
..
..
..
"park" -> [23]
"typical index": id -> contents
"inverted index": contents -> id
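A toy version of the inverted index above in plain Python. Elasticsearch adds analyzers, relevance scoring and sharding on top of this idea; the lowercase `\w+` tokenizer, the set-based posting lists and the AND semantics of `search` are simplifying assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(tweets: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of tweet ids that contain it (contents -> id)."""
    index: dict[str, set[int]] = defaultdict(set)
    for tweet_id, text in tweets.items():
        for word in tokenize(text):
            index[word].add(tweet_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of tweets containing every word in the query."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

With tweet 23 from the notes plus a second tweet 24, "the cat sat", `search(index, "dog park")` returns `{23}` and `search(index, "the")` returns `{23, 24}`. A lookup touches only the posting lists of the query words rather than scanning every tweet.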
How to scale the system?
How do users see tweets of people they are following?
14:00 create new account
14:000001 100 new follows
14:02 1 follow
14:03 1 follow
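One reading of the timeline above is that a brand-new account follows about 100 people almost at once, so nothing has been pushed to its feed yet. A pull model (fan-out on read) covers that case by merging each followee's newest-first timeline at request time. The sketch below uses `heapq.merge` for the k-way merge; `build_feed`, the `tweets_by_author` store of `(timestamp, message_id)` pairs and the `limit` default are hypothetical.

```python
import heapq
from itertools import islice

def build_feed(
    followees: list[str],
    tweets_by_author: dict[str, list[tuple[int, int]]],
    limit: int = 20,
) -> list[int]:
    """Merge followees' newest-first (timestamp, message_id) lists into one feed.

    Each author's list must already be sorted newest first, which is what
    heapq.merge(..., reverse=True) expects. Only the first `limit` items are
    pulled from the lazy merge.
    """
    timelines = [tweets_by_author.get(author, []) for author in followees]
    merged = heapq.merge(*timelines, key=lambda t: t[0], reverse=True)
    return [message_id for _, message_id in islice(merged, limit)]
```

Because the merge is lazy, building the first page for 100 followees reads only about `limit` items beyond the list heads, so a new account's feed can be served immediately while background fan-out catches up.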
How to deploy code?
Build:
4 GB of code
10,000 deployment instances?
7 regions
Problems:
- Efficient way to build the code so we don't have to build it 10,000 times? A: build once into a Docker container image, so the 4 GB of code is not recompiled for each deployment
- How to update the live version with a new build?
- Load balancers:
  - spin up a small number of instances with the new build
  - the load balancer redirects a small portion of traffic to the new instances
  - wait to see if there are errors, crashes, etc.
  - gradually bring up instances of the new build and remove instances of the old build
  - if there are errors, take the new instances down until the bug is fixed
  - if there are no errors, keep increasing instances of the new build until they reach 10,000
  - schedule rollouts during low-demand hours for each region
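The gradual rollout steps above, sketched as a loop that doubles the number of new-build instances until it reaches 10,000 and rolls back if the observed error rate crosses a threshold. `error_rate` stands in for real monitoring behind the load balancer; the doubling schedule, the starting size of 10 and the 1% threshold are assumptions.

```python
from typing import Callable

TOTAL_INSTANCES = 10_000

def canary_rollout(
    error_rate: Callable[[int], float],
    start: int = 10,
    max_error_rate: float = 0.01,
) -> tuple[bool, list[int]]:
    """Grow the new build from `start` instances to TOTAL_INSTANCES, doubling each step.

    error_rate(n) is a stand-in for monitoring: the error rate observed while
    n new-build instances receive roughly n / TOTAL_INSTANCES of the traffic.
    Returns (succeeded, new-build instance counts tried in order).
    """
    new = start
    history: list[int] = []
    while True:
        history.append(new)
        if error_rate(new) > max_error_rate:
            return False, history  # roll back: take the new instances down
        if new == TOTAL_INSTANCES:
            return True, history  # old build fully replaced
        new = min(new * 2, TOTAL_INSTANCES)  # swap more old instances for new ones
```

With no errors the loop goes 10, 20, 40, ..., 5,120, 10,000 (11 steps). Running it once per region, in each region's low-demand window, limits how many users a bad build can reach before it is rolled back.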
Interviewer feedback