Jan. 27, 2020
Introduction
I met a software engineer who worked with me on a system design session on Jan. 27, 2020; the topic was to design Twitter. I believe he presented his knowledge and experience very well. I would like to work through a short case study.
Case study
design a twitter, let us start: less than 1000 users, RESTful API, post tweets, follow, followers can see the timeline, hashtags, search using hashtag -> scalability, bottleneck
API
authentication: JWT token
encrypt/sign the user_id plus some information into a token that only remains valid for around 1 week
define a salt key (a random string)
if the token cannot be decrypted/verified, it is invalid
we store the token in Redis after the user authenticates (a minimal sketch of the token flow follows below)
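A minimal sketch of the token part of this flow, assuming the PyJWT library; the secret string and the one-week expiry are placeholders, and the Redis storage step is left out:

import datetime
import jwt  # PyJWT

SECRET = "random-salt-key-string"  # the random "salt key" mentioned above

def issue_token(user_id):
    # sign the user_id into a token that stays valid for about one week
    payload = {
        "user_id": user_id,
        "exp": datetime.datetime.utcnow() + datetime.timedelta(days=7),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token):
    # returns the payload, or raises jwt.InvalidTokenError if the token is expired or tampered with
    return jwt.decode(token, SECRET, algorithms=["HS256"])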
http protocol
register(username, password)
login(username, password)
post_tweet(user_id, content)
follow(user_id, followed_user_id)
get_recent_timeline(user_id, start, end) returns an array of tweets
search(hashtag) returns an array of tweets
response status: 200 OK
internal server error: 500
invalid request parameter: 400
body: {
  message: "ok",
  error_code: 23,
  error_message: ""
}
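As an illustration of how these calls could look over HTTP, a hedged sketch using the Python requests library; the URL, host, and header are assumptions, not part of the original notes:

import requests

# hypothetical HTTP mapping of post_tweet(user_id, content); the user_id comes from the JWT
resp = requests.post(
    "https://api.example.com/post_tweet",
    headers={"Authorization": "Bearer <jwt token>"},
    json={"content": "hello #twitter"},
)
print(resp.status_code)  # 200 on success, 400 for bad parameters, 500 for server errors
print(resp.json())       # expected shape: {"message": "ok", "error_code": 0, "error_message": ""}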
Tables
User table
id, username, password, created_at
Tweet table
id, user_id, content, created_at
Hashtag table
id, tweet_id, hashtag
e.g. 1, 1, hashtag1
     2, 1, hashtag2
Follower table
user_id, follow_id (both ids are foreign keys to the user table), created_at
(a runnable schema sketch follows below)
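A minimal, runnable sketch of these four tables using SQLite; the column types are assumptions, and production would more likely use MySQL as in the diagram below:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE,
    password TEXT,              -- a password hash in practice, not plaintext
    created_at TIMESTAMP
);
CREATE TABLE tweet (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES user(id),
    content TEXT,
    created_at TIMESTAMP
);
CREATE TABLE hashtag (
    id INTEGER PRIMARY KEY,
    tweet_id INTEGER REFERENCES tweet(id),
    hashtag TEXT
);
CREATE TABLE follower (
    user_id INTEGER REFERENCES user(id),    -- the follower
    follow_id INTEGER REFERENCES user(id),  -- the user being followed
    created_at TIMESTAMP
);
""")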
Timeline:
aggregate the recent tweets from the users someone follows
the aggregation runs in the background and the result is stored in a Redis cache
redis
user_id: [tweet_id, tweet_id, tweet_id] (a cache sketch follows below)
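A sketch of that Redis timeline cache, assuming the redis-py client; the key name and list length are placeholders:

import redis

r = redis.Redis()

def push_to_timeline(user_id, tweet_id, max_len=800):
    # prepend the new tweet id and trim so the cached timeline stays bounded
    key = f"timeline:{user_id}"
    r.lpush(key, tweet_id)
    r.ltrim(key, 0, max_len - 1)

def get_recent_timeline(user_id, start, end):
    # returns tweet ids only; the tweet contents come from MySQL or a tweet cache
    return r.lrange(f"timeline:{user_id}", start, end)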
Diagram:
less than 1000 users
Client -> LB -> Reverse Proxy -> UserService -> Redis -> MySQL
                              -> TimelineService -> Redis -> MySQL
please describe and write down why we need a reverse proxy:
Reverse Proxy
we can add a rate limiter (a small sketch follows this list)
SSL termination
compress outbound responses
blacklist certain IPs
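In practice the rate limiter would usually be configured in the proxy itself (e.g. nginx), but a token-bucket sketch in Python shows the idea; the rate and capacity values are arbitrary:

import time

class TokenBucket:
    def __init__(self, rate=10, capacity=20):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the proxy would answer 429 Too Many Requests

buckets = {}  # one bucket per client IP
def allowed(client_ip):
    return buckets.setdefault(client_ip, TokenBucket()).allow()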
select * from hashtag_table where hashtag like '#hashtag' - for searching a hashtag
ranking based on likes and retweets of a tweet
Activity:
type (like, retweet) | actor_id (user_id) | tweet_id | created_at
at first we can simply query/aggregate this table; if it grows big, store the counts in a cache
tweet_id_total_likes: 123
tweet_id_total_retweet: 123
every time we do the aggregation we can also update the tweet table
denormalize
Tweet table
id, user_id, content, created_at, total_likes, total_retweet
Ranking:
we can rank based on total_likes or total_retweet depending on the algorithm (a counter sketch follows below)
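A sketch of the counter caching and write-back described above, assuming redis-py and a SQLite-style DB-API connection; the key names are placeholders:

import redis

r = redis.Redis()

def record_activity(tweet_id, activity_type):
    # increment a cached counter instead of counting the activity table on every read
    if activity_type == "like":
        r.incr(f"tweet:{tweet_id}:total_likes")
    elif activity_type == "retweet":
        r.incr(f"tweet:{tweet_id}:total_retweet")

def flush_counters(db_conn, tweet_id):
    # periodically write the aggregated counters back into the denormalized tweet table
    likes = int(r.get(f"tweet:{tweet_id}:total_likes") or 0)
    retweets = int(r.get(f"tweet:{tweet_id}:total_retweet") or 0)
    db_conn.execute(
        "UPDATE tweet SET total_likes = ?, total_retweet = ? WHERE id = ?",
        (likes, retweets, tweet_id),
    )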
100 million users
something #hashtag something
if a tweet includes a hashtag, it is tokenized with the hashtag as the main key, sent to a queue, and a worker indexes it into Elasticsearch, which uses an inverted index
when a tweet is posted, it is sent asynchronously to the queue
a worker instance calls the Elasticsearch index API
an inverted index is a data structure for faster lookup
let's say
doc1 tweet: #something #abc
doc2 tweet: #something #abc2
the tokenizer produces prefix tokens, e.g. for #something:
#s
#so
#som
#some
keyword index example:
doc 1: "something like this #abc"
fuzzy search "some" -> doc 1
prefix "#ab" -> doc 1
"#abc" -> doc 1
for a phrase without a hashtag, the tokenizer uses a minimum prefix length of 4:
some
somet
someth
somethi
somethin
something
like
#abc
inverted index:
#something -> doc1, doc2
#abc -> doc1
#abc2 -> doc2
(a small sketch follows below)
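A minimal in-memory sketch of the prefix tokenizer and inverted index described above; Elasticsearch would handle this internally, so this is only an illustration of the idea, with the min-length-4 rule taken from the notes:

from collections import defaultdict

def tokenize(text, min_len=4):
    tokens = set()
    for word in text.lower().split():
        if word.startswith("#"):
            # hashtags: index every prefix, e.g. #s, #so, #som, ...
            for i in range(2, len(word) + 1):
                tokens.add(word[:i])
        elif len(word) >= min_len:
            # plain words: prefixes of at least min_len characters
            for i in range(min_len, len(word) + 1):
                tokens.add(word[:i])
    return tokens

index = defaultdict(set)  # token -> set of document ids (the inverted index)

def index_doc(doc_id, text):
    for token in tokenize(text):
        index[token].add(doc_id)

index_doc("doc1", "tweet #something #abc")
index_doc("doc2", "tweet #something #abc2")
print(sorted(index["#something"]))  # ['doc1', 'doc2']
print(sorted(index["#some"]))       # ['doc1', 'doc2'] - prefix search becomes a direct lookup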
Client -> LB -> Reverse Proxy -> Fanout -> SearchService -> Queue -> Worker -> ElasticSearch
                                        -> PostTweetService -> MySQL
                                                            -> Redis
Fanout
please write down some keywords to help understand fanout
fanout is for sending/forwarding a request to multiple services
fan-in collects the responses from those services and merges them (see the sketch below)
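A small fan-out / fan-in sketch using asyncio; the two service calls are placeholders for the real downstream requests:

import asyncio

async def call_post_tweet_service(tweet):
    return {"tweet_id": 123}     # placeholder for the real downstream call

async def call_search_service(tweet):
    return {"indexed": True}     # placeholder for queueing the tweet for indexing

async def handle_post(tweet):
    # fan-out: forward the same request to multiple services concurrently
    results = await asyncio.gather(
        call_post_tweet_service(tweet),
        call_search_service(tweet),
    )
    # fan-in: merge the individual responses into one reply for the client
    merged = {}
    for part in results:
        merged.update(part)
    return merged

print(asyncio.run(handle_post({"content": "hello #twitter"})))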
-> TimelineService -> Redis -> MySQL
we will have a lot of followers
we can use 2 mechanisms for the timeline:
push / pull
hybrid
the server pushes the new tweet to followers if the author has fewer than 1 million followers
if the author has more than 1 million followers, followers must use pull to get that user's new tweets
the push mechanism can be divided in two:
SSE / WebSocket
SSE
Pull mechanism
short polling, e.g. every 5 minutes, to check for new tweets from followed celebrities (a sketch of the hybrid decision follows below)
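A sketch of the hybrid decision; the 1 million threshold follows the notes, while the in-memory dictionaries stand in for the Redis timeline cache and MySQL:

CELEBRITY_THRESHOLD = 1_000_000

timelines = {}         # user_id -> list of tweet ids (stand-in for the Redis timeline cache)
celebrity_tweets = {}  # author_id -> list of tweet ids (stand-in for MySQL)

def on_new_tweet(author_id, tweet_id, followers):
    if len(followers) < CELEBRITY_THRESHOLD:
        # push model: fan the tweet out into every follower's cached timeline
        for follower_id in followers:
            timelines.setdefault(follower_id, []).insert(0, tweet_id)
    else:
        # pull model: just record the tweet; followers fetch it when they read
        celebrity_tweets.setdefault(author_id, []).insert(0, tweet_id)

def read_timeline(user_id, celebrity_followees):
    # merge the pushed timeline with tweets pulled from followed celebrities
    merged = list(timelines.get(user_id, []))
    for celeb_id in celebrity_followees:
        merged += celebrity_tweets.get(celeb_id, [])[:20]
    return merged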
Survey -
Cassandra -
distributed storage -
AWS - storage -
ZooKeeper - Redis - idea
Cassandra: high throughput and high availability, but the downside is eventual consistency
Timeline
TimelineFeed
user_id -> list(tweet_id)
redis
tweet_id -> "content"
MySQL uses master-slave replication
quorum, ring-based consistent hashing
consistent hashing:
5 servers
size of the ring: 0 to 2^32 - 1
server1 at position 1
server2 at position 3
server3 at position 4
hash the data and take the result modulo the ring size
the data is placed on the next server clockwise
let's say I hash a piece of data and get 2; it will be stored on server2, the next position clockwise (a ring sketch follows below)
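A minimal sketch of that ring, placing keys clockwise onto the next server position; the positions 1, 3, 4 follow the example above, and a real ring would usually also use virtual nodes:

import bisect
import hashlib

RING_SIZE = 2 ** 32

class ConsistentHashRing:
    def __init__(self, servers):
        # servers: ring position -> server name, e.g. {1: "server1", 3: "server2", 4: "server3"}
        self.servers = servers
        self.positions = sorted(servers)

    def _position(self, key):
        # hash the data and take it modulo the ring size
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    def get_server(self, key):
        pos = self._position(key)
        # walk clockwise: the first server position >= the key's position owns it
        idx = bisect.bisect_left(self.positions, pos)
        if idx == len(self.positions):  # wrap around the ring
            idx = 0
        return self.servers[self.positions[idx]]

ring = ConsistentHashRing({1: "server1", 3: "server2", 4: "server3"})
# per the example above, a key at position 2 is owned by server2 (the next position clockwise, 3)
print(ring.get_server("tweet:42"))  # whichever server owns this key's hashed position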
large, data-intensive applications (this I read in the book)
- do you learn by reading, or have you also worked on a large distributed system?
real experience - large distributed systems -
what technology is different from the Twitter system?
what are the weaknesses/strengths of the system design?
if you are familiar with the question it is easy
let's say I have to design something I'm not familiar with, like live streaming