Jan. 27, 2020
Introduction
I met a software engineer who worked with me on a system design session on Jan. 27, 2020; the topic was to design Twitter. I believe he presented his knowledge and experience very well. I would like to work through a short case study.
Case study
design a twitter, let us start: less than 1000 users, RESTful API, post tweets, follow, followers can see the timeline, hashtags, search using hashtag -> scalability, bottleneck
API
authentication: JWT token
encrypt/sign the user_id plus some information into a token that only remains valid for around 1 week
define a salt key (a random string)
if the token cannot be decrypted/verified, it is invalid
we store the token in Redis after the user authenticates (a minimal sketch of the token flow follows below)
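A minimal sketch of the token part of this flow, assuming the PyJWT library; the secret string and the one-week expiry are placeholders, and the Redis storage step is left out:

import datetime
import jwt  # PyJWT

SECRET = "random-salt-key-string"  # the random "salt key" mentioned above

def issue_token(user_id):
    # sign the user_id into a token that stays valid for about one week
    payload = {
        "user_id": user_id,
        "exp": datetime.datetime.utcnow() + datetime.timedelta(days=7),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token):
    # returns the payload, or raises jwt.InvalidTokenError if the token is expired or tampered with
    return jwt.decode(token, SECRET, algorithms=["HS256"])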
http protocol
register(username, password)
login(username, password)
post_tweet(user_id, content)
follow(user_id, followed_user_id)
get_recent_timeline(user_id, start, end) returns an array of tweets
search(hashtag) returns an array of tweets
response status: 200 OK
internal server error: 500
invalid request parameter: 400
body: {
  message: "ok",
  error_code: 23,
  error_message: ""
}
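As an illustration of how these calls could look over HTTP, a hedged sketch using the Python requests library; the URL, host, and header are assumptions, not part of the original notes:

import requests

# hypothetical HTTP mapping of post_tweet(user_id, content); the user_id comes from the JWT
resp = requests.post(
    "https://api.example.com/post_tweet",
    headers={"Authorization": "Bearer <jwt token>"},
    json={"content": "hello #twitter"},
)
print(resp.status_code)  # 200 on success, 400 for bad parameters, 500 for server errors
print(resp.json())       # expected shape: {"message": "ok", "error_code": 0, "error_message": ""}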
Tables
User table
id, username, password, created_at
Tweet table
id, user_id, content, created_at
Hashtag table
id, tweet_id, hashtag
e.g. 1, 1, hashtag1
     2, 1, hashtag2
Follower table
user_id, follow_id (both ids are foreign keys to the user table), created_at
(a runnable schema sketch follows below)
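A minimal, runnable sketch of these four tables using SQLite; the column types are assumptions, and production would more likely use MySQL as in the diagram below:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE,
    password TEXT,              -- a password hash in practice, not plaintext
    created_at TIMESTAMP
);
CREATE TABLE tweet (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES user(id),
    content TEXT,
    created_at TIMESTAMP
);
CREATE TABLE hashtag (
    id INTEGER PRIMARY KEY,
    tweet_id INTEGER REFERENCES tweet(id),
    hashtag TEXT
);
CREATE TABLE follower (
    user_id INTEGER REFERENCES user(id),    -- the follower
    follow_id INTEGER REFERENCES user(id),  -- the user being followed
    created_at TIMESTAMP
);
""")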
Timeline:
aggregate the recent tweets from the users someone follows
the aggregation runs in the background and the result is stored in a Redis cache
redis
user_id: [tweet_id, tweet_id, tweet_id] (a cache sketch follows below)
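A sketch of that Redis timeline cache, assuming the redis-py client; the key name and list length are placeholders:

import redis

r = redis.Redis()

def push_to_timeline(user_id, tweet_id, max_len=800):
    # prepend the new tweet id and trim so the cached timeline stays bounded
    key = f"timeline:{user_id}"
    r.lpush(key, tweet_id)
    r.ltrim(key, 0, max_len - 1)

def get_recent_timeline(user_id, start, end):
    # returns tweet ids only; the tweet contents come from MySQL or a tweet cache
    return r.lrange(f"timeline:{user_id}", start, end)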
Diagram:
less than 1000 users
Client -> LB -> Reverse Proxy -> UserService -> Redis -> MySQL
                              -> TimelineService -> Redis -> MySQL
please describe and write down why we need a reverse proxy:
Reverse Proxy
we can add a rate limiter (a small sketch follows this list)
SSL termination
compress outbound responses
blacklist certain IPs
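In practice the rate limiter would usually be configured in the proxy itself (e.g. nginx), but a token-bucket sketch in Python shows the idea; the rate and capacity values are arbitrary:

import time

class TokenBucket:
    def __init__(self, rate=10, capacity=20):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the proxy would answer 429 Too Many Requests

buckets = {}  # one bucket per client IP
def allowed(client_ip):
    return buckets.setdefault(client_ip, TokenBucket()).allow()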
select * from hashtag_table where hashtag like '#hashtag' - for searching a hashtag
ranking based on likes and retweets of a tweet
Activity:
type (like, retweet) | actor_id (user_id) | tweet_id | created_at
at first we can simply query/aggregate this table; if it grows big, store the counts in a cache
tweet_id_total_likes: 123
tweet_id_total_retweet: 123
every time we do the aggregation we can also update the tweet table
denormalize
Tweet table
id, user_id, content, created_at, total_likes, total_retweet
Ranking:
we can rank based on total_likes or total_retweet depending on the algorithm (a counter sketch follows below)
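A sketch of the counter caching and write-back described above, assuming redis-py and a SQLite-style DB-API connection; the key names are placeholders:

import redis

r = redis.Redis()

def record_activity(tweet_id, activity_type):
    # increment a cached counter instead of counting the activity table on every read
    if activity_type == "like":
        r.incr(f"tweet:{tweet_id}:total_likes")
    elif activity_type == "retweet":
        r.incr(f"tweet:{tweet_id}:total_retweet")

def flush_counters(db_conn, tweet_id):
    # periodically write the aggregated counters back into the denormalized tweet table
    likes = int(r.get(f"tweet:{tweet_id}:total_likes") or 0)
    retweets = int(r.get(f"tweet:{tweet_id}:total_retweet") or 0)
    db_conn.execute(
        "UPDATE tweet SET total_likes = ?, total_retweet = ? WHERE id = ?",
        (likes, retweets, tweet_id),
    )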
100 million users
something #hashtag something
if a tweet includes a hashtag, it is tokenized with the hashtag as the main key, sent to a queue, and a worker indexes it into Elasticsearch, which uses an inverted index
when a tweet is posted, it is sent asynchronously to the queue
a worker instance calls the Elasticsearch index API
an inverted index is a data structure for faster lookup
let's say
doc1 tweet: #something #abc
doc2 tweet: #something #abc2
the tokenizer produces prefix tokens, e.g. for #something:
#s
#so
#som
#some
keyword index example:
doc 1: "something like this #abc"
fuzzy search "some" -> doc 1
prefix "#ab" -> doc 1
"#abc" -> doc 1
for a phrase without a hashtag, the tokenizer uses a minimum prefix length of 4:
some
somet
someth
somethi
somethin
something
like
#abc
inverted index:
#something -> doc1, doc2
#abc -> doc1
#abc2 -> doc2
(a small sketch follows below)
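A minimal in-memory sketch of the prefix tokenizer and inverted index described above; Elasticsearch would handle this internally, so this is only an illustration of the idea, with the min-length-4 rule taken from the notes:

from collections import defaultdict

def tokenize(text, min_len=4):
    tokens = set()
    for word in text.lower().split():
        if word.startswith("#"):
            # hashtags: index every prefix, e.g. #s, #so, #som, ...
            for i in range(2, len(word) + 1):
                tokens.add(word[:i])
        elif len(word) >= min_len:
            # plain words: prefixes of at least min_len characters
            for i in range(min_len, len(word) + 1):
                tokens.add(word[:i])
    return tokens

index = defaultdict(set)  # token -> set of document ids (the inverted index)

def index_doc(doc_id, text):
    for token in tokenize(text):
        index[token].add(doc_id)

index_doc("doc1", "tweet #something #abc")
index_doc("doc2", "tweet #something #abc2")
print(sorted(index["#something"]))  # ['doc1', 'doc2']
print(sorted(index["#some"]))       # ['doc1', 'doc2'] - prefix search becomes a direct lookup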
Client -> LB -> Reverse Proxy -> Fanout -> SearchService -> Queue -> Worker -> ElasticSearch
                                        -> PostTweetService -> MySQL
                                                            -> Redis
Fanout
please write down some keywords to help understand fanout
fanout is for sending/forwarding a request to multiple services
fan-in collects the responses from those services and merges them (see the sketch below)
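A small fan-out / fan-in sketch using asyncio; the two service calls are placeholders for the real downstream requests:

import asyncio

async def call_post_tweet_service(tweet):
    return {"tweet_id": 123}     # placeholder for the real downstream call

async def call_search_service(tweet):
    return {"indexed": True}     # placeholder for queueing the tweet for indexing

async def handle_post(tweet):
    # fan-out: forward the same request to multiple services concurrently
    results = await asyncio.gather(
        call_post_tweet_service(tweet),
        call_search_service(tweet),
    )
    # fan-in: merge the individual responses into one reply for the client
    merged = {}
    for part in results:
        merged.update(part)
    return merged

print(asyncio.run(handle_post({"content": "hello #twitter"})))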
-> TimelineService -> Redis -> MySQL
we will have a lot of followers
we can use 2 mechanisms for the timeline:
push / pull
hybrid
the server pushes the new tweet to followers if the author has fewer than 1 million followers
if the author has more than 1 million followers, followers must use pull to get that user's new tweets
the push mechanism can be divided in two:
SSE / WebSocket
SSE
Pull mechanism
short polling, e.g. every 5 minutes, to check for new tweets from followed celebrities (a sketch of the hybrid decision follows below)
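A sketch of the hybrid decision; the 1 million threshold follows the notes, while the in-memory dictionaries stand in for the Redis timeline cache and MySQL:

CELEBRITY_THRESHOLD = 1_000_000

timelines = {}         # user_id -> list of tweet ids (stand-in for the Redis timeline cache)
celebrity_tweets = {}  # author_id -> list of tweet ids (stand-in for MySQL)

def on_new_tweet(author_id, tweet_id, followers):
    if len(followers) < CELEBRITY_THRESHOLD:
        # push model: fan the tweet out into every follower's cached timeline
        for follower_id in followers:
            timelines.setdefault(follower_id, []).insert(0, tweet_id)
    else:
        # pull model: just record the tweet; followers fetch it when they read
        celebrity_tweets.setdefault(author_id, []).insert(0, tweet_id)

def read_timeline(user_id, celebrity_followees):
    # merge the pushed timeline with tweets pulled from followed celebrities
    merged = list(timelines.get(user_id, []))
    for celeb_id in celebrity_followees:
        merged += celebrity_tweets.get(celeb_id, [])[:20]
    return merged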
Survey -
Cassandra -
distributed storage -
AWS - storage -
ZooKeeper - Redis - idea
Cassandra: high throughput and high availability, but the downside is eventual consistency
Timeline
TimelineFeed
user_id -> list(tweet_id)
redis
tweet_id -> "content"
MySQL uses master-slave replication
quorum, ring-based consistent hashing
consistent hashing:
5 servers
size of the ring: 0 to 2^32 - 1
server1 at position 1
server2 at position 3
server3 at position 4
hash the data and take the result modulo the ring size
the data is placed on the next server clockwise
let's say I hash a piece of data and get 2; it will be stored on server2, the next position clockwise (a ring sketch follows below)
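A minimal sketch of that ring, placing keys clockwise onto the next server position; the positions 1, 3, 4 follow the example above, and a real ring would usually also use virtual nodes:

import bisect
import hashlib

RING_SIZE = 2 ** 32

class ConsistentHashRing:
    def __init__(self, servers):
        # servers: ring position -> server name, e.g. {1: "server1", 3: "server2", 4: "server3"}
        self.servers = servers
        self.positions = sorted(servers)

    def _position(self, key):
        # hash the data and take it modulo the ring size
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    def get_server(self, key):
        pos = self._position(key)
        # walk clockwise: the first server position >= the key's position owns it
        idx = bisect.bisect_left(self.positions, pos)
        if idx == len(self.positions):  # wrap around the ring
            idx = 0
        return self.servers[self.positions[idx]]

ring = ConsistentHashRing({1: "server1", 3: "server2", 4: "server3"})
# per the example above, a key at position 2 is owned by server2 (the next position clockwise, 3)
print(ring.get_server("tweet:42"))  # whichever server owns this key's hashed position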
large, data-intensive applications (this I read in the book)
- do you learn by reading, or have you also worked on a large distributed system?
real experience - large distributed systems -
what technology is different from the Twitter system?
what are the weaknesses/strengths of the system design?
if you are familiar with the question it is easy
let's say I have to design something I'm not familiar with, like live streaming