Julia's coding blog - Practice makes perfect

From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Monday, August 12, 2019

Case study: system design - instagram

August 12, 2019

Introduction

It is very good learning experience for me to learn how to design instagram based on grokking system design content. I like to do a case study and then allow myself to learn how to work on design with very good structure, gathering requirement, high level design, and dive deep, and tradeoff and quantitative analysis.

Case study

Here are the highlights I should remember to be evaluated in system design.

1. Problem exploration
2. Design approach
3. Data management
4. Tradeoffs
5. Deep dive
6. Quantitative analysis

I have to push myself to learn and how to design a simpler version of instagram.

Functional requirements

1. User should be able to upload/download/view photos
2. User can perform searches based on photo/video titles
3. User can follow other users
4. The system should be able to generate and display a user's News Feed consisting of top photos from all the people the user follows.

Non-functional requirements

1. Our service needs to be highly available.
2. The acceptable latency of the system is 200ms for News Feed generation.
3. Consistency can take a hit (in the interest of availability), if a user doesn't see a photo for a while; it should be find.
4. The system should be highly reliable; any uploaded photo or video should never be lost.

Not in scope:
Adding tags to photos, searching photos on tags, commenting on photos, tagging users to photos, who to follow, etc.

So, three things:

Functional requirements
Non-functional requirements
Not in scope

Design approach

The system would be read-heavy, so we will focus on building a system that can retrieve photos quickly.

1. Management of storage is important
2. Low latency is expected while viewing photos
3. Data should be 100% reliable. If a user uploads a photo, the system will guarantee that it will never be lost.

- ideas related to management of storage: distributed file storage like HDFS or S3.

4. Capacity estimation and constraints

Let us assume we have 500M total users, with 1M daily active users.
2M new photos every day, 23 new photos every second.
Average photo file size => 200KB
total space required for 1 day of photos
2M * 200KB => 400 GB
total space required for 10 years:
400GB * 365 ( days a year) * 10 (years) ~= 1425 TB

5. Database schema - Design database tables first

It is a good idea to design the database schema. The database schema would help to understand the data flow among various components and later would guide towards data partitioning.

It is important to write down tables with columns in the table, size and data type, primary key.

Tables:
Photo
User
UserFollow

We can consider using an RDBMS like MySQL, there is an issue to scale them.
We can consider using a distributed key-value store instead of using NoSQL.

I need to learn how to dive deep and do quantitative analysis.

It is important for me to write down table, column name, data type, size, so I can calculate in detail how many bytes for each row.

Let me skip the calculation. But I have to look into the size of table, 3.7 TB for all tables.

Photo:
284 bytes - each row in photo's table
one day, 2 million new photos get uploaded every day, 0.5 GB of storage for one day.
For 10 years 1.88 TB of storage.

I need to think carefully about Sharding.

Assume that a web server can have a maximum of 500 connections at any time.
Upload photos can be a bottleneck.
The idea is to separate reads from writes into two different service. We will have dedicated servers for reads and different servers for writes to ensure that uploads don't affect the system.

Data sharing

I like to learn how to figure out different schemes for metadata sharing:

1. Partitioning based on UserID

If one DB shard is 1TB, we will need four shards to store 3.7 TB of data. Let's assume that we keep 10 shards.

To uniquely identify any photo in the system, it is a working idea to append shard number with each PhotoID.

How can we generate PhotoIDs?

What are the different issues with this partitioning scheme?

It is challenging for me to think about the issues with the partitioning scheme.

1. How are hot users handled?

2. Some users will have a lot of photos compared to others, non-uniform distribution of storage

3. One user's photos are on multiple shards, will it cause higher latencies?

4. One shard for a user's all photos, if the shard is down or high latency if it is serving high load.

b. Partitioning based on PhotoID

1. wouldn't this key generating DB be a single point of failure?

Yes. It would be. A workaround for that is to use two databases with one generating even number and the other odd numbered.

Ranking and News Feed Generation

I am so surprised to read the content. I do not remember at all I read those content in 2018 when I prepare for Amazon onsite on June 6, 2018. I just could not believe that I did not write down any note on this topic at all.

News feed creation with sharded data

Given the task to create the News Feed for any given user is to fetch the latest photos from all people the user follows.

How to come out a mechanism to sort photos on their time of creation. One of ideas is to add photo creation time part of the PhotoID. As the primary index is available on PhotoID, it will be easy.

Follow up

Nov. 15, 2019

It is the first time I like to write down my review after ~~six~~ three months. It is true that I have no idea how good or bad I will be in terms of distributed system design. But I like to have some structure in terms of learning. Read Grokking system design, take notes, and then review notes once a while.

It is better to spend 20 minutes to read and think about what to learn from this blog.

Dec. 11, 2019
I spent time to review the blog, I have onsite from dialpad on Dec. 16, four days away.