Julia's coding blog - Practice makes perfect

Since January 2015, she has been practicing Leetcode questions; she trains herself to stay focused and develops "muscle" memory by working through those questions one by one. In early 2015, Julia started working on Leetcode and opened her first blog. While working through Leetcode problems, she read a lot of code and learned a little from each person; she also opened a Github account, published her own code, and tried to write down some of her own reflections. She learns from her favorite sport, tennis: practicing 10,000 serves builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Saturday, December 14, 2019

Case study: Design Dropbox

Dec. 14, 2019

Introduction

It is challenging to be a pragmatic learner while working to become a system designer. What I like to do is review my notes from Grokking system design, and write down some notes on how to design Dropbox.

Case study

I would like to take some time to write notes on designing Dropbox.

Requirements for the system:
1. Users should be able to upload and download their files/photos from any device.
2. Users should be able to share files or folders with other users.
3. The service should support automatic synchronization between devices.
4. A file can be up to a GB in size.
5. ACID-ity is required. Atomicity, consistency, isolation and durability of all file operations should be guaranteed.
6. Offline editing should be supported. Changes made offline should be synced once the device is back online.

Extended requirement

Design consideration

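Files can be stored in small parts or chunks (say 4 MB). This provides a lot of benefits: if a transfer fails, only the failed chunk needs to be retried, and when a file changes, only the changed chunks need to be transferred.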
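To make the chunking idea concrete, here is a minimal sketch, assuming fixed-size chunks identified by a SHA-256 hash. The function names `split_into_chunks` and `changed_chunks` are made up for illustration; a real client might well use a different chunking or hashing scheme.

```python
import hashlib
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size suggested in the notes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (sha256 hex digest, chunk bytes) for each fixed-size chunk."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the indexes of chunks in `new` that are new or differ from `old`."""
    old_hashes = [digest for digest, _ in split_into_chunks(old, chunk_size)]
    changed = []
    for index, (digest, _) in enumerate(split_into_chunks(new, chunk_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed.append(index)
    return changed
```

With this sketch, a client only needs to upload the chunks whose indexes are returned by `changed_chunks`, instead of the whole file.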

Capacity estimation and constraints

. 500M total users, and 100M daily active users
. on average each user connects from three different devices
. if each user has 200 files/photos on average, we will have 500M * 200 = 100 billion total files
. average file size is 100 KB, so total storage is 100B * 100 KB = 10 PB
. one million active connections per minute
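The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; it assumes decimal units (1 KB = 1000 bytes, 1 PB = 1000^5 bytes), and the variable names are my own.

```python
TOTAL_USERS = 500_000_000          # 500M total users
FILES_PER_USER = 200               # average files/photos per user
AVG_FILE_SIZE_BYTES = 100 * 1000   # 100 KB, decimal units (assumption)

total_files = TOTAL_USERS * FILES_PER_USER
total_storage_bytes = total_files * AVG_FILE_SIZE_BYTES
total_storage_pb = total_storage_bytes / 1000**5

print(f"{total_files:,} files")        # 100,000,000,000 files
print(f"{total_storage_pb:.0f} PB")    # 10 PB
```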

High level design

Block servers will work with clients to upload/download files from cloud storage, and Metadata servers will keep metadata of files updated in a SQL or NoSQL database. Synchronization servers will handle the workflow of notifying all clients about different changes for synchronization.

client -> block server, metadata server, synchronization server -> cloud storage, metadata storage

Component design

One thing I learned and would like to share here is this question:

How can clients efficiently listen to changes happening on other clients?

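Clients can periodically check with the server (polling), or use HTTP long polling, where the server holds the request open until a change is available or a timeout expires.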
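The long polling idea can be sketched without any HTTP at all. The class below is an in-memory stand-in for the server side, assuming each client remembers how many changes it has already seen; `ChangeFeed`, `publish` and `poll` are hypothetical names, not part of any real Dropbox API.

```python
import threading
from typing import List


class ChangeFeed:
    """In-memory stand-in for a server that clients long-poll for changes."""

    def __init__(self) -> None:
        self._changes: List[str] = []
        self._cond = threading.Condition()

    def publish(self, change: str) -> None:
        """Record a change and wake up every waiting poll call."""
        with self._cond:
            self._changes.append(change)
            self._cond.notify_all()

    def poll(self, since: int, timeout: float) -> List[str]:
        """Block until there are changes after index `since`, or the timeout expires.

        Returns the new changes, or an empty list on timeout.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > since, timeout=timeout)
            return self._changes[since:]
```

A client would call `poll` in a loop, passing the number of changes it has seen so far. Compared with plain periodic polling, the request returns as soon as a change is published, and an idle client makes only one request per timeout window.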

a. client

Internal metadata database
Chunker
Watcher
Indexer

b. Metadata database

The metadata database should store information about chunks, files, users, devices, and workspaces (sync folders).
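As a rough illustration of those five kinds of objects, here is a sketch using Python dataclasses. Every field name below is my own assumption, chosen only to show how the objects might refer to each other; the notes do not specify a schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class User:
    user_id: str
    email: str


@dataclass
class Device:
    device_id: str
    user_id: str            # the user who owns this device


@dataclass
class Workspace:
    workspace_id: str       # a sync folder
    owner_user_id: str
    member_user_ids: List[str] = field(default_factory=list)  # users it is shared with


@dataclass
class Chunk:
    chunk_id: str           # e.g. a content hash (assumption)
    size_bytes: int


@dataclass
class FileEntry:
    file_id: str
    workspace_id: str
    name: str
    version: int = 1
    chunk_ids: List[str] = field(default_factory=list)  # ordered list of chunks
```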

c. Synchronization service

d. Message queuing service

client -> request queue -> synchronization service -> metadata DB

<- response queue

Message Queuing Service
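The queue flow above can be sketched with Python's standard `queue` module. This assumes one request queue shared by all clients and one response queue per client; the function names and the shape of the update dictionary are hypothetical.

```python
import queue
from typing import Dict

request_queue: queue.Queue = queue.Queue()       # shared by all clients
response_queues: Dict[str, queue.Queue] = {}     # one response queue per client


def register_client(client_id: str) -> None:
    """Create a response queue for a newly connected client."""
    response_queues[client_id] = queue.Queue()


def sync_service_step(metadata_db: dict) -> None:
    """Take one update off the request queue, apply it, and fan it out."""
    update = request_queue.get()
    metadata_db[update["file"]] = update["version"]   # update the metadata DB
    for client_id, responses in response_queues.items():
        if client_id != update["from"]:               # skip the sender
            responses.put(update)
```

For example, after `register_client("a")` and `register_client("b")`, putting `{"from": "a", "file": "notes.txt", "version": 2}` on `request_queue` and calling `sync_service_step(db)` leaves the update waiting in the response queue of client "b" only.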

e. Cloud/Block storage

web client - notification server - synchronization queue - metadata servers - load balancers - block server <- metadata DB -> Metadata cache server, storage cache server - block/ cloud storage

Metadata partitioning

1. vertical partitioning

2. range-based partitioning

3. hash-based partitioning
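Of the three schemes, hash-based partitioning is the easiest to show in a few lines. The sketch below assumes a fixed number of shards and a string file ID; `NUM_SHARDS` and `shard_for` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical number of metadata partitions


def shard_for(file_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a file ID to a shard by hashing it and taking the result modulo the shard count."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

The same file ID always maps to the same shard, and IDs spread roughly evenly across shards. One known drawback of plain modulo hashing is that changing the number of shards remaps most keys.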

Caching

Load balancer