Introduction
Learning system design in a pragmatic way is challenging. What I like to do is review my notes from the Grokking system design course and write down some notes on how to design Dropbox.
Case study
I would like to take some time to write notes on designing Dropbox.
Requirements for the system:
1. Users should be able to upload and download their files/photos from any device.
2. Users should be able to share files or folders with other users.
3. The service should support automatic synchronization between devices.
4. Files can be up to a GB in size.
5. ACID-ity is required. Atomicity, consistency, isolation and durability of all file operations should be guaranteed.
6. Offline editing should be supported. Once the client is back online, the changes should be synced.
Extended requirement
Design consideration
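Files can be stored in small parts or chunks (say 4 MB). This provides several benefits: a failed transfer only needs to retry the affected chunk, and an update only needs to send the chunks that changed.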
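To make the chunking idea concrete, here is a minimal sketch, not taken from the course. It assumes fixed-size chunks identified by a SHA-256 hash; the names `chunk_stream` and `changed_chunks` are my own for illustration.

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size suggested above


def chunk_stream(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, sha256_hex, data) for each fixed-size chunk of a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1


def changed_chunks(old_hashes, new_hashes):
    """Return indexes of chunks that differ or are new, i.e. the ones to upload."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]


# Usage with a tiny chunk size so the example stays readable.
old = [h for _, h, _ in chunk_stream(io.BytesIO(b"aaaabbbbcc"), chunk_size=4)]
new = [h for _, h, _ in chunk_stream(io.BytesIO(b"aaaaXbbbcc"), chunk_size=4)]
print(changed_chunks(old, new))  # [1]
```

Only the second chunk changed, so only that chunk would need to be uploaded again.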
Capacity estimation and constraints
. 500M total users, and 100M daily active users
. on average each user connects from three different devices.
. if each user has 200 files/photos on average, we will have 500M * 200 = 100 billion total files.
. average file size is 100 KB, so total storage is 100B * 100 KB => 10 PB
. one million active connections per minute
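The arithmetic behind these estimates can be checked with a few lines of Python. This is just a sketch of the numbers above, assuming decimal units (1 KB = 1000 bytes, 1 PB = 10^15 bytes); the variable names are my own.

```python
total_users = 500_000_000
files_per_user = 200
avg_file_size_bytes = 100 * 1000  # 100 KB, assuming decimal units

total_files = total_users * files_per_user
total_storage_bytes = total_files * avg_file_size_bytes
total_storage_pb = total_storage_bytes / 10**15

print(f"total files: {total_files:,}")             # total files: 100,000,000,000
print(f"total storage: {total_storage_pb:.0f} PB")  # total storage: 10 PB
```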
High level design
Block servers will work with clients to upload/download files from cloud storage, and Metadata servers will keep metadata of files updated in a SQL or NoSQL database. Synchronization servers will handle the workflow of notifying all clients about different changes for synchronization.
client -> block server, metadata server, synchronization server -> cloud storage, metadata storage
Component design
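One thing I learned and would like to share here is this question:
How can clients efficiently listen to changes happening on other clients?
The client can either periodically check with the server, or use HTTP long polling, where the server holds the request open until there is new data or the request times out.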
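To show what long polling means in practice, here is a small in-process sketch. It is not a real HTTP server: the hypothetical `ChangeFeed` class stands in for the server side, and `poll` blocks the caller until a change arrives or the timeout expires, which is the behavior a long-polling endpoint would have.

```python
import threading


class ChangeFeed:
    """Hypothetical in-memory stand-in for the server side of a long poll."""

    def __init__(self):
        self._cond = threading.Condition()
        self._changes = []  # append-only log of change descriptions

    def publish(self, change):
        """Record a change and wake up every waiting poller."""
        with self._cond:
            self._changes.append(change)
            self._cond.notify_all()

    def poll(self, cursor, timeout):
        """Block until there are changes past `cursor` or `timeout` seconds pass.

        Returns (new_changes, new_cursor); new_changes is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._changes) > cursor, timeout=timeout)
            new = self._changes[cursor:]
            return new, cursor + len(new)


feed = ChangeFeed()
# Another client publishes a change shortly after we start waiting.
timer = threading.Timer(0.1, feed.publish, args=("notes.txt updated",))
timer.start()
changes, cursor = feed.poll(cursor=0, timeout=5)
timer.join()
print(changes)
```

Compared with periodic checking, the client makes one request and gets an answer as soon as something changes, instead of asking repeatedly and mostly getting empty responses.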
a. client
- internal metadata database
- chunker
- watcher
- indexer
b. Metadata database
The metadata database should store the following information: chunks, files, users, devices, workspaces (sync folders).
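One possible shape for these tables is sketched below with an in-memory SQLite database. All table and column names are my own assumptions for illustration; the notes only name the five kinds of objects to store.

```python
import sqlite3

# Hypothetical schema: one table per object listed above.
SCHEMA = """
CREATE TABLE users      (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE devices    (device_id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(user_id));
CREATE TABLE workspaces (workspace_id INTEGER PRIMARY KEY,
                         owner_id INTEGER NOT NULL REFERENCES users(user_id));
CREATE TABLE files      (file_id INTEGER PRIMARY KEY,
                         workspace_id INTEGER NOT NULL REFERENCES workspaces(workspace_id),
                         path TEXT NOT NULL);
CREATE TABLE chunks     (file_id INTEGER NOT NULL REFERENCES files(file_id),
                         chunk_index INTEGER NOT NULL,
                         sha256 TEXT NOT NULL,
                         PRIMARY KEY (file_id, chunk_index));
"""


def create_metadata_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_metadata_db()
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO workspaces (workspace_id, owner_id) VALUES (1, 1)")
conn.execute("INSERT INTO files (file_id, workspace_id, path) VALUES (1, 1, 'notes.txt')")
conn.execute("INSERT INTO chunks (file_id, chunk_index, sha256) VALUES (1, 0, 'abc')")
print(conn.execute("SELECT path FROM files").fetchall())
```

Keeping chunks in their own table, keyed by file and chunk index, is what lets a client ask which chunks of a file it is missing.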
c. Synchronization service
d. Message queuing service
client -> request queue -> synchronization service -> metadata DB
<- response queue
Message Queuing Service
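The request/response flow in the diagram can be sketched with Python's standard `queue` module. This is a single-process illustration under my own assumptions: one shared request queue, one response queue per client, and a plain dictionary standing in for the metadata DB. All function names are hypothetical.

```python
import queue

request_queue = queue.Queue()  # shared by all clients
response_queues = {}           # one response queue per client id
metadata_db = {}               # stand-in for the metadata DB: path -> version


def register_client(client_id):
    """Give a client its own response queue."""
    response_queues[client_id] = queue.Queue()


def submit_update(client_id, path):
    """Client side: put an update request on the shared request queue."""
    request_queue.put((client_id, path))


def process_one_request():
    """Synchronization service: apply one update and notify the other clients."""
    client_id, path = request_queue.get()
    version = metadata_db.get(path, 0) + 1
    metadata_db[path] = version
    for other_id, q in response_queues.items():
        if other_id != client_id:
            q.put((path, version))


register_client("laptop")
register_client("phone")
submit_update("laptop", "notes.txt")
process_one_request()
print(response_queues["phone"].get_nowait())  # ('notes.txt', 1)
```

The queues decouple the clients from the synchronization service, so a client that is offline simply finds its pending updates waiting when it reconnects.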
e. Cloud/Block storage
web client - notification server - synchronization queue - metadata servers - load balancers - block server <- metadata DB -> Metadata cache server, storage cache server - block/ cloud storage
Metadata partitioning
1. vertical partitioning
2. range based partitioning
3. hash-based partitioning
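As an illustration of the third option, here is a minimal sketch of hash-based partitioning. It assumes we partition by a file ID and use SHA-256 to get a stable hash; the function name `partition_for` and the choice of key are my own assumptions.

```python
import hashlib


def partition_for(file_id, num_partitions):
    """Map a file ID to a partition number in [0, num_partitions)."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used to get a stable value.
    digest = hashlib.sha256(str(file_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


for file_id in ["file-1", "file-2", "file-3"]:
    print(file_id, "->", partition_for(file_id, 16))
```

The same ID always lands on the same partition, and IDs spread roughly evenly. The known downside is that changing the number of partitions remaps most keys, which is the problem consistent hashing is meant to reduce.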
Caching
Load balancer