Here is the link.
Sharding or Data Partitioning:
- Sharding is a technique to break up a big database into many smaller parts
- Horizontal scaling means adding more machines, which is cheaper & more feasible
- Vertical scaling means improving servers
- Partitioning Methods:
- Horizontal partitioning:
- In this we put different rows into different table(db), i.e. rows can be based on location with zip codes, this is also range based sharding
- Problem is if range is not choosen carefully, it'll lead to unbalanced servers.
- Vertical Partitioning:
- In this we divide data to store tables related to a specfic feature to their own servers.i.e. In Instagram, we can have 1 for user profile, 1 for photos, 1 for friend list.
- Problem is in case of additional growth, its necessary to further partition a feature specific DB across servers.
- Directory based Partitioning:
- In this, create a lookup service which knows your current partitioning scheme & get it from DB access code.
- To find out a data entry, we query directory server that holds the mapping between each tuple key to its DB server.
- This is good to perform tasks like adding servers to the DB pool or change our partitioning scheme without impacting application.
- Partitioning Criteria:
- Hash-based partitioning:
- In this we apply a hash function to some key attribute of the entity we're storing, that gives partition number.
- Make sure to ensure uniform allocation of data among servers
- Problem is it effectively fixes total number of DB servers, since adding a new server means changing the hash function, which would require redistribution of data & downtime, the workaround is Consistent Hashing.
- List Partitioning:
- In this each partition (DB server) is assigned a list of values, i.e. APAC, EMEA, US region has respective partition.
- Round-robin Partitioning:
- One by one assign which ensure uniform data distribution
- Composite partitioning:
- Combining any of above partitioning schemas to devise a new scheme. i.e First list partitioning with hash partitioning in each.
- Common problems with Sharding:
- Joins & Denormalisation:
- Performing joins on a database which is running on one server is straightforward, but if DB is partitioned & spread across multiple machines, it's not feasible to perform joins that span database shards.
- These joins won't be performance efficient.
- Workaround is to denormalise the DB so that queries that required joins can be performed on single table.
- Referential integrity:
- Trying to enforce data integrity constraint suh as foreign keys in a sharded DB can be extremely difficult. Most of RDMS do not support this.
- Rebalancing:
- When data distribution is not uniform.
- When there is lot of load on a particular Shard.
In this video, we're going to reveal Sharding used in System Design:
- What is Sharding/ Data Partitioning
- Sharding Methods
- Sharding Criteria
- Sharding Challenges
From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment