Sunday, December 15, 2019

System Design - Sharding | Data Partitioning

Here is the link.


Sharding or Data Partitioning: - Sharding is a technique to break up a big database into many smaller parts - Horizontal scaling means adding more machines, which is cheaper & more feasible - Vertical scaling means improving servers - Partitioning Methods: - Horizontal partitioning: - In this we put different rows into different table(db), i.e. rows can be based on location with zip codes, this is also range based sharding - Problem is if range is not choosen carefully, it'll lead to unbalanced servers. - Vertical Partitioning: - In this we divide data to store tables related to a specfic feature to their own servers.i.e. In Instagram, we can have 1 for user profile, 1 for photos, 1 for friend list. - Problem is in case of additional growth, its necessary to further partition a feature specific DB across servers. - Directory based Partitioning: - In this, create a lookup service which knows your current partitioning scheme & get it from DB access code. - To find out a data entry, we query directory server that holds the mapping between each tuple key to its DB server. - This is good to perform tasks like adding servers to the DB pool or change our partitioning scheme without impacting application. - Partitioning Criteria: - Hash-based partitioning: - In this we apply a hash function to some key attribute of the entity we're storing, that gives partition number. - Make sure to ensure uniform allocation of data among servers - Problem is it effectively fixes total number of DB servers, since adding a new server means changing the hash function, which would require redistribution of data & downtime, the workaround is Consistent Hashing. - List Partitioning: - In this each partition (DB server) is assigned a list of values, i.e. APAC, EMEA, US region has respective partition. - Round-robin Partitioning: - One by one assign which ensure uniform data distribution - Composite partitioning: - Combining any of above partitioning schemas to devise a new scheme. i.e First list partitioning with hash partitioning in each. - Common problems with Sharding: - Joins & Denormalisation: - Performing joins on a database which is running on one server is straightforward, but if DB is partitioned & spread across multiple machines, it's not feasible to perform joins that span database shards. - These joins won't be performance efficient. - Workaround is to denormalise the DB so that queries that required joins can be performed on single table. - Referential integrity: - Trying to enforce data integrity constraint suh as foreign keys in a sharded DB can be extremely difficult. Most of RDMS do not support this. - Rebalancing: - When data distribution is not uniform. - When there is lot of load on a particular Shard. In this video, we're going to reveal Sharding used in System Design: - What is Sharding/ Data Partitioning - Sharding Methods - Sharding Criteria - Sharding Challenges

No comments:

Post a Comment