Julia's coding blog - Practice makes perfect

From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Thursday, February 11, 2021

Sharding: The Perils of Manual Sharding

Feb. 11, 2021

Manual sharding - my learning notes:

Automatica sharding
Manual sharding at the application layer
common mistakes: uneven data distribution, hotspots, and storing data on too few shards
challenges: database joins cross multiple servers

Sharding, including the day-1 shard creation and day-2 shard rebalancing, when completely automated can be a boon to high-volume data applications. Unfortunately, monolithic databases like Oracle, PostgreSQL, MySQL, and even newer distributed SQL databases like Amazon Aurora do not support automatic sharding. This means manual sharding at the application layer has to be performed if you want to continue to use these databases. The net result is a massive increase in development complexity. Your application now has additional sharding logic to know exactly how your data is distributed, and how to fetch it. You also have to decide what sharding approach to adopt, how many shards to create, and how many nodes to use. And also account for shard key as well as even sharding approach changes if your business needs change.

One of the most significant challenges with manual sharding is uneven shard allocation. Disproportionate distribution of data could cause shards to become unbalanced, with some overloaded while others remain relatively empty. It’s best to avoid accruing too much data on a shard, because a hotspot can lead to slowdowns and server crashes. This problem could also arise from a small shard set, which forces data to be spread across too few shards. This is acceptable in development and testing environments, but not in production. Uneven data distribution, hotspots, and storing data on too few shards can all cause shard and server resource exhaustion.

Finally, manual sharding can complicate operational processes. Backups will now have to be performed for multiple s

servers. Data migration and schema changes must be carefully coordinated to ensure all shards have the same schema copy. Without sufficient optimization, database joins across multiple servers could highly inefficient and difficult to perform.

Julia's coding blog - Practice makes perfect

Thursday, February 11, 2021

Sharding: The Perils of Manual Sharding

No comments:

Post a Comment