Thursday, February 11, 2021

Sharding: database sharding

My notes:

  1. query throughput and response times - 
  2. adding resources to support database operations - vertical scaling - scaling up - its own set of limits
  3. horizontally partitioning a table - more compute capacity - serve incoming queries - faster query response times and index builds 
  4. a network of smaller, cheaper servers vs one big server 
  5. high availability at a low cost than vertical scaling 


 Business applications that rely on a monolithic RDBMS hit bottlenecks as they grow. With limited CPU, storage capacity, and memory, query throughput and response times are bound to suffer. When it comes to adding resources to support database operations, vertical scaling (aka scaling up) has its own set of limits and eventually reaches a point of diminishing returns.

On the other hand, horizontally partitioning a table means more compute capacity to serve incoming queries, and therefore you end up with faster query response times and index builds. By continuously balancing the load and data set over additional nodes, sharding also enables usage of additional capacity. Moreover, a network of smaller, cheaper servers may be more cost effective in the long term than maintaining one big server.

Besides resolving scaling challenges, sharding can potentially alleviate the impact of unplanned outages. During downtime, all the data in an unsharded database is inaccessible, which can be disruptive or downright disastrous. When done right, sharding can ensure high availability: even if one or two nodes hosting a few shards are down, the rest of the database is still available for read/write operations as long as the other nodes (hosting the remaining shards) run in different failure domains. Overall, sharding can increase total cluster storage capacity, speed up processing, and offer higher availability at a lower cost than vertical scaling.



No comments:

Post a Comment