Tuesday, September 7, 2021

Google I/O 2011: More 9s Please: Under The Covers of the High Replication Datastore | My first 50 minutes study


Here is the link. 

Alfred Fuller, Matt Wilder

For the first three years of App Engine, the health of the datastore was tied to the health of a single data center. Users had low latency and strong consistency, but also transient data unavailability and planned read-only periods. The High Replication Datastore trades small amounts of latency and consistency for significantly higher availability. In this talk we discuss user-facing and operational issues of the original Master/Slave Datastore, and how the High Replication Datastore addresses these issues.

Notes from this website


Notes taken from Google I/O 2011: More 9s Please: Under The Covers of the High Replication Datastore

  1. Two types of datastore in App Engine:
    • Master/Slave
      • This is the old style: one master handles all reads and writes, and writes are replicated to the slave asynchronously.
    • High Replication
      • This is the new default. There is no single master; writes go to multiple replicas synchronously, and together the replicas act as a collective master.
  2. Datastore Stack:
    • The Datastore itself is the highest layer: schema-less storage with an advanced query engine.
    • It sits atop Megastore, which is defined by a strict schema and queried using SQL.
    • Megastore is powered by Bigtable, a distributed key-value store.
      • Bigtable is very fast and highly scalable, but that design has tradeoffs; mainly, data can be unavailable for short periods of time.
    • Finally, everything is stored on GFSv2, a distributed file system.
  3. Writes to Datastore
    • In Master/Slave, a write goes to Datacenter A and is asynchronously replicated to Datacenter B some time later.
    • In High Replication, a write goes to a majority of the replicas synchronously. Any replica that misses the synchronous write either gets an asynchronous catch-up write scheduled, or is replicated on demand when a read arrives at that replica and it notices the data is missing. (A small quorum-write sketch follows the list.)
    • Writes to Master/Slave are faster (20 ms) than writes to the High Replication Datastore (45 ms).
    • Read latency is about the same, but the read error rate in High Replication is far lower (0.001% vs. 1%), which works out to roughly 5 minutes vs. 9 hours of downtime.
  4. Planned Maintenance
    • Master/Slave
      • Datacenter A becomes read-only, so apps running on App Engine become read-only too. Meanwhile, Datacenter B catches up on the pending writes.
      • Once the catch-up is done, the switchover happens.
      • Requires an engineer to initiate the switchover.
    • High Replication
      • Seamless migration. Switching is almost transparent.
      • Memcache is flushed, followed by about one minute with caching disabled. (A small cache-flush sketch follows the list.)
      • Memcache itself is hosted in a single datacenter, because memcache has to be very fast and replicating it across datacenters would be too slow.
  5. Unplanned Maintenance
    • Master/Slave does an immediate switchover, so some data is lost and the app serves stale data. It is up to devs to manually flush the partial data that was written to Datacenter A over to Datacenter B.
    • For High Replication, this looks the same as planned maintenance; it is designed to withstand multiple datacenter failures.
  6. Some Issues with Bigtable
    • Since multiple apps share the same Bigtable instance, a short period during which Bigtable is unavailable in a datacenter can make the apps hosted in that datacenter unavailable. Note that this only affects the Master/Slave setup.
    • High Replication is not affected, since it will retry the request against a Bigtable replica in another datacenter. (A small failover sketch follows the list.)
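
To make the write paths in item 3 concrete, here is a minimal Python sketch of the two strategies. The Replica class, the datacenter names, and the timings mentioned in the comments are illustrative only; the real High Replication Datastore runs a Paxos round over Bigtable rather than simple in-memory writes.

class Replica:
    """A toy stand-in for one datacenter's copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        return True


def master_slave_write(master, slaves, key, value):
    # Master/Slave: acknowledge after the master write (~20 ms in the talk);
    # the slaves are caught up asynchronously later.
    master.write(key, value)
    return [(slave, key, value) for slave in slaves]  # queued for async replay


def high_replication_write(replicas, key, value):
    # High Replication: acknowledge only once a majority of replicas accept
    # (~45 ms in the talk). Replicas that miss the write are caught up
    # asynchronously, or on demand the next time they serve a read.
    needed = len(replicas) // 2 + 1
    acks, behind = 0, []
    for replica in replicas:
        if replica.write(key, value):  # in reality a Paxos round, not a local write
            acks += 1
        else:
            behind.append(replica)
    if acks < needed:
        raise RuntimeError("write failed: no majority of replicas accepted it")
    return behind


if __name__ == "__main__":
    master, slave = Replica("master"), Replica("slave")
    pending = master_slave_write(master, [slave], "greeting", "hello")
    print(master.data, slave.data, pending)  # slave is empty until the async replay runs

    a, b, c = Replica("dc-a"), Replica("dc-b"), Replica("dc-c")
    high_replication_write([a, b, c], "greeting", "hello")
    print(a.data, b.data, c.data)  # all three replicas accepted the write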
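
The "memcache flush + 1 min no-caching" step from item 4 can be sketched roughly as below. The dict-based cache, the fetch_from_datastore callback, and the 60-second constant are stand-ins for illustration, not the actual App Engine migration code.

import time

NO_CACHE_WINDOW_SECS = 60  # the "1 min no-caching" from the notes above
_cache = {}
_no_cache_until = 0.0


def begin_migration():
    # Flush the cache and bypass it briefly, so no pre-migration values
    # are served while the app is moving to the other datacenter.
    global _no_cache_until
    _cache.clear()
    _no_cache_until = time.time() + NO_CACHE_WINDOW_SECS


def cached_get(key, fetch_from_datastore):
    if time.time() < _no_cache_until:
        return fetch_from_datastore(key)  # no-cache window: always hit the datastore
    if key not in _cache:
        _cache[key] = fetch_from_datastore(key)
    return _cache[key]


if __name__ == "__main__":
    datastore = {"user:1": "alice"}
    begin_migration()
    print(cached_get("user:1", datastore.get))  # served straight from the "datastore"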
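
Finally, a toy sketch of the failover behavior in item 6: when one datacenter's Bigtable is briefly unavailable, a High Replication read is simply retried against another replica. BigtableUnavailable, Replica, and the datacenter names are made up for this example.

class BigtableUnavailable(Exception):
    """Raised when a datacenter's Bigtable is briefly down."""


class Replica:
    def __init__(self, name, data, available=True):
        self.name, self.data, self.available = name, data, available

    def read(self, key):
        if not self.available:
            raise BigtableUnavailable(self.name)
        return self.data[key]


def read_with_failover(key, replicas):
    # High Replication: try each datacenter's replica in turn instead of
    # failing the request. Master/Slave has no second replica to fall back on.
    last_error = None
    for replica in replicas:
        try:
            return replica.read(key)
        except BigtableUnavailable as err:
            last_error = err
    raise last_error


if __name__ == "__main__":
    dc_a = Replica("dc-a", {"k": "v"}, available=False)  # simulating a short Bigtable outage
    dc_b = Replica("dc-b", {"k": "v"})
    print(read_with_failover("k", [dc_a, dc_b]))  # served from dc-b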
