Sept. 7, 2021
Here is the link.
Speakers: Alfred Fuller, Matt Wilder
For the first three years of App Engine, the health of the datastore was tied to the health of a single data center. Users had low latency and strong consistency, but also transient data unavailability and planned read-only periods. The High Replication Datastore trades small amounts of latency and consistency for significantly higher availability. In this talk we discuss user-facing and operational issues of the original Master/Slave Datastore, and how the High Replication Datastore addresses these issues.
Google I/O 2011: More 9s Please: Under The Covers of the High Replication Datastore
Notes taken from this talk:
- Two types of datastore in App Engine:
- Master/Slave
- This is the old style: a single master handles all reads and writes, and writes are replicated to the slave asynchronously.
- High Replication
- This is the new default style. There is no single master: writes go synchronously to a majority of the replicas, which collectively act as the master. (A toy sketch contrasting the two write paths follows this list.)
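A toy sketch of the difference between the two write paths. This is not App Engine's implementation; the Replica class, the majority rule, and the ack flow are all invented for illustration:

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


def master_slave_write(master, slaves, key, value):
    """Master/Slave: ack as soon as the master has the write; slaves
    are updated asynchronously, outside the request path."""
    master.write(key, value)
    return "ack"  # slaves may lag; the write is lost if the master dies now


def high_replication_write(replicas, key, value):
    """High Replication: ack only once a majority of replicas have the
    write; stragglers catch up asynchronously or on demand at read time."""
    majority = len(replicas) // 2 + 1
    acked = 0
    for replica in replicas:  # in reality these writes run in parallel
        replica.write(key, value)
        acked += 1
        if acked >= majority:
            return "ack"  # remaining replicas catch up later


replicas = [Replica(n) for n in "ABC"]
high_replication_write(replicas, "k", "v")  # acks after 2 of 3 replicas
```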
- Datastore Stack:
- The actual Datastore is the highest level: schema-less storage with an advanced query engine.
- It sits atop Megastore, which has a strict schema and is queried using standard SQL.
- Megastore is powered by Bigtable, a distributed key-value store.
- Bigtable is very fast and highly scalable, but this design has tradeoffs: mainly, data can be unavailable for short periods of time.
- Finally, all of this is backed by GFSv2, a distributed filesystem. (A toy sketch of the layering follows this list.)
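One way to picture the layering: the schema-less entity a developer writes eventually lands as a row in a key-value store. A minimal sketch, assuming an invented key encoding and using a plain dict to stand in for Bigtable (the real Megastore/Bigtable encodings are far more involved):

```python
import json

bigtable = {}  # stands in for the distributed key-value store


def put_entity(app_id, kind, entity_id, properties):
    """Schema-less Datastore layer: any dict of properties is accepted."""
    row_key = f"{app_id}/{kind}/{entity_id}"    # hypothetical key encoding
    bigtable[row_key] = json.dumps(properties)  # value is an opaque blob


def get_entity(app_id, kind, entity_id):
    value = bigtable.get(f"{app_id}/{kind}/{entity_id}")
    return None if value is None else json.loads(value)


put_entity("my-app", "Greeting", 42, {"author": "alice", "content": "hi"})
print(get_entity("my-app", "Greeting", 42))  # {'author': 'alice', 'content': 'hi'}
```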
- Writes to Datastore
- In Master/Slave, a write goes to Datacenter A and is asynchronously replicated to Datacenter B some time later.
- In High Replication, a write goes synchronously to a majority of the replicas. Each remaining replica gets an asynchronous write scheduled, or catches up on demand: when a read arrives at a replica that realizes it does not have the data, the data is replicated then.
- Writes to Master/Slave are faster (~20 ms) than writes to High Replication (~45 ms).
- Read latency is about the same, but the read error rate in High Replication is far lower (0.001% vs 0.1%), which works out to roughly 5 minutes versus 9 hours of read downtime per year (arithmetic below).
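The downtime figures follow if you read the error rates as the fraction of a year that reads fail; the per-year reading is my assumption, not stated in the talk, but note that ~9 hours matches a 0.1% rate:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for name, error_rate in [("High Replication", 0.00001),  # 0.001%
                         ("Master/Slave", 0.001)]:        # 0.1%
    minutes = error_rate * MINUTES_PER_YEAR
    print(f"{name}: ~{minutes:.0f} min/year (~{minutes / 60:.1f} h)")

# High Replication: ~5 min/year (~0.1 h)
# Master/Slave: ~526 min/year (~8.8 h)
```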
- Planned Maintenance
- Master/Slave
- Datacenter A becomes read-only, so apps running on App Engine become read-only too. Meanwhile, Datacenter B catches up on the unreplicated writes.
- Once catch-up is done, the switchover happens.
- Requires an engineer to initiate the switchover. (A sketch of this procedure follows this section.)
- High Replication
- Seamless migration; switching is almost transparent to apps.
- The only visible effect is a memcache flush plus about one minute of no caching.
- Memcache is hosted in a single datacenter because it has to be very fast, and replicating it across datacenters would be too slow; hence the flush when traffic moves.
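A minimal sketch of the Master/Slave planned-maintenance procedure described above. The Datacenter class and step structure are invented; the real process was internal to Google and engineer-driven:

```python
class Datacenter:
    def __init__(self, name):
        self.name = name
        self.read_only = False
        self.data = {}
        self.unreplicated = {}  # writes the other datacenter has not seen yet


def planned_switchover(master, slave):
    master.read_only = True                  # 1. apps go read-only
    slave.data.update(master.unreplicated)   # 2. slave catches up on writes
    master.unreplicated.clear()
    slave.read_only = False                  # 3. engineer flips the roles
    return slave, master                     # slave is the new master
```

High Replication skips the read-only window entirely; with a majority of replicas already in sync, the only visible step is the memcache flush described above.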
- Unplanned Maintenance
- Master/Slave experiences an immediate switchover, so some data is lost and the app serves stale data. It is up to developers to manually flush the partial data that was written to Datacenter A over to Datacenter B. (A toy failure sequence follows this list.)
- For High Replication, this is the same as planned maintenance; it is designed to withstand multiple datacenter failures.
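A toy failure sequence showing why the unplanned Master/Slave switchover loses data and serves stale reads (the keys and values are invented):

```python
master_data = {"greeting": "hello v2"}  # write acked by Datacenter A
slave_data = {"greeting": "hello v1"}   # Datacenter B never received v2

# Unplanned failure: A drops before replicating, B takes over immediately.
master_data = slave_data
print(master_data["greeting"])  # "hello v1" -- an acked write was lost
```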
- Some Issues with Bigtable
- Since multiple apps share the same Bigtable instance, a short period in which that datacenter's Bigtable is unavailable can make all apps hosted in that datacenter unavailable. Note that this applies only to the Master/Slave setup.
- High Replication is not affected, since it retries the request against a Bigtable in another datacenter (sketched below).
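Roughly, that retry behavior amounts to failing a read over to another datacenter's Bigtable. A sketch, with the Replica class and exception invented for illustration:

```python
class BigtableUnavailable(Exception):
    """Stands in for the transient unavailability described above."""


class Replica:
    def __init__(self, data, available=True):
        self.data, self.available = data, available

    def read(self, key):
        if not self.available:
            raise BigtableUnavailable("datacenter down")
        return self.data[key]


def hr_read(replicas, key):
    """Try each datacenter's Bigtable in turn; one datacenter's outage is
    invisible to the app as long as any replica can answer."""
    last_error = None
    for replica in replicas:
        try:
            return replica.read(key)
        except BigtableUnavailable as err:
            last_error = err  # this datacenter is down; try the next one
    raise last_error  # fails only if every replica is unavailable


# Datacenter A's Bigtable is briefly down; the read still succeeds from B.
replicas = [Replica({"k": "v"}, available=False), Replica({"k": "v"})]
print(hr_read(replicas, "k"))  # "v"
```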