Sunday, August 11, 2019

TAO: The power of the graph

Here is the link.

I plan to read the article from 10:00 AM - 10:110 AM on August 11, 2019.

Even though memcache has "cache" in its name, it's really a general-purpose networked in-memory data-store with a key-value data model.

I need to get familiar with the terminology used in the article:

workload on its data backend
social graph

Users see
News Feed stories
comments
likes
shares for those stories
photos and check-ins from their friends

Best relational database technology
a poor match
supplemented by a large distributed cache that offloads the persistent store.

Bugs
User-visible inconsistencies
site performance issues

product engineer
two data stores
different data models: a large cluster of MySQL servers for storing data persistently in relational tables
an equally large collection of memcache servers for storing and serving flat key-value pairs derived from the results of SQL queries.
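
To make the two-store setup concrete for myself, here is a minimal sketch of a look-aside cache in front of a persistent store. Both stores are plain Python dicts, and the class name `LookAsideStore` and its methods are hypothetical, not anything from the article. The point it illustrates is that application code has to keep the two stores in sync by hand, which is where user-visible inconsistencies can creep in.

```python
class LookAsideStore:
    """Toy model of a persistent store plus a look-aside cache.

    Assumption: plain dicts stand in for the MySQL tables and memcache.
    """

    def __init__(self):
        self.database = {}  # stands in for the persistent relational tables
        self.cache = {}     # stands in for the flat key-value cache
        self.db_reads = 0   # counts how often a read falls through to the database

    def read(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read the database and fill the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write to the database, then invalidate the cached copy.
        # Forgetting the invalidation would leave stale data in the cache.
        self.database[key] = value
        self.cache.pop(key, None)


store = LookAsideStore()
store.write("likes:42", 3)
first = store.read("likes:42")   # miss, goes to the database
second = store.read("likes:42")  # hit, served from the cache
```

Every caller that writes must remember the invalidation step, which is the kind of burden on product engineers that the notes above describe.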

Implementation

The TAO service runs across a collection of server clusters geographically distributed and organized logically as a tree. Separate clusters are used for storing objects and associations persistently, and for caching them in RAM and flash memory. This separation allows us to scale different types of clusters independently and to make efficient use of the server hardware.

Caching clusters running TAO servers - what are caching clusters?
In addition to satisfying most read requests from a write-through cache, TAO servers orchestrate the execution of writes and maintain cache consistency among all TAO clusters. We continue to use MySQL to manage persistent storage for TAO objects and associations.
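
As a note to self on what "write-through" means here, a minimal sketch follows. It assumes a single cache server and uses a dict in place of MySQL; the class `WriteThroughCache` is hypothetical and leaves out everything TAO does to keep multiple clusters consistent.

```python
class WriteThroughCache:
    """Toy write-through cache: every write goes through the cache to the store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for MySQL
        self.cache = {}

    def write(self, key, value):
        # Persist first, then update the cached copy in the same code path,
        # so readers of this cache never see a value older than the store's.
        self.backing_store[key] = value
        self.cache[key] = value

    def read(self, key):
        # Fill the cache on a miss, then serve from it.
        if key not in self.cache and key in self.backing_store:
            self.cache[key] = self.backing_store[key]
        return self.cache.get(key)


mysql = {}
tao = WriteThroughCache(mysql)
tao.write("object:1", {"type": "user", "name": "Alice"})
user = tao.read("object:1")
```

The difference from a look-aside cache is that the caller talks only to the cache layer, which owns the job of updating the persistent store.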

The data set managed by TAO is partitioned into hundreds of thousands of shards. All objects and associations in the same shard are stored persistently in the same MySQL database, and are cached on the same set of servers in each caching cluster. Individual objects and associations can optionally be assigned to specific shards at creation time. Controlling the degree of data collocation proved to be an important optimization technique for reducing communication overhead and avoiding hot spots.
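
One way to picture shard assignment at creation time is sketched below. It assumes the shard number is encoded in the object id so that any server can compute the shard from the id alone; that encoding, the class `ShardedIds`, and the tiny shard count are all illustrative assumptions on my part, not details taken from the passage above.

```python
class ShardedIds:
    """Toy id allocator that embeds a shard number in each id."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.next_sequence = 0

    def create(self, shard=None):
        # By default, spread new objects across shards round-robin.
        # A caller may pin the shard to collocate related data.
        if shard is None:
            shard = self.next_sequence % self.num_shards
        if not 0 <= shard < self.num_shards:
            raise ValueError("shard out of range")
        object_id = self.next_sequence * self.num_shards + shard
        self.next_sequence += 1
        return object_id

    def shard_of(self, object_id):
        # The shard is recoverable from the id without any lookup.
        return object_id % self.num_shards


ids = ShardedIds(num_shards=8)
post = ids.create()
# Collocate a comment with its post so both live in the same database and cache servers.
comment = ids.create(shard=ids.shard_of(post))
```

Pinning the comment to the post's shard means a request touching both stays on one set of servers, which is the communication saving the paragraph describes.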

Shards can be migrated or cloned among servers in the same cluster to equalize the load and to smooth out load spikes.
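
To see how migration could equalize load, here is a rough greedy sketch. The function `rebalance_once`, the load numbers, and the policy of moving one shard at a time are my own assumptions; the article does not describe the actual algorithm, and cloning is not modeled here.

```python
def rebalance_once(assignment, shard_load):
    """Move one shard from the busiest server to the least busy one.

    assignment: dict mapping server name -> set of shard ids
    shard_load: dict mapping shard id -> load (for example, requests per second)
    Returns (shard, source, target), or None if no move reduces the imbalance.
    """
    def server_load(server):
        return sum(shard_load[s] for s in assignment[server])

    busiest = max(assignment, key=server_load)
    idlest = min(assignment, key=server_load)
    gap = server_load(busiest) - server_load(idlest)

    # Moving a shard narrows the gap only if its load is below the gap.
    candidates = [s for s in assignment[busiest] if 0 < shard_load[s] < gap]
    if not candidates:
        return None

    shard = max(candidates, key=lambda s: shard_load[s])
    assignment[busiest].remove(shard)
    assignment[idlest].add(shard)
    return shard, busiest, idlest


servers = {"cache-a": {1, 2, 3}, "cache-b": {4}}
load = {1: 50, 2: 30, 3: 10, 4: 10}
moved = rebalance_once(servers, load)
```

Calling the function repeatedly until it returns `None` gives a simple loop that smooths out the load within one cluster.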

Actionable Items


I would like to read the article aloud and record it using my Samsung phone, and then share the recording as an attachment.

