Wednesday, April 21, 2021

System design: Book reading | Chapter 4 | ZooKeeper: Distributed Process Coordination Book by Benjamin Reed and Flavio Junqueira

For example, let’s consider backup masters; they need to know when the primary has crashed so that they can fail over. To reduce the time it takes to recover from the primary crash, we need to poll frequently (say, every 50 ms, as an example of aggressive polling). In this case, each backup master generates 20 requests/second. If there are multiple backup masters, we multiply this frequency by the number of backups to obtain the total request traffic generated just to poll ZooKeeper for the status of the primary master. Even if such an amount of traffic is easy for a system like ZooKeeper to deal with, primary master crashes should be rare, so most of this traffic is unnecessary.

Suppose we therefore reduce the amount of polling traffic to ZooKeeper by increasing the period between requests for the status of the primary, say to 1 second. The problem with increasing this period is that it increases the time it takes to recover from a primary crash.
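
To make the trade-off concrete, here is a small back-of-the-envelope sketch in Python. The function names and the choice of three backup masters are my own assumptions for illustration, and the detection delay ignores network latency and session timeouts.

```python
def polling_requests_per_second(num_backups: int, interval_ms: int) -> float:
    """Total status requests per second sent by all backups when polling."""
    return num_backups * 1000 / interval_ms


def worst_case_detection_ms(interval_ms: int) -> int:
    """A crash right after a poll goes unnoticed until the next poll (assumed model)."""
    return interval_ms


# Compare the aggressive 50 ms interval with the relaxed 1 second interval.
# Three backups is an assumed number, not something the book specifies.
for interval_ms in (50, 1000):
    rate = polling_requests_per_second(num_backups=3, interval_ms=interval_ms)
    delay = worst_case_detection_ms(interval_ms)
    print(f"interval={interval_ms} ms: {rate:.0f} req/s, detection delay up to ~{delay} ms")
```

Shortening the interval buys faster failover at the cost of more requests, and lengthening it does the opposite; neither setting removes the wasted traffic while the primary is healthy.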

We can avoid this tuning and polling traffic altogether by having ZooKeeper notify interested clients of concrete events. The primary mechanism ZooKeeper provides to deal with changes is watches. With watches, a client registers its request to receive a one-time notification of a change to a given znode. For example, we can have the primary master create an ephemeral znode representing the master lock, and the backup masters register a watch for the existence of the master lock. In the case that the primary crashes, the master lock is automatically deleted and the backup masters are notified. Once the backup masters receive their notifications, they can start a new master election by trying to create a new ephemeral znode to represent the master lock, as we showed in “Getting Mastership” on page 51.
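
The flow above can be sketched with a toy, in-memory model in Python. To be clear, this is not the ZooKeeper client API: `ToyCoordinator`, `Master`, the `/master` path, and all method names are hypothetical stand-ins I made up to show the one-time watch and re-election logic in a single process.

```python
from typing import Callable, Dict, List, Optional

LOCK_PATH = "/master"  # assumed path for the master lock


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._ephemerals: Dict[str, str] = {}  # path -> owning session
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create the node if absent; return True if this session created it."""
        if path in self._ephemerals:
            return False
        self._ephemerals[path] = session
        return True

    def exists(self, path: str, watcher: Optional[Callable[[str], None]] = None) -> bool:
        """Report whether the node exists and optionally register a one-time watch."""
        if watcher is not None:
            self._watches.setdefault(path, []).append(watcher)
        return path in self._ephemerals

    def end_session(self, session: str) -> None:
        """Simulate a crash: delete the session's ephemeral nodes and notify watchers."""
        for path in [p for p, s in self._ephemerals.items() if s == session]:
            del self._ephemerals[path]
            # Pop before firing so each registered watch is delivered only once.
            for watcher in self._watches.pop(path, []):
                watcher(path)


class Master:
    def __init__(self, name: str, coordinator: ToyCoordinator) -> None:
        self.name = name
        self.coordinator = coordinator
        self.is_primary = False

    def run_for_master(self) -> None:
        """Try to take the lock; on failure, watch the lock instead of polling."""
        if self.coordinator.create_ephemeral(LOCK_PATH, self.name):
            self.is_primary = True
        else:
            self.is_primary = False
            self.coordinator.exists(LOCK_PATH, self._on_lock_deleted)

    def _on_lock_deleted(self, path: str) -> None:
        # The watch has fired and is gone; running again re-registers it if we lose.
        self.run_for_master()


coordinator = ToyCoordinator()
masters = [Master(name, coordinator) for name in ("m1", "m2", "m3")]
for master in masters:
    master.run_for_master()

coordinator.end_session("m1")  # the primary crashes; backups are notified
print([m.name for m in masters[1:] if m.is_primary])
```

Because a watch is consumed when it fires, a backup that loses the new election has to register a fresh watch, which is why `run_for_master` sets one every time the create attempt fails.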

Watches and notifications form a general mechanism that enables clients to observe changes by other clients without having to continually poll ZooKeeper. We have illustrated this with the master example, but the general mechanism is applicable to a wide variety of situations.

