May 19, 2021
I'd like to read the article and highlight the things I want to learn over the next 20 minutes.
Here are the highlights:
- The DynamoDB Lock Client is a Java Library widely used inside Amazon, which enables you to solve distributed computing problems like leader election and distributed locking with client-only code and a DynamoDB table
AWS Database Blog
Building Distributed Locks with the DynamoDB Lock Client
Alexander Patrikalakis and Sasha Slutsker are senior software development engineers with Amazon.
TL;DR
At Amazon, teams all across the company build upon AWS services just like AWS’s external customers. Here we have a contribution from Alex and Sasha, who work on teams supporting Amazon.com, about a new client library that they developed to make their applications better – hopefully you’ll find it useful too. The DynamoDB Lock Client is a Java Library widely used inside Amazon, which enables you to solve distributed computing problems like leader election and distributed locking with client-only code and a DynamoDB table.
DynamoDB supports mechanisms, like conditional writes, that are necessary for distributed locks. However, the AWS SDK doesn’t include the logic needed to actually implement distributed locks. The DynamoDB Lock Client wraps up the necessary client logic for distributed advisory locks in an easy-to-use client interface. The protocol in the lock client Java library is widely applicable, and we encourage you to apply it in other languages.
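For readers new to conditional writes, here is a rough sketch (not part of the original article) of the kind of single API call that gives you an atomic "create the lock item only if nobody else has" with the AWS SDK for Java v1. The table name, key attribute, and owner attribute are assumptions made up for this example.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

import java.util.HashMap;
import java.util.Map;

public class ConditionalWriteExample {
    public static void main(String[] args) {
        AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.standard().build();

        // Hypothetical lock item; the attribute names are illustrative only.
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("key", new AttributeValue("Moe"));
        item.put("ownerName", new AttributeValue("host-A"));

        try {
            // The write succeeds only if no item with this key exists yet,
            // which is the atomic "create if absent" primitive a lock needs.
            // "#k" aliases the attribute name because KEY is a DynamoDB reserved word.
            dynamoDB.putItem(new PutItemRequest()
                    .withTableName("lockTable")
                    .withItem(item)
                    .withConditionExpression("attribute_not_exists(#k)")
                    .withExpressionAttributeNames(Map.of("#k", "key")));
            System.out.println("Lock item created");
        } catch (ConditionalCheckFailedException e) {
            // Someone else already created the lock item first.
            System.out.println("Lock already taken");
        }
    }
}
```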
Background
Distributed locking can be a complicated challenge to solve, because you need to atomically ensure only one actor is modifying a stateful resource at any given time. For example, perhaps you have a database that serves as the central source of truth for your application. To ensure data is always accurate, you need to ensure that only one application server in your application server fleet is modifying a particular record in your database.
In AWS, the many moving parts of Amazon EC2 also need to agree on what their configuration should be so that they can survive many different failure modes. A primitive that enables this consensus must also be highly available, consistent, and partition-tolerant. Ideally, this primitive is unaffected by clock skew across large fleets of machines.
Systems like Raft and Paxos were designed to address these challenges, but they are notoriously difficult to implement and operate. What if you just want to easily implement a lock for a simple distributed application you’re writing? Amazon EC2 does just this with DynamoDB. We’re happy to present the DynamoDB Lock Client, a Java client library that uses locks to coordinate cluster configuration independently of system time, with DynamoDB primitives like consistent reads and conditional writes baked in. We baked so you can have your cake too…
A practical example
Let’s suppose that you are a retail bank that wants to ensure that at most one customer service representative changes customer details at a time. From a fraud prevention perspective and from a consistency perspective, temporarily locking customer records during an update makes a lot of sense. In this case, each customer has a unique identifier. The bank stores customer information like account balances, transactions, addresses, contact information, and relationship history in many different tables. To make their system scale horizontally, this bank doesn’t embed foreign key relationships in any of their tables. Each table is isolated from all others. The bank uses their application layer to coordinate changes to each customer in a distributed fashion.
The tables are independent, so you can’t just wrap the changes you need in a relational transaction. Instead, you can lock the customer’s unique identifier at a high level. Alternatively, you can lock the unique identifiers of customer details (addresses and telephone numbers) at a finer-grained level. You’d do so with a locking API action for a certain duration in your application before making any changes.
The DynamoDB Lock Client implements a protocol allowing similar applications to take advisory locks on any part of your problem domain, big or small. This protocol ensures your players “stay in possession of the ball” for a certain period of time.
The locking protocol
For a new lock, the lock client stores a lock item in the lock table. With the item, it stores the host name of the owner, the lease duration in milliseconds, a UUID unique to the host, and the host system clock time when the lock was initially created. The lock table looks something like this in the AWS Management Console.
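The original post shows a console screenshot of the lock table at this point. If you want to follow along, a minimal table with a single string partition key is enough. The sketch below creates one with the plain AWS SDK for Java v1; it assumes the lock client's default partition key attribute name of "key", and the table name and throughput values are arbitrary.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

public class CreateLockTable {
    public static void main(String[] args) {
        AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.standard().build();

        // A minimal lock table: a single string partition key named "key",
        // which is the lock client's default as far as I know.
        dynamoDB.createTable(new CreateTableRequest()
                .withTableName("lockTable")
                .withKeySchema(new KeySchemaElement("key", KeyType.HASH))
                .withAttributeDefinitions(new AttributeDefinition("key", ScalarAttributeType.S))
                .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)));
    }
}
```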
The sequence and architecture diagram following shows the locking protocol. The protocol architecture includes an EC2 Auto Scaling group. The group spans two subnets in two Availability Zones of a single AWS Region and contains two EC2 instances, one per subnet, each running a Java application. Each of the instances is trying to acquire a lock on Moe.
- Host A acquires a lock on Moe by writing an item to the lock table on the condition that no item keyed at “Moe” exists yet. Host A acquires the lock with a record version number (RVN) of UUID1.
- Host B tries to get a lock on Moe with an RVN of UUID2.
- Host B checks to see if a lock already exists with a GetItem call.
- In this case, host B finds that host A holds a lock on Moe with an RVN of UUID1. The same application runs on hosts A and B, so host B expects host A to heartbeat and renew the lock on Moe within 10 seconds if host A intends to keep it. Host A heartbeats once, using a conditional update on the lock keyed at Moe to change the lock’s RVN to UUID3 (a sketch of this conditional-update pattern follows the list).
- Ten seconds after its first AcquireLock call, host B tries to take the lock on Moe with a conditional UpdateItem call and a new RVN of UUID4, on the condition that the RVN has not changed since host B last observed it.
- Host A’s heartbeat successfully updated the lock, so host B finds the new RVN equal to UUID3 and waits 10 more seconds. Host A died after that first heartbeat, so it never changes the RVN past UUID3. When host B tries to acquire the lock on Moe for the third time, it finds that the RVN is still UUID3, the same RVN it retrieved on the second attempt.
- In this case, hosts A and B run the same application. Because host B expects host A to heartbeat if host A is healthy and intends to keep the lock, host B considers the lock on Moe expired. Host B’s conditional update to acquire the lock on Moe succeeds, and your application makes progress!
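As a minimal sketch of the conditional-update pattern the protocol relies on (not code from the article), the heartbeat and the lock takeover can both be expressed as "replace the RVN only if it still holds the value I last observed." The table name and the recordVersionNumber attribute are assumptions for illustration; the real library manages these details for you.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

import java.util.Map;

public class RvnConditionalUpdate {
    /**
     * Replaces the RVN of the lock item "Moe" only if it still carries the RVN
     * we last observed. Returns true if this host won the race, false otherwise.
     */
    static boolean swapRvn(AmazonDynamoDB dynamoDB, String observedRvn, String newRvn) {
        try {
            dynamoDB.updateItem(new UpdateItemRequest()
                    .withTableName("lockTable")                        // assumed table name
                    .withKey(Map.of("key", new AttributeValue("Moe")))
                    .withUpdateExpression("SET recordVersionNumber = :new")
                    .withConditionExpression("recordVersionNumber = :observed")
                    .withExpressionAttributeValues(Map.of(
                            ":observed", new AttributeValue(observedRvn),
                            ":new", new AttributeValue(newRvn))));
            return true;   // heartbeat or takeover succeeded
        } catch (ConditionalCheckFailedException e) {
            return false;  // another host changed the RVN first
        }
    }
}
```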
Sample code
The following is a code sample that attempts to take a lock on Moe with 10-second leases and heartbeating, as described in the preceding example.
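The sample itself is not reproduced in this copy of the post. The listing below is my reconstruction of roughly what it looks like, assuming a 1.0.x version of the lock client built on the AWS SDK for Java v1; consult the library's README for the exact, current API.

```java
import com.amazonaws.services.dynamodbv2.AcquireLockOptions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClient;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions;
import com.amazonaws.services.dynamodbv2.LockItem;

import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class LockMoe {
    public static void main(String[] args) throws InterruptedException, IOException {
        AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.standard().build();

        // 10-second lease, renewed by a background heartbeat thread every 3 seconds.
        AmazonDynamoDBLockClient client = new AmazonDynamoDBLockClient(
                AmazonDynamoDBLockClientOptions.builder(dynamoDB, "lockTable")
                        .withTimeUnit(TimeUnit.SECONDS)
                        .withLeaseDuration(10L)
                        .withHeartbeatPeriod(3L)
                        .withCreateHeartbeatBackgroundThread(true)
                        .build());

        // Try to take the lock on the partition key "Moe".
        Optional<LockItem> lockItem =
                client.tryAcquireLock(AcquireLockOptions.builder("Moe").build());

        if (lockItem.isPresent()) {
            System.out.println("Acquired lock! If this process dies, the lock expires in 10 seconds.");
            // ... do the work that requires exclusive access to Moe ...
            client.releaseLock(lockItem.get());
        } else {
            System.out.println("Failed to acquire lock on Moe.");
        }
        client.close();
    }
}
```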
Before running the code listing preceding, create a dependency on the DynamoDB Lock Client in your POM as shown following.
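The POM snippet is also missing from this copy. To the best of my knowledge, the library is published under the coordinates below; check Maven Central for the current version.

```xml
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>dynamodb-lock-client</artifactId>
    <version>1.1.0</version> <!-- pick the latest version from Maven Central -->
</dependency>
```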
Features
The DynamoDB Lock Client uses the DynamoDB UpdateItem API to heartbeat and extend locks each host owns. Additionally, the lock client uses client-side TTL to expire locks. The lock client uses a recent version of the AWS SDK for Java. The locking protocol employs conditional updates extensively and throughout the lifecycle of a lock (creation, renewal by heartbeat, and expiration by deletion). The lock client is an excellent example of how to use condition and update expressions when updating items in a DynamoDB table. Finally, the lock client POM demonstrates best practices that developers can use to integration-test their applications with DynamoDB Local and the Maven failsafe plugin.
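The integration-testing point is worth a concrete illustration. Here is a minimal sketch, not from the article, of how a test might point the SDK (and therefore the lock client) at DynamoDB Local instead of the real service; the endpoint assumes DynamoDB Local's usual default port of 8000.

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class LocalDynamoDbForTests {
    // Points the SDK at a DynamoDB Local instance (for example, one started by
    // the Maven failsafe integration-test phase) instead of the real service.
    static AmazonDynamoDB localClient() {
        return AmazonDynamoDBClientBuilder.standard()
                .withEndpointConfiguration(
                        new EndpointConfiguration("http://localhost:8000", "us-west-2"))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("fake", "fake"))) // DynamoDB Local ignores credentials
                .build();
    }
}
```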
Scaling your locks
Application developers using the lock client can scale their locking use in a few dimensions. First, by varying the granularity of locks taken, you can reduce contention to the smallest possible surface area necessary for a logical operation to succeed. Second, you can configure the heartbeat and TTL intervals of the entity locks based on the read and write latency of the persistence layer where you store your business entities. By doing so, you can reduce contention by taking locks only for the amount of time you need.
A lock table schema with a workload equally distributed among the partition and sort key space helps you use all the throughput that you allocate to the lock table. Use composite partition keys when possible. You can usually use them when you don’t need to read a range of locks associated with the same primary entity—for example, when one customer has multiple orders on an e-commerce platform. Finally, you can scale the backing DynamoDB lock table to arbitrary levels of reads and writes per second (provisioned capacity).
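To make the granularity point concrete, here is a small hypothetical sketch that reuses the lock client built in the sample above. The customer#4711 key format is invented for the example and is not something the library prescribes.

```java
import com.amazonaws.services.dynamodbv2.AcquireLockOptions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClient;
import com.amazonaws.services.dynamodbv2.LockItem;

import java.util.Optional;

public class LockGranularity {
    // Coarse-grained: serializes every change to customer 4711.
    static Optional<LockItem> lockCustomer(AmazonDynamoDBLockClient client)
            throws InterruptedException {
        return client.tryAcquireLock(AcquireLockOptions.builder("customer#4711").build());
    }

    // Fine-grained: serializes only changes to that customer's mailing address,
    // so a concurrent phone-number update on the same customer is not blocked.
    static Optional<LockItem> lockCustomerAddress(AmazonDynamoDBLockClient client)
            throws InterruptedException {
        return client.tryAcquireLock(AcquireLockOptions.builder("customer#4711#address").build());
    }
}
```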
Future directions
One way we might expand on this work is to implement client support for other SDKs and platforms. JavaScript, iOS, and Android are all likely candidates. When AWS SDK for Java 2.0 is generally available, you can update the lock library to take advantage of the new Java SDK. Another thing we might add is support for server-side TTL, a recently released DynamoDB feature. Finally, we might later embed this library in other applications that require coordination. As always, pull requests are welcome.
Summary
We are excited to share the DynamoDB Lock Client with you. We look forward to helping you use it to iterate on your own distributed platforms. Happy tinkering!