April 7, 2021
Here is the link.
I like to spend two hours reading the book. I like to highlight the content I need to learn more about first, and then continue reading other interesting books.
I like to try various ways to read books, which gives me a chance to learn more about cloud solutions. I need to learn some basics first. It is much better to read well-written book content before I watch videos on system design topics.
I need to be patient, and also focus on the fundamental basics of distributed systems.
- pessimistic locking
- eventual consistency
Managing Data Consistency
Every web application and service uses data. This data is frequently required to help users and organizations make business decisions, and therefore it may be important that this data accurately represents the most current information available and that it is consistent. Data consistency implies that all instances of an application are presented with the same set of data values all of the time. This approach is sometimes referred to as strong data consistency.
In the world of relational databases, consistency is often enforced by transactional models that use locks to prevent concurrent application instances from modifying the same data at the same time. In a strongly consistent system, the locks also block concurrent requests to query data, but many relational databases enable an application to relax this rule and provide access to a copy of the data that reflects the state it was in before the update started. Many applications that store data in non-relational databases, flat files, or other structures follow a similar strategy, known as pessimistic locking. An application instance locks data while it is being modified, and then releases the lock when the update is complete.
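As a rough illustration of pessimistic locking, the following Python sketch guards each record with its own lock, so a writer holds the lock for the whole update and other instances are blocked until it is released. The in-memory record store and key names are invented for the example; a real database would provide its own locking mechanism.

```python
import threading

# One lock per record; an application instance must hold the lock
# for the entire update, so concurrent writers (and, in a strictly
# pessimistic scheme, readers) are blocked until it is released.
records = {"stock:item-42": {"quantity": 10}}
locks = {key: threading.Lock() for key in records}

def decrement_stock(key, amount):
    with locks[key]:                      # acquire the lock before touching the data
        record = records[key]
        if record["quantity"] < amount:
            raise ValueError("insufficient stock")
        record["quantity"] -= amount
    # Lock released here; other instances can now read or modify the record.
```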
In a modern cloud application, the data is likely to be partitioned across data stores hosted at different sites, some of which could be dispersed over a wide geography. This can occur for a variety of reasons: to improve scalability by balancing the load across multiple computers, to improve response time by co-locating data close to the users and services that access it, or to improve availability by replicating data across different sites.
Maintaining data consistency across distributed data stores can be a significant challenge. The issue is that strategies such as serialization and locking only work well if all application instances share the same data store, and the application is designed to ensure that the locks are very short-lived. However, if data is partitioned or replicated across different data stores, locking and serializing data access to maintain consistency can become an expensive overhead that impacts the throughput, response time, and scalability of a system. Therefore, most modern distributed applications do not lock the data that they modify, and they take a rather more relaxed approach to consistency, known as eventual consistency.
The following sections provide more information on strong consistency and eventual consistency, and the issues that surround these different approaches to maintaining data consistency in a distributed environment such as the cloud.
- strong consistency model
- inconsistent view of the data
- Argument: The aim of the strong consistency model is to minimize the chance that an application instance might be presented with an inconsistent view of the data
Strong Consistency
In the strong consistency model, all changes are atomic. If a transaction updates multiple data items, the transaction is not allowed to complete until either all of the changes have been made successfully, or (in the event of a failure) they have all been undone. In the time between a transaction starting and completing, other concurrent transactions may not be able to access any of the data that has been modified; they will be blocked. If data is being replicated, a transaction that implements strong consistency may not be allowed to complete until every copy of each item that has changed has been successfully updated.
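As a small, concrete example of this all-or-nothing behaviour, the sketch below uses Python's built-in sqlite3 module: both updates are committed together or rolled back together. The table and account names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, source, target, amount):
    try:
        # Both updates belong to one transaction: they become visible
        # together at commit time, or not at all.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, source))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, target))
        conn.commit()
    except Exception:
        conn.rollback()   # undo every change made so far
        raise

transfer(conn, "alice", "bob", 30)
```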
The aim of the strong consistency model is to minimize the chance that an application instance might be presented with an inconsistent view of the data. The cost of implementing this model is the impact it has on the availability, performance, and scalability of the resulting solution. In a distributed environment, if the data stores holding the data affected by a transaction are geographically remote from each other, network latency could adversely impact the performance of such transactions and result in concurrent access to data being blocked for an extended period. If a network failure renders one or more of the data stores inaccessible during a transaction, an application updating data in a system that implements strong consistency may be blocked until every data store becomes accessible again.
Additionally, in a distributed environment such as the cloud, implementing strong consistency is not tolerant of the types of failure that may occur. For example, it may not be possible to roll back a transaction and release the resources that it holds if a component participating in the transaction has stopped responding due to a long-lasting network outage. In this case, it will be necessary to resolve the situation through other means, such as manually reconciling the data.
In a cloud application, you should implement strong consistency only where it is absolutely necessary. For example, if an application updates multiple items that are located within the same data store, the benefits of strong consistency may outweigh the disadvantages because data is likely to be locked only for a very short period. However, if the items to be updated are dispersed across a network, it may be more appropriate to relax the requirement for strong consistency.
In a system that implements strong consistency but also replicates data to remote locations, it may be appropriate to propagate changes to replicas outside the scope of a strongly consistent transaction. Some level of transient inconsistency is almost inevitable while replicas are updated—but the data will eventually become consistent after the synchronization between replicas has completed. For more information, see the Data Replication and Synchronization Guidance.
An alternative approach for maintaining strong consistency across replicated data that is frequently implemented by highly scalable NoSQL databases is to use read and write quorums and versioning. This approach avoids locking data, at the expense of some additional complexity in the processes that read and write data. For more information see the section “Improving Consistency” in Chapter 1, “Data Storage for Modern High-Performance Business Applications” of the guide Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence on MSDN.
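The following in-memory sketch illustrates the quorum idea: with N replicas, a write must be acknowledged by W of them and a read consults R of them and keeps the value with the highest version. As long as R + W > N, every read overlaps at least one replica that saw the most recent write. The replica counts and data layout here are illustrative only, not the behaviour of any specific NoSQL database.

```python
import random

N, W, R = 3, 2, 2                      # R + W > N guarantees read/write overlap
replicas = [{} for _ in range(N)]      # each replica maps key -> (version, value)

def write(key, value, version):
    acked = 0
    for replica in random.sample(replicas, len(replicas)):
        current = replica.get(key, (0, None))
        if version > current[0]:
            replica[key] = (version, value)
        acked += 1
        if acked >= W:                 # stop once the write quorum has acknowledged
            return True
    return False

def read(key):
    # Consult R replicas and keep the response with the highest version.
    responses = [r.get(key, (0, None)) for r in random.sample(replicas, R)]
    return max(responses)

write("stock:item-42", 9, version=1)
print(read("stock:item-42"))
```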
- Eventual consistency also impacts data consistency when using caching.
- Implementing techniques such as the Cache Aside pattern can help to reduce the chances of inconsistencies occurring.
Eventual Consistency
Eventual consistency is a rather more pragmatic approach to data consistency. In many cases, strong consistency is not actually required as long as all the work performed by a transaction is completed or rolled back at some point, and no updates are lost. In the eventual consistency model, data update operations that span multiple sites can ripple through the various data stores in their own time, without blocking concurrent application instances that access the same data.
One of the drives for eventual consistency is that distributed data stores are subject to the CAP Theorem. This theorem states that a distributed system can implement only two of the three features (Consistency, Availability, and Partition Tolerance) at any one time. In practice, this means that you can either:
- Provide a consistent view of distributed (partitioned) data at the cost of blocking access to that data while any inconsistencies are resolved. This may take an indeterminate time, especially in systems that exhibit a high degree of latency or if a network failure causes loss of connectivity to one or more partitions.
- Provide immediate access to the data at the risk of it being inconsistent across sites.
Traditional database management systems focus on providing strong consistency, whereas cloud-based solutions that utilize partitioned data stores are typically motivated by ensuring higher availability, and are therefore more oriented towards eventual consistency.
An application instance may see a view of a data item affected by an operation in the state it is in while the operation is in flight, and this view may be temporarily inconsistent. Depending on the requirements of the system, the developer might need to design applications to detect and handle such inconsistencies, and then take steps to resolve them if necessary.
Eventual consistency also impacts data consistency when using caching. If the data in the remote data store changes, all copies cached by applications will most likely be out of date. Configuring a cache expiration policy that prevents cached data from becoming too stale, and implementing techniques such as the Cache Aside pattern, can help to reduce the chances of inconsistencies occurring. However, these approaches are unlikely to completely eliminate inconsistencies in cached data, and it is important that applications that use caching as an optimization strategy can handle these inconsistencies.
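Here is a minimal sketch of the Cache Aside pattern combined with an expiration policy, assuming a simple in-memory cache and invented load_from_store/save_to_store functions standing in for the remote data store.

```python
import time

CACHE_TTL_SECONDS = 30          # expiration policy: bounds how stale cached data can get
cache = {}                      # key -> (expires_at, value)

def load_from_store(key):
    # Placeholder for a query against the remote (authoritative) data store.
    return {"key": key, "value": "fresh-from-store"}

def save_to_store(key, value):
    pass  # placeholder for the write to the remote data store

def get(key):
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                                   # cache hit, possibly slightly stale
    value = load_from_store(key)                          # miss or expired: reload on demand
    cache[key] = (time.time() + CACHE_TTL_SECONDS, value)
    return value

def update(key, value):
    save_to_store(key, value)                             # write to the data store first...
    cache.pop(key, None)                                  # ...then invalidate so the next read reloads
```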
It is worth bearing in mind that an application may not actually require data to be consistent all of the time. For example, in a typical ecommerce web application that enables a user to browse and purchase goods, any stock levels presented to a user are likely to be static values determined when the details for a stock item are queried. If another concurrent user purchases the same item, the stock level in the system will decrease but this change will probably not need to be reflected in the data displayed to the first user. If the stock level drops to zero and the first user attempts to purchase the item, the system could either alert the user that the item is now out of stock, or place the item on back order and inform the user that the delivery time may be extended.
Considerations for Implementing Eventual Consistency
Eventual consistency is often the preferred model for managing distributed data in a cloud environment, but there are many issues that you must consider if you follow this model. These issues are best summarized by using an example. Figure 1 shows a simple ecommerce application that could benefit from following the eventual consistency approach.
When a customer places an order, the application instance performs the following operations across a collection of heterogeneous data stores held in various locations:
- Update the stock level of the item ordered.
- Record the details of the order.
- Verify payment details for the order.
Although these operations comprise a logical transaction, attempting to implement strong transactional consistency in this scenario is likely to be impractical. Instead, implementing the order process as an eventually consistent series of steps, where each step in the process is essentially an autonomous operation, is a much more scalable solution. While these steps are progressing, the state of the overall system is inconsistent. For example, after the stock level has been updated but before the details of the order have been recorded, the system has temporarily lost some stock. However, when all the steps have been completed, the system returns to a consistent state and all stock items can be accounted for.
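To make the shape of this approach concrete, here is a minimal sketch in which each step is an independent operation against its own store. The store layout, function names, and StubPaymentService are invented for illustration; between the steps the overall system is temporarily inconsistent, but once all three complete it is consistent again.

```python
class StubPaymentService:
    """Stand-in for the external payment verification service."""
    def verify(self, order_id):
        return True

def place_order(stock_store, order_store, payment_service, order_id, item_id, quantity):
    # Each step is an autonomous operation against its own store; there is
    # no lock or distributed transaction spanning all three.
    stock_store[item_id] -= quantity                              # step 1: update stock level
    order_store[order_id] = {"item": item_id, "qty": quantity}    # step 2: record the order
    payment_service.verify(order_id)                              # step 3: verify payment details

stock = {"item-42": 10}
orders = {}
place_order(stock, orders, StubPaymentService(), "order-1001", "item-42", 2)
```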
Although implementing eventual consistency in this example appears conceptually simple, the developer must ensure that the system does eventually become consistent. In other words, the application is responsible either for guaranteeing that all three steps in the order process complete, or for determining the actions to take if any of them fail. How you resolve this situation in any given system is inevitably application specific.
For an example showing how you can implement an eventually consistent system spanning different data stores, see Chapter 8, “Building a Polyglot Solution,” in the guide “Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence” on MSDN. The following sections of this guidance also provide some suggestions.
Retrying Failing Steps
In a distributed environment, the inability to complete an operation is often due to some type of temporary error (communication failure is always a possibility). If such a failure occurs, an application might assume that the situation is transient and simply attempt to repeat the step that failed. Less transient exceptions, such as a database or virtual machine failure, may also occur, and the remedy might be similar: wait for the system to recover and then retry the failing operation. This approach could result in the same step actually being run twice, possibly resulting in multiple updates. It is very difficult to design a solution that prevents this repetition from occurring, but the application should attempt to render such repetition harmless.
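A generic retry helper along those lines might look like the following sketch; the attempt count and back-off values are arbitrary, and the step itself is supplied by the caller.

```python
import time

def run_with_retries(step, max_attempts=5, initial_delay=1.0):
    """Run a step, retrying on failure on the assumption that the error is transient."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise                      # give up: the failure is not transient after all
            time.sleep(delay)              # wait before repeating the step
            delay *= 2                     # exponential back-off between attempts

# Note that a retried step may run end to end more than once, so the step
# itself should be made idempotent (see below).
```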
One strategy is to design each step in an operation to be idempotent. This means that a step that had previously succeeded can be repeated without actually changing the state of the system. The steps that comprise a business operation are naturally heavily dependent on the business logic of your system, and the way in which you implement them will be heavily influenced by the structure of the data. Defining idempotent steps requires a deep, domain-specific understanding of your system.
Some steps might be naturally idempotent. For example, a step that sets a particular item to a specific value (such as “ZipCode = 11111”) can be repeated many times and the result will always be the same. However, natural idempotency is not always possible. In a system that incorporates services, such as the payment system shown in the ecommerce example, it may be possible to implement some form of artificial idempotency. A common technique is to associate the message sent to the service with a unique identifier. The service can store the identifier for each message it receives locally, and only process a message if the identifier does not match that of a message it received earlier. This technique is known as de-duping (the removal of duplicate messages). This strategy, exemplified by the Idempotent Receiver pattern, depends on the service being able to store message identifiers successfully.
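A minimal sketch of this de-duping technique, assuming each message carries a unique identifier and that the set of processed identifiers can be stored reliably (here it is only an in-memory set, and charge_card is a hypothetical placeholder for the real payment operation):

```python
processed_ids = set()   # in a real service this would live in durable storage

def charge_card(card, amount):
    pass  # placeholder for the real payment operation

def handle_payment_message(message):
    message_id = message["id"]
    if message_id in processed_ids:
        return "duplicate ignored"                       # already processed once
    charge_card(message["card"], message["amount"])      # perform the work
    processed_ids.add(message_id)                        # remember the identifier for de-duping
    return "processed"

# Re-delivering the same message is now harmless:
handle_payment_message({"id": "msg-001", "card": "4111", "amount": 25})
handle_payment_message({"id": "msg-001", "card": "4111", "amount": 25})  # ignored
```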
Partitioning Data and Using Idempotent Commands
Another common cause of failure to achieve eventual consistency is multiple instances of an application competing to modify the same data at the same time. If possible, you should design your system to minimize these situations. You should try to partition your system so that concurrent instances of an application attempting to perform the same operations simultaneously do not conflict with each other.
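One simple way to achieve this kind of partitioning is to route every operation for a given business key (such as an order ID) to the same partition, and therefore the same worker, by hashing the key. The partition count below is arbitrary and the key choice is only an example.

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(order_id):
    # Hash the business key so that every operation on the same order
    # is routed to the same partition, and hence the same worker.
    digest = hashlib.sha256(order_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

print(partition_for("order-1001"))   # always the same partition for this order
```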
Rather than thinking in terms of simple CRUD (create, retrieve, update, and delete) operations, you can structure your system around atomic commands that perform business tasks in an idempotent style. For more information, see the Command and Query Responsibility Segregation pattern. The commands in a CQRS solution are frequently implemented by following the Event Sourcing pattern. Event sourcing performs operations on data by driving tasks from a sequence of events, each of which is recorded in an append-only store.
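The following sketch shows the event sourcing idea in miniature: a command appends an immutable event to an append-only log, and the current state is derived by replaying the log. The event names and fields are invented for the example.

```python
event_log = []   # append-only store of events

def handle_place_order_command(order_id, item_id, quantity):
    # The command records what happened as an event rather than
    # updating the current state in place.
    event_log.append({"type": "OrderPlaced", "order": order_id,
                      "item": item_id, "qty": quantity})

def current_stock(item_id, initial_quantity):
    # Derive the current state by replaying the events in order.
    quantity = initial_quantity
    for event in event_log:
        if event["type"] == "OrderPlaced" and event["item"] == item_id:
            quantity -= event["qty"]
    return quantity

handle_place_order_command("order-1001", "item-42", 2)
print(current_stock("item-42", 10))   # -> 8
```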