Tuesday, May 11, 2021

System design: CQRS pattern | My 60 minutes paper reading | CQRS documents by Greg Young

May 11, 2021

Here is the article. 

Page  23/ 56

By applying CQRS the concepts of Reads and Writes have been separated. It really begs the question of whether the two should exist reading the same data model or perhaps they can be treated as if they were two integrated systems, Figure 5 illustrates this concept. There are many well known integration patterns between multiple data sources in order to maintain synchronisity either in a consistent or eventually consistent fashion. The two distinct data sources allow the data models to be optimized to the task at hand. As an example the Read side can be modeled in 1NF and the transactional model could be modeled in 3nf. The choice of integration model though is very important as translation and synchronization between models can be become a very expensive undertaking. The model that is best suited is the introduction of events, events are a well known integration pattern and offer the best mechanism for model synchronization.

Events as a Storage Mechanism 

Most systems in production today rely on the storing of current state in order to process transactions. In fact it is rare to meet a developer who has worked on a system that maintains current state in any other way. It has not always been like this. Before the general acceptance of the RDBMS as the center of the architecture many systems did not store current state. This was especially true in high performance, mission critical, and/or highly secure systems. In fact if we look at the inner workings of a RDBMS we will find that most RDBMSs themselves not actually work by managing current state! The goal of this section is to introduce the concept of event sourcing, to show the benefits, to show how a simple event storage system can be created utilizing a Relational Database for underlying data management.

Page 37/ 56

Case study: Purchase items previously in shopping cart 

There are however other types of queries that are becoming more and more popular in business, they focus on the how. Examples can commonly be seen in the buzzword “Business Intelligence”. Perhaps there is a correlation between people having done an action and their likelyhood of purchasing some product? These types of questions generally focus on how something came into being as opposed to what it came out to be. 

It is best to go through an example. There is a development team at a large online retailer. In an iteration planning meeting a domain expert comes up with an idea. He believes that there is a correlation between people having added then removed an item from their cart and their likelihood of responding to suggestions of that product by purchasing it at a later point. The feature is added to the following iteration. 

The first hypothetical team is utilizing a stereotypical current state based mechanism for storing state. They plan that in this iteration they will add tracking of items via a fact table that are removed from carts. They plan for the next iteration that they will then build a report. The business will receive after the second iteration a report that can show them information back to the previous iteration when the team released the functionality that began tracking items being removed from carts.  

This is a very stereotypical process, at some organizations the report and the tracking may be released simultaneously but this is a relatively small detail in the handling. From a business perspective the domain experts are happy, they made a request of the team and the team was able to quickly fulfill the request, new functionality has been added in a quick and relatively painless way

The second team will however have quite a different result. The second team has been storing events; they represent their current state by building up off of a series of events. They just like the first team go through and add tracking of items removed from carts via a fact table but they also run this handler from the beginning of the event log to back populate all of the data from the time that the business started. They release the report in the same iteration and the report has data that dates back for years

The second team can do this because they have managed to store what the system actually did as opposed to what the current state of data is. It is possible to go back and look and interpret the old data in new and interesting ways. It was never considered to track what items were removed from carts or perhaps the number of times a user removes and items from their cart was considered important. These are both examples of new and interesting ways of looking at data.

Argument or statement? It is true to learn the following claim. 

As the events represent every action the system has undertaken any possible model describing the system can be built from the events

Page 41 - 58

Building an Event Storage 

In “Events as a Storage Mechanism” the concept of rebuilding state from a series of events was looked at from a conceptual viewpoint. This chapter will focus on the implementation of an actual Event Storage and some of the issues that come up in producing an implementation. 

The implementation discussed in this chapter is not intended to be a production quality Event Storage, more so it is provided as a discussion point around how to build an Event Storage. The implementation here although not highly performant could meet the needs of a large percentage of applications that are built today. 

For the explanatory implementation it is easiest to build the Event Storage in an existing technology such as a RDBMS. This will alleviate many of the technical issues that can arise that are out of the scope of a basic discussion on how to build an event storage such as transaction commit models or data locality for read performance. 

Structure 

A basic Event Storage can be represented in a Relational Database utilizing only two tables. 

  • Column Name  Column Type 
  • AggregateId      Guid 
  • Data                   Blob 
  • Version              Int 

Figure 19 Table Layout for Events Table 

This table represents the actual Event Log. There will be one entry per event in this table. The event itself is stored in the [Data] column. The event is stored using some form of serialization, for the rest of this discussion the mechanism will assumed to be built in serialization although the use of the memento pattern can be highly advantageous. 

The table is shown with the minimum amount of information possible, most organizations would want to add a few columns such as the time that the change was made or context information associated with the change. Examples of context information might include the user that initiated the change, the ip address they sourced the change from, or their level of permission when they sourced the change. 

A version number is also stored with each event in the Events Table. This can generally be thought of as an increasing integer for most cases. Each event that is saved has an incremented version number. The version number is unique and sequential only within the context of a given aggregate. This is because Aggregate Root boundaries are consistency boundaries. 

The [AggregateId] column is a foreign key that should be indexed; it points to the next table which is the Aggregates table.


 

 

No comments:

Post a Comment