Monday, August 12, 2019

Streaming Data Solutions on AWS with Amazon Kinesis

Introduction


This is part of my personal finance research. I am trying to figure out what motivates me: how do I push myself to learn enough in one week to prepare for my most important onsite interviews, which cover system design and product design? I am so happy to read my first white paper from Amazon.


Technical detail



The 27-page white paper is available here to download.

I plan to read the paper in 30 minutes.

You need a different set of tools to collect, prepare, and process real-time streaming data than those tools that you have traditionally used for batch analytics. With traditional analytics, you gather the data, load it periodically into a database, and analyze it hours, days, or weeks later. Analyzing real-time data requires a different approach. Instead of running database queries over stored data, stream processing applications process data continuously in real time, even before it is stored. Streaming data can come in at a blistering pace and data volumes can vary up and down at any time. Stream data processing platforms have to be able to handle the speed and variability of incoming data and process it as it arrives, often millions to hundreds of millions of events per hour.

Case study


ABC tolling company

First requirement:

ABC Tolls would like to make some modifications to its system. The first requirement comes from its business analyst team. They have asked for the ability to run reports from their data warehouse with data that is no older than 30 minutes.

Second requirement:

ABC Tolls is also developing a new mobile application for its customers. While developing the application, they decided to create some new features. One feature gives customers the ability to set a spending threshold for their account. If a customer’s cumulative toll bill surpasses this threshold, ABC Tolls wants to send an in-application message to the customer to notify them that the threshold has been breached within 10 minutes of the breach occurring.


To support the feature to send a notification when a spending threshold is breached, the ABC Tolls development team has created a mobile application and an Amazon DynamoDB table. The application allows customers to set their threshold, and the table stores this value for each customer. The table is also used to store the cumulative amount spent by each customer, each month. To provide timely notifications, ABC Tolls needs to update the cumulative value in this table in a timely manner, and compare that value with the threshold to determine if a notification should be sent to the customer.

Since their toll transactions are already streaming through Kinesis Firehose, they decided to use this streaming data as the source for their aggregation and alerting. And because Kinesis Analytics enabled them to use SQL to aggregate the streaming data, it is an ideal solution to the problem. In this solution, Kinesis Analytics totals the value of the transactions for each customer over a 10-minute time period (window). At the end of the window, it sends the totals to a Kinesis stream.

This stream is the event source for an AWS Lambda function. The Lambda function queries the DynamoDB table to retrieve the thresholds and current total spent by each customer represented in the output from Kinesis Analytics. For each customer, the Lambda function updates the current total in DynamoDB and also compares the total with the threshold. If the threshold has been exceeded, it uses the AWS SDK to tell Amazon Simple Notification Service (SNS) to send a notification to the customers.
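To make the flow concrete for myself, here is a minimal Python sketch of the two steps described above: totaling transactions per customer in 10-minute windows, then updating each customer's running total and comparing it with the threshold. It is not code from the white paper and it does not call any AWS service. The function names (`window_totals`, `apply_window`), the tuple layout of a transaction, the in-memory `accounts` dict standing in for the DynamoDB table, and the `notify` callback standing in for the SNS call are all my own assumptions.

```python
from collections import defaultdict
from decimal import Decimal

WINDOW_SECONDS = 600  # 10-minute window, as in the case study


def window_totals(transactions, window_seconds=WINDOW_SECONDS):
    """Sum amounts per customer within fixed, non-overlapping windows.

    `transactions` is an iterable of (timestamp_seconds, customer_id, amount)
    tuples. This layout is hypothetical, chosen only for illustration.
    Returns {window_start: {customer_id: total}}.
    """
    totals = defaultdict(lambda: defaultdict(Decimal))
    for timestamp, customer_id, amount in transactions:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        totals[window_start][customer_id] += Decimal(amount)
    return {start: dict(per_customer) for start, per_customer in totals.items()}


def apply_window(accounts, per_customer_totals, notify):
    """Update running totals and notify customers who are over their threshold.

    `accounts` is a plain dict standing in for the DynamoDB table:
    {customer_id: {"threshold": Decimal, "total": Decimal}}.
    `notify` is any callable standing in for the SNS publish step.
    Returns the list of customer ids that were notified.
    """
    notified = []
    for customer_id, window_total in per_customer_totals.items():
        account = accounts[customer_id]
        account["total"] += window_total  # update the cumulative amount
        if account["total"] > account["threshold"]:
            notify(customer_id, account["total"])
            notified.append(customer_id)
    return notified
```

To try it, build a small list of transactions, pass it to `window_totals`, and feed each window's result to `apply_window` in time order, with a `notify` function that simply appends to a list. As written, the sketch notifies on every window in which the total is above the threshold; a real system would probably want to remember that a customer has already been notified, which the passage above does not go into.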
