Saturday, May 1, 2021

System design: Amazon clickstream analysis | My first 30-minute study


Here is the link. 

 This Quick Start builds a clickstream analytics solution on Amazon Web Services (AWS) in about 30 minutes. It integrates AWS services such as Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon Elasticsearch Service (Amazon ES), Amazon Redshift, and Amazon QuickSight. The clickstream analytics solution provides:

  • Streaming data ingestion, which can process millions of website clicks (clickstream data) a day from global websites.
  • Near real-time visualizations of web usage metrics such as events per hour, visitor count, and referrers.
  • Ability to build a recommendation engine with Amazon Redshift application programming interfaces (APIs).
  • Ability to publish your website clickstream data to Amazon S3, Amazon Redshift, and Amazon ES.
  • Analysis and visualizations of your clickstream data by using Kibana (an open-source tool that's included with Amazon ES) and Amazon QuickSight.
 
This Quick Start is for users who want to get started with AWS-native components for clickstream analytics on AWS. Once this foundational layer is in place, you can use it to ingest, analyze, and generate business insights from your websites’ clickstream data.
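
To make the metrics mentioned above (events per hour, visitor count, and referrers) concrete for my own study, here is a minimal Python sketch that computes them from a handful of click records. The record layout (`timestamp`, `visitor_id`, `referrer`) and the sample values are my own assumptions for illustration and do not come from the Quick Start. In the real solution this kind of aggregation would run in Amazon Redshift, Kibana, or Amazon QuickSight rather than in application code.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical click records; field names and values are illustrative assumptions.
CLICKS = [
    {"timestamp": "2021-05-01T10:05:12", "visitor_id": "v1",
     "referrer": "https://www.google.com/search?q=shoes"},
    {"timestamp": "2021-05-01T10:47:03", "visitor_id": "v2",
     "referrer": "https://news.example.com/story"},
    {"timestamp": "2021-05-01T11:02:40", "visitor_id": "v1",
     "referrer": "https://www.google.com/"},
    {"timestamp": "2021-05-01T11:15:09", "visitor_id": "v3",
     "referrer": ""},
]


def summarize(clicks):
    """Aggregate clicks into events per hour, unique visitors, and referrer hosts."""
    events_per_hour = Counter()
    visitors = set()
    referrers = Counter()
    for click in clicks:
        # Truncate each timestamp to the hour it falls in.
        hour = datetime.fromisoformat(click["timestamp"]).strftime("%Y-%m-%d %H:00")
        events_per_hour[hour] += 1
        visitors.add(click["visitor_id"])
        # Group referrers by host; an empty referrer counts as direct traffic.
        host = urlparse(click["referrer"]).netloc or "(direct)"
        referrers[host] += 1
    return {
        "events_per_hour": dict(events_per_hour),
        "visitor_count": len(visitors),
        "referrers": dict(referrers),
    }


if __name__ == "__main__":
    print(summarize(CLICKS))
```

Counting visitors with a set keeps the sketch exact but memory-bound. At millions of clicks a day, the same question is better pushed down to the analytics store.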

What you'll build 

Use this Quick Start to automatically set up the following environment on AWS:

  • A highly available architecture that spans two Availability Zones.*
  • A virtual private cloud (VPC) configured with public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS.*
  • In the public subnets:
    • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
    • A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon Elastic Compute Cloud (Amazon EC2) instances in public and private subnets.*
    • A publicly accessible Amazon Redshift cluster for data aggregation, analysis, transformation, and creation of new clickstream datasets.
  • In the private subnets, two web server instances running Apache in an Auto Scaling group with Amazon Kinesis Agent installed.
  • Security groups (stateful firewalls) at the EC2 instance level.
  • An Application Load Balancer (ALB) to balance traffic between the two web servers. A separate target group is created for SSH access to the backend instances via the ALB, as an alternative to using the bastion host.
  • Publicly accessible Amazon ES with Elasticsearch version 6.3 (default) for indexing and searching functionality on the clickstream data.
  • Three Kinesis Data Firehose delivery streams to push clickstream data to the destinations: Amazon S3, Amazon Redshift, and Amazon ES.
  • An Amazon S3 bucket for the Kinesis Data Firehose delivery stream.
  • Integration with other Amazon services such as Amazon S3, Amazon Kinesis Data Firehose, Amazon ES with Kibana, and Amazon QuickSight.
  • IAM roles to provide permissions to access AWS resources. Examples include permitting Amazon ES to access VPC resources, and allowing Amazon Kinesis Data Firehose to access Amazon S3, Amazon Redshift, and Amazon ES.
  • Amazon Simple Notification Service (Amazon SNS) to notify you about automatic scaling operations and rollback of AWS CloudFormation stack creation.
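
To picture the data path in the list above (Apache web servers, then three Kinesis Data Firehose delivery streams), here is a rough Python sketch of what happens to one access-log line. In the Quick Start the Amazon Kinesis Agent on the web servers does this work; the code below is only my illustration of the idea, not how the Quick Start implements it. The stream names, the JSON field names, and the `InMemoryFirehose` stand-in are hypothetical. The only real API shape I rely on is the Firehose `PutRecordBatch` call, which accepts at most 500 records per request and reports failures in `FailedPutCount`.

```python
import json
import re

# Regular expression for the Apache "combined" access-log format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical delivery stream names, one per destination.
STREAMS = ["clickstream-to-s3", "clickstream-to-redshift", "clickstream-to-es"]


def parse_line(line):
    """Turn one combined-format log line into a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def to_firehose_records(events):
    """Encode events as newline-delimited JSON in the shape PutRecordBatch expects."""
    return [{"Data": (json.dumps(event) + "\n").encode("utf-8")} for event in events]


def send_in_batches(client, stream_names, records, batch_size=500):
    """Send records to every stream in chunks; return the total failed-record count."""
    failed = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for name in stream_names:
            response = client.put_record_batch(DeliveryStreamName=name, Records=batch)
            failed += response.get("FailedPutCount", 0)
    return failed


class InMemoryFirehose:
    """Stand-in client so the sketch runs without AWS credentials (assumption)."""

    def __init__(self):
        self.calls = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.calls.append((DeliveryStreamName, len(Records)))
        return {"FailedPutCount": 0}


if __name__ == "__main__":
    line = ('203.0.113.7 - - [01/May/2021:10:05:12 +0000] '
            '"GET /index.html HTTP/1.1" 200 5120 '
            '"https://www.google.com/" "Mozilla/5.0"')
    client = InMemoryFirehose()  # with AWS access: boto3.client("firehose")
    records = to_firehose_records([parse_line(line)])
    print(send_in_batches(client, STREAMS, records), client.calls)
```

A production sender would also retry the individual records that `PutRecordBatch` reports as failed. The sketch only counts them.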


