Saturday, May 1, 2021

System design: Amazon Athena | My first 20-minute study


Here is the link. 

Movable Ink uses AWS to query seven years’ worth of historical data and get results in moments, with the flexibility to explore data for deeper insights. Movable Ink provides real-time personalization of marketing emails based on a wide range of user, device, and contextual data, driving higher response rates and better customer experiences. The company uses the Amazon Athena serverless query service to analyze data stored in Amazon S3, gaining insights to improve results for customers’ marketing campaigns.

Using Data to Drive Email Marketing That Works

Movable Ink’s Intelligent Content Platform supports real-time personalization of email campaigns using up-to-date information from websites, social-media platforms, and APIs, as well as contextual data about device type, weather, recent user activity, and more. Additionally, it enables customers to analyze data about their users to make better marketing decisions. This process incorporates large and unpredictable amounts of data from a wide variety of sources, making the scale, elasticity, and connected nature of cloud services a logical fit. To meet these challenges, Movable Ink migrated its entire production environment to Amazon Web Services (AWS) in 2015, taking advantage of multiple regions and availability zones to provide redundancy, resilience, and scalability. In addition to the data and content used in personalized email messages, Movable Ink captures data on user behavior after users receive those messages, such as whether they opened the email, what items they clicked on, and what they browsed and purchased on websites as a result.

This data is used by Movable Ink in its client-billing systems, in reporting results to clients, and as the basis for building new services such as recommendation engines. Movable Ink uses Amazon Elastic MapReduce (Amazon EMR) clusters to capture data about user actions and push it to Amazon Simple Storage Service (Amazon S3). Movable Ink has been collecting data on user actions since 2011, and this database grows by up to 100 GB per day. To reduce time to insight, optimize costs, and increase flexibility for its analytics, the company recently adopted the serverless Amazon Athena query service. Amazon Athena enables interactive querying of large-scale data sets in Amazon S3 using standard SQL. It eliminates the need for complex extract, transform, and load (ETL) jobs to prepare data for analysis, and delivers most results within seconds.

Transforming Discovery with Serverless Querying

Before adopting Amazon Athena, Movable Ink was using Apache Hive for querying user-activity data. However, this solution lacked the necessary performance and added cost and management complexity. Movable Ink had to keep an Amazon EMR cluster running to load data into local databases when queries were performed. Since the company began using Amazon Athena, it has realized both cost savings and improved performance for analytics related to user actions. “One of the big attractions of Amazon Athena is that it’s serverless and purely consumption-based,” says Matt Chesler, director of DevOps at Movable Ink. “We only pay when we’re actually querying the data, and we don’t have to keep a cluster running all the time. Using Amazon Athena, we’re able to query seven years’ worth of data (adding up to hundreds of terabytes), get results at least 50 percent faster, and save nearly $15,000 per month.”

Better Performance at Lower Cost

Movable Ink has optimized its approach to Athena to achieve the best cost-to-performance ratio.
“For data we use often, we have invested in optimizations such as repartitioning and changing data from row based to columnar to further reduce costs.” Movable Ink has also built a microservice that caches data so users running similar queries do not generate a call to Amazon Athena with each request. In addition, Movable Ink has taken advantage of API access built into Amazon Athena to provide programmatic access to data. “Previously, we would run a query against Hive and load it into a local database instance, which was a fragile and slow process,” says Chesler. “With Amazon Athena API access, we can run the query directly against Amazon S3. Besides avoiding the need to create and manage local database instances, the big benefit of this is that our users can modify the query on the fly to explore data and get deeper insights.”
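
To make the caching and API-access ideas above concrete for my own notes, here is a minimal Python sketch. It is not Movable Ink's code; the case study does not show any. The Athena calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are the ones exposed by the boto3 Athena client, while `run_athena_query`, `QueryCache`, the TTL value, and the database and bucket names are hypothetical names I made up. The cache here only matches queries that are identical after whitespace is collapsed, which is a much simpler notion of "similar" than a real service would likely use.

```python
import hashlib
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one query through the Athena API and return rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {status['State']}")

    # Only the first page of results is read; large results need pagination.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


class QueryCache:
    """Hypothetical in-memory cache keyed by normalized SQL text, with a TTL."""

    def __init__(self, runner, ttl_seconds=300.0, clock=time.monotonic):
        self._runner = runner      # callable: sql -> rows
        self._ttl = ttl_seconds    # assumed freshness window
        self._clock = clock
        self._entries = {}         # key -> (stored_at, rows)

    @staticmethod
    def _key(sql):
        # Collapse whitespace so trivially different spellings share an entry.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, sql):
        key = self._key(sql)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # served from cache, no Athena call
        rows = self._runner(sql)
        self._entries[key] = (now, rows)
        return rows


# Example wiring (names are placeholders, requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("athena")
# cache = QueryCache(
#     lambda sql: run_athena_query(
#         client, sql, "analytics", "s3://example-bucket/athena-results/"
#     )
# )
# rows = cache.get("SELECT COUNT(*) FROM user_actions")
```

Passing the client and the runner in as arguments keeps the cache independent of AWS, so it can be exercised with a stub. For SELECT queries, the first row returned by `get_query_results` holds the column headers.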

Benefits 

  • Got query results at least 50 percent faster than with the persistent-cluster solution 
  • Reduced costs by nearly $15,000 per month 
  • Provided analysts with the flexibility to adapt queries on the fly for richer insights 
  • Adopted pay-as-you-go querying to optimize costs 
  • Eliminated the need to manage persistent database instances 
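
The pay-as-you-go point in the list above is easier to reason about with a little arithmetic, so here is a rough Python sketch. The case study gives no pricing details: the rate of 5 USD per TB scanned and the 10 MB per-query minimum are my assumptions based on Athena's published on-demand pricing, the binary definition of a terabyte is also an assumption, and the function names and workload figures are made up for illustration.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the charge for one query from the bytes it scanned.

    usd_per_tb and min_bytes are assumed pricing parameters, not from the source.
    """
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * usd_per_tb


def monthly_cost(queries_per_day, bytes_per_query, days=30, **pricing):
    """Scale a per-query estimate to a month of identical queries."""
    return queries_per_day * days * estimate_query_cost(bytes_per_query, **pricing)


# Hypothetical workload: the same report run 50 times a day.
row_based = monthly_cost(50, 2 * TB)        # full scan of row-based files
columnar = monthly_cost(50, 0.1 * TB)       # partition pruning + columnar files
print(f"row-based: ${row_based:,.2f}  columnar: ${columnar:,.2f}")
```

Because the charge depends on bytes scanned rather than on cluster uptime, anything that shrinks the scan (partitioning so a query skips irrelevant data, or a columnar format so it reads only the needed columns) reduces the bill, which is consistent with the optimizations described in the case study.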

AWS Services Used 
  • Amazon Athena 
  • Amazon Elastic MapReduce 
  • Amazon S3 
