Julia's coding blog - Practice makes perfect

Since January 2015, she has been practicing LeetCode questions; she trains herself to stay focused and develops "muscle" memory as she works through those questions one by one. In early 2015, Julia started working on LeetCode and opened her first blog. While working through LeetCode problems, she read a lot of code and learned a little from each person; she also opened a GitHub account to publish her own code and tried writing down some of her own takeaways. She learns from her favorite sport, tennis: practicing 10,000 serves builds good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Saturday, May 1, 2021

System design: Amazon Athena | My first 20 minutes study

Here is the link.

Movable Ink uses AWS to query seven years’ worth of historical data and get results in moments, with the flexibility to explore data for deeper insights. Movable Ink provides real-time personalization of marketing emails based on a wide range of user, device, and contextual data, driving higher response rates and better customer experiences. The company uses the Amazon Athena serverless query service to analyze data stored in Amazon S3, gaining insights to improve results for customers’ marketing campaigns.

Using Data to Drive Email Marketing That Works

Movable Ink’s Intelligent Content Platform supports real-time personalization of email campaigns using up-to-date information from websites, social-media platforms, and APIs, as well as contextual data about device type, weather, recent user activity, and more. Additionally, it enables customers to analyze data about their users to make better marketing decisions. This process incorporates large and unpredictable amounts of data from a wide variety of sources, making the scale, elasticity, and connected nature of cloud services a logical fit.

To meet these challenges, Movable Ink migrated its entire production environment to Amazon Web Services (AWS) in 2015, taking advantage of multiple regions and availability zones to provide redundancy, resilience, and scalability. In addition to the data and content used in personalized email messages, Movable Ink captures data on user behavior after users receive those messages, such as whether they opened the email, what items they clicked on, and what they browsed and purchased on websites as a result.

This data is used by Movable Ink in its client-billing systems, in reporting results to clients, and as the basis for building new services such as recommendation engines. Movable Ink uses Amazon Elastic MapReduce (Amazon EMR) clusters to capture data about user actions and push it to Amazon Simple Storage Service (Amazon S3). Movable Ink has been collecting data on user actions since 2011, and this database grows by up to 100 GB per day. To reduce time to insight, optimize costs, and increase flexibility for its analytics, the company recently adopted the serverless Amazon Athena query service. Amazon Athena enables interactive querying of large-scale data sets in Amazon S3 using standard SQL. It eliminates the need for complex extract, transform, and load (ETL) jobs to prepare data for analysis, and delivers most results within seconds.

Transforming Discovery with Serverless Querying

Before adopting Amazon Athena, Movable Ink was using Apache Hive for querying user-activity data. However, this solution lacked the necessary performance and added cost and management complexity. Movable Ink had to keep an Amazon EMR cluster running to load data into local databases when queries were performed. Since the company began using Amazon Athena, it has realized both cost savings and improved performance for analytics related to user actions. “One of the big attractions of Amazon Athena is that it’s serverless and purely consumption-based,” says Matt Chesler, director of DevOps at Movable Ink. “We only pay when we’re actually querying the data, and we don’t have to keep a cluster running all the time. Using Amazon Athena, we’re able to query seven years’ worth of data, adding up to hundreds of terabytes, get results at least 50 percent faster, and save nearly $15,000 per month.”

Better Performance at Lower Cost

Movable Ink has optimized its approach to Athena to achieve the best cost-to-performance ratio. “For data we use often, we have invested in optimizations such as repartitioning and changing data from row based to columnar to further reduce costs.” Movable Ink has also built a microservice that caches data so users running similar queries do not generate a call to Amazon Athena with each request. In addition, Movable Ink has taken advantage of API access built into Amazon Athena to provide programmatic access to data. “Previously, we would run a query against Hive and load it into a local database instance, which was a fragile and slow process,” says Chesler. “With Amazon Athena API access, we can run the query directly against Amazon S3. Besides avoiding the need to create and manage local database instances, the big benefit of this is that our users can modify the query on the fly to explore data and get deeper insights.”
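
To make the API access and the caching microservice a little more concrete, here is a minimal Python sketch. It is not Movable Ink's code; it only illustrates the idea under some assumptions. The `client` argument is assumed to behave like the boto3 Athena client (it only needs `start_query_execution`, `get_query_execution`, and `get_query_results`). The names `run_athena_query` and `CachedQueryService`, the time-to-live value, and the polling interval are all made up for illustration.

```python
import hashlib
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one SQL statement and return rows as lists of strings.

    `client` is assumed to expose the same three methods as the boto3
    Athena client. Only the first page of results is fetched here.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


class CachedQueryService:
    """Hypothetical cache in front of a query function.

    Identical SQL text (ignoring leading and trailing whitespace) is served
    from memory until the entry is older than `ttl_seconds`.
    """

    def __init__(self, run_query, ttl_seconds=300.0, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (stored_at, rows)

    @staticmethod
    def _key(sql):
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def query(self, sql):
        key = self._key(sql)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # cache hit: no call to Athena
        rows = self._run_query(sql)
        self._cache[key] = (now, rows)
        return rows
```

With boto3 installed, the service could be wired up roughly as `CachedQueryService(lambda sql: run_athena_query(boto3.client("athena"), sql, "analytics", "s3://example-bucket/athena-results/"))`, where the database name and bucket are placeholders. Because every Athena call is billed, even a short time-to-live can help when many users open the same report.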

Benefits

Improved query performance by at least 50% compared to the persistent cluster solution
Reduced costs by $15,000 per month
Provided analysts with the flexibility to adapt queries on the fly for richer insights
Adopted pay-as-you-go querying to optimize costs
Eliminated the need to manage persistent database instances
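
A quick back-of-the-envelope check of the numbers above, as a small Python sketch. The growth rate (up to 100 GB per day) and the seven years come from the case study. Everything else is an assumption for illustration: the price of 5 USD per TB scanned (check current Athena pricing), decimal units, and the hypothetical table layout of 84 monthly partitions and 30 equally sized columns.

```python
GB = 1000 ** 3  # decimal units; an assumption about how sizes are counted
TB = 1000 ** 4


def stored_bytes_upper_bound(daily_growth_bytes, years):
    """Upper bound on stored data if the peak daily growth held every day."""
    return daily_growth_bytes * 365 * years


def scan_cost_usd(bytes_scanned, price_per_tb_usd=5.0):
    """Cost of one query under per-TB-scanned pricing (price is assumed)."""
    return bytes_scanned / TB * price_per_tb_usd


def pruned_scan_bytes(total_bytes, partitions_read, total_partitions,
                      columns_read, total_columns):
    """Bytes scanned after partition pruning and column projection.

    Simplification: partitions and columns are assumed to be equally sized.
    """
    after_partitions = total_bytes * partitions_read // total_partitions
    return after_partitions * columns_read // total_columns


if __name__ == "__main__":
    total = stored_bytes_upper_bound(100 * GB, years=7)
    print(f"upper bound stored: {total / TB:.1f} TB")
    print(f"full scan: ${scan_cost_usd(total):,.2f}")

    # Hypothetical layout: 84 monthly partitions, 30 columns, query reads 3.
    narrow = pruned_scan_bytes(total, 1, 84, 3, 30)
    print(f"one month, 3 of 30 columns: ${scan_cost_usd(narrow):,.2f}")
```

The upper bound works out to 255.5 TB, which is consistent with the "hundreds of terabytes" in the quote. The sketch also shows why repartitioning and a columnar format matter under pay-per-scan pricing: the fewer partitions and columns a query touches, the less it costs.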

AWS Services Used

Amazon Athena
Amazon Elastic MapReduce
Amazon S3
