Amazon EMR is a managed cloud big data platform that runs a variety of open-source tools for processing big data, such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.
YiYi Media Technology Co., Ltd. is a media company that combines data analysis and tracking, content incubation, and media advertising. The company provides customized one-stop mobile marketing solutions for advertisers, helping them reach their target users more precisely.
(1) The website collects users' ad click data through embedded tracking. The data is encrypted in transit with TLS and sent to an ELB, which terminates TLS and distributes the requests to backend servers across multiple Availability Zones for preliminary processing. The backend servers run in an EC2 Auto Scaling group, which scales capacity out or in with traffic to save costs.
(2) The data is streamed in real time through Kinesis Data Streams to downstream applications for further processing, both real-time and batch.
(3) An EMR cluster running Spark Streaming processes the real-time data and stores the results in Redis. Amazon QuickSight visualizes the data so that ad placements can be adjusted promptly.
(4) In parallel, Kinesis Data Firehose delivers the source data to S3 for backup and for later batch processing on EMR clusters running Hadoop.
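Steps (2) and (3) can be illustrated with a minimal Python sketch. It is not the company's actual code: the event fields (ad_id, user_id, ts), the choice of ad_id as partition key, and the 60-second window are assumptions made for illustration. The first part shapes click events into entries for the Kinesis PutRecords API, which accepts at most 500 records per request. The second part is a plain-Python stand-in for the kind of per-window click count a streaming job might compute before writing results to Redis.

```python
import json
from collections import Counter

# Kinesis PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def to_kinesis_entries(events):
    """Turn click events (dicts) into PutRecords entries.

    Assumption: each event has an "ad_id" field, used here as the
    partition key so clicks on the same ad land on the same shard.
    """
    return [
        {
            "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
            "PartitionKey": str(event["ad_id"]),
        }
        for event in events
    ]


def chunk(entries, size=MAX_RECORDS_PER_CALL):
    """Split entries into batches no larger than `size`."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def send(client, stream_name, events):
    """Send events in batches.

    `client` is expected to behave like boto3.client("kinesis").
    Returns the number of PutRecords calls made. Real code should also
    check FailedRecordCount in each response and retry failed records.
    """
    batches = chunk(to_kinesis_entries(events))
    for batch in batches:
        client.put_records(StreamName=stream_name, Records=batch)
    return len(batches)


def clicks_per_window(events, window_seconds=60):
    """Count clicks per (window start, ad_id) in tumbling windows.

    Assumption: "ts" is an integer Unix timestamp in seconds.
    """
    counts = Counter()
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["ad_id"])] += 1
    return dict(counts)
```

Partitioning by ad_id keeps each ad's clicks in order on one shard, but a very popular ad could overload that shard; a random or user-based key spreads load more evenly at the cost of per-ad ordering.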