This topic describes how different enterprise departments can use Realtime Compute for Apache Flink for real-time big data computing, and which technical scenarios it is suited for.
Background information
As a stream computing engine, Flink is widely used for real-time data computing, such as processing the online service logs of Elastic Compute Service (ECS) instances and sensor data in Internet of Things (IoT) scenarios. You can use Flink to subscribe to binary log updates from relational databases such as ApsaraDB RDS and PolarDB, and use services such as DataHub, Simple Log Service, and Kafka to collect data in real time and deliver it to Flink for analytics and processing. The results can then be written to downstream data services, such as MaxCompute, Hologres, Machine Learning Platform for AI, and Elasticsearch, to improve data utilization and meet your business requirements.
Departments
From a departmental perspective, Realtime Compute for Apache Flink can provide the following capabilities:
Business department: real-time fraud detection, real-time recommendation, and real-time indexing for search engines.
Data department: real-time data warehousing, real-time reports, and real-time dashboards.
O&M department: real-time monitoring, real-time exception detection and alerting, and end-to-end debugging.
Technologies
From a technical perspective, Realtime Compute for Apache Flink is suitable for the following scenarios:
Real-time ETL and data streams
Real-time extract, transform, and load (ETL) processes and data streams deliver data from Point A to Point B in real time. Data cleansing and integration may be required during delivery. Examples include real-time indexing for search systems and ETL operations for real-time data warehouses.
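The ETL flow described above can be sketched in plain Python as three chained stages. This is a minimal illustration, not Flink API code: the `order_id,store_id,amount` record format, the `CITY_BY_STORE` lookup table, and the `extract`, `transform`, and `load` function names are all hypothetical. The generators handle one record at a time, loosely mirroring how a streaming job processes each event as it arrives; in a real deployment, the source and sink would be Flink connectors (for example, Kafka as the source and Hologres as the sink).

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical dimension table used to enrich (integrate) each record.
CITY_BY_STORE: Dict[str, str] = {"s1": "Hangzhou", "s2": "Beijing"}


def extract(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse raw lines of the hypothetical form 'order_id,store_id,amount'."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # Cleansing: drop records with the wrong number of fields.
        yield {"order_id": parts[0], "store_id": parts[1], "amount": parts[2]}


def transform(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Validate the amount and join each record with the dimension table."""
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # Cleansing: drop records whose amount is not numeric.
        yield {
            "order_id": r["order_id"],
            "amount": amount,
            "city": CITY_BY_STORE.get(r["store_id"], "unknown"),  # Integration.
        }


def load(records: Iterable[Dict[str, object]], sink: List[Dict[str, object]]) -> None:
    """Write each cleaned record to the sink, a stand-in for Point B."""
    for r in records:
        sink.append(r)


# Usage: two of the four raw lines are malformed and get dropped.
raw = ["o1, s1, 12.5", "garbage", "o2,s2,abc", "o3,s3,7"]
sink: List[Dict[str, object]] = []
load(transform(extract(raw)), sink)
print(sink)
```

In this sketch, cleansing means dropping malformed or non-numeric records, and integration means joining each record with a dimension table before it reaches the destination.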
Real-time data analytics
Data analytics is the process of extracting and integrating information from raw data to meet your business objectives. Examples include the top 10 products sold each day, the average turnaround time of a warehouse, the average document click-through rate, and the reach rate of push notifications. Real-time data analytics lets you view such metrics in real-time reports or dashboards.
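A daily "top 10 products sold" metric is a tumbling-window aggregation. The sketch below is a simplified plain-Python model that assumes events arrive in timestamp order; the `(epoch_seconds, product_id, quantity)` event shape and the `daily_top_n` name are hypothetical. Flink itself handles out-of-order data with event-time windows and watermarks, which this sketch deliberately omits.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

SECONDS_PER_DAY = 86_400


def daily_top_n(
    events: Iterable[Tuple[int, str, int]], n: int = 10
) -> Iterator[Tuple[int, List[Tuple[str, int]]]]:
    """Yield (day_index, top-n products) each time a daily tumbling window closes.

    events: (epoch_seconds, product_id, quantity), assumed to arrive in time order.
    """
    current_day: Optional[int] = None
    counts: Counter = Counter()
    for ts, product, qty in events:
        day = ts // SECONDS_PER_DAY
        if current_day is not None and day != current_day:
            # An event from a new day closes the previous window.
            yield current_day, counts.most_common(n)
            counts = Counter()
        current_day = day
        counts[product] += qty
    if current_day is not None:
        # End of input: flush the last open window.
        yield current_day, counts.most_common(n)


# Usage: two days of sales, reporting the top 2 products per day.
events = [(0, "apple", 3), (10, "pear", 5), (20, "apple", 4),
          (86_400, "pear", 1), (86_410, "kiwi", 2)]
result = list(daily_top_n(events, n=2))
print(result)
```

Each window emits its ranking only once the window is complete, which is what lets a real-time dashboard show a finished daily ranking as soon as the day ends.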
Event-driven applications
An event-driven application is a system that receives events from the streams it subscribes to and reacts to them. Event-driven applications depend on internal state and are used to respond to suspicious events, for example in fraud detection, risk management, and O&M exception detection systems. If a user's behavior triggers a risk management rule, the system captures the event and analyzes the user's current and previous behavior to determine whether to take risk management measures.
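The stateful behavior described above can be illustrated with a small keyed-state sketch. The rule (flag a transfer larger than three times the user's recent average), the `LargeTransferRule` class, and its parameters are hypothetical examples chosen for illustration, not a built-in Flink or Realtime Compute for Apache Flink feature. In a Flink job, the per-user history would typically be kept in keyed state, for example inside a `KeyedProcessFunction`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


class LargeTransferRule:
    """Hypothetical risk rule: flag a transfer that exceeds `factor` times the
    user's average over their most recent `history` transfers."""

    def __init__(self, history: int = 5, factor: float = 3.0, min_events: int = 3):
        self.factor = factor
        self.min_events = min_events
        # Keyed state: one bounded history of past amounts per user.
        self.state: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=history))

    def process(
        self, events: Iterable[Tuple[str, float]]
    ) -> Iterator[Tuple[str, float, float]]:
        """Yield (user, amount, previous_average) for each event that triggers the rule."""
        for user, amount in events:
            past = self.state[user]
            # Only judge users with enough history to form a baseline.
            if len(past) >= self.min_events:
                avg = sum(past) / len(past)
                if amount > self.factor * avg:
                    yield user, amount, avg
            # Update state after evaluating, so the current event is not its own baseline.
            past.append(amount)


# Usage: alice's 900 transfer is far above her previous average of 100.
rule = LargeTransferRule()
events = [("alice", 100), ("alice", 120), ("alice", 80),
          ("bob", 50), ("alice", 900), ("bob", 60)]
alerts = list(rule.process(events))
print(alerts)
```

The state is partitioned by user, so each decision compares the current event only with that user's own previous behavior.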
Risk management system
Realtime Compute for Apache Flink can handle complex stream processing and batch processing tasks. Its APIs let you perform complex calculations and run complex event processing (CEP) rules, which helps enterprises analyze data in real time and strengthen their risk management capabilities. For example, Realtime Compute for Apache Flink can identify abnormal user behavior in apps and detect irregularities in IoT data streams.
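Complex event processing rules match sequences of events rather than single events. The following plain-Python sketch detects a hypothetical brute-force pattern: at least three failed logins followed by a successful login for the same user, all within 60 seconds. The event shape, the `detect_brute_force` name, and its thresholds are assumptions for illustration; in Flink, such a rule would usually be expressed with the FlinkCEP pattern library rather than hand-written state.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def detect_brute_force(
    events: Iterable[Tuple[int, str, str]],
    failures: int = 3,
    within: int = 60,
) -> Iterator[Tuple[str, int]]:
    """Yield (user, ts) when a 'success' follows at least `failures` 'fail'
    events for the same user, all within `within` seconds of the success.

    events: (epoch_seconds, user, 'fail' or 'success'), assumed in time order.
    """
    recent_failures: Dict[str, Deque[int]] = defaultdict(deque)
    for ts, user, outcome in events:
        window = recent_failures[user]
        # Expire failures that are too old to be part of the pattern.
        while window and ts - window[0] > within:
            window.popleft()
        if outcome == "fail":
            window.append(ts)
        elif outcome == "success":
            if len(window) >= failures:
                yield user, ts
            window.clear()  # A successful login ends the current sequence.


# Usage: u1 matches the pattern; u2's failures are spread out too far.
events = [(0, "u1", "fail"), (10, "u1", "fail"), (20, "u1", "fail"), (25, "u1", "success"),
          (100, "u2", "fail"), (200, "u2", "fail"), (210, "u2", "fail"), (215, "u2", "success")]
alerts = list(detect_brute_force(events))
print(alerts)
```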
The preceding technology flowcharts are taken from Apache Flink.