Flink-CDC project address: https://github.com/ververica/flink-cdc-connectors
The demos in this tutorial are based on a Docker environment and are performed entirely in the Flink SQL CLI: the whole process uses only SQL, without a single line of Java/Scala code and without installing an IDE.
Let's imagine we are running an e-commerce business. The product and order data are stored in MySQL, and the shipment data related to the orders are stored in Postgres. We want to enrich the orders using the products and shipments tables and load the enriched orders into Elasticsearch in real time.
An overview of the architecture is shown below:
Prepare a Linux or macOS computer with Docker installed.
The components required in this demo are all managed in containers, so we will use docker-compose to start them. Create a docker-compose.yml file with the following content:
version: '2.1'
services:
  postgres:
    image: debezium/example-postgres:1.1
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  mysql:
    image: debezium/example-mysql:1.1
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw
  elasticsearch:
    image: elastic/elasticsearch:7.6.0
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  kibana:
    image: elastic/kibana:7.6.0
    ports:
      - "5601:5601"
The Docker Compose environment consists of the following containers:
- MySQL: the products and orders tables will be stored in the database. They will be joined with data in Postgres to enrich the orders.
- Postgres: the shipments table will be stored in the database.
- Elasticsearch: mainly used as a data sink.
- Kibana: used to visualize the data in Elasticsearch.
Run the following command in the directory that contains the docker-compose.yml file to start all the containers:
docker-compose up -d
This command automatically starts all the containers defined in the Docker Compose configuration in detached mode. Run docker ps to check whether these containers are running properly. We can also visit http://localhost:5601/ to see whether Kibana is running normally.
1. Download Flink 1.13.2 and extract it to the directory flink-1.13.2.
2. Download the Flink SQL connector JARs used by this demo (for MySQL CDC, Postgres CDC, and Elasticsearch 7) and put them in flink-1.13.2/lib/.
Note: Download links are only available for stable releases.
1. Enter MySQL's container:
docker-compose exec mysql mysql -uroot -p123456
2. Create tables and populate data:
-- MySQL
CREATE DATABASE mydb;
USE mydb;
CREATE TABLE products (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 101;
INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"spare tire","24 inch spare tire");
CREATE TABLE orders (
order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_date DATETIME NOT NULL,
customer_name VARCHAR(255) NOT NULL,
price DECIMAL(10, 5) NOT NULL,
product_id INTEGER NOT NULL,
order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;
INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
(default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
(default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
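Optionally, run a quick check in the same MySQL session to confirm the sample data was loaded; the product ids should start at 101 and the order ids at 10001:
-- MySQL
SELECT * FROM products;
SELECT * FROM orders;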
1. Enter Postgres's container:
docker-compose exec postgres psql -h localhost -U postgres
2. Create tables and populate data:
-- PG
CREATE TABLE shipments (
shipment_id SERIAL NOT NULL PRIMARY KEY,
order_id SERIAL NOT NULL,
origin VARCHAR(255) NOT NULL,
destination VARCHAR(255) NOT NULL,
is_arrived BOOLEAN NOT NULL
);
ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001;
ALTER TABLE public.shipments REPLICA IDENTITY FULL;
INSERT INTO shipments
VALUES (default,10001,'Beijing','Shanghai',false),
(default,10002,'Hangzhou','Shanghai',false),
(default,10003,'Shanghai','Hangzhou',false);
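The REPLICA IDENTITY FULL setting makes UPDATE and DELETE events carry the complete old row, which the postgres-cdc connector needs to produce correct change records. As an optional check in the same psql session, the following query should return f (meaning FULL) for the shipments table:
-- PG
SELECT relname, relreplident FROM pg_class WHERE relname = 'shipments';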
1. Use the following command to change to the Flink directory:
cd flink-1.13.2
2. Use the following command to start a Flink cluster:
./bin/start-cluster.sh
Then, we can visit http://localhost:8081/
to see if Flink is running normally. The web page is shown below:
3. Use the following command to start a Flink SQL CLI:
./bin/sql-client.sh
We should see the welcome screen of the CLI client.
First, enable checkpoints every three seconds:
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
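Optionally, you can also switch the CLI result display mode; tableau mode prints streaming results (with their change flags) directly in the terminal, which is handy for ad-hoc preview queries. This setting is not required for the demo:
-- Flink SQL
Flink SQL> SET sql-client.execution.result-mode = tableau;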
Then, create tables that capture the change data from the corresponding database tables:
-- Flink SQL
Flink SQL> CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'products'
);
Flink SQL> CREATE TABLE orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'orders'
);
Flink SQL> CREATE TABLE shipments (
shipment_id INT,
order_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (shipment_id) NOT ENFORCED
) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'postgres',
'password' = 'postgres',
'database-name' = 'postgres',
'schema-name' = 'public',
'table-name' = 'shipments'
);
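Before defining the sink, you can optionally preview the change stream of a source table directly in the SQL CLI; the query runs continuously until you cancel it and should list the nine products captured from MySQL:
-- Flink SQL
Flink SQL> SELECT * FROM products;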
Finally, create an enriched_orders table to load the enriched order data into Elasticsearch:
-- Flink SQL
Flink SQL> CREATE TABLE enriched_orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
product_name STRING,
product_description STRING,
shipment_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'elasticsearch-7',
'hosts' = 'http://localhost:9200',
'index' = 'enriched_orders'
);
Use Flink SQL to join the orders table with the products and shipments tables to enrich the orders and write them to Elasticsearch:
-- Flink SQL
Flink SQL> INSERT INTO enriched_orders
SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;
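This statement submits a continuous streaming job: every subsequent change in MySQL or Postgres is propagated to Elasticsearch automatically. If you want to inspect the enriched result without Kibana, the same join can also be run as a plain SELECT in the SQL CLI (an optional check that starts a separate continuous query):
-- Flink SQL
Flink SQL> SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;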
Now, the enriched orders should be shown in Kibana. Visit http://localhost:5601/app/kibana#/management/kibana/index_pattern to create an index pattern enriched_orders:
Visit http://localhost:5601/app/kibana#/discover
to find the enriched orders:
Next, make changes in the databases, and the enriched orders shown in Kibana will be updated after each step in real time.
1. Insert a new order in MySQL:
--MySQL
INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);
2. Insert a shipment in Postgres:
--PG
INSERT INTO shipments
VALUES (default,10004,'Shanghai','Beijing',false);
3. Update the order status in MySQL:
--MySQL
UPDATE orders SET order_status = true WHERE order_id = 10004;
4. Update the shipment status in Postgres:
--PG
UPDATE shipments SET is_arrived = true WHERE shipment_id = 1004;
5. Delete the order in MySQL:
--MySQL
DELETE FROM orders WHERE order_id = 10004;
The changes of enriched orders in Kibana are listed below:
After finishing the tutorial, run the following command in the directory that contains docker-compose.yml to stop all the containers:
docker-compose down
Run the following command in the Flink directory flink-1.13.2 to stop the Flink cluster:
./bin/stop-cluster.sh
This article explains how to use Flink CDC to build a streaming ETL pipeline based on a simple business scenario. I hope it helps readers get started with Flink CDC and meet their business needs.