Overview
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop and database jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web interface to track and maintain your workflows.
Azkaban supports ApsaraDB RDS for MySQL and PolarDB as its backend database. This solution deploys Azkaban in multiple-executor mode with the high-availability edition of ApsaraDB RDS for MySQL, uses a single ECS instance to host both the Azkaban web server and an Azkaban executor server, and runs a demo workflow task.
Reference Architecture
Steps
Deploy Resources
Use the main.tf file from this solution with Terraform to provision the ECS, EIP, and RDS MySQL instances.
After the script finishes, the information for the ECS, EIP, RDS MySQL, and RDS PostgreSQL instances is listed:
• eip_ecs: The public EIP of the ECS instance that hosts the Azkaban installation
• rds_mysql_url: The connection URL of the backend RDS MySQL database for Azkaban
• rds_pg_url_azkaban_demo_database: The connection URL of the demo RDS PostgreSQL database used with Azkaban
• rds_pg_port_azkaban_demo_database: The connection port of the demo RDS PostgreSQL database used with Azkaban; by default, it is 1921 for RDS PostgreSQL
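The later steps reuse these output values several times, so it can help to read them programmatically instead of copying them by hand. The following is a minimal sketch, assuming Terraform is run from the directory that contains main.tf; the helper names parse_outputs and read_outputs are hypothetical, and the sample address and host name are made up for illustration.

```python
import json
import subprocess


def parse_outputs(json_text):
    """Map each Terraform output name to its plain value."""
    raw = json.loads(json_text)
    return {name: entry["value"] for name, entry in raw.items()}


def read_outputs(workdir="."):
    """Run `terraform output -json` in workdir (requires an applied state)."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return parse_outputs(result.stdout)


# Illustrative sample in the shape that `terraform output -json` prints;
# the address and host name below are made up.
sample = """
{
  "eip_ecs": {"sensitive": false, "type": "string", "value": "203.0.113.10"},
  "rds_mysql_url": {"sensitive": false, "type": "string", "value": "rm-example.mysql.rds.aliyuncs.com"}
}
"""
outputs = parse_outputs(sample)
print("ssh root@" + outputs["eip_ecs"])
```

On a machine with the applied Terraform state, read_outputs() would return the same kind of dictionary for the real instances.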
Set up Azkaban on ECS with RDS MySQL
1. Log on to the ECS instance via SSH with the default password N1cetest. Replace ECS_EIP with the eip_ecs value:
ssh root@ECS_EIP
2. Run the following commands to install GCC, JDK 8, Git, the MySQL client, Python 3, the Python module psycopg2, and the PostgreSQL client on the ECS instance:
yum install -y gcc-c++*
yum install -y java-1.8.0-openjdk-devel.x86_64
yum install -y git
yum install -y mysql.x86_64
yum install -y python39
yum install -y postgresql-devel
pip3 install psycopg2
cd ~
wget http://mirror.centos.org/centos/8/AppStream/x86_64/os/Packages/compat-openssl10-1.0.2o-3.el8.x86_64.rpm
rpm -i compat-openssl10-1.0.2o-3.el8.x86_64.rpm
wget http://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/181125/cn_zh/1598426198114/adbpg_client_package.el7.x86_64.tar.gz
tar -xzvf adbpg_client_package.el7.x86_64.tar.gz
3. Run the following commands to download and build the Azkaban project:
cd ~
git clone https://github.com/azkaban/azkaban.git
cd ~/azkaban
./gradlew clean
./gradlew build installDist -x test
4. Run the following command to build the azkaban-db module:
cd ~/azkaban/azkaban-db; ../gradlew build installDist -x test
5. Run the following commands to create all the tables that Azkaban needs on RDS MySQL. Replace rds_mysql_url with the provisioned RDS MySQL connection string:
cd ~/azkaban/azkaban-db/build/distributions
unzip azkaban-db-*.zip
mysql -h rds_mysql_url -P3306 -uazkaban -pN1cetest azkaban < ~/azkaban/azkaban-db/build/distributions/azkaban-db-*/create-all-sql-*.sql
6. Connect to RDS MySQL again and run show tables to view the tables created for Azkaban. Replace rds_mysql_url with the provisioned RDS MySQL connection string:
mysql -h rds_mysql_url -P3306 -uazkaban -pN1cetest azkaban
7. Run the following command to build the azkaban-exec-server module, which is the Azkaban executor server:
cd ~/azkaban/azkaban-exec-server; ../gradlew build installDist -x test
8. Edit the azkaban.properties file to modify the properties of the executor server accordingly:
vim ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server/conf/azkaban.properties
Please refer to https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html for the property default.timezone.id. For example, if you are located in China, set the time zone to Asia/Shanghai.
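If you prefer to script this edit rather than make it in vim, the change amounts to rewriting a few key=value lines. The following is a minimal sketch of that idea; set_properties is a hypothetical helper, the sample text is illustrative rather than the real file, and the mysql.host key is an assumption based on Azkaban's default configuration that you should check against your own azkaban.properties.

```python
def set_properties(text, updates):
    """Return properties text with each key in `updates` set to its new value.

    Existing `key=value` lines are rewritten in place; keys that are not
    present are appended. Comments and blank lines are kept as they are.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and not line.lstrip().startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative input, not the full file shipped with Azkaban.
sample = "# Azkaban executor settings\ndefault.timezone.id=America/Los_Angeles\nmysql.host=localhost\n"
updated = set_properties(sample, {
    "default.timezone.id": "Asia/Shanghai",
    "mysql.host": "rds_mysql_url",  # placeholder: the provisioned RDS MySQL URL
})
print(updated, end="")
```

To apply it to the real file, read the file's text, pass it through set_properties, and write the result back after keeping a backup copy.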
9. Run the following command to build the azkaban-web-server module, which is the Azkaban web server:
cd ~/azkaban/azkaban-web-server; ../gradlew build installDist -x test
10. Edit the azkaban.properties file to modify the properties of the web server accordingly:
vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban.properties
Note: The line
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
must be replaced with
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
to remove the MinimumFreeMemory filter. With this filter enabled, the web server checks whether the executor host has more than 6 GB of free memory. If it has less, the web server does not hand the task over to that executor host. This solution uses an entry-level ECS instance with less than 6 GB of memory, so you need to remove this filter for the task to run.
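The replacement described in the note can also be expressed as a small text transformation, which is useful if you rebuild the web server often. This is a sketch only: drop_filter is a hypothetical helper, and it operates on the text of the properties file rather than on the file itself.

```python
FILTERS_KEY = "azkaban.executorselector.filters"


def drop_filter(text, name):
    """Remove `name` from the comma-separated filters property in `text`."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == FILTERS_KEY:
            # Keep every filter except the one being removed, in the same order.
            kept = [f.strip() for f in value.split(",") if f.strip() and f.strip() != name]
            line = f"{FILTERS_KEY}={','.join(kept)}"
        out.append(line)
    return "\n".join(out) + "\n"


before = "azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus\n"
after = drop_filter(before, "MinimumFreeMemory")
print(after, end="")
```

Lines for other properties pass through unchanged, so the function can be run over the whole file content.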
11. Configure the Azkaban web server user account. Use the username azkaban and password azkaban to log on to the Azkaban web console:
vim ~/azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban-users.xml
12. Run the following commands to start the Azkaban executor server and activate it:
cd ~/azkaban/azkaban-exec-server/build/install/azkaban-exec-server
./bin/start-exec.sh
curl -G "localhost:$(<./executor.port)/executor?action=activate" && echo
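The curl command reads the port from the executor.port file, so it can only succeed once the executor has written that file. The sketch below shows one way to wait for the file before activating; it assumes the file may appear a moment after start-exec.sh returns, and the helper names read_port, activation_url, and activate are hypothetical.

```python
import time
import urllib.request
from pathlib import Path


def read_port(port_file, timeout=60.0, interval=1.0):
    """Wait until the executor has written its port file, then return the port."""
    deadline = time.monotonic() + timeout
    path = Path(port_file)
    while True:
        if path.exists():
            content = path.read_text().strip()
            if content:
                return int(content)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{port_file} was not written within {timeout} seconds")
        time.sleep(interval)


def activation_url(port, host="localhost"):
    """Build the same URL that the curl command in step 12 requests."""
    return f"http://{host}:{port}/executor?action=activate"


def activate(port_file="executor.port"):
    """Call the activation endpoint and return the response body."""
    url = activation_url(read_port(port_file))
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode()
```

Run from the executor installation directory, print(activate()) would play the role of the curl line.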
13. Run the following commands to start the Azkaban web server:
cd ~/azkaban/azkaban-web-server/build/install/azkaban-web-server
./bin/start-web.sh
14. A multi-executor Azkaban instance is now ready. Visit http://ECS_EIP:8081 and log in to the Azkaban web console with username azkaban and password azkaban.
Download and Prepare Demo Workflow Project Package
1. Get the demo project package.
git clone https://github.com/alibabacloud-howto/opensource_with_apsaradb.git
cd opensource_with_apsaradb/azkaban/project-demo
ls -l
• _1_prepare_source_db.py: A Python script to prepare tables and data in the source demo database northwind_source on RDS PostgreSQL
• _2_prepare_target_db.py: A Python script to prepare tables and data in the target demo database northwind_target on RDS PostgreSQL
• _3_data_migration.py: A Python script to migrate the data of two tables, products and orders, from the source database northwind_source to the target database northwind_target
• job1_prepare_source_db.job: Azkaban job to call _1_prepare_source_db.py
• job2_prepare_target_db.job: Azkaban job to call _2_prepare_target_db.py
• job3_data_migration.job: Azkaban job to call _3_data_migration.py, which needs job1_prepare_source_db.job and job2_prepare_target_db.job to be executed beforehand
• northwind_data_source.sql: DML to insert data into the source demo database northwind_source
• northwind_data_target.sql: DML to insert data into the target demo database northwind_target
• northwind_ddl.sql: DDL to create tables on both the source demo database northwind_source and the target demo database northwind_target
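To make the dependency between the three jobs concrete, the sketch below renders what a command-type Azkaban .job file with dependencies looks like. It is an illustration under assumptions, not the content of the files in the repository: render_job is a hypothetical helper, and the python3 command line is a guess at how the job calls the script.

```python
def render_job(command, dependencies=()):
    """Render the text of an Azkaban command-type .job file."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Dependencies are job names, i.e. file names without the .job suffix.
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


# Assumed command line; the real job3_data_migration.job may differ.
job3 = render_job(
    "python3 _3_data_migration.py",
    dependencies=["job1_prepare_source_db", "job2_prepare_target_db"],
)
print(job3, end="")
```

The dependencies line is what lets Azkaban run the two preparation jobs before the migration job.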
2. Edit the Azkaban project files as needed so that they connect to the target RDS PostgreSQL demo database. Then run the following command to package all the project files into a zip file:
zip -q -r project_demo_northwind.zip *
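If the zip command is not available, the same package can be produced with Python's standard library. This is a minimal sketch, assuming it is pointed at the project-demo directory; package_project is a hypothetical helper. Unlike the shell glob *, it also includes hidden files if any exist.

```python
import zipfile
from pathlib import Path


def package_project(project_dir, zip_name="project_demo_northwind.zip"):
    """Zip every file under project_dir, storing paths relative to it."""
    root = Path(project_dir)
    target = root / zip_name
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive that is being written.
            if path.is_file() and path.resolve() != target.resolve():
                archive.write(path, path.relative_to(root).as_posix())
    return target
```

Calling package_project(".") from inside project-demo writes project_demo_northwind.zip next to the project files.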
Deploy and Run the Demo Azkaban Workflow Task
1. Visit http://ECS_EIP:8081 and log in to the Azkaban web console with username azkaban and password azkaban.
2. Create an Azkaban project.
3. Upload the project zip file packaged beforehand.
4. Click the job entry to go to the workflow page, and then click Schedule / Execute Flow and Execute to run the task. After the task is completed, the job graph on the workflow page turns green.
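The same run can be started without the web console through Azkaban's AJAX API. The sketch below builds the login request and the execute-flow URL. The project and flow names passed to run_flow are whatever you created in the console, the helper names are hypothetical, and the assumption that the flow is named after its final job (job3_data_migration) should be checked on the workflow page.

```python
import json
import urllib.parse
import urllib.request


def login_request(base_url, username, password):
    """Build the POST request that asks Azkaban for a session id."""
    body = urllib.parse.urlencode(
        {"action": "login", "username": username, "password": password}
    ).encode()
    return urllib.request.Request(base_url, data=body, method="POST")


def execute_flow_url(base_url, session_id, project, flow):
    """Build the URL that starts one run of `flow` in `project`."""
    query = urllib.parse.urlencode(
        {"ajax": "executeFlow", "session.id": session_id, "project": project, "flow": flow}
    )
    return f"{base_url}/executor?{query}"


def run_flow(base_url, username, password, project, flow):
    """Log in, start the flow, and return Azkaban's parsed JSON reply."""
    with urllib.request.urlopen(login_request(base_url, username, password)) as resp:
        session_id = json.load(resp)["session.id"]
    with urllib.request.urlopen(execute_flow_url(base_url, session_id, project, flow)) as resp:
        return json.load(resp)
```

For example, run_flow("http://ECS_EIP:8081", "azkaban", "azkaban", "your_project", "job3_data_migration") would start one execution, with ECS_EIP and your_project replaced by your own values.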