
Life of an SQL Task

This article outlines the SQL statement execution process, offering insights and guidance for newcomers to big data development.

By Xianglu


As a newcomer to big data development, I'm curious about the entire execution process of an SQL task. What process does an SQL statement go through during execution? After reviewing relevant documents, I've written this article to share my learning experience and insights with you.

1. Overall Process

The process of creating an SQL task and obtaining the running result involves interactions between multiple systems. Here's an overall flowchart. Let's start with a simple example: counting the prizes distributed in an event.

[Figure: overall flowchart of the SQL task execution process]

2. Task Development and Publishing

Create & Edit a Task

First, we need to create and write an SQL task in the IDE: create an offline periodic computing task on the Dataphin R&D page and write the SQL code.

SQL for counting the prizes in the event:

SELECT  prize_id
        ,COUNT(*) AS prize_send_cnt_1d
FROM    apcdm.dwd_ap_mkt_eqt_send_di
WHERE   dt = '${bizdate}'
AND     prize_id IN ('PZ169328936', 'PZ169298703')
GROUP BY prize_id;

Configure the Scheduling Information

After writing the SQL task, we need to set up its scheduling information, which is stored in the database as task metadata. To ensure the task is scheduled and executed correctly, it's essential to configure this information accurately. The common SQL task configurations are as follows:

Basic Information: task node ID, task name, node type, and O&M engineer.

Scheduling Parameters: parameters used in the task are automatically replaced with specific values based on the business date, the scheduled time, and the value format configured for each parameter, so values are resolved dynamically for every scheduling period (see the sketch after this list).
- Business date: the day before the scheduling date.
- Scheduled time: the time at which the task instance is scheduled to run.

Scheduling Attributes:
- Instance generation method: T+1 next-day generation or immediate generation upon publishing.
- Scheduling type: normal scheduling, pause scheduling, or dry run.
- Validity period: the date range during which the task is automatically scheduled and run.
- Rerun: whether the result is affected by running the task multiple times.
- Scheduling cycle: the interval at which the task is executed, which can be monthly, weekly, daily, hourly, or minute-level.
- Cron expression: the cron expression corresponding to the scheduling cycle.

Scheduling Dependencies: the upstream and downstream dependencies between nodes; a downstream task node starts to run only after its upstream task nodes have run successfully.

Node Context Parameters: input parameters (which receive the output parameter values of upstream nodes as input to the current node) and output parameters.

Execution Information: execution engine and scheduling resource group.
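
Below is a minimal sketch of how a scheduling parameter such as ${bizdate} could be resolved before the SQL is submitted. It assumes a plain string substitution and a yyyymmdd value format; the function name and the example date are illustrative, not Dataphin's actual implementation.

from datetime import datetime, timedelta

def resolve_scheduling_params(sql: str, scheduled_time: datetime) -> str:
    # The business date is the day before the scheduling date;
    # yyyymmdd is just one of the configurable value formats.
    bizdate = (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")
    return sql.replace("${bizdate}", bizdate)

# An instance scheduled to run on 2024-06-02 reads the partition dt='20240601'.
resolved_sql = resolve_scheduling_params(
    "SELECT * FROM apcdm.dwd_ap_mkt_eqt_send_di WHERE dt = '${bizdate}';",
    datetime(2024, 6, 2, 1, 0),
)
print(resolved_sql)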

Submit and Publish

After writing the SQL task and configuring its scheduling information, we can submit the task and generate a publishing package. Before publishing, we can run a smoke test in the development environment, which generates a test instance with the task's code and scheduling information. If we select a business date earlier than yesterday, the instance's scheduled time has already passed, so it meets the scheduling condition and runs immediately; if we select yesterday as the business date, the instance waits until its scheduled time to run. After the task is submitted and passes the smoke test, we can go to the publishing center, select the task object to be published, and publish it.

3. Instance Conversion / Instance Generation

Now that the task is published online, what happens next?

At 10:00 p.m., the Phoenix scheduling system kicks in. Based on the published task node definitions, it compiles and generates a batch of executable task instances in advance, and then organizes these instances into a DAG for scheduling and execution according to the lineage and time dependencies between tasks. During instance conversion, the system's built-in parsing function parses the cron time expression configured for the task and sets a specific running time for the corresponding task instance.
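
The following sketch shows one way to picture instance conversion: each published task definition is expanded into a dated instance whose running time comes from its time expression, and the dependency edges are carried over so the instances form a DAG. The class and field names are illustrative, and a plain "hour of day" stands in for the full cron expression.

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Dict

@dataclass
class TaskNode:                      # published task definition (metadata)
    node_id: str
    scheduled_hour: int              # simplified stand-in for the cron expression
    upstream_ids: List[str] = field(default_factory=list)

@dataclass
class TaskInstance:                  # executable instance for one business date
    node_id: str
    scheduled_time: datetime
    upstream_ids: List[str]

def convert_instances(nodes: List[TaskNode], business_date: datetime) -> Dict[str, TaskInstance]:
    # Instances run on the day after the business date; a real scheduler would
    # parse the configured cron expression to compute the exact running time.
    run_date = business_date + timedelta(days=1)
    return {
        n.node_id: TaskInstance(
            node_id=n.node_id,
            scheduled_time=run_date.replace(hour=n.scheduled_hour, minute=0, second=0, microsecond=0),
            upstream_ids=list(n.upstream_ids),
        )
        for n in nodes
    }

# The DAG is implicit in upstream_ids: an instance becomes runnable only after
# all of its upstream instances for the same business date have succeeded.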


4. Timed Scheduling / Instance Startup

Now that the instance conversion is complete, the DAG of the corresponding task instance is ready. So, how do we wake up the task instance at the right time?

The early instance scheduling system was implemented using the Quartz framework. The system specifies the event trigger time by registering Quartz events and configuring the cron time expression of the trigger. This allows the Quartz framework to trigger the corresponding event at a predetermined time point, starting the corresponding task instance. However, the Quartz framework has a performance bottleneck when dealing with millions of instances. To address this, the new scheduling engine adopts a self-developed solution that maintains status in the database and uses asynchronous queries. Specifically, the execution time of all task instances is maintained in the database, and then the background thread of the scheduling system asynchronously and regularly queries the task instances that meet the time condition, triggering their startup and execution.
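
A minimal sketch of the database-polling approach is shown below. The table and column names are made up for illustration, and a blocking loop stands in for the real scheduler's asynchronous background thread.

import sqlite3
import time

def polling_loop(conn: sqlite3.Connection, start_instance, interval_seconds: int = 5):
    # Instead of registering a per-instance Quartz trigger, keep every
    # instance's execution time in the database and poll for the ones
    # that have become due.
    while True:
        now = int(time.time())
        due = conn.execute(
            "SELECT instance_id FROM task_instance "
            "WHERE status = 'PENDING' AND scheduled_time <= ?",
            (now,),
        ).fetchall()
        for (instance_id,) in due:
            # Mark the instance before starting it so a crash between the two
            # steps cannot trigger it twice.
            conn.execute(
                "UPDATE task_instance SET status = 'TRIGGERED' WHERE instance_id = ?",
                (instance_id,),
            )
            conn.commit()
            start_instance(instance_id)
        time.sleep(interval_seconds)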

5. Scheduling Resource Allocation

After waking up the task instance, we need to allocate resources to the instance and submit it to ODPS for execution. This part of the logic is handled by the execution engine Alisa.

After receiving a task, Alisa sends it to a gateway based on whether the resources of the task's resource group and of the cluster that schedules task execution are free. A gateway is a process in the Alisa cluster responsible for submitting ODPS tasks and managing task status. Because a task is submitted to the computing engine for execution, a session connection is maintained from submission until the task finishes running, and this connection occupies a slot, which represents the capacity to submit one task. To give important businesses priority access to resources and to meet multi-user, multi-tenant requirements, the gateway execution cluster and resource group mode are designed to control the slots used by tasks. A cluster can contain multiple gateway machines, and the slots of a cluster can be allocated to multiple resource groups.


By binding a project space to a specific resource group, tasks submitted from that project space can use the resource group's slots. Each resource group controls concurrency through a maximum number of slots (max_slot), and the slot usage of the project space can be monitored in real time.


The Alisa execution engine manages slot usage and maintains a real-time heartbeat connection with each gateway machine in order to allocate slots to tasks. If the scheduling slots are full, the job is congested and must wait for a slot to be allocated. Once a slot is allocated, the task is submitted on the designated gateway machine, where the gateway starts an odpscmd process and submits the job through it.
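
The sketch below illustrates the slot accounting described above: a task takes one slot from its resource group for the whole submission-to-completion window, waits if the group is already at max_slot, and is otherwise dispatched to a gateway machine. The class and the gateway-selection rule are assumptions for illustration, not Alisa's actual code.

import threading
from typing import List, Dict

class ResourceGroup:
    def __init__(self, name: str, max_slot: int):
        self.name = name
        self.max_slot = max_slot      # concurrency limit for this resource group
        self.used_slot = 0
        self._lock = threading.Lock()

    def try_acquire_slot(self) -> bool:
        # A slot is held from task submission until the job finishes.
        with self._lock:
            if self.used_slot >= self.max_slot:
                return False          # congested: the task waits for a free slot
            self.used_slot += 1
            return True

    def release_slot(self) -> None:
        with self._lock:
            self.used_slot -= 1

def submit_task(group: ResourceGroup, gateways: List[Dict]) -> str:
    if not group.try_acquire_slot():
        return "WAITING"              # queued until another task releases a slot
    gateway = min(gateways, key=lambda g: g["running"])   # pick the least-loaded gateway
    gateway["running"] += 1
    # The chosen gateway would now start an odpscmd process and submit the job.
    return "SUBMITTED via " + gateway["host"]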

6. Job Execution in ODPS

After the gateway process submits the job through odpscmd, the task begins its execution process in the ODPS system.

The ODPS system consists of two layers: the control layer and the computing layer. A submitted job first enters the control layer, which includes the request processor (Worker), the Scheduler, and the job execution manager (Executor). The Scheduler is responsible for scheduling the instance generated when the job is submitted. It maintains a list of instances, takes an instance from the list, splits it into multiple tasks, and puts the runnable tasks into the priority queue TaskPool, where they queue in the control cluster. One background thread of the Scheduler regularly sorts the TaskPool, while another regularly queries the resource status of the computing cluster. The Executor polls the Scheduler, and if the Scheduler determines that the control cluster has available resources, it sends the top-ranked SQLTask to the Executor.
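
To make the queue-and-poll interaction concrete, here is a toy version of a priority queue of runnable tasks plus an executor that only receives the top-ranked task when the control cluster has free resources. The names TaskPool and SQLTask follow the text; everything else is illustrative.

import heapq
from typing import Optional, Callable

class TaskPool:
    def __init__(self):
        self._heap = []
        self._seq = 0                                 # tie-breaker keeps equal priorities FIFO

    def put(self, priority: int, sql_task: object) -> None:
        heapq.heappush(self._heap, (priority, self._seq, sql_task))
        self._seq += 1

    def take_top(self) -> Optional[object]:
        return heapq.heappop(self._heap)[2] if self._heap else None

def executor_poll(pool: TaskPool, control_cluster_has_resources: Callable[[], bool]) -> Optional[object]:
    # The Executor polls the Scheduler; the Scheduler hands out the
    # top-ranked SQLTask only when the control cluster has capacity.
    if control_cluster_has_resources():
        return pool.take_top()
    return None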


After receiving the task, the Executor invokes the SQL Parse Planner to lexically and syntactically analyze the SQL statements, generating an abstract syntax tree (AST) as a result. Then, it performs logical analysis to produce an optimized logical execution plan, followed by physical analysis to generate an optimized physical execution plan.


From the physical execution plan, a physical operator DAG is generated based on whether data needs to be shuffled between processing stages. Each node in the DAG corresponds to a Fuxi task. A Fuxi job configuration file is then generated from the DAG and the instance metadata, such as the number of instances and the resource specification of each Fuxi task, and submitted to the Fuxi Master.
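
The sketch below shows what generating a Fuxi job configuration from the physical operator DAG might look like in miniature: each DAG node becomes one Fuxi task with an instance count and a resource specification, and the edges record the shuffle boundaries. All field names and the two-stage example are illustrative assumptions, not Fuxi's real configuration format.

from dataclasses import dataclass
from typing import List, Dict

@dataclass
class FuxiTaskSpec:
    name: str                 # one node of the physical operator DAG
    instance_count: int       # degree of parallelism for this stage
    cpu: int                  # per-instance resource specification (illustrative units)
    memory_mb: int
    upstream: List[str]       # DAG edges, i.e. the shuffle boundaries

def build_fuxi_job(stages: List[FuxiTaskSpec]) -> Dict:
    return {
        "tasks": [
            {
                "name": s.name,
                "instances": s.instance_count,
                "resource": {"cpu": s.cpu, "memory_mb": s.memory_mb},
                "upstream": s.upstream,
            }
            for s in stages
        ]
    }

# Two stages separated by a shuffle: a scan/partial-aggregate stage feeding a merge stage.
job = build_fuxi_job([
    FuxiTaskSpec("M1", instance_count=100, cpu=1, memory_mb=2048, upstream=[]),
    FuxiTaskSpec("R2", instance_count=20, cpu=1, memory_mb=4096, upstream=["M1"]),
])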

7. Fuxi Computing Layer

Fuxi is a distributed scheduling and execution framework for the big data computing platform. It supports the execution of tens of millions of distributed jobs daily for multiple computing engines, including ODPS and PAI, and processes EB-level volumes of data every day. A Fuxi cluster consists of a Master and multiple Agents. An Agent runs on each compute node, manages that node's resources, and reports node information to the Master. The Master collects resource usage information from every node and manages the allocation of computing resources.


After the ODPS task is processed, the Fuxi task description file is submitted to the Fuxi Master for processing. The Fuxi Master selects a free node to start the Application Master process of the task. Once started, the Application Master applies to the Fuxi Master for computing resources based on the task's configuration information. All subsequent resource applications for this task are handled by the Application Master.

After receiving resource requests, if the cluster quota is full and resources are tight, the Fuxi Master queues the request in the computing cluster and waits for resource allocation. If resources are available, the Fuxi Master identifies suitable resources (which may only fulfill part of the request) based on conditions such as the quota group, job priority, and preemption status. It then allocates these resources to the Application Master, notifying the process and informing the Fuxi Agent at the resource node about the allocation details. If the request is not fully satisfied, the Fuxi Master continues to allocate resources to the Application Master when additional free resources become available.
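
The incremental allocation described above can be pictured with the small sketch below: the Fuxi Master grants whatever is free right now, keeps the unsatisfied remainder queued, and hands out more as other jobs release resources. The class and the single "slots" unit are simplifications for illustration, not Fuxi's real scheduling policy.

from typing import List, Tuple

class FuxiMasterSketch:
    def __init__(self, free_slots: int):
        self.free_slots = free_slots
        self.pending: List[Tuple[object, int]] = []   # (application master, slots still needed)

    def request(self, app_master, slots_needed: int) -> None:
        granted = min(self.free_slots, slots_needed)
        self.free_slots -= granted
        if granted:
            app_master.on_granted(granted)            # the affected Fuxi Agents are notified as well
        if granted < slots_needed:
            self.pending.append((app_master, slots_needed - granted))   # partially satisfied: keep waiting

    def on_resources_released(self, slots: int) -> None:
        # Freed resources are passed on to the queued requests until they are satisfied.
        self.free_slots += slots
        still_waiting: List[Tuple[object, int]] = []
        for app_master, needed in self.pending:
            granted = min(self.free_slots, needed)
            self.free_slots -= granted
            if granted:
                app_master.on_granted(granted)
            if granted < needed:
                still_waiting.append((app_master, needed - granted))
        self.pending = still_waiting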


After receiving the resource allocation message, the Application Master sends the specific job plan to the corresponding Fuxi Agent. The job plan contains the necessary information to start the specific process, including the executable file name, startup parameters, and resource limits such as CPU and memory.

After receiving the job plan from the Application Master, the Fuxi Agent starts the worker process. Once a worker starts, it communicates with the Application Master, reads data from distributed storage, and begins executing the computing logic. As the computing tasks progress, workers are continuously started and released. After completing all computing tasks, the workers write the results to the corresponding directories and report to the Application Master that the tasks are complete. The Application Master then communicates with the Fuxi Master to release the resources.

At this point, the execution of the entire ODPS task is complete, and each component updates the task status in sequence. This SQL task instance is marked as successful and waits for the next wake-up of the scheduling engine.


Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
