
Realtime Compute for Apache Flink: Manage workflows

Last Updated: Jan 16, 2026

A workflow is a visual directed acyclic graph (DAG) that you can create by dragging and connecting task nodes. To run tasks at a specific time, you must also define the scheduling nodes and policies for the workflow. This topic describes how to create and run a workflow.
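Conceptually, the nodes and their upstream dependencies form a DAG: every node lists its upstream nodes, and no dependency chain may loop back on itself. The following minimal Python sketch of that structure uses hypothetical node names; it is an illustration, not how the console stores workflows:

```python
# Map each node to its upstream dependencies.
from graphlib import TopologicalSorter, CycleError

dependencies = {
    "clean_orders": [],               # initial node: no upstream dependencies
    "join_dims": ["clean_orders"],
    "daily_report": ["join_dims"],
}

try:
    # A valid workflow admits a topological execution order.
    order = list(TopologicalSorter(dependencies).static_order())
    print("valid DAG, execution order:", order)
except CycleError as exc:
    print("invalid workflow, cycle detected:", exc)
```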

Limits

  • You can create a workflow to schedule only deployed batch jobs.

  • The task orchestration feature is in public preview. A Service-Level Agreement (SLA) is not guaranteed during the public preview. For more information, see Realtime Compute for Apache Flink Service Level Agreement (SLA). If you have questions about this feature, you can submit a ticket.

  • The task orchestration feature is supported only in the China (Shanghai), China (Hangzhou), China (Beijing), China (Shenzhen), China (Zhangjiakou), and Singapore regions.

Create a workflow

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the navigation pane on the left, choose Operation Center > Task Orchestration.

  4. Click Create Workflow and enter the required information.


    Workflow name

    The workflow name must be unique within the current project.

    Variable configuration

    Defines variables whose preset values can be used for data computation. For an illustration of the system time variables below, see the sketch after this parameter list.

    • Variable name: A custom variable name, such as ${date}.

    • Variable value: A static date, a time format, or an expression.

    The following system time variables are supported:

    • system.biz.date (value ${system.biz.date}): The day before the scheduled date of a daily scheduling instance, in the yyyyMMdd format.

    • system.biz.curdate (value ${system.biz.curdate}): The scheduled date of a daily scheduling instance, in the yyyyMMdd format.

    • system.datetime (value ${system.datetime}): The scheduled time of a daily scheduling instance, in the yyyyMMddHHmmss format.

    Note
    • This configuration is not required if you create a workflow for a materialized table.

    • The parameters apply to all jobs associated with the nodes. Workflow-level configurations have a higher priority than job-level configurations.

    Scheduling type

    Two scheduling types are supported:

    • Manual Trigger: Manually run the workflow by clicking Run. This is suitable for temporary tests or immediate processing.

    • Recurring Schedule: The workflow is triggered based on scheduling rules and can run at regular intervals, such as every minute, hour, or day.

    Important

    You must select Recurring Schedule for workflows that contain materialized table nodes.

    Scheduling cycle

    This parameter is required only for the Recurring Schedule type. You can use cron expressions to set complex rules (for the field layout, see the sketch after this parameter list). Examples:

    • 0 0 */4 ? * *: Runs every 4 hours.

    • 0 0 2 ? * *: Runs at 02:00 every day.

    • 0 0 5,17 ? * MON-FRI: Runs at 05:00 and 17:00 from Monday to Friday.

    For more information about how to write cron expressions, see Rules for writing cron expressions.

    Scheduling start time

    The time when the schedule takes effect. This parameter is required only for the Recurring Schedule type.

    Important
    • After you create a recurring workflow, you must enable its Scheduling State to run it at the effective time.

    • Set the start time to a time in the future. Otherwise, the task may be dry-run or fail because its scheduling window has already passed.

    Failure retries

    The number of times to retry a failed node in the workflow. By default, a failed node is not retried.

    Failure notification

    The default email address for notifications when a workflow node fails.

    Note

    You can configure alerts through Cloud Monitor to be sent by DingTalk, text message, and other methods. For more information, see Configure monitoring and alerting.

    Resource queue

    The deployment target for the workflow. For more information, see Manage resource queues. This setting applies to all nodes by default. No extra configuration is needed for the nodes.

    Note

    This configuration does not change the deployment target of corresponding deployed batch jobs.

    Tags

    You can set a tag name and tag value for the workflow.
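For reference, the following Python sketch shows how the three system time variables relate to a daily instance's scheduled time. It is an illustration of the documented formats, not the platform's implementation:

```python
from datetime import datetime, timedelta

def system_time_variables(scheduled_time: datetime) -> dict:
    """Illustrative only: mirror the documented formats of the system time variables."""
    return {
        # The day before the scheduled date, in yyyyMMdd format.
        "system.biz.date": (scheduled_time - timedelta(days=1)).strftime("%Y%m%d"),
        # The scheduled date, in yyyyMMdd format.
        "system.biz.curdate": scheduled_time.strftime("%Y%m%d"),
        # The scheduled time, in yyyyMMddHHmmss format.
        "system.datetime": scheduled_time.strftime("%Y%m%d%H%M%S"),
    }

# A daily instance scheduled for 02:00 on January 16, 2026:
print(system_time_variables(datetime(2026, 1, 16, 2, 0, 0)))
# {'system.biz.date': '20260115', 'system.biz.curdate': '20260116',
#  'system.datetime': '20260116020000'}
```

The cron examples for the Scheduling cycle parameter use a Quartz-style, six-field layout (second, minute, hour, day of month, month, day of week). Assuming that layout, this small sketch labels the fields of an expression:

```python
# Assumed Quartz-style field order; this is an illustration, not a product API.
CRON_FIELDS = ["second", "minute", "hour", "day-of-month", "month", "day-of-week"]

def label_cron(expr: str) -> dict:
    """Map each field of a six-field cron expression to its name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))

print(label_cron("0 0 5,17 ? * MON-FRI"))
# {'second': '0', 'minute': '0', 'hour': '5,17',
#  'day-of-month': '?', 'month': '*', 'day-of-week': 'MON-FRI'}
```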

  5. Click Create.

    You are redirected to the workflow node editor page.

  6. Configure the initial node of the workflow.

    By default, the workflow node editor page contains an initial node. Click the initial node, configure the parameters in the Edit Node panel, and then click Save.

    Job


    Job

    You can only select deployed batch jobs from the current project. Fuzzy search is supported.

    Node name

    The name of the node in the current workflow.

    Upstream nodes

    The upstream dependencies of the current node. You can only select other nodes in the current workflow.

    Note

    The initial node has no upstream dependencies and does not support selecting upstream nodes.

    Failure retries

    The number of times to retry a failed node. By default, this is the same as the workflow's retry count. If specified, the node-level setting takes precedence over the workflow-level setting (see the resolution sketch after these node parameter lists).

    Status subscription

    Configure the node status subscription policy and notification email. Subscriptions for Start and Fail statuses are supported.

    Timeout

    The timeout duration for node execution. If the running time exceeds this duration, the node is considered to have failed.

    Resource queue

    The deployment target for the node. For more information, see Manage resource queues. If not specified, the resource queue configured for the workflow is used by default.

    Note

    This configuration does not change the deployment target of corresponding deployed batch jobs.

    Tags

    You can set a tag name and tag value for the workflow node.

    Materialized table


    Materialized table

    You can only select partitioned tables that are created in Ververica Runtime (VVR) 11.0 or later and have a stream refresh mode.

    Node name

    The name of the node in the current workflow.

    Time partition

    • Partition field: The time partition field of the materialized table.

    • Partition format: For example, yyyyMMdd.

    Resource configuration

    You can customize the amount of resources used for scheduled backfills. For concurrency, you can select Auto-infer to automatically adapt to a suitable value.

    Upstream nodes

    The upstream dependencies of the current node. You can only select other nodes in the current workflow.

    Note
    • The initial node has no upstream dependencies and does not support selecting upstream nodes.

    • After a materialized table node is created, descendant nodes are automatically inferred based on data lineage.

    Failure retries

    The number of times to retry a failed node. By default, this is the same as the workflow's retry count. If specified, the node-level setting takes precedence over the workflow-level setting.

    Status subscription

    Configure the node status subscription policy and notification email. Subscriptions for Start and Fail statuses are supported.

    Timeout

    The timeout duration for node execution. If the running time exceeds this duration, the node is considered to have failed.

    Resource queue

    The deployment target for the node. For more information, see Manage resource queues. If not specified, the resource queue configured for the workflow is used by default.

    Note

    This configuration does not change the deployment target of the corresponding deployed materialized table.

    Tags

    You can set a tag name and tag value for the workflow node.

    Note
    • After you create a materialized table node, a dialog box appears. Based on the data lineage of the materialized table, the dialog box suggests descendant nodes, which you can select to create them quickly.

    • Descendant nodes are subject to the same limits. You can select only partitioned tables that are created in VVR 11.0 or later and use a stream refresh mode with a freshness of less than 30 minutes.
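Both node types resolve their settings the same way: a node-level value takes precedence, and unset values fall back to the workflow-level defaults. The following sketch summarizes that resolution; the class and field names are hypothetical, not a product API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkflowDefaults:
    failure_retries: int = 0               # by default, a failed node is not retried
    resource_queue: str = "default-queue"

@dataclass
class NodeConfig:
    failure_retries: Optional[int] = None  # None means "inherit from the workflow"
    resource_queue: Optional[str] = None

def effective_settings(wf: WorkflowDefaults, node: NodeConfig) -> dict:
    """Node-level values win; unset values fall back to the workflow defaults."""
    return {
        "failure_retries": node.failure_retries
        if node.failure_retries is not None
        else wf.failure_retries,
        "resource_queue": node.resource_queue or wf.resource_queue,
    }

# A node that overrides only the resource queue inherits the workflow's retries:
print(effective_settings(WorkflowDefaults(failure_retries=2),
                         NodeConfig(resource_queue="batch-queue")))
# {'failure_retries': 2, 'resource_queue': 'batch-queue'}
```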

  7. (Optional) Click Add Node at the bottom of the page to add more nodes.

  8. Save the workflow configuration.

    1. Click Save in the upper-right corner of the node editor page.

    2. In the dialog box that appears, click OK.

Run a workflow

Each time a workflow runs, a workflow instance is generated on the Workflow Instance List and Details tab of the workflow details page.

  • Manual trigger: Immediately runs the current workflow task once. This method is suitable for temporary tests or immediate processing.

    In the Actions column of the target workflow, click Run. In the dialog box that appears, select Manual Execution, and then click OK. Each time you perform this operation, the workflow runs once.

  • Recurring schedule: After you start a recurring schedule, the workflow task is triggered at the specified time.

    To run the workflow at the specified effective time, you must enable its Scheduling State.

Note

To backfill historical data or reprocess data in a specific partition for a certain period of time, you can use the data backfill feature.

Data backfill

Data backfill supplements or updates data within a specific time period. It is mainly used when upstream systems resend historical data, when dimension tables are corrected, or when new data interfaces are added.

Perform a data backfill

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the navigation pane on the left, choose Operation Center > Task Orchestration.

  4. In the Actions column of the target workflow, click Run.

  5. In the Run dialog box, select Data Backfill as the scheduling method and configure the scheduling information.



    Time interval

    The time period to backfill. This period is passed to the workflow's time variables to refresh the data in the partitions for that period (see the sketch after these steps).

    Resource queue

    The target queue where the data backfill task runs. The default is default-queue.

  6. Click OK.
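Conceptually, a backfill expands the selected time interval into one set of time-variable values per day, and each set selects the partition to refresh. The following sketch is an illustration under that assumption, not the platform's implementation:

```python
from datetime import date, timedelta

def backfill_days(start: date, end: date):
    """Yield per-day variable values for an inclusive backfill interval.

    Illustrative only: mirrors the documented yyyyMMdd partition format.
    """
    day = start
    while day <= end:
        yield {
            "system.biz.curdate": day.strftime("%Y%m%d"),
            "system.biz.date": (day - timedelta(days=1)).strftime("%Y%m%d"),
        }
        day += timedelta(days=1)

# Backfilling January 1-3, 2026 produces three sets of variable values:
for variables in backfill_days(date(2026, 1, 1), date(2026, 1, 3)):
    print(variables)
# {'system.biz.curdate': '20260101', 'system.biz.date': '20251231'}
# {'system.biz.curdate': '20260102', 'system.biz.date': '20260101'}
# {'system.biz.curdate': '20260103', 'system.biz.date': '20260102'}
```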

Manage data backfill instances

Data backfill instances are managed in the same way as workflow instances. For more information, see Manage workflow instances and node instances. To view a data backfill instance, click the name of the target workflow to go to the workflow instance page.

On the Workflow Instance List and Details tab, you can view the data backfill instances and their details, such as running times and statuses.

Workflow status

In the Running Status column for a workflow, you can view the aggregated running status of all its instances. For example, if a workflow runs once a day for five days, it generates five workflow instances. The Running Status column displays a statistical summary of the running statuses of these five instances.

The status colors indicate the following:

  • Purple: Queuing

  • Blue: Running

  • Green: Success

  • Red: Failed
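The summary is simply a tally of instance statuses. A minimal sketch of the aggregation, using hypothetical statuses rather than an API call:

```python
from collections import Counter

# Hypothetical running statuses of five daily workflow instances.
instance_statuses = ["Success", "Success", "Failed", "Running", "Success"]

# The Running Status column shows a per-status tally like this one.
print(Counter(instance_statuses))
# Counter({'Success': 3, 'Failed': 1, 'Running': 1})
```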

Edit a workflow

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the navigation pane on the left, choose Operation Center > Task Orchestration.

  4. In the Actions column of the target workflow, click Edit Workflow.

    For more information about parameter configuration, see Create a workflow.

    Note

    You cannot edit a workflow when its Scheduling State is Enabled.
