
Realtime Compute for Apache Flink: Manage workflows

Last Updated: Oct 16, 2024

A workflow is a directed acyclic graph (DAG) of tasks that you create by dragging tasks onto a canvas and connecting them. If you want to run tasks at specific points in time, create a workflow and configure its tasks and scheduling policies. This topic describes how to create and run a workflow.
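The DAG structure can be illustrated with a short sketch. The task names below are hypothetical, not part of the product API; the point is that each task lists its upstream tasks and a topological sort yields a valid execution order, which is only possible when the graph is acyclic.

```python
from graphlib import TopologicalSorter

# Illustrative workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),            # initial task: no upstream tasks
    "transform": {"extract"},    # runs after extract
    "load": {"transform"},       # runs after transform
}

# static_order() raises CycleError if the graph is not acyclic.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['extract', 'transform', 'load']
```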

Limits

  • You can create a workflow to schedule only tasks that are associated with batch deployments.

  • The Workflows feature is in public preview. The service level agreement (SLA) is not guaranteed in the public preview phase. If you have questions when you use this feature, submit a ticket for technical support.

  • The Workflows feature is supported only in regions and zones in China.

Create a workflow

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, choose O&M > Workflows.

  4. In the upper-left corner of the Workflows page, click Create Workflow.

  5. In the Create Workflow panel, configure the parameters. The following table describes the parameters.

    Name

    The name of the workflow. The workflow name must be unique in the current namespace.

    Variable Configuration

    One or more preset variables that are used for data computing.

    • Variable Name: Enter a custom variable name, such as ${date}.

    • Variable Value: Enter a static date, a time format, or an expression. For more information, see Compare custom parameters.

    You can configure the following system time variables:

    • system.biz.date (value ${system.biz.date}): the date of the day before the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMdd format.

    • system.biz.curdate (value ${system.biz.curdate}): the date of the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMdd format.

    • system.datetime (value ${system.datetime}): the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMddHHmmss format.

    Note

    The configured variables apply to all task-associated deployments. The variables configured for a workflow take precedence over those configured for a deployment.

    Scheduling Type

    The method used to schedule the workflow. Valid values:

    • Manual Scheduling: You can only click Execute in the Actions column to run the workflow. The workflow runs once each time you click Execute.

    • Periodic Scheduling: The workflow runs based on the scheduling rule. The workflow can be scheduled by minute, hour, or day.

    Scheduling Rule

    This parameter is required only when the Scheduling Type parameter is set to Periodic Scheduling. You can select Use cron expression to specify complex scheduling rules. Example:

    • 0 0 */4 ? * *: The workflow is scheduled every four hours.

    • 0 0 2 ? * *: The workflow is scheduled at 02:00:00 every day.

    • 0 0 5,17 ? * MON-FRI: The workflow is scheduled at 05:00:00 and 17:00:00 from Monday to Friday.

    Scheduled Start Time

    The time when the scheduling rule takes effect. This parameter is required only when the Scheduling Type parameter is set to Periodic Scheduling.

    Important

    After you create a periodic scheduling workflow, you must turn on the switch in the State column to run the workflow at the time that is specified by the Scheduled Start Time parameter.

    Failure Retry Times

    The number of retries for each task in the workflow. By default, a task is not retried if the task fails.

    Failure Notification Email

    The default email address to which notifications are sent if a task fails.

    Note

    CloudMonitor can send notifications to the specified email address by using DingTalk or text messages. For more information, see Configure monitoring and alerting.

    Resource Queue

    The queue on which the workflow is deployed. For more information, see Manage queues. After you configure the queue for a workflow, the tasks in the workflow are automatically deployed on the queue. Therefore, you do not need to specify a queue for the tasks in the workflow.

    Note

    The configuration of this parameter does not change the queue for existing batch deployments.

    Tags

    One or more tags of the workflow. Specify a key and a value for each tag.
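Two pieces of the table above can be made concrete with a sketch. The first part shows how the documented system time variables resolve for an assumed scheduled execution time (the value is illustrative, not an API call). The second part interprets the hours field of a Quartz-style cron expression, whose six fields are seconds, minutes, hours, day-of-month, month, and day-of-week.

```python
from datetime import datetime, timedelta

# 1) System time variables for a daily instance, assuming this scheduled
#    execution time (example value only):
scheduled = datetime(2024, 10, 16, 2, 0, 0)
biz_date = (scheduled - timedelta(days=1)).strftime("%Y%m%d")  # ${system.biz.date}
biz_curdate = scheduled.strftime("%Y%m%d")                     # ${system.biz.curdate}
dt_value = scheduled.strftime("%Y%m%d%H%M%S")                  # ${system.datetime}
print(biz_date, biz_curdate, dt_value)  # 20241015 20241016 20241016020000

# 2) Hours at which a Quartz-style cron expression fires, derived from
#    its hours field (index 2 of the six fields):
def hours_matched(expr):
    hours = expr.split()[2]
    matched = set()
    for part in hours.split(","):
        if part.startswith("*/"):            # step values, e.g. */4
            matched.update(range(0, 24, int(part[2:])))
        elif "-" in part:                    # ranges, e.g. 5-17
            lo, hi = map(int, part.split("-"))
            matched.update(range(lo, hi + 1))
        elif part == "*":
            matched.update(range(24))
        else:                                # a single hour, e.g. 2
            matched.add(int(part))
    return sorted(matched)

print(hours_matched("0 0 */4 ? * *"))         # [0, 4, 8, 12, 16, 20]
print(hours_matched("0 0 5,17 ? * MON-FRI"))  # [5, 17]
```

This is a hedged sketch for reading the examples in the table, not a full cron parser; the other five fields are ignored here.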

  6. After you configure the preceding parameters, click Create.

    After you create a workflow, the task editing page of the workflow appears.

  7. Configure the initial task of the workflow.

    By default, an initial task is displayed on the task editing page of the workflow. To configure the initial task, perform the following steps: Click the initial task. In the Edit Task panel, configure the parameters and click Save.


    Deployment

    You can select only a batch deployment in the current namespace. Fuzzy search is supported.

    Name

    The name of the task in the current workflow.

    Variable Configuration

    One or more preset variables that are used for data computing.

    • Variable Name: Enter a custom variable name, such as ${date}.

    • Variable Value: Enter a static date, a time format, or an expression. For more information, see Compare custom parameters.

    You can configure the following system time variables:

    • system.biz.date (value ${system.biz.date}): the date of the day before the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMdd format.

    • system.biz.curdate (value ${system.biz.curdate}): the date of the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMdd format.

    • system.datetime (value ${system.datetime}): the scheduled execution time of a daily scheduling workflow instance, in the yyyyMMddHHmmss format.

    Note
    • By default, variables that are not defined in a deployment are parsed in the current task and displayed in the Variable Configuration section. Variables whose name and value are configured are directly displayed. If you have configured a name but not a value for a variable, you must specify a variable value.

    • The variables configured for a task take precedence over those configured for a workflow.

    Upstream Tasks

    The upstream task on which the current task depends. You can select only other tasks in the current workflow.

    Note

    The initial task does not have an upstream task. You cannot select an upstream task for the initial task.

    Failure Retry Times

    The number of retries for the task. By default, this value is inherited from the Failure Retry Times parameter in the Create Workflow panel. If you specify a different value, the task-level value takes precedence.

    Subscription

    The state of the task in the workflow to which you want to subscribe. After you configure this parameter, you must configure the Email parameter to specify the email address to which notifications are sent and the Strategy parameter. Valid values of the Strategy parameter: Start and Fail.

    Timeout(sec)

    The timeout period of the task. If the execution time of the task exceeds the value of this parameter, the task fails to run.

    Resource Queue

    The queue on which the task is deployed. For more information, see Manage queues. If you do not configure this parameter, the queue on which the workflow is deployed is used.

    Note

    The configuration of this parameter does not change the queue for existing batch deployments.

    Tags

    You can specify a tag key and a tag value for the task in the workflow.
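The precedence rules stated in the table above can be summarized in a short sketch: task-level settings override workflow-level settings, which override deployment-level defaults. The variable names and values below are illustrative only.

```python
# Variables: later dicts win, mirroring deployment < workflow < task precedence.
deployment_vars = {"date": "${system.biz.date}"}
workflow_vars = {"date": "20241001", "region": "cn-hangzhou"}
task_vars = {"date": "20241002"}

effective = {**deployment_vars, **workflow_vars, **task_vars}
print(effective)  # {'date': '20241002', 'region': 'cn-hangzhou'}

# Failure Retry Times follows the same rule: the task-level value,
# if set, overrides the workflow-level default.
workflow_retries = 3
task_retries = None  # None models "not set on the task; inherit from workflow"
effective_retries = task_retries if task_retries is not None else workflow_retries
print(effective_retries)  # 3
```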

  8. Optional. Click Add Task in the lower part of the task editing page to add more tasks.

  9. Save the configurations of the workflow.

    1. Click Save in the upper-right corner of the task editing page.

    2. In the Save Workflow dialog box, click OK.

Run a workflow

A workflow instance is generated in the Instance History section on the Overview tab of the workflow details page each time a workflow runs.

  • Manually run a workflow

    Click Execute in the Actions column of the workflow that you want to run. The workflow runs once each time you click Execute.

  • Run a periodic scheduling workflow

    Find the workflow that you want to run and turn on the switch in the State column to run the workflow at the time specified by the Scheduled Start Time parameter.

View the status of a workflow

You can view the status of all workflow instances of a workflow in the Status column. For example, if a workflow runs once a day for five days, five workflow instances are generated. The Status column displays the status of each workflow instance and the number of instances in each state.
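The per-state counts shown in the Status column can be modeled as a simple tally over instance statuses. The sample data below is assumed for illustration.

```python
from collections import Counter

# Five workflow instances, one per day, with assumed statuses.
instance_statuses = ["Successful", "Successful", "Failed", "Successful", "Running"]
state_counts = Counter(instance_statuses)
print(state_counts["Successful"], state_counts["Failed"], state_counts["Running"])  # 3 1 1
```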


The colors in the Status column indicate the following states:

  • Purple: Pending

  • Blue: Running

  • Green: Successful

  • Red: Failed

Edit a workflow

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, choose O&M > Workflows.

  4. Find the workflow that you want to manage and click Edit Workflow in the Actions column.

    For more information about how to modify the parameters, see the Create a workflow section of this topic.

    Note

    If the switch in the State column of a workflow is turned on, you cannot edit the workflow.

References

  • For more information about the concepts related to the Workflows feature, see Workflows (public preview).

  • For more information about how to view a workflow instance and the logs of the task instance, see Manage workflow instances and task instances.

  • For more information about how to add queues to isolate and manage resources, see Manage queues.

  • For more information about how to create a batch deployment of the SQL, JAR, or Python type, see Create a deployment.

  • After the remote shuffle service is enabled for batch deployments, shuffle data is stored in a high-performance Apache Celeborn cluster, so deployments are no longer limited by the disk capacity of Flink compute nodes. This improves the ability to process ultra-large-scale data while keeping deployments stable and cost-effective. For more information, see Enable the remote shuffle service in a batch deployment (public preview).