
Realtime Compute for Apache Flink:Getting started with a Python deployment

Last Updated: Nov 01, 2024

This topic describes how to create and start a Python streaming deployment and a Python batch deployment in the development console of Realtime Compute for Apache Flink.

Prerequisites

  • If you want to use a RAM user or RAM role to access the development console of Realtime Compute for Apache Flink, the RAM user or RAM role must have the required permissions. For more information, see Permission management.

  • A Realtime Compute for Apache Flink workspace is created. For more information, see Activate Realtime Compute for Apache Flink.

Step 1: Prepare Python code files

Python packages cannot be developed in the management console of Realtime Compute for Apache Flink. Therefore, you must develop Python files in your on-premises environment. For more information about how to debug a deployment and use a connector, see Develop a Python API draft.

Important

The Flink version that is used when you develop a Python package must be the same as the Flink version in the engine version that is selected in Step 3: Create a Python deployment. You can use dependencies in Python deployments. The dependencies include custom Python virtual environments, third-party Python packages, JAR packages, and data files. For more information, see Use Python dependencies.

To help you quickly perform various operations on a Python deployment in the development console of Realtime Compute for Apache Flink, a test Python file and an input data file are provided for the subsequent operations. The test Python file counts the number of times each word appears in the input data file.

  • Download a test Python file based on the type of your deployment.
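The test Python file itself is not reproduced in this topic, but the computation it performs is simple: count how many times each word appears in the input data file. The following plain-Python sketch illustrates that logic only; the actual PyFlink job expresses it with Flink APIs, so treat this as an illustration rather than the file's contents:

```python
import re
from collections import Counter

def word_count(text: str) -> Counter:
    """Count how many times each word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# A small inline sample stands in for the Shakespeare input data file.
sample = "To be, or not to be, that is the question."
counts = word_count(sample)
print(counts["to"])  # 2
print(counts["be"])  # 2
```

The real job applies the same per-word aggregation continuously (streaming) or once over the whole file (batch).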

Step 2: Upload the test Python file and input data file

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane of the development console of Realtime Compute for Apache Flink, click Artifacts.

  4. In the upper-left corner of the Artifacts page, click Upload Artifact and select the test Python file and the data file.

    In this topic, the test Python file and the input data file that are downloaded in Step 1 are uploaded. For more information about the directories of the files, see Artifacts.

Step 3: Create a Python deployment

Streaming deployment

  1. In the left-side navigation pane of the development console of Realtime Compute for Apache Flink, choose O&M > Deployments. In the upper-left corner of the Deployments page, choose Create Deployment > Python Deployment.

  2. In the Create Python Deployment dialog box, configure the parameters. The following table describes the parameters.


    • Deployment Mode: The mode in which the Python deployment runs. Select Stream Mode.

      Example: Stream Mode

    • Deployment Name: The name of the Python deployment.

      Example: flink-streaming-test-python

    • Engine Version: The engine version that the deployment uses. We recommend that you use an engine version that has the RECOMMENDED or STABLE label. Versions with these labels provide higher reliability and performance. For more information, see Release notes and Engine version.

      Example: vvr-8.0.9-flink-1.17

    • Python Uri: The Python file to run. Download the test Python file word_count_streaming.py and click the upload icon on the right side of the Python Uri field to select and upload the file.

    • Entry Module: The entry point module of the Python program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter, for example, word_count.

      Example: not required in this example

    • Entry Point Main Arguments: The arguments that are passed to the main method. In this example, enter the path of the input data file Shakespeare:

      --input oss://<Name of the associated OSS bucket>/artifacts/namespaces/<Name of the workspace>/Shakespeare

      You can go to the Artifacts page and click the name of the input data file Shakespeare to copy the complete path.

    • Deployment Target: The queue or session cluster to which the deployment is deployed. Select one from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster.

      Important: Monitoring metrics cannot be displayed for deployments that run in session clusters, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments; we recommend that you do not use them in the production environment. For more information, see Debug a deployment.

      Example: default-queue

    For more information about other deployment parameters, see Create a deployment.

  3. Click Deploy.
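The Entry Point Main Arguments that you configure for the deployment are passed to the script when it starts. A common pattern, and likely what a script such as word_count_streaming.py does (this sketch and its placeholder bucket and workspace names are assumptions, not the actual file contents), is to parse them with argparse:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Word count job")
    # --input: path of the input data file, e.g. an oss:// URI in Realtime Compute.
    parser.add_argument("--input", required=True, help="Path of the input data file")
    return parser.parse_args(argv)

# Placeholder values stand in for your bucket and workspace names.
args = parse_args(["--input", "oss://my-bucket/artifacts/namespaces/my-workspace/Shakespeare"])
print(args.input)
```

In the console you supply only the argument string; the Flink runtime forwards it to the script's main method.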

Batch deployment

  1. In the left-side navigation pane of the development console of Realtime Compute for Apache Flink, choose O&M > Deployments. In the upper-left corner of the Deployments page, choose Create Deployment > Python Deployment.

  2. In the Create Python Deployment dialog box, configure the parameters. The following table describes the parameters.


    • Deployment Mode: The mode in which the Python deployment runs. Select Batch Mode.

      Example: Batch Mode

    • Deployment Name: The name of the deployment.

      Example: flink-batch-test-python

    • Engine Version: The engine version that the deployment uses. We recommend that you use an engine version that has the RECOMMENDED or STABLE label. Versions with these labels provide higher reliability and performance. For more information, see Release notes and Engine version.

      Example: vvr-8.0.9-flink-1.17

    • Python Uri: The Python file to run. Download the test Python file word_count_batch.py and click the upload icon on the right side of the Python Uri field to select and upload the file.

    • Entry Module: The entry point module of the Python program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter, for example, word_count.

      Example: not required in this example

    • Entry Point Main Arguments: The arguments that are passed to the main method. In this example, enter the path of the input data file Shakespeare and the path of the output data file python-batch-quickstart-test-output:

      --input oss://<Name of the associated OSS bucket>/artifacts/namespaces/<Name of the workspace>/Shakespeare

      --output oss://<Name of the associated OSS bucket>/artifacts/namespaces/<Name of the workspace>/python-batch-quickstart-test-output

      Note: You only need to specify the path of the output data file; you do not need to create the file in advance. The output file is stored in the same parent directory as the input file.

      You can go to the Artifacts page and click the name of the input data file Shakespeare to copy the complete path.

    • Deployment Target: The queue or session cluster to which the deployment is deployed. Select one from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster.

      Important: Monitoring metrics cannot be displayed for deployments that run in session clusters, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments; we recommend that you do not use them in the production environment. For more information, see Debug a deployment.

      Example: default-queue

    For more information about other deployment parameters, see Create a deployment.

  3. Click Deploy.
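If you upload a .zip file instead of a single .py file, the Entry Module parameter must name the module inside the archive. The following sketch (file names are illustrative, not the provided test artifacts) builds such an archive; for it, you would set Entry Module to word_count:

```python
import os
import tempfile
import zipfile

# Write a placeholder script and package it into a .zip artifact.
tmpdir = tempfile.mkdtemp()
script = os.path.join(tmpdir, "word_count.py")
with open(script, "w") as f:
    f.write("print('word count job')\n")

archive = os.path.join(tmpdir, "word_count.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # The member name without .py is what you set as Entry Module: word_count
    zf.write(script, arcname="word_count.py")

print(zipfile.ZipFile(archive).namelist())  # ['word_count.py']
```

Packaging as a .zip is typically only needed when the job spans multiple modules or bundles data files alongside the code.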

Step 4: Start the deployment and view the computing result

Streaming deployment

  1. In the left-side navigation pane of the development console of Realtime Compute for Apache Flink, choose O&M > Deployments. On the Deployments page, find the desired deployment and click Start in the Actions column.


  2. In the Start Job panel, select Initial Mode and click Start. For more information about how to start a deployment, see Start a deployment.

    After you click Start, the deployment enters the RUNNING or FINISHED state, which indicates that the deployment runs as expected. If you use the provided test Python file, the deployment eventually enters the FINISHED state because the input data file is bounded.

  3. After the deployment enters the RUNNING state, view the computing result of the streaming deployment.

    Important

    If you upload the test Python file to create the deployment, the computing result of the streaming deployment is deleted when the streaming deployment enters the FINISHED state. You can view the computing result of the streaming deployment only when the streaming deployment is in the RUNNING state.

    On the Deployments page, find the desired deployment and click the name of the deployment. On the page that appears, click Logs. On the Running Task Managers tab, click the value in the Path, ID column. On the page that appears, click the Log List tab. Find the log file whose name ends with .out in the Log Name column and click the name of the log file. Then, search for the shakespeare keyword in the log file to view the computing result.


Batch deployment

  1. In the left-side navigation pane of the development console of Realtime Compute for Apache Flink, choose O&M > Deployments. On the Deployments page, find the desired deployment and click Start in the Actions column.


  2. In the Start Job panel, click Start. For more information about how to start a deployment, see Start a deployment.

  3. After the deployment enters the FINISHED state, view the computing result of the batch deployment.

    Log on to the OSS console and view the computing result in the oss://<Name of the associated OSS bucket>/artifacts/namespaces/<Name of the workspace>/python-batch-quickstart-test-output directory. Click the folder whose name is the start date and start time of the deployment, click the file that you want to download, and then click Download in the panel that appears.


    The computing result of the batch deployment is an .ext file. After you download the output data file, you can open it with a text editor such as Notepad, or with Microsoft Word, to view the computing result.
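If you want to post-process the downloaded result programmatically, a small parser is enough. Note that the line format assumed below ("word<TAB>count") is an illustration only; inspect the actual output file to confirm its format before relying on this:

```python
def parse_result(lines):
    """Parse word-count result lines, assuming 'word<TAB>count' per line."""
    counts = {}
    for line in lines:
        word, _, count = line.strip().partition("\t")
        if count:  # skip blank or malformed lines
            counts[word] = int(count)
    return counts

# Example with inline data in place of the downloaded file.
print(parse_result(["to\t2", "be\t2"]))  # {'to': 2, 'be': 2}
```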

Step 5: (Optional) Cancel the deployment

If you modify the code of a deployment or change its engine version, you must deploy the draft again, cancel the deployment, and then start it for the changes to take effect. If a deployment fails and cannot recover by reusing state data, or if you want to update parameter settings that do not take effect dynamically, you must also cancel and then restart the deployment. For more information about how to cancel a deployment, see Cancel a deployment.
