Realtime Compute for Apache Flink:Develop a Python API draft

Last Updated: Sep 12, 2024

This topic describes the background information, limits, development methods, and debugging methods of Python API draft development in Realtime Compute for Apache Flink. This topic also describes how to use a connector.

Background information

You must develop a Python API draft in your on-premises environment. Then, you can deploy the draft and start the deployment for the draft in the development console of Realtime Compute for Apache Flink. For more information, see Getting started with a Flink Python deployment.

The following table lists the software packages that are installed in Realtime Compute for Apache Flink workspaces.

Software package      Version
apache-beam           2.23.0
avro-python3          1.9.1
certifi               2020.12.5
cloudpickle           1.2.2
crcmod                1.7
cython                0.29.16
dill                  0.3.1.1
docopt                0.6.2
fastavro              0.23.6
future                0.18.2
grpcio                1.29.0
hdfs                  2.6.0
httplib2              0.17.4
idna                  2.10
jsonpickle            1.2
mock                  2.0.0
numpy                 1.19.5
oauth2client          3.0.0
pandas                0.25.3
pbr                   5.5.1
pip                   20.1.1
protobuf              3.15.3
py4j                  0.10.8.1
pyarrow               0.17.1
pyasn1-modules        0.2.8
pyasn1                0.4.8
pydot                 1.4.2
pymongo               3.11.3
pyparsing             2.4.7
python-dateutil       2.8.0
pytz                  2021.1
requests              2.25.1
rsa                   4.7.2
setuptools            47.1.0
six                   1.15.0
typing-extensions     3.7.4.3
urllib3               1.26.3
wheel                 0.36.2

Limits

Services provided by Realtime Compute for Apache Flink are subject to the constraints of their deployment and network environments. Therefore, when you develop Python API drafts, take note of the following points:

  • Only Apache Flink 1.13 and later are supported.

  • Python 3.7.9 is pre-installed in your Realtime Compute for Apache Flink workspace, and common Python libraries such as pandas, NumPy, and PyArrow are pre-installed in the Python environment. Therefore, you must develop code in Python 3.7 or later.

  • Java Development Kit (JDK) 1.8 is used in the runtime environment of Realtime Compute for Apache Flink. If your Python API draft depends on a third-party JAR package, make sure that the JAR package is compatible with JDK 1.8.

  • Realtime Compute for Apache Flink that uses VVR 4.X supports only open source Scala 2.11, and Realtime Compute for Apache Flink that uses VVR 6.X or later supports only open source Scala 2.12. If your Python API draft depends on a third-party JAR package, make sure that the JAR package is compatible with the required Scala version.
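For example, before you package a draft in your on-premises environment, you can add a quick interpreter check. This is a minimal sketch; the 3.7 floor comes from the pre-installed Python 3.7.9 runtime described above, and check_python_version is an illustrative helper name:

```python
import sys

# The Realtime Compute for Apache Flink workspace runs Python 3.7.9,
# so the local interpreter used for development should be 3.7 or later.
def check_python_version(minimum=(3, 7)):
    return sys.version_info[:2] >= minimum

print(check_python_version())
```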

Develop a draft

References

You can develop Flink business code in your on-premises environment. After the code is developed, upload it to the development console of Realtime Compute for Apache Flink and publish it as a deployment.

  • For more information about how to develop business code of Apache Flink 1.17, see Python API.

  • Issues may occur when you develop code in Apache Flink. For more information about the issues and fixes, see FAQ.

Debug a deployment

In the code of Python user-defined functions (UDFs), you can use the logging module to generate logs and locate errors based on the logs. The following code shows an example.

import logging

from pyflink.table import DataTypes
from pyflink.table.udf import udf

@udf(result_type=DataTypes.BIGINT())
def add(i, j):
    logging.info("hello world")
    return i + j

After logs are generated, you can view the logs in the log file of TaskManager.
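Because @udf wraps a plain Python function, one common pattern is to keep the implementation in an ordinary function that you can unit-test locally before wrapping it. This is a sketch: add_impl is an illustrative name, and the logging call is what ends up in the TaskManager log file when the UDF runs on the cluster:

```python
import logging

def add_impl(i, j):
    # On the cluster, this log line is written to the TaskManager log file.
    logging.info("add called with %s and %s", i, j)
    return i + j

# Local check before wrapping with @udf(result_type=DataTypes.BIGINT())
# in the deployment code.
print(add_impl(1, 2))  # 3
```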

Use a connector

For more information about the connectors supported by Realtime Compute for Apache Flink, see Supported connectors. To use a connector, perform the following steps:

  1. Log on to the Realtime Compute for Apache Flink console.

  2. Find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Artifacts.

  4. On the Artifacts page, click Upload Artifact and select the Python package of the connector that you want to upload.

    You can upload the Python package of a connector that you develop or a connector provided by Realtime Compute for Apache Flink. You can visit Connectors to download the official Python packages provided by Realtime Compute for Apache Flink.

  5. On the O&M > Deployments page, click Create Deployment > Python Deployment. In the Create Deployment dialog box, select the Python package of the desired connector from the drop-down list of Additional Dependencies and configure other parameters to create a deployment.

  6. On the Deployments page, click the name of the created deployment. In the upper-right corner of the Parameters section on the Configuration tab, click Edit. Then, enter the location information of the Python package of the desired connector in the Other Configuration field.

    If your deployment depends on the Python packages of multiple connectors and the packages are named connector-1.jar and connector-2.jar, configure the following information:

    pipeline.classpaths: 'file:///flink/usrlib/connector-1.jar;file:///flink/usrlib/connector-2.jar'
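The value of pipeline.classpaths is a semicolon-separated list of file:// URLs. The following sketch assembles the value from JAR file names; the /flink/usrlib path is the one shown above, and you should adjust it if your packages are uploaded elsewhere:

```python
# Build the pipeline.classpaths value from uploaded JAR file names.
jars = ["connector-1.jar", "connector-2.jar"]
classpaths = ";".join(f"file:///flink/usrlib/{name}" for name in jars)
print(classpaths)
```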

Reference

  • For more information about how to develop a Python deployment of Realtime Compute for Apache Flink, see Getting started with a Flink Python deployment.

  • For more information about how to use custom Python virtual environments, third-party Python packages, JAR packages, and data files in Python deployments of Realtime Compute for Apache Flink, see Use Python dependencies.

  • Realtime Compute for Apache Flink supports SQL drafts and DataStream drafts. For more information about how to develop SQL drafts and DataStream drafts, see Develop an SQL draft and Develop a JAR draft.