Platform for AI: Python Script

Last Updated: Jan 13, 2026

Designer lets you create custom Python scripts. You can use the Python Script component to install dependency packages and run custom Python functions. This topic describes how to configure the Python Script component and provides usage examples.

Background information

The Python Script component is in the Custom Templates folder of Designer.

Prerequisites

  • You must grant the required permissions for Deep Learning Containers (DLC). For more information, see Product dependencies and authorizations: DLC.

  • The Python Script component depends on DLC as its underlying computing resource. You must associate a DLC computing resource with your Workspace. For more information, see Workspace Management.

  • The Python Script component depends on Object Storage Service (OSS) to store code. You must create an OSS Bucket. For more information, see Create a bucket.

    Important

    The OSS bucket must be in the same region as Designer and DLC.

  • Assign the Algorithm Developer role to the RAM User who will use this component. For more information, see Manage Workspace Members. If the RAM User also needs to use MaxCompute as a data source, you must also assign the MaxCompute Developer role.

Configure the component in the UI

  • Input ports

    The Python Script component has four input ports. Each port can connect to data from an OSS path or a MaxCompute table.

    • OSS path input

      The system mounts input from an upstream component's OSS path to the node where the Python script runs and automatically passes the mounted file path to the script as an argument. The argument format is python main.py --input1 /ml/input/data/input1, where --input1 corresponds to the OSS path connected to the first input port. In your Python script, you can access the mounted files by reading from the local path /ml/input/data/input1.

    • MaxCompute table input

      MaxCompute table input is not mounted. Instead, the system automatically passes the table information to the Python script as a URI argument. The argument format is python main.py --input1 odps://some-project-name/tables/table, where --input1 corresponds to the MaxCompute table connected to the first input port. For inputs in the MaxCompute URI format, you can use the parse_odps_url function provided in the component's code template to parse metadata such as the project name, table name, and partition, as shown in the following sketch. For more information, see Usage examples.
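
    The following minimal sketch shows how a script might handle both input types. The port assignments and the printed messages are illustrative, and parse_odps_url refers to the helper provided in the default code template in Usage examples.

    import argparse
    import os

    def parse_args():
        # The system passes input and output ports as command-line arguments.
        parser = argparse.ArgumentParser()
        parser.add_argument("--input1", type=str, default=None)  # for example, a mounted OSS path
        parser.add_argument("--input2", type=str, default=None)  # for example, a MaxCompute table URI
        args, _ = parser.parse_known_args()
        return args

    args = parse_args()

    # OSS input: the data is already mounted, so read it as local files.
    if args.input1 and os.path.isdir(args.input1):
        for name in os.listdir(args.input1):
            print("Mounted input file:", os.path.join(args.input1, name))

    # MaxCompute input: the value is a URI such as odps://some-project-name/tables/table.
    # Parse it (for example, with parse_odps_url) before reading the table with PyODPS.
    if args.input2 and args.input2.startswith("odps://"):
        print("Table URI to parse:", args.input2)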

  • Output ports

    The Python Script component has four output ports: two for OSS paths (OSS Output Port 1 and OSS Output Port 2) and two for MaxCompute tables (Table Output Port 1 and Table Output Port 2).

    • OSS path output

      The system automatically mounts the OSS path that you configure for the Task Output Path parameter on the Code config tab to /ml/output/. The component's output ports OSS Output Port 1 and OSS Output Port 2 correspond to the subdirectories /ml/output/output1 and /ml/output/output2, respectively. In your script, you can write files to these directories as if they were local paths. You can then pass this data to downstream components.

    • MaxCompute table output

      If the current Workspace is configured with a MaxCompute project, the system automatically passes a temporary table URI to the Python script, for example: python main.py --output3 odps://<some-project-name>/tables/<output-table-name>. You can use PyODPS to create the table specified in the URI, write the processed data to it, and then pass the table to downstream components. For more information, see the following examples.

  • Component parameters

    Script Settings

    Parameter

    Description

    Task Output Path

    Select an OSS path for the task output.

    • The system mounts the configured OSS directory to the /ml/output/ path of the job container. The system persists data written to the /ml/output/ path to the corresponding OSS directory.

    • The component's output ports OSS Output Port 1 and OSS Output Port 2 correspond to the subdirectories output1 and output2 under the /ml/output/ path. When you connect an OSS output port to a downstream component, that component receives data from the corresponding subdirectory.

    Set Code Source

    (Select one)

    Submit in Editor

    • Python Code: Select the OSS path where the code is saved. Code written in the editor is saved to this path. The Python code file defaults to main.py.

      Important

      Before clicking Save for the first time, make sure that no file with the same name exists in the specified path to prevent the file from being overwritten.

    • Python Code Editor: The editor provides example code by default. For more information, see Usage examples. You can write code directly in the editor.

    Specify Git Configuration

    • Git repository: The URL of the Git repository.

    • Code Branch: The code branch. The default value is master.

    • Code Commit: The commit ID takes precedence over the branch. If you specify this parameter, the branch setting does not take effect.

    • Git Username: Required to access a private repository.

    • Git Access Token: Required to access a private repository. For more information, see Appendix: Obtain a token for a GitHub account.

    Select Code Configuration

    • Select Code Configuration: Select a code build that you have created. For more information, see Code Configuration.

    • Code Branch: The code branch. The default value is master.

    • Code Commit: The commit ID takes precedence over the branch. If you specify this parameter, the branch setting does not take effect.

    Select a file or folder from OSS

    In OSS Path, select the path where the code is uploaded.

    Execution Command

    Enter the command that you want to run, such as python main.py.

    Note

    The system automatically generates the command based on the script name and the connections of the component's input and output ports. You do not need to configure this parameter manually.
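    For example, with OSS data connected to Input Port 1 and both OSS Output Port 1 and Table Output Port 1 connected downstream, the generated command might look like the following. The project and table names are placeholders.

    python main.py --input1 /ml/input/data/input1 --output1 /ml/output/output1 --output3 odps://<some-project-name>/tables/<output-table-name>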

    Advanced Options

    • Third-party Dependency Libraries: Enter the libraries to install in the text box, using the same format as a Python requirements.txt file. The system automatically installs these libraries before the node runs. For example:

      cycler==0.10.0            # via matplotlib
      kiwisolver==1.2.0         # via matplotlib
      matplotlib==3.2.1
      numpy==1.18.5
      pandas==1.0.4
      pyparsing==2.4.7          # via matplotlib
      python-dateutil==2.8.1    # via matplotlib, pandas
      pytz==2020.1              # via pandas
      scipy==1.4.1              # via seaborn
    • Enable Fault Tolerance Monitoring: If you select this parameter, an Error Monitoring text box appears. In the text box, you can add parameters to specify the content to monitor for errors.

    Execution Settings

    Parameter

    Description

    Select Resource Group

    The default is the DLC resource group in the current Workspace. You can also select a public DLC resource group:

    • If you select a public resource group, set the Select Machine Instance Type parameter. You can select CPU or GPU machine instances. The default value is ecs.c6.large.

    VPC Configuration

    Select a Virtual Private Cloud (VPC) to mount.

    Security Group

    Select a security group to mount.

    Advanced Options

    If you select this parameter, you can configure the following parameters:

    • Instance Count: Specify the number of instances based on your business requirements. Default value: 1.

    • Job Image URI: The default is the open source XGBoost 1.6.0 image. To use a deep learning framework, you must change the image.

    • Job Type: Modify this parameter only if your code is designed for distributed execution. Valid values:

      • XGBoost/LightGBM Job

      • TensorFlow Job

      • PyTorch Job

      • MPI Job

Usage examples

Default code template

The Python Script component provides the following default code template.

import os
import argparse
import json

""" Example code for the Python V2 component. """

# Default MaxCompute execution environment in the current Workspace, including the MaxCompute project name and endpoint.
# Injected into the job's execution environment only when a MaxCompute project exists in the current Workspace.
# Example: {"endpoint": "http://service.cn.maxcompute.aliyun-inc.com/api", "odpsProject": "lq_test_mc_project"}.
ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"


def init_odps():
    from odps import ODPS

    # Information about the default MaxCompute project in the current Workspace.
    mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
    o = ODPS(
        access_id="<YourAccessKeyId>",
        secret_access_key="<YourAccessKeySecret>",
        # Select the endpoint based on the Region where your project is located, for example: http://service.cn-shanghai.maxcompute.aliyun-inc.com/api.
        endpoint=mc_execution["endpoint"],
        project=mc_execution["odpsProject"],
    )
    return o


def parse_odps_url(table_uri):
    from urllib import parse

    parsed = parse.urlparse(table_uri)
    project_name = parsed.hostname
    r = parsed.path.split("/", 3)
    table_name = r[2]
    if len(r) > 3:
        partition = r[3]
    else:
        partition = None
    return project_name, table_name, partition


def parse_args():
    parser = argparse.ArgumentParser(description="PythonV2 component script example.")
    parser.add_argument("--input1", type=str, default=None, help="Component input port 1.")
    parser.add_argument("--input2", type=str, default=None, help="Component input port 2.")
    parser.add_argument("--input3", type=str, default=None, help="Component input port 3.")
    parser.add_argument("--input4", type=str, default=None, help="Component input port 4.")
    parser.add_argument("--output1", type=str, default=None, help="Output OSS port 1.")
    parser.add_argument("--output2", type=str, default=None, help="Output OSS port 2.")
    parser.add_argument("--output3", type=str, default=None, help="Output MaxComputeTable 1.")
    parser.add_argument("--output4", type=str, default=None, help="Output MaxComputeTable 2.")
    args, _ = parser.parse_known_args()
    return args


def write_table_example(args):
    # Example: Copy data from a public PAI table by running an SQL statement and use it as the temporary table specified for Table Output Port 1.
    output_table_uri = args.output3
    o = init_odps()
    project_name, table_name, partition = parse_odps_url(output_table_uri)
    o.run_sql(f"create table {project_name}.{table_name} as select * from pai_online_project.heart_disease_prediction;")


def write_output1(args):
    # Example: Write the result data to the mounted OSS path (a subdirectory of OSS Output Port 1). The result can then be passed to downstream components.
    output_path = args.output1
    os.makedirs(output_path, exist_ok=True)
    p = os.path.join(output_path, "result.text")
    with open(p, "w") as f:
        f.write("TestAccuracy=0.88")


if __name__ == "__main__":
    args = parse_args()
    print("Input1={}".format(args.input1))
    print("Output1={}".format(args.output1))

    # write_table_example(args)
    # write_output1(args)

Function descriptions:

  • init_odps(): Initializes an ODPS instance to read MaxCompute table data. You must provide your AccessKeyId and AccessKeySecret. For more information about how to obtain an AccessKey, see Obtain an AccessKey.

  • parse_odps_url(table_uri): Parses the URI of an input MaxCompute table and returns the project name, table name, and partition. The table_uri format is odps://${your_projectname}/tables/${table_name}/${pt_1}/${pt_2}/. For example, in odps://test/tables/iris/pa=1/pb=1, pa=1/pb=1 is a multi-level partition.

  • parse_args(): Parses the arguments passed to the script. Input and output data are passed to the script as command-line arguments.
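
For reference, the following minimal sketch shows the values that parse_odps_url returns for the example URI above. It assumes the parse_odps_url helper from the default code template.

# A quick check of parse_odps_url. Assumes the parse_odps_url
# function from the default code template above.
project_name, table_name, partition = parse_odps_url("odps://test/tables/iris/pa=1/pb=1")
print(project_name)  # test
print(table_name)    # iris
print(partition)     # pa=1/pb=1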

Example 1: Use the Python Script component in series with other components

This example modifies the heart disease prediction template to show how to use the Python Script component in combination with other Machine Learning Designer components. Pipeline configuration:

  1. Create a heart disease prediction pipeline and go to the pipeline designer. For more information, see Heart disease prediction.

  2. Drag the Python Script component to the canvas, rename it to SMOTE, and configure the following code.

    Important

    The imblearn library is not included in the default image. You must configure imblearn in the Third-party Dependencies field on the Code config tab of the component. The library is automatically installed before the node runs.

    import argparse
    import json
    import os
    from odps.df import DataFrame
    from imblearn.over_sampling import SMOTE
    from urllib import parse
    from odps import ODPS
    
    ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"
    
    
    def init_odps():
        # Information about the default MaxCompute project in the current Workspace.
        mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
        o = ODPS(
            access_id="<YourAccessKeyId>",
            secret_access_key="<YourAccessKeySecret>",
            # Select the endpoint based on the Region where your project is located, for example: http://service.cn-shanghai.maxcompute.aliyun-inc.com/api.
            endpoint=mc_execution["endpoint"],
            project=mc_execution["odpsProject"],
        )
        return o
    
    
    def get_max_compute_table(table_uri, odps):
        parsed = parse.urlparse(table_uri)
        project_name = parsed.hostname
        table_name = parsed.path.split('/')[2]
        table = odps.get_table(project_name + "." + table_name)
        return table
    
    
    def run():
        parser = argparse.ArgumentParser(description='PythonV2 component script example.')
        parser.add_argument(
            '--input1',
            type=str,
            default=None,
            help='Component input port 1.'
        )
        parser.add_argument(
            '--output3',
            type=str,
            default=None,
            help='Output MaxCompute table port 1.'
        )
        args, _ = parser.parse_known_args()
        print('Input1={}'.format(args.input1))
        print('output3={}'.format(args.output3))
        o = init_odps()
        imbalanced_table = get_max_compute_table(args.input1, o)
        df = DataFrame(imbalanced_table).to_pandas()
        sm = SMOTE(random_state=2)
        X_train_res, y_train_res = sm.fit_resample(df, df['ifhealth'].ravel())
        new_table = o.create_table(get_max_compute_table(args.output3, o).name, imbalanced_table.schema, if_not_exists=True)
        with new_table.open_writer() as writer:
            writer.write(X_train_res.values.tolist())
    
    
    if __name__ == '__main__':
        run()

    You must replace <YourAccessKeyId> and <YourAccessKeySecret> with your own AccessKey ID and AccessKey secret. For more information about how to obtain an AccessKey, see Obtain an AccessKey.

  3. Connect the SMOTE component downstream of the Split component. This step applies the classic SMOTE algorithm to oversample the training data from the Split component. This synthesizes new samples for the minority class in the training set to mitigate class imbalance.

  4. Connect the new data from the SMOTE component to the Logistic Regression for Binary Classification component for training.

  5. Connect the trained model to the same prediction data and evaluation components as the model in the left branch for a side-by-side comparison. After the components run successfully, click the visualization icon to go to the visualization page and view the final evaluation results.

Example 2: Orchestrate DLC Jobs with Machine Learning Designer

You can connect multiple custom Python Script components in Machine Learning Designer to orchestrate a pipeline of DLC jobs and enable periodic scheduling. The following figure shows an example that starts four DLC jobs in the order defined by a directed acyclic graph (DAG).

Note

If the execution code of a DLC job does not need to read data from upstream nodes or pass data to downstream nodes, the connections between nodes represent only the execution dependency and sequence.

You can then deploy the entire pipeline developed in Machine Learning Designer to DataWorks with a single click for periodic scheduling. For more information, see Use DataWorks for offline scheduling of Designer workflows.

Example 3: Pass Global Variables to a Python Script component

  1. Configure a global variable.

    On the Machine Learning Designer pipeline page, click the blank canvas, and then configure global variables on the Global Variables tab on the right.

  2. Use one of the following methods to pass the configured Global Variable to the Python Script component.

    • Click the Python Script component. On the Code config tab, select Advanced Options and pass the global variables to the script as arguments in the Command field.
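
      For example, assuming global variables named arg1 and arg2 are configured in Step 1, the command might reference them as follows. The exact variable reference syntax depends on your Global Variables configuration.

      python main.py --arg1 ${arg1} --arg2 ${arg2}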

    • Modify the Python code to parse the arguments by using argparse.

      The following code is an updated example based on the Global Variable configured in Step 1. You must update the code according to your actual Global Variable configuration. You can then replace the code in the code editor on the Code config tab of the Python Script component.

      import os
      
      import argparse
      import json
      
      """
      Python V2 component sample code
      """
      
      ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"
      
      
      def init_odps():
      
          from odps import ODPS
      
          mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
      
          o = ODPS(
              access_id="<YourAccessKeyId>",
              secret_access_key="<YourAccessKeySecret>",
              endpoint=mc_execution["endpoint"],
              project=mc_execution["odpsProject"],
          )
          return o
      
      
      def parse_odps_url(table_uri):
          from urllib import parse
      
          parsed = parse.urlparse(table_uri)
          project_name = parsed.hostname
          r = parsed.path.split("/", 3)
          table_name = r[2]
          if len(r) > 3:
              partition = r[3]
          else:
              partition = None
          return project_name, table_name, partition
      
      
      def parse_args():
          parser = argparse.ArgumentParser(description="PythonV2 component script example.")
      
          parser.add_argument("--input1", type=str, default=None, help="Component input port 1.")
          parser.add_argument("--input2", type=str, default=None, help="Component input port 2.")
          parser.add_argument("--input3", type=str, default=None, help="Component input port 3.")
          parser.add_argument("--input4", type=str, default=None, help="Component input port 4.")
      
          parser.add_argument("--output1", type=str, default=None, help="Output OSS port 1.")
          parser.add_argument("--output2", type=str, default=None, help="Output OSS port 2.")
          parser.add_argument("--output3", type=str, default=None, help="Output MaxComputeTable 1.")
          parser.add_argument("--output4", type=str, default=None, help="Output MaxComputeTable 2.")
          # Add code based on the configured global variables.
          parser.add_argument("--arg1", type=str, default=None, help="Argument 1.")
          parser.add_argument("--arg2", type=int, default=None, help="Argument 2.")
          args, _ = parser.parse_known_args()
          return args
      
      
      def write_table_example(args):
      
          output_table_uri = args.output3
      
          o = init_odps()
          project_name, table_name, partition = parse_odps_url(output_table_uri)
          o.run_sql(f"create table {project_name}.{table_name} as select * from pai_online_project.heart_disease_prediction;")
      
      
      def write_output1(args):
          output_path = args.output1
      
          os.makedirs(output_path, exist_ok=True)
          p = os.path.join(output_path, "result.text")
          with open(p, "w") as f:
              f.write("TestAccuracy=0.88")
      
      
      if __name__ == "__main__":
          args = parse_args()
      
          print("Input1={}".format(args.input1))
          print("Output1={}".format(args.output1))
          # Add code based on the configured global variables.
          print("Argument1={}".format(args.arg1))
          print("Argument2={}".format(args.arg2))
          # write_table_example(args)
          # write_output1(args)