MaxCompute:Custom images

Last Updated:Sep 20, 2024

MaxCompute provides the custom image management feature to address the complexity of SQL and Python development, which often involves intricate business logic, numerous third-party package dependencies, and extensive resource references. With this feature, you can use Docker images to build the development environments required for MaxCompute SQL and Python (PyODPS or MaxFrame) development. This topic describes how to use the custom image management feature.

Prerequisites

  • Docker is installed.

  • The Alibaba Cloud account or RAM user that you use must have RAM role read permissions, Alibaba Cloud Container Registry (ACR) operation permissions, and MaxCompute custom image operation permissions. The following table lists the permission requirements:

    | Authorization scenario | Account type | Permission requirements | Guidance link |
    | --- | --- | --- | --- |
    | RAM role read permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have RAM role read permissions by default. No additional authorization is required. | N/A |
    | RAM role read permissions | RAM user | Grant the RAM user the AliyunRAMReadOnlyAccess permission. | |
    | ACR operation permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have all ACR operation permissions by default. No additional authorization is required. | N/A |
    | ACR operation permissions | RAM user | Grant the RAM user the AliyunContainerRegistryReadOnlyAccess permission. | Attach system policies to a RAM user |
    | MaxCompute custom image operation permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have all permissions for viewing, adding, and deleting MaxCompute custom images by default. No additional authorization is required. | N/A |
    | MaxCompute custom image operation permissions | RAM user | Grant the RAM user the necessary permissions. | |

Limits

  • Image size: The maximum size for a single image in MaxCompute custom images is 10 GB.

  • Number of images: A tenant in MaxCompute can upload a maximum of 10 images.

  • ACR version requirements: Only ACR Enterprise Edition instances of the Basic Edition or Advanced Edition type are supported.

  • CPU architecture requirements: Images must be built on x86_64 CPUs. ARM and other non-x86_64 architectures, including Apple M-series Macs, are not supported.

  • Library version requirements: The MaxCompute job runtime environment is aligned with CentOS 7. When building images, use package versions compatible with CentOS 7. The yum source in base images is configured to the Alibaba Cloud CentOS 7 image source address.

  • File directory restrictions within the image: Do not place your own files in the /home/admin, /usr/local/lib, /usr/ali, and /apsara directories when you install packages with pip or yum. MaxCompute mounts the runtime environment to these locations, so their contents are overwritten when the container starts.
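The limits above can be checked before you upload an image. The following is a minimal sketch, not part of MaxCompute: the function names are hypothetical, and treating 10 GB as 10 × 1024³ bytes is an assumption. Docker reports x86_64 images with the architecture value `amd64`.

```python
import posixpath

# Values taken from the limits above; the exact byte count for "10 GB" is an assumption.
MAX_IMAGE_BYTES = 10 * 1024**3
# Docker reports x86_64 images as "amd64".
SUPPORTED_ARCH = "amd64"
# Directories that MaxCompute mounts over when the container starts.
OVERWRITTEN_DIRS = ("/home/admin", "/usr/local/lib", "/usr/ali", "/apsara")


def check_image_limits(size_bytes, architecture):
    """Return a list of limit violations (empty if there are none)."""
    problems = []
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("image exceeds 10 GB")
    if architecture != SUPPORTED_ARCH:
        problems.append(f"unsupported architecture: {architecture}")
    return problems


def is_overwritten_path(path):
    """Return True if path is, or lies inside, one of the overwritten directories."""
    normalized = posixpath.normpath(path)
    return any(
        normalized == d or normalized.startswith(d + "/")
        for d in OVERWRITTEN_DIRS
    )
```

The size and architecture of a local image can be read with `docker image inspect --format '{{.Size}} {{.Architecture}}' <image_name>:<tag>`. Whether the 10 GB limit applies to this local size or to the compressed size in the registry is not stated in this topic, so treat the size check as a rough guard.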

Step 1: Build a custom image in Docker

You can build a custom image from a Dockerfile that is based on the MaxCompute base image. The base image address is registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest. The base image provides basic environments such as Python 3.7, Python 3.11, pip, and yum.

  1. Create a DockerFile for building a custom image based on the MaxCompute base image. The code is as follows:

    # Use MaxCompute base image
    FROM registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest
    
    # Install system dependencies
    RUN yum install vi -y
    
    # Install third-party libraries
    RUN /usr/ali/python3.7/bin/python3 -m pip install --no-cache-dir pandas
  2. Package the image by using the DockerFile.

    sudo docker build -f DockerFile -t <image_name>:<tag> .

    The parameter descriptions are as follows:

    • image_name: The custom image name.

    • tag: The custom image version.
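If you script the build, the command in step 2 can be composed and run from Python. This is a minimal sketch: the function names are hypothetical, and it omits `sudo`, which assumes that the current user is allowed to run Docker directly.

```python
import subprocess


def docker_build_command(image_name, tag, dockerfile="DockerFile", context="."):
    """Compose the argument list for the docker build command shown in step 2."""
    if not image_name or not tag:
        raise ValueError("image_name and tag must be non-empty")
    return ["docker", "build", "-f", dockerfile, "-t", f"{image_name}:{tag}", context]


def build_image(image_name, tag):
    """Run the build; raises CalledProcessError if docker exits with a non-zero status."""
    subprocess.run(docker_build_command(image_name, tag), check=True)
```

Passing the arguments as a list instead of a single shell string avoids quoting problems if an image name or tag contains unexpected characters.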

Step 2: Upload the custom image to ACR

  1. Log on to the Container Registry console and create an image repository in ACR. For specific operations, see Use a Container Registry Enterprise Edition instance to build an image. The following table shows the key parameter configurations:

    Important

    Custom images can only be uploaded to ACR Enterprise Edition instances with either Basic Edition or Advanced Edition.

    | Step | Parameter name | Description |
    | --- | --- | --- |
    | Create Enterprise Edition instance | Instance type | Select either Basic Edition or Advanced Edition. |
    | Create image repository | Code source | Choose Local repository. |

  2. Upload the built custom image to ACR in the same account.

    1. In the left-side navigation pane, choose Repository > Repositories, and navigate to the corresponding image repository.

    2. On the Information page of the image repository, click Details in the left-side navigation pane, and then follow the instructions on the Instructions on Images tab to push the custom image from your Docker environment to the ACR image repository.

    3. (Optional) If your machine is within a VPC network, perform the following steps:

      1. Configure the access control for the created Enterprise Edition instance to allow VPC connections. For details, see Configure a VPC ACL.

      2. Append -vpc to the instance name in the domain name when you access the ACR Enterprise Edition instance from the Docker environment. For example, change acr-test-registry.cn-wulanchabu.cr.aliyuncs.com to acr-test-registry-vpc.cn-wulanchabu.cr.aliyuncs.com in the following command:

        $ docker login --username=***@test.aliyunid.com acr-test-registry.cn-wulanchabu.cr.aliyuncs.com
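The domain name change for VPC access can be expressed as a small helper. This is a sketch with a hypothetical function name. It assumes that the instance name is the first label of the domain name, as in the example above.

```python
def to_vpc_endpoint(public_endpoint):
    """Append "-vpc" to the instance name, which is the first label of the domain name."""
    instance, sep, rest = public_endpoint.partition(".")
    if not sep:
        raise ValueError(f"not a registry domain name: {public_endpoint!r}")
    if instance.endswith("-vpc"):
        return public_endpoint  # Already a VPC endpoint; leave it unchanged.
    return f"{instance}-vpc.{rest}"
```

For the example domain name, `to_vpc_endpoint("acr-test-registry.cn-wulanchabu.cr.aliyuncs.com")` returns `acr-test-registry-vpc.cn-wulanchabu.cr.aliyuncs.com`, which you then pass to `docker login`.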

Step 3: Add the custom image to MaxCompute

Associate an existing image in ACR with MaxCompute for unified management of development images.

  1. Log on to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, choose Tenants > Images, and click the Custom Image tab.

  3. On the Custom Image tab, click Create Image, and configure the following parameters in the Add Image dialog box:

    Note

    When creating an image for the first time, click OK in the MaxCompute Service-linked Role dialog box that appears. The system automatically creates a service-linked role for accessing ACR resources.

    | Parameter name | Description |
    | --- | --- |
    | Image Name | The custom image name. It can be used in subsequent MaxCompute SQL, PyODPS, and MaxFrame development. |
    | Image Type | The ACR image type. Only ACR Enterprise Edition images are supported. |
    | Enterprise Edition Image Instance | Select the Enterprise Edition image instance created in ACR. |
    | Image Namespace | Select the Enterprise Edition image namespace created in ACR. |
    | Image Repository | Select the Enterprise Edition image repository created in ACR. |
    | Image Version | Select the image version you uploaded to ACR. |
    | Image Description | Provide a description for the image being added. |

  4. Click OK. You can view the custom image in the custom image list.

Step 4: Use the custom image

You can use custom images in MaxCompute SQL UDF, PyODPS, and MaxFrame development.

Important

Make sure that each development job specifies only one image to prevent image conflicts.

  • When calling UDFs, you can specify the dependent image and Python version at the SQL session level by using flags. The command is as follows:

    set odps.sql.python.version=cp37;
    set odps.session.image = <image_name>;
  • In PyODPS development, you can specify an existing image by using the image parameter of the execute or persist method. The command is as follows:

    Note

    If you need to reference an image in PyODPS development, make sure that PyODPS is upgraded to V0.11.5 or later.

    image='<image_name>'
  • In MaxFrame development, you can specify an existing image for the current job. The relevant parameters are as follows:

    config.options.sql.settings = {
        "odps.session.image": "<image_name>"
    }
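The session-level flags for UDF calls can also be built programmatically, for example to pass them as the `hints` argument of the PyODPS `execute_sql` method. The following is a minimal sketch: `image_hints` is a hypothetical helper, and it assumes that passing the flags as hints has the same effect as the `set` statements above. The UDF, table, and credential values in the commented usage are placeholders.

```python
def image_hints(image_name, python_version="cp37"):
    """Build the session-level settings that select one custom image for a job."""
    if not image_name:
        raise ValueError("image_name must be non-empty")
    return {
        "odps.sql.python.version": python_version,
        "odps.session.image": image_name,
    }


# Hypothetical usage with PyODPS (requires real credentials, so it is left commented out):
# from odps import ODPS
# o = ODPS("<access_id>", "<secret_access_key>", "<project>", endpoint="<endpoint>")
# o.execute_sql("SELECT my_udf(col) FROM my_table;", hints=image_hints("<image_name>"))
```

Because the helper accepts a single image name, each job built this way specifies exactly one image, in line with the requirement stated earlier in this step.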