MaxCompute provides a custom image management feature for SQL and Python development that involves intricate business logic, many third-party package dependencies, and extensive resource references. With this feature, you can use Docker images to build the environments required for MaxCompute SQL and Python (PyODPS or MaxFrame) development. This topic describes how to use the custom image management feature.
Prerequisites
Docker is installed.
For a Linux environment, refer to the official Docker documentation for Docker installation instructions.
For macOS or Windows environments:
Individual developers can utilize Docker Desktop.
Enterprise users who have not purchased a Docker Desktop license should use the open-source Rancher Desktop.
The corresponding account or user must be granted RAM role read permissions, Alibaba Cloud Container Registry (ACR) operation permissions, and MaxCompute custom image operation permissions. The following table shows the permission requirements:
| Authorization scenario | Account type | Permission requirements | Guidance link |
| --- | --- | --- | --- |
| RAM role read permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have RAM role read permissions by default. No authorization is required. | N/A |
| RAM role read permissions | RAM user | Grant the RAM user the AliyunRAMReadOnlyAccess permission. | |
| ACR operation permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have all ACR operation permissions by default. No authorization is required. | N/A |
| ACR operation permissions | RAM user | Grant the RAM user the AliyunContainerRegistryReadOnlyAccess permission. | |
| MaxCompute custom image operation permissions | Alibaba Cloud account (recommended) | Alibaba Cloud accounts have all permissions for viewing, adding, and deleting MaxCompute custom images by default. No authorization is required. | N/A |
| MaxCompute custom image operation permissions | RAM user | Grant the RAM user the necessary permissions. | |
Limits
Image size: The maximum size for a single image in MaxCompute custom images is 10 GB.
Number of images: A tenant in MaxCompute can upload a maximum of 10 images.
ACR version requirements: Only ACR Enterprise Edition instances of the Basic Edition or Advanced Edition are supported.
CPU architecture requirements: Images must be built on x86_64 CPUs. ARM and other non-x86_64 CPUs, including Apple M-series Macs, are not supported.
Library version requirements: The MaxCompute job runtime environment is aligned with CentOS 7. When building images, use package versions compatible with CentOS 7. The yum source in base images is configured to the Alibaba Cloud CentOS 7 image source address.
File directory operation restrictions within the image: Avoid placing personal files in the /home/admin, /usr/local/lib, /usr/ali, and /apsara directories when installing packages with pip or yum. These directories are overwritten when the container starts, because MaxCompute mounts the runtime environment to these locations.
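Some of the limits above can be checked locally before you upload an image. The following is a minimal sketch, not an official tool. It assumes that "10 GB" can be read as 10 GiB of local image size as reported by `docker image inspect` (MaxCompute may measure size differently), and that Docker reports x86_64 images as `amd64`. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import PurePosixPath
from typing import List, Tuple

# Assumption: the documented "10 GB" limit is interpreted as 10 GiB.
MAX_IMAGE_BYTES = 10 * 1024**3
# Docker labels x86_64 images as "amd64".
SUPPORTED_ARCHITECTURES = {"amd64", "x86_64"}
# Directories that MaxCompute mounts over when the container starts.
OVERWRITTEN_DIRS = ("/home/admin", "/usr/local/lib", "/usr/ali", "/apsara")


def check_image_limits(architecture: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if architecture not in SUPPORTED_ARCHITECTURES:
        problems.append(f"unsupported architecture: {architecture}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"image is {size_bytes} bytes, above the 10 GB limit")
    return problems


def is_overwritten_path(path: str) -> bool:
    """True if path is an overwritten directory or lies inside one."""
    p = PurePosixPath(path)
    return any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents
        for d in OVERWRITTEN_DIRS
    )


def inspect_local_image(image_ref: str) -> Tuple[str, int]:
    """Read architecture and size (bytes) of a local image from Docker."""
    out = subprocess.run(
        ["docker", "image", "inspect", image_ref],
        check=True, capture_output=True, text=True,
    ).stdout
    info = json.loads(out)[0]
    return info["Architecture"], info["Size"]
```

For a locally built image, `check_image_limits(*inspect_local_image("<image_name>:<tag>"))` returns the problems found, and `is_overwritten_path` can be applied to any path where you plan to copy your own files.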
Step 1: Build a custom image in Docker
You can build a custom image from a DockerFile that is based on the MaxCompute base image. The MaxCompute base image address is registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest, which provides basic environments such as Python 3.7, Python 3.11, pip, and yum.
Create a DockerFile for building a custom image based on the MaxCompute base image. The code is as follows:
# Use MaxCompute base image
FROM registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest
# Install system dependencies
RUN yum install vi -y
# Install third-party libraries
RUN /usr/ali/python3.7/bin/python3 -m pip install --no-cache-dir pandas
Package the image by using the DockerFile.
sudo docker build -f DockerFile -t <image_name>:<tag> .
The parameter descriptions are as follows:
image_name: The custom image name.
tag: The custom image version.
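If you script the build, the same command can be assembled as an argument list and followed by a quick import check inside a throwaway container. This is a sketch under assumptions: the helper names are hypothetical, the interpreter path is taken from the DockerFile above, and it assumes the base image allows a command to be passed to `docker run` without extra entrypoint handling.

```python
from typing import List


def build_command(image_name: str, tag: str,
                  dockerfile: str = "DockerFile",
                  context: str = ".") -> List[str]:
    """Arguments equivalent to: docker build -f DockerFile -t <image_name>:<tag> ."""
    if not image_name or not tag:
        raise ValueError("image_name and tag must be non-empty")
    return ["docker", "build", "-f", dockerfile,
            "-t", f"{image_name}:{tag}", context]


def import_check_command(image_name: str, tag: str, module: str,
                         python: str = "/usr/ali/python3.7/bin/python3") -> List[str]:
    """Arguments that start a temporary container and try to import one module."""
    return ["docker", "run", "--rm", f"{image_name}:{tag}",
            python, "-c", f"import {module}"]
```

Both lists can be passed to `subprocess.run(..., check=True)`. For example, running `build_command("mc_pandas", "v1")` and then `import_check_command("mc_pandas", "v1", "pandas")` fails early if the pandas installation did not succeed. The image name `mc_pandas` is only an example.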
Step 2: Upload the custom image to ACR
Log on to the Container Registry console and create an image repository in ACR. For specific operations, see Use a Container Registry Enterprise Edition instance to build an image. The following table shows the key parameter configurations:
Important: Custom images can be uploaded only to ACR Enterprise Edition instances of the Basic Edition or Advanced Edition.
| Step | Parameter name | Description |
| --- | --- | --- |
| Create Enterprise Edition instance | Instance type | Select either Basic Edition or Advanced Edition. |
| Create image repository | Code source | Choose Local repository. |
Upload the built custom image to ACR in the same account.
In the left-side navigation pane, choose Repository > Repositories, and navigate to the corresponding image repository.
On the Information page of the image repository, click Details in the left-side navigation pane, and then follow the steps on the Instructions on Images tab to upload the custom image from your Docker environment to the ACR image repository.
(Optional) If your machine is within a VPC network, perform the following steps:
Configure the access control for the created Enterprise Edition instance to allow VPC connections. For details, see Configure a VPC ACL.
Add -vpc to the instance name in the domain name when you use the ACR Enterprise Edition instance in the Docker environment. For example, change acr-test-registry.cn-wulanchabu.cr.aliyuncs.com to acr-test-registry-vpc.cn-wulanchabu.cr.aliyuncs.com in the following command:
$ docker login --username=***@test.aliyunid.com acr-test-registry.cn-wulanchabu.cr.aliyuncs.com
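The domain rewrite for VPC access and the usual login, tag, and push sequence can be sketched as follows. The helper names are hypothetical, and the `<registry>/<namespace>/<repository>:<tag>` reference layout is an assumption; the Instructions on Images tab of your repository shows the exact commands to use.

```python
from typing import List


def to_vpc_endpoint(registry: str) -> str:
    """Append "-vpc" to the instance name (the first label of the domain)."""
    instance, sep, rest = registry.partition(".")
    if not sep:
        raise ValueError(f"not a registry domain: {registry!r}")
    if instance.endswith("-vpc"):
        return registry  # already a VPC endpoint
    return f"{instance}-vpc.{rest}"


def push_commands(local_image: str, registry: str, namespace: str,
                  repository: str, tag: str, username: str) -> List[List[str]]:
    """Argument lists for docker login, docker tag, and docker push."""
    remote = f"{registry}/{namespace}/{repository}:{tag}"
    return [
        ["docker", "login", f"--username={username}", registry],
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]
```

On a machine inside a VPC, pass `to_vpc_endpoint(registry)` as the `registry` argument of `push_commands` so that all three commands use the VPC domain.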
Step 3: Add the custom image to MaxCompute
Associate an existing image in ACR with MaxCompute for unified management of development images.
Log on to the MaxCompute console and select a region in the upper-left corner.
In the left-side navigation pane, choose Tenants > Images, and click the Custom Image tab.
In the Custom Image tab, click Create Image, and configure the following parameters in the Add Image dialog box:
Note: When you create an image for the first time, click OK in the MaxCompute Service-linked Role dialog box that appears. The system automatically creates a service-linked role for accessing ACR resources.
| Parameter name | Description |
| --- | --- |
| Image Name | The custom image name. It can be used in subsequent MaxCompute SQL, PyODPS, and MaxFrame development. |
| Image Type | The ACR image type. Only ACR Enterprise Edition images are supported. |
| Enterprise Edition Image Instance | Select the Enterprise Edition image instance created in ACR. |
| Image Namespace | Select the Enterprise Edition image namespace created in ACR. |
| Image Repository | Select the Enterprise Edition image repository created in ACR. |
| Image Version | Select the image version you uploaded to ACR. |
| Image Description | Provide a description for the image being added. |
Click OK. You can view the custom image in the custom image list.
Step 4: Use the custom image
You can use custom images in MaxCompute SQL UDF, PyODPS, and MaxFrame development.
Ensure that each development job specifies only one image to prevent image conflict issues.
When calling UDFs, you can specify the dependent image and Python version at the SQL session level by using flags. The command is as follows:
set odps.sql.python.version=cp37;
set odps.session.image = <image_name>;
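When the SQL is submitted from Python instead of a SQL client, the same two flags can be passed as hints. The sketch below assumes that `o` is a PyODPS `ODPS` entry object and that hints passed to `execute_sql` behave like the `set` statements above. The helper names are hypothetical.

```python
from typing import Dict


def image_hints(image_name: str, python_version: str = "cp37") -> Dict[str, str]:
    """Session-level flags that select the Python version and the custom image."""
    return {
        "odps.sql.python.version": python_version,
        "odps.session.image": image_name,
    }


def as_set_statements(hints: Dict[str, str]) -> str:
    """Render the hints as `set` statements to prepend to a SQL script."""
    return "\n".join(f"set {key}={value};" for key, value in hints.items())


def run_udf_sql(o, sql: str, image_name: str, python_version: str = "cp37"):
    """Submit SQL through an ODPS entry object with the image flags as hints."""
    return o.execute_sql(sql, hints=image_hints(image_name, python_version))
```

Because a job should specify only one image, `image_hints` takes a single image name and writes it to a single key.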
In PyODPS development, you can specify an existing image by using the image parameter of the execute or persist method. The command is as follows:
Note: To reference an image in PyODPS development, make sure that PyODPS is upgraded to version 0.11.5 or later.
image='<image_name>'
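A small wrapper can combine the version requirement with the `image` parameter. This is a sketch under assumptions: `df` stands for a PyODPS DataFrame, the helper names are hypothetical, and the version parser keeps only the leading digits of each component, so a pre-release such as `0.11.5rc1` is treated as 0.11.5.

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.11.5' or 'V0.11.5' into (0, 11, 5)."""
    numbers = []
    for piece in version.lstrip("vV").split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def supports_image_parameter(pyodps_version: str, minimum: str = "0.11.5") -> bool:
    """True if the installed PyODPS version is at least the documented minimum."""
    return parse_version(pyodps_version) >= parse_version(minimum)


def run_dataframe(df, image_name: str, target_table: Optional[str] = None):
    """Run a DataFrame with one custom image: execute it, or persist it to a table."""
    if target_table is None:
        return df.execute(image=image_name)
    return df.persist(target_table, image=image_name)
```

In a real job, the installed version can be read from `odps.__version__` and passed to `supports_image_parameter` before `run_dataframe` is called.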
In MaxFrame development, you can specify an existing image for the current job. The relevant parameters are as follows:
config.options.sql.settings = { "odps.session.image": "<image_name>" }
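Assigning a new dictionary to `config.options.sql.settings` replaces any settings that were already there. The following sketch merges the image into the existing settings instead and rejects a second, different image, in line with the one-image-per-job rule. The helper name is hypothetical, and `options` is assumed to be the `config.options` object from MaxFrame.

```python
from typing import Dict


def set_session_image(options, image_name: str) -> Dict[str, str]:
    """Add odps.session.image to options.sql.settings, keeping other keys."""
    current = dict(getattr(options.sql, "settings", None) or {})
    existing = current.get("odps.session.image")
    if existing is not None and existing != image_name:
        # Each job should specify only one image.
        raise ValueError(f"job already uses image {existing!r}")
    current["odps.session.image"] = image_name
    options.sql.settings = current
    return current
```

Typical usage would be `from maxframe import config` followed by `set_session_image(config.options, "<image_name>")` before the MaxFrame session is created.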