Alibaba Cloud Platform for AI (PAI) provides official images based on different frameworks and CUDA versions. When you use DLC, EAS, or DSW, you can select a suitable official image to quickly build your AI development environment. This topic describes the capabilities of PAI's preconfigured official images and provides a list of core images.
Understand official images
Alibaba Cloud PAI official images follow a consistent naming convention, which lets you identify basic image information from the image name. Official image names typically include the following fixed fields. You should also use this naming convention when you create custom images.
| Official image name example | Image name breakdown | Image types supported by each product |
| --- | --- | --- |
|  |  | Check the "Supported sub-products" tag in the official image list to confirm product compatibility. |
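Because the fields are fixed, an image tag can be decomposed programmatically. The following is a minimal sketch that parses a tag into its fields; the pattern is inferred from the example tags shown in this topic (such as the PyTorch NGC tag below) and is illustrative rather than an authoritative grammar for every image family.

```python
import re

# Field layout inferred from example tags such as
# "2.10.0-gpu-py312-cu130-ubuntu24.04-ngc25.11" -- illustrative only.
TAG_PATTERN = re.compile(
    r"^(?P<framework_version>[\d.]+)"   # framework version, e.g. 2.10.0
    r"-(?P<device>cpu|gpu)"             # target device type
    r"(?:-py(?P<python>\d+))?"          # Python version, e.g. py312
    r"(?:-cu(?P<cuda>\d+))?"            # CUDA version, e.g. cu130 (GPU images)
    r"-(?P<os>ubuntu[\d.]+)"            # base operating system
    r"(?:-(?P<suffix>[\w.]+))?$"        # optional suffix such as ngc25.11 or accl
)

def parse_tag(tag: str) -> dict:
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag}")
    return {k: v for k, v in match.groupdict().items() if v is not None}

print(parse_tag("2.10.0-gpu-py312-cu130-ubuntu24.04-ngc25.11"))
# {'framework_version': '2.10.0', 'device': 'gpu', 'python': '312',
#  'cuda': '130', 'os': 'ubuntu24.04', 'suffix': 'ngc25.11'}
```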
DSW/DLC official images
PAI provides DSW/DLC official images based on various machine learning frameworks. You can view the complete list of official images on the AI Assets - Images page in the PAI console.
Python
Overview
Python is a simple yet powerful high-level programming language widely used in machine learning for data processing, model development, and training. It offers a rich set of libraries, such as NumPy, PyTorch, and TensorFlow, and provides high development efficiency. PAI provides two types of Python images:
CPU images: Built on the official Ubuntu base image for CPU computing.
GPU images: Built on the official CUDA base image for GPU computing.
Key features
Supports Ubuntu 22.04 and Ubuntu 24.04.
Supports Alibaba Cloud’s high-performance RDMA networking.
Supports Python versions 3.10 through 3.14.
Supports CUDA versions 12.4 through 13.0.
Includes common development tools such as curl, git, wget, rclone, and ping.
Uses Alibaba Cloud mirrors for pip and apt.
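As a quick sanity check after launching a container from one of these images, a sketch like the following confirms the interpreter version and GPU visibility. It assumes nvidia-smi is on the PATH in the CUDA-based images:

```python
# Sanity-check a Python image at startup: confirm the interpreter version
# and, on GPU images, that the NVIDIA driver is visible.
import shutil
import subprocess
import sys

print(f"Python: {sys.version.split()[0]}")  # expect 3.10-3.14 per this image line

if shutil.which("nvidia-smi"):
    # GPU image: query driver and GPU status.
    subprocess.run(["nvidia-smi"], check=True)
else:
    print("No nvidia-smi found; this is likely a CPU image.")
```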
PyTorch
Overview
PAI provides two types of PyTorch images:
Built on PAI Python images with PyTorch, TorchVision, and TorchAudio packages pre-installed. These images inherit all the features of the Python images and cover official PyTorch releases from version 2.4.0 onward.
Built on NVIDIA NGC PyTorch images, with common development tools pre-installed and Alibaba Cloud mirrors configured for pip and apt.
Tag descriptions
-accl:
These images are pre-installed with the Alibaba Cloud High-Performance Collective Communication Library (ACCL). ACCL delivers higher communication performance than NCCL.
When developing or training with ACCL-based images, you should use the preconfigured Python environment. If you want to use a Python virtual environment, follow the installation guide to configure ACCL in your environment.
-ngc:
These images are built from NVIDIA NGC PyTorch images. The tag includes the NGC version. For example, 2.10.0-gpu-py312-cu130-ubuntu24.04-ngc25.11 is based on NGC PyTorch 25.11. For more information about the features of NGC PyTorch images, see the NVIDIA official documentation.
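After starting a container from one of these images, a short script can confirm the pre-installed stack. The version values in the comments are illustrative:

```python
# Verify the pre-installed PyTorch stack inside a PAI PyTorch image.
import torch
import torchvision
import torchaudio

print("torch:", torch.__version__)              # e.g. 2.4.0 or later per this topic
print("torchvision:", torchvision.__version__)
print("torchaudio:", torchaudio.__version__)

if torch.cuda.is_available():
    print("CUDA:", torch.version.cuda, "| device:", torch.cuda.get_device_name(0))
    # On -accl images the collective library differs; see the ACCL installation guide.
    print("NCCL:", torch.cuda.nccl.version())
```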
Data-Juicer
Overview
Data-Juicer is a distributed framework for data cleaning and preprocessing. Built on the distributed capabilities of Ray, it enhances data quality for LLM training and supports multimodal data fusion. PAI provides two Data-Juicer image types to help you quickly build data pipelines and run distributed jobs in CPU or GPU environments. These images include built-in data processors, quality assessment tools, and visual analytics capabilities.
CPU images: Built on PAI’s CPU base image for large-scale CPU-only tasks such as text processing and data cleaning.
GPU images: Built on PAI’s CUDA base image for GPU-accelerated tasks such as model inference and quality scoring.
Key features
Based on Ubuntu 22.04.
Supports Alibaba Cloud RDMA for high-throughput, low-latency distributed data loading and processing.
Includes a full Data-Juicer runtime environment with built-in processors. You can start data processing tasks quickly and monitor them using the Ray Dashboard.
Supports CPU/GPU heterogeneous resource scheduling for diverse workloads such as data cleaning, quality evaluation, and multimodal data generation.
Uses Alibaba Cloud mirrors by default for pip and apt to accelerate dependency installation and improve stability.
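A typical workflow is to describe the pipeline in a YAML config and launch it with the dj-process entrypoint bundled in these images. The operator names and config keys below follow common Data-Juicer examples and are assumptions here; check them against the Data-Juicer version inside your image:

```python
# Launch a Data-Juicer cleaning job: write a config, then invoke dj-process.
import subprocess
import yaml

config = {
    "project_name": "demo-clean",
    "dataset_path": "/mnt/data/raw.jsonl",    # hypothetical input path
    "export_path": "/mnt/data/cleaned.jsonl", # hypothetical output path
    "process": [
        {"whitespace_normalization_mapper": {}},
        {"language_id_score_filter": {"lang": "en", "min_score": 0.8}},
    ],
}

with open("process.yaml", "w") as f:
    yaml.safe_dump(config, f)

# dj-process is the Data-Juicer CLI entrypoint shipped in the image.
subprocess.run(["dj-process", "--config", "process.yaml"], check=True)
```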
Responsible-AI-Develop
Overview
Responsible AI encompasses core principles and practices applied throughout the AI model lifecycle to ensure safety, reliability, fairness, transparency, and compliance during development, training, fine-tuning, evaluation, and deployment. It helps enterprises build trustworthy AI systems, mitigate risks, and earn user trust. PAI provides two base images to support Responsible AI practices:
CPU images: Built on the official Ubuntu image for general CPU computing, with Responsible AI toolchains integrated.
GPU images: Built on the official CUDA image for high-performance GPU scenarios, with Responsible AI toolchains integrated.
Key features
Supports Ubuntu 22.04.
Supports Python versions 3.11 through 3.14.
Supports CUDA 11.8.
Includes Responsible AI visual analytics tools with an interactive dashboard for multidimensional analysis, such as model fairness and error analysis, to help developers identify potential bias and errors.
Supports differential privacy training by injecting controlled noise during model training to prevent sensitive data leakage and meet compliance and privacy requirements.
Includes the RAI model encryption SDK (RAI_SAM_SDK) for sharded encrypted storage of LLMs and authorized decryption during inference.
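The differential privacy capability described above comes down to DP-SGD: clip each sample's gradient and add calibrated noise during training. The following sketch illustrates the pattern with Opacus; it is an illustration of the technique, and the toolchain bundled in these images may expose a different API:

```python
# DP-SGD sketch with Opacus: per-sample gradient clipping plus noise injection.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8
)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # noise scale: higher = stronger privacy, lower utility
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
```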
Ray
Overview
Ray is a high-performance framework for distributed computing, widely used for large-scale machine learning training, hyperparameter tuning, reinforcement learning, and online inference. PAI provides two Ray image types to help you quickly set up Ray clusters and run distributed jobs in CPU or GPU environments. These images install Ray dependencies using ray[default], which includes the Ray Dashboard and common runtime components.
CPU images: Built on PAI’s CPU base image for CPU-only distributed computing and data processing.
GPU images: Built on PAI’s CUDA base image for GPU-accelerated training, inference, and large-scale parallel computing.
Key features
Based on Ubuntu 22.04 and Ubuntu 24.04.
Supports Alibaba Cloud RDMA for high-throughput, low-latency distributed communication.
Includes a full Ray runtime environment with common components that allow you to quickly start Ray Head and Worker nodes and run tasks.
Supports CPU/GPU heterogeneous resource scheduling for diverse workloads such as training, data processing, and inference.
Uses Alibaba Cloud mirrors by default for pip and apt to accelerate dependency installation and improve stability.
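For example, a distributed job against the bundled Ray runtime can be as small as the following. ray.init() with no arguments starts a local cluster; on a multi-node setup, point it at the Head node instead:

```python
# Run a fan-out task on the Ray runtime baked into these images.
import ray

ray.init()  # local cluster; use an address such as ray://head:10001 for multi-node

@ray.remote
def square(x: int) -> int:
    return x * x

# Dispatch 8 tasks across the cluster and gather the results.
results = ray.get([square.remote(i) for i in range(8)])
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```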
ModelScope
Overview
The ModelScope Library supports model and dataset management and enables model training and inference using deep learning frameworks such as PyTorch and TensorFlow. It has been tested and runs on Python 3.8+, PyTorch 1.11+, and TensorFlow. ModelScope provides official images that allow you to skip the environment setup and start using it immediately. For more information, see ModelScope official images.
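For example, the library's pipeline API wraps model download and inference in a few lines; the task and model ID below are illustrative, and any model hosted on ModelScope works the same way:

```python
# Run inference with the ModelScope pipeline API inside a ModelScope image.
from modelscope.pipelines import pipeline

# The model is downloaded from ModelScope on first use.
word_segmentation = pipeline(
    task="word-segmentation",
    model="damo/nlp_structbert_word-segmentation_chinese-base",
)
print(word_segmentation("今天天气不错，适合出去游玩"))
```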
TorchEasyRec
Overview
TorchEasyRec is an easy-to-use deep learning framework for recommendation systems. It covers common scenarios such as matching (recall), ranking, multitask learning, and generative recommendations. With simple configuration and flexible customization, it accelerates the development and deployment of high-performance recommendation models.
PAI provides official TorchEasyRec images pre-installed with dependencies such as pytorch, torchrec, fbgemm, and tensorrt. Two image types are available:
GPU version: Built on Ubuntu 22.04 with CUDA acceleration for high-performance large-scale recommendation model training (recommended).
CPU version: Built on Ubuntu 22.04 for development, debugging, and small-scale training (Note: Some operations are GPU-only).
TensorFlow
| Framework version | CUDA version (GPU instances only) | Operating system |
| --- | --- | --- |
DeepRec
| Framework version | CUDA version (GPU instances only) | Operating system |
| --- | --- | --- |
| DeepRec | CUDA 11.4 | Ubuntu 18.04 |
XGBoost
| Framework version | CUDA version (GPU instances only) | Operating system |
| --- | --- | --- |
| XGBoost 1.6.0 | Not applicable; CPU instances only | Ubuntu 18.04 |
EAS official images
PAI provides EAS official images based on various machine learning frameworks. You can view the complete list of official images on the AI Assets - Images page in the PAI console.
TritonServer
Overview
Triton Inference Server (also known as Triton Server) is a high-performance inference server developed by NVIDIA to simplify and accelerate machine learning model deployment and inference. It supports multiple deep learning frameworks, such as TensorFlow, PyTorch, and ONNX Runtime, and provides a consistent interface for handling different models and data types.
Key features
Multi-framework support: Triton Server supports various deep learning frameworks and model formats, enabling unified deployment of diverse models.
High throughput and low latency: Triton improves inference performance through batching and parallel inference. It also leverages NVIDIA GPU acceleration to maximize compute power.
Dynamic model management: Triton allows dynamic loading and unloading of models, which enables flexible version control, A/B testing, and model updates.
Simple APIs and scalability: Triton offers REST and gRPC interfaces for easy integration. It also integrates seamlessly with container orchestration systems such as Kubernetes for large-scale inference deployments.
Heterogeneous hardware support: In addition to NVIDIA GPUs, Triton runs on CPUs and other accelerators, which supports deployment across diverse hardware platforms.
Custom post-processing: Users can apply custom logic to inference results to meet specific application needs.
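A client interaction over the REST interface looks like the following sketch using the official tritonclient package. The model name, input name, and shape are hypothetical and must match your model's config.pbtxt:

```python
# Call a Triton Server deployment over HTTP with the tritonclient package.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one FP32 input tensor named "INPUT0" (hypothetical).
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT0", data.shape, "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="resnet50", inputs=[inp])
print(result.as_numpy("OUTPUT0").shape)
```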
ComfyUI
Overview
ComfyUI is a node-based graphical user interface designed for running and customizing diffusion models such as Stable Diffusion. It uses visual workflows that allow users to drag and drop components to build image generation pipelines without writing code, while supporting highly modular and reusable prompt engineering and model combinations.
Key features
Node-based workflow: Breaks down steps such as text encoding, sampling, model loading, and image post-processing into independent nodes that users can freely connect for precise control.
Efficient resource management: Loads only the models needed for the current workflow, which reduces VRAM usage and supports batch generation and complex pipeline optimization.
Highly extensible: Supports custom node plugins with a rich community ecosystem, such as ControlNet, LoRA, and Upscale, for easy integration of new models or features.
Workflow export and sharing: Entire generation workflows can be exported as JSON files for reproducibility, collaboration, or deployment to other environments.
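An exported workflow can also be submitted programmatically through ComfyUI's /prompt HTTP endpoint, as in the following sketch. The host, port, and workflow file are assumptions; export the JSON from the ComfyUI interface in API format first:

```python
# Submit an exported workflow JSON to a running ComfyUI instance.
import json
import requests

with open("workflow_api.json") as f:  # hypothetical API-format export
    workflow = json.load(f)

resp = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
resp.raise_for_status()
print(resp.json())  # contains the prompt_id used to poll /history for results
```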
PAI-RAG
Overview
PAI-RAG is an enterprise-grade retrieval-augmented generation (RAG) conversational system solution from PAI. Built on PAI-EAS, it delivers out-of-the-box RAG capabilities. PAI-RAG deeply integrates large language models (LLMs) with knowledge retrieval technology to enable rapid deployment of private knowledge Q&A and intelligent customer service applications. It also provides an open-source modular framework (GitHub: aigc-apps/PAI-RAG) for flexible customization.
Key features
Multiple vector database support: Natively compatible with Elasticsearch, Hologres, Tablestore, Milvus, and other mainstream vector databases to meet diverse enterprise needs.
Web search enhancement: Supports real-time web retrieval to overcome the timeliness limitations of model pretraining data and improve answer accuracy and freshness.
Flexible deployment and integration: Offers a WebUI, RESTful API, and OpenAI-compatible interface for quick integration into existing business systems.
End-to-end knowledge base management: Supports document upload and management using the WebUI or OSS, with one-stop capabilities for chunking, vectorization, version updates, and knowledge base operations.
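For example, the OpenAI-compatible interface can be called with the standard OpenAI client. The endpoint and token below are placeholders; replace them with the invocation information of your deployed service:

```python
# Query a deployed PAI-RAG service through its OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://<service-endpoint>/v1",  # placeholder: your EAS endpoint
    api_key="<eas-service-token>",             # placeholder: your service token
)

resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Summarize our leave policy."}],
)
print(resp.choices[0].message.content)
```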
vLLM
Overview
vLLM is an open-source large language model (LLM) inference and serving engine designed for the efficient deployment and execution of various open-source LLMs. Using advanced memory management and scheduling techniques, it significantly boosts throughput while maintaining low latency, making it a leading LLM inference framework.
Key features
PagedAttention: A core innovation inspired by OS paging mechanisms to dynamically manage KV Cache, eliminate VRAM fragmentation, and significantly improve VRAM utilization efficiency.
Continuous batching: Dynamically merges requests of varying lengths for parallel decoding, which greatly improves GPU utilization and throughput.
High throughput, low latency: Supports higher concurrency on the same hardware and is ideal for high-traffic production environments.
Developer-friendly: Provides a simple Python API and an OpenAI-compatible interface for rapid integration into existing applications.
Rich ecosystem: Natively supports advanced features such as LoRA fine-tuning inference, multimodal models, and tool calling (Function Calling).
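Offline batch inference with the vLLM Python API takes only a few lines; the model ID below is illustrative, and any local path or Hugging Face ID works:

```python
# Offline inference with vLLM; continuous batching handles both prompts
# in a single generate() call.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # illustrative model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is PagedAttention?", "Explain KV cache."], params)
for out in outputs:
    print(out.outputs[0].text)
```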
EasyAnimate
Overview
EasyAnimate is an end-to-end high-definition long video generation framework developed by PAI based on the Diffusion Transformer (DiT) architecture. It supports the rapid generation of high-quality videos from text or images (text-to-video/image-to-video) and provides a complete solution that covers data preprocessing, VAE training, and DiT inference.
Key features
High-resolution long video generation: Generates coherent videos at resolutions up to 1024×1024 and durations of 6 seconds or longer.
Multimodal input: Supports both text prompts (text-to-video) and image inputs (image-to-video) for dynamic video generation.
Complete training pipeline: Offers end-to-end training capabilities for VAE, DiT base models, and LoRA fine-tuning to support customized development.
Production-ready deployment: Officially supported by PAI inference services for seamless integration into cloud inference platforms and is suitable for production environments.
Kohya
Overview
Kohya is an ecosystem of tools derived from Stable Diffusion fine-tuning scripts. Its Gradio-based graphical interface greatly lowers the barrier to using techniques such as LoRA and DreamBooth for model fine-tuning.
Key features
Multiple training methods: Natively supports LoRA, DreamBooth, full-parameter fine-tuning, and SDXL model training.
Graphical interface: Provides an intuitive Web UI (based on Gradio) where users can configure parameters using forms instead of command-line coding.
Cross-platform compatibility: Primarily designed for Windows but also supports Linux and macOS.
End-to-end toolchain: Integrates data preprocessing, auto-captioning, training monitoring, and model export to cover the full fine-tuning lifecycle.
Open source and active community: Fully open-source with continuous community maintenance and compatibility with mainstream inference frameworks, such as Stable Diffusion WebUI, for direct deployment of trained models.
Stable-Diffusion-WebUI
Overview
Stable-Diffusion-WebUI is an open-source graphical interface for the local deployment and execution of Stable Diffusion models. It significantly lowers the barrier to using generative AI for tasks such as text-to-image and image-to-image generation.
Key features
Multimodal generation: Supports mainstream modes such as text-to-image (txt2img), image-to-image (img2img), inpainting, and outpainting.
Rich extension ecosystem: The built-in plugin system supports popular extensions such as ControlNet, LoRA, and T2I-Adapter to enhance generation control.
Integrated training and fine-tuning: Includes DreamBooth, LoRA, and Textual Inversion for custom model fine-tuning.
Cross-platform deployment: Runs on Windows, Linux, macOS, and Google Colab, and supports both CPU and GPU (NVIDIA/AMD) hardware.
User-friendly: The web interface built with Gradio offers visual parameter configuration and is suitable for users from beginners to professionals.
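When the service is launched with the --api flag, the same text-to-image capability is available over REST, as in this sketch with illustrative parameters:

```python
# Text-to-image through the Stable-Diffusion-WebUI REST API.
import base64
import requests

payload = {
    "prompt": "a watercolor painting of a lighthouse at dawn",
    "steps": 20,
    "width": 512,
    "height": 512,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# Generated images are returned base64-encoded.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```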
CosyVoice-frontend/CosyVoice-backend
CosyVoice is a next-generation high-fidelity speech synthesis model with voice cloning capabilities. It can clone a target voice from a prompt audio clip in under 30 seconds and supports cross-lingual voice replication. It is suitable for scenarios such as customer service dialogues, audiobook narration, and short video dubbing. The frontend/backend split deployment is built for performance: backend instances handle 80% of the total compute load, and with lossless acceleration technology, one backend instance can serve the traffic of eight frontend instances, which increases throughput and reduces latency by 25%.
CosyVoice-WebUI
Overview
CosyVoice is a next-generation high-fidelity speech synthesis model with voice cloning capabilities. It can clone a target voice from a prompt audio clip in under 30 seconds and supports cross-lingual voice replication. It is suitable for scenarios such as customer service dialogues, audiobook narration, and short video dubbing. PAI-EAS packages this model with an integrated visual WebUI for the rapid deployment of cloud-based speech inference services.
Key features
Zero-shot voice cloning: Replicates target voices from just 3 to 10 seconds of reference audio for personalized speech generation.
Multilingual and cross-lingual synthesis: Supports Chinese, English, Japanese, Korean, and other languages while maintaining voice consistency across languages.
Emotion and fine-grained control: Precisely controls vocal details such as emotion, laughter, and breathing through natural language descriptions.
Highly human-like: Matches human speech in intonation, rhythm, and pauses, and significantly outperforms traditional TTS technologies.
Real-time streaming synthesis: Supports low-latency streaming text-to-speech output for real-time interactive scenarios.
Full-stack toolchain: Provides complete capabilities from inference and training to deployment for industrial-grade application integration.
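A deployed service can be invoked over HTTP, as in the sketch below. The endpoint, token header, and payload fields are all placeholders and assumptions; the actual request schema depends on the service version, so take the real values from your EAS service's invocation information:

```python
# Invoke a deployed CosyVoice-WebUI service over HTTP (schema is assumed).
import requests

resp = requests.post(
    "https://<service-endpoint>/tts",                       # placeholder endpoint
    headers={"Authorization": "<eas-service-token>"},       # placeholder token
    json={
        "text": "Welcome to the demo.",                     # text to synthesize
        "prompt_audio_url": "https://example.com/ref.wav",  # 3-10 s reference clip
    },
    timeout=60,
)
resp.raise_for_status()
with open("tts_output.wav", "wb") as f:
    f.write(resp.content)
```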
SGLang
Overview
SGLang (Structured Generation Language) is a high-performance large language model inference and serving framework. It uses a co-designed "frontend language + backend runtime" architecture. The frontend provides a structured generation programming language for controllable output logic, while the backend is an optimized inference engine (SGLang Runtime) that delivers low-latency, high-throughput model serving.
Key features
Structured controllable generation: Natively supports precise output format control using JSON Schema, regular expressions, and other constraints to overcome the limitations of traditional prompt engineering.
High-performance inference: Uses innovative optimizations such as RadixAttention and Radix Cache to achieve 3 to 5 times higher throughput than mainstream solutions.
Multimodal support: Works with both text-only LLMs and vision-language models (VLMs), and supports multimodal inputs such as images and video.
Flexible integration: Offers a simple Python API that can replace the OpenAI API for complex prompt workflows, which lowers development barriers.
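The frontend language makes constrained output explicit. The following sketch uses regex-constrained generation against a running SGLang Runtime; the endpoint and the model served behind it are assumptions:

```python
# Regex-constrained generation with SGLang's frontend language.
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def rate_review(s, review):
    s += "Review: " + review + "\n"
    # Constrain the answer to a single digit 1-5, enforced at decode time.
    s += "Score (1-5): " + sgl.gen("score", regex=r"[1-5]")

state = rate_review.run(review="Fast shipping, product as described.")
print(state["score"])
```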
TensorFlow-Serving
Overview
TensorFlow Serving is a high-performance open-source machine learning model serving system. As a core component of the TensorFlow Extended (TFX) ecosystem, it rapidly deploys trained TensorFlow models in the SavedModel format as online inference services and exposes them using gRPC and RESTful APIs.
Key features
Model version management: Supports parallel loading of multiple model versions and seamless hot updates for phased releases and rollbacks.
High-performance inference: Production-optimized scheduling and batching mechanisms ensure low-latency, high-throughput service.
Out-of-the-box integration: Natively supports the TensorFlow SavedModel format without requiring additional conversion.
Extensible architecture: Offers pluggable components such as Servable, Source, and Manager for custom loading logic and serving policies.
Multi-protocol support: Provides both gRPC (high performance) and HTTP/REST (easy integration) interfaces to accommodate different client needs.
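A REST prediction request against the default port 8501 looks like the following; the model name and input shape are hypothetical and must match the SavedModel signature:

```python
# Query a TensorFlow Serving model over the REST interface.
import requests

url = "http://localhost:8501/v1/models/my_model:predict"  # hypothetical model name
payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}            # one input row

resp = requests.post(url, json=payload)
resp.raise_for_status()
print(resp.json()["predictions"])
```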
Core image list
Images for Lingjun resources (Serverless)
| Image name | Framework | Instance type | CUDA | Operating system | Region | Language & version |
| --- | --- | --- | --- | --- | --- | --- |
| deepspeed-training:23.06-gpu-py310-cu121-ubuntu22.04 | DeepSpeed | GPU | 12.1 | Ubuntu 22.04 | China (Ulanqab) | Python 3.10 |
| megatron-training:23.06-gpu-py310-cu121-ubuntu22.04 | Megatron | GPU | 12.1 | Ubuntu 22.04 | China (Ulanqab) | Python 3.10 |
| nemo-training:23.06-gpu-py310-cu121-ubuntu22.04 | NeMo | GPU | 12.1 | Ubuntu 22.04 | China (Ulanqab) | Python 3.10 |
AIGC images
| Image name | Framework | Instance type | CUDA | Operating system | Supported regions | Language & version |
| --- | --- | --- | --- | --- | --- | --- |
| stable-diffusion-webui:4.2 | StableDiffusionWebUI 4.2 | GPU | 12.4 | Ubuntu 22.04 |  | Python 3.10 |
| stable-diffusion-webui:4.1 | StableDiffusionWebUI 4.1 | GPU | 12.4 | Ubuntu 22.04 |  | Python 3.10 |