Platform for AI: Features

Updated: Nov 01, 2024

Platform for AI (PAI)

The following tables describe the features of each PAI module.

AI computing resource management

| Feature | Description | Reference |
| --- | --- | --- |
| Lingjun resources | PAI provides Lingjun resources for large-scale, high-density computing. Lingjun resources deliver the heterogeneous computing power required for high-performance AI training and computing. You can use Lingjun resources for training jobs in PAI. | Lingjun resource quotas |
| General training resources | General training resources are deep learning training resources based on Container Service for Kubernetes (ACK). They provide scalable, stable, easy-to-use, and high-performance runtimes for training deep learning models. | General computing resource quotas |
| Other big data computing resources | Big data computing resources, such as MaxCompute and Realtime Compute for Apache Flink. | Overview of AI computing resources |

Workspaces

| Feature | Description | Reference |
| --- | --- | --- |
| Resource management | The workspace administrator can associate the AI computing resources of the current Alibaba Cloud account with the workspace so that workspace members can use the resources for development and training. | Manage workspaces |
| Workspace notifications | PAI provides a notification mechanism for workspaces. You can create notification rules to track and monitor Deep Learning Containers (DLC) jobs or Machine Learning Designer pipelines, or to trigger events when the status of a model version changes. | Create a notification rule |
| Workspace storage and SLS configuration | The workspace administrator can specify the default storage path for development and training in the current workspace and the storage lifecycle of temporary tables. | Manage workspaces |
| Member and permission management | PAI uses role-based access control and provides multiple roles, such as labeling administrator, algorithm developer, and algorithm O&M engineer, to facilitate efficient collaboration. You can manage the visibility scope of AI assets in a workspace and the access permissions of different roles. | Manage members of a workspace |

QuickStart

| Feature | Description | Reference |
| --- | --- | --- |
| Model Hub | PAI provides various pre-trained models from open source communities, such as ModelScope and Hugging Face. | Deploy and train models |
| Pre-trained model training | You can use the pre-trained models for training in PAI. | Deploy and train models |
| Pre-trained model deployment | You can deploy the pre-trained models as services in PAI; see the sketch after this table. | Deploy and train models |
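
As a rough illustration of how Model Hub models can be used programmatically, the following sketch retrieves a PAI-provided pre-trained model and deploys it with PAI SDK for Python. The region, workspace ID, and model name are placeholder assumptions, and the `RegisteredModel` pattern should be verified against the Deploy and train models reference.

```python
from pai.session import setup_default_session
from pai.model import RegisteredModel

# Placeholder region and workspace; replace with your own values.
setup_default_session(region_id="cn-hangzhou", workspace_id="<workspace_id>")

# Retrieve a pre-trained model provided by PAI from Model Hub.
# The model name below is a placeholder, not a guaranteed catalog entry.
model = RegisteredModel(model_name="<pretrained_model_name>", model_provider="pai")

# Deploy the model as an online service by using its preset inference configuration.
predictor = model.deploy(service_name="quickstart_demo_service")
```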

Machine Learning Designer

| Feature | Description | Reference |
| --- | --- | --- |
| Pipeline building | Machine Learning Designer allows you to build and debug models by using pipelines. You can drag components onto the canvas to build a pipeline based on your business requirements. | Pipeline overview |
| Pipeline import and export | You can export a pipeline as a JSON file. You can also import a JSON file into a workspace to build a pipeline. | Export and import pipelines |
| Pipeline scheduling | You can use DataWorks to periodically schedule pipelines in Machine Learning Designer. | Use DataWorks tasks to schedule pipelines in Machine Learning Designer |
| Preset pipeline templates | PAI provides pipeline templates for various scenarios, such as product recommendation, news classification, financial risk control, haze weather prediction, heart disease prediction, agricultural loan issuance, and population census. The templates come with complete datasets and documentation to facilitate usage. | General solutions that use Machine Learning Designer |
| Custom pipeline templates | You can create a pipeline template from algorithm workflows that you develop and share the template with your team. Team members can directly perform modeling, deployment, and online verification based on the custom template. | Create a pipeline from a custom template |
| Dashboards | Machine Learning Designer provides dashboards that help you visualize data analysis, model analysis, and model results. | Use dashboards to view analytical reports |
| Preset algorithm component library | PAI provides hundreds of built-in algorithm components in categories such as data sources, data preprocessing, feature engineering, statistical analysis, machine learning, time series, recommendation algorithms, anomaly detection, natural language processing, network analysis, finance, visual algorithms, speech algorithms, and custom algorithms. | Component reference: Overview of all components |
| Custom algorithms | You can implement custom nodes by using multiple methods, such as SQL, Python, and PyAlink scripts. | Custom algorithm components |

Data Science Workshop (DSW)

| Feature | Description | Reference |
| --- | --- | --- |
| Cloud-native development environment | DSW provides a flexible, stable, easy-to-use, and high-performance environment for AI development, along with various CPU and GPU computing resources to facilitate training. | What is DSW? |
| DSW Gallery | DSW Gallery provides easy-to-use cases from various industries and technical verticals to help improve development efficiency. | Notebook Gallery |
| JupyterLab | DSW integrates open source JupyterLab and provides plug-ins for custom development. You can directly start a notebook to write, debug, and run Python code without O&M configuration. | Access a DSW instance |
| WebIDE | DSW provides WebIDE, in which you can install open source plug-ins for modeling. | Access a DSW instance |
| Terminal | DSW provides a terminal that you can use to debug models. | Access a DSW instance |
| Persistent instance environment | You can manage the lifecycle of the development environment, save the instance environment, mount and share data, and persist the environment as an image. | Mount datasets or OSS paths |
| Resource usage monitoring | You can view real-time resource usage in a visualized manner. | Access a DSW instance |
| Image creation | You can create an image and save it to Container Registry for subsequent distributed training or inference. | Manage DSW instances |
| SSH remote connection | DSW supports two SSH connection methods: direct connection and proxy client connection. You can select a connection method based on the resource dependencies, usage, and limits of each method to meet your business requirements. | Connect to a DSW instance over SSH |

Deep Learning Containers (DLC)

| Feature | Description | Reference |
| --- | --- | --- |
| Cloud-native distributed training environment | DLC is a deep learning platform based on Container Service for Kubernetes (ACK) that provides stable, easy-to-use, scalable, and high-performance runtimes for training deep learning models. | Before you begin |
| Dataset mounting | You can mount multiple datasets, such as File Storage NAS and Object Storage Service (OSS) datasets, in DLC at the same time. | Before you begin |
| Public and dedicated resource groups | DLC provides public and dedicated resource groups. | Before you begin |
| Official and custom images | DLC allows you to use official or custom images to submit training jobs. | Before you begin |
| Distributed training | DLC provides a distributed deployment solution that implements data parallelism, model parallelism, and hybrid parallelism; see the sketch after this table. | Create a training job |
| Training job management | DLC allows you to manage training jobs throughout their entire lifecycle. | Manage training jobs |
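
To make the training workflow concrete, here is a minimal sketch of submitting a DLC training job through PAI SDK for Python. The image URI, command, instance type, and OSS path are placeholder assumptions; see Create a training job and Submit a training job for the authoritative steps.

```python
from pai.estimator import Estimator

# A minimal sketch: submit a single-node training job to DLC.
# The image URI, instance type, and OSS path are placeholders.
est = Estimator(
    image_uri="<training_image_uri>",       # an official or custom image
    command="python train.py --epochs 10",  # entry command run in the container
    instance_type="ecs.gn6i-c4g1.xlarge",   # example GPU instance type
    instance_count=1,                       # increase for distributed training
)

# Mount an OSS dataset as an input channel named "train".
est.fit(inputs={"train": "oss://<your-bucket>/train-data/"})
```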

Elastic Algorithm Service (EAS)

| Feature | Description | Reference |
| --- | --- | --- |
| Resource group management | EAS uses resource groups to isolate resources. When you create a model service, you can deploy it in the public resource group provided by the system or in a dedicated resource group that you created. | Overview of EAS resource groups |
| Service and application deployment | You can deploy models that you downloaded from open source communities or models that you trained as inference services or AI-powered web applications in EAS. EAS provides multiple deployment methods. For example, you can use the PAI console to deploy models as API services. | Deploy a model service in the PAI console |
| Service debugging and stress testing | After you deploy a service, you can use the online debugging and stress testing features to test whether the service runs as expected. | Service debugging and stress testing |
| Auto scaling | You can configure auto scaling, scheduled scaling, and elastic resource pools for EAS services. | Service Auto Scaling |
| Service calls | EAS supports the following service call methods based on the network environment of the client: Internet access, VPC access, and VPC direct connection. A sketch of calling a service over the Internet follows this table. | Service calls |
| Asynchronous inference | EAS provides an asynchronous inference feature that allows you to obtain inference results by subscribing to requests or by polling. | Asynchronous inference services |
| Integrated resource group and service management capabilities | EAS provides standard OpenAPI operations and SDKs to support integration. | List of operations by function |
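
As an illustration of the Internet access call method, the following sketch sends a prediction request to a deployed EAS service. The endpoint URL, token, and payload format are placeholders that depend on your service; copy the real values from the service details page.

```python
import requests

# Endpoint and token are placeholders; copy the real values from the
# service details page in the PAI console.
url = "http://<service_endpoint>/api/predict/<service_name>"
token = "<service_token>"

# EAS authenticates requests through the Authorization header.
response = requests.post(
    url,
    headers={"Authorization": token},
    data=b'{"input": "example"}',  # the payload format depends on your model
    timeout=30,
)
response.raise_for_status()
print(response.content)
```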

AI computing asset management

| Feature | Description | Reference |
| --- | --- | --- |
| Datasets | PAI provides public datasets and supports dataset management during labeling and modeling. PAI also supports OSS and NAS datasets and SDK calls. | Create and manage datasets |
| Models | PAI allows you to manage model versions, lineages, evaluation metrics, and associated services in a centralized manner. | Register and manage models |
| Tasks | PAI supports the management of distributed training jobs and PAIFlow pipeline runs. | Job management |
| Images | PAI provides official images and supports image management. | View and add images |
| Code builds | You can register code repositories with PAI to facilitate code version management across PAI modules. | Code builds |
| Custom components | You can create custom algorithm components based on your business requirements and use them together with preset components in Machine Learning Designer to manage pipelines in a flexible manner. | - |

AutoML

| Feature | Description | Reference |
| --- | --- | --- |
| Automatic hyperparameter optimization (HPO) | HPO automatically tunes model-related parameters and training parameters. | How AutoML works |

Scenario-based solutions

| Feature | Description | Reference |
| --- | --- | --- |
| Multimedia analysis | PAI provides ready-to-use image-related services such as image labeling, classification, and quality evaluation. | Overview of multimedia analysis |

AI acceleration

| Feature | Description | Reference |
| --- | --- | --- |
| Dataset Accelerator | Dataset Accelerator (DatasetAcc) is a PaaS service developed by Alibaba Cloud to accelerate dataset access for AI workloads in the cloud. DatasetAcc pre-analyzes and preprocesses the training datasets used in machine learning training and provides dataset acceleration solutions for various cloud-native training engines. This helps improve overall training efficiency. | - |
| Easy Parallel Library (EPL) | EPL is an efficient and easy-to-use framework for distributed model training. EPL applies multiple training optimization technologies and provides easy-to-use APIs that allow you to apply parallelism strategies. You can use EPL to reduce the costs and improve the efficiency of distributed model training; see the sketch after this table. | Use EPL to accelerate AI model training |
| PAI-Rapidformer | PAI-Rapidformer applies various technologies to optimize the training of PyTorch transformer models and deliver optimal training performance. | Pai-Megatron-Patch overview |
| Blade | Blade integrates various optimization technologies. You can use Blade to optimize the inference performance of a trained model. | Overview of Blade |
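
To show what EPL's parallelism annotations look like, here is a minimal data-parallelism sketch based on the open source EPL examples for TensorFlow. Treat the exact API and semantics as assumptions and consult Use EPL to accelerate AI model training for details.

```python
# A minimal data-parallelism sketch based on the open source EPL examples
# (EPL targets TensorFlow 1.x training code).
import epl

# Initialize EPL before building the model.
epl.init()

# replicate(device_count=1) means each model replica occupies one GPU;
# EPL then replicates the model across all available GPUs (data parallelism)
# and handles gradient aggregation across replicas.
epl.set_default_strategy(epl.replicate(device_count=1))

# ... build the TensorFlow model and training loop as usual ...
```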

PAI-SDK

| Feature | Description | Reference |
| --- | --- | --- |
| Distributed model training | PAI SDK for Python provides an easy-to-use HighLevel API that allows you to submit training jobs to PAI and run them in the cloud. | Submit a training job |
| Service deployment | PAI SDK for Python provides an easy-to-use HighLevel API that allows you to deploy models to PAI and create inference services; see the sketch after this table. | Deploy an inference service |
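
To sketch the deployment path, the following example builds on the SDK's documented Model and InferenceSpec pattern to deploy a model stored in OSS as an online service. The processor name, OSS path, and instance type are placeholder assumptions; see Deploy an inference service for supported values.

```python
from pai.model import Model, InferenceSpec

# Describe how the model is served. The processor name is a placeholder;
# PAI also supports custom serving images through InferenceSpec.
model = Model(
    model_data="oss://<your-bucket>/path/to/model/",
    inference_spec=InferenceSpec(processor="<processor_name>"),
)

# Deploy the model to EAS as an online inference service.
predictor = model.deploy(
    service_name="sdk_demo_service",
    instance_type="ecs.c6.xlarge",  # example instance type
)

# Call the service through the returned predictor.
result = predictor.predict(["example input"])
print(result)
```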