Machine Learning Designer is the one-stop visual modeling tool for machine learning in Platform for AI (PAI). It is built on the cloud-native PAIFlow engine and is the upgraded version of Machine Learning Studio. Machine Learning Designer provides multiple proven built-in machine learning algorithms and supports large-scale distributed computing on resources such as MaxCompute, Flink, and general training resources. It can meet your business requirements in different scenarios, such as product recommendation, financial risk management, and advertising prediction.
Architecture
Features
You can log on to the PAI console by using an Alibaba Cloud account or as a Resource Access Management (RAM) user to use Machine Learning Designer. If you want to log on as a RAM user, you must grant the required permissions to the RAM user by using your Alibaba Cloud account. For more information, see Grant the permissions that are required to use Machine Learning Designer.
Machine Learning Designer allows you to create a pipeline from scratch or by using a template. If you create a pipeline from a template, you can directly deploy the models generated by the pipeline after the pipeline successfully runs. For information about how to create and manage pipelines, see Pipeline overview.
Machine Learning Designer provides hundreds of components that encapsulate algorithms used in AI development and supports multiple data sources such as MaxCompute tables and Object Storage Service (OSS) buckets. In Machine Learning Designer, you can use these algorithm components to create models and deploy the models to Elastic Algorithm Service (EAS).
Machine Learning Designer also allows you to manage pipeline tasks and versions, and roll back a pipeline version. For more information, see Model training.
Machine Learning Designer provides a visualized dashboard feature to help you analyze the generated data, model information, and evaluation metrics during model training to obtain an optimal model.
In Machine Learning Designer, you can share pipelines with members of the current workspace, deploy pipelines to DataWorks as periodic tasks, and publish pipelines as custom templates. For more information, see Use DataWorks tasks to schedule pipelines in Machine Learning Designer and Create a pipeline from a custom template.
In Machine Learning Designer, you can register ready-to-deploy models to the model management module to deploy them as individual model services or composite models. For more information, see Overview of model prediction.
Pipeline components
Machine Learning Designer provides hundreds of components to meet your requirements in various scenarios. For more information, see Component reference: Overview of all components.
Based on their use scenarios, the components fall into the following categories:
Traditional machine learning components
These components are used in data preprocessing, feature engineering, statistical analysis, outlier detection, recommendation, time series analysis, and network analysis.
Deep learning framework components
These components provide visual, audio, or natural language processing algorithms based on the PAI-Easy framework and other deep learning frameworks such as TensorFlow and PyTorch.
Custom algorithm components
These components include SQL Script, Python Script, and PyAlink Script. You can use these components to create custom pipelines based on your business requirements.
Based on their frameworks and supported computing resources, the components can be divided into Alink-based components and PAI command-based components. Each type has its own features:
Alink-based algorithm components (marked with a purple dot) can run on MaxCompute compute engines, Realtime Compute for Apache Flink resources, and general training resources.
Alink-based algorithm components can be packed in pipelines for deployment. For more information, see Deploy a pipeline as an online service.
Alink components can be created and run in groups. For more information, see Advanced feature: Alink components run in groups.
PAI command-based components can be used by specifying component parameters directly or by running PAI commands in the SQL Script component, DataWorks DataStudio, or the MaxCompute client.
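As a rough illustration of the second option, the following Python sketch assembles a PAI command string that could then be pasted into the SQL Script component or the MaxCompute client. The general shape `PAI -name <component> -project <project> -D<parameter>=<value>;` is an assumption here, as are the component name `split`, the `algo_public` project, the parameter names, and the table names. Check the reference page of the component you actually use before you run anything like this.

```python
def build_pai_command(name, params, project="algo_public"):
    """Format a PAI command string from a component name and its parameters.

    Assumption: commands follow the shape
    PAI -name <component> -project <project> -D<key>=<value> ...;
    """
    parts = [f"PAI -name {name}", f"-project {project}"]
    # Each component parameter is passed as a -D<key>=<value> pair.
    parts += [f"-D{key}={value}" for key, value in params.items()]
    return " ".join(parts) + ";"


# Hypothetical component, parameter, and table names, for illustration only.
command = build_pai_command(
    "split",
    {
        "inputTableName": "my_input",
        "output1TableName": "my_train",
        "output2TableName": "my_test",
        "fraction": 0.8,
    },
)
print(command)
```

The helper only builds text. It does not submit the command, so it is safe to use for checking the parameters before you run the command in an actual client.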
Use Machine Learning Designer
The following figure shows the process of using Machine Learning Designer.
Before you can use Machine Learning Designer to train a model, you must create a pipeline. You can create a pipeline in several ways. Choose one based on your business requirements.
On the pipeline configuration tab of Machine Learning Designer, drag components to the canvas, configure the computing resources for the components, and connect the components to create a pipeline. Then, run the pipeline to fine-tune the trained model. After you run the pipeline, you can schedule it as a periodic task so that the models generated by the pipeline are updated automatically.
(Optional) View visualized reports.
After you train a model, you can view the analysis reports on the visualized dashboard to check if the model works as expected.
After you train a model, you can deploy the model in the production environment to perform inference on new data.
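Conceptually, connecting components on the canvas produces a directed graph in which each component runs only after its upstream components finish. The following Python sketch models that idea with the standard `graphlib` module (Python 3.9 or later). It is not the way Machine Learning Designer or PAIFlow is implemented, and the component names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each component maps to the set of its upstream components.
pipeline = {
    "read_table": set(),
    "split": {"read_table"},
    "train": {"split"},
    "predict": {"train", "split"},
    "evaluate": {"predict"},
}


def run_order(dag):
    """Return an execution order in which every component follows its upstream components."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

If the connections form a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which reflects why a pipeline of this kind needs to be acyclic in order to run.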
Pipeline scheduling engine: PAIFlow
PAIFlow is the pipeline scheduling engine of Machine Learning Designer. You can schedule a pipeline by submitting the pipeline task to PAIFlow.
The Pipeline Tasks page of PAIFlow displays all pipeline tasks, including tasks that are manually run in Machine Learning Designer and tasks that are periodically scheduled by using DataWorks. For more information, see Manage pipeline tasks.