SQLML is the SQL portal that MaxCompute provides for using the capabilities of Machine Learning Platform for AI (PAI). Under the hood, MaxCompute SQLML relies on PAI to create models, make predictions, and evaluate models. This topic describes the features of SQLML and the machine learning models, model prediction functions, and model evaluation functions that SQLML supports.
Description
- MaxCompute: provides SQLML, an SQL portal for you to use PAI.
- Client: the platform on which you run SQL statements. You can use DataWorks (recommended), the MaxCompute SDK (Java SDK or Python SDK), MaxCompute odpscmd, or MaxCompute Studio.
- PAI: provides machine learning models.
MaxCompute SQLML helps data developers, analysts, and data scientists use SQL to create, train, and apply machine learning models. It also lets SQL practitioners apply their existing SQL skills to use PAI capabilities without migrating data.
Usage notes
- Activate MaxCompute, DataWorks Basic, and the pay-as-you-go PAI services (PAI Studio, DSW, and EAS).
- Prepare a dataset.
The dataset is used to train models and make predictions.
- Create or configure a DataWorks workspace. Set Compute Engines to MaxCompute and Machine Learning Services to PAI Studio for the workspace.
- Use DataWorks to create a table and import the data in the dataset into the table.
- Process the imported data based on the requirements of the specific model, and create training datasets and test datasets. Training datasets are used to train models. Test datasets are used to make predictions with the trained models.
- Create a machine learning model and make predictions by using the model prediction functions provided by MaxCompute.
- Evaluate the accuracy of the prediction results by using the model evaluation functions provided by MaxCompute.
For usage examples, see Quick start.
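The step that creates training datasets and test datasets can be sketched as follows. This is a minimal illustration in plain Python, not part of SQLML: the function names, the `id` column, and the 20 percent test share are assumptions made for the example, and in practice the split would typically be performed on the MaxCompute table itself. Each row is assigned by hashing its identifier, so the same row always lands in the same dataset.

```python
import hashlib


def assign_split(row_id: str, test_percent: int = 20) -> str:
    """Deterministically map a row ID to 'train' or 'test'.

    test_percent is a hypothetical knob: the share of rows (0-100)
    that should end up in the test dataset.
    """
    digest = hashlib.md5(row_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "test" if bucket < test_percent else "train"


def split_rows(rows, id_key="id", test_percent=20):
    """Split a list of dict rows into (train, test) lists."""
    train, test = [], []
    for row in rows:
        if assign_split(str(row[id_key]), test_percent) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


if __name__ == "__main__":
    sample = [{"id": i, "feature": i * 0.5} for i in range(1000)]
    train_rows, test_rows = split_rows(sample)
    print(len(train_rows), len(test_rows))
```

Hashing the identifier instead of drawing random numbers keeps the split reproducible across runs, so a model can be retrained and re-evaluated on the same rows.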
Supported machine learning models
- Logistic regression for binary classification: The model name is logisticregression_binary. For more information, see Logistic Regression for Binary Classification.
- Logistic regression for multiclass classification: The model name is logisticregression_multi. For more information, see Logistic Regression for Multiclass Classification.
- Linear regression: The model name is linearregression. For more information, see Linear Regression.
Supported model prediction functions
ml_predict: the model prediction function. Syntax: ml_predict(model <model_name>, table <data_source>[, map<string, string> <parameters>])
- model_name: required. This parameter specifies the name of the model that is used for the prediction.
- data_source: required. This parameter specifies the data source used for the prediction, which can be a table or a SELECT statement.
- parameters: optional. This parameter specifies the parameters used for the prediction. The parameters are the same as those in PAI. For more information about the parameters, see Logistic Regression for Binary Classification, Logistic Regression for Multiclass Classification, or Linear Regression.
Supported model evaluation functions
- Binary classification evaluation: implemented by using the built-in function ml_evaluate. You can evaluate the model by using metrics such as area under curve (AUC), Kolmogorov-Smirnov (KS), and F1 score. Syntax: ml_evaluate(table <data_source>[, map<string, string> <parameters>])
- Multiclass classification evaluation: implemented by using the built-in function ml_multiclass_evaluate. You can evaluate a multiclass classification model based on its predicted and actual results. The evaluation metrics include accuracy, kappa, and F1 score. Syntax: ml_multiclass_evaluate(table <data_source>[, map<string, string> <parameters>])
- Linear regression evaluation: implemented by using the built-in function ml_regression_evaluate. You can evaluate a linear regression model based on its predicted and actual results. The evaluation output includes metrics and a residual histogram. The evaluation metrics include SST, SSE, SSR, R2, R, MSE, RMSE, MAE, MAD, MAPE, count, yMean, and predictMean. Syntax: ml_regression_evaluate(table <data_source>[, map<string, string> <parameters>])
- data_source: required. This parameter specifies the data to be evaluated. The data must include both the actual labels and the prediction results. The value can be a table or a SELECT statement.
- parameters: optional. This parameter specifies the parameters used for the evaluation. The parameters are the same as those in PAI. For more information about the parameters, see Logistic Regression for Binary Classification, Logistic Regression for Multiclass Classification, or Linear Regression.
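To make some of the metrics named above concrete, the following sketch computes accuracy and the binary F1 score for classification results, and a subset of the regression metrics (SST, SSE, SSR, R2, MSE, RMSE, MAE, count, yMean, and predictMean). It uses common textbook definitions in plain Python. The function names are hypothetical, and the exact formulas that PAI applies (for example, the divisor used for MSE) are not stated in this topic, so the values may differ from what the evaluation functions return.

```python
import math


def accuracy(labels, predictions):
    """Share of rows whose predicted label equals the actual label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def binary_f1(labels, predictions, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def regression_metrics(actual, predicted):
    """Textbook regression metrics from actual and predicted values."""
    n = len(actual)
    y_mean = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual sum of squares
    sst = sum((y - y_mean) ** 2 for y in actual)                # total sum of squares
    ssr = sum((p - y_mean) ** 2 for p in predicted)             # regression sum of squares
    mse = sse / n  # assumption: divide by the row count
    return {
        "count": n,
        "yMean": y_mean,
        "predictMean": sum(predicted) / n,
        "SST": sst,
        "SSE": sse,
        "SSR": ssr,
        "R2": 1 - sse / sst if sst else float("nan"),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(y - p) for y, p in zip(actual, predicted)) / n,
    }


if __name__ == "__main__":
    print(accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(binary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4]))
```

Both inputs correspond to what data_source must contain: one column of actual labels and one column of prediction results.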