This topic describes the syntax and parameters for model creation, upload, evaluation, and inference, and explains these operations with examples.
Overview
Built-in model
PolarDB for AI provides a built-in machine learning model named Tongyi Qianwen. You can use the PolarDB for AI feature provided by the model without the need for additional model configurations.
For more information, see the following topic:
Custom models
In actual business scenarios, you may need to optimize the algorithms and adjust the structures of models. A built-in model may not meet actual business requirements. PolarDB for AI allows you to build a custom model and run the model in confidential containers. This further ensures the data security of your model.
To build and use a custom model, do the following steps:
Externally trained models
The MLOps capabilities provided by PolarDB for AI allow you to upload, deploy, and use externally trained models. A wide array of external models are supported, such as sklearn-based LightGBM and GBDT and deep learning algorithms based on the TensorFlow or PyTorch framework.
To use an external model, do the following steps:
Create a model
This section describes how to create a model. A model is trained in an asynchronous manner. To check whether a model has been trained, query the status of the model.
Syntax
CREATE MODEL model_name WITH ( model_class = '', x_cols = '', y_cols='',model_parameter=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)
Parameters
Parameter | Description |
model_name | The name of the model. |
model_class | The type of the model. Valid values:
|
x_cols | The input columns for model training. Note Separate the names of multiple columns with commas (,). |
y_cols | The output columns for model training. |
model_parameter | The parameters for model training.
|
select_expr | The name of the column. |
table_reference | The name of the table. |
Examples
Create a model named airlines_gbm
.
/*polar4ai*/CREATE MODEL airlines_gbm WITH (model_class='lightgbm', x_cols ='Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay',model_parameter=(boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)) as (SELECT * FROM db4ai.airlines_train)
Sample result:
Query OK, 0 rows affected (0.79 sec)
Upload a model
This section describes how to upload externally trained machine learning models to PolarDB for AI and use the platform to manage models. PolarDB for AI supports the PMML, ONNX, and Checkpoint (applicable to TensorFlow and PyTorch platforms) model formats.
Syntax
UPLOAD MODEL model_name WITH (model_location = '', req_location = '')
Parameters
Parameter | Description |
model_name | The name of the model. |
model_location | The file path of the model. |
req_location | The path of the file on which the model depends. |
Examples
Upload a model named my_model
to the PolarDB for AI platform.
/*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx');
Sample result:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to query the status of the model.
/*polar4ai*/ SHOW MODEL my_model;
Sample result:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
If the value of the modelStatus
parameter is saved
, the model has been uploaded.
Deploy a model
This section describes how to deploy an uploaded model on the PolarDB for AI platform. After the deployment is complete, the model is in the online state. A model in the online state can be quickly called for inferences.
Syntax
DEPLOY MODEL model_name
Parameters
Parameter | Description |
model_name | The name of the model. |
Examples
Deploy a model named my_model
on PolarDB for AI.
/*polar4ai*/ DEPLOY MODEL my_model;
Sample result:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to view the status of the model:
/*polar4ai*/ SHOW MODEL my_model;
Sample result:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| serving | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
If the value of the modelStatus
parameter is serving
, the model is deployed.
View a model
View a model list
This section describes how to view all models in the current cluster.
Syntax
SHOW MODELS
Examples
/*polar4ai*/SHOW MODELS
Sample result:
+-----------------------+-----------------+--------------+
| model_name | model_class | model_status |
+-----------------------+-----------------+--------------+
| airline_rfr | randomforestreg | saved |
| gbrt1 | gbrt | saved |
| airline_deepfm | deepfm | saved |
| airlines_gbm | lightgbm | saved |
| lgbm1 | lightgbm | saved |
| blackfriday_linearreg | linearreg | saved |
+-----------------------+-----------------+--------------+
6 rows in set (0.24 sec)
View the status of a model
You can view the current status of a model. Use the following syntax to check whether the model is trained. A model can be in one of the following states:
training: The model is being trained.
loading_data: Data is being loaded to the model.
trained: The model is trained.
saved: The model is saved.
serving: The model is in service.
deleted: The model is deleted.
Syntax
SHOW MODEL model_name
Parameters
Parameter | Description |
model_name | The name of the model. |
Examples
Query the status of the airlines_gbm
model.
/*polar4ai*/SHOW MODEL airlines_gbm;
Sample result:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| modelStatus | modelPath |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
| saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx |
+-------------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
The validity period of a modelPath
is 100 minutes. To determine whether the link is still valid, check the value of the Expires
parameter in the modelPath
. Make sure that you access the link within the validity period.
View model details
Syntax
DESCRIBE MODEL model_name
Parameters
Parameter | Description |
model_name | The name of the model. |
Examples
Query the details of the airlines_gbm
model.
/*polar4ai*/DESCRIBE MODEL airlines_gbm;
Sample result:
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| model_name | model_description |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| airlines_gbm | basic information:model_name:airlines_gbm model_class:lightgbm feature important:features,imp_gain,imp_splitAirline,0.3327,0.0376 AirportFrom,0.2178,0.1842 Time,0.1893,0.1999 AirportTo,0.1668,0.187 DayOfWeek,0.0384,0.1236 Length,0.0307,0.1269 Flight,0.0242,0.1408 |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.65 sec)
Evaluate a model
This section describes how to evaluate a trained model.
Syntax
SELECT select_expr [, select_expr] ... FROM EVALUATE (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference ) WITH (x_cols = '', y_cols='', metrics='')
Parameters
Parameter | Description |
select_expr | The name of the column displayed in the model evaluation result. |
model_name | The name of the model. |
select_expr_for_prediction | The name of the column used for model evaluation. |
table_reference | The name of the table. |
x_cols | The input columns for model training. Separate the names of multiple columns with commas (,). |
y_cols | The output columns for model training. |
metrics | The metrics for model evaluation. Valid values:
|
Examples
Evaluate a trained model named
airlines_gbm
./*polar4ai*/SELECT Delay FROM evaluate(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', metrics='acc');
Sample result:
+--------------------------------------+ | task_id | +--------------------------------------+ | df05244e-21f7-11ed-be66-xxxxxxxxxxxx | +--------------------------------------+ 1 row in set (0.95 sec)
Execute the following statement to view the status of the task:
/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`;
Sample result:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | taskStatus | filePath | results | startTime | endTime | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ 1 row in set (0.24 sec)
NoteThe validity period of a
filePath
is 100 minutes. To check whether the link is still valid, view the value of theExpires
parameter in thefilePath
. Make sure that you access the link within the validity period.
Use a model for inference
This section describes how to use a trained model for online or offline inference.
Syntax
SELECT select_expr [, select_expr] ... FROM PREDICT (MODEL model_name, SELECT select_expr_for_prediction [, select_expr_for_prediction] ... FROM table_reference LIMIT row_count) WITH (x_cols= '')
Parameters
Parameter | Description |
select_expr | The name of the column displayed in the model inference result. |
model_name | The name of the model. |
select_expr_for_prediction | The name of the column used for model inference. |
table_reference | The name of the table. |
mode | The inference mode. The value async specifies offline inference. If you leave this parameter empty, online inference is used. |
row_count | The number of samples used for model inference. |
x_cols | The input columns for model training. Separate the names of multiple columns with commas (,). |
Examples
Online inference
Use the
airlines_gbm
model for online inference./*polar4ai*/SELECT Delay FROM PREDICT(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test LIMIT 10) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay');
Sample result:
+-------+-------------------+ | Delay | predicted_results | +-------+-------------------+ | 1 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 0 | 0 | | 1 | 0 | | 0 | 0 | | 0 | 0 | | 1 | 0 | +-------+-------------------+ 10 rows in set (0.74 sec)
Offline inference
If the number of inference samples is greater than 1,000, PolarDB for AI prompts you to use an offline inference task.
ERROR 9050 (HY000): Please limit the SQL selected data length to less than '1000' or convert to offline prediction
Create an offline inference task.
/*polar4ai*/SELECT Delay FROM predict(MODEL airlines_gbm, SELECT * FROM db4ai.airlines_test) WITH (x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', mode='async');
Sample result:
+--------------------------------------+ | task_id | +--------------------------------------+ | bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | +--------------------------------------+ 1 row in set (0.75 sec)
The preceding query result shows
task_id
, which can be used to query the task status and the download URL of the corresponding result./*polar4ai*/SHOW TASK `bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx`
Sample result:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | taskStatus | filePath | results | startTime | endTime | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ | finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 | +------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+ 1 row in set (0.24 sec)
NoteThe validity period of a
filePath
is 100 minutes. To check whether the link is still valid, view the value of theExpires
parameter in thefilePath
. Make sure that you access the link within the validity period.
Manage offline evaluation and inference tasks
View the list of offline tasks
This section describes how to view all offline tasks in the current cluster.
Offline tasks are batch processing tasks that run for a long period of time and do not generate results in real time. Offline tasks include offline evaluation and offline inference tasks.
Syntax
SHOW TASKS
Examples
Execute the following statement to view all offline tasks in the current cluster:
/*polar4ai*/SHOW TASKS;
Sample result:
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| task_id | task_type | task_status | start_timestr | end_timestr |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
| 2cba0c74-1f8f-11ed-934a-xxxxxxxxxxxx | prediction | finish | 2022-08-19 15:18:51.206829 | |
| 77b3a186-1f94-11ed-8eaa-xxxxxxxxxxxx | evaluation | finish | 2022-08-19 15:56:44.465594 | |
| 972547a4-1fa3-11ed-9c6b-xxxxxxxxxxxx | evaluation | finish | 2022-08-19 17:44:59.790353 | 2022-08-19 17:45:23.750100 |
| bd0c1722-21e7-11ed-94a8-xxxxxxxxxxxx | prediction | finish | 2022-08-22 14:57:51.355240 | 2022-08-22 14:58:18.316818 |
| df05244e-21f7-11ed-be66-xxxxxxxxxxxx | evaluation | finish | 2022-08-22 16:53:20.381577 | 2022-08-22 16:53:37.189953 |
| ec956db8-21fb-11ed-8400-xxxxxxxxxxxx | evaluation | finish | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+--------------------------------------+------------+-------------+----------------------------+----------------------------+
9 rows in set (0.18 sec)
View the status of an offline task
This section describes how to view the status of an offline task. A task may be in one of the following states:
init: The task is being initialized.
running: The task is being executed.
finish: The task is complete.
fail: The task failed.
Syntax
SHOW TASK `task_id`
Parameters
Parameter | Description |
task_id | The ID of the task. |
Examples
Query the status of the task whose task ID is df05244e-21f7-11ed-be66-xxxxxxxxxxxx
.
/*polar4ai*/SHOW TASK `df05244e-21f7-11ed-be66-xxxxxxxxxxxx`
Sample result:
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| taskStatus | filePath | results | startTime | endTime |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
| finish | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/pc-xxxxxxx/airlines_gbm/xxxxx.csv?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | {"acc": 0.6694} | 2022-08-22 17:22:21.122101 | 2022-08-22 17:22:39.428811 |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------+----------------------------+
1 row in set (0.24 sec)
The validity period of a filePath
is 100 minutes. To check whether the link is still valid, view the value of the Expires
parameter in the filePath
. Make sure that you access the link within the validity period.
Delete a model
This section describes how to delete a model that is no longer required.
Syntax
DROP MODEL model_name
Parameters
Parameter | Description |
model_name | The name of the model. |
Examples
Delete the airlines_gbm
model.
/*polar4ai*/DROP MODEL airlines_gbm
If a result similar to the following is returned, the airlines_gbm
model is deleted:
Query OK, 0 rows affected (0.57 sec)
Best practices for custom model management
This section walks you through custom model management. Offline model training and online inference are performed in the following examples.
Preform offline training.
The following script uses the LightGBM algorithm for offline training:
# coding: utf-8 from pathlib import Path import pandas as pd from sklearn.metrics import mean_squared_error import lightgbm as lgb import joblib def train_model(): print('Loading data...') # load or create your dataset df_train = pd.read_csv('regression.train', header=None, sep='\t') df_test = pd.read_csv('regression.test', header=None, sep='\t') y_train = df_train[0] y_test = df_test[0] X_train = df_train.drop(0, axis=1) X_test = df_test.drop(0, axis=1) # create dataset for lightgbm lgb_train = lgb.Dataset(X_train, y_train) lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train) # specify your configurations as a dict params = { 'boosting_type': 'gbdt', 'objective': 'regression', 'metric': {'l2', 'l1'}, 'num_leaves': 31, 'learning_rate': 0.05, 'feature_fraction': 0.9, 'bagging_fraction': 0.8, 'bagging_freq': 5, 'verbose': 0 } print('Starting training...') # train gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, callbacks=[lgb.early_stopping(stopping_rounds=5)]) print('Saving model...') # save model to file # gbm.save_model('model.txt') joblib.dump(gbm, 'lgb.pkl') print('Starting predicting...') # predict y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration) # eval rmse_test = mean_squared_error(y_test, y_pred) ** 0.5 print(f'The RMSE of prediction is: {rmse_test}')
In the preceding script, the model file is exported in the
pkl
format. Thepredict
method is called to query the inference result. The python files on which the model depends are also required.The following example shows the content of the
requirements.txt
file:lightgbm==3.3.3
Upload a model.
Execute the following statement to upload the model to PolarDB for AI:
/*polar4ai*/UPLOAD MODEL my_model WITH (model_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx', req_location='https://xxxx.oss-cn-hangzhou.aliyuncs.com/xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx')
The
model_location
parameter specifies the path of the model file. Thereq_location
parameter specifies the path of the file on which the model depends. You can prepare the two files in advance, upload the files to your Object Storage Service (OSS) bucket, and then execute the preceding statement to upload the files to PolarDB for AI.Sample result:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to query the status of the model.
/*polar4ai*/ SHOW my_model;
Sample result:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+ | modelStatus | modelPath | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ | saved | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.23 sec)
If the value of the
modelStatus
parameter issaved
, the model is uploaded.
Deploy a model.
Execute the following statement to deploy the model on PolarDB for AI:
/*polar4ai*/ DEPLOY MODEL my_model;
Sample result:
Query OK, 0 rows affected (0.29 sec)
Execute the following statement to query the status of the model.
/*polar4ai*/ SHOW MODEL my_model;
Sample result:
+-------------+-----------------------------------------------------------------------------------------------------------------------------+ | modelStatus | modelPath | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ | serving | http://db4ai-collie-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/xxxxx.pkl?OSSAccessKeyId=xxxxxx&Expires=xxxx&Signature=xxxxxx | +-------------+-----------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.23 sec)
If the value of the
modelStatus
parameter isserving
, the model is deployed.
Implement online inference.
Execute the following statement to use the model to perform an online inference task:
/*polar4ai*/ SELECT Y FROM PREDICT(MODEL my_model, SELECT * FROM db4ai.regression_test LIMIT 10) WITH (x_cols = 'x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28', y_cols='');
Sample result:
+------+---------------------+ | Y | predicted_results | +------+---------------------+ | 1.0 | 0.6262147669037363 | | 0.0 | 0.5082804008241021 | | 0.0 | 0.37533158372209957 | | 1.0 | 0.461974928099089 | | 0.0 | 0.3777339456553666 | | 0.0 | 0.35045096227525735 | | 0.0 | 0.4178165504012342 | | 1.0 | 0.40869795422774036 | | 1.0 | 0.6826481286570045 | | 0.0 | 0.47021259543154736 | +------+---------------------+ 10 rows in set (0.95 sec)