This topic describes how to use SQL to quickly deploy a behavior sequence transformer (BST) model in AnalyticDB for MySQL to execute machine learning jobs, including data processing and model creation, training, evaluation, and prediction.
Scenarios
You can use SQL queries to quickly deploy models to implement data preprocessing, training, and inference.
AnalyticDB for MySQL supports BST models. The BST models are suitable for scenarios in which user behavior patterns and preferences are analyzed to predict future trends and provide personalized recommendations. For example, in the gaming or e-commerce industry, long-term dependencies between user behaviors can be captured to accurately understand and predict user behavior patterns and preferences to provide personalized services and recommendations.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
The minor version of the AnalyticDB for MySQL cluster is 3.2.4.0 or later.
NoteTo query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the
SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.The AI resource group feature is enabled. The AI resource group feature is in public preview. To enable the feature, contact technical support.
Step 1: Create an AI resource group and associate a general resource group with the AI resource group
If you want to use SQL to execute machine learning jobs, the following types of resource groups are required:
AI resource group: manages the GPU resources required for machine learning jobs to support compute-intensive operations in model training and prediction.
General resource group: processes regular SQL queries, such as generating training data or executing prediction functions.
SQL statements are first submitted to a general resource group for execution. If AI computing is required to execute the SQL statements, the statements are automatically resubmitted to the AI resource group associated with the general resource group.
Perform the following steps to create an AI resource group and associate a general resource group with the AI resource group:
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Clusters page, click an edition tab. Find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Cluster Management > Resource Management. On the Resource Management page, click the Resource Groups tab.
In the upper-right corner of the Resource Groups tab, click Create Resource Group. In the Create Resource Group panel, configure the parameters that are described in the following table to create an AI resource group.
Parameter
Description
Resource Group Name
The name of the resource group. The resource group name must be 2 to 30 characters in length and can contain letters, digits, and underscores (_). The resource group name must start with a letter.
Job Type
The job type. Select AI from the drop-down list.
ImportantIf the AI resource group feature is disabled for your AnalyticDB for MySQL cluster, no AI option is available. In this case, contact technical support.
Specifications
The combination of GPU, CPU, and memory resources. Select ADB.MLLarge.24, ADB.MLLarge.2, or ADB.MLAdvavced.6.
Minimum Resources
The minimum number of resources.
Maximum Resources
The maximum number of resources.
Click OK. The AI resource group is created.
Find the general resource group that you want to manage and click Modify in the Actions column. In the Modify Resource Group panel, associate the general resource group with the created AI resource group in the ML Job Resubmission Rules section.
Step 2: Process data
Data processing is the process of converting raw business data into a format suitable for model training. Data processing is highly dependent on three key factors: the format of the raw data, storage location of the raw data, and input data format requirements for model training.
Table schema requirements for model training
Input feature column: The value of the column is a string of integers separated by commas (,). Each entry specifies a feature vector of a sample.
Result label column: The column uses one of two values, such as 0 and 1, to indicate the category label.
Table schema example: ('1,2,3',0), ('3,2,1',1).
Data processing method
If the format of your raw data is different from the format required for model training, use the following method to process your data:
Upload the JAR package that contains the Spark data processing program to an Object Storage Service (OSS) bucket.
When you submit a Spark job, configure the parameters required for the job. For more information, see Spark application configuration parameters.
If your raw data meets the required format for model training, you can skip this step.
Step 3: Create and train a machine learning model
Create a machine learning model, define the model parameters, specify training data, and then check the status of the model.
In the left-side navigation pane, choose
.On the SQLConsole tab, enter the following SQL statements to create and train a model:
-- Create a model.
-- Specify a resource group.
/*+resource_group=itrain*/
CREATE MODEL bstdemo.bst-- The name of the model.
OPTIONS (
model_type='bst_classification', -- The type of the model.
feature_cols=(event_list), -- The input feature columns.
target_cols=(target), -- The result label columns.
hyperparameters = ( -- Other model hyperparameters.
use_best_ckpt = 'False',
early_stopping_patience='0'
)
)
AS SELECT event_list, target FROM bstdemo.adb; -- The data source of the model. The results of the query are provided to the preceding variables for training.
-- Query the training status of the model. After the training is complete, the status changes to READY.
SHOW MODEL bstdemo.bst;
Step 4: Evaluate the model
Evaluate the created machine learning model to verify the model correctness.
-- Similar to the syntax for creating a model.
/*resource_group=rg1*/
EVALUATE MODEL bstdemo.bst
OPTIONS (
feature_cols=(event_list),
target_cols=(target),
)
AS SELECT event_list, target FROM bstdemo.adb01;
Step 5: Perform model prediction
Select feature columns from a table and pass the values of the columns to the trained model. Query the prediction results of the data.
-- Use the trained model to perform prediction.
-- In the ML_PREDICT() function, the first parameter specifies the name of the model, and the second parameter specifies the input feature columns.
SELECT ML_PREDICT('bstdemo.bst', event_list) FROM bstdemo.adb02;