This topic describes how to use PolarDB for AI to generate an AI-augmented table (AAT) that contains the inference results of AI models. AATs can be used in various scenarios to help you gain insights into data and make informed decisions.
What is an AAT
An AAT is a table where one or more columns contain the inference results of AI models. Example:
In the table in Figure 1, the TripID column is the primary key, and the Airline, Flight, AirportFrom, AirportTo, DayOfWeek, Time, and Length columns contain information about each flight. The table in Figure 2 is the AAT of the table in Figure 1. The result column in the table contains AI-generated results. The AAT contains data and the inference results of AI models. By using the AAT stored in PolarDB databases, travelers can check whether their flight is likely to be delayed, and airlines can optimize their flight scheduling to improve punctuality.
Figure 1. Original table
Figure 2. AAT
Introducing AATs in relational databases allows for higher efficiency. For example, you can easily categorize user groups, predict sales amounts, and obtain AI-generated content by using the data stored in AATs. AATs are used in the same way as ordinary external tables
Connect an AI model and a database by using an AAT
The following example shows how to use an AAT.
Enable the cold data archiving feature. For more information, see Enable cold data archiving.
To prevent the AI model from frequently writing data to databases, PolarDB stores AATs as external tables in Object Storage Service (OSS). Therefore, you need to enable the cold data archiving feature before you use AATs.
Create a database named polar4ai. The AATs that are automatically generated are stored in the polar4ai database by default. When you create the database, select the utf8mb4 character set. The authorized database account must be the same as the account used by PolarDB for AI. You must also grant read and write permissions to the account.
Execute the following SQL statement to create an AI model:
/*polar4ai*/ CREATE MODEL airlines_gbm_copy1 WITH (model_class='lightgbm', x_cols ='Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', model_parameter=(boosting_type='gbdt', n_estimators=100, max_depth=8, num_leaves=256)) AS (SELECT * FROM airlines_train)
Query models in the database.
/*polar4ai*/SHOW MODELS;
The following output is returned:
Use the created model to perform an offline prediction task and generate an AAT.
/*polar4ai*/SELECT TripID,Delay FROM PREDICT ( MODEL airlines_gbm_copy1, SELECT * FROM airlines_train_1000_copy1) WITH (s_cols='TripID,Delay', x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length', y_cols='Delay', primary_key='TripID', mode='async',into_type='db') INTO lightgbm_v2_predict82201;
In the preceding statement,
lightgbm_v2_predict82201
is the name of the AAT that is automatically created, and TripID is the primary key of the AAT. After the execution of the preceding statement is completed, a task ID is returned. Example:babc6d66-xxxx-yyyy-a4b8-1b1426ce8614
.View the status of the offline task.
Check the task status by using the task ID. If the task is in the finish state, the AI model has completed data inference.
/*polar4ai*/SHOW TASK `babc6d66-xxxx-yyyy-a4b8-1b1426ce8614`
The following output is returned:
Query the schema of the AAT that is automatically generated in the polar4ai database.
SHOW CREATE TABLE polar4ai.lightgbm_v2_predict82201;
The following output is returned:
CREATE TABLE `lightgbm_v2_predict82201` ( `TripID` bigint(20) NOT NULL, `Delay` bigint(20) DEFAULT NULL, `result` text, PRIMARY KEY (`TripID`) ) ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ CONNECTION='default_oss_server'
The types and length of TripID and Delay in the AAT are the same as those of the original table. The result column contains the inference results of the AI model and is written to the AAT.
Query 1,000 rows of data in the AAT.
SELECT * FROM polar4ai.lightgbm_v2_predict82201 LIMIT 1000;
The following output is returned:
The preceding query results show that the newly generated result column contains prediction results and their probability values.
Join the AAT with a table in the database.
The
lightgbm_v2_predict82201
table can be used like any other tables in the database. The table can also be used in join operations. Example:SELECT * FROM airlines_train_1000_copy1, polar4ai.lightgbm_v2_predict82204 WHERE airlines_train_1000_copy1.TripID=polar4ai.lightgbm_v2_predict82204.TripID AND airlines_train_1000_copy1.Delay=1
The following output is returned: