This topic describes the scenarios in which feature management is applicable in PolarDB for AI. This topic also describes how to create, update, and delete a feature, and view feature status and the feature list.
Scenarios
In most cases, it is difficult to build an AI model by using raw data. Before raw data is input into the model, the raw data is preprocessed or pre-calculated. The raw and preprocessed data are used as features.
For example, if a data set contains multiple items related to a house, such as the size, number of bedrooms, geographic location, and year of construction, each item can be considered a feature.
The selection and preparation of features are essential for building an effective AI model. Ideal features can greatly improve the performance of the model, while irrelevant or redundant features may adversely impact the learning process of the model and degrade model performance.
Feature engineering involves operations related to features, such as selecting the most relevant features, converting raw data into a format that is suitable for machine learning algorithms, and creating new features derived from the raw data.
Features are commonly used in the following scenarios:
Classification
Email classification: uses words and phrases as features to identify spam emails.
Image recognition: uses features such as pixel values, colors, textures, and shapes to classify images.
Customer churn prediction: uses features related to customers, such as their historical transactions, engagement level, and purchase frequency, to predict the retention rate of customers in the platform.
Regression
Housing price prediction: uses features such as the location, size, age, and amenities of a house to predict its future prices.
Stock price prediction: uses features such as historical prices, trading volume, and economic indicators to predict future stock prices.
Clustering
Market segmentation: uses features such as the purchase history and preferences of customers, and demographic information to group customers.
Social network analysis: uses features such as the social activity data and connectivity of users to discover groups in social networks.
Recommendation
Movie recommendation: uses features such as the movies watched by users and the ratings, type, and actors of the movies to recommend movies.
Product recommendation: uses features such as the purchase history and browsing behavior of users, and product attributes to recommend products.
Time series analysis
Weather forecast: uses features such as historical weather conditions, dates, and solar terms to predict future weather conditions.
Financial forecast: uses features such as the financial statements and market trends of a company to forecast the financial status of the company.
Anomaly detection
Credit card fraud detection: uses features such as the time, location, and amount of a transaction to identify unusual transactions.
Network security: uses features such as network traffic data, connection type, and traffic size to detect network attacks.
Natural language processing (NLP)
Sentiment analysis: uses features such as words, phrases, and sentence structures to analyze the emotional tendencies of text.
Machine translation: uses features such as words, grammatical structure, and context to assist in machine translation.
Computer vision
Facial recognition: uses features such as key points, contours, and textures of the face to identify individuals.
Object detection: uses features such as the shape, size, and color of an object to detect objects in an image.
Features are widely used in all fields. Feature selection and engineering are critical steps in the model building process. They directly affect the performance of models.
Create a feature
Features are created in an asynchronous manner. You can check whether a feature has been created by querying its status.
Syntax
Create a feature by using data in a database.
CREATE FEATURE feature_name WITH (feature_class= '',parameters=()) AS (SELECT select_expr[, select_expr] ... FROM table_reference)
Create a feature by using an existing feature.
CREATE FEATURE feature_name WITH (feature_class = '',parameters=()) USING FEATURE feature_reference
Parameters
Parameter | Description |
feature_name | The name of the feature. |
feature_class | The type of the feature. Valid values:
|
parameters | Custom parameters for creating the feature. |
select_expr | The name of the column used to create the feature. |
table_reference | The name of the table containing the column used to create the feature. |
feature_reference | The name of an existing feature. |
Examples
Create a feature by using data in a database.
/*polar4ai*/create feature test_graph_1 with (feature_class='graph') as (select ip, user_id from small_sample limit 30)
Sample result:
Query OK, 0 rows affected (0.79 sec)
Create a feature by using an existing feature.
/*polar4ai*/create feature test_subgraph_1 with (feature_class='connected_subgraph') using feature test_graph_1
Sample result:
Query OK, 0 rows affected (0.79 sec)
Update a feature
Features are updated in an asynchronous manner. You can check whether a feature has been updated by querying its status.
Syntax
UPDATE FEATURE feature_name WITH (feature_class = '',parameters=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)
Parameters
Parameter | Description |
feature_name | The name of the feature. |
feature_class | The type of the feature. Set the value to graph. |
parameters | Custom parameters for creating the feature. |
select_expr | The name of the column used to update the feature. Note
|
table_reference | The name of the table containing the column used to update the feature. |
Examples
Modify the columns of the test_subgraph_1
feature to the ip
and user_id
columns in the small_sample
table.
/*polar4ai*/update feature test_graph_1 with (feature_class='graph') as (select ip, user_id from small_sample limit 30)
Sample result:
Query OK, 0 rows affected (0.79 sec)
View feature status
View the current status of a feature. In most cases, you can check whether a feature is created or updated after you create or update the feature. A feature may be in one of the following states:
INIT: The feature is initialized.
PREPARING: The data for the feature is being prepared.
PREPARED: The data for the feature is prepared.
COMPUTING: The feature is being computed.
COMPUTED: The feature is computed.
RESTORING: The feature results are being stored.
RESTORED: The feature results are stored.
FINISHED: The feature computing is successful.
FAILED: The feature computing failed.
Syntax
SHOW FEATURE feature_name
Parameters
Parameter | Description |
feature_name | The name of the feature. |
Examples
View the status of the test_subgraph_1
feature.
/*polar4ai*/show feature test_subgraph_1;
Sample result:
+-----------+-------------------------------------------------------------------------------------------------------------------------------+
| CSGStatus | featurePath |
+-----------+-------------------------------------------------------------------------------------------------------------------------------+
| FINISHED | http://db4ai-collie-cn-hangzhou-dataset.oss-cn-hangzhou.aliyuncs.com/xxxx.txt?OSSAccessKeyId=xxxx&Expires=xxxx&Signature=xxxx |
+-----------+-------------------------------------------------------------------------------------------------------------------------------+
The validity period of featurePath
is 100 minutes. You can use the Expires
parameter in featurePath
to check whether the link is valid. You must access the link within the validity period.
View the feature list
You can view all features for the current cluster.
Syntax
SHOW FEATURES
Examples
/*polar4ai*/show features;
Sample result:
+--------------------+-----------------+----------------+
| feature_class | feature_name | feature_status |
+--------------------+-----------------+----------------+
| connected_subgraph | test_subgraph_1 | DELETED |
| graph | test_graph_1 | FINISHED |
+--------------------+-----------------+----------------+
2 rows in set (0.32 sec)
Delete a feature
You can delete an existing feature.
Syntax
DROP FEATURE feature_name
Parameters
Parameter | Description |
feature_name | The name of the feature. |
Examples
Deletes the existing feature test_subgraph_1
.
/*polar4ai*/drop feature test_subgraph_1
The following result indicates that the feature has been deleted:
Query OK, 0 rows affected (0.69 sec)