A FeatureStore project has an offline data store and an online data store. Each project is independent. Online and offline feature tables in a project can be shared among project members. This topic describes how to configure a FeatureStore project.
Prerequisites
Offline and online data stores are created. For information about how to create data stores, see Configure data stores.
A label table is stored in an offline data store.
A label table stores the labels used for model training. It contains the target attributes of model training and the join IDs of the feature entities. For a recommendation system, the label table is generally generated by grouping the data in the behavior table based on user_id, item_id, or request_id.
Create a project
Go to the FeatureStore page.
Log on to the PAI console. In the left-side navigation pane, choose .
On the FeatureStore page, select a workspace from the Select Workspace drop-down list and click Enter FeatureStore.
Click Create Project. On the Create Project page, configure the parameters.
The following table describes the key parameters.
Parameter | Description |
Offline Store | Select the offline data store that you created. |
Online Store | Select the online data store that you created. |
Offline Table Lifecycle | Specify the lifecycle for tables automatically created by FeatureStore and stored in the offline MaxCompute data store. |
Click Submit.
Create a feature entity
A feature entity is a collection of semantically related features that provide information about an object. For example, you can create two entities named user and item for a recommendation system.
In the project list, click the name of a project to go to the Project Details page.
On the Feature Entity tab, click Create Feature Entity. Configure the parameters in the right-side panel that appears.
The following table describes the key parameters.
Parameter | Description |
Feature Entity Name | The name of the feature entity. For example, you can create two entities named user and item for a recommendation system. |
Join ID | A field that associates a feature view with a feature entity. Each feature entity has a join ID. You can use join IDs to associate features across multiple feature views. Note Each feature view has a primary key (index) that can be used to retrieve features. The primary key can be different from the name of the join ID. For example, you can set user_id (the primary key of the user table) and item_id (the primary key of the item table) as join IDs for a recommendation system. |
Click Submit.
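The relationship between join IDs and primary keys can be sketched in plain Python. This is a minimal illustration, not FeatureStore code: the view names, field names, and values are hypothetical, and in-memory dictionaries stand in for feature tables.

```python
# Hypothetical feature views; every name and value here is invented.
# Each view declares the join ID of its entity and its own primary key.
FEATURE_VIEWS = {
    "user_profile": {
        "join_id": "user_id",
        "primary_key": "user_id",
        "rows": {"u1": {"age": 31}, "u2": {"age": 24}},
    },
    "user_stats": {
        "join_id": "user_id",
        "primary_key": "uid",  # primary key name differs from the join ID
        "rows": {"u1": {"clicks_7d": 12}},
    },
    "item_profile": {
        "join_id": "item_id",
        "primary_key": "item_id",
        "rows": {"i9": {"price": 5.0}},
    },
}


def lookup(join_values, view_names):
    """Merge features from several views, keyed by each view's join ID."""
    features = {}
    for name in view_names:
        view = FEATURE_VIEWS[name]
        key = join_values.get(view["join_id"])
        # A missing key simply contributes no features.
        features.update(view["rows"].get(key, {}))
    return features
```

Calling `lookup({"user_id": "u1", "item_id": "i9"}, ["user_profile", "user_stats", "item_profile"])` collects features from all three views, even though `user_stats` names its primary key `uid`, because the views are matched through the join ID.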
Create a feature view
A feature view contains a logical collection of features and their derived features. A feature view is a subset of a feature entity and helps ensure consistency between offline and online features.
On the Project Details page, click the Feature View tab, and then click Create Feature View.
Configure the parameters in the right-side panel that appears and click Submit.
You can create an offline feature view to register the offline feature data.
You can create a real-time feature view to register the real-time feature data.
For more information about real-time features, see the What is real-time feature section of this topic.
The following tables describe key parameters of offline and real-time feature views.
Create an offline feature view
The following table describes the key parameters of an offline feature view.
Parameter | Description |
Type | The type of feature view that you want to create. To create an offline feature view, set this parameter to Offline. |
Write Mode | Configure the following field attributes: |
Synchronize Online Feature Table | Specify whether to synchronize the data in the feature view to the online data store in the same project. |
Feature Entity | Select the feature entity that you want to associate with the feature view. Note A feature entity can be associated with multiple feature views. |
Feature Lifecycle | Specify the lifecycle of the feature view. The lifecycle determines how long the features written to the online data store are retained. |
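The effect of the Feature Lifecycle parameter can be pictured with a small sketch. It assumes that the lifecycle is counted in days from the time a row is written and that a row is dropped once that period has fully elapsed; the function name and the boundary behavior are assumptions made for illustration, not the documented behavior of any data store.

```python
from datetime import datetime, timedelta, timezone


def is_expired(written_at, lifecycle_days, now):
    """Return True once a feature row has outlived the configured lifecycle.

    Assumption: the row expires when strictly more than `lifecycle_days`
    days have passed since it was written.
    """
    return now - written_at > timedelta(days=lifecycle_days)


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_valid = not is_expired(written, 30, datetime(2024, 1, 30, tzinfo=timezone.utc))
gone = is_expired(written, 30, datetime(2024, 2, 1, tzinfo=timezone.utc))
```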
Create a real-time feature view
The following table describes the key parameters of a real-time feature view.
Parameter | Description |
View Name | Specify a custom name based on the on-screen instructions. |
Type | The type of feature view that you want to create. To create a real-time feature view, set this parameter to Real Time. |
Feature Entity | Select the feature entity that you want to associate with the feature view. Note A feature entity can be associated with multiple feature views. |
Write Mode | Only Customize Table Schema is supported. The feature view uses a custom table schema. If you select this mode, you must add fields and configure field attributes. Configure the following field attributes: |
Feature Field | Specify feature fields based on your requirements. |
Feature Lifecycle | Specify the lifecycle of the feature view. We recommend that you set the value to a number larger than 1. Default value: 30. Unit: days. |
Advanced Settings | You can configure advanced options by using JSON messages. Only the save_original_field field is supported. Note In a GraphCompute data store, a field name cannot exceed 30 characters in length. If a MaxCompute data store contains fields whose names exceed 30 characters, you must specify |
Create a label table
A label table stores the labels used for model training. It contains the target attributes of model training and the join IDs of the feature entities. For a recommendation system, the label table is generally generated by grouping the data in the behavior table based on user_id, item_id, or request_id.
On the Project Details page, click the Label Table tab, and then click Create Label Table.
In the right-side panel that appears, select the data store in which the label table you want to use is stored, and then select the table.
Configure the fields in the label table and click Submit. The following table describes the field attributes that you can configure.
Attribute | Description |
Feature Field | A field that contains feature data in the label table. |
FG Reserved Fields | No configuration is required. |
Event Time | A field that contains the timestamps of events in the label table. |
Label Field | A field that contains the labels in the label table. |
Partition Field | A partition field that segments the label table. |
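To make the shape of a label table concrete, the following sketch derives one from a toy behavior table by grouping on user_id and item_id. The behavior rows, the `is_click` label, and the choice of the earliest timestamp as the event time are all hypothetical; a real label table is typically produced by a SQL job in the offline data store.

```python
from collections import defaultdict

# Hypothetical behavior-table rows: (user_id, item_id, event, event_time).
BEHAVIOR = [
    ("u1", "i1", "expose", 100),
    ("u1", "i1", "click", 105),
    ("u1", "i2", "expose", 110),
    ("u2", "i1", "expose", 120),
]


def build_label_table(behavior_rows):
    """Group behavior rows by (user_id, item_id) and emit one label row per group."""
    groups = defaultdict(list)
    for user_id, item_id, event, event_time in behavior_rows:
        groups[(user_id, item_id)].append((event, event_time))

    label_rows = []
    for (user_id, item_id), events in sorted(groups.items()):
        label_rows.append({
            "user_id": user_id,   # join ID of the user entity
            "item_id": item_id,   # join ID of the item entity
            "event_time": min(t for _, t in events),  # event time field (assumed: first event)
            "is_click": int(any(e == "click" for e, _ in events)),  # label field
        })
    return label_rows


LABEL_ROWS = build_label_table(BEHAVIOR)
```

Each resulting row carries the join IDs, an event time, and a label, which correspond to the field attributes described in the table above.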
Create a model feature
Model features are the inputs that models use for training and serving. After you build a model based on the selected features, FeatureStore creates a training dataset in the MaxCompute data store for offline training. You can specify model features in Elastic Algorithm Service (EAS) or PAI-REC to automatically pull feature data from FeatureStore for model inference.
On the Project Details page, click the Model Features tab and then click Create Model Feature.
In the right-side panel that appears, configure the parameters and click Submit.
The following table describes the key parameters.
Parameter | Description |
Select Feature | Select a feature in the batch feature view and specify an alias. |
Label Table Name | Select the label table that you created. |
Export Table Name | By default, Automatically Created is selected, which indicates that a training dataset is automatically created in the MaxCompute data store for offline training. |
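Conceptually, a training dataset combines each label row with the selected features, stored under their aliases. The sketch below illustrates that idea with in-memory dictionaries. It is not how FeatureStore performs the export, and the table names, feature names, and aliases are hypothetical.

```python
def export_training_rows(label_rows, selected_features, feature_tables):
    """Attach aliased features to each label row by join ID.

    selected_features: list of (table_name, join_id, feature_name, alias).
    feature_tables: {table_name: {join_value: {feature_name: value}}}.
    """
    training_rows = []
    for label in label_rows:
        row = dict(label)  # keep the label fields and join IDs
        for table_name, join_id, feature_name, alias in selected_features:
            features = feature_tables[table_name].get(label[join_id], {})
            row[alias] = features.get(feature_name)  # None when the feature is missing
        training_rows.append(row)
    return training_rows


# Hypothetical inputs for illustration.
LABELS = [{"user_id": "u1", "item_id": "i1", "is_click": 1}]
FEATURE_TABLES = {
    "user_fea": {"u1": {"age": 31}},
    "item_fea": {"i1": {"price": 5.0}},
}
SELECTED = [
    ("user_fea", "user_id", "age", "user_age"),
    ("item_fea", "item_id", "price", "item_price"),
]
TRAINING_ROWS = export_training_rows(LABELS, SELECTED, FEATURE_TABLES)
```

The alias keeps feature names unambiguous when features with the same name are selected from different feature views.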
What is real-time feature
Terms
A real-time feature is a feature whose value changes in real time, sometimes within milliseconds. Real-time features are usually generated or updated in systems such as servers and used immediately for processing and decision-making. They are typically generated and consumed in real-time data stream analysis systems, which are characterized by high timeliness and rapid response.
Real-time features are usually extracted from data streams. Stream processing systems such as Flink can calculate and generate real-time features that reflect the current status. Real-time features require high performance and low latency across the entire pipeline. Because real-time features are dynamically updated, the system must continuously recalculate them.
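As a rough illustration of continuous recalculation, the following sketch maintains a per-user count of events in a sliding time window, a common kind of real-time feature. It is plain Python rather than a Flink job; the class name and the window length are hypothetical, and the sketch assumes that events for a key arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events per key within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = {}  # key -> deque of event timestamps, oldest first

    def add(self, key, timestamp):
        """Record an event and return the refreshed count for the key."""
        window = self.events.setdefault(key, deque())
        window.append(timestamp)
        return self.count(key, timestamp)

    def count(self, key, now):
        """Drop timestamps that fell out of the window, then return the count."""
        window = self.events.get(key, deque())
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        return len(window)


counter = SlidingWindowCounter(window_seconds=60)
first = counter.add("u1", 0)    # one event in the window
second = counter.add("u1", 30)  # two events in the window
third = counter.add("u1", 70)   # the event at t=0 has aged out
```

The feature value is recomputed on every incoming event, which is why the text above stresses low latency across the whole pipeline.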
Scenarios
Real-time features are used in the following typical scenarios:
Online advertising: Adjust advertisement content in real time based on the current browsing behavior of the user.
Fraud detection: Detect suspicious behavior in financial transactions in real time and trigger alerts or block transactions.
Personalized recommendation: Update the recommendation list in real time based on the current activities and historical data of the user.
IoT systems: Monitor and control devices in real time. Real-time features are generated and used in response to changes in the environment.
Use real-time features in recommendation and advertising systems
Write real-time features
After you create a real-time feature view in FeatureStore, a table with the same schema is automatically created in the online data engine for writing and reading the real-time features. When you use a data source such as FeatureDB, TableStore, Hologres, or GraphCompute, the backend can connect to the DataHub message queue. Data is transmitted to Flink through DataHub. Flink calculates the real-time features and writes them to the corresponding table in the online data source. You can view the table name on the details page of the real-time feature view.
Read online features
If you use EasyRec Processor, it provides a built-in FeatureStore SDK for Cpp. You only need to specify the model feature name (fs_model) to identify and read real-time features.
If you use FeatureStore SDK for Go or FeatureStore SDK for Java, you can read real-time features based on the SDK settings.
Export offline samples
FeatureStore automatically joins and exports the tables in the offline data engine that correspond to the feature views.
For real-time feature views:
If you use FeatureDB, FeatureDB automatically writes the data written online to the corresponding offline table in the offline data engine.
If you do not use FeatureDB, you must create a task to write the data to the corresponding offline table in the offline data engine. You can also use the customize recommendation solutions feature in PAI-Rec to simulate real-time data offline and use the simulated data as the content of the corresponding offline table.
Real-time feature view in FeatureStore
Process of application
The real-time feature view in FeatureStore is designed to handle features that change in real time. Online features are written in real time through the DataHub message queue and Flink. EasyRec Processor then polls and reads the features, or an application reads them through the FeatureStore SDK, so that downstream systems perceive changes within milliseconds.
Export procedure
You can select multiple real-time feature views and offline feature views to create a model feature to export. FeatureStore supports automatic export. The following table shows the source of the offline table corresponding to the real-time feature views in different scenarios:
Online data source | FeatureDB | Hologres/TableStore/GraphCompute | Hologres/TableStore/GraphCompute |
Recommendation engine | Does not matter | PAI-REC (use customize recommendation solutions) | Others |
Export procedure | Directly export through FeatureStore. | Import the data simulated by the recommendation algorithm into the corresponding offline table. Then, export through FeatureStore. | Manually export the corresponding offline table. Then, export through FeatureStore. |
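The table above can be read as a small decision rule. The following sketch encodes it as a function. The function name, the boolean flag, and the returned strings are hypothetical labels chosen for illustration and are not part of any FeatureStore API.

```python
def offline_table_source(online_store, uses_pai_rec_custom_solution=False):
    """Return how the offline table of a real-time feature view gets its data."""
    if online_store == "FeatureDB":
        # The recommendation engine does not matter for FeatureDB.
        return "export directly through FeatureStore"
    if online_store in {"Hologres", "TableStore", "GraphCompute"}:
        if uses_pai_rec_custom_solution:
            return "import simulated data, then export through FeatureStore"
        return "manually export the offline table, then export through FeatureStore"
    raise ValueError(f"unknown online data source: {online_store}")
```

For example, `offline_table_source("Hologres", uses_pai_rec_custom_solution=True)` selects the simulated-data path, whereas any engine paired with FeatureDB resolves to a direct export.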
Synchronize procedure
Synchronization operations can be divided into the following two types:
Write through SDK. For more information, see FeatureStore SDK reference.
What to do next
After configuring a FeatureStore project, you can use FeatureStore to manage features in a recommendation system.