Platform For AI: Configure FeatureStore projects

Last Updated: Nov 07, 2024

A FeatureStore project has an offline data store and an online data store. Each project is independent. Online and offline feature tables in a project can be shared among project members. This topic describes how to configure a FeatureStore project.

Prerequisites

  • Offline and online data stores are created. For information about how to create data stores, see Configure data stores.

  • A label table is stored in an offline data store.

    A label table stores the labels used for model training. It contains the target attributes of model training and the join IDs of the feature entities. For a recommendation system, the label table is generally generated by grouping the data in the behavior table based on user_id, item_id, or request_id.

    Sample label table

    The following sample statement shows how to create a label table that contains a common set of fields.

    CREATE TABLE IF NOT EXISTS rec_sln_demo_fs_rec_sln_demo_sorting_label_table_v3 
    (
        request_id string
        ,user_id string
        ,page string
        ,net_type string
        ,day_h bigint COMMENT 'The hour at which the behavior occurred'
        ,week_day bigint COMMENT 'The day of the week on which the behavior occurred'
        ,day_min string
        ,event_unix_time bigint
        ,item_id string
        ,playtime double
        ,is_click BIGINT
        ,ln_playtime DOUBLE
        ,is_praise BIGINT
    )
    PARTITIONED BY 
    (
        ds string
    )
    LIFECYCLE 90
    ;
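As a rough illustration of how such a label table could be derived, the following Python sketch groups rows of a behavior log by request_id, user_id, and item_id. The behavior row layout, the event names ("click", "praise", "play", "expose"), and the definition of ln_playtime as ln(playtime + 1) are assumptions made for this example and are not defined by FeatureStore.

```python
import math
from collections import defaultdict

# Hypothetical behavior log: one row per user action. The event names and
# this row layout are assumptions made for illustration only.
behavior_rows = [
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "click", "playtime": 0.0},
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "event": "play", "playtime": 12.5},
    {"request_id": "r1", "user_id": "u1", "item_id": "i2", "event": "expose", "playtime": 0.0},
]


def build_label_rows(rows):
    """Group behavior rows by (request_id, user_id, item_id) and derive labels."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["request_id"], row["user_id"], row["item_id"])
        groups[key].append(row)

    labels = []
    for (request_id, user_id, item_id), events in groups.items():
        # Total play time of the item within this request.
        playtime = sum(e["playtime"] for e in events if e["event"] == "play")
        labels.append({
            "request_id": request_id,
            "user_id": user_id,
            "item_id": item_id,
            "playtime": playtime,
            "is_click": int(any(e["event"] == "click" for e in events)),
            "is_praise": int(any(e["event"] == "praise" for e in events)),
            # Assumed definition: natural log of (playtime + 1).
            "ln_playtime": math.log(playtime + 1.0),
        })
    return labels


label_rows = build_label_rows(behavior_rows)
```

In practice this grouping would more likely be written as a SQL job against the behavior table in the offline data store. The sketch only shows which columns come from the grouping keys and which are aggregated labels.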

Create a project

  1. Go to the FeatureStore page.

    1. Log on to the PAI console. In the left-side navigation pane, choose Data Preparation > FeatureStore.

    2. On the FeatureStore page, select a workspace from the Select Workspace drop-down list and click Enter FeatureStore.

  2. Click Create Project. On the Create Project page, configure the parameters.

    The following table describes the key parameters.

    Parameter

    Description

    Offline Store

    Select the offline data store that you created.

    Online Store

    Select the online data store that you created.

    Offline Table Lifecycle

    Specify the lifecycle for tables automatically created by FeatureStore and stored in the offline MaxCompute data store.

  3. Click Submit.

Create a feature entity

A feature entity is a collection of semantically related features that provide information about an object. For example, you can create two entities named user and item for a recommendation system.

  1. In the project list, click the name of a project to go to the Project Details page.

  2. On the Feature Entity tab, click Create Feature Entity. Configure the parameters in the right-side panel that appears.

    The following table describes the key parameters.

    Parameter

    Description

    Feature Entity Name

    The name of the feature entity. For example, you can create two entities named user and item for a recommendation system.

    Join ID

    A field that associates a feature view with a feature entity. Each feature entity has a join ID. You can use join IDs to associate features across multiple feature views.

    Note

    Each feature view has a primary key (index) that can be used to retrieve features. The name of the primary key can be different from the name of the join ID.

    For example, you can set user_id (the primary key of the user table) and item_id (the primary key of the item table) as join IDs for a recommendation system.

  3. Click Submit.
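The role of the join ID can be sketched in a few lines of Python. The example below is a simplified in-memory model, not the FeatureStore SDK: the view names, the feature values, and the lookup_features helper are all hypothetical. It shows two feature views that share the join ID user_id, one of which uses a differently named primary key (uid).

```python
# Hypothetical feature views. Each view records the join ID of its feature
# entity and the name of its own primary key column. Rows are keyed by the
# value of that primary key.
feature_views = {
    "user_profile": {
        "join_id": "user_id",
        "primary_key": "uid",  # name differs from the join ID name
        "rows": {"u1": {"age": 30}},
    },
    "user_stats": {
        "join_id": "user_id",
        "primary_key": "user_id",
        "rows": {"u1": {"click_7d": 12}},
    },
    "item_profile": {
        "join_id": "item_id",
        "primary_key": "item_id",
        "rows": {"i1": {"category": "video"}},
    },
}


def lookup_features(join_values):
    """Collect features from every view whose join ID appears in join_values."""
    merged = {}
    for view_name, view in feature_views.items():
        key = join_values.get(view["join_id"])
        if key is None:
            continue  # no value supplied for this view's feature entity
        row = view["rows"].get(key, {})
        for name, value in row.items():
            merged[f"{view_name}.{name}"] = value
    return merged


features = lookup_features({"user_id": "u1", "item_id": "i1"})
```

Because both user views are associated through the same join ID, a single user_id value is enough to gather features from each of them.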

Create a feature view

A feature view contains a logical collection of features and their derived features. A feature view is a subset of a feature entity and helps ensure consistency between offline and online features.

  1. On the Project Details page, click the Feature View tab, and then click Create Feature View.

  2. Configure the parameters in the right-side panel that appears and click Submit.

    • You can create an offline feature view to register the offline feature data.

    • You can create a real-time feature view to register the real-time feature data.

      For more information about real-time features, see the What is real-time feature section of this topic.

    The following tables describe key parameters of offline and real-time feature views.

Create an offline feature view

The following table describes the key parameters of an offline feature view.

Parameter

Description

Type

The type of feature view that you want to create. To create an offline feature view, set this parameter to Offline.

Write Mode

  • Use Offline Table: The feature view uses the schema of a feature table in an offline data store.

    If you select this mode, you must specify the feature table that you want to use and the data store in which the table is stored.

  • Customize Table Schema: The feature view uses a custom table schema.

    If you select this mode, you must add fields and configure field attributes.

Configure the following field attributes:

  • Primary Key: a field that is used as the primary key of the feature view.

  • Event Time and Partition Field: fields that contain the timestamps of events. Add at least one field as the event time field or partition field.

Synchronize Online Feature Table

Specify whether to synchronize the data in the feature view to the online data store in the same project.

Feature Entity

Select the feature entity that you want to associate with the feature view.

Note

A feature entity can be associated with multiple feature views.

Feature Lifecycle

Specify the lifecycle of the feature view. The lifecycle determines how long the features written to the online data store are retained.
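The effect of the feature lifecycle can be pictured as a simple retention check. The following Python sketch assumes that the lifecycle is expressed in days and that retention is measured from the time a feature row was written. The is_expired helper is hypothetical, and the actual expiry behavior of the online data store may differ.

```python
from datetime import datetime, timedelta, timezone


def is_expired(written_at, lifecycle_days, now):
    """Return True when a feature row is older than the view's lifecycle."""
    return now - written_at > timedelta(days=lifecycle_days)


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 11, 7, tzinfo=timezone.utc)
fresh = datetime(2024, 11, 1, tzinfo=timezone.utc)  # 6 days old
stale = datetime(2024, 9, 1, tzinfo=timezone.utc)   # 67 days old
```

With a lifecycle of 30 days, the row written on November 1 is still retained and the row written on September 1 is not.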

Create a real-time feature view

The following table describes the key parameters of a real-time feature view.

Parameter

Description

View Name

Specify a custom name as prompted in the console.

Type

The type of feature view that you want to create. To create a real-time feature view, set this parameter to Real Time.

Feature Entity

Select the feature entity that you want to associate with the feature view.

Note

A feature entity can be associated with multiple feature views.

Write Mode

Only Customize Table Schema is supported. The feature view uses a custom table schema.

If you select this mode, you must add fields and configure field attributes.

Configure the following field attributes:

  • Primary Key: a field that is used as the primary key of the feature view.

  • Event Time and Partition Field: fields that contain the timestamps of events. Add at least one field as the event time field or partition field.

Feature Field

Specify feature fields based on your requirements.

  • If you use FeatureDB, you do not need to specify Event Time. By default, FeatureDB creates the Event Time field and sets its value to the time at which the features are actually written. You can also define the Event Time field yourself and set its value when you write features.

  • If you do not use FeatureDB, you must specify Event Time and set its value when you write features, because the value is required later for the offline export of samples.

Feature Lifecycle

Specify the lifecycle of the feature view. We recommend that you set the value to a number larger than 1. Default value: 30. Unit: days.

Advanced Settings

You can configure advanced options by using a JSON string. Only the save_original_field field is supported.

  • {"save_original_field":"true"} indicates that tables in the MaxCompute data store have the same schema as those in the GraphCompute data store.

  • {"save_original_field":"false"} indicates that FeatureStore maps the schemas between the two data stores.

Note

In a GraphCompute data store, a field name cannot exceed 30 characters in length. If a MaxCompute data store contains fields whose names exceed 30 characters, you must specify {"save_original_field":"false"} to map the schemas.
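The rule in the preceding note can be checked before you fill in Advanced Settings. The following Python sketch is a hypothetical helper, not part of FeatureStore. It returns the JSON string to enter, based only on the 30-character limit stated above. Returning "true" when no field name exceeds the limit is a choice made for this example.

```python
import json

# Limit stated above for field names in a GraphCompute data store.
GRAPHCOMPUTE_MAX_FIELD_NAME_LENGTH = 30


def advanced_settings_for(field_names):
    """Return the Advanced Settings JSON string for the given MaxCompute field names."""
    too_long = [n for n in field_names if len(n) > GRAPHCOMPUTE_MAX_FIELD_NAME_LENGTH]
    # Names over the limit cannot be kept as-is, so schema mapping is required.
    value = "false" if too_long else "true"
    # Compact separators produce the same form as the examples above.
    return json.dumps({"save_original_field": value}, separators=(",", ":"))


short_names = ["user_id", "item_id", "click_cnt_1m"]
long_names = ["user_id", "a_very_long_feature_name_exceeding_limit"]
```

Calling advanced_settings_for(short_names) yields {"save_original_field":"true"}, and advanced_settings_for(long_names) yields {"save_original_field":"false"}.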

Create a label table

A label table stores the labels used for model training. It contains the target attributes of model training and the join IDs of the feature entities. For a recommendation system, the label table is generally generated by grouping the data in the behavior table based on user_id, item_id, or request_id.

  1. On the Project Details page, click the Label Table tab, and then click Create Label Table.

  2. In the right-side panel that appears, select the data store in which the label table you want to use is stored, and then select the table.

  3. Configure the fields in the label table and click Submit. The following table describes the field attributes that you can configure.

    Attribute

    Description

    Feature Field

    A field that contains feature data in the label table.

    FG Reserved Fields

    No configuration is required.

    Event Time

    A field that contains the timestamps of events in the label table.

    Label Field

    A field that contains the labels in the label table.

    Partition Field

    A partition field that segments the label table.

Create a model feature

Model features are the inputs that models use for training and serving. After you build a model based on the selected features, FeatureStore creates a training dataset in the MaxCompute data store for offline training. You can specify model features in Elastic Algorithm Service (EAS) or PAI-REC to automatically pull feature data from FeatureStore for model inference.

  1. On the Project Details page, click the Model Features tab and then click Create Model Feature.

  2. In the right-side panel that appears, configure the parameters and click Submit.

    The following table describes the key parameters.

    Parameter

    Description

    Select Feature

    Select a feature in the batch feature view and specify an alias.

    Label Table Name

    Select the label table that you created.

    Export Table Name

    By default, Automatically Created is selected, which indicates that a training dataset is automatically created in the MaxCompute data store for offline training.
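Conceptually, the training dataset is the label table joined with the selected features through the join IDs. The following Python sketch illustrates one way such a join could work. It assumes that each label row is matched with the newest feature row whose event time is not later than the label's event time. That matching rule, the sample data, and the helper names are assumptions made for this example and do not describe the FeatureStore implementation.

```python
# Hypothetical rows of one offline feature view, associated through user_id.
user_feature_rows = [
    {"user_id": "u1", "event_time": 100, "click_7d": 3},
    {"user_id": "u1", "event_time": 200, "click_7d": 5},
    {"user_id": "u2", "event_time": 150, "click_7d": 1},
]

# Hypothetical label rows with the join ID, an event time, and a label.
label_rows = [
    {"user_id": "u1", "event_unix_time": 180, "is_click": 1},
    {"user_id": "u2", "event_unix_time": 120, "is_click": 0},
]


def latest_feature_before(rows, user_id, label_time):
    """Pick the newest feature row for user_id that is not newer than the label."""
    candidates = [
        r for r in rows
        if r["user_id"] == user_id and r["event_time"] <= label_time
    ]
    return max(candidates, key=lambda r: r["event_time"], default=None)


def build_training_samples(labels, feature_rows):
    """Attach the matching feature value to each label row."""
    samples = []
    for label in labels:
        feature = latest_feature_before(
            feature_rows, label["user_id"], label["event_unix_time"]
        )
        sample = dict(label)
        # A missing feature is kept as None instead of dropping the sample.
        sample["click_7d"] = feature["click_7d"] if feature else None
        samples.append(sample)
    return samples


samples = build_training_samples(label_rows, user_feature_rows)
```

In this data, the label for u1 at time 180 receives the feature value written at time 100, because the value written at time 200 did not exist yet when the behavior occurred. The label for u2 has no earlier feature row, so its feature is None.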

What is real-time feature

Terms

A real-time feature is a feature whose value changes in real time, in some cases within milliseconds. Real-time features are usually generated or updated in systems such as servers and are used immediately for processing and decision-making. They are typically generated and consumed in real-time data stream analysis systems, which are characterized by high timeliness and rapid response.

Real-time features are usually extracted from data streams. Data stream systems such as Flink can calculate and generate real-time features that best reflect the current status. Real-time features require high performance and low latency across the entire pipeline. Because real-time features are dynamically updated, the system must continuously recalculate them.

Scenarios

Real-time features are used in the following typical scenarios:

  • Online advertising: Adjust advertisement content in real time based on the current browsing behavior of the user.

  • Fraud detection: Detect suspicious behavior in financial transactions in real time and trigger alerts or block transactions.

  • Personalized recommendation: Update the recommendation list in real time based on the current activities and historical data of the user.

  • IoT systems: Monitor and control devices in real time. Real-time features are generated and used in response to changes in the environment.

Use real-time features in recommendation and advertising systems

Write real-time features

After you create a real-time feature view in FeatureStore, a table with the same schema is automatically created in the online data engine for writing and reading the real-time features. When you use a data source such as FeatureDB, TableStore, Hologres, or GraphCompute, the backend can connect to the DataHub message queue. Data is transmitted to Flink through DataHub. Flink calculates the real-time features and writes them to the corresponding table in the online data source. You can view the table name on the details page of the real-time feature view.
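The kind of computation that the stream job performs can be sketched without Flink. The following Python example is a simplified stand-in: it keeps a sliding 60-second click count per user and writes the result to a dictionary that plays the role of the table in the online data source. The window length, the feature name click_cnt_1m, and the event layout are assumptions made for this example, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # hypothetical window length

# Stand-in for the table in the online data source, keyed by the primary key.
online_table = {}
# Per-user timestamps of recent clicks kept by the stream job.
recent_clicks = defaultdict(deque)


def process_event(event):
    """Update the real-time feature for one incoming click event."""
    user_id, ts = event["user_id"], event["ts"]
    window = recent_clicks[user_id]
    window.append(ts)
    # Drop clicks that have fallen out of the window.
    while window and window[0] <= ts - WINDOW_SECONDS:
        window.popleft()
    # Write the recomputed feature together with its event time.
    online_table[user_id] = {"click_cnt_1m": len(window), "event_time": ts}


for e in [
    {"user_id": "u1", "ts": 0},
    {"user_id": "u1", "ts": 30},
    {"user_id": "u1", "ts": 70},
]:
    process_event(e)
```

After the third event, the click at timestamp 0 has left the window, so the stored count for u1 is 2. Every incoming event triggers a recalculation, which is why the pipeline needs low latency end to end.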

Read online features

  • If you use EasyRec Processor, it provides the built-in FeatureStore SDK for Cpp. You only need to specify the model feature name (fs_model) to identify and read real-time features.

  • If you use FeatureStore SDK for Go or FeatureStore SDK for Java, you can read real-time features based on the SDK settings.

Export offline samples

FeatureStore automatically joins and exports the tables in the offline data engine that correspond to the feature views.

For real-time feature views:

  • If you use FeatureDB, FeatureDB automatically writes the data that is written online to the corresponding offline table in the offline data engine.

  • If you do not use FeatureDB, you need to create a task to write the data to the corresponding offline table in the offline data engine. You can also use the customize recommendation solutions feature in PAI-Rec to simulate real-time data offline and use the simulated data as the data in the corresponding offline table.

Real-time feature view in FeatureStore

Process of application

The real-time feature view in FeatureStore is designed to handle features that change in real time. Online features are written in real time through the DataHub message queue and Flink. EasyRec Processor then polls and reads the features, or you read them through the FeatureStore SDK, so that downstream services perceive changes within milliseconds.

Export procedure

You can select multiple real-time feature views and offline feature views to create a model feature to export. FeatureStore supports automatic export. The following list shows the source of the offline table that corresponds to the real-time feature views in different scenarios:

  • Online data source: FeatureDB. Recommendation engine: does not matter. Export procedure: Directly export through FeatureStore.

  • Online data source: Hologres, TableStore, or GraphCompute. Recommendation engine: PAI-REC (use customize recommendation solutions). Export procedure: Import the data simulated by the recommendation algorithm into the corresponding offline table. Then, export through FeatureStore.

  • Online data source: Hologres, TableStore, or GraphCompute. Recommendation engine: others. Export procedure: Manually export the corresponding offline table. Then, export through FeatureStore.
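The scenarios above amount to a small decision rule. The following Python sketch encodes it as a lookup helper. The export_procedure function and the string values that it accepts are hypothetical and exist only to make the branching explicit.

```python
def export_procedure(online_store, recommendation_engine):
    """Return how the offline table of a real-time feature view is filled."""
    if online_store == "FeatureDB":
        # The recommendation engine does not matter for FeatureDB.
        return "Directly export through FeatureStore."
    if online_store in {"Hologres", "TableStore", "GraphCompute"}:
        if recommendation_engine == "PAI-REC":
            return ("Import the data simulated by the recommendation algorithm "
                    "into the corresponding offline table. Then, export through "
                    "FeatureStore.")
        # Any other recommendation engine.
        return ("Manually export the corresponding offline table. Then, export "
                "through FeatureStore.")
    raise ValueError(f"unknown online data source: {online_store}")


featuredb_case = export_procedure("FeatureDB", "PAI-REC")
hologres_case = export_procedure("Hologres", "Others")
```

Only the FeatureDB case needs no preparation of the offline table. In the other cases, the offline table must be filled before FeatureStore can export samples.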

Synchronize procedure

Synchronization operations can be divided into the following two types:

What to do next

After configuring a FeatureStore project, you can use FeatureStore to manage features in a recommendation system.