Filtering and Mapping

The Filtering and Mapping component is a data preprocessing tool that filters data by using a user-defined filter expression. You can also rename the columns that you select. The component is useful during data cleaning and feature engineering, where it helps prepare datasets for subsequent analysis and modeling.

Configure the component

Method 1: Configure the component on the pipeline page

Add a Filtering and Mapping component on the pipeline page and configure the following parameters:

Parameter	Description
Mapping Rules	The columns that you want to select. By default, all columns are selected. You can also rename the selected columns.
Filter Criteria	A filter expression similar to the WHERE clause of an SQL statement. Example: age>40. Note: Only the following operators are supported: =, !=, >, <, >=, <=, like, and rlike.
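To make the two parameters concrete, the following Python sketch mimics their effect on a small in-memory table. It is an illustration only, not how the component is implemented: the sample rows, the column name customer_age, and the helper functions are all hypothetical, and it assumes that like follows SQL wildcard rules (% and _) and that rlike behaves as a regular-expression search.

```python
import re

# Hypothetical in-memory rows standing in for an input table.
rows = [
    {"age": 35, "job": "technician"},
    {"age": 42, "job": "admin."},
    {"age": 58, "job": "management"},
]

def sql_like(value, pattern):
    """Emulate SQL LIKE: % matches any run of characters, _ matches one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def sql_rlike(value, pattern):
    """Assumed RLIKE behavior: a regular-expression search anywhere in the value."""
    return re.search(pattern, value) is not None

def filter_and_map(rows, predicate, mapping):
    """Keep rows that satisfy the predicate, then keep and rename the mapped columns."""
    return [
        {new: row[old] for old, new in mapping.items()}
        for row in rows
        if predicate(row)
    ]

# Filter Criteria "age>40", with a Mapping Rule that renames age to customer_age.
result = filter_and_map(
    rows,
    lambda r: r["age"] > 40,
    {"age": "customer_age", "job": "job"},
)
print(result)
```

A criterion such as job like 'man%' would correspond to the predicate `lambda r: sql_like(r["job"], "man%")` in this sketch.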

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name Filter
    -project algo_public
    -DoutTableName="test_9"
    -DinputPartitions="pt=20150501"
    -DinputTableName="bank_data_partition"
    -Dfilter="age>=40";

Parameter	Required	Description
outTableName	Yes	The name of the output table.
inputPartitions	No	The partitions to read from the input table. To read the full table, set this parameter to None.
inputTableName	Yes	The name of the input table.
filter	No	A filter expression similar to the WHERE clause of an SQL statement. Example: age>40.
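If you generate these commands from a script, a small helper can assemble the command text and leave out the optional parameters when they are not set. The sketch below is one possible approach, not part of PAI: the function build_filter_command is hypothetical, it uses the flag names exactly as they are written in the sample command above, and it does no escaping of quotation marks inside values.

```python
def build_filter_command(input_table, output_table, partitions=None, filter_expr=None):
    """Assemble the text of a PAI Filter command (hypothetical helper)."""
    # The input and output table names are the two required parameters.
    if not input_table or not output_table:
        raise ValueError("input and output table names are required")
    parts = [
        "PAI -name Filter",
        "    -project algo_public",
        f'    -DoutTableName="{output_table}"',
    ]
    # Optional parameters are emitted only when a value is supplied.
    if partitions:
        parts.append(f'    -DinputPartitions="{partitions}"')
    parts.append(f'    -DinputTableName="{input_table}"')
    if filter_expr:
        parts.append(f'    -Dfilter="{filter_expr}"')
    return "\n".join(parts) + ";"

command = build_filter_command(
    "bank_data_partition",
    "test_9",
    partitions="pt=20150501",
    filter_expr="age>=40",
)
print(command)
```

With the arguments shown, the helper reproduces the sample command; the resulting text can then be pasted into an SQL Script component.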