The Filtering and Mapping component is a data preprocessing tool that filters data based on filter conditions that you define. It also lets you rename the columns that you select. This is useful during data cleaning and feature engineering, where it helps prepare datasets for subsequent analysis and modeling.
Configure the component
Method 1: Configure the component on the pipeline page
Add a Filtering and Mapping component on the pipeline page and configure the following parameters:
Parameter | Description |
Mapping Rules | The columns that you want to filter. By default, all columns are selected. You can also rename the columns. |
Filter Criteria | The filter condition, written in the same way as the WHERE clause of an SQL statement. Example: age>40. Note: Only the following operators are supported:
|
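To make the two parameters concrete, the following Python sketch imitates what the component does conceptually. It is not the component's implementation: the function name `filter_and_map`, the sample rows, and the output column name `customer_age` are all hypothetical, and applying the filter before the renaming is an assumption made for illustration.

```python
# Conceptual sketch only; not how the component is implemented.
def filter_and_map(rows, mapping, predicate):
    """Keep rows that satisfy `predicate`, then keep and rename mapped columns.

    rows:      list of dicts, one dict per input row
    mapping:   {source_column: output_column} (plays the role of Mapping Rules)
    predicate: function taking a row and returning True to keep it
               (plays the role of Filter Criteria)
    """
    result = []
    for row in rows:
        # Assumption: the condition is evaluated on the original column names.
        if predicate(row):
            result.append({new: row[old] for old, new in mapping.items()})
    return result


# Hypothetical sample data.
rows = [
    {"age": 35, "job": "admin"},
    {"age": 52, "job": "technician"},
]

# Equivalent in spirit to the filter criterion age>40, with age renamed.
mapped = filter_and_map(
    rows,
    {"age": "customer_age", "job": "job"},
    lambda r: r["age"] > 40,
)
print(mapped)
```

Only the second row satisfies the condition, so the result contains a single row with the renamed `customer_age` column.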
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name Filter
-project algo_public
-DoutTableName="test_9"
-DinputPartitions="pt=20150501"
-DinputTableName="bank_data_partition"
-Dfilter="age>=40";
Parameter | Required | Description |
outTableName | Yes | The name of the output table. |
inputPartitions | No | The partitions to select from the input table. To select the full table, set this parameter to None. |
inputTableName | Yes | The name of the input table. |
filter | No | The filter condition, written in the same way as the WHERE clause of an SQL statement. Example: age>40. |
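If you generate these commands from a script, a small helper can assemble the command text from the parameters. The following is a minimal sketch: `build_filter_command` is a hypothetical helper, not part of any PAI SDK, and the flag names are copied from the example command above. It only builds a string; it neither validates the filter expression nor submits the command.

```python
# Hypothetical helper: builds the text of a PAI Filter command.
# It does not run the command or check the filter expression.
def build_filter_command(input_table, output_table, filter_expr=None, partitions=None):
    parts = [
        "PAI -name Filter",
        "-project algo_public",
        f'-DoutTableName="{output_table}"',
    ]
    # Optional parameters are emitted only when a value is provided.
    if partitions is not None:
        parts.append(f'-DinputPartitions="{partitions}"')
    parts.append(f'-DinputTableName="{input_table}"')
    if filter_expr is not None:
        parts.append(f'-Dfilter="{filter_expr}"')
    return "\n".join(parts) + ";"


command = build_filter_command(
    input_table="bank_data_partition",
    output_table="test_9",
    filter_expr="age>=40",
    partitions="pt=20150501",
)
print(command)
```

With these arguments the helper reproduces the example command shown earlier, which you could then paste into the SQL Script component.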