Platform For AI: Filtering and Mapping

Last Updated: Nov 28, 2024

The Filtering and Mapping component is a data preprocessing tool that filters data based on filter conditions that you define as expressions. It also lets you select the columns to keep and rename them. The component is useful in the data cleaning and feature engineering stages, where it helps prepare datasets for subsequent analysis and modeling.

Configure the component

Method 1: Configure the component on the pipeline page

Add a Filtering and Mapping component on the pipeline page and configure the following parameters:

Parameter

Description

Mapping Rules

The columns that you want to select. By default, all columns are selected. You can also rename the selected columns.

Filter Criteria

The filter condition, written in the same way as the WHERE clause of an SQL statement. Only the data that meets the condition is kept. Example: age>40.

Note

Only the following operators are supported:

  • =

  • !=

  • >

  • <

  • >=

  • <=

  • like

  • rlike
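To make the two parameters concrete, the following Python sketch imitates the component on a few in-memory rows. It only illustrates the semantics described above and is not how PAI implements the component. The sample rows, the helpers `filter_and_map` and `like_to_regex`, and the renamed column `customer_age` are hypothetical. The sketch also assumes that the filter condition refers to the original column names and that `like` uses the usual SQL wildcards (`%` for any run of characters, `_` for one character).

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression.

    Assumes the usual SQL wildcards: % matches any run of characters,
    _ matches exactly one character. All other characters are literal.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def filter_and_map(rows, mapping, predicate):
    """Keep the rows that satisfy predicate, then select and rename columns.

    mapping is {original_column_name: output_column_name}; columns that are
    not listed in mapping are dropped from the output.
    """
    return [
        {new: row[old] for old, new in mapping.items()}
        for row in rows
        if predicate(row)
    ]


# Hypothetical sample data.
rows = [
    {"age": 35, "job": "admin.", "marital": "single"},
    {"age": 52, "job": "technician", "marital": "married"},
    {"age": 61, "job": "retired", "marital": "married"},
]

# Mapping Rules: keep age and job, and rename age to customer_age.
mapping = {"age": "customer_age", "job": "job"}

# Filter Criteria: age>40
older = filter_and_map(rows, mapping, lambda r: r["age"] > 40)

# Filter Criteria: job like 'tech%'
tech = like_to_regex("tech%")
technicians = filter_and_map(
    rows, mapping, lambda r: tech.match(r["job"]) is not None
)

print(older)
print(technicians)
```

In this sketch the condition is evaluated before the columns are renamed, which is why the predicates use `age` rather than `customer_age`. Whether the component itself behaves the same way is not stated in this topic, so verify it on a small table before relying on it.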

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name Filter
    -project algo_public
    -DoutTableName="test_9"
    -DinputPartitions="pt=20150501"
    -DinputTableName="bank_data_partition"
    -Dfilter="age>=40";

Parameter

Required

Description

outTableName

Yes

The name of the output table.

inputPartitions

No

The partitions of the input table that you want to process. Example: pt=20150501. If you want to select the full table, set the parameter to None.

inputTableName

Yes

The name of the input table.

filter

No

The filter condition, written in the same way as the WHERE clause of an SQL statement. Only the data that meets the condition is kept. Example: age>40.
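If you generate PAI commands from a script, a small helper can assemble the command text from the parameters listed above. The following Python sketch is one possible way to do this. The function `build_filter_command` and its argument names are hypothetical, and the output table flag uses the spelling from the command example (`-DoutTableName`). The helper only builds the string. It does not submit the command or escape double quotation marks inside the values.

```python
def build_filter_command(in_table, out_table, partitions=None,
                         filter_expr=None, project="algo_public"):
    """Assemble the text of a PAI Filter command (hypothetical helper).

    in_table and out_table are required. partitions and filter_expr are
    optional and are omitted from the command when they are not set.
    """
    if not in_table or not out_table:
        raise ValueError("The input table and output table names are required.")

    lines = [
        "PAI -name Filter",
        f"    -project {project}",
        f'    -DoutTableName="{out_table}"',
    ]
    if partitions:
        lines.append(f'    -DinputPartitions="{partitions}"')
    lines.append(f'    -DinputTableName="{in_table}"')
    if filter_expr:
        lines.append(f'    -Dfilter="{filter_expr}"')
    # The command ends with a semicolon, as in the example above.
    return "\n".join(lines) + ";"


command = build_filter_command(
    "bank_data_partition",
    "test_9",
    partitions="pt=20150501",
    filter_expr="age>=40",
)
print(command)
```

With these arguments, the helper reproduces the sample command shown earlier in this section. You can then paste the resulting text into the SQL Script component.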