Platform for AI: Anomaly Detection

Last Updated: Dec 27, 2024

Anomaly Detection identifies data points or patterns in a dataset that deviate significantly from normal data points or patterns. It is suitable for data with continuous or enumeration features. Anomaly Detection helps users detect potential errors, fraud, or exceptions to improve the accuracy and reliability of data analysis.

Configure the component

Method 1: Configure the component on the pipeline page

On the pipeline details page in Machine Learning Designer, add the Anomaly Detection component to the pipeline and configure the parameters described in the following table.

| Parameter | Description |
| --- | --- |
| Feature Columns | The feature columns on which you want to perform anomaly detection. |
| Anomaly Detection Method | The method used to detect anomalous data. Valid values: Box Plot, which is used to detect data with continuous features, and Attribute Value Frequency (AVF), which is used to detect data with enumeration features. |
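
The two methods can be illustrated with a minimal Python sketch. It assumes the conventional definitions: Box Plot flags values outside the range from Q1 - 1.5 × IQR to Q3 + 1.5 × IQR, and AVF scores each row by the average frequency of its attribute values, so that rows with the lowest scores are the most unusual. The 1.5 multiplier and the function names box_plot_outliers and avf_scores are assumptions for illustration only; the component's internal implementation and thresholds may differ.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional box plot multiplier (an assumption here).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def avf_scores(rows):
    """Return one AVF score per row: the mean frequency of its attribute values.

    A low score means the row is built from rare values.
    """
    n_cols = len(rows[0])
    # Count how often each value occurs in each column.
    counts = [Counter(row[j] for row in rows) for j in range(n_cols)]
    return [sum(counts[j][row[j]] for j in range(n_cols)) / n_cols for row in rows]


# Continuous feature: the last value lies far from the rest.
continuous = [10, 11, 12, 11, 10, 12, 11, 50]
outlier_indices = box_plot_outliers(continuous)

# Enumeration features: the last row uses values that appear only once.
categorical = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
scores = avf_scores(categorical)
rarest_row = scores.index(min(scores))
```

In this sketch, Box Plot applies to one numeric column at a time, whereas AVF combines all selected enumeration columns into a single score per row.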

Method 2: Use PAI commands

Configure the component parameters by using Platform for AI (PAI) commands. You can use the SQL Script component to call PAI commands. For more information, see "Scenario 4: Execute PAI commands within the SQL script component" in SQL Script.

PAI -name fe_detect_runner -project algo_public
     -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
     -Dlifecycle="28"
     -DdetectStrategy="boxPlot"
     -DmodelTable="pai_temp_2458_23565_2"
     -DinputTable="pai_bank_data"
     -DoutputTable="pai_temp_2458_23565_1";
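
If the command is generated by a script rather than typed by hand, a small helper can keep the -D parameters consistent. The following Python sketch only assembles the command text shown above; the function name build_pai_command is hypothetical, and submitting the command (for example, through the SQL Script component) is outside its scope.

```python
def build_pai_command(name, project, params):
    """Render a PAI command string from a dict of -D parameters."""
    lines = [f"PAI -name {name} -project {project}"]
    # Each parameter becomes one -Dkey="value" line, in insertion order.
    lines += [f'     -D{key}="{value}"' for key, value in params.items()]
    return "\n".join(lines) + ";"


command = build_pai_command(
    "fe_detect_runner",
    "algo_public",
    {
        "selectedCols": "emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed",
        "lifecycle": "28",
        "detectStrategy": "boxPlot",
        "modelTable": "pai_temp_2458_23565_2",
        "inputTable": "pai_bank_data",
        "outputTable": "pai_temp_2458_23565_1",
    },
)
print(command)
```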

| Parameter | Required | Description |
| --- | --- | --- |
| inputTable | Yes | The name of the input table. |
| inputTablePartitions | No | The partitions in the input table. By default, all partitions are selected. Specify a single partition in the format of partition_name=value. Specify multiple partitions in the format of name1=value1,name2=value2, separated by commas (,). Specify multi-level partitions in the format of name1=value1/name2=value2. |
| selectedCols | Yes | The input feature columns. The data types of the features are not limited. |
| detectStrategy | Yes | The detection method. Box Plot and AVF are supported. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumeration features. |
| outputTable | Yes | The name of the output table, which contains the data with anomalous features. |
| modelTable | Yes | The name of the table that stores the anomaly detection model. |
| lifecycle | No | The lifecycle of the output table. Default value: 7. |
| coreNum | No | The number of cores. This parameter must be used together with the memSizePerCore parameter. The value must be a positive integer. Valid values: 1 to 9999. |
| memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: 2048 to 65536 (64 × 1024). |
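
The partition formats and the resource limits listed above can be checked before a command is submitted. The following Python sketch is one possible way to do so; the helper names partition_spec and check_resources, and the sample partition names ds and hh, are hypothetical and are not part of PAI.

```python
def partition_spec(*partitions):
    """Build an inputTablePartitions value.

    Each partition is a list of (name, value) levels. Levels are joined
    with "/" and separate partitions are joined with ",".
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in levels)
        for levels in partitions
    )


def check_resources(core_num=None, mem_size_per_core=None):
    """Return a list of problems with coreNum and memSizePerCore (empty if none)."""
    errors = []
    if core_num is not None:
        if not isinstance(core_num, int) or not 1 <= core_num <= 9999:
            errors.append("coreNum must be an integer from 1 to 9999")
        if mem_size_per_core is None:
            errors.append("coreNum must be used with memSizePerCore")
    if mem_size_per_core is not None and not 2048 <= mem_size_per_core <= 64 * 1024:
        errors.append("memSizePerCore must be from 2048 to 65536 (MB)")
    return errors


single = partition_spec([("ds", "20241227")])
multi_level = partition_spec([("ds", "20241227"), ("hh", "08")])
multiple = partition_spec([("ds", "20241226")], [("ds", "20241227")])
problems = check_resources(core_num=4, mem_size_per_core=1024)
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid setting at once.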