Anomaly Detection identifies data points or patterns in a dataset that deviate significantly from normal data points or patterns. It is suitable for data with continuous or enumeration features. Anomaly Detection helps users detect potential errors, fraud, or exceptions to improve the accuracy and reliability of data analysis.
Configure the component
Method 1: Configure the component on the pipeline page
On the pipeline details page in Machine Learning Designer, add the Anomaly Detection component to the pipeline and configure the parameters described in the following table.
Parameter | Description |
Feature Columns | The feature columns on which you want to perform anomaly detection. |
Anomaly Detection Method | The method used to detect anomalous data. Valid values: Box Plot (for data with continuous features) and AVF (for data with enumeration features). |
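The two supported detection methods, Box Plot and AVF, can be illustrated with a small Python sketch. This is not the component's implementation: the 1.5 × IQR fence is the conventional box plot rule and is assumed here, the AVF score follows the usual Attribute Value Frequency definition, and the function names `box_plot_outliers` and `avf_scores` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence width. It is an assumption here,
    not a documented setting of the PAI component.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def avf_scores(rows):
    """Attribute Value Frequency score for each row of enumeration features.

    The score is the average, over the row's columns, of how many times the
    row's value occurs in that column. Lower scores mean rarer combinations.
    """
    columns = list(zip(*rows))
    counts = [Counter(col) for col in columns]
    return [
        sum(counts[j][value] for j, value in enumerate(row)) / len(row)
        for row in rows
    ]


if __name__ == "__main__":
    # Continuous feature: one value sits far from the rest.
    rates = [10, 11, 12, 11, 10, 12, 11, 95]
    print(box_plot_outliers(rates))

    # Enumeration features: the fourth row uses values seen nowhere else.
    records = [("red", "s"), ("red", "s"), ("red", "m"), ("blue", "xl"), ("red", "s")]
    print(avf_scores(records))
```

In this sketch, a continuous value is flagged when it falls outside the fences, while enumeration rows are ranked by score and the lowest-scoring rows are treated as candidates for review.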
Method 2: Use PAI commands
Configure the component parameters by using Platform for AI (PAI) commands. You can use the SQL Script component to call PAI commands. For more information, see Scenario 4: Execute PAI commands within the SQL script component in SQL Script.
PAI -name fe_detect_runner -project algo_public
    -DselectedCols="emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m,nr_employed"
    -Dlifecycle="28"
    -DdetectStrategy="boxPlot"
    -DmodelTable="pai_temp_2458_23565_2"
    -DinputTable="pai_bank_data"
    -DoutputTable="pai_temp_2458_23565_1";
Parameter | Required | Description |
inputTable | Yes | The name of the input table. |
inputTablePartitions | No | The partitions in the input table. By default, all partitions are selected. |
selectedCols | Yes | The input features. The data types of the features are not limited. |
detectStrategy | Yes | The detection method. Box Plot and AVF are supported. Box Plot is used to detect data with continuous features. AVF is used to detect data with enumeration features. |
outputTable | Yes | The output table that contains data with anomalous features. |
modelTable | Yes | The name of the table that stores the anomaly detection model. |
lifecycle | No | The lifecycle of the output table. Default value: 7. |
coreNum | No | The number of cores. This parameter must be used with the memSizePerCore parameter. The value must be a positive integer. Valid values: 1 to 9999. |
memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: [2048,64 × 1024]. |
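The constraints in the parameter table can be checked before a command is submitted. The following Python sketch assembles the `fe_detect_runner` command text from a dictionary and validates the required parameters and the documented ranges. The helper name `build_pai_command` is hypothetical, the sketch only produces a string and does not call any PAI or MaxCompute API, and the table and column names are the ones from the sample command above.

```python
REQUIRED = ("inputTable", "selectedCols", "detectStrategy", "outputTable", "modelTable")


def build_pai_command(params):
    """Render a fe_detect_runner PAI command string from a parameter dict.

    Only the rules stated in the parameter table are checked here.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    core = params.get("coreNum")
    mem = params.get("memSizePerCore")
    # coreNum must be used together with memSizePerCore.
    if core is not None and mem is None:
        raise ValueError("coreNum must be used with memSizePerCore")
    if core is not None and not 1 <= int(core) <= 9999:
        raise ValueError("coreNum must be an integer from 1 to 9999")
    if mem is not None and not 2048 <= int(mem) <= 64 * 1024:
        raise ValueError("memSizePerCore must be from 2048 to 65536 (MB)")

    lines = ["PAI -name fe_detect_runner -project algo_public"]
    lines += [f'    -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_pai_command({
        "inputTable": "pai_bank_data",
        "selectedCols": "emp_var_rate,cons_price_rate,cons_conf_idx",
        "detectStrategy": "boxPlot",
        "outputTable": "pai_temp_2458_23565_1",
        "modelTable": "pai_temp_2458_23565_2",
        "lifecycle": 28,
    }))
```

The generated text can then be pasted into the SQL Script component. Validating locally is a design choice that surfaces range errors before a job is scheduled.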