One-Class Support Vector Machine (SVM) is an unsupervised machine learning algorithm. Unlike traditional SVM algorithms, which separate labeled classes, One-Class SVM learns a decision boundary around the training data and treats data points that fall outside the boundary as outliers. You can use the One-Class SVM Outlier component to detect outliers in this way. This topic describes how to configure the One-Class SVM Outlier component in Platform for AI (PAI).
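To make "learning a decision boundary" concrete: a trained one-class SVM typically scores a point x with the decision function f(x) = sum_i alpha_i * K(sv_i, x) - rho, where sv_i are the support vectors, alpha_i their coefficients, rho an offset, and K the kernel function. Points with f(x) < 0 lie outside the boundary. The following minimal sketch evaluates this function with the RBF kernel, which is the component's default kernel type. The support vectors, coefficients, and rho are hand-picked assumptions for illustration (the component learns them during training), and the sketch does not claim to reproduce how PAI converts this value into its outlier score.

```python
import math
from typing import Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """RBF kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, alphas, rho, gamma) -> float:
    """One-class SVM decision function f(x) = sum_i alpha_i * K(sv_i, x) - rho."""
    total = sum(alpha * rbf_kernel(sv, x, gamma)
                for sv, alpha in zip(support_vectors, alphas))
    return total - rho


def is_outlier(x, support_vectors, alphas, rho, gamma) -> bool:
    """A point outside the learned boundary (f(x) < 0) is an outlier."""
    return decision_value(x, support_vectors, alphas, rho, gamma) < 0


if __name__ == "__main__":
    # Hypothetical "trained" model: two support vectors near the origin.
    svs = [[0.0, 0.0], [0.2, 0.1]]
    alphas = [0.5, 0.5]
    rho = 0.3
    print(is_outlier([0.1, 0.05], svs, alphas, rho, gamma=1.0))  # False: close to the data
    print(is_outlier([3.0, 3.0], svs, alphas, rho, gamma=1.0))   # True: far from the data
```

A point near the support vectors gets kernel values close to 1, so f(x) stays above zero; a distant point gets kernel values close to 0, so f(x) approaches -rho.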
Limits
The One-Class SVM Outlier component can run only on MaxCompute computing resources.
Configure the component
You can use one of the following methods to configure the parameters of the One-Class SVM Outlier component:
Method 1: Configure the component in the PAI console
Configure the component on the pipeline page of Machine Learning Designer. The following table describes the parameters.
Tab | Parameter | Description |
Field Setting | featureCols | An array of the names of the feature columns. |
Field Setting | groupCols | An array of the names of the group columns. |
Field Setting | tensorCol | The name of the tensor column. |
Field Setting | vectorCol | The name of the vector column. |
Parameter Setting | Prediction Result Column | The name of the prediction result column. |
Parameter Setting | coef0 | The coef0 parameter of the kernel function. Default value: 0.0. Note: This parameter takes effect only if the kernel function type is polynomial or sigmoid. |
Parameter Setting | degree | The degree of the polynomial kernel function. Default value: 2. |
Parameter Setting | epsilon | The convergence tolerance of the iterative training. Iteration stops when the training result reaches this precision. Default value: 1.0E-6. |
Parameter Setting | gamma | The gamma parameter of the kernel function. Default value: -1.0. Note: This parameter takes effect only if the kernel function type is RBF, polynomial, or sigmoid. If you do not configure this parameter, 1/(data dimension) is used. |
Parameter Setting | kernelType | The type of the kernel function. Valid values: RBF, POLY, SIGMOID, and LINEAR. Default value: RBF. |
Parameter Setting | maxOutlierNumPerGroup | The maximum number of outliers per group. |
Parameter Setting | maxOutlierRatio | The maximum ratio of data points that the algorithm can report as outliers. |
Parameter Setting | maxSampleNumPerGroup | The maximum number of samples per group. |
Parameter Setting | nu | The nu parameter of the One-Class SVM. It is an upper bound on the fraction of training samples treated as outliers and a lower bound on the fraction of support vectors, so a larger value yields more support vectors. Valid values: (0, 1). Default value: 0.01. |
Parameter Setting | outlierThreshold | The score threshold. A data point whose score exceeds this value is considered an outlier. |
Parameter Setting | Column name of detail prediction information | The name of the prediction details column. |
Parameter Setting | numThreads | The number of threads that the component uses. |
Execute Tuning | Number of Workers | The number of worker nodes. Valid values: integers from 1 to 9999. This parameter must be used together with the Memory per worker parameter. |
Execute Tuning | Memory per worker | The memory size of each worker node. Unit: MB. Valid values: integers from 1024 to 65536. |
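The coef0, degree, and gamma notes in the table match how these kernel functions are commonly defined, for example in LIBSVM. The following sketch writes out those common definitions to show which parameter affects which kernel and where the 1/(data dimension) fallback for gamma comes in. It assumes that PAI uses these standard forms, that the type names are RBF, POLY, SIGMOID, and LINEAR, and that the default gamma of -1.0 means "not configured"; this topic does not confirm the exact formulas.

```python
import math
from typing import Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def kernel(u: Sequence[float], v: Sequence[float], kernel_type: str = "RBF",
           gamma: float = -1.0, coef0: float = 0.0, degree: int = 2) -> float:
    """Evaluate K(u, v) with common LIBSVM-style kernel definitions (assumed)."""
    # Assumption: a non-positive gamma, such as the default -1.0, means
    # "not configured", so fall back to 1 / data dimension.
    if gamma <= 0:
        gamma = 1.0 / len(u)
    kt = kernel_type.upper()
    if kt == "LINEAR":
        return _dot(u, v)                               # ignores gamma, coef0, degree
    if kt == "POLY":
        return (gamma * _dot(u, v) + coef0) ** degree   # uses gamma, coef0, degree
    if kt == "RBF":
        squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-gamma * squared_distance)      # uses gamma only
    if kt == "SIGMOID":
        return math.tanh(gamma * _dot(u, v) + coef0)    # uses gamma and coef0
    raise ValueError(f"unsupported kernel type: {kernel_type}")
```

For example, with two features and gamma left at -1.0, `kernel([1.0, 1.0], [1.0, 1.0], "POLY", coef0=0.0, degree=1)` falls back to gamma = 0.5 and returns 1.0.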
Method 2: Configure the component by using Python code
You can also configure the One-Class SVM Outlier algorithm by calling Python code in the PyAlink Script component. For more information, see PyAlink script.
Parameter | Required | Description | Default value |
predictionCol | Yes | The name of the prediction result column. | N/A |
degree | No | The degree of the polynomial kernel function. | 2 |
epsilon | No | The convergence tolerance of the iterative training. Iteration stops when the training result reaches this precision. | 1.0E-6 |
featureCols | No | An array of the names of the feature columns. | All columns |
groupCols | No | An array of the names of the group columns. | N/A |
maxOutlierNumPerGroup | No | The maximum number of outliers per group. | N/A |
maxOutlierRatio | No | The maximum ratio of data points that the algorithm can report as outliers. | N/A |
maxSampleNumPerGroup | No | The maximum number of samples per group. | N/A |
outlierThreshold | No | The score threshold. A data point whose score exceeds this value is considered an outlier. | N/A |
predictionDetailCol | No | The name of the prediction details column. | N/A |
tensorCol | No | The name of the tensor column. | N/A |
vectorCol | No | The name of the vector column. | N/A |
kernelType | No | The type of the kernel function. Valid values: RBF, POLY, SIGMOID, and LINEAR. | RBF |
coef0 | No | The coef0 parameter of the kernel function. Note: This parameter takes effect only if the kernel function type is polynomial or sigmoid. | 0.0 |
gamma | No | The gamma parameter of the kernel function. Note: This parameter takes effect only if the kernel function type is RBF, polynomial, or sigmoid. If you do not configure this parameter, 1/(data dimension) is used. | -1.0 |
nu | No | The nu parameter of the One-Class SVM. It is an upper bound on the fraction of training samples treated as outliers and a lower bound on the fraction of support vectors, so a larger value yields more support vectors. Valid values: (0, 1). | 0.01 |
numThreads | No | The number of threads that the component uses. | 1 |
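The tables do not spell out how outlierThreshold, maxOutlierRatio, maxOutlierNumPerGroup, and groupCols interact. The following sketch shows one plausible reading, purely as an assumption: within each group, points whose score exceeds outlierThreshold become candidates, the highest-scoring candidates are kept first, and the number kept is capped by maxOutlierNumPerGroup and by maxOutlierRatio times the group size. The function `select_outliers` and this ordering are hypothetical, not PAI's documented algorithm, and maxSampleNumPerGroup is not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def select_outliers(
    rows: Sequence[Tuple[Hashable, float]],
    outlier_threshold: Optional[float] = None,
    max_outlier_ratio: Optional[float] = None,
    max_outlier_num_per_group: Optional[int] = None,
) -> List[bool]:
    """Hypothetical sketch: flag outliers per group under a threshold and caps.

    rows holds one (group key, outlier score) pair per data point, where a
    higher score means "more anomalous". Returns one flag per row. With no
    threshold and no caps, every row is flagged; the sketch only shows how
    the limits combine.
    """
    # Collect row indices by group (the role that groupCols plays).
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for index, (key, _score) in enumerate(rows):
        groups[key].append(index)

    flags = [False] * len(rows)
    for indices in groups.values():
        # Candidates: rows whose score is above the threshold.
        candidates = [i for i in indices
                      if outlier_threshold is None or rows[i][1] > outlier_threshold]
        # Keep the most anomalous candidates first.
        candidates.sort(key=lambda i: rows[i][1], reverse=True)
        limit = len(candidates)
        if max_outlier_num_per_group is not None:
            limit = min(limit, max_outlier_num_per_group)
        if max_outlier_ratio is not None:
            limit = min(limit, int(max_outlier_ratio * len(indices)))
        for i in candidates[:limit]:
            flags[i] = True
    return flags
```

Under this reading, a tight cap such as `max_outlier_num_per_group=1` reports only the single highest-scoring candidate in each group, even if several points exceed the threshold.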
Sample Python code:
import pandas as pd
from pyalink.alink import *

# When you run this code outside PAI with a local PyAlink installation,
# initialize an execution environment first, for example: useLocalEnv(1)

# Each row is one sample with four numeric features.
df = pd.DataFrame([
    [0.730967787376657,0.24053641567148587,0.6374174253501083,0.5504370051176339],
    [0.7308781907032909,0.41008081149220166,0.20771484130971707,0.3327170559595112],
    [0.7311469360199058,0.9014476240300544,0.49682259343089075,0.9858769332362016],
    [0.731057369148862,0.07099203475193139,0.06712000939049956,0.768156984078079],
    [0.7306094602878371,0.9187140138555101,0.9186071189908658,0.6795571637816596],
    [0.730519863614471,0.08825840967622589,0.4889045498516358,0.461837214623537],
    [0.7307886238322471,0.5796252073129174,0.7780122870716483,0.11499709190022733],
    [0.7306990420600421,0.7491696031336331,0.34830970303125697,0.8972771427421047]])

# Load the pandas DataFrame as batch data.
data = BatchOperator.fromDataframe(df, schemaStr="x1 double, x2 double, x3 double, x4 double")

# Detect outliers with an RBF-kernel One-Class SVM and print the result,
# which includes the prediction column "pred".
OcsvmOutlierBatchOp() \
    .setFeatureCols(["x1", "x2", "x3", "x4"]) \
    .setGamma(0.5) \
    .setNu(0.1) \
    .setKernelType("RBF") \
    .setPredictionCol("pred") \
    .linkFrom(data) \
    .print()
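In the sample, each parameter from the Python table maps to a setter named set followed by the parameter name with its first letter capitalized (featureCols becomes setFeatureCols, nu becomes setNu), and each setter returns the operator so that the calls can be chained. If you keep parameter values in a dictionary, for example to try several nu values, a small helper can apply them. `apply_params` is a hypothetical helper, not part of PyAlink, and it relies only on that naming pattern.

```python
from typing import Any, Mapping


def apply_params(op: Any, params: Mapping[str, Any]) -> Any:
    """Hypothetical helper: call op.set<Name>(value) for each parameter.

    Relies on the naming pattern in the sample (featureCols -> setFeatureCols)
    and on each setter returning the operator, as the chained calls show.
    An unknown parameter name raises AttributeError instead of being ignored.
    """
    for name, value in params.items():
        setter_name = "set" + name[:1].upper() + name[1:]
        op = getattr(op, setter_name)(value)
    return op


# Usage inside a PyAlink script (not executed here):
# params = {"featureCols": ["x1", "x2", "x3", "x4"], "gamma": 0.5,
#           "nu": 0.1, "kernelType": "RBF", "predictionCol": "pred"}
# apply_params(OcsvmOutlierBatchOp(), params).linkFrom(data).print()
```

Failing loudly on an unknown name is deliberate: a misspelled key such as "kernalType" would otherwise leave the default kernel in place without any warning.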