A parameter server (PS) is used to process a large number of offline and online training jobs. Scalable Multiple Additive Regression Tree (SMART) is an iterative gradient boosting decision tree (GBDT) algorithm implemented on a PS architecture. The PS-SMART Multiclass Classification component of Platform for AI (PAI) supports training jobs with tens of billions of samples and hundreds of thousands of features, and can run training jobs on thousands of nodes. The component also supports multiple data formats and optimization technologies, such as histogram-based approximation.
Limits
The input data of the PS-SMART Multiclass Classification component must meet the following requirements:
Data in the destination columns must be of numeric data types. If the data type in the MaxCompute table is STRING, the data must be converted into a numeric data type. For example, if the classification object is a string, such as Good/Medium/Bad, you must convert the string into 0/1/2.
If the data is in the key-value format, feature IDs must be positive integers and feature values must be real numbers. If the feature IDs are of the STRING type, you must use the serialization component to serialize the data. If the feature values are categorical strings, you must perform feature engineering, such as feature discretization, to process the values.
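To make the format rules above concrete, the following is a minimal Python sketch that validates a sparse key-value feature string and maps string labels to numeric classes. The function and mapping names (`parse_kv_features`, `LABEL_MAP`) are illustrative only and are not part of PAI.

```python
# Illustrative check of the key-value format rules above: feature IDs must be
# positive integers, and feature values must be real numbers.
def parse_kv_features(kv_string):
    """Parse a sparse feature string such as '1:0.3 3:0.9' into a dict."""
    features = {}
    for pair in kv_string.split():
        key, value = pair.split(":")
        feature_id = int(key)  # raises ValueError for non-integer feature IDs
        if feature_id <= 0:
            raise ValueError("feature IDs must be positive integers")
        features[feature_id] = float(value)  # raises ValueError for non-numeric values
    return features

# Map string labels such as Good/Medium/Bad to the numeric classes 0/1/2.
LABEL_MAP = {"Good": 0, "Medium": 1, "Bad": 2}

print(parse_kv_features("1:0.3 3:0.9"))  # {1: 0.3, 3: 0.9}
print(LABEL_MAP["Medium"])               # 1
```

In practice, you would perform the equivalent conversion in MaxCompute SQL before passing the table to the component; this sketch only shows the target shape of the data.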
Usage notes
When you use the PS-SMART Multiclass Classification component, take note of the following items:
The PS-SMART Multiclass Classification component supports training jobs with hundreds of thousands of features. However, such jobs are resource-intensive and time-consuming. GBDT algorithms are suitable for scenarios in which continuous features are used for training. You can perform one-hot encoding on categorical features to filter out low-frequency features. We recommend that you do not perform feature discretization on continuous features of numeric data types.
The PS-SMART algorithm may introduce randomness in the following scenarios: data and feature sampling based on data_sample_ratio and fea_sample_ratio, histogram-based approximation in the PS-SMART optimization, and merging of local sketches into a global sketch. As a result, the structures of the trees may vary when jobs run on multiple worker nodes in distributed mode, and you may obtain different results even if you use the same data and parameters during training. However, the training effect of the resulting models is theoretically the same.
If you want to accelerate training, you can set the Cores parameter to a larger value. The PS-SMART algorithm starts training jobs after the required resources are provided. The waiting period increases with the amount of the requested resources.
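The one-hot encoding recommendation above can be sketched in a few lines of Python. This is an illustrative example only; the function name and the frequency threshold are assumptions for this sketch, not PAI APIs.

```python
from collections import Counter

# Illustrative sketch: one-hot encode a categorical feature and drop
# low-frequency categories, as recommended above.
def one_hot_filtered(values, min_count=2):
    counts = Counter(values)
    # Keep only categories that appear at least min_count times.
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    index = {c: i for i, c in enumerate(kept)}
    rows = []
    for v in values:
        row = [0] * len(kept)
        if v in index:  # low-frequency categories encode as all zeros
            row[index[v]] = 1
        rows.append(row)
    return kept, rows

kept, rows = one_hot_filtered(["red", "blue", "red", "green", "blue"])
print(kept)  # ['blue', 'red'] -- 'green' appears only once and is filtered out
print(rows)  # [[0, 1], [1, 0], [0, 1], [0, 0], [1, 0]]
```

Continuous numeric features should be passed to the component as-is, without this kind of discretization.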
Configure the component
You can use one of the following methods to configure the PS-SMART Multiclass Classification component.
Method 1: Configure the component in the PAI console
Configure the component on the pipeline page of Machine Learning Designer. The following table describes the parameters.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Use Sparse Format | Specifies whether the input data is in the sparse format. If the input data is sparse data in the key-value format, separate key-value pairs with spaces, and separate keys and values with colons (:). Example: 1:0.3 3:0.9. |
| | Feature Columns | The feature columns that are selected from the input table for training. If the data in the input table is in the dense format, only columns of the BIGINT and DOUBLE types are supported. If the data in the input table is key-value pairs in the sparse format, and keys and values are of numeric data types, only columns of the STRING type are supported. |
| | Label Column | The label column in the input table. Columns of the STRING type and numeric data types are supported. However, only data of numeric data types can be stored in the columns. For example, column values can be {0,1,2,...,n-1} in multiclass classification, where n is the number of classes. |
| | Weight Column | The column that contains the weight of each row of samples. Columns of numeric data types are supported. |
| Parameters Setting | Classes | The number of classes for multiclass classification. If you set this parameter to n, the valid values of the label column are {0,1,2,...,n-1}. |
| | Evaluation Indicator Type | The evaluation metric. Valid values: Multiclass Negative Log Likelihood and Multiclass Classification Error. |
| | Trees | The number of trees. The value must be a positive integer. The training duration is proportional to the value. |
| | Maximum Decision Tree Depth | The maximum depth of a tree. Default value: 5, which indicates that a tree can have up to 32 leaf nodes. |
| | Data Sampling Ratio | The data sampling ratio that is used when trees are built. The sampled data is used to build a weak learner to accelerate training. |
| | Feature Sampling Fraction | The feature sampling ratio that is used when trees are built. The sampled features are used to build a weak learner to accelerate training. |
| | L1 Penalty Coefficient | The L1 penalty coefficient, which controls the size of leaf node weights. A larger value indicates a more even distribution of leaf nodes. If overfitting occurs, increase the parameter value. |
| | L2 Penalty Coefficient | The L2 penalty coefficient, which controls the size of leaf node weights. A larger value indicates a more even distribution of leaf nodes. If overfitting occurs, increase the parameter value. |
| | Learning Rate | The learning rate. Valid values: (0,1). |
| | Sketch-based Approximate Precision | The threshold for selecting quantiles when a sketch is built. A smaller value indicates that more bins are obtained. In most cases, the default value 0.03 is used. |
| | Minimum Split Loss Change | The minimum loss change that is required to split a node. A larger value indicates a lower probability of node splitting. |
| | Features | The number of features or the maximum feature ID. Configure this parameter if you want to assess resource usage. |
| | Global Offset | The initial prediction value of all samples. |
| | Random Seed | The random seed. The value must be an integer. |
| | Feature Importance Type | The type of feature importance. Valid values: Weight, Gain, and Cover. Weight indicates the number of times a feature is used to split nodes. Gain indicates the information gain provided by the feature. Cover indicates the number of samples covered by the feature on the split nodes. |
| Tuning | Cores | The number of cores. By default, the system determines the value. |
| | Memory Size per Core (MB) | The memory size of each core. Unit: MB. In most cases, the system determines the value. |
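Two of the parameter relationships described above can be verified with quick arithmetic: a binary tree of depth d has up to 2^d leaf nodes, and the number of histogram bins grows on the order of 1/sketchEps. A small illustrative check:

```python
# Back-of-envelope checks for two parameter relationships described above.
# A binary tree of depth d has up to 2**d leaf nodes.
max_depth = 5
max_leaves = 2 ** max_depth
print(max_leaves)  # 32

# The number of histogram bins is on the order of 1 / sketch_eps,
# so a smaller sketch precision value yields more bins.
sketch_eps = 0.03
approx_bins = int(1.0 / sketch_eps)
print(approx_bins)  # 33
```

This is why increasing Maximum Decision Tree Depth or decreasing Sketch-based Approximate Precision increases the computational cost of training.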
Method 2: Configure the component by using PAI commands
The following table describes the parameters that are used in PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
# Training
PAI -name ps_smart
-project algo_public
-DinputTableName="smart_multiclass_input"
-DmodelName="xlab_m_pai_ps_smart_bi_545859_v0"
-DoutputTableName="pai_temp_24515_545859_2"
-DoutputImportanceTableName="pai_temp_24515_545859_3"
-DlabelColName="label"
-DfeatureColNames="features"
-DenableSparse="true"
-DclassNum="3"
-Dobjective="multi:softprob"
-Dmetric="mlogloss"
-DfeatureImportanceType="gain"
-DtreeCount="5"
-DmaxDepth="5"
-Dshrinkage="0.3"
-Dl2="1.0"
-Dl1="0"
-Dlifecycle="3"
-DsketchEps="0.03"
-DsampleRatio="1.0"
-DfeatureRatio="1.0"
-DbaseScore="0.5"
-DminSplitLoss="0";
# Prediction
PAI -name prediction
-project algo_public
-DinputTableName="smart_multiclass_input"
-DmodelName="xlab_m_pai_ps_smart_bi_545859_v0"
-DoutputTableName="pai_temp_24515_545860_1"
-DfeatureColNames="features"
-DappendColNames="label,features"
-DenableSparse="true"
-DkvDelimiter=":"
-Dlifecycle="28";
| Module | Parameter | Required | Description | Default value |
| --- | --- | --- | --- | --- |
| Data parameters | featureColNames | Yes | The feature columns that are selected from the input table for training. If data in the input table is in the dense format, only columns of the BIGINT and DOUBLE types are supported. If data in the input table is sparse data in the key-value format, and keys and values are of numeric data types, only columns of the STRING type are supported. | N/A |
| | labelColName | Yes | The label column in the input table. Columns of the STRING type and numeric data types are supported. However, only data of numeric data types can be stored in the columns. For example, column values can be {0,1,2,…,n-1} in multiclass classification, where n is the number of classes. | N/A |
| | weightCol | No | The column that contains the weight of each row of samples. Columns of numeric data types are supported. | N/A |
| | enableSparse | No | Specifies whether the input data is in the sparse format. Valid values: true and false. If the input data is sparse data in the key-value format, separate key-value pairs with spaces, and separate keys and values with colons (:). Example: 1:0.3 3:0.9. | false |
| | inputTableName | Yes | The name of the input table. | N/A |
| | modelName | Yes | The name of the output model. | N/A |
| | outputImportanceTableName | No | The name of the table that contains feature importance. | N/A |
| | inputTablePartitions | No | The partitions that are selected from the input table for training. Format: ds=1/pt=1. | N/A |
| | outputTableName | No | The MaxCompute table that is generated to store the output model. The table stores binary content that cannot be read directly and can be used only by the PS-SMART prediction component. | N/A |
| | lifecycle | No | The lifecycle of the output table. | 3 |
| Algorithm parameters | classNum | Yes | The number of classes for multiclass classification. If you set this parameter to n, the valid values of the label column are {0,1,2,...,n-1}. | N/A |
| | objective | Yes | The type of the objective function. For multiclass classification training, set this parameter to multi:softprob. | N/A |
| | metric | No | The evaluation metric type of the training set, which is displayed in the stdout of the coordinator in a LogView. Valid values: mlogloss (multiclass negative log likelihood) and merror (multiclass classification error). | N/A |
| | treeCount | No | The number of trees. The training duration is proportional to the value. | 1 |
| | maxDepth | No | The maximum depth of a tree. Valid values: 1 to 20. | 5 |
| | sampleRatio | No | The data sampling ratio. Valid values: (0,1]. If you set this parameter to 1.0, no data is sampled. | 1.0 |
| | featureRatio | No | The feature sampling ratio. Valid values: (0,1]. If you set this parameter to 1.0, no features are sampled. | 1.0 |
| | l1 | No | The L1 penalty coefficient. A larger value indicates a more even distribution of leaf nodes. If overfitting occurs, increase the parameter value. | 0 |
| | l2 | No | The L2 penalty coefficient. A larger value indicates a more even distribution of leaf nodes. If overfitting occurs, increase the parameter value. | 1.0 |
| | shrinkage | No | The learning rate. Valid values: (0,1). | 0.3 |
| | sketchEps | No | The threshold for selecting quantiles when a sketch is built. The number of bins is O(1.0/sketchEps). A smaller value indicates that more bins are obtained. In most cases, the default value is used. Valid values: (0,1). | 0.03 |
| | minSplitLoss | No | The minimum loss change that is required to split a node. A larger value indicates a lower probability of node splitting. | 0 |
| | featureNum | No | The number of features or the maximum feature ID. Configure this parameter if you want to assess resource usage. | N/A |
| | baseScore | No | The initial prediction value of all samples. | 0.5 |
| | randSeed | No | The random seed. The value must be an integer. | N/A |
| | featureImportanceType | No | The type of feature importance. Valid values: weight (the number of times a feature is used to split nodes), gain (the information gain provided by the feature), and cover (the number of samples covered by the feature on the split nodes). | gain |
| Tuning parameters | coreNum | No | The number of cores used in computing. The computing speed increases with the value of this parameter. | Automatically allocated |
| | memSizePerCore | No | The memory size of each core. Unit: MB. | Automatically allocated |
Examples
Create a table named smart_multiclass_input by using the ODPS SQL node. For more information, see Develop a MaxCompute SQL task. In this example, input data in the key-value format is generated.
drop table if exists smart_multiclass_input;
create table smart_multiclass_input lifecycle 3 as
select * from (
    select '2' as label, '1:0.55 2:-0.15 3:0.82 4:-0.99 5:0.17' as features union all
    select '1' as label, '1:-1.26 2:1.36 3:-0.13 4:-2.82 5:-0.41' as features union all
    select '1' as label, '1:-0.77 2:0.91 3:-0.23 4:-4.46 5:0.91' as features union all
    select '2' as label, '1:0.86 2:-0.22 3:-0.46 4:0.08 5:-0.60' as features union all
    select '1' as label, '1:-0.76 2:0.89 3:1.02 4:-0.78 5:-0.86' as features union all
    select '1' as label, '1:2.22 2:-0.46 3:0.49 4:0.31 5:-1.84' as features union all
    select '0' as label, '1:-1.21 2:0.09 3:0.23 4:2.04 5:0.30' as features union all
    select '1' as label, '1:2.17 2:-0.45 3:-1.22 4:-0.48 5:-1.41' as features union all
    select '0' as label, '1:-0.40 2:0.63 3:0.56 4:0.74 5:-1.44' as features union all
    select '1' as label, '1:0.17 2:0.49 3:-1.50 4:-2.20 5:-0.35' as features
) tmp;
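If your source data is dense, you can generate feature strings in the format used above (1-based feature IDs, colon-separated keys and values, space-separated pairs) with a short helper. The following Python sketch is illustrative; the function name `to_kv` is an assumption for this example, not a PAI API.

```python
# Illustrative sketch: build the 'features' strings used in the table above
# from a dense vector. Zero values are skipped, which is the usual convention
# for sparse key-value data.
def to_kv(dense):
    return " ".join(f"{i + 1}:{v}" for i, v in enumerate(dense) if v != 0)

print(to_kv([0.55, -0.15, 0.82, -0.99, 0.17]))
# 1:0.55 2:-0.15 3:0.82 4:-0.99 5:0.17
```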
The following figure shows the generated data.
Create a pipeline as shown in the following figure. For more information, see Generate a model.
Configure the component parameters.
Click the Read Table -1 component on the canvas. On the Select Table tab on the right, set the Table Name parameter to smart_multiclass_input.
Configure the parameters for the PS-SMART Multiclass Classification component. The following table describes the parameters. Use the default values of the parameters that are not included in the table.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Select the features column. |
| | Label Column | Select the label column. |
| | Use Sparse Format | Select Use Sparse Format. |
| Parameters Setting | Classes | Set the parameter to 3. |
| | Evaluation Indicator Type | Select Multiclass Negative Log Likelihood from the drop-down list. |
| | Trees | Set the parameter to 5. |
Configure the parameters for the Prediction-1 component. The following table describes the parameters. Use the default values of the parameters that are not included in the table.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Select the features column. |
| | Reserved Columns | Select the label and features columns. |
| | Sparse Matrix | Select Sparse Matrix. |
| | KV Delimiter | Set the parameter to a colon (:). |
| | KV Pair Delimiter | Leave the field empty. A space is used as the delimiter. |
Click the Write Table -1 component on the canvas. On the Select Table tab on the right, set the Table Name parameter to smart_multiclass_output.
Click the icon on the canvas to run the pipeline.
After you run the pipeline, right-click the Prediction-1 component and choose the option to view the prediction results. Parameters:
prediction_detail: the classes used for multiclass classification. Valid values: 0, 1, and 2.
prediction_result: the class of the prediction result.
prediction_score: the probability of the class in the prediction_result column.
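The relationship among the three output columns can be illustrated for a single sample: the predicted class is the one with the highest per-class probability, and prediction_score is that probability. The values below are assumed example probabilities, not actual component output.

```python
# Illustrative relationship among the prediction output columns for one sample.
prediction_detail = [0, 1, 2]                    # the classes (from Classes = 3)
scores = {0: 0.12, 1: 0.75, 2: 0.13}             # assumed example probabilities
prediction_result = max(scores, key=scores.get)  # class with the highest probability
prediction_score = scores[prediction_result]
print(prediction_result, prediction_score)  # 1 0.75
```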
On the canvas, right-click the PS-SMART Multiclass Classification component and choose the option to view the feature importance result. Parameters:
id: the ID of a passed feature. In this example, the input data is in the key-value format. The values in the id column indicate the keys in the key-value pairs.
value: the type of feature importance. The default value is gain, which indicates the sum of information gains provided by a feature for the model.
PS-SMART model deployment
If you want to deploy the model generated by the PS-SMART Multiclass Classification component to EAS as an online service, you must add the Model export component as a downstream node of the PS-SMART Multiclass Classification component and configure the Model export component. For more information, see Model export.
After the Model export component is successfully run, you can deploy the generated model to EAS as an online service on the EAS-Online Model Services page. For more information, see Model service deployment by using the PAI console.