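The Multiclass Evaluation algorithm evaluates how well a model performs on classification problems with more than two classes. It computes metrics such as accuracy, recall, F1 score, and the confusion matrix to quantify how accurately the model classifies each class. The confusion matrix shows how the predicted classes relate to the actual classes, and the other metrics break down classification correctness class by class. Together, these measures show how the model performs on each class and guide subsequent model optimization.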
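To make the relationship between the confusion matrix and the per-class metrics concrete, here is a minimal Python sketch. It is not the component's implementation: the function names and the toy labels are hypothetical, and the formulas are the standard one-vs-rest definitions of precision, recall, and F1.

```python
def confusion_matrix(labels, predictions):
    """Return (classes, matrix); matrix[i][j] counts rows whose actual
    class is classes[i] and whose predicted class is classes[j]."""
    classes = sorted(set(labels) | set(predictions))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(labels, predictions):
        matrix[index[actual]][index[predicted]] += 1
    return classes, matrix


def per_class_metrics(matrix):
    """One-vs-rest precision, recall, and F1 for each class index."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # actual k, predicted other
        fp = sum(matrix[i][k] for i in range(n)) - tp   # predicted k, actual other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Hypothetical three-class toy data, for illustration only.
labels = ["cat", "cat", "dog", "dog", "bird", "bird"]
predictions = ["cat", "dog", "dog", "dog", "bird", "cat"]

classes, matrix = confusion_matrix(labels, predictions)
metrics = per_class_metrics(matrix)
for name, m in zip(classes, metrics):
    print(name, m)
```

Rows of the matrix correspond to actual classes and columns to predicted classes, so the diagonal holds the correctly classified rows.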
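Configure the component
Method 1: Use the visual interface
On the Designer workflow page, add the Multiclass Evaluation component and configure its parameters in the pane on the right: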
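Parameter type | Parameter | Description |
Field settings | Original classification result column | The original label column. The number of classes cannot exceed 1,000. |
Field settings | Predicted classification result column | The predicted class column. In most cases, the column name is prediction_result. |
Field settings | Advanced options | If you select the Advanced options check box, the Prediction result probability column parameter takes effect. |
Field settings | Prediction result probability column | Used to calculate the model's logloss. It is valid only for random forest models; setting it for other models may cause an error. In most cases, the column name is prediction_detail. |
Execution tuning | Number of cores | Used together with the Memory per core parameter. By default, the system allocates this automatically. |
Execution tuning | Memory per core | The memory for each core, in MB. By default, the system allocates this automatically. |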
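The logloss mentioned for the prediction probability column can be sketched as follows. This is an illustration under stated assumptions, not the component's confirmed formula: it uses the standard multiclass log loss (mean negative natural log of the probability assigned to the actual class), and it assumes the probability column holds a JSON object mapping each class to its probability, as in the sample data later on this page. The function name and the `eps` clipping value are hypothetical.

```python
import json
import math


def multiclass_log_loss(labels, details, eps=1e-15):
    """Mean of -ln(p), where p is the probability assigned to the actual class.

    `details` holds JSON strings such as '{"A": 0.6, "B": 0.4}'.
    Probabilities are clipped to [eps, 1] so that ln(0) never occurs.
    """
    total = 0.0
    for label, detail in zip(labels, details):
        probabilities = json.loads(detail)
        p = min(max(probabilities.get(label, 0.0), eps), 1.0)
        total += -math.log(p)
    return total / len(labels)


# Two illustrative rows: actual class A with P(A)=0.6, actual class B with P(B)=0.8.
labels = ["A", "B"]
details = ['{"A": 0.6, "B": 0.4}', '{"A": 0.2, "B": 0.8}']
loss = multiclass_log_loss(labels, details)
print(loss)
```

A lower value means the model assigns higher probability to the actual classes; a confident wrong prediction is penalized much more heavily than an uncertain one.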
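Method 2: Use PAI commands
Configure the Multiclass Evaluation component parameters by using PAI commands. You can call PAI commands from the SQL Script component. For details, see Scenario 4: Execute PAI commands within the SQL Script component.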
PAI -name MultiClassEvaluation -project algo_public
-DinputTableName="test_input"
-DoutputTableName="test_output"
-DlabelColName="label"
-DpredictionColName="prediction_result"
-Dlifecycle=30;
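Parameter | Required | Default | Description |
inputTableName | Yes | None | The name of the input table. |
inputTablePartitions | No | Entire table | The partitions of the input table. |
outputTableName | Yes | None | The name of the output table. |
labelColName | Yes | None | The name of the original label column in the input table. |
predictionColName | Yes | None | The name of the column that holds the predicted labels. |
predictionDetailColName | No | Empty | The probability column of the prediction result, for example |
lifecycle | No | None | The lifecycle of the output table. |
coreNum | No | Calculated automatically by the system | The number of cores. |
memSizePerCore | No | Calculated automatically by the system | The memory for each core. |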
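When the command is generated from a script, a small helper can check the required parameters from the table above before assembling the text. The sketch below is hypothetical: `build_pai_command` is not part of any PAI SDK, and it only formats a string in the same layout as the command shown earlier, quoting string values and leaving integers bare.

```python
# Parameters the table above marks as required.
REQUIRED = ("inputTableName", "outputTableName", "labelColName", "predictionColName")


def build_pai_command(params):
    """Assemble a MultiClassEvaluation PAI command string from a dict.

    Raises ValueError if a required parameter is missing.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI -name MultiClassEvaluation -project algo_public"]
    for name, value in params.items():
        rendered = str(value) if isinstance(value, int) else f'"{value}"'
        lines.append(f"-D{name}={rendered}")
    return "\n".join(lines) + ";"


command = build_pai_command({
    "inputTableName": "test_input",
    "outputTableName": "test_output",
    "labelColName": "label",
    "predictionColName": "prediction_result",
    "lifecycle": 30,
})
print(command)
```

The helper does not submit anything; the resulting text still has to be pasted into or passed to an SQL Script component.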
Example
Add an SQL Script component and enter the following SQL statements to generate the sample data.
drop table if exists multi_esti_test;
create table multi_esti_test as
select * from
(
  select '0' as id,'A' as label,'A' as prediction,'{"A": 0.6, "B": 0.4}' as detail
  union all
  select '1' as id,'A' as label,'B' as prediction,'{"A": 0.45, "B": 0.55}' as detail
  union all
  select '2' as id,'A' as label,'A' as prediction,'{"A": 0.7, "B": 0.3}' as detail
  union all
  select '3' as id,'A' as label,'A' as prediction,'{"A": 0.9, "B": 0.1}' as detail
  union all
  select '4' as id,'B' as label,'B' as prediction,'{"A": 0.2, "B": 0.8}' as detail
  union all
  select '5' as id,'B' as label,'B' as prediction,'{"A": 0.1, "B": 0.9}' as detail
  union all
  select '6' as id,'B' as label,'A' as prediction,'{"A": 0.52, "B": 0.48}' as detail
  union all
  select '7' as id,'B' as label,'B' as prediction,'{"A": 0.4, "B": 0.6}' as detail
  union all
  select '8' as id,'B' as label,'A' as prediction,'{"A": 0.6, "B": 0.4}' as detail
  union all
  select '9' as id,'A' as label,'A' as prediction,'{"A": 0.75, "B": 0.25}' as detail
)tmp;
Add an SQL Script component and enter the following PAI command to run the evaluation.
drop table if exists ${o1};
PAI -name MultiClassEvaluation -project algo_public
-DinputTableName="multi_esti_test"
-DoutputTableName=${o1}
-DlabelColName="label"
-DpredictionColName="prediction"
-Dlifecycle=30;
Right-click the component from the previous step and choose View Data > SQL Script Output to view the evaluation result.
The output table has a single column, result, which holds the following JSON string:
{
  "ActualLabelFrequencyList": [5, 5],
  "ActualLabelProportionList": [0.5, 0.5],
  "ConfusionMatrix": [[4, 1], [2, 3]],
  "LabelList": ["A", "B"],
  "LabelMeasureList": [
    { "Accuracy": 0.7, "F1": 0.7272727272727273, "FalseDiscoveryRate": 0.3333333333333333, "FalseNegative": 1, "FalseNegativeRate": 0.2, "FalsePositive": 2, "FalsePositiveRate": 0.4, "Kappa": 0.3999999999999999, "NegativePredictiveValue": 0.75, "Precision": 0.6666666666666666, "Sensitivity": 0.8, "Specificity": 0.6, "TrueNegative": 3, "TruePositive": 4},
    { "Accuracy": 0.7, "F1": 0.6666666666666666, "FalseDiscoveryRate": 0.25, "FalseNegative": 2, "FalseNegativeRate": 0.4, "FalsePositive": 1, "FalsePositiveRate": 0.2, "Kappa": 0.3999999999999999, "NegativePredictiveValue": 0.6666666666666666, "Precision": 0.75, "Sensitivity": 0.6, "Specificity": 0.8, "TrueNegative": 4, "TruePositive": 3}
  ],
  "LabelNumber": 2,
  "OverallMeasures": {
    "Accuracy": 0.7,
    "Kappa": 0.3999999999999999,
    "LabelFrequencyBasedMicro": { "Accuracy": 0.7, "F1": 0.696969696969697, "FalseDiscoveryRate": 0.2916666666666666, "FalseNegative": 1.5, "FalseNegativeRate": 0.3, "FalsePositive": 1.5, "FalsePositiveRate": 0.3, "Kappa": 0.3999999999999999, "NegativePredictiveValue": 0.7083333333333333, "Precision": 0.7083333333333333, "Sensitivity": 0.7, "Specificity": 0.7, "TrueNegative": 3.5, "TruePositive": 3.5},
    "MacroAveraged": { "Accuracy": 0.7, "F1": 0.696969696969697, "FalseDiscoveryRate": 0.2916666666666666, "FalseNegative": 1.5, "FalseNegativeRate": 0.3, "FalsePositive": 1.5, "FalsePositiveRate": 0.3, "Kappa": 0.3999999999999999, "NegativePredictiveValue": 0.7083333333333333, "Precision": 0.7083333333333333, "Sensitivity": 0.7, "Specificity": 0.7, "TrueNegative": 3.5, "TruePositive": 3.5},
    "MicroAveraged": { "Accuracy": 0.7, "F1": 0.7, "FalseDiscoveryRate": 0.3, "FalseNegative": 3, "FalseNegativeRate": 0.3, "FalsePositive": 3, "FalsePositiveRate": 0.3, "Kappa": 0.3999999999999999, "NegativePredictiveValue": 0.7, "Precision": 0.7, "Sensitivity": 0.7, "Specificity": 0.7, "TrueNegative": 7, "TruePositive": 7}
  },
  "PredictedLabelFrequencyList": [6, 4],
  "PredictedLabelProportionList": [0.6, 0.4],
  "ProportionMatrix": [[0.8, 0.2], [0.4, 0.6]]
}
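Appendix
If you run the Multiclass Evaluation algorithm through the visual interface, you can right-click the component and choose Visual Analysis to view the detailed results.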
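To cross-check the overall measures outside the visual analysis, the aggregates can be recomputed from the ConfusionMatrix reported in the example output. The sketch below uses the standard definitions of accuracy, Cohen's kappa, and macro, frequency-weighted, and micro averaging. These formulas are assumptions rather than the component's documented implementation, although they do agree with the values reported for this sample. The function name `summarize` and its result keys are hypothetical.

```python
def summarize(matrix):
    """Recompute overall measures from a confusion matrix
    (rows = actual classes, columns = predicted classes)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    actual = [sum(matrix[k]) for k in range(n)]                           # row sums
    predicted = [sum(matrix[i][k] for i in range(n)) for k in range(n)]   # column sums
    tp = [matrix[k][k] for k in range(n)]

    precision = [tp[k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]
    recall = [tp[k] / actual[k] if actual[k] else 0.0 for k in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    accuracy = sum(tp) / total
    # Cohen's kappa: agreement beyond what the label frequencies alone would give.
    expected = sum(a * p for a, p in zip(actual, predicted)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Micro averaging pools TP, FP, and FN over all classes before dividing.
    pooled_tp = sum(tp)
    pooled_fp = sum(predicted[k] - tp[k] for k in range(n))
    pooled_fn = sum(actual[k] - tp[k] for k in range(n))
    micro_p = pooled_tp / (pooled_tp + pooled_fp)
    micro_r = pooled_tp / (pooled_tp + pooled_fn)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "f1_per_label": f1,
        "macro_f1": sum(f1) / n,                                          # unweighted mean
        "weighted_f1": sum(a / total * f for a, f in zip(actual, f1)),    # weighted by actual frequency
        "micro_f1": 2 * micro_p * micro_r / (micro_p + micro_r),
    }


summary = summarize([[4, 1], [2, 3]])  # ConfusionMatrix from the example output
for key, value in summary.items():
    print(key, value)
```

Because both classes have the same actual frequency in this sample, the frequency-weighted and macro-averaged values coincide; on imbalanced data they generally differ.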