A Lorenz curve can be used to show the income distribution of a country or region. The curvature of the curve indicates the degree of income inequality: the further the curve bows below the line of perfect equality (the diagonal), the more unequal the income distribution.
Draw a rectangle. The height represents the total wealth and is divided equally into N parts, each of which represents 1/N of the total wealth. The length represents all families, arranged from least wealthy to most wealthy, and is also divided equally into N parts. The first part represents the least wealthy 1/N of families. For each part, plot a point at the cumulative proportion of total wealth held by the families up to and including that part. Connecting these points forms the Lorenz curve.
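The construction above can be sketched in Python as follows. This is a minimal illustration, not the component's implementation: `lorenz_points` and `max_gap` are hypothetical helpers, and the rule that maps the population share q/N to the floor(q × count / N) least wealthy records is an assumption. PAI may round quantiles differently, so individual points can differ from the component's output.

```python
from itertools import accumulate


def lorenz_points(values, n=100):
    """Return the Lorenz curve points L(q/n) for q = 0..n.

    Assumption: the poorest q/n of the population is taken to be the
    floor(q * count / n) smallest values.
    """
    data = sorted(values)  # least wealthy first
    total = sum(data)
    if not data or total <= 0:
        raise ValueError("values must be non-empty and have a positive total")
    prefix = [0, *accumulate(data)]  # prefix[k] = sum of the k smallest values
    count = len(data)
    return [prefix[q * count // n] / total for q in range(n + 1)]


def max_gap(points):
    """Largest vertical distance between the line of equality and the curve."""
    n = len(points) - 1
    return max(q / n - p for q, p in enumerate(points))


equal = lorenz_points([5] * 10, n=10)          # everyone holds the same wealth
unequal = lorenz_points([1] * 9 + [91], n=10)  # one family holds 91% of the wealth
print(max_gap(equal), max_gap(unequal))
```

An equal distribution produces points on the diagonal (gap 0), while the skewed distribution bows far below it, which matches the statement that a more strongly curved Lorenz curve indicates greater inequality.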
Configure the component
You can use one of the following methods to configure the Lorenz Curve component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Lorenz Curve component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Columns | The columns selected from the input table. |
Parameters Setting | Quantile | The number of quantiles (N). Default value: 100. |
Tuning | Computing Cores | The number of cores used in computing. The value must be a positive integer. |
Tuning | Memory Size per Core (Unit: MB) | The memory size of each core. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name LorenzCurve
-project algo_public
-DinputTableName=maple_test_lorenz_basic10_input
-DcolName=col0
-DoutputTableName=maple_test_lorenz_basic10_output
-DcoreNum=20
-DmemSizePerCore=110;
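The command passes each component parameter as a `-D<name>=<value>` option. As a sketch, the hypothetical Python helper `build_pai_command` below assembles such a command string from a dictionary. It is not part of any PAI SDK and only formats text; you would still run the resulting command, for example in the SQL Script component. Joining list values with commas follows the convention for selecting multiple columns in `colName`.

```python
def build_pai_command(name: str, project: str, params: dict) -> str:
    """Hypothetical helper: format a PAI command string.

    Each entry in params becomes a -D<key>=<value> option. List or tuple
    values (for example, several columns for colName) are joined with commas.
    """
    parts = [f"PAI -name {name}", f"-project {project}"]
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"-D{key}={value}")
    return " ".join(parts) + ";"


command = build_pai_command(
    "LorenzCurve",
    "algo_public",
    {
        "inputTableName": "maple_test_lorenz_basic10_input",
        "colName": "col0",
        "outputTableName": "maple_test_lorenz_basic10_output",
        "coreNum": 20,
        "memSizePerCore": 110,
    },
)
print(command)
```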
Parameter | Required | Description | Default value |
inputTableName | Yes | The name of the input table. | No default value |
outputTableName | Yes | The name of the output table. | No default value |
colName | No | The columns selected from the input table. You can select multiple columns and separate them with commas (,). | No default value |
N | No | The number of quantiles. | 100 |
inputTablePartitions | No | The partitions that are selected from the input table for training. The following formats are supported: partition_name=value, and name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). | No default value |
lifecycle | No | The lifecycle of the output table. This value must be an integer. Unit: days. | 28 |
coreNum | No | This parameter is used with memSizePerCore. The value must be a positive integer. The system calculates the number of instances based on the amount of input data. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer. Recommended range: 1024 to 65536 (64 × 1024). | Determined by the system |
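Before submitting a command, you can check a parameter set against the constraints in the preceding table. The following Python sketch uses a hypothetical `check_lorenz_params` function that is not a PAI API. Treating `N` as a positive integer is an assumption, and the memory range is reported as a recommendation, not a hard limit.

```python
REQUIRED = ("inputTableName", "outputTableName")


def check_lorenz_params(params: dict) -> list:
    """Hypothetical checker: return problems found in a LorenzCurve parameter dict."""
    problems = []
    for key in REQUIRED:
        if not params.get(key):
            problems.append(f"{key} is required")
    # Assumption: N is a positive integer, like coreNum and memSizePerCore.
    for key in ("N", "coreNum", "memSizePerCore"):
        if key in params and not (isinstance(params[key], int) and params[key] > 0):
            problems.append(f"{key} must be a positive integer")
    if "lifecycle" in params and not isinstance(params["lifecycle"], int):
        problems.append("lifecycle must be an integer (days)")
    mem = params.get("memSizePerCore")
    if isinstance(mem, int) and mem > 0 and not 1024 <= mem <= 64 * 1024:
        problems.append("memSizePerCore is outside the recommended range 1024 to 65536 MB")
    # coreNum is used together with memSizePerCore.
    if ("coreNum" in params) != ("memSizePerCore" in params):
        problems.append("coreNum and memSizePerCore should be set together")
    return problems


example = {
    "inputTableName": "maple_test_lorenz_basic10_input",
    "colName": "col0",
    "outputTableName": "maple_test_lorenz_basic10_output",
    "coreNum": 20,
    "memSizePerCore": 110,
}
print(check_lorenz_params(example))
```

For the parameters used in this topic's example, the only issue reported is that memSizePerCore=110 is below the recommended range.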
Example
Create the input table maple_test_lorenz_basic10_input with the following test data:
col0:double
4
7
2
8
6
3
9
5
0
1
10
Run the following PAI command:
PAI -name LorenzCurve -project algo_public -DinputTableName=maple_test_lorenz_basic10_input -DcolName=col0 -DoutputTableName=maple_test_lorenz_basic10_output -DcoreNum=20 -DmemSizePerCore=110;
The following table shows the output.
quantile | col0 |
0 | 0 |
1 | 0.01818181818181818 |
2 | 0.01818181818181818 |
3 | 0.01818181818181818 |
4 | 0.01818181818181818 |
5 | 0.01818181818181818 |
6 | 0.01818181818181818 |
7 | 0.01818181818181818 |
8 | 0.01818181818181818 |
9 | 0.01818181818181818 |
10 | 0.01818181818181818 |
11 | 0.05454545454545454 |
12 | 0.05454545454545454 |
13 | 0.05454545454545454 |
14 | 0.05454545454545454 |
... | ... |
85 | 0.8181818181818182 |
86 | 0.8181818181818182 |
87 | 0.8181818181818182 |
88 | 0.8181818181818182 |
89 | 0.8181818181818182 |
90 | 1 |
91 | 1 |
92 | 1 |
93 | 1 |
94 | 1 |
95 | 1 |
96 | 1 |
97 | 1 |
98 | 1 |
99 | 1 |
100 | 1 |
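Each value in the col0 column of the output is a cumulative share of the total of the test data, which is 0 + 1 + ... + 10 = 55. For example, 0.01818... = 1/55 (0 + 1), 0.05454... = 3/55 (0 + 1 + 2), and 0.8181... = 45/55 = 9/11 (0 + 1 + ... + 9). The following Python sketch recomputes these levels as exact fractions. It does not reproduce which quantile maps to which level, because this topic does not describe the rule that the component uses.

```python
from fractions import Fraction
from itertools import accumulate

# Test data from the example (column col0).
values = [4, 7, 2, 8, 6, 3, 9, 5, 0, 1, 10]
data = sorted(values)  # least to most: 0, 1, ..., 10
total = sum(data)      # 55

# Cumulative share held by the k smallest values, k = 1..11, as exact fractions.
levels = [Fraction(s, total) for s in accumulate(data)]

for k, share in enumerate(levels, start=1):
    print(f"{k:2d} smallest values: {share} = {float(share)}")
```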