Histogram (Multiple Columns)

Visualized Modeling (Designer) of Platform for AI (PAI) provides the Histogram (Multiple Columns) component. A histogram, also known as a quality distribution chart, is a statistical chart that uses a series of vertical bars or line segments of different heights to show how data is distributed. The horizontal axis represents the data intervals, and the vertical axis represents the number of values that fall into each interval.

Configure the component

You can use one of the following methods to configure the Histogram (Multiple Columns) component.

Method 1: Configure the component on the pipeline page

On the pipeline page of Designer, search for Histogram (Multiple Columns) in the left-side pane. Drag it into the canvas and connect it to upstream nodes. Then, click the component to configure the parameters.

Tab	Parameter	Description
Fields Setting	Select Column	Select the columns to be analyzed. Only the DOUBLE and BIGINT types are supported. A maximum of 1,024 columns are supported.
Parameters Setting	Intervals	The number of intervals into which the data is divided.
Tuning	Cores	The number of cores that are used in computing. The value must be a positive integer. By default, the value is automatically selected.
Tuning	Memory Size per Core	The memory size of each core. Valid values: 1 to 65536. Unit: MB. By default, the value is automatically selected.

After the node is run, right-click the node and choose Visual Analysis or View Data to view its output.

Method 2: Use PAI commands

Configure the component parameters by using the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name histogram
    -project algo_public
    -DinputTableName=maple_histogram_1to20_input
    -DoutputTableName=maple_histogram_1to20_output
    -DselectedColNames=col0,col1
    -DintervalNum=20;

Parameter	Required	Description	Default value
inputTableName	Yes	The name of the input table.	No default value
inputTablePartitions	No	The partitions that are selected from the input table for analysis. The following formats are supported: partition_name=value for a single partition, and name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,).	No default value
outputTableName	Yes	The name of the output table.	No default value
selectedColNames	Yes	The names of the columns selected from the input table for analysis. Separate the names of multiple columns with commas (,). Only the BIGINT and DOUBLE types are supported. A maximum of 1,024 columns are supported.	No default value
intervalNum	No	The number of intervals into which the data is divided.	100
lifecycle	No	The lifecycle of the table.	No default value
coreNum	No	The number of cores that are used in computing. The value must be a positive integer. Valid values: [1,9999].	Automatically selected by the system
memSizePerCore	No	The memory size of each core. Valid values: 1 to 65536. Unit: MB.	Automatically selected by the system
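The parameter table above can be turned into a small helper that assembles the command text and checks the documented limits before the command is pasted into an SQL Script node. The following is a minimal sketch in Python, not part of PAI: the function name `build_histogram_command` and its argument names are hypothetical, and the check that `intervalNum` is at least 1 is an assumption, because the table does not state a valid range for it.

```python
def build_histogram_command(input_table, output_table, columns,
                            interval_num=None, core_num=None,
                            mem_size_per_core=None, lifecycle=None,
                            partitions=None, project="algo_public"):
    """Assemble a PAI histogram command string (hypothetical helper)."""
    if not columns:
        raise ValueError("selectedColNames is required")
    if len(columns) > 1024:
        raise ValueError("a maximum of 1,024 columns are supported")
    # Assumption: the documentation gives no range for intervalNum.
    if interval_num is not None and interval_num < 1:
        raise ValueError("intervalNum must be at least 1")
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    if mem_size_per_core is not None and not 1 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be in 1 to 65536 (MB)")

    # Required parameters come first, in the order used in the documentation.
    parts = [
        "PAI -name histogram",
        f"-project {project}",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
        f"-DselectedColNames={','.join(columns)}",
    ]
    # Optional parameters are emitted only when a value is supplied,
    # so the system defaults apply otherwise.
    optional = [
        ("inputTablePartitions", partitions),
        ("intervalNum", interval_num),
        ("lifecycle", lifecycle),
        ("coreNum", core_num),
        ("memSizePerCore", mem_size_per_core),
    ]
    for name, value in optional:
        if value is not None:
            parts.append(f"-D{name}={value}")
    return "\n    ".join(parts) + ";"


if __name__ == "__main__":
    print(build_histogram_command(
        "maple_histogram_1to20_input",
        "maple_histogram_1to20_output",
        ["col0", "col1"],
        interval_num=20,
    ))
```

The helper only produces text. Running the command still requires an SQL Script node in Designer, as described in Method 2.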

Example

Example for Method 2: Use PAI commands

Search for SQL Script in the left-side pane and drag it into the canvas.
Connect it to upstream nodes to obtain the data. Sample data:
col0 (BIGINT)	col1 (DOUBLE)
1	1.0
2	2.0
3	3.0
4	4.0
5	5.0
6	6.0
7	7.0
8	8.0
9	9.0
10	10.0
11	11.0
12	12.0
13	13.0
14	14.0
15	15.0
16	16.0
17	17.0
18	18.0
19	19.0
20	20.0

Configure the following PAI command for the SQL script node.

PAI -name histogram -project algo_public  -- Default parameters. You do not need to change them.
    -DinputTableName=maple_histogram_1to20_input  -- The name of the input table
    -DoutputTableName=maple_histogram_1to20_output  -- The name of the output table
    -DselectedColNames=col0,col1  -- The selected columns
    -DintervalNum=20;  -- The number of intervals

Right-click the SQL script node and choose Run Current Node.
If upstream nodes are not run, run them first to read the data.

View the result from the output table. Sample output:

colname	histogram
col0	[1, 1.95):1;[1.95, 2.9):1;[2.9, 3.85):1;[3.85, 4.8):1;[4.8, 5.75):1;[5.75, 6.7):1;[6.7, 7.65):1;[7.65, 8.6):1;[8.6, 9.55):1;[9.55, 10.5):1;[10.5, 11.45):1;[11.45, 12.4):1;[12.4, 13.35):1;[13.35, 14.3):1;[14.3, 15.25):1;[15.25, 16.2):1;[16.2, 17.15):1;[17.15, 18.1):1;[18.1, 19.05):1;[19.05, 20]:1
col1	[1, 1.95):1;[1.95, 2.9):1;[2.9, 3.85):1;[3.85, 4.8):1;[4.8, 5.75):1;[5.75, 6.7):1;[6.7, 7.65):1;[7.65, 8.6):1;[8.6, 9.55):1;[9.55, 10.5):1;[10.5, 11.45):1;[11.45, 12.4):1;[12.4, 13.35):1;[13.35, 14.3):1;[14.3, 15.25):1;[15.25, 16.2):1;[16.2, 17.15):1;[17.15, 18.1):1;[18.1, 19.05):1;[19.05, 20]:1
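The sample output is consistent with equal-width intervals between the column minimum and maximum, where every interval is half-open except the last one, which is closed. The following Python sketch illustrates that reading and shows one way to parse the `histogram` string back into numbers. It is an assumption based on the sample output only, not the component's actual implementation, and the function names `equal_width_histogram`, `format_histogram`, and `parse_histogram` are hypothetical.

```python
def equal_width_histogram(values, interval_num):
    """Count values in equal-width intervals between min and max (assumed behavior)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / interval_num
    counts = [0] * interval_num
    for v in values:
        if width == 0:
            idx = 0  # all values are identical; place them in the first interval
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last interval, which is closed on the right.
            idx = min(int((v - lo) / width), interval_num - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(interval_num)] + [hi]
    return edges, counts


def format_histogram(edges, counts):
    """Render intervals and counts in the style of the sample output."""
    last = len(counts) - 1
    parts = []
    for i, count in enumerate(counts):
        closing = "]" if i == last else ")"
        # .10g hides floating-point noise such as 2.9000000000000004.
        parts.append(f"[{edges[i]:.10g}, {edges[i + 1]:.10g}{closing}:{count}")
    return ";".join(parts)


def parse_histogram(text):
    """Parse a histogram string into (lower, upper, count) tuples."""
    result = []
    for segment in text.split(";"):
        bounds, count = segment.rsplit(":", 1)
        lower, upper = bounds[1:-1].split(",")
        result.append((float(lower), float(upper), int(count)))
    return result


if __name__ == "__main__":
    edges, counts = equal_width_histogram(list(range(1, 21)), 20)
    text = format_histogram(edges, counts)
    print(text)
    print(parse_histogram(text)[:3])
```

With the 20 sample values and 20 intervals, the interval width is (20 - 1) / 20 = 0.95, which is why the first interval in the sample output is [1, 1.95) and each interval contains exactly one value.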