
Platform For AI: Empirical Probability Density Chart

Last Updated: May 17, 2024

This topic describes the Empirical Probability Density Chart component provided by Machine Learning Designer.

The component uses kernel density estimation (KDE) to estimate the probability density of sample data. Like a histogram, KDE shows how the sample data is distributed. However, a histogram only produces a discrete, bin-by-bin description, whereas KDE sums the contributions of all sample points into a smooth, continuous density curve. As a result, the estimated density at a point that does not appear in the sample is generally nonzero: it is the weighted sum of the kernel densities centered at every sample point. The Empirical Probability Density Chart component uses the Gaussian (normal) distribution as its kernel function.
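For reference, the standard Gaussian kernel density estimator for samples x_1, ..., x_n has the following form, where h > 0 is the bandwidth. This page does not state how the component chooses h, so it is left symbolic here.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^{2}}{2}\right)
```

Each sample point contributes a Gaussian bump with the same weight 1/n, which is why the estimated density between sample points is not zero.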

Configure the component

You can configure the parameters of the Empirical Probability Density Chart component by using one of the following methods:

Method 1: Configure the component in the Machine Learning Platform for AI (PAI) console

Configure the component parameters on the pipeline page of Machine Learning Designer. The following section describes the parameters.

Field Setting tab

  • Input Columns: The input columns. You can select only columns of the BIGINT or DOUBLE data type.

  • Label Column: The label column. If you configure this parameter, the input columns are grouped by the values of the label column, and one result is returned for each group. For example, if the label column has two values (0 and 1), two results are returned.

Parameter Setting tab

  • Number of Calculation Intervals: The number of calculation intervals. A greater value indicates higher accuracy. The intervals are computed separately over the range of values in each input column.

Execution Tuning tab

  • Cores: The number of cores to use. The value must be a positive integer.

  • Memory Size: The memory size of each core. Valid values: 1 to 65536. Unit: MB.

Method 2: Configure the parameters by using PAI commands

Configure the component parameters by using PAI commands. The following section describes the parameters. You can use SQL scripts to call PAI commands. For more information, see SQL Script.

    PAI -name empirical_pdf
    -project algo_public
    -DinputTableName="test_data"
    -DoutputTableName="test_epdf_out"
    -DfeatureColNames="col0,col1,col2"
    -DinputTablePartitions="ds='20160101'"
    -Dlifecycle=1
    -DintervalNum=100

  • inputTableName (required; no default): The name of the input table.

  • outputTableName (required; no default): The name of the output table.

  • featureColNames (required; no default): The columns of the input table for which the probability density is calculated. Separate multiple column names with commas (,).

  • labelColName (optional; no default): The name of the label column in the input table. If you specify this parameter, the probability density is calculated separately for each label value.

  • inputTablePartitions (optional; no default): The partitions of the input table to use. The following formats are supported:

      • Partition_name=value

      • name1=value1/name2=value2: multi-level partitions

    Note: If you specify multiple partitions, separate the partitions with commas (,).

  • intervalNum (optional; no default): The number of calculation intervals. A greater value indicates higher accuracy. Valid values: [1, 1E14).

  • lifecycle (optional; no default): The lifecycle of the output table, in days.

  • coreNum (optional; automatically allocated by default): The number of cores to use. The value must be a positive integer.

  • memSizePerCore (optional; automatically allocated by default): The memory size of each core. Valid values: 1 to 65536. Unit: MB.
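If you generate these commands from code, a small helper can assemble the parameters above into the -D flag syntax used in the sample commands on this page. The following Python sketch is hypothetical: build_empirical_pdf_command is not part of any PAI SDK. It only joins list values with commas, quotes string values, checks the documented intervalNum range, and leaves coreNum and memSizePerCore to automatic allocation.

```python
def build_empirical_pdf_command(input_table, output_table, feature_cols,
                                label_col=None, partitions=None,
                                interval_num=None, lifecycle=None):
    # Hypothetical helper: assembles the documented -D parameters into one
    # PAI command string. coreNum and memSizePerCore are omitted (auto-allocated).
    if not feature_cols:
        raise ValueError("featureColNames is required")
    if interval_num is not None and not (1 <= interval_num < 1e14):
        raise ValueError("intervalNum must be in [1, 1E14)")

    lines = [
        "PAI -name empirical_pdf",
        "-project algo_public",
        f'-DinputTableName="{input_table}"',
        f'-DoutputTableName="{output_table}"',
        # Multiple feature columns are separated with commas.
        '-DfeatureColNames="{}"'.format(",".join(feature_cols)),
    ]
    optional = {
        "labelColName": label_col,
        # Multiple partitions are also separated with commas.
        "inputTablePartitions": ",".join(partitions) if partitions else None,
        "intervalNum": interval_num,
        "lifecycle": lifecycle,
    }
    for name, value in optional.items():
        if value is None:
            continue
        # Quote string values and leave numbers bare, as in the samples on this page.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"-D{name}={rendered}")
    # A trailing semicolon is appended, as in the second sample command.
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_empirical_pdf_command(
        "test_data", "test_epdf_out", ["col0", "col1", "col2"],
        partitions=["ds='20160101'"], interval_num=100, lifecycle=1))
```

Validating intervalNum locally only mirrors the range stated above; the service itself remains the authority on which values it accepts.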

Sample command

Execute the following SQL statements to generate input data:

    drop table if exists epdf_test;
    create table epdf_test as
    select
      *
    from
    (
      select 1.0 as col1
        union all
      select 2.0 as col1
        union all
      select 3.0 as col1
        union all
      select 4.0 as col1
        union all
      select 5.0 as col1
    ) tmp;

Run the following PAI command:

    PAI -name empirical_pdf
    -project algo_public
    -DinputTableName=epdf_test
    -DoutputTableName=epdf_test_out
    -DfeatureColNames=col1;

  • Input description

    You can select multiple columns for calculation. You can also specify a label column to group the rows by label value. For example, if the label column contains the values 0 and 1, the rows are divided into two groups (label=0 and label=1), and the probability density is calculated for each group.

    Note

    You can specify up to 100 label columns.

  • Output description

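    A chart and a result table are generated. The following table describes the columns of the result table. If no label column is specified, the label column of the output table is NULL.

    +---------+-----------+--------------------------------------------------------+
    | Column  | Data type | Description                                            |
    +---------+-----------+--------------------------------------------------------+
    | colName | string    | The name of the input column.                          |
    | label   | string    | The label value. NULL if no label column is specified. |
    | x       | double    | An interpolated point across the column's value range, |
    |         |           | not an actual value from the input data.               |
    | pdf     | double    | The estimated probability density at x.                |
    +---------+-----------+--------------------------------------------------------+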

    Output table

        +------------+------------+------------+------------+
        | colname    | label      | x          | pdf        |
        +------------+------------+------------+------------+
        | col1       | NULL       | 1.0        | 0.12775155176809325 |
        | col1       | NULL       | 1.0404050505050506 | 0.1304256933829622 |
        | col1       | NULL       | 1.0808101010101012 | 0.13306325897429525 |
        | col1       | NULL       | 1.1212151515151518 | 0.1356613897616418 |
        | col1       | NULL       | 1.1616202020202024 | 0.1382173796574596 |
        | col1       | NULL       | 1.202025252525253 | 0.1407286844875733 |
        | col1       | NULL       | 1.2424303030303037 | 0.14319293014274642 |
        | col1       | NULL       | 1.2828353535353543 | 0.14560791960033242 |
        | col1       | NULL       | 1.3232404040404049 | 0.14797163876379316 |
        | col1       | NULL       | 1.3636454545454555 | 0.1502822610772349 |
        | col1       | NULL       | 1.404050505050506 | 0.1525381508819247 |
        | col1       | NULL       | 1.4444555555555567 | 0.1547378654919243 |
        | col1       | NULL       | 1.4848606060606073 | 0.1568801559764068 |
        | col1       | NULL       | 1.525265656565658 | 0.15896396664681753 |
        | col1       | NULL       | 1.5656707070707085 | 0.16098843325768245 |
        | col1       | NULL       | 1.6060757575757592 | 0.1629528799404685 |
        | col1       | NULL       | 1.6464808080808098 | 0.16485681490034038 |
        | col1       | NULL       | 1.6868858585858604 | 0.16669992491584543 |
        | col1       | NULL       | 1.727290909090911 | 0.16848206869138338 |
        | col1       | NULL       | 1.7676959595959616 | 0.17020326912168932 |
        | col1       | NULL       | 1.8081010101010122 | 0.17186370453638117 |
        | col1       | NULL       | 1.8485060606060628 | 0.17346369900080946 |
        | col1       | NULL       | 1.8889111111111134 | 0.17500371175692428 |
        | col1       | NULL       | 1.929316161616164 | 0.17648432589456017 |
        | col1       | NULL       | 1.9697212121212146 | 0.17790623634938396 |
        | col1       | NULL       | 2.0101262626262653 | 0.1792702373286898 |
        | col1       | NULL       | 2.050531313131316 | 0.18057720927022053 |
        | col1       | NULL       | 2.0909363636363665 | 0.18182810544221673 |
        | col1       | NULL       | 2.131341414141417 | 0.18302393829491406 |
        | col1       | NULL       | 2.1717464646464677 | 0.18416576567472337 |
        | col1       | NULL       | 2.2121515151515183 | 0.1852546770123305 |
        | col1       | NULL       | 2.252556565656569 | 0.18629177959496213 |
        | col1       | NULL       | 2.2929616161616195 | 0.18727818503109434 |
        | col1       | NULL       | 2.33336666666667 | 0.18821499601297229 |
        | col1       | NULL       | 2.3737717171717208 | 0.18910329347850022 |
        | col1       | NULL       | 2.4141767676767714 | 0.18994412426940221 |
        | col1       | NULL       | 2.454581818181822 | 0.19073848937711185 |
        | col1       | NULL       | 2.4949868686868726 | 0.19148733286168018 |
        | col1       | NULL       | 2.535391919191923 | 0.1921915315221827 |
        | col1       | NULL       | 2.575796969696974 | 0.19285188538972659 |
        | col1       | NULL       | 2.6162020202020244 | 0.19346910910630113 |
        | col1       | NULL       | 2.656607070707075 | 0.19404382424446043 |
        | col1       | NULL       | 2.6970121212121256 | 0.1945765526142701 |
        | col1       | NULL       | 2.7374171717171762 | 0.19506771059517916 |
        | col1       | NULL       | 2.777822222222227 | 0.19551760452158667 |
        | col1       | NULL       | 2.8182272727272775 | 0.19592642714194602 |
        | col1       | NULL       | 2.858632323232328 | 0.1962942551623821 |
        | col1       | NULL       | 2.8990373737373787 | 0.1966210478770638 |
        | col1       | NULL       | 2.9394424242424293 | 0.1969066468790639 |
        | col1       | NULL       | 2.97984747474748 | 0.19715077683721793 |
        | col1       | NULL       | 3.0202525252525305 | 0.19735304731663747 |
        | col1       | NULL       | 3.060657575757581 | 0.19751295561309964 |
        | col1       | NULL       | 3.1010626262626317 | 0.19762989056457925 |
        | col1       | NULL       | 3.1414676767676823 | 0.19770313729675995 |
        | col1       | NULL       | 3.181872727272733 | 0.19773188285349683 |
        | col1       | NULL       | 3.2222777777777836 | 0.19771522265793107 |
        | col1       | NULL       | 3.262682828282834 | 0.19765216774530828 |
        | col1       | NULL       | 3.303087878787885 | 0.19754165270453194 |
        | col1       | NULL       | 3.3434929292929354 | 0.19738254426210697 |
        | col1       | NULL       | 3.383897979797986 | 0.19717365043938664 |
        | col1       | NULL       | 3.4243030303030366 | 0.19691373021193162 |
        | col1       | NULL       | 3.4647080808080872 | 0.1966015035982942 |
        | col1       | NULL       | 3.505113131313138 | 0.19623566210464843 |
        | col1       | NULL       | 3.5455181818181885 | 0.19581487945135703 |
        | col1       | NULL       | 3.585923232323239 | 0.19533782250778076 |
        | col1       | NULL       | 3.6263282828282897 | 0.1948031623623475 |
        | col1       | NULL       | 3.6667333333333403 | 0.1942095854560816 |
        | col1       | NULL       | 3.707138383838391 | 0.19355580470939734 |
        | col1       | NULL       | 3.7475434343434415 | 0.19284057057394655 |
        | col1       | NULL       | 3.787948484848492 | 0.19206268194364004 |
        | col1       | NULL       | 3.8283535353535427 | 0.19122099686158253 |
        | col1       | NULL       | 3.8687585858585933 | 0.19031444296253852 |
        | col1       | NULL       | 3.909163636363644 | 0.1893420275936375 |
        | col1       | NULL       | 3.9495686868686946 | 0.18830284755928747 |
        | col1       | NULL       | 3.989973737373745 | 0.1871960984396676 |
        | col1       | NULL       | 4.030378787878796 | 0.18602108343567092 |
        | col1       | NULL       | 4.070783838383846 | 0.18477722169674377 |
        | col1       | NULL       | 4.111188888888897 | 0.1834640560916829 |
        | col1       | NULL       | 4.151593939393948 | 0.1820812603860928 |
        | col1       | NULL       | 4.191998989898998 | 0.18062864579383914 |
        | col1       | NULL       | 4.232404040404049 | 0.179106166873458 |
        | col1       | NULL       | 4.272809090909099 | 0.17751392674406796 |
        | col1       | NULL       | 4.31321414141415 | 0.17585218159888508 |
        | col1       | NULL       | 4.353619191919201 | 0.17412134449794325 |
        | col1       | NULL       | 4.394024242424251 | 0.1723219884250765 |
        | col1       | NULL       | 4.434429292929302 | 0.17045484859762067 |
        | col1       | NULL       | 4.4748343434343525 | 0.16852082402064342 |
        | col1       | NULL       | 4.515239393939403 | 0.1665209782808102 |
        | col1       | NULL       | 4.555644444444454 | 0.16445653957824907 |
        | col1       | NULL       | 4.596049494949504 | 0.16232889999798905 |
        | col1       | NULL       | 4.636454545454555 | 0.16013961402571825 |
        | col1       | NULL       | 4.6768595959596055 | 0.1578903963157465 |
        | col1       | NULL       | 4.717264646464656 | 0.15558311872216193 |
        | col1       | NULL       | 4.757669696969707 | 0.1532198066072439 |
        | col1       | NULL       | 4.798074747474757 | 0.1508026344442397 |
        | col1       | NULL       | 4.838479797979808 | 0.14833392073462115 |
        | col1       | NULL       | 4.878884848484859 | 0.14581612226291346 |
        | col1       | NULL       | 4.919289898989909 | 0.1432518277151203 |
        | col1       | NULL       | 4.95969494949496 | 0.1406437506896507 |
        | col1       | NULL       | 5.00010000000001 | 0.13799472213247665 |
        +------------+------------+------------+------------+
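The following Python sketch illustrates the shape of the result table. For each input column and each label value, it evaluates a Gaussian KDE on an evenly spaced grid between the minimum and maximum of that group's values and emits (colName, label, x, pdf) rows. It is not the component's implementation: the bandwidth rule (the normal reference rule, 1.06 · s · n^(-1/5)) and the grid endpoints are assumptions, and the helper names normal_reference_bandwidth, gaussian_kde, and empirical_pdf_rows are hypothetical. Its numbers are therefore not expected to match the sample output above; for example, the sample's last x value is 5.0001 rather than the maximum 5.0. The default of 100 grid points is chosen only because the sample output contains 100 rows.

```python
import math
from collections import defaultdict


def normal_reference_bandwidth(values):
    # ASSUMPTION: rule-of-thumb bandwidth 1.06 * s * n^(-1/5); the component's
    # actual bandwidth rule is not documented.
    n = len(values)
    if n < 2:
        return 1.0
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    h = 1.06 * std * n ** (-0.2)
    return h if h > 0 else 1.0  # fall back when all values are identical


def gaussian_kde(values, x, h):
    # Average of Gaussian kernels with bandwidth h centered at each sample point.
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


def empirical_pdf_rows(rows, feature_cols, label_col=None, interval_num=100):
    # Hypothetical helper: returns (colName, label, x, pdf) tuples that mirror the
    # columns of the result table. label is None when no label column is given.
    groups = defaultdict(list)
    for row in rows:
        label = None if label_col is None else str(row[label_col])
        groups[label].append(row)

    result = []
    for col in feature_cols:
        for label, group in groups.items():
            values = [float(r[col]) for r in group if r[col] is not None]
            if not values:
                continue
            lo, hi = min(values), max(values)
            h = normal_reference_bandwidth(values)
            # ASSUMPTION: evenly spaced grid from the group's minimum to its maximum.
            step = (hi - lo) / (interval_num - 1) if interval_num > 1 else 0.0
            for i in range(interval_num):
                x = lo + i * step
                result.append((col, label, x, gaussian_kde(values, x, h)))
    return result


if __name__ == "__main__":
    # Same five values as the epdf_test table created by the SQL statements above.
    data = [{"col1": v} for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
    rows = empirical_pdf_rows(data, ["col1"])
    print(len(rows))    # 100
    print(rows[0][:3])  # ('col1', None, 1.0)
```

Grouping rows by label before estimating is what produces one curve per label value, which matches the Label Column behavior described earlier on this page.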