This topic describes the Scatter Plot component provided by Machine Learning Studio.
In regression analysis, a scatter chart shows the distribution of data points in a Cartesian coordinate system.
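To show what "distribution of data points in a Cartesian coordinate system" means in practice, the following minimal sketch renders a text-mode scatter chart. It linearly rescales two numeric columns onto a character grid and marks each point with its label. The function `ascii_scatter` is a hypothetical illustration and is not how the Scatter Plot component draws its charts. The sample values are the emp_var_rate, euribor3m, and y values of the first five rows of the example input data in this topic.

```python
def ascii_scatter(xs, ys, labels, width=40, height=12):
    """Place each (x, y) point on a character grid and return the grid rows.

    x is rescaled linearly to a column index and y to a row index, so the grid acts
    as a coarse Cartesian coordinate system. Each point is drawn with the first
    character of its label; a later point overwrites an earlier one in the same cell.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when a column is constant, to avoid dividing by zero.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y, label in zip(xs, ys, labels):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top line, so larger y values map to smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = str(label)[0]
    return ["".join(cells) for cells in grid]


# emp_var_rate (x), euribor3m (y), and the label y for the first five sample rows.
xs = [1.4, -0.1, -1.7, -1.8, -2.9]
ys = [4.962, 4.021, 0.729, 1.405, 0.869]
labels = [0, 0, 1, 0, 1]
rows = ascii_scatter(xs, ys, labels, width=20, height=6)
for line in rows:
    print("|" + line + "|")
```

Points labeled 1 cluster in the lower left of this small chart (low emp_var_rate and low euribor3m), which is the kind of pattern a scatter chart is meant to reveal.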
Configure the component
You can use one of the following methods to configure the Scatter Plot component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Scatter Plot component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
Parameter | Description |
---|---|
Feature Columns | The columns that represent the features of the training samples. |
Label Column | The label column. |
Samples | The number of samples. |
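The Samples parameter is described only as the number of samples. Assuming that it limits how many rows are drawn in the chart (an assumption; this topic does not say how the rows are chosen), the following sketch shows one common way to downsample data for a scatter chart: uniform random sampling without replacement. `sample_rows` is a hypothetical helper, not part of PAI.

```python
import random


def sample_rows(rows, n, seed=0):
    """Return at most n rows drawn uniformly at random without replacement.

    Hypothetical helper: it assumes that Samples caps how many rows are plotted.
    """
    if n >= len(rows):
        return list(rows)  # nothing to drop; keep every row
    rng = random.Random(seed)  # a fixed seed makes the chosen subset reproducible
    return rng.sample(rows, n)


# First four rows of the example input data in this topic.
rows = [
    (1.4, 93.918, -42.7, 4.962, 0),
    (-0.1, 93.2, -42.0, 4.021, 0),
    (-1.7, 94.055, -39.8, 0.729, 1),
    (-1.8, 93.075, -47.1, 1.405, 0),
]
picked = sample_rows(rows, 2)
print(picked)
```

Sampling keeps the chart readable on large tables, at the cost of possibly hiding rare label values.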
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name scatter_diagram -project algo_public
-DselectedCols=emp_var_rate,cons_price_rate,cons_conf_idx,euribor3m
-DlabelCol=y
-DmapTable=pai_temp_2447_22859_2
-DinputTable=scatter_diagram
-DoutputTable=pai_temp_2447_22859_1;
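If you generate PAI commands for the SQL Script component from code, a small builder can keep the flags consistent. The following sketch reproduces the command above from Python values and emits only the parameters that are set. `build_scatter_command` is a hypothetical helper, not part of a PAI SDK; the flag names and their order are copied from the example command, and the five-column check reflects the selectedCols limit stated in the parameter table.

```python
def build_scatter_command(input_table, output_table, map_table, selected_cols,
                          label_col=None, partitions=None, lifecycle=None):
    """Return a scatter_diagram PAI command as text (hypothetical helper).

    Only parameters that are set are emitted. The five-column cap mirrors the
    documented limit on selectedCols.
    """
    if not 1 <= len(selected_cols) <= 5:
        raise ValueError("selectedCols must list between one and five columns")
    params = [
        ("selectedCols", ",".join(selected_cols)),
        ("labelCol", label_col),
        ("mapTable", map_table),
        ("inputTable", input_table),
        ("inputTablePartitions", ",".join(partitions) if partitions else None),
        ("outputTable", output_table),
        ("lifecycle", lifecycle),
    ]
    lines = ["PAI -name scatter_diagram -project algo_public"]
    lines += [f"-D{key}={value}" for key, value in params if value is not None]
    return "\n".join(lines) + ";"


command = build_scatter_command(
    input_table="scatter_diagram",
    output_table="pai_temp_2447_22859_1",
    map_table="pai_temp_2447_22859_2",
    selected_cols=["emp_var_rate", "cons_price_rate", "cons_conf_idx", "euribor3m"],
    label_col="y",
)
print(command)
```

The printed text matches the example command line for line, so it can be pasted into the SQL Script component.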
Parameter | Required | Description | Default value |
---|---|---|---|
inputTable | Yes | The name of the input table. | No default value |
inputTablePartitions | No | The partitions that are selected from the input table for training. Note: If you specify multiple partitions, separate them with commas (,). | No default value |
outputTable | Yes | The name of the output table. | No default value |
mapTable | Yes | The name of the output table that stores the maximum value, minimum value, and enumeration value of each feature. | No default value |
selectedCols | Yes | The columns selected from the input table and used to draw a scatter chart. A maximum of five columns can be selected. | No default value |
labelCol | Yes | The INT- or STRING-type column that you want to use as the label column. | Empty |
lifecycle | Yes | The lifecycle of the output table. Unit: days. | 28 |
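The mapTable parameter stores the maximum value, minimum value, and enumeration value of each feature. The following sketch computes that kind of summary in plain Python: minimum and maximum for numeric columns, and the sorted distinct values for non-numeric columns. The dictionary layout and the function `feature_summary` are assumptions for illustration; the actual mapTable schema is not described in this topic.

```python
def feature_summary(rows, columns):
    """Summarize each column in the spirit of the mapTable described above.

    Numeric columns get their minimum and maximum. Any column that contains a
    non-numeric value gets the sorted list of its distinct values instead.
    """
    summary = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            summary[name] = {"min": min(values), "max": max(values)}
        else:
            summary[name] = {"values": sorted({str(v) for v in values})}
    return summary


columns = ["emp_var_rate", "cons_price_rate", "cons_conf_idx", "euribor3m", "y"]
# First four sample rows; y is stored as a string here to exercise the enumeration branch.
rows = [
    (1.4, 93.918, -42.7, 4.962, "0"),
    (-0.1, 93.2, -42.0, 4.021, "0"),
    (-1.7, 94.055, -39.8, 0.729, "1"),
    (-1.8, 93.075, -47.1, 1.405, "0"),
]
summary = feature_summary(rows, columns)
for name, stats in summary.items():
    print(name, stats)
```

Ranges like these are what a plotting tool needs to scale each axis, which is presumably why the component writes them to a separate table.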
Example
- Input data
create table scatter_diagram as select emp_var_rate, cons_price_rate, cons_conf_idx, euribor3m, y from pai_bank_data limit 100;
emp_var_rate | cons_price_rate | cons_conf_idx | euribor3m | y |
---|---|---|---|---|
1.4 | 93.918 | -42.7 | 4.962 | 0 |
-0.1 | 93.2 | -42.0 | 4.021 | 0 |
-1.7 | 94.055 | -39.8 | 0.729 | 1 |
-1.8 | 93.075 | -47.1 | 1.405 | 0 |
-2.9 | 92.201 | 31.4 | 0.869 | 1 |
1.4 | 93.918 | -42.7 | 4.961 | 0 |
-1.8 | 92.893 | -46.2 | 1.327 | 0 |
-1.8 | 92.893 | 92.893 | 1.313 | 0 |
-2.9 | 92.963 | -40.8 | 1.266 | 1 |
-1.8 | 93.075 | -47.1 | 1.41 | 0 |
1.1 | 93.994 | -36.4 | 4.864 | 0 |
1.4 | 93.444 | -36.1 | 4.964 | 0 |
1.4 | 93.444 | -36.1 | 4.965 | 1 |
-1.8 | 92.893 | -46.2 | 1.291 | 0 |
1.4 | 94.465 | -41.8 | 4.96 | 0 |
1.4 | 93.918 | -42.7 | 4.962 | 0 |
-1.8 | 93.075 | -47.1 | 1.365 | 1 |
-0.1 | 93.798 | -40.4 | 4.86 | 1 |
1.1 | 93.994 | -36.4 | 4.86 | 0 |
1.4 | 93.918 | -42.7 | 4.96 | 0 |
-1.8 | 93.075 | -47.1 | 1.405 | 0 |
1.4 | 94.465 | -41.8 | 4.967 | 0 |
1.4 | 93.918 | -42.7 | 4.963 | 0 |
1.4 | 93.918 | -42.7 | 4.968 | 0 |
1.4 | 93.918 | -42.7 | 4.962 | 0 |
-1.8 | 92.893 | -46.2 | 1.344 | 0 |
-3.4 | 92.431 | -26.9 | 0.754 | 0 |
-1.8 | 93.075 | -47.1 | 1.365 | 0 |
-1.8 | 92.893 | -46.2 | 1.313 | 0 |
1.4 | 93.918 | -42.7 | 4.961 | 0 |
1.4 | 94.465 | -41.8 | 4.961 | 0 |
-1.8 | 92.893 | -46.2 | 1.327 | 0 |
-1.8 | 92.893 | -46.2 | 1.299 | 0 |
-2.9 | 92.963 | -40.8 | 1.268 | 1 |
1.4 | 93.918 | -42.7 | 4.963 | 0 |
-1.8 | 92.893 | -46.2 | 1.334 | 0 |
1.4 | 93.918 | -42.7 | 4.96 | 0 |
-1.8 | 93.075 | -47.1 | 1.405 | 0 |
1.4 | 94.465 | -41.8 | 4.96 | 0 |
1.4 | 93.444 | -36.1 | 4.962 | 0 |
1.1 | 93.994 | -36.4 | 4.86 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
1.4 | 93.918 | -42.7 | 4.961 | 0 |
-3.4 | 92.649 | -30.1 | 0.715 | 1 |
1.4 | 93.444 | -36.1 | 4.966 | 0 |
-0.1 | 93.2 | -42.0 | 4.076 | 0 |
1.4 | 93.444 | -36.1 | 4.965 | 0 |
-1.8 | 92.893 | -46.2 | 1.354 | 0 |
1.4 | 93.444 | -36.1 | 4.967 | 0 |
1.4 | 94.465 | -41.8 | 4.959 | 0 |
-1.8 | 92.893 | -46.2 | 1.354 | 0 |
1.4 | 94.465 | -41.8 | 4.958 | 0 |
-1.8 | 92.893 | -46.2 | 1.354 | 0 |
1.4 | 94.465 | -41.8 | 4.864 | 0 |
1.1 | 93.994 | -36.4 | 4.859 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
-1.8 | 92.893 | -46.2 | 1.27 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
1.1 | 93.994 | -36.4 | 4.859 | 0 |
1.4 | 94.465 | -41.8 | 4.959 | 0 |
1.1 | 93.994 | -36.4 | 4.856 | 0 |
-1.8 | 93.075 | -47.1 | 1.405 | 0 |
-1.8 | 92.843 | -50.0 | 1.811 | 1 |
-0.1 | 93.2 | -42.0 | 4.021 | 0 |
-2.9 | 92.469 | -33.6 | 1.029 | 0 |
1.4 | 93.918 | -42.7 | 4.962 | 0 |
-1.8 | 93.075 | -47.1 | 1.365 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
-1.8 | 92.893 | -46.2 | 1.259 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
1.4 | 94.465 | -41.8 | 4.866 | 0 |
-2.9 | 92.201 | -31.4 | 0.883 | 0 |
-0.1 | 93.2 | -42.0 | 4.076 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
1.4 | 93.918 | -42.7 | 4.96 | 0 |
1.4 | 93.444 | -36.1 | 4.962 | 0 |
1.1 | 93.994 | -36.4 | 4.858 | 0 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
1.1 | 93.994 | -36.4 | 4.856 | 0 |
1.4 | 93.918 | -42.7 | 4.968 | 0 |
1.4 | 93.444 | -36.1 | 4.966 | 0 |
1.4 | 94.465 | -41.8 | 4.962 | 0 |
1.4 | 93.444 | -36.1 | 4.963 | 0 |
-1.8 | 92.843 | -50.0 | 1.56 | 1 |
1.4 | 93.918 | -42.7 | 4.96 | 0 |
1.4 | 93.444 | -36.1 | 4.963 | 0 |
-3.4 | 92.431 | -26.9 | 0.74 | 0 |
1.1 | 93.994 | -36.4 | 4.856 | 0 |
1.4 | 93.918 | -42.7 | 4.962 | 0 |
1.1 | 93.994 | -36.4 | 4.856 | 0 |
-0.1 | 93.2 | -42.0 | 4.245 | 1 |
1.1 | 93.994 | -36.4 | 4.857 | 0 |
-1.8 | 93.075 | -47.1 | 1.405 | 0 |
-1.8 | 92.893 | -46.2 | 1.327 | 0 |
-0.1 | 93.2 | -42.0 | 4.12 | 0 |
1.4 | 94.465 | -41.8 | 4.958 | 0 |
-1.8 | 93.749 | -34.6 | 0.659 | 1 |
1.1 | 93.994 | -36.4 | 4.858 | 0 |
1.1 | 93.994 | -36.4 | 4.858 | 0 |
1.4 | 93.444 | -36.1 | 4.963 | 0 |
- Parameter settings
Select the y column as the optional label column for the scatter chart. Select the emp_var_rate, cons_price_rate, cons_conf_idx, and euribor3m columns as feature columns.
- Output
The scatter chart shows how the samples with each label value, that is, each value of the y column, are distributed across the selected features.
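To make this output concrete, the following sketch groups example rows by label for every pair of selected feature columns, which is one way to organize data for per-label scatter panels. With four feature columns there are C(4, 2) = 6 pairs. How the Scatter Plot component actually lays out its panels is not described in this topic; the function `points_by_label` and the pairwise layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def points_by_label(rows, feature_cols, label_col):
    """Group (x, y) points by label for every pair of feature columns.

    rows is a list of dicts keyed by column name. The result maps each
    (x_column, y_column) pair to a dict of {label: [(x, y), ...]}.
    """
    panels = {}
    for x_col, y_col in combinations(feature_cols, 2):
        groups = defaultdict(list)
        for row in rows:
            groups[row[label_col]].append((row[x_col], row[y_col]))
        panels[(x_col, y_col)] = dict(groups)
    return panels


columns = ["emp_var_rate", "cons_price_rate", "cons_conf_idx", "euribor3m", "y"]
# First four rows of the example input data.
raw = [
    (1.4, 93.918, -42.7, 4.962, 0),
    (-0.1, 93.2, -42.0, 4.021, 0),
    (-1.7, 94.055, -39.8, 0.729, 1),
    (-1.8, 93.075, -47.1, 1.405, 0),
]
rows = [dict(zip(columns, values)) for values in raw]
panels = points_by_label(rows, columns[:4], "y")
for (x_col, y_col), groups in panels.items():
    sizes = {label: len(points) for label, points in groups.items()}
    print(f"{x_col} vs {y_col}: {sizes}")
```

Each panel can then be drawn with one marker or color per label, so that differences between the label classes show up as separated point clouds.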