The SQL Script component provided by Platform for AI (PAI) allows you to write custom SQL statements in the SQL script editor. You can submit the statements to MaxCompute for execution. This topic describes how to configure the component and the scenarios in which you can use the component.
Component configuration
You can configure the SQL Script component only in Machine Learning Designer. The following table describes the parameters that are used to configure the component.
Parameter | Description |
Use Script Mode | In Script mode, you need to create an output table. Specify the table name as ${o1}. This way, the system can use the table as the output of the component. For more information, see SQL in script mode. |
Input Source | The name of the input table. |
Whether the system adds a create table statement |
|
SQL Script | The custom SQL script. For information about the supported syntax, see Overview of MaxCompute SQL. The SQL Script component provides an auto-complete feature, which recommends code keywords, and built-in search templates to help you improve productivity. For more information, see Features that improve the efficiency of code development. |
Usage notes on the SQL Script component:
The component supports up to four inputs and one output.
When you draw a line between an upstream component and this component, the input port that you select determines the mapped name of the input table. The mapped name of the input table can be t1, t2, t3, or t4. You can directly use the ${t1}, ${t2}, ${t3}, ${t4} variable to specify the input table, without the need to specify the original table name.
You can use the ${o1} variable in the SQL script to specify the output table.
You can use the ${lifecycle} variable in the SQL script to specify the configured lifecycle of temporary tables in the current workspace. The default lifecycle is 28 days. For information about how to configure the lifecycle of temporary tables in a workspace, see Manage workspaces.
If you select the Whether the system adds a create table statement check box, make sure that the last statement of the SQL script that you specify is a SELECT statement. In this case, PAI automatically creates a temporary table to store the query results of the SELECT statement.
If you specify a CREATE TABLE statement in the SQL script to create a temporary table, you must configure the lifecycle of the table. For more information, see Manage workspaces.
Scenarios
You can use the SQL Script component in different ways in the following scenarios:
Scenario 1: The SQL script component reads data from a single upstream table and then performs some analysis
In this case, PAI automatically creates a temporary table whose name is prefixed with pai_temp as the output table. Sample SQL script:
select age,
(case sex when 'male' then 1 else 0 end) as sex,
(case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
trestbps,
chol,
(case fbs when 'true' then 1 else 0 end) as fbs,
(case restecg when 'norm' then 0 when 'abn' then 1 else 2 end) as restecg,
thalach,
(case exang when 'true' then 1 else 0 end) as exang,
oldpeak,
(case slop when 'up' then 0 when 'flat' then 1 else 2 end) as slop,
ca,
(case thal when 'norm' then 0 when 'fix' then 1 else 2 end) as thal,
(case status when 'sick' then 1 else 0 end) as ifHealth
from ${t1};
Scenario 2: The SQL script component reads data from multiple upstream tables for combined analysis
In this case, you need to use the ${o1} and ${lifecycle} variables in the SQL script to manually create a temporary table as the output table. Sample SQL script:
create table if not exists ${o1} lifecycle ${lifecycle}
as
with
a as (select * from ${t1}),
b as (select * from ${t2}),
c as (select * from ${t3})
select * from a
UNION all
select * from b
UNION all
select * from c;
Scenario 3: The SQL script component has no upstream input and directly reads the contents of the source table through the table name
In this case, you need to use the ${o1} and ${lifecycle} variables in the SQL script to manually create a temporary table as the output table. Sample SQL script:
set odps.sql.mapper.split.size=128;
set odps.sql.executionengine.batch.rowcount=100;
drop table if exists test_a_5_1;
create table test_a_5_1
as
select * from pai_online_project.wumai_data;
drop table if exists test_a_5_2;
create table test_a_5_2
as
select * from pai_online_project.wumai_data;
create table if not exists ${o1}
LIFECYCLE ${lifecycle}
as
select a.pm2, a.pm10, a.so2, b.co, b.no2
from
test_a_5_1 a join test_a_5_2 b
on a.time = b.time
and a.hour = b.hour;
select * from ${o1};
The SET statements at the beginning of the SQL script are used to configure environment variables.
Scenario 4: Execute PAI commands within the SQL script component
In this case, you need to use the ${o1} and ${lifecycle} variables in the SQL script to manually create a temporary table as the output table. Sample SQL script:
drop table if exists ${o1};
PAI -name type_transform_v1 -project algo_public
-Dlifecycle=${lifecycle} -DoutputTable=${o1}
-Dcols_to_double="time,hour,pm2,pm10,so2,co,no2"
-Ddefault_double_value="0.0" -Ddefault_int_value="0"
-DreserveOldFeat="false" -DinputTable="pai_online_project.wumai_data";