By Haiqing
MaxCompute (formerly known as ODPS) is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely utilized, especially within the Alibaba Group, and supports the core businesses of multiple business units (BUs). ODPS V2.0, continuously optimizing performance, aims to enhance the user experience and expression capabilities of SQL, as well as improve the productivity of ODPS developers.
Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).
The previous article introduced the script mode, which allows the script to be compiled and submitted as a whole. This generates an execution plan that enables queuing and execution, maximizing the utilization of MaxCompute computing resources. This process provides more optimization opportunities for developers and is more user-friendly for those accustomed to writing in common programming languages.
Furthermore, the script mode supports IF ELSE branch statements, enabling the program to automatically select the execution logic based on conditions. For example, the script can choose different logic based on even or odd dates, or adopt different strategies depending on the number of rows in a table. This feature effectively handles complex SQL statements in various business scenarios, providing SQL developers with more flexibility.
In this article, MaxCompute Studio is used as an example. For detailed information on installing and using MaxCompute Studio, please refer to the installation guides.
The IF syntax of Max Compute is as follows:
IF (condition) BEGIN
statement 1
statement 2
...
END
IF (condition) BEGIN
statements
END ELSE IF (condition2) BEGIN
statements
END ELSE BEGIN
statements
END
Note:
• If BEGIN and END contain only one statement, you can omit the statement. (similar to '{ }' in Java).
• Multiple IF ELSE statements can be nested within each other.
• There are two types of conditions: expressions and scalar subqueries. Both of them are of the BOOLEAN type.
• The IF ELSE statement of a Boolean expression can be used to determine which branch to execute at compile time.
From the execution graph of MaxCompute Studio, you can see that only the src1 branch is executed in the final job.
• The IF ELSE statement of a scalar subquery cannot determine which branch to execute at compile time. The compiler does not know the data in the tb_1 table, so it cannot decide whether the condition is true or false, but only at runtime can it be done. Therefore, multiple jobs need to be submitted.
The following execution graph shows that two jobs are submitted for the MaxCompute Job.
The first calculation
(SELECT count(*) FROM tb_1 ) > 1
The second calculates the remaining part.
The following figure shows the execution result:
For complex SQL statements, there is often logic that includes IF ELSE. The example is as follows:
select a.id
, greatest(b.c1, c.c1 ) as c1
, greatest(b.c2, c.c2 ) as c2
..
from (
select * from ta
) a
left outer join (
select * from (
select tx.id , ty.c1 ,ty.c2
(
select *
from foo ) tx
join
(
select *
from bar
) ty on tx.id2=ty.id2
) b on a.id= b.id
left outer join (
select * from tc
)c on a.id=c.id ;
Due to business changes, data that can be associated with table b is relatively small. Table b is often empty, but occasionally there is a certain amount of traffic. The table bar has a large amount of data, which means that the cost of foo join bar is high. Therefore, it is expected that the script can implement such logic: when the traffic of table b is 0, change the SQL execution logic to associate only with tc. You can use the IF ELSE branch statement as follows:
@a := select * from ta;
@b :=SELECT tx.id
,ty.c1
,ty.c2 ( SELECT * FROM foo ) tx
JOIN (
SELECT *
FROM bar
) ty
ON tx.id2 = ty.id2;
@c := select * from tc;
@select_expr table (id bigint ,c1 string,c2 string ...);
IF ( cast( (select count(*) as cnt from @b ) as int) == 0 ) BEGIN
@select_expr := select a.id
, c. c1
, c.c2
...
from @a
left outer join
@c on a.id=c.id;
END ELSE BEGIN
@select_expr := select a.id
, greatest(b.c1, c.c1 ) as c1
, greatest(b.c2, c.c2 ) as c2
..
from @a a left outer join @b on a.id= b.id
left outer join @c on a.id=c.id;
Leveraging the SQL engine of ODPS 3.0, MaxCompute introduces IF ELSE branch statements, enhancing the flexibility of programming for developers. The MaxCompute platform is dedicated to continuously enhancing the expressive capabilities of the SQL language.
Stay tuned.
MaxCompute Unleashed - Part 9: Script Mode and Parameter View
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - February 7, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - February 4, 2024
Alibaba Cloud MaxCompute - February 4, 2024
Alibaba Cloud MaxCompute - January 29, 2024
137 posts | 19 followers
FollowConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreRealtime Compute for Apache Flink offers a highly integrated platform for real-time data processing, which optimizes the computing of Apache Flink.
Learn MoreMore Posts by Alibaba Cloud MaxCompute