By Zhenyu
MaxCompute (formerly known as ODPS) [1] is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely used, especially within Alibaba Group, where it supports the core businesses of multiple business units (BUs). Alongside continuous performance optimization, ODPS V2.0 aims to improve the SQL user experience and the expressiveness of the language, and to raise the productivity of ODPS developers.
Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).
In the first part of the article series, I will introduce the improvements in usability.
• Scenario 1
As an ODPS developer, I submitted an SQL script that contained two SQL statements. After waiting in the queue for a long time, I found that the first statement had a function parameter type error, so the wait had been wasted. I fixed the script and resubmitted it, entering the queue once again. The first statement took two hours to complete, and only then did I discover that the second statement had unmatched brackets.
• Scenario 2
In my upstream data, there is a partitioned table named "my_upperstream" that includes an "id" column. My project runs a daily task that joins against "my_upperstream" on the "id" column. This used to work without issues, but recently I noticed that some data was mysteriously lost. After several days of frustrating debugging, I finally discovered that my "id" column is of type BIGINT, but when it was compared with "u.id" it was implicitly converted to DOUBLE. Because of floating-point errors, some rows failed to join [2].
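The precision problem behind Scenario 2 can be reproduced outside MaxCompute. The following is a minimal Python sketch, not MaxCompute code: it assumes the upstream ids are strings (as footnote [2] suggests), uses made-up id values just above 2**53, and the helper `join_on_id` is hypothetical. It only illustrates that comparing ids as 64-bit floating-point numbers can give a different join result than comparing them as integers.

```python
def join_on_id(left_ids, right_ids, key):
    """Return (left, right) pairs whose ids are equal after applying `key`."""
    return [(l, r) for l in left_ids for r in right_ids if key(l) == key(r)]


# Hypothetical sample data: BIGINT-like ids on my side, STRING-like ids upstream.
my_ids = [9007199254740992, 9007199254740993]  # 2**53 and 2**53 + 1
upstream_ids = ["9007199254740993"]

# Comparing as integers keeps every digit, so only the true match is found.
exact = join_on_id(my_ids, upstream_ids, key=int)

# Comparing as doubles rounds 2**53 + 1 to 2**53, so two distinct ids collide.
as_double = join_on_id(my_ids, upstream_ids, key=float)

print(len(exact))      # 1
print(len(as_double))  # 2
```

Here the double comparison produces a spurious match rather than a lost row, but the cause is the same: once ids pass through DOUBLE, equality no longer reflects the original integer values.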
The MaxCompute compiler, built upon the new SQL engine of ODPS V2.0, works seamlessly with MaxCompute Studio and offers a range of features for error and warning prompts. By utilizing the compiler, the above problems can be completely avoided.
To get the full benefit of the compiler's usability improvements, use MaxCompute together with MaxCompute Studio. (Support on the D2 platform for the error and warning prompts provided by ODPS V2.0 is under active development.) First, install MaxCompute Studio, connect to a MaxCompute project, and create a new MaxCompute script file, as follows.
The results can be seen here.
If I submit the script without fixing the errors, MaxCompute Studio blocks the submission, as shown in the following figure.
I modified the errors and the warning as prompted, as shown in the following figure.
I submitted it again, and it ran smoothly. I no longer have to worry about waiting in vain because of syntax errors!
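The benefit of checking the whole script before anything is queued can be sketched with a toy example. The Python below is not how the MaxCompute compiler works; it is a deliberately naive check (it ignores string literals and comments), and the function name and the tables `t1` and `t2` are hypothetical. It shows the idea of inspecting every statement up front so that a bracket problem in the second statement is reported before the first one runs.

```python
def unbalanced_statements(script):
    """Return 1-based indexes of statements whose parentheses do not balance."""
    bad = []
    # Naive split: assumes ';' never appears inside a string literal or comment.
    statements = [s for s in script.split(";") if s.strip()]
    for index, statement in enumerate(statements, start=1):
        depth = 0
        for ch in statement:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # a ')' with no matching '('
                    break
        if depth != 0:
            bad.append(index)
    return bad


script = """
SELECT COUNT(*) FROM t1;
SELECT SUM(amount FROM t2;
"""
print(unbalanced_statements(script))  # [2]
```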
In fact, MaxCompute Studio can treat all warnings as errors, as shown in the following figure.
This can ensure that you don't miss any possible mistakes!
The MaxCompute team recommends using MaxCompute Studio to perform static compilation checks on your script before submitting it. The team also strongly advises treating warnings as errors and resolving all warnings before submission. This way, you avoid errors that waste significant computing resources and manpower.
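The "warnings as errors" policy recommended above amounts to a simple gate in front of submission. The following Python sketch is only an illustration of that policy: the `Diagnostic` type, the `can_submit` function, and the sample message are hypothetical and are not part of MaxCompute Studio.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    severity: str  # "error" or "warning"
    message: str


def can_submit(diagnostics, warnings_as_errors=True):
    """Allow submission only if no diagnostic has a blocking severity."""
    blocking = {"error", "warning"} if warnings_as_errors else {"error"}
    return not any(d.severity in blocking for d in diagnostics)


# Hypothetical compiler output for a script with one implicit conversion.
diagnostics = [Diagnostic("warning", "implicit conversion from BIGINT to DOUBLE")]

print(can_submit(diagnostics, warnings_as_errors=False))  # True
print(can_submit(diagnostics, warnings_as_errors=True))   # False
```

With the stricter setting, the script from Scenario 2 would be stopped before it ever reached the queue.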
Besides saving your time, this practice also saves resources on the MaxCompute server. Currently, the MaxCompute SQL server spends substantial computing resources compiling SQL statements that contain errors, which delays the execution of error-free statements.
Did you also know that submitting a script with errors lowers your compute health score and reduces the priority of your submitted tasks? Some warnings lower the health score as well. By using the MaxCompute compiler and MaxCompute Studio, you can avoid these deductions and the resulting priority downgrades.
In many cases, warnings arise from unsafe implicit type conversions. If the conversion is indeed intended, you can use an explicit cast, cast(xxx as <target type>), to eliminate the warning. If you find that cumbersome, you can use the simplified form <target type>(xxx) provided by the MaxCompute compiler, as shown in the modified script above. Choose whichever form you prefer. MaxCompute has made other SQL improvements, which will be introduced in subsequent articles of this series.
The MaxCompute compiler, based on the ODPS V2.0 SQL engine, works in conjunction with MaxCompute Studio to significantly enhance user productivity by providing comprehensive and accurate error and warning reporting. However, improving productivity doesn't solely rely on accurate error and warning prompts; rich and powerful SQL expression capabilities are equally important. Starting from the next article, we will introduce you to various SQL improvements in MaxCompute.
• [1]: MaxCompute is the same as ODPS; it's the ODPS brand on Alibaba Cloud. In this series, MaxCompute and ODPS are used interchangeably.
• [2]: Why does an int = string comparison convert to DOUBLE? This behavior comes from Hive: the initial version of MaxCompute (then called ODPS) had to stay compatible with Hive in order to replace the Hive scripts that were widely used at the time. With the warning feature in place, as long as you use MaxCompute in the recommended manner and address the reported errors and warnings, you won't run into this issue again!
• [3]: Regarding warning annotations, my preference is to set them as yellow wavy lines. You can customize this in the IntelliJ Settings as shown below.