MaxCompute Unleashed - Part 1: Harnessing Compiler Errors and Warnings Effectively

Part 1 of the “Unleash the Power of MaxCompute” series describes the improvements of MaxCompute in usability.

By Zhenyu

MaxCompute (formerly known as ODPS) [1] is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely used, especially within the Alibaba Group, where it supports the core businesses of multiple business units (BUs). While continuing to optimize performance, ODPS V2.0 aims to improve the SQL user experience and the expressiveness of the language, and to raise the productivity of ODPS developers.

Building upon the SQL engine of ODPS V2.0, MaxCompute makes SQL compilation easier to work with and enhances the language's expressiveness. This series of articles, titled Unleash the Power of MaxCompute, explores the capabilities of MaxCompute (ODPS V2.0).

In this first part of the series, I introduce the improvements in usability.

Scenario 1

As an ODPS developer, I submitted an SQL script that contained two SQL statements. After waiting in the queue for a long time, I found that the first statement had a function parameter type error, so the wait had been wasted. I fixed the script and resubmitted it, entering the queue once again. The first statement took two hours to complete, only for me to discover that the second statement was missing one bracket of a pair.

Scenario 2

My upstream data includes a partitioned table named "my_upperstream" with an "id" column, among others. My project runs a daily task that joins against "my_upperstream" on the "id" column. This used to work without any issues, but recently I noticed that some data was mysteriously lost. After several days of frustrating debugging, I finally discovered that my "id" column was of type BIGINT, but when it was compared with "u.id", it was implicitly converted to DOUBLE. Due to floating-point errors, certain rows failed to join [2].

The MaxCompute compiler, built upon the new SQL engine of ODPS V2.0, works seamlessly with MaxCompute Studio and offers a range of features for error and warning prompts. By utilizing the compiler, the above problems can be completely avoided.

Usability Improvements of the MaxCompute Compiler

To get the most out of the usability improvements of the MaxCompute compiler, use it together with MaxCompute Studio. (Support on the D2 platform for the error and warning prompts provided by ODPS V2.0 is under active development.) First, install MaxCompute Studio, connect to a MaxCompute project, and create a new MaxCompute script file, as follows.

[Figure 1]

The editor marks the following problems in the script.

  1. In the first INSERT statement, the function wm_concat is used improperly.
  2. The second INSERT statement has an error and a warning. The error is that a column name is written incorrectly.
  3. The warning is the one described in Scenario 2 above. When a BIGINT is compared with a STRING in ODPS, both sides are implicitly converted to DOUBLE (see [2]). Because this conversion can lose precision or fail at runtime, the MaxCompute compiler warns you so that you can confirm whether the conversion is intended.
  4. If you hover the mouse over an error or warning [3], the specific error or warning message is displayed.
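
To see why the implicit conversion in item 3 deserves a warning, here is a minimal Python sketch. It is an illustration only, not MaxCompute code: Python's int stands in for BIGINT, Python's float (an IEEE 754 double) stands in for DOUBLE, and the sample ids are made up. It shows one way precision can be lost, which may or may not be the exact failure behind Scenario 2.

```python
# Illustrative stand-ins: Python int for BIGINT, Python float for DOUBLE.
# The ids are hypothetical; they sit just above 2**53, the point past
# which a double can no longer represent every integer exactly.
bigint_ids = [2**53, 2**53 + 1, 2**53 + 2]

# Convert each id to DOUBLE, as an implicit conversion would.
as_double = [float(i) for i in bigint_ids]

# Count how many distinct values exist before and after the conversion.
distinct_before = len(set(bigint_ids))
distinct_after = len(set(as_double))
print(distinct_before, distinct_after)

# Two different ids compare equal once both have become DOUBLE.
collides = float(2**53) == float(2**53 + 1)
print(collides)
```

Three distinct ids collapse to two distinct doubles, so a comparison made in DOUBLE can no longer tell some of them apart.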

If I try to submit the script without fixing the errors, MaxCompute Studio blocks the submission, as shown in the following figure.

[Figure 2]

I modified the errors and the warning as prompted, as shown in the following figure.

[Figure 3]

I submitted it again, and it ran smoothly. I no longer have to worry about waiting in vain because of such avoidable errors!

In fact, MaxCompute Studio can treat all warnings as errors, as shown in the following figure.

[Figure 4]

This can ensure that you don't miss any possible mistakes!
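
The idea of treating warnings as errors is not specific to MaxCompute Studio. As a loose analogy, here is a small Python sketch that uses the standard warnings module. The names ImplicitConversionWarning, check_comparison, and can_submit are hypothetical and exist only for this example; this is not how MaxCompute Studio is implemented.

```python
import warnings


class ImplicitConversionWarning(UserWarning):
    """Hypothetical warning category, defined only for this sketch."""


def check_comparison(left_type: str, right_type: str) -> None:
    """Toy static check: warn when the two sides of a comparison differ in type."""
    if left_type != right_type:
        warnings.warn(
            f"comparing {left_type} with {right_type} needs an implicit conversion",
            ImplicitConversionWarning,
        )


def can_submit(left_type: str, right_type: str, warnings_as_errors: bool) -> bool:
    """Return False when the check should block submission."""
    with warnings.catch_warnings():
        # "error" turns the warning into an exception; "always" only reports it.
        action = "error" if warnings_as_errors else "always"
        warnings.simplefilter(action, category=ImplicitConversionWarning)
        try:
            check_comparison(left_type, right_type)
        except ImplicitConversionWarning:
            return False
    return True


print(can_submit("BIGINT", "STRING", warnings_as_errors=False))
print(can_submit("BIGINT", "STRING", warnings_as_errors=True))
```

With the flag off, the mismatch is reported but the script would still go through; with the flag on, the same mismatch blocks it.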

The MaxCompute team recommends using MaxCompute Studio to perform static compilation checks on your script before submitting it. The team also strongly advises treating warnings as errors and resolving all warnings before submission. This approach helps you avoid errors that waste significant computing resources and effort.

In addition to saving time, this practice also saves resources on the MaxCompute server. Currently, the MaxCompute SQL server spends substantial computing resources compiling SQL statements that contain errors, which delays the execution of error-free statements.

Did you also know that submitting a script with errors deducts from your compute health score and lowers the priority of your submitted tasks? Some warnings result in health score deductions as well. By using the MaxCompute compiler and MaxCompute Studio, you can avoid such deductions and the resulting downgrades.

In many cases, warnings arise from unsafe implicit type conversions. If the conversion is indeed intended, you can use the cast method (xxx as) to eliminate the warning. If you find that cumbersome, you can use the simplified method (xxx) provided by the MaxCompute compiler, as shown in the modified script above. Choose whichever you prefer. MaxCompute has made other SQL improvements, which will be introduced in subsequent articles of this series.
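
To make the benefit of an explicit conversion concrete, here is a rough Python analogy, not MaxCompute SQL. It assumes a toy join in which my side holds the id as an integer and the upstream side holds it as text. The rows and the helper join_on_id are made up for this sketch. Converting both keys to float mimics the implicit DOUBLE comparison, while converting both to int mimics an explicit cast to an integer type.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample rows: (id, payload). My side stores the id as an
# integer; the upstream side stores it as text.
my_rows: List[Tuple[int, str]] = [(2**53 + 1, "my row")]
upstream_rows: List[Tuple[str, str]] = [
    (str(2**53), "different id"),
    (str(2**53 + 1), "same id"),
]


def join_on_id(
    left: List[Tuple[int, str]],
    right: List[Tuple[str, str]],
    left_key: Callable[[int], object],
    right_key: Callable[[str], object],
) -> List[Tuple[str, str]]:
    """Toy hash join: index the right side by key, then probe with the left."""
    index: Dict[object, List[str]] = {}
    for raw_id, payload in right:
        index.setdefault(right_key(raw_id), []).append(payload)
    return [
        (left_payload, right_payload)
        for raw_id, left_payload in left
        for right_payload in index.get(left_key(raw_id), [])
    ]


# Implicit style: both sides become doubles before comparison.
implicit_result = join_on_id(my_rows, upstream_rows, float, float)
# Explicit style: both sides are compared as exact integers.
explicit_result = join_on_id(my_rows, upstream_rows, int, int)

print(implicit_result)
print(explicit_result)
```

In this toy data the float-keyed join pairs my row with both upstream rows, including the one whose id differs, while the integer-keyed join returns only the true match.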

Summary

The MaxCompute compiler, based on the ODPS V2.0 SQL engine, works in conjunction with MaxCompute Studio to significantly enhance user productivity by providing comprehensive and accurate error and warning reporting. However, improving productivity doesn't solely rely on accurate error and warning prompts; rich and powerful SQL expression capabilities are equally important. Starting from the next article, we will introduce you to various SQL improvements in MaxCompute.

Annotations

• [1]: MaxCompute is the same as ODPS; it's the ODPS brand on Alibaba Cloud. In this series, MaxCompute and ODPS are used interchangeably.

• [2]: Why is there a need to convert to DOUBLE when int = string? This behavior stems from Hive, as the initial version of MaxCompute (previously known as ODPS) had to maintain compatibility with Hive in order to replace widely-used Hive scripts at that time. However, with the warning feature in place, as long as you use MaxCompute in the recommended manner and address the prompted errors and warnings, you won't encounter this issue again!

• [3]: Regarding warning annotations, my preference is to set them as yellow wavy lines. You can customize this in the IntelliJ Settings as shown below.

[Figure 5]
