All Products
Search
Document Center

MaxCompute:MaxCompute Query Acceleration

最終更新日:Dec 19, 2024

This topic describes the system architecture, benefits, scenarios, and limits of the MaxCompute Query Acceleration (MCQA) feature.

Description

The MCQA feature of MaxCompute provides the following benefits:

  • Accelerates the execution of small- and medium-sized query jobs and reduces the time that is required to complete such a query job from minutes to seconds. The MCQA feature is compatible with other query features of MaxCompute.

  • Supports mainstream business intelligence (BI) tools to perform ad hoc queries or BI analysis.

  • Uses an independent resource pool that does not occupy offline computing resources. MCQA can automatically identify query jobs to reduce the load of the job queue and improve user experience.

  • Allows you to write the query results of MCQA jobs to a temporary cache. If you run the same query job later, MaxCompute preferentially returns the query results in the cache to accelerate the execution of the job.

Note

If an MCQA job is rolled back to an SQL query job, you are charged based on the billing method of the SQL query job.

Architecture

The following figure shows the architecture of MCQA.mcqa架构图

Scenarios

The following table describes the scenarios for which MCQA is suitable.

Scenario

Description

Applicable scope

Ad hoc query

You can use MCQA to optimize the query performance of small- and medium-sized datasets (less than 100 GB) and perform low-latency queries on MaxCompute tables. This accelerates data development and data analysis.

You can specify query criteria based on your business requirements, obtain query results, and adjust the query logic. In this scenario, the query latency must be within dozens of seconds. Users are data developers or data analysts who are skilled in SQL and prefer to use the client tools that they are familiar with to analyze queries.

BI analysis

If you use MaxCompute to build an enterprise-class data warehouse, MaxCompute performs extract, transform, load (ETL) operations to process data into business-oriented and consumable aggregate data. MCQA features low latency and supports elastic parallelism and data caching. You can use MCQA with the partitions and buckets in MaxCompute tables to run parallel jobs, generate reports, analyze statistics, and analyze fixed reports at a low cost.

In most cases, the query object is aggregate data. This scenario is suitable for multidimensional queries, fixed queries, or high-frequency queries that contain small amounts of data. In this scenario, queries are latency-sensitive, and the results are returned in seconds. For example, the latency for most queries is less than 5 seconds. The time that is required to complete each query varies based on the data size and query complexity.

Detailed queries and analysis of large amounts of data

MCQA can automatically identify the size of query jobs. MCQA can respond to and process small-sized jobs at the earliest opportunity, and can allocate the resources that are required for large-sized jobs. This helps analysts run query jobs of different sizes and complexities.

In this scenario, large amounts of historical data are queried. The size of the valid data that is queried is small, and the requirement for latency is low. Users are business analysts who want to gain business insights from data, explore potential business opportunities, and validate business assumptions.

Limits

  • MCQA supports only data query statements that start with SELECT. If you submit a statement that MCQA does not support from the MaxCompute client or by using a MaxCompute Java Database Connectivity (JDBC) driver or MaxCompute SDK, the MaxCompute client, MaxCompute JDBC driver, or MaxCompute SDK automatically rolls back to the common offline mode to execute the statement. Other tools cannot roll back to the common offline mode to execute the submitted statement that MCQA does not support. To roll back MCQA jobs to SQL query jobs by using the MaxCompute client, a MaxCompute JDBC driver, or MaxCompute SDK, make sure that the following related version requirement is met:

    • The version of the MaxCompute client must be V0.40.8 or later. You can click odpscmd to download the required version of the MaxCompute client.

    • The version of the MaxCompute JDBC driver must be V3.3.0 or later. You can click ODPS-JDBC to download the required version of the JDBC driver.

    • The version of MaxCompute SDK must be V0.40.8-public or later. You can click ODPS-SDK to download the required version of MaxCompute SDK.

  • By default, a maximum of 1 million rows of data can be queried. You can add the LIMIT keyword to SQL statements to eliminate the limit.

The following table describes the limits of MCQA.

Item

Description

Feature

  • The MCQA feature is available in MaxCompute Standard Edition that uses the pay-as-you-go billing method.

  • The MCQA feature supports the subscription billing method.

  • The MCQA feature is unavailable in MaxCompute Developer Edition. You must upgrade MaxCompute to Standard Edition.

Query

  • A maximum of 2,000 workers can run in parallel in a job.

  • If you submit an MCQA job from the MaxCompute client, the default timeout period is 30 seconds. If you submit an MCQA job from the ad hoc query module of DataWorks, the default timeout period is 20 seconds. If the MCQA job times out, the system automatically rolls back the MCQA job to an SQL query job.

  • MCQA can cache only the data that is stored in ALIORC tables into memory to accelerate queries.

Query parallelism

  • Requirements for the subscription billing method

    • Free edition: No quota groups of the interactive resource type are configured for MCQA jobs.

      A maximum of five MCQA jobs can run in parallel in a project every day and a maximum of 500 MCQA jobs can run in a project every day. By default, additional MCQA jobs are rolled back to SQL query jobs. If you configure to forbid the rollback of MCQA jobs, the following error is returned:

      ODPS-1800001: Session exception - Failed to submit sub-query in session because:Prepaid project run outoffree query quota.
    • Quota groups of the interactive resource type are configured for MCQA jobs.

      • A maximum of 120 MCQA jobs can run in parallel in each project. If the number of parallel MCQA jobs reaches the upper limit, additional MCQA jobs are rolled back to SQL query jobs.

      • When you configure a quota group of the interactive resource type, make sure that the minimum quota of reserved compute units (CUs) that are used for MCQA jobs is the same as the maximum quota of reserved CUs that are used for MCQA jobs. Otherwise, the quota group does not take effect.

      • The quota group of the interactive resource type must meet the following requirements. If the quota group of the interactive resource type does not meet the following requirements, you cannot submit MCQA jobs to this quota group.

        • The minimum number of reserved CUs [minCU] is the same as the maximum number of reserved CUs [maxCU].

        • The number of reserved CUs is greater than or equal to 50.

      • After you configure a quota group of the interactive resource type, you can submit only jobs that can be identified as MCQA jobs to the quota group of the interactive resource type for all projects and MCQA can no longer be used free of charge.

      • Quota groups of the interactive resource type cannot be used as the default quota group for projects. When you use MCQA, interactive quota groups automatically take effect for all projects. You do not need to associate the quota groups with projects.

  • Requirements for the pay-as-you-go billing method

    A maximum of 120 MCQA jobs can run in parallel in a MaxCompute project. If the number of parallel MCQA jobs reaches the upper limit, additional MCQA jobs are rolled back to SQL query jobs.

Caching mechanism

MaxCompute allows you to write the query results of MCQA jobs to a temporary cache. If you run the same query job later, MaxCompute preferentially returns the query results in the cache to accelerate the execution of the job.

MaxCompute creates a temporary dataset for each MCQA job to cache the query results. The owner of the temporary dataset is the user who runs the MCQA job to generate query results. Temporary datasets are invisible to users. Users cannot view the content of the temporary datasets. MaxCompute automatically grants users who run query jobs the permissions to access the temporary datasets.

MaxCompute deletes the cached results in the following scenarios:

  • The resource usage of a MaxCompute project is high.

  • The cached results become invalid due to changes in the tables or views that are referenced by the cached results.

  • The cached results expire.

Limits

The caching mechanism has the following limits:

  • If you want to query data from the cached results, the MCQA job that you run must be the same as the original MCQA job that generates the cached results and the context of the MCQA job that you run must be the same as that of the original MCQA job. When you rerun an MCQA job, MaxCompute directly reuses the cached results.

  • If the tables or views that are referenced by the cached results change, the cached results become invalid. If you rerun the same job, you cannot obtain the cached data.

  • The maximum cache size of a temporary dataset is 10 GB.

Billing rules

You are not charged for the storage or computations of cached results. This significantly reduces resource usage costs.

Verification

You can obtain the Logview URL of an MCQA job and use the URL to access Logview. On the Job Details tab, verify that the query results of the MCQA job are written to the cache. The following figure shows the Logview of the query job. For more information about Logview, see Use Logview V2.0 to view job information.Cache

Enable MCQA for subscription MaxCompute projects

Procedure

To enable MCQA to accelerate queries for subscription MaxCompute projects, perform the following steps:

Note

The subscription MCQA quota determines the scan parallelism for queries and the amount of data in the destination table that you want to scan. One CU can be used to scan 0.6 GB of data. For example, if you purchase an MCQA quota that provides 50 CUs, the amount of data that can be scanned at the same time is about 30 GB when MCQA is not enabled. When MCQA is enabled, the maximum amount of data that can be scanned is 300 GB.

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Workspace > Quotas.

  3. On the Quotas page, find the level-1 quota that you want to configure and click Quota Configuration in the Actions column.

  4. Configure a quota template.

    1. On the Quota Templates tab of the Quota Configuration page, click Add Quota.

    2. In the Add Quota dialog box, click + Add Quota and configure the Name and Type parameters.

      Enter a custom name in the Name field. Then, select Interactive from the Type drop-down list.

    3. Click OK.

  5. Configure a quota plan.

    1. On the Quota Plans tab of the Quota Configuration page, click Add Plan to create a quota plan. You can also find a desired plan and click Edit in the Actions column to edit the plan.

    2. In the Create Quota Plan or Edit Quota Plan dialog box, enter quota values in the Reserved CUs [minCU,maxCU] column.

      When you enter quota values in the Reserved CUs [minCU,maxCU] column, take note of the following points:

      • The minimum number of reserved CUs must be equal to the maximum number of reserved CUs.

      • The minimum number of reserved CUs must be greater than or equal to 50. If you do not require interactive resources, enter 0 for the quota of the interactive resource type.

      • Quotas of the interactive resource type do not support elastically reserved CUs.

    3. Click OK.

    4. On the Quota Plans tab, find the quota plan that you created or edited, and click Apply Immediately in the Actions column.

  6. Configure a time plan.

    You can configure a time plan to apply different quota plans at different time points every day. This implements the time-specific logic for quota configuration.

Scheduling policy

Interactive quota groups cannot be explicitly specified. The server automatically schedules MCQA jobs to interactive quota groups based on rules. The specific scheduling policy depends on the number of interactive quota groups under a tenant.

If only one interactive quota group is created under a tenant, all MCQA jobs under the tenant are scheduled to this quota group.

If multiple interactive quota groups are created under a tenant, the server selects interactive quota groups for MCQA jobs based on automatic routing rules and user configurations. For more information, see Quota rules.

Rollback policy

  • If an MCQA job is rolled back to an SQL query job due to usage limits, the quota that is dedicated to MCQA jobs for subscription projects is rolled back to the quota that is associated with the current project.

  • You can use the SDK whose version is later than 0.40.7 to specify a quota for a rollback job.

    SQLExecutorBuilder builder = SQLExecutorBuilder.builder();
    builder.quotaName("<OfflineQuotaName>");
  • You can use the JDBC connection string parameter fallbackQuota=XXX to specify a quota for a rollback job. You cannot specify an interactive quota group for a rollback job. If you specify an interactive quota group for a rollback job, an error is returned.

Tools and features for enabling MCQA

You can use one of the following tools or features to enable MCQA:

  • MaxCompute client

  • Ad hoc query or data analytics feature of DataWorks

  • MaxCompute JDBC driver

  • MaxCompute SDK

  • MaxCompute Studio

  • PyODPS

  • SQLAlchemy

Enable MCQA on the MaxCompute client

  1. Download the latest version of the MaxCompute client.

  2. Install and configure the MaxCompute client. For more information, see Install and configure the MaxCompute client.

  3. Modify the odps_config.ini configuration file in the conf folder, and add the following commands at the end of the configuration file:

    enable_interactive_mode=true -- Enable MCQA.
    interactive_auto_rerun=true  -- Allow an MCQA job to be automatically rolled back to an SQL query job if the MCQA job fails.
  4. Run the MaxCompute client in the bin folder. Run ./bin/odpscmd in Linux, and run ./bin/odpscmd.bat in Windows. If the following information appears, the MaxCompute client is running as expected.客户端运行成功

  5. If the returned results in Logview contain the following information after you run the query job, MCQA is enabled.logviiew

Enable MCQA by using the ad hoc query or data analytics feature of DataWorks

By default, MCQA is enabled on the Ad-Hoc Query or Manually Triggered Workflow page in the DataWorks console. Manual operations are not required. If you want to disable MCQA, fill in the DingTalk group application form to join the DingTalk group for processing.

Run a query job on the Ad-Hoc Query page. If the returned results contain the following information, MCQA is enabled. For more information about Ad-Hoc Query, see Create an ad hoc query.临时查询

Run a query job on the Manually Triggered Workflow page. If the returned results contain the following information, MCQA is enabled. For more information about Manually Triggered Workflow, see Create a manually triggered node.手动业务流程

Enable MCQA by using the MaxCompute JDBC driver

If you use the MaxCompute JDBC driver to connect to MaxCompute, perform the following steps to enable MCQA. For more information, see Usage notes.

  1. Download the JDBC_JAR that supports MCQA or download the source code that can be compiled.

  2. Add the following dependency to the pom.xml file in the Maven repository:

    <dependency>
      <groupId>com.aliyun.odps</groupId>
      <artifactId>odps-jdbc</artifactId>
      <version>3.3.0</version>
      <classifier>jar-with-dependencies</classifier>
    </dependency>
  3. Create a Java program based on the source code to adapt to your business information. For more information, see MaxCompute JDBC. Sample code:

    // The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. If you use an AccessKey pair to call API operations, risks may occur. We recommend that you use a RAM user to call API operations or perform routine O&M. To create a RAM user, log on to the RAM console.
    // In this example, the AccessKey ID and AccessKey secret are configured as environment variables. You can also save your AccessKey pair in the configuration file based on your business requirements.
    // To prevent AccessKey pair leaks, we recommend that you do not directly specify the AccessKey ID and AccessKey secret in the code.
    private static String accessId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
    private static String accessKey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
    // Replace your_project_name with the name of the project for which you want to use the MCQA feature. 
    String conn = "jdbc:odps:http://service.<regionid>.maxcompute.aliyun.com/api?project=<YOUR_PROJECT_NAME>"&accessId&accessKey&charset=UTF-8&interactiveMode=true&alwaysFallback=false&autoSelectLimit=1000000000";
    Statement stmt = conn.createStatement();
    Connection conn = DriverManager.getConnection(conn, accessId, accessKey);
    Statement stmt = conn.createStatement();
    String tableName = "testOdpsDriverTable";
    stmt.execute("DROP TABLE IF EXISTS " + tableName);
    stmt.execute("CREATE TABLE " + tableName + " (key int, value string)");

    Configure the parameters in String conn to optimize the processing logic. The following table describes the parameters.

    Parameter

    Description

    enableOdpsLogger

    This parameter is used to display logs. If you do not configure Simple Logging Facade for Java (SLF4J), we recommend that you set this parameter to True.

    fallbackForUnknownError

    Default value: False. If this parameter is set to True, the system switches to the offline mode when an unknown error occurs.

    fallbackForResourceNotEnough

    Default value: False. If this parameter is set to True, the system switches to the offline mode when resources are insufficient.

    fallbackForUpgrading

    Default value: False. If this parameter is set to True, the system switches to the offline mode during an upgrade.

    fallbackForRunningTimeout

    Default value: False. If this parameter is set to True, the system switches to the offline mode when a connection times out.

    fallbackForUnsupportedFeature

    Default value: False. If this parameter is set to True, the system switches to the offline mode in scenarios in which MCQA is not supported.

    alwaysFallback

    Default value: False. If this parameter is set to True, the system switches to the offline mode in one of the preceding scenarios. This parameter is supported only for the MaxCompute JDBC driver of V3.2.3 or later.

Examples

  • Example 1: Enable MCQA on Tableau.

    Add interactiveMode=true to the URL of Tableau Server. We recommend that you also add enableOdpsLogger=true to display logs. For more information, see Configure MaxCompute JDBC on Tableau.

    Sample URL:

    http://service.cn-beijing.maxcompute.aliyun.com/api?
    project=****_beijing&interactiveMode=true&enableOdpsLogger=true&autoSelectLimit=1000000000"

    To enable MCQA for specific tables in a MaxCompute project, add the table_list=table_name1, table_name2 property to the URL of Tableau Server. Then, use this property to specify the tables for which you want to enable MCQA. Separate table names with commas (,). If you specify an excessive number of tables, access to the URL of Tableau Server becomes time-consuming. We recommend that you specify only the required tables in the URL of Tableau Server. If a table contains a large number of partitions, we recommend that you do not use data from all partitions as the data source. You can specify partitions to obtain required data or run custom SQL queries to obtain the required data. Sample code:

    http://service.cn-beijing.maxcompute.aliyun.com/api?project=****_beijing
    &interactiveMode=true&alwaysFallback=true&enableOdpsLogger=true&autoSelectLimit=1000000000"
    &table_list=orders,customers
  • Example 2: Enable MCQA on SQL Workbench/J.

    After you configure the MaxCompute JDBC driver, you can use MCQA on SQL Workbench/J by modifying the JDBC URL that you specified on the profile configuration page. For more information about how to configure the profile, see Configure MaxCompute JDBC on SQL Workbench/J.

    The following example shows the format of the URL that must be configured:

    jdbc:odps:<MaxCompute_endpoint>?
    project=<MaxCompute_project_name>&accessId=<AccessKey ID>&accessKey=<AccessKey Secret>
    &charset=UTF-8&interactiveMode=true&autoSelectLimit=1000000000"

    The following table describes the parameters in the SQL statement.

    Parameter

    Description

    MaxCompute_endpoint

    The endpoint of the region in which MaxCompute resides. For more information, see Endpoints.

    MaxCompute_project_name

    The name of the MaxCompute project.

    AccessKey ID

    The AccessKey ID of your Alibaba Cloud account that can access the project.

    You can obtain the AccessKey ID on the AccessKey Pair page.

    AccessKey Secret

    The AccessKey secret that corresponds to the AccessKey ID.

    You can obtain the AccessKey secret on the AccessKey Pair page.

    charset=UTF-8

    The encoding format of the character set.

    interactiveMode

    Specifies whether to enable MCQA. If this parameter is set to true, MCQA is enabled.

    autoSelectLimit

    This parameter is required when the number of data records exceeds 1 million.

Enable MCQA by using MaxCompute SDK for Java

For more information about MaxCompute SDK for Java, see SDK for Java. You must add a specified dependency to the pom.xml file in the Maven repository. Sample dependency:

<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>odps-sdk-core</artifactId>
  <version>3.3.0</version>
</dependency>

The following sample code provides an example on how to create a Java program:

import com.aliyun.odps.Odps;
import com.aliyun.odps.OdpsException;
import com.aliyun.odps.OdpsType;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.Record;
import com.aliyun.odps.data.ResultSet;
import com.aliyun.odps.sqa.*;

import java.io.IOException;
import java.util.*;

public class SQLExecutorExample {

    public static void SimpleExample() {
        // Specify the Alibaba Cloud account and project information. 
      	// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. If you use an AccessKey pair to call API operations, risks may occur. We recommend that you use a RAM user to call API operations or perform routine O&M. To create a RAM user, log on to the RAM console.
				// In this example, the AccessKey ID and AccessKey secret are configured as environment variables. You can also save your AccessKey pair in the configuration file based on your business requirements.
				// We recommend that you do not directly specify the AccessKey ID and AccessKey secret in code to prevent AccessKey pair leaks.
        Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
        Odps odps = new Odps(account);
        odps.setDefaultProject("<YOUR_PROJECT_NAME>");
        odps.setEndpoint("http://service.<regionid>.maxcompute.aliyun.com/api");

        // Prepare to build an SQLExecutor. 
        SQLExecutorBuilder builder = SQLExecutorBuilder.builder();

        SQLExecutor sqlExecutor = null;
        try {
            // run in offline mode or run in interactive mode
            if (false) {
                // Create an SQLExecutor that runs offline SQL queries by default. 
                sqlExecutor = builder.odps(odps).executeMode(ExecuteMode.OFFLINE).build();
            } else {
                // Create an SQLExecutor that runs SQL queries with MCQA enabled by default. Make sure that the system automatically rolls back to the offline query mode if an SQL query with MCQA enabled fails. 
                sqlExecutor = builder.odps(odps).executeMode(ExecuteMode.INTERACTIVE).fallbackPolicy(FallbackPolicy.alwaysFallbackPolicy()).build();
            }
            // Pass special query settings if required. 
            Map<String, String> queryHint = new HashMap<>();
            queryHint.put("odps.sql.mapper.split.size", "128");
            // Submit a query job. You can pass hints. 
            sqlExecutor.run("select count(1) from test_table;", queryHint);

            // List the System.out.println() interfaces that can be used to query common information. 
            // UUID
            System.out.println("ExecutorId:" + sqlExecutor.getId());
            // Query the Logview URL of the current query job. 
            System.out.println("Logview:" + sqlExecutor.getLogView());
            // Query the instance on which the current query job is run. In interactive mode, multiple query jobs may be run on the same instance. 
            System.out.println("InstanceId:" + sqlExecutor.getInstance().getId());
            // Query the progress of the current query job. You can check the progress bar in the console. 
            System.out.println("QueryStageProgress:" + sqlExecutor.getProgress());
            // Query the changelogs about the status of the current query job, such as rollback messages. 
            System.out.println("QueryExecutionLog:" + sqlExecutor.getExecutionLog());

            // Obtain the results of query jobs by calling one of the following API operations: 
            if(false) {
                // Query the results of all query jobs. The API operation that you called is a synchronous operation and may occupy a thread until the query succeeds or fails. 
                // Write the results of all query jobs to the memory at the same time. To prevent memory issues, we recommend that you do not perform this operation if the amount of data is large. 
                List<Record> records = sqlExecutor.getResult();
                printRecords(records);
            } else {
                // Query the ResultSet iterator of the query results. The API operation that you called is a synchronous operation and may occupy a thread until the query succeeds or fails. 
                // Read the results of query jobs in several batches. We recommend that you perform this operation if the amount of data is large. 
                ResultSet resultSet = sqlExecutor.getResultSet();
                while (resultSet.hasNext()) {
                    printRecord(resultSet.next());
                }
            }

            // run another query
            sqlExecutor.run("select * from test_table;", new HashMap<>());
            if(false) {
                // Query the results of all query jobs. The API operation that you called is a synchronous operation and may occupy a thread until the query succeeds or fails. 
                // Write the results of all query jobs to the memory at the same time. To prevent memory issues, we recommend that you do not perform this operation if the amount of data is large. 
                List<Record> records = sqlExecutor.getResult();
                printRecords(records);
            } else {
                // Query the ResultSet iterator of the query results. The API operation that you called is a synchronous operation and may occupy a thread until the query succeeds or fails. 
                // Read the results of query jobs in several batches. We recommend that you perform this operation if the amount of data is large. 
                ResultSet resultSet = sqlExecutor.getResultSet();
                while (resultSet.hasNext()) {
                    printRecord(resultSet.next());
                }
            }
        } catch (OdpsException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (sqlExecutor != null) {
                // Close the SQLExecutor and release related resources. 
                sqlExecutor.close();
            }
        }
    }

    // SQLExecutor can be reused by pool mode
    public static void ExampleWithPool() {
        // Specify the Alibaba Cloud account and project information. 
      	// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. If you use an AccessKey pair to call API operations, risks may occur. We recommend that you use a RAM user to call API operations or perform routine O&M. To create a RAM user, log on to the RAM console.
				// In this example, the AccessKey ID and AccessKey secret are configured as environment variables. You can also save your AccessKey pair in the configuration file based on your business requirements.
				// We recommend that you do not directly specify the AccessKey ID and AccessKey secret in code to prevent AccessKey pair leaks.
        Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
        Odps odps = new Odps(account);
        odps.setDefaultProject("your_project_name");
        odps.setEndpoint("http://service.<regionid>.maxcompute.aliyun.com/api");

        // Run query jobs by using a connection pool. 
        SQLExecutorPool sqlExecutorPool = null;
        SQLExecutor sqlExecutor = null;
        try {
            // Create a connection pool. Specify the connection pool size and the default execution mode. 
            SQLExecutorPoolBuilder builder = SQLExecutorPoolBuilder.builder();
            builder.odps(odps)
                    .initPoolSize(1) // init pool executor number
                    .maxPoolSize(5)  // max executors in pool
                    .executeMode(ExecuteMode.INTERACTIVE); // run in interactive mode

            sqlExecutorPool = builder.build();
            // Obtain an SQLExecutor from the connection pool. If no SQLExecutor is obtained from the connection pool and the total number of SQLExecutors does not exceed the upper limit, an SQLExecutor is automatically added. 
            sqlExecutor = sqlExecutorPool.getExecutor();

            // Use the SQLExecutor in the same manner as you use it in the preceding example. 
            sqlExecutor.run("select count(1) from test_table;", new HashMap<>());
            System.out.println("InstanceId:" + sqlExecutor.getId());
            System.out.println("Logview:" + sqlExecutor.getLogView());

            List<Record> records = sqlExecutor.getResult();
            printRecords(records);
        } catch (OdpsException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            sqlExecutor.close();
        }
        sqlExecutorPool.close();
    }

    private static void printRecord(Record record) {
        for (int k = 0; k < record.getColumnCount(); k++) {

            if (k != 0) {
                System.out.print("\t");
            }

            if (record.getColumns()[k].getType().equals(OdpsType.STRING)) {
                System.out.print(record.getString(k));
            } else if (record.getColumns()[k].getType().equals(OdpsType.BIGINT)) {
                System.out.print(record.getBigint(k));
            } else {
                System.out.print(record.get(k));
            }
        }
    }

    private static void printRecords(List<Record> records) {
        for (Record record : records) {
            printRecord(record);
            System.out.println();
        }
    }

    public static void main(String args[]) {
        SimpleExample();
        ExampleWithPool();
    }
}

Enable MCQA by using MaxCompute Studio

MCQA is supported in MaxCompute Studio V3.5.0 and later. We recommend that you install the latest version of MaxCompute Studio. For more information about how to install MaxCompute Studio, see Install MaxCompute Studio.

In the SQL editor of MaxCompute Studio, set the SQL execution mode to UseShortQueryAcceleration or FallbackWhenFailed and execute query statements to enable MCQA.

  • UseShortQueryAcceleration: Use MCQA to execute query statements.

  • FallbackWhenFailed: Use MCQA to roll back the system to the default offline mode and execute SQL query statements again if the SQL query statements fail to be executed.

Enable MCQA by using SQLAlchemy of PyODPS or by using a third-party tool that supports SQLAlchemy interface

PyODPS is integrated with SQLAlchemy that is used to query data in MaxCompute. To enable MCQA, you must specify the following parameters in the connection string:

  • interactive_mode=true: required. This parameter specifies whether to enable MCQA. Set this parameter to true.

  • reuse_odps=true: optional. This parameter specifies whether to forcibly reuse connections. Set this parameter to true. This value indicates that connections are forcibly reused. Connection reuse improves the performance of some third-party tools, such as Apache Superset.

You can configure the fallback_policy=<policy1>,<policy2>,... parameter in the connection string to optimize the processing logic. The rollback policies specified by this parameter are similar to the parameters that are configured when you enable MCQA by using the MaxCompute JDBC driver. These policies are used to specify the rollback operation that is performed when query acceleration fails.

  • generic: The default value is False. If this parameter is set to True, the system rolls back to the offline mode if an unknown error occurs.

  • noresource: The default value is False. If this parameter is set to True, the system rolls back to the offline mode if resources are insufficient.

  • upgrading: The default value is False. If this parameter is set to True, the system rolls back to the offline mode during an upgrade.

  • timeout: The default value is False. If this parameter is set to True, the system rolls back to the offline mode if a connection times out.

  • unsupported: The default value is False. If this parameter is set to True, the system rolls back to the offline mode when the MCQA feature is not supported.

  • default: If you specify this parameter, the system performs a rollback based on the settings of the unsupported, upgrading, noresource, or timeout parameters. If you do not specify fallback_policy in the connection string, the default rollback policy is used.

  • all: The default value is False. If this parameter is set to True, the system rolls back to the offline mode in the preceding scenarios.

You can use the following connection string if you want to enable MCQA and forcible connection reuse, and roll back the system to the offline mode in the scenario of an upgrade, MCQA not supported, or insufficient resources:

odps://<access_id>:<ACCESS_KEY>@<project>/?endpoint=<endpoint>&interactive_mode=true&reuse_odps=true&fallback_policy=unsupported,upgrading,noresource

FAQ

  • Issue 1: When I use a JDBC driver to connect to MaxCompute and run an SQL job that uses subscription resources, the error code ODPS-1800001 appears. What do I do? The following information shows the error details:

    sError:com.aliyun.odps.OdpsException: ODPS-1800001: Session exception - Failed to submit sub-query in session because:Prepaid project run out of free query quota.
    • This error may occur due to one of the following causes:

      MCQA is enabled. This feature is in public preview. If you have purchased a subscription resource plan, you can use this feature free of charge during the public preview. No additional operations are required. When you use this feature free of charge, the maximum number of parallel jobs supported by a MaxCompute project is 5. The number of jobs that can be accelerated free of charge per day is 500. If the number of jobs exceeds 500, the preceding error code appears.

    • Solution:

      Set the alwaysFallback parameter to true to enable MCQA for the JDBC driver. After the configuration is complete, MCQA can accelerate a maximum of 500 query jobs. If the number of jobs exceeds 500, extra MCQA jobs are rolled back to SQL query jobs. For more information about the configuration and parameters, see Tools and features for enabling MCQA.

  • Issue 2: The duration for sending requests and obtaining results when MCQA is enabled by using PyODPS is longer than the duration when MCQA is enabled by using DataWorks. What do I do?

    • This error may occur due to one of the following causes:

      • The wait_for_xxx method is used. As a result, the duration for sending requests and obtaining results is prolonged.

      • The polling interval is long.

    • Solution:

      • If requests can be sent at a fast speed, we recommend that you do not use the wait_for_xxx method. After a request is sent, use Tunnel commands to download the result.

      • Configure instance.wait_for_success(interval=0.1) to reduce the polling interval. Sample statement:

        from odps import ODPS, errors
        
        max_retry_times = 3
        
        def run_sql(odps, stmt):
            retry = 0
            while retry < max_retry_times:
                try:
                    inst = odps.run_sql_interactive(stmt)
                    print(inst.get_logview_address())
                    inst.wait_for_success(interval=0.1)
                    records = []
                    for each_record in inst.open_reader(tunnel=True):
                        records.append(each_record)
                    return records
                except errors.ODPSError as e:
                    retry = retry + 1
                    print("Error: " + str(e) + " retry: " + str(retry) + "/" + str(max_retry_times))
                    if retry >= max_retry_times:
                        raise e
        
        odps = ODPS(...)
        
        run_sql(odps, 'SELECT 1')
  • Issue 3: How do I use Logview to troubleshoot Java SDK errors when MCQA is enabled by using MaxCompute SDK?

    Solution: MaxCompute SDK for Java provides a Logview interface. You can run the following command to call the Logview interface to obtain logs:

    String logview = sqlExecutor.getLogView();
  • Issue 4: How do I obtain the Logview URL of MaxCompute when MCQA is enabled by using the MaxCompute JDBC driver?

    Solution: The MaxCompute JDBC driver is encapsulated based on MaxCompute SDK for Java. A Logview URL is generated when you execute SQL statements by using the MaxCompute client, MaxCompute Studio, or DataWorks. A Logview URL is also generated when you use the MaxCompute JDBC driver to execute SQL statements. You can view the task status, track the task progress, and obtain task results in Logview. You can configure the log output in the properties.log4j file to obtain the Logview URL. By default, the Logview URL is included in standard output logs that are displayed on the terminal screen.