Spark SQL execution modes - AnalyticDB - Alibaba Cloud Documentation Center

AnalyticDB for MySQL Data Lakehouse Edition (V3.0) provides two Spark SQL execution modes: batch and interactive. In each mode, you can read and write databases and tables of AnalyticDB for MySQL by using the metadata feature of AnalyticDB for MySQL. This topic describes the usage notes, scenarios, features, and startup methods of two Spark SQL execution modes.

Batch execution mode

Usage notes

When you execute an SQL statement in batch mode, you must execute the USE <database_name>; statement to select a database first.
When you specify a table in an SQL statement, you must specify the table in the database_name.table_name format.
When you execute a DML, DDL, or DQL statement in batch mode, the system returns a message of execution success or failure, but does not return data. Sample results of successful SQL statements are displayed in logs. For information about how to view the returned data of SQL statements, see the "View information about a Spark application" section of the Spark editor topic.

Scenarios

Mutually dependent SQL statements are executed.
Resource isolation is highly required for SQL statements.
A large amount of data is involved. For example, an SQL statement is executed to perform extract-transform-load (ETL) operations at one time.
Complex third-party dependency packages must be uploaded and may be repeatedly tested and replaced.

Features

An SQL statement that is submitted in batch mode runs in an individual Spark application to ensure stability.
You can execute an SQL statement to describe an independent configuration, such as SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;.
If SELECT statements are contained in the SQL statements that are executed in batch mode, sample execution results of the SELECT statements are displayed in logs.

Startup methods

On the SQLConsole tab, select the Spark engine and a job resource group. When you execute an SQL statement, click Proceed in the message that appears.

Interactive execution mode

Usage notes

When you execute a DDL or DML statement in interactive mode, the system returns up to 1,000 rows of result data.
When you execute a DDL statement in interactive mode, the system returns a message of execution success or failure but does not return data. For example, if you execute the CREATE TABLE statement, the system returns a message of execution success or failure, but does not return the table data. This is consistent with the open source SQL.
If you enter multiple SQL statements that end with semicolons (;), the Spark engine executes only the last SQL statement.
A period of time is required to start a Thrift server. If the Thrift server fails to be started, you can wait a while and try again.

Scenarios

A data computing operation that does not require all data to be returned is performed.
A large number of DDL statements must be executed.
A DQL statement must be executed immediately after it is submitted. The execution of the DQL statement does not impose high requirements on resource isolation and allows resource isolation to fail.

Features

The SQL statements in the same resource group are executed by a shared Thrift server, and up to 1,000 rows of result data are displayed in the AnalyticDB for MySQL console.
Resources are isolated at the thread level. If multiple users execute SQL statements in the same Spark application, the SQL statements may intervene with each other.
After you configure SQL statements, thread-level configurations take effect.
Application-level configurations take effect only after you restart the Thrift server. To modify application-level configurations, stop the Thrift server, reconfigure the parameters, and then restart the Thrift server. For more information, see Start and stop a Thrift server.

Startup methods

If you use a job resource group to execute SQL statements in interactive mode, you must start a Thrift server. To start a Thrift server, use one of the following methods:

On the SQLConsole tab, select the Spark engine and a job resource group. When you execute an SQL statement, click Start Thrift Server in the message that appears.
On the Resource Groups tab, find the job resource group for which you want to start a Thrift server and click Configure Thrift Server in the Actions column. Configure the parameters and click Start. For more information, see the "Start a Thrift server" section of the Start and stop a Thrift server topic.

After you start a Thrift server for a job resource group, SQL statements are executed in interactive mode.