
AnalyticDB:Use Spark SQL to read and write Delta external tables

Last Updated:Sep 05, 2024

Delta is a data lake table format that can be used on top of Object Storage Service (OSS) and supports the UPDATE, DELETE, and INSERT operations. AnalyticDB for MySQL is integrated with Delta, which allows you to use Spark SQL to read and write Delta external tables. This topic describes how to do so.

Prerequisites

  • An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.

  • A job resource group is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster. For more information, see Create a resource group.

  • A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition cluster.

Usage notes

  • The XIHE engine cannot be used to read and write Delta tables.

  • AnalyticDB for MySQL Spark integrates and updates its built-in Delta version, but does not troubleshoot Delta issues or incompatibilities between different Delta versions.

Read and write Delta external tables

The Delta package that is built into AnalyticDB for MySQL allows you to read and write Delta external tables by using Spark SQL. If the built-in Delta package version (2.0.2) does not meet your read and write requirements, you can configure a custom Delta package.

Use the built-in Delta package of AnalyticDB for MySQL

Step 1: Go to the SQL Development page

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and a job resource group.

Step 2: Create an external database and a Delta external table

Note

You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.

  1. Execute the following statement to create a database. If the database already exists, skip this step.

    CREATE DATABASE IF NOT EXISTS external_delta_db
    LOCATION "oss://<bucket_name>/test/";      /* The location where you want to create the database. Replace the parameter value with your OSS path. */
  2. Execute the following statement to create a Delta external table (a verification sketch follows these steps):

    CREATE TABLE IF NOT EXISTS external_delta_db.delta_test_tbl (
      id int,
      name string,
      age int
    ) USING delta
    PARTITIONED BY (age)
    LOCATION "oss://<bucket_name>/test/delta_test_tbl";

Step 3: Write data to the Delta external table

Note

You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.

INSERT

Use one of the following methods to write data to the Delta external table. A query to check the resulting partitions follows the list.

  • Method 1: Execute the INSERT INTO statement

    INSERT INTO external_delta_db.delta_test_tbl VALUES (1, 'lisa', 10), (2, 'jams', 10);
  • Method 2: Execute the INSERT OVERWRITE statement

    INSERT OVERWRITE external_delta_db.delta_test_tbl VALUES (2, 'zhangsan', 10), (4, 'lisi', 30);
  • Method 3: Execute the INSERT OVERWRITE statement to write data to a static partition

    INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age=17) VALUES (3, 'anna');
  • Method 4: Execute the INSERT OVERWRITE statement to write data to a dynamic partition

    INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age) VALUES (1, 'bom', 10);
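
The scope of INSERT OVERWRITE differs by method: without a partition clause (Method 2), the entire table is replaced; with a static partition (Method 3), only the specified partition is replaced; with a dynamic partition (Method 4), the behavior depends on the spark.sql.sources.partitionOverwriteMode setting, and in dynamic mode only the partitions that appear in the inserted rows are replaced. The following sketch queries which partitions hold data after the preceding writes:

SELECT age, count(*) AS rows_per_partition
FROM external_delta_db.delta_test_tbl
GROUP BY age
ORDER BY age;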

UPDATE

Execute the following statement to update data. In this example, the name column of the row whose id value is 1 is set to box.

UPDATE external_delta_db.delta_test_tbl SET name = 'box' WHERE id = 1;

DELETE

Execute the following statement to delete data. In this example, the row whose id value is 1 is deleted.

DELETE FROM external_delta_db.delta_test_tbl WHERE id = 1;
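
To confirm the effect of the preceding statements, you can query the modified row directly. This is a minimal check, and after the DELETE statement the query is expected to return no rows:

SELECT id, name, age FROM external_delta_db.delta_test_tbl WHERE id = 1;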

Step 4: Query data

Note
  • You can execute the following SQL statement in batch or interactive mode. For more information, see Spark SQL execution modes.

  • When you execute a Spark SQL statement, the system returns a message indicating the success or failure of the execution but does not return data. To view data, you can go to the Spark JAR Development page, click the Applications tab, and then click Log in the Actions column corresponding to your application. For more information, see the "View information about a Spark application" section of the Spark editor topic.

Execute the following statement to query data of the Delta external table:

SELECT * FROM external_delta_db.delta_test_tbl;
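
Because Delta keeps a transaction log, each INSERT, UPDATE, and DELETE operation above is recorded as a new table version. If you want to inspect this history, you can use the following Delta SQL command, which is available in open source Delta Lake when the Delta SQL extension is enabled, as it is with the built-in package:

DESCRIBE HISTORY external_delta_db.delta_test_tbl;   /* Returns one row per table version, including the operation type and timestamp. */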

Use custom Delta packages

Important

To prevent version incompatibility, we recommend that you use a Delta package version that is compatible with AnalyticDB for MySQL Spark 3.2.0.
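
For reference, open source Delta releases are built against specific Spark versions, and the Delta 2.0.x line targets Spark 3.2.x. The jar names in the following sketch are illustrative examples of a pairing that is compatible with Spark 3.2.0, not required values:

ADD JAR oss://<bucket_name>/path/to/delta-core_2.12-2.0.2.jar;   /* Example only: Delta 2.0.x is built for Spark 3.2.x. */
ADD JAR oss://<bucket_name>/path/to/delta-storage-2.0.2.jar;     /* Example only: use the delta-storage version that matches delta-core. */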

Step 1: Go to the SQL Development page

  1. Log on to the AnalyticDB for MySQL console.
  2. In the upper-left corner of the page, select a region.
  3. In the left-side navigation pane, click Clusters.
  4. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
  5. In the left-side navigation pane, choose Job Development > SQL Development.
  6. On the SQLConsole tab, select the Spark engine and a job resource group.

Step 2: Create an external database and a Delta external table

Note

You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.

  1. Execute the following statements to load the custom Delta packages, configure Spark, and create a database. If the database already exists, skip the CREATE DATABASE statement.

    ADD JAR oss://<bucket_name>/path/to/delta-core_xx.jar;    /* A custom Delta package. You must manually upload the package to OSS. */
    ADD JAR oss://<bucket_name>/path/to/delta-storage-xx.jar; /* A custom Delta package. You must manually upload the package to OSS. */
    SET spark.adb.connectors=oss;   /* The connector of AnalyticDB for MySQL Spark. Set the connector to OSS. */
    SET spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension;  /* An open source Spark parameter that enables the Delta SQL extension. */
    SET spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog;   /* An open source Spark parameter that registers the Delta catalog. */
    CREATE DATABASE IF NOT EXISTS external_delta_db
    LOCATION "oss://<bucket_name>/test/";          /* The location where you want to create the database. Replace the parameter value with your OSS path. */
  2. Execute the following statement to create a Delta external table:

    CREATE TABLE IF NOT EXISTS external_delta_db.delta_test_tbl (
      id int,
      name string,
      age int
    ) USING delta
    PARTITIONED BY (age)
    LOCATION "oss://<bucket_name>/test/delta_test_tbl";

Step 3: Write data to the Delta external table

Note

You can execute the following SQL statements in batch or interactive mode. For more information, see Spark SQL execution modes.

INSERT

Use one of the following methods to write data to the Delta external table:

  • Method 1: Execute the INSERT INTO statement

    INSERT INTO external_delta_db.delta_test_tbl VALUES (1, 'lisa', 10), (2, 'jams', 10);
  • Method 2: Execute the INSERT OVERWRITE statement

    INSERT OVERWRITE external_delta_db.delta_test_tbl VALUES (2, 'zhangsan', 10), (4, 'lisi', 30);
  • Method 3: Execute the INSERT OVERWRITE statement to write data to a static partition

    INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age=17) VALUES (3, 'anna');
  • Method 4: Execute the INSERT OVERWRITE statement to write data to a dynamic partition

    INSERT OVERWRITE external_delta_db.delta_test_tbl PARTITION (age) VALUES (1, 'bom', 10);

UPDATE

Execute the following statement to update data. In this example, the name column of the row whose id value is 1 is set to box.

UPDATE external_delta_db.delta_test_tbl SET name = 'box' WHERE id = 1;

DELETE

Execute the following statement to delete data. In this example, the row whose id value is 1 is deleted.

DELETE FROM external_delta_db.delta_test_tbl WHERE id = 1;

Step 4: Query data

Note
  • You can execute the following SQL statement in batch or interactive mode. For more information, see Spark SQL execution modes.

  • When you execute a Spark SQL statement, the system returns a message indicating the success or failure of the execution but does not return data. To view data, you can go to the Spark JAR Development page, click the Applications tab, and then click Log in the Actions column corresponding to your application. For more information, see the "View information about a Spark application" section of the Spark editor topic.

Execute the following statement to query data of the Delta external table:

SELECT * FROM external_delta_db.delta_test_tbl;