All Products
Search
Document Center

AnalyticDB:Use Spark SQL to read MySQL data

Last Updated:Aug 30, 2024

AnalyticDB for MySQL allows you to submit a Spark SQL job to access self-managed MySQL databases, ApsaraDB RDS for MySQL, or PolarDB for MySQL in view mode or catalog mode. This topic describes how to use Spark SQL to access ApsaraDB RDS for MySQL data.

Prerequisites

Step 1: Prepare data

Create a database in the ApsaraDB RDS for MySQL instance, create a table in the database, and then insert data into the table. Sample statements:

CREATE DATABASE `db`;

CREATE TABLE `db`.`test` (
  `id` int(11) DEFAULT NULL,
  `first_name` varchar(32) DEFAULT NULL,
  `laster_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

INSERT INTO test VALUES(1,'a','b',5);
INSERT INTO test VALUES(2,'c','d',6);
INSERT INTO test VALUES(3,'e','f',7);

Step 2: (Optional) Upload a CA certificate and the ApsaraDB RDS for MySQL driver

Note

If you do not want to access ApsaraDB RDS for MySQL data over an SSL connection, you can skip this step and submit a Spark SQL job. For more information, see the "Step 3: Submit a Spark SQL job" section of this topic.

  1. Enable SSL encryption for the ApsaraDB RDS for MySQL instance and download a CA certificate. For more information, see Configure the SSL encryption feature.

    Important
    • When you enable SSL encryption, you can select whether to encrypt an internal connection or a public connection. In this example, an internal connection is encrypted.

    • The default validity period of a CA certificate is one year. If the CA certificate expires, you must generate another certificate. You cannot use an expired CA certificate to access ApsaraDB RDS for MySQL over an SSL connection.

  2. Download the driver that is compatible with the engine version of the ApsaraDB RDS for MySQL instance from https://dev.mysql.com/downloads/connector/j/.

    In this example, the mysql-connector-java-8.0.29.jar package is used.

  3. Decompress the CA certificate package. Upload the JKS file and the ApsaraDB RDS for MySQL driver to Object Storage Service (OSS). For more information, see Upload objects.

Step 3: Submit a Spark SQL job

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.

  2. In the left-side navigation pane, choose Job Development > SQL Development.

  3. On the SQLConsole tab, select the Spark engine and a job resource group.

  4. On the SQLConsole tab, configure the parameters based on different access modes.

    View mode

    If you access data in view mode, a temporary view is created. Every time you start a job, you must create a view. In view mode, you can select whether to access ApsaraDB RDS for MySQL data over an elastic network interface (ENI) or an SSL connection.

    Access ApsaraDB RDS for MySQL data over an ENI

    set spark.adb.eni.enabled=true;
    set spark.adb.eni.vswitchId=<vsw-bp1sxxsodv28ey5dl****>;   
    set spark.adb.eni.securityGroupId=<sg-bp19mr685pmg4ihc****>;    
    
    CREATE TEMPORARY VIEW table_tmp
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      url '<jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306>',     
      dbtable '<db.test>',    
      user '<user>',       
      password '<password>'      
    );
    
    select * from table_tmp;

    The following table describes the parameters.

    Parameter

    Description

    spark.adb.eni.enabled

    Specifies whether to enable ENI.

    When you access data, you must set the spark.adb.eni.enabled parameter to true.

    spark.adb.eni.vswitchId

    The vSwitch ID of the ApsaraDB RDS for MySQL instance. To view the vSwitch ID, move the pointer over VPC on the Database Connection page.

    spark.adb.eni.securityGroupId

    The ID of the security group to which the ApsaraDB RDS for MySQL instance is added. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance.

    table_tmp

    The name of the view. In this example, table_tmp is used.

    USING org.apache.spark.sql.jdbc

    The value of this parameter is automatically set to USING org.apache.spark.sql.jdbc.

    OPTIONS

    • url: the internal endpoint and port number of the ApsaraDB RDS for MySQL instance. Format: jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306.

    • dbtable: the name of the table in the ApsaraDB RDS for MySQL instance. Format: db_name.table_name. In this example, db_1.table_11 is used.

    • user: the name of the database account of the ApsaraDB RDS for MySQL instance.

    • password: the password of the database account.

    Access ApsaraDB RDS for MySQL data over an SSL connection

    add jar oss://<bucketname>/mysql-connector-java-8.0.11.jar;
    set spark.app.name=SSL_RDS_SQL;
    set spark.adb.eni.enabled=true;
    set spark.adb.eni.vswitchId=<vsw-bp1sxxsodv28ey5dl****>;
    set spark.adb.eni.securityGroupId=<sg-bp19mr685pmg4ihc****>;
    set spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES=oss://<bucketname>/ApsaraDB-CA-Chain.jks;
    set spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES=oss://<bucketname>/ApsaraDB-CA-Chain.jks;
    
    CREATE TEMPORARY VIEW table_view
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      url '<jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306/?useSSL=true&clientCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&clientCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&trustCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreType=JKS&clientCertificateKeyStoreType=JKS>',
      dbtable '<db.test>',
      user '<user>',
      password '<password>'
    );

    The following table describes the parameters.

    Parameter

    Description

    add jar

    The OSS path of the ApsaraDB RDS for MySQL driver.

    In this example, the OSS path of the mysql-connector-java-8.0.11.jar package is used.

    spark.app.name

    The name of the Spark SQL job.

    spark.adb.eni.enabled

    Specifies whether to enable ENI.

    When you access data, you must set the spark.adb.eni.enabled parameter to true.

    spark.adb.eni.vswitchId

    The vSwitch ID of the ApsaraDB RDS for MySQL instance. To view the vSwitch ID, move the pointer over VPC on the Database Connection page.

    spark.adb.eni.securityGroupId

    The ID of the security group to which the ApsaraDB RDS for MySQL instance is added. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance.

    spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES

    The Spark driver parameter, which specifies the OSS path of the JKS certificate file. Separate multiple JKS certificate files with commas (,). Example: spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES: "oss://testBucketName/a.jks,oss://testBucketName/b.jks".

    spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES

    The Spark executor parameter, which specifies the OSS path of the JKS certificate file. Separate multiple JKS certificate files with commas (,). Example: spark.executor.ADB_SPARK_DOWNLOAD_FILES: "oss://testBucketName/a.jks,oss://testBucketName/b.jks".

    table_view

    The name of the view. In this example, table_view is used.

    USING org.apache.spark.sql.jdbc

    The value of this parameter is automatically set to USING org.apache.spark.sql.jdbc.

    OPTIONS

    • url: the internal endpoint, port number, and SSL parameters of the ApsaraDB RDS for MySQL instance. Format: jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306/?useSSL=true&clientCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&clientCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreUrl=file:///tmp/<bucketname>/folder/ApsaraDB-CA-Chain.jks&trustCertificateKeyStorePassword=apsaradb&trustCertificateKeyStoreType=JKS&clientCertificateKeyStoreType=JKS.

      For information about the SSL parameters, see the "Parameters" section of the Access ApsaraDB RDS for MySQL topic.

    • dbtable: the name of the table in the ApsaraDB RDS for MySQL instance. Format: db_name.table_name. In this example, db.test is used.

    • user: the name of the database account of the ApsaraDB RDS for MySQL instance.

    • password: the password of the database account.

    Catalog mode

    set spark.adb.eni.enabled=true;
    set spark.adb.eni.vswitchId=<vsw-bp1d14ddiw46fkgu1****>;
    set spark.adb.eni.securityGroupId=<sg-bp19varsa8j0hyb****>;
    set spark.sql.catalog.jdbc=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog;
    set spark.sql.catalog.jdbc.url=<jdbc:mysql://rm-bp11mpql1e01l****.mysql.rds.aliyuncs.com:3306>;
    set spark.sql.catalog.jdbc.user=<user>;
    set spark.sql.catalog.jdbc.password=<password>;
    use jdbc;
    select * from db.test;

    The following table describes the parameters.

    Parameter

    Description

    spark.sql.catalog.jdbc

    The method that is used by Spark SQL to configure data sources.

    The value of this parameter is automatically set to org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.

    Note

    You can configure a custom catalog_name value for the spark.sql.catalog.catalog_name parameter. In this example, jdbc is used.

    spark.sql.catalog.jdbc.url

    The internal endpoint and port number of the ApsaraDB RDS for MySQL instance. Format: jdbc:mysql://rm-bp1k87323a7ia****.mysql.rds.aliyuncs.com:3306.

    spark.sql.catalog.jdbc.user

    The name of the database account of the ApsaraDB RDS for MySQL instance.

    spark.sql.catalog.jdbc.password

    The password of the database account.

    Note

    For information about more parameters that are required to access data sources by using Java Database Connectivity (JDBC), see JDBC To Other Databases.