AnalyticDB for MySQL Data Lakehouse Edition (V3.0) allows you to access ApsaraDB RDS for MySQL within an Alibaba Cloud account or across Alibaba Cloud accounts. You can access ApsaraDB RDS for MySQL over an elastic network interface (ENI) or an SSL connection. Compared with ENI-based access, SSL-based access encrypts the network connection and provides higher data security. This topic describes how to access ApsaraDB RDS for MySQL over an ENI and an SSL connection.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster and an ApsaraDB RDS for MySQL instance are created in the same region. For more information, see Create a cluster and Create an ApsaraDB RDS for MySQL instance.
A job resource group is created. For more information, see Create a resource group.
A database account is created.
If you use an Alibaba Cloud account, you must create a privileged database account. For more information, see the "Create a privileged account" section of the Create a database account topic.
If you use a Resource Access Management (RAM) user, you must create a privileged account and a standard account and associate the standard account with the RAM user. For more information, see Create a database account and Associate or disassociate a database account with or from a RAM user.
The ApsaraDB RDS for MySQL instance is added to a security group. The inbound and outbound rules of the security group allow requests from the port of the ApsaraDB RDS for MySQL instance. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance and Add a security group rule.
Object Storage Service (OSS) is activated and a bucket is created in the same region as the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Activate OSS and Create buckets.
Authorization is complete. For more information, see Perform authorization.
Important: To access data within an Alibaba Cloud account, you must have the AliyunADBSparkProcessingDataRole permission. To access data across Alibaba Cloud accounts, you must perform authorization for other Alibaba Cloud accounts.
Prepare data
Create a database in the ApsaraDB RDS for MySQL instance, create a table in the database, and then write data to this table. Sample statements:
CREATE DATABASE `test`;
CREATE TABLE `test`.`persons` (
  `id` int(11) DEFAULT NULL,
  `first_name` varchar(32) DEFAULT NULL,
  `last_name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `test`.`persons` VALUES(1,'a','b',5);
INSERT INTO `test`.`persons` VALUES(2,'c','d',6);
INSERT INTO `test`.`persons` VALUES(3,'e','f',7);
Access ApsaraDB RDS for MySQL over an ENI
Upload the driver and the JAR package that is required for the Spark job
Write a program that is used to access a table in the ApsaraDB RDS for MySQL instance. Then, compile the program into a JAR package that is required for the Spark job. In this example, the JAR package is named rds_test.jar. Sample code:

package com.aliyun.spark

import org.apache.spark.sql.SparkSession

object SparkRDS {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("rds mysql test")
      .getOrCreate()

    // The internal endpoint of the ApsaraDB RDS for MySQL instance. For more information, see View and change the endpoints and port numbers.
    val url = "jdbc:mysql://rm-bp11mpql1e01****.mysql.rds.aliyuncs.com"
    // The name of the table in the ApsaraDB RDS for MySQL instance. Specify the name in the db_name.table_name format.
    val dbtable = "test.persons"
    // The database account that is used to connect to the ApsaraDB RDS for MySQL database.
    val user = "mysql_username"
    // The password of the database account.
    val password = "mysql_password"

    val jdbcDF = sparkSession.read
      .format("jdbc")
      .option("url", url)
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", dbtable)
      .option("user", user)
      .option("password", password)
      .load()

    jdbcDF.show()
  }
}
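The JAR package can be built with a standard Scala build tool. The following build.sbt is a minimal sketch under the assumption that you use sbt with Scala 2.12 and a Spark 3.x runtime; the version numbers are illustrative and must match the Spark version of your AnalyticDB for MySQL cluster. The MySQL driver does not need to be bundled because it is passed to the job separately by using the jars parameter.

// build.sbt: an illustrative packaging configuration, not part of the original sample.
name := "rds-test"
version := "1.0"
// Assumption: the cluster runs a Spark version that is built for Scala 2.12.
scalaVersion := "2.12.18"
// Spark is provided by the AnalyticDB for MySQL Spark runtime, so it is not bundled into the JAR.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0" % "provided"

Run sbt package and upload the generated JAR to OSS as rds_test.jar.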
Download the driver that is compatible with the engine version of the ApsaraDB RDS for MySQL instance from https://dev.mysql.com/downloads/connector/j/.
In this example, mysql-connector-java-8.0.11.jar is downloaded.
Upload the JAR package and the driver to OSS. For more information, see Upload objects.
Access ApsaraDB RDS for MySQL within an Alibaba Cloud account
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, open the Spark editor. Select a job resource group and a job type for the Spark job. In this example, the Batch type is selected.
Run the following code in the Spark editor:
{ "name": "rds-mysql-example", "jars": [ "oss://testBucketName/mysql-connector-java-8.0.11.jar" ], "file": "oss://testBucketName/rds_test.jar", "className": "com.aliyun.spark.SparkRDS", "conf": { "spark.adb.eni.enabled": "true", "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****", "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****", "spark.driver.resourceSpec": "small", "spark.executor.instances": 1, "spark.executor.resourceSpec": "small" } }
The following table describes the parameters.

Parameter | Description
name | The name of the Spark job.
jars | The OSS path of the ApsaraDB RDS for MySQL driver. In this example, the OSS path of the mysql-connector-java-8.0.11.jar package is used.
file | The OSS path of the JAR package that is required for the Spark job.
className | The entry class of the Java or Scala application. In this example, com.aliyun.spark.SparkRDS is used.
spark.adb.eni.enabled | Specifies whether to enable ENI.
spark.adb.eni.vswitchId | The vSwitch ID of the ApsaraDB RDS for MySQL instance. To view the vSwitch ID, move the pointer over VPC on the Database Connection page.
spark.adb.eni.securityGroupId | The ID of the security group to which the ApsaraDB RDS for MySQL instance is added. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance.
conf | The configuration parameters that are required for the Spark job, which are similar to those of Apache Spark. The parameters must be in the key:value format. Separate multiple parameters with commas (,). For more information, see Spark application configuration parameters.

Click Run Now.
After the Spark job is run, view the data of the ApsaraDB RDS for MySQL table in the logs of the Spark job. For more information, see the "View information about a Spark application" section of the Spark editor topic.
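Based on the sample data that is prepared in the Prepare data section, the job logs should contain a DataFrame printout similar to the following. The exact layout may vary with the Spark version.

+---+----------+---------+---+
| id|first_name|last_name|age|
+---+----------+---------+---+
|  1|         a|        b|  5|
|  2|         c|        d|  6|
|  3|         e|        f|  7|
+---+----------+---------+---+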
Access ApsaraDB RDS for MySQL across Alibaba Cloud accounts
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
Important: If you want to access ApsaraDB RDS for MySQL across Alibaba Cloud accounts, you must log on to the AnalyticDB for MySQL console as the RAM user that is created for the new Alibaba Cloud account. For more information, see Perform authorization.
In the left-side navigation pane, open the Spark editor. Select a job resource group and a job type for the Spark job. In this example, the Batch type is selected.
Run the following code in the Spark editor:
{ "name": "rds-mysql-example", "jars": [ "oss://testBucketName/mysql-connector-java-8.0.11.jar" ], "file": "oss://testBucketName/rds_test.jar", "className": "com.aliyun.spark.SparkRDS", "conf": { "spark.adb.eni.enabled": "true", "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****", "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****", "spark.driver.resourceSpec": "small", "spark.executor.instances": 1, "spark.executor.resourceSpec": "small" "spark.adb.eni.roleArn":"acs:ram::testAccountID:role/testUserName" } }
The following table describes the parameters.

Parameter | Description
spark.adb.eni.roleArn | The RAM role that is used to access ApsaraDB RDS for MySQL across Alibaba Cloud accounts. Separate multiple roles with commas (,). Specify the parameter in the acs:ram::testAccountID:role/testUserName format, where testAccountID is the ID of the Alibaba Cloud account that owns the ApsaraDB RDS for MySQL data source and testUserName is the name of the RAM role that is created when you perform authorization across Alibaba Cloud accounts. For more information, see the "Perform authorization across Alibaba Cloud accounts" section of the Perform authorization topic.
For more information about the parameters, see the preceding parameter table.
Click Run Now.
After the Spark job is run, view the data of the ApsaraDB RDS for MySQL table in the logs of the Spark job. For more information, see the "View information about a Spark application" section of the Spark editor topic.
Access ApsaraDB RDS for MySQL over an SSL connection
Before you access ApsaraDB RDS for MySQL over an SSL connection, make sure that SSL encryption is enabled for the ApsaraDB RDS for MySQL instance and the connection to the internal endpoint is encrypted. For more information, see Configure the SSL encryption feature.
Download the CA certificate from the ApsaraDB RDS console and upload the certificate to OSS
Log on to the ApsaraDB RDS console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Instances. On the page that appears, click the ID of the instance that you want to view.
In the left-side navigation pane, click Data Security.
Click Download CA Certificate.
Important: The default validity period of a CA certificate is one year. If the CA certificate expires, you must generate another certificate. You cannot use an expired CA certificate to access ApsaraDB RDS for MySQL over an SSL connection.
Decompress the CA certificate package and upload the extracted JKS certificate file to OSS. For more information, see Upload objects.
Upload the driver and the JAR package that is required for the Spark job
Write a program that is used to access a table in the ApsaraDB RDS for MySQL instance. Then, compile the program into a JAR package that is required for the Spark job. In this example, the JAR package is named test.jar. Sample code:

package org.example

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    // The OSS path of the JKS certificate file, which is in the oss://testBucketName/folder/ApsaraDB-CA-Chain.jks format.
    val JKS_FILE_PATH = args(0)
    // The database account that is used to connect to the ApsaraDB RDS for MySQL database.
    val USERNAME = args(1)
    // The password of the database account.
    val PASSWORD = args(2)
    // The name of the ApsaraDB RDS for MySQL database.
    val DATABASE_NAME = args(3)
    // The name of the table in the ApsaraDB RDS for MySQL instance.
    val TABLE_NAME = args(4)

    // The internal endpoint of the ApsaraDB RDS for MySQL instance.
    val mysqlUrl = "jdbc:mysql://rm-bp11mpql1e01****.mysql.rds.aliyuncs.com:3306/?" +
      "useSSL=true" +
      "&trustCertificateKeyStoreUrl=file:///tmp/testBucketName/folder/ApsaraDB-CA-Chain.jks" +
      "&trustCertificateKeyStorePassword=apsaradb" +
      "&trustCertificateKeyStoreType=JKS"

    val spark = SparkSession.builder().getOrCreate()
    spark.read.format("jdbc")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("url", mysqlUrl)
      .option("user", USERNAME)
      .option("password", PASSWORD)
      .option("dbtable", s"${DATABASE_NAME}.${TABLE_NAME}")
      .load()
      .show()
  }
}
The following table describes the parameters.

Parameter | Description
useSSL | Specifies whether to enable SSL encryption. Valid values: true and false (default). In this example, true is used.
trustCertificateKeyStoreUrl | The on-premises path of the JKS certificate file, which is in the file:///tmp/<JKS_FILE_PATH> format. JKS_FILE_PATH specifies the OSS path of the JKS certificate file without the oss:// prefix. For example, if the OSS path of the JKS certificate file is oss://testBucketName/folder/ApsaraDB-CA-Chain.jks, the on-premises path of the JKS certificate file is file:///tmp/testBucketName/folder/ApsaraDB-CA-Chain.jks.
trustCertificateKeyStorePassword | The password of the JKS certificate file. The value of this parameter is apsaradb.
trustCertificateKeyStoreType | The storage format of the certificate file. The value of this parameter is JKS.
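The sample code hard-codes the on-premises path of the JKS certificate file in mysqlUrl. If you prefer to derive the path from the JKS_FILE_PATH argument instead, a helper similar to the following can be used. This is only a sketch based on the path mapping described in the preceding table, and localJksUrl is a hypothetical name that is not part of the original sample.

// Hypothetical helper: converts the OSS path of the JKS certificate file, for example
// oss://testBucketName/folder/ApsaraDB-CA-Chain.jks, to the on-premises URL
// file:///tmp/testBucketName/folder/ApsaraDB-CA-Chain.jks that trustCertificateKeyStoreUrl expects.
def localJksUrl(ossPath: String): String =
  "file:///tmp/" + ossPath.stripPrefix("oss://")

// Usage inside mysqlUrl:
//   s"&trustCertificateKeyStoreUrl=${localJksUrl(JKS_FILE_PATH)}"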
Upload the test.jar package to OSS. For more information, see Upload objects.
Access ApsaraDB RDS for MySQL within an Alibaba Cloud account
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, open the Spark editor. Select a job resource group and a job type for the Spark job. In this example, the Batch type is selected.
Run the following code in the Spark editor:
{ "file": "oss://testBucketName/test.jar", "className": "org.example.Test", "name": "MYSQL PEM Test", "conf": { "spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks", "spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks", "spark.driver.resourceSpec": "small", "spark.executor.instances": 1, "spark.executor.resourceSpec": "small", "spark.adb.eni.enabled": "true", "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****", "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****" } }
The following table describes the parameters.

Parameter | Description
name | The name of the Spark job.
file | The OSS path of the JAR package that is required for the Spark job.
className | The entry class of the Java or Scala application. In this example, org.example.Test is used.
spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES | The Spark driver parameter that specifies the OSS path of the JKS certificate file. Separate multiple JKS certificate files with commas (,). Example: spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES: "oss://testBucketName/a.jks,oss://testBucketName/b.jks".
spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES | The Spark executor parameter that specifies the OSS path of the JKS certificate file. Separate multiple JKS certificate files with commas (,). Example: spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES: "oss://testBucketName/a.jks,oss://testBucketName/b.jks".
spark.adb.eni.enabled | Specifies whether to enable ENI.
spark.adb.eni.vswitchId | The vSwitch ID of the ApsaraDB RDS for MySQL instance. To view the vSwitch ID, move the pointer over VPC on the Database Connection page.
spark.adb.eni.securityGroupId | The ID of the security group to which the ApsaraDB RDS for MySQL instance is added. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance.
conf | The configuration parameters that are required for the Spark job, which are similar to those of Apache Spark. The parameters must be in the key:value format. Separate multiple parameters with commas (,). For more information, see Spark application configuration parameters.

Click Run Now.
After the Spark job is run, view the data of the ApsaraDB RDS for MySQL table in the logs of the Spark job. For more information, see the "View information about a Spark application" section of the Spark editor topic.
Access ApsaraDB RDS for MySQL across Alibaba Cloud accounts
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
Important: If you want to access ApsaraDB RDS for MySQL across Alibaba Cloud accounts, you must log on to the AnalyticDB for MySQL console as the RAM user that is created for the new Alibaba Cloud account. For more information, see Perform authorization.
In the left-side navigation pane, open the Spark editor. Select a job resource group and a job type for the Spark job. In this example, the Batch type is selected.
Run the following code in the Spark editor:
{ "file": "oss://testBucketName/test.jar", "className": "org.example.Test", "name": "MYSQL PEM Test", "conf": { "spark.kubernetes.driverEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks", "spark.executorEnv.ADB_SPARK_DOWNLOAD_FILES": "oss://testBucketName/folder/ApsaraDB-CA-Chain.jks", "spark.driver.resourceSpec": "small", "spark.executor.instances": 1, "spark.executor.resourceSpec": "small", "spark.adb.eni.enabled": "true", "spark.adb.eni.vswitchId": "vsw-bp17jqw3lrrobn6y****", "spark.adb.eni.securityGroupId": "sg-bp163uxgt4zandx****", "spark.adb.eni.roleArn":"acs:ram::testAccountID:role/testUserName", } }
The following table describes the parameters.

Parameter | Description
spark.adb.eni.roleArn | The RAM role that is used to access ApsaraDB RDS for MySQL across Alibaba Cloud accounts. Separate multiple roles with commas (,). Specify the parameter in the acs:ram::testAccountID:role/testUserName format, where testAccountID is the ID of the Alibaba Cloud account that owns the ApsaraDB RDS for MySQL data source and testUserName is the name of the RAM role that is created when you perform authorization across Alibaba Cloud accounts. For more information, see the "Perform authorization across Alibaba Cloud accounts" section of the Perform authorization topic.
For more information about the parameters, see the preceding parameter table.
Click Run Now.
After the Spark job is run, view the data of the ApsaraDB RDS for MySQL table in the logs of the Spark job. For more information, see the "View information about a Spark application" section of the Spark editor topic.