AnalyticDB for MySQL allows you to submit a Spark SQL job to access self-managed MySQL databases, ApsaraDB RDS for MySQL, or PolarDB for MySQL in view mode or catalog mode. This topic describes how to use Spark SQL to access ApsaraDB RDS for MySQL data.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
The AnalyticDB for MySQL cluster resides in the same region as an ApsaraDB RDS for MySQL instance.
A job resource group is created for the AnalyticDB for MySQL cluster. For more information, see Create a resource group.
A database account is created for the AnalyticDB for MySQL cluster.
If you use an Alibaba Cloud account, you must create a privileged account. For more information, see the "Create a privileged account" section of the Create a database account topic.
If you use a Resource Access Management (RAM) user, you must create both a privileged account and a standard account and associate the standard account with the RAM user. For more information, see Create a database account and Associate or disassociate a database account with or from a RAM user.
The ApsaraDB RDS for MySQL instance is added to a security group. The inbound and outbound rules of the security group allow requests from the port of the ApsaraDB RDS for MySQL instance. For more information, see Configure a security group for an ApsaraDB RDS for MySQL instance and Add a security group rule.
Step 1: Prepare data
Create a database in the ApsaraDB RDS for MySQL instance, create a table in the database, and then insert data into the table. Sample statements:
CREATE DATABASE `db`;
CREATE TABLE `db`.`test` (
`id` int(11) DEFAULT NULL,
`first_name` varchar(32) DEFAULT NULL,
`laster_name` varchar(32) DEFAULT NULL,
`age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;
INSERT INTO test VALUES(1,'a','b',5);
INSERT INTO test VALUES(2,'c','d',6);
INSERT INTO test VALUES(3,'e','f',7);
Step 2: (Optional) Upload a CA certificate and the ApsaraDB RDS for MySQL driver
If you do not want to access ApsaraDB RDS for MySQL data over an SSL connection, you can skip this step and submit a Spark SQL job. For more information, see the "Step 3: Submit a Spark SQL job" section of this topic.
Enable SSL encryption for the ApsaraDB RDS for MySQL instance and download a CA certificate. For more information, see Configure the SSL encryption feature.
ImportantWhen you enable SSL encryption, you can select whether to encrypt an internal connection or a public connection. In this example, an internal connection is encrypted.
The default validity period of a CA certificate is one year. If the CA certificate expires, you must generate another certificate. You cannot use an expired CA certificate to access ApsaraDB RDS for MySQL over an SSL connection.
Download the driver that is compatible with the engine version of the ApsaraDB RDS for MySQL instance from https://dev.mysql.com/downloads/connector/j/.
In this example, the mysql-connector-java-8.0.29.jar package is used.
Decompress the CA certificate package. Upload the JKS file and the ApsaraDB RDS for MySQL driver to Object Storage Service (OSS). For more information, see Upload objects.
Step 3: Submit a Spark SQL job
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
On the SQLConsole tab, select the Spark engine and a job resource group.
On the SQLConsole tab, configure the parameters based on different access modes.
View mode
Catalog mode
NoteFor information about more parameters that are required to access data sources by using Java Database Connectivity (JDBC), see JDBC To Other Databases.