Configure a self-managed ApsaraDB RDS for MySQL database - E-MapReduce

This topic describes how to configure a self-managed ApsaraDB RDS for MySQL database as the metadatabase of an E-MapReduce (EMR) DataLake cluster, a custom cluster, or a Hadoop cluster.

Prerequisites

An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance. All EMR clusters support MySQL 5.7. Only clusters of a minor version later than EMR V3.35.0 or EMR V5.0.0 support MySQL 8.0.

Note

In this topic, an ApsaraDB RDS for MySQL instance that runs MySQL 5.7 is used.

Limits

When you create an ApsaraDB RDS for MySQL instance, you must set Database Engine to MySQL 5.7 and Edition to High-availability.

Procedure

Step 1: Prepare a metadatabase
Prepare a metadatabase.
Step 2: Create a cluster
Create a cluster in the EMR console and associate the cluster with the metadatabase.
(Optional) Step 3: Initialize the Metastore service
Important
- You must initialize the Metastore service if, in the previous step, you created a Hadoop cluster of EMR V3.38.X, EMR V4.9.X, or EMR V5.4.X, or of a minor version earlier than these, or if you changed the metadata storage of an existing cluster to an ApsaraDB RDS for MySQL database.
- EMR initializes the Hive metadatabase based on the database connection parameters that you configure when you create a DataLake or custom cluster. Skip this step if you use a DataLake or custom cluster.

Step 1: Prepare a metadatabase

1. Create a database.
   For more information, see Create a database.
2. Create a standard account and grant read and write permissions to the account.
   For more information, see Create an account.
   Note
   Record the username and password of the account, which are required in Step 2: Create a cluster.
3. Obtain the internal endpoint of the database.
   1. Configure an IP address whitelist. For more information, see Configure an IP address whitelist.
   2. In the left-side navigation pane of the instance details page, click Database Connection.
   3. On the Database Connection page, click the Copy icon on the right of the internal endpoint to copy the internal endpoint.
      Record the internal endpoint. The endpoint is required in Step 2: Create a cluster.
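
Before you continue, it can help to confirm that the endpoint, account, and whitelist work together. The following sketch assumes a host in the same VPC whose IP address is in the whitelist and on which the `mysql` command-line client is installed. The function name `check_metadb_connection` and the database name `hivemeta` are hypothetical placeholders, not values from this topic.

```shell
# Placeholders: replace with the values recorded in Step 1.
RDS_ENDPOINT="rm-xxxxxx.mysql.rds.aliyuncs.com"   # internal endpoint
RDS_USER="your_account"                           # hypothetical account name
RDS_DB="hivemeta"                                 # hypothetical database name

# Connect with the mysql client and list the metadatabase.
# -p makes the client prompt for the password instead of reading it from the command line.
check_metadb_connection() {
  mysql -h "$RDS_ENDPOINT" -u "$RDS_USER" -p \
    -e "SHOW DATABASES LIKE '$RDS_DB';"
}

# Usage (run manually; requires network access to the instance):
#   check_metadb_connection
```

If the command returns the database name, the account can reach the instance over the internal endpoint. A connection timeout usually points at the whitelist or VPC configuration rather than at the account.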

Step 2: Create a cluster

In the Software Configuration step, configure the parameters described in the following table. For more information about other parameters, see Create a cluster.

| DataLake and custom cluster parameter | Hadoop cluster parameter | Description |
| --- | --- | --- |
| Metadata | | Select Self-managed RDS. Note: The Metadata parameter is available only if you select the HDFS, YARN, and Hive services for a custom cluster. |
| javax.jdo.option.ConnectionURL | RDS Endpoint | Specify an endpoint in the `jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name>` format. `rm-xxxxxx.mysql.rds.aliyuncs.com`: Enter the internal endpoint obtained in Step 1: Prepare a metadatabase. `<Database name>`: Enter the database name specified in Step 1: Prepare a metadatabase. |
| javax.jdo.option.ConnectionUserName | RDS Username | Enter the username recorded in Step 1: Prepare a metadatabase. |
| javax.jdo.option.ConnectionPassword | RDS Password | Enter the password recorded in Step 1: Prepare a metadatabase. |
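
As a rough illustration of the endpoint format described in the table, the following sketch assembles the connection URL from the two values recorded in Step 1. The helper name `build_jdbc_url` and the database name `hivemeta` are hypothetical placeholders, not part of EMR.

```shell
# Hypothetical helper: build the JDBC URL from the internal endpoint
# and the database name recorded in Step 1.
build_jdbc_url() {
  endpoint="$1"
  database="$2"
  # Both values are required; refuse to print a partial URL.
  if [ -z "$endpoint" ] || [ -z "$database" ]; then
    echo "usage: build_jdbc_url <internal-endpoint> <database-name>" >&2
    return 1
  fi
  printf 'jdbc:mysql://%s/%s\n' "$endpoint" "$database"
}

# Example with the placeholder endpoint from the table.
build_jdbc_url "rm-xxxxxx.mysql.rds.aliyuncs.com" "hivemeta"
```

Paste the printed value into javax.jdo.option.ConnectionURL (DataLake and custom clusters) or RDS Endpoint (Hadoop clusters).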

(Optional) Step 3: Initialize the Metastore service

Important

- You must initialize the Metastore service if, in the previous step, you created a Hadoop cluster of EMR V3.38.X, EMR V4.9.X, or EMR V5.4.X, or of a minor version earlier than these, or if you changed the metadata storage of an existing cluster to an ApsaraDB RDS for MySQL database.
- EMR initializes the Hive metadatabase based on the database connection parameters that you configure when you create a DataLake or custom cluster. Skip this step if you use a DataLake or custom cluster.

Log on to the master node of the cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to switch to the hadoop user:
```
su - hadoop
```
Run the following command to initialize the Metastore service:
```
schematool -initSchema -dbType mysql
```
After the service is initialized, you can use the self-managed ApsaraDB RDS for MySQL database as the Hive metadatabase.
Note
The HiveMetaStore and HiveServer2 components of Hive and the ThriftServer component of Spark may be in an abnormal state before the Metastore service is initialized. The components recover after the Metastore service is initialized.
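
To confirm that the initialization took effect, one option is to query the schema version with the `-info` option of the Hive schematool. The wrapper below is a sketch: the function name `verify_metastore_schema` is hypothetical, and it assumes that you run it on the master node as the hadoop user, where `schematool` is on the PATH.

```shell
# Hypothetical wrapper: report the Metastore schema version stored in the
# metadatabase, or fail early if schematool is not available on this host.
verify_metastore_schema() {
  if ! command -v schematool >/dev/null 2>&1; then
    echo "schematool not found on PATH; run this on the master node as the hadoop user" >&2
    return 127
  fi
  # -info prints the connection details and the schema version without changing anything.
  schematool -info -dbType mysql
}

# Usage (on the master node, after initialization):
#   verify_metastore_schema
```

If the command reports a schema version, the Metastore schema exists in the metadatabase. An error about a missing version table suggests that the initialization did not run against this database.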