This topic describes how to configure a self-managed ApsaraDB RDS for MySQL database as the metadatabase of an E-MapReduce (EMR) DataLake cluster, a custom cluster, or a Hadoop cluster.
Prerequisites
An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance. All EMR clusters support MySQL 5.7, and only clusters of a minor version that is later than EMR V3.35.0 or V5.0.0 support MySQL 8.0.
In this topic, an ApsaraDB RDS for MySQL instance that runs MySQL 5.7 is used.
Limits
When you create an ApsaraDB RDS for MySQL instance, you must set Database Engine to MySQL 5.7 and Edition to High-availability.
Procedure
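Step 1: Prepare a metadatabase
Create a database and an account on the ApsaraDB RDS for MySQL instance, and obtain the internal endpoint of the database.
Step 2: Create a cluster
Create a cluster in the EMR console and associate the cluster with the metadatabase.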
(Optional) Step 3: Initialize the Metastore service
Step 1: Prepare a metadatabase
Create a database.
For more information, see Create a database.
Create a standard account and grant read and write permissions to the account.
For more information, see Create an account.
Note: Record the username and password of the account. They are required in Step 2: Create a cluster.
Obtain the internal endpoint of the database.
Configure an IP address whitelist. For more information, see Configure an IP address whitelist.
In the left-side navigation pane of the instance details page, click Database Connection.
On the Database Connection page, click the Copy icon on the right of the internal endpoint to copy the internal endpoint.
Record the internal endpoint. The endpoint is required in Step 2: Create a cluster.
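Before you move on to Step 2, it can help to confirm that the internal endpoint is reachable from a node in the same VPC, because a missing whitelist entry usually shows up as a connection timeout. The following is a minimal sketch, not an EMR tool: the function name `check_endpoint` is hypothetical, and the port 3306 is an assumed default that you should replace with the port shown on the Database Connection page. It relies on the `/dev/tcp` feature of bash and the `timeout` command from coreutils.

```shell
#!/usr/bin/env bash
# check_endpoint HOST [PORT]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT can be
# opened within 5 seconds, and a non-zero status otherwise.
# PORT defaults to 3306 (assumed; use the port shown for your instance).
check_endpoint() {
  local host="$1" port="${2:-3306}"
  # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT.
  # timeout bounds the wait so a filtered port does not hang the check.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `check_endpoint <internal-endpoint> && echo reachable`, where `<internal-endpoint>` is the value that you recorded. A failure only indicates that the TCP connection could not be opened; it does not distinguish between a whitelist problem and a wrong endpoint.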
Step 2: Create a cluster
In the Software Configuration step, configure the parameters described in the following table. For more information about other parameters, see Create a cluster.
DataLake and custom cluster parameter | Hadoop cluster parameter | Description
Metadata | Metadata | Select Self-managed RDS. Note: The Metadata parameter is available only if you select the HDFS, YARN, and Hive services for a custom cluster.
javax.jdo.option.ConnectionURL | RDS Endpoint | Specify an endpoint in the
javax.jdo.option.ConnectionUserName | RDS Username | Enter the username recorded in Step 1: Prepare a metadatabase.
javax.jdo.option.ConnectionPassword | RDS Password | Enter the password recorded in Step 1: Prepare a metadatabase.
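The connection URL is typically assembled from the internal endpoint and the database name recorded in Step 1. The sketch below only shows the generic MySQL JDBC URL shape, `jdbc:mysql://host:port/database`. The exact format and any query parameters that the EMR console expects are not reproduced here, so treat the output as a starting point. The function name `build_jdbc_url`, the default port 3306, and the sample values in the usage note are assumptions.

```shell
#!/usr/bin/env bash
# build_jdbc_url ENDPOINT DATABASE [PORT]
# Hypothetical helper: prints a generic MySQL JDBC URL.
# PORT defaults to 3306 (assumed; use the port of your instance).
build_jdbc_url() {
  local endpoint="$1" database="$2" port="${3:-3306}"
  printf 'jdbc:mysql://%s:%s/%s\n' "$endpoint" "$port" "$database"
}
```

For example, `build_jdbc_url rm-example.mysql.rds.aliyuncs.com hivemeta` prints `jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/hivemeta`. Both the endpoint and the database name in this example are placeholders.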
(Optional) Step 3: Initialize the Metastore service
Important: You must initialize the Metastore service in either of the following cases: you created a Hadoop cluster of EMR V3.38.X, EMR V4.9.X, or EMR V5.4.X, or of a minor version that is earlier than one of these versions, in the previous step; or you changed the metadata storage of an existing cluster to an ApsaraDB RDS for MySQL database.
For a DataLake or custom cluster, EMR initializes the Hive metadatabase based on the database connection parameters that you configure when you create the cluster, so you can skip this step.
Log on to the master node of the cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to switch to the hadoop user:
su - hadoop
Run the following command to initialize the Metastore service:
schematool -initSchema -dbType mysql
After the service is initialized, you can use the self-managed ApsaraDB RDS for MySQL database as the Hive metadatabase.
Note: The HiveMetaStore and HiveServer2 components of Hive and the ThriftServer component of Spark may be in an abnormal state before the Metastore service is initialized. The components recover after the Metastore service is initialized.
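The initialization command can be wrapped so that it is safe to rerun. The following is a minimal sketch, assuming that `schematool -dbType mysql -info` exits with a non-zero status when the schema has not been initialized; confirm this behavior on your Hive version. The function name `init_metastore_if_needed` is hypothetical and is not part of EMR or Hive. Run it as the hadoop user on the master node.

```shell
#!/usr/bin/env bash
# init_metastore_if_needed
# Hypothetical helper: initializes the Hive metastore schema only if
# `schematool -info` cannot read an existing schema version.
init_metastore_if_needed() {
  if schematool -dbType mysql -info >/dev/null 2>&1; then
    # The schema version could be read, so nothing needs to be done.
    echo "schema already initialized"
  else
    # -info failed: assume that the schema is missing and create it.
    # The message is printed only if -initSchema succeeds.
    schematool -dbType mysql -initSchema && echo "schema initialized"
  fi
}
```

Call `init_metastore_if_needed` after you switch to the hadoop user. If the database is unreachable, both `-info` and `-initSchema` fail and the function returns a non-zero status without printing a success message.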