E-MapReduce:Configure a self-managed ApsaraDB RDS for MySQL database

Last Updated: Oct 23, 2024

This topic describes how to configure a self-managed ApsaraDB RDS for MySQL database as the metadatabase of an E-MapReduce (EMR) DataLake cluster, a custom cluster, or a Hadoop cluster.

Prerequisites

An ApsaraDB RDS for MySQL instance is purchased. For more information, see Create an ApsaraDB RDS for MySQL instance. All EMR clusters support MySQL 5.7. Only clusters of a minor version that is later than EMR V3.35.0 or EMR V5.0.0 support MySQL 8.0.

Note

In this topic, an ApsaraDB RDS for MySQL instance that runs MySQL 5.7 is used.

Limits

When you create an ApsaraDB RDS for MySQL instance, you must set Database Engine to MySQL 5.7 and Edition to High-availability.

Procedure

  1. Step 1: Prepare a metadatabase

    Prepare a metadatabase.

  2. Step 2: Create a cluster

    Create a cluster in the EMR console and associate the cluster with the metadatabase.

  3. (Optional) Step 3: Initialize the Metastore service

    Important
    • You must initialize the Metastore service in either of the following cases: you created a Hadoop cluster of EMR V3.38.X, EMR V4.9.X, EMR V5.4.X, or an earlier minor version in the previous step, or you changed the metadata storage of an existing cluster to an ApsaraDB RDS for MySQL database.

    • EMR initializes the Hive metadatabase based on the database connection parameters that you configure when you create a DataLake or custom cluster. Skip this step if you use a DataLake or custom cluster.

Step 1: Prepare a metadatabase

  1. Create a database.

    For more information, see Create a database.

  2. Create a standard account and grant read and write permissions to the account.

    For more information, see Create an account.

    Note

    Record the username and password of the account, which are required in Step 2: Create a cluster.

  3. Obtain the internal endpoint of the database.

    1. Configure an IP address whitelist. For more information, see Configure an IP address whitelist.

    2. In the left-side navigation pane of the instance details page, click Database Connection.

    3. On the Database Connection page, click the Copy icon on the right of the internal endpoint to copy the internal endpoint.

      Record the internal endpoint. The endpoint is required in Step 2: Create a cluster.
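
Before you create the cluster, it can help to confirm that the account and the internal endpoint recorded above actually work. The following is a minimal sketch, not part of the official procedure. It assumes that the mysql command-line client is installed on a machine that is in the same VPC as the RDS instance and is included in the IP address whitelist, and that the instance listens on the default port 3306. The function name and the sample arguments are hypothetical placeholders.

```shell
# Hypothetical helper: run a trivial query against the metadatabase.
# Arguments: <internal endpoint> <username> <database name>
# The -p option without a value makes the mysql client prompt for the password.
# Port 3306 is an assumption (the MySQL default); change it if your instance differs.
check_metadb_connection() {
  mysql -h "$1" -P 3306 -u "$2" -p -e "SELECT 1;" "$3"
}
```

For example, run check_metadb_connection rm-xxxxxx.mysql.rds.aliyuncs.com <username> <Database name>. If the query returns a result, the whitelist, endpoint, and account are usable. A timeout usually points to the whitelist or network, and an access-denied error points to the account.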

Step 2: Create a cluster

In the Software Configuration step, configure the parameters described in the following table. For more information about other parameters, see Create a cluster.

| DataLake and custom cluster parameter | Hadoop cluster parameter | Description |
| --- | --- | --- |
| Metadata | Metadata | Select Self-managed RDS. Note: The Metadata parameter is available only if you select the HDFS, YARN, and Hive services for a custom cluster. |
| javax.jdo.option.ConnectionURL | RDS Endpoint | Specify an endpoint in the jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/<Database name> format. |
| javax.jdo.option.ConnectionUserName | RDS Username | Enter the username recorded in Step 1: Prepare a metadatabase. |
| javax.jdo.option.ConnectionPassword | RDS Password | Enter the password recorded in Step 1: Prepare a metadatabase. |
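
The endpoint value combines the internal endpoint and the database name from Step 1. As an illustration only, the following sketch assembles the string in the jdbc:mysql://<internal endpoint>/<Database name> format described for the endpoint parameter. The function name build_jdbc_url and the database name hivemeta are hypothetical and are not defined by EMR.

```shell
# Hypothetical helper: print the JDBC URL for the metadatabase.
# Arguments: <internal endpoint> <database name>
build_jdbc_url() {
  printf 'jdbc:mysql://%s/%s\n' "$1" "$2"
}

# "hivemeta" is a placeholder for the database that you created in Step 1.
build_jdbc_url "rm-xxxxxx.mysql.rds.aliyuncs.com" "hivemeta"
# prints: jdbc:mysql://rm-xxxxxx.mysql.rds.aliyuncs.com/hivemeta
```

Paste the resulting string into javax.jdo.option.ConnectionURL (DataLake or custom cluster) or RDS Endpoint (Hadoop cluster).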

(Optional) Step 3: Initialize the Metastore service

Important
  • You must initialize the Metastore service in either of the following cases: you created a Hadoop cluster of EMR V3.38.X, EMR V4.9.X, EMR V5.4.X, or an earlier minor version in the previous step, or you changed the metadata storage of an existing cluster to an ApsaraDB RDS for MySQL database.

  • EMR initializes the Hive metadatabase based on the database connection parameters that you configure when you create a DataLake or custom cluster. Skip this step if you use a DataLake or custom cluster.

  1. Log on to the master node of the cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to switch to the hadoop user:

    su - hadoop

  3. Run the following command to initialize the Metastore service:

    schematool -initSchema -dbType mysql

    After the service is initialized, you can use the self-managed ApsaraDB RDS for MySQL database as the Hive metadatabase.

    Note

    The HiveMetaStore and HiveServer2 components of Hive and the ThriftServer component of Spark might be in an abnormal state before the Metastore service is initialized. The components recover after the Metastore service is initialized.
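
If you are not sure whether the metadatabase was already initialized, you can guard the initialization with a schema check. The following is a sketch under stated assumptions rather than part of the official procedure. It uses the -info option of schematool, which reports the schema version stored in the metadatabase, and it assumes that this command exits with a nonzero status when no schema is found. The function name init_metastore_if_needed is hypothetical.

```shell
# Hypothetical helper: initialize the Metastore schema only when none exists.
# Run as the hadoop user on the master node.
init_metastore_if_needed() {
  # Assumption: "schematool -info" returns nonzero if the schema is missing.
  if schematool -info -dbType mysql >/dev/null 2>&1; then
    echo "Metastore schema already present; skipping initialization."
  else
    schematool -initSchema -dbType mysql
  fi
}
```

After you define the function, call init_metastore_if_needed. To inspect the recorded schema version yourself, run schematool -info -dbType mysql directly.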