E-MapReduce:Use EMR-CLI to deploy a custom gateway environment

Last Updated:Jan 06, 2026

A gateway submits jobs to a compute cluster and provides security isolation. E-MapReduce (EMR) provides a tool named EMR-CLI to simplify gateway deployment. With this tool, you create an Alibaba Cloud ECS instance and then deploy a gateway environment on that instance. This topic describes how to deploy a gateway environment after you create a DataLake, Dataflow, OLAP, or Custom cluster.

Three Gateway deployment modes and selection guide

A Gateway is an EMR job submission isolation layer that provides the following core benefits:

  • Decoupling client workloads from core cluster services

    It separates client operations, such as spark-submit, hive -f, and yarn application, from the master or Resource Manager nodes.

  • Implementing multi-tenant environment isolation

    It lets you configure independent runtime environments for different users or departments.

  • Improving cluster stability and maintainability

    It prevents issues such as high-frequency submissions, script debugging, environment conflicts, or resource contention from affecting key services such as YARN ResourceManager and Hadoop Distributed File System (HDFS) NameNode.

EMR offers three Gateway modes. Each mode is suitable for different cluster types, versions, and architectural requirements.

Gateway node group (Recommended)

  • Supported cluster types and version requirements: Only DataLake and DataFlow clusters of EMR-5.10.1 and later, and Custom clusters of EMR-5.17.1 and later.

  • Deployment method and key features: Add a node group directly to an existing cluster. For more information, see Manage node groups. Client configurations are automatically synchronized from the associated cluster.

  • Scenarios and recommendations: Recommended. Best for quickly adding a secure, isolated submission entry point to existing DataLake or DataFlow clusters. This option offers the lowest O&M costs and ensures high configuration consistency.

Gateway environment

  • Supported cluster types and version requirements: DataLake, DataFlow, Custom, and OLAP clusters.

  • Deployment method and key features: Manually deploy on an ECS instance. For more information, see Use the EMR command-line interface (CLI) to customize a Gateway environment deployment. Provides a completely independent file system and runtime environment. You must manually synchronize client configurations from the associated cluster.

  • Scenarios and recommendations: A standard alternative when a cluster does not support Gateway node groups.

Gateway cluster

  • Supported cluster types and version requirements: Only Hadoop and Kafka clusters.

  • Deployment method and key features: Create a separate EMR cluster that contains only Gateway nodes. For more information, see Create a Gateway cluster. Client configurations are automatically synchronized from the associated cluster.

  • Scenarios and recommendations: Suitable for Hadoop and Kafka clusters.

Prerequisites

A DataLake, Dataflow, OLAP, or Custom compute cluster is created in E-MapReduce, and the cluster is in the Running state. For more information about how to create a cluster, see Create a cluster.

Limits

  • Cluster type: This method is only for deploying a gateway environment for DataLake, Dataflow, OLAP, or Custom clusters. If your cluster type and version are compatible, we recommend that you use a Gateway node group.

    For more information about how to deploy a gateway environment for existing Hadoop and Kafka clusters, see Create a gateway cluster.

    Note

    You can create a Hadoop or Kafka cluster after 17:00 on December 19, 2022 (UTC+8) only if your Alibaba Cloud account was used to create Hadoop or Kafka clusters at or before 17:00 on December 19, 2022 (UTC+8).

  • Overwrite installation: EMR-CLI deploys the gateway client in overwrite mode. If you redeploy a gateway on an ECS instance where a gateway already exists, the new client overwrites the existing client in the same directory.

  • Independent deployment: Do not use an existing ECS instance in the EMR cluster, such as a Master, Core, or Task node, as the gateway machine. Using a separate instance prevents the client environment from interfering with the normal operation of cluster services.

  • Supported services: You can use this method to deploy clients for the following services: HDFS, YARN, HBase, HIVE, SPARK2, SPARK3, JINDOSDK, FLINK, SQOOP, IMPALA, PRESTO, HUDI, ICEBERG, TEZ, and DELTALAKE.
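The supported-services list above can be checked before a deployment is attempted. The following is a minimal sketch, not part of EMR-CLI: `validate_app_names` is a hypothetical helper that compares a comma-separated application list (the format that the `--appNames` parameter accepts) against the names listed above. It assumes an exact, case-sensitive match, which may be stricter than what EMR-CLI itself requires.

```shell
#!/bin/sh
# Supported client names, copied from the "Supported services" limit.
SUPPORTED_APPS="HDFS YARN HBase HIVE SPARK2 SPARK3 JINDOSDK FLINK SQOOP IMPALA PRESTO HUDI ICEBERG TEZ DELTALAKE"

# validate_app_names (hypothetical helper, not an EMR-CLI command):
# takes a comma-separated list, reports each unsupported name on stderr,
# and returns non-zero if at least one name is not in SUPPORTED_APPS.
validate_app_names() {
  status=0
  old_ifs=$IFS
  IFS=,
  for app in $1; do
    case " $SUPPORTED_APPS " in
      *" $app "*) ;;                                         # exact match found
      *) echo "unsupported application: $app" >&2; status=1 ;;
    esac
  done
  IFS=$old_ifs
  return $status
}

# Usage:
#   validate_app_names "HDFS,YARN,FLINK" && echo "all names are supported"
```

The check is purely local: it does not confirm that a service is actually installed in the target cluster.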

Deploy a gateway for the first time

  1. Create an ECS instance in the ECS console. For more information, see Create an instance using the wizard.

    Note

    The created ECS instance does not need public network access.

    The recommended parameter settings are as follows.

      • Region and zone: Must be the same as the region and zone of the EMR cluster.

      • Image: Must match the operating system of the EMR cluster nodes.

      • System disk: We recommend that you use an enterprise SSD (ESSD) of at least 60 GiB.

      • Network: Must be the same as the VPC of the EMR cluster.

      • Security group: Must be the same as the security group of the Master instance group of the EMR cluster. This ensures network connectivity between the ECS instance and the EMR cluster.

  2. Create a dedicated ECS RAM role for the EMR Gateway.

    1. Log on to the RAM console as a RAM administrator.

    2. In the navigation pane on the left, choose Identity Management > Roles.

    3. On the Roles page, click Create Role.

    4. In the Create Role panel, set Trusted Entity Type to Alibaba Cloud Service and Trusted Service to ECS. Then, click OK.

    5. Enter a Role Name, such as ECSForEMRGatewayRole, and then click OK.

  3. Grant permissions to the RAM role.

    1. On the Permission Management tab, click Grant Permission.

    2. In the Grant Permission panel, select AliyunEMRFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess from System Policy, and then click OK.

    3. Click Close.

  4. Grant the RAM role to the ECS instance.

    1. Log on to the ECS console.

    2. In the navigation pane on the left, choose Instances & Images > Instances.

    3. In the top navigation bar, select a region.

    4. Find the new ECS instance and, from its actions menu, choose Instance Settings > Grant/Revoke RAM Role.

    5. In the dialog box that appears, select the ECSForEMRGatewayRole role and click OK.

  5. Connect to the ECS instance. For more information, see Connect to an ECS instance.

  6. Run the following command to install EMR-CLI.

    regionId=`curl http://100.100.100.200/latest/meta-data/region-id`; curl https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh -o /tmp/emrcli.sh; chmod 755 /tmp/emrcli.sh; sh /tmp/emrcli.sh install ${regionId}

    If the installation is successful, the following information is returned.

    install emrcli success

  7. Run the following command to deploy the EMR Gateway client.

    emrcli gateway deploy \
      --clusterId <ClusterId> \
      --appNames <ApplicationName>

    Modify the following parameters as needed.

      • clusterId (required): The ID of the cluster created in EMR.

      • appNames (optional): The application name. To specify multiple applications, separate them with commas (,), for example, HDFS,YARN. If you do not specify this parameter, clients for all supported applications in the cluster, such as Hive and HDFS, are deployed by default.

    If the deployment is successful, the following information is returned.

    deployGateway success

    Important

    After the gateway is installed, the JAVA_HOME system environment variable is changed to /usr/lib/jvm/java-1.8.0. You can modify the variable in the /etc/profile.d/emr_env.sh file. However, this modification may affect gateway features. Proceed with caution.

  8. Log on to the ECS instance again for the changes to the system environment variable to take effect.

  9. Optional: Configure domain name resolution for the gateway node.

    Important

    This step is required if the gateway includes the Spark service.

    1. Add a zone. For more information, see Add a built-in authoritative domain name.

    2. Add a DNS record. For more information, see Add a DNS record.

      The parameters are described as follows.

        • Record Type: Use the default value A.

        • Host Record: Enter the hostname of the gateway machine. For example, iZ2zea8r0aht2vzbqci****. You can run the hostname command to get the hostname.

        • Record Value: Enter the private IP address of the gateway machine. You can view it on the node management page.

        • TTL Value: Use the default value.
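The one-line installation command in step 6 keeps running even if an earlier part of it fails, for example if the metadata request returns nothing. As an illustration only, the same commands can be split into a function that stops at the first failure. `installer_url` and `install_emrcli` are hypothetical helper names. The URL pattern and the metadata address are taken from step 6. The `-fsS` options are standard curl flags that make HTTP errors fail quietly but visibly.

```shell
#!/bin/sh
# installer_url (hypothetical helper): builds the installer URL that step 6
# uses for a given region ID.
installer_url() {
  printf 'https://ecm-repo-%s.oss-%s-internal.aliyuncs.com/emrcli/emrcli.sh\n' "$1" "$1"
}

# install_emrcli (hypothetical wrapper): runs the same commands as step 6,
# but returns non-zero as soon as one of them fails.
install_emrcli() {
  # Read the region ID from the ECS instance metadata service.
  region_id=$(curl -fsS http://100.100.100.200/latest/meta-data/region-id) || return 1
  [ -n "$region_id" ] || { echo "empty region ID" >&2; return 1; }

  # Download the installer, make it executable, and run it.
  curl -fsS "$(installer_url "$region_id")" -o /tmp/emrcli.sh || return 1
  chmod 755 /tmp/emrcli.sh || return 1
  sh /tmp/emrcli.sh install "$region_id"
}

# Usage on the gateway ECS instance:
#   install_emrcli && emrcli gateway deploy --clusterId <ClusterId>
```

Because the installer URL uses an internal OSS endpoint, this sketch is expected to work only from an ECS instance in the same region, which matches the note that the instance does not need public network access.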

Manage the gateway environment

After you create the gateway, if new services are added to the associated compute cluster or its service configurations are changed, you can use the following commands to update client components or synchronize the latest configurations.

Update client components

If a new service, such as Flink, is added to the EMR cluster, you can incrementally install the corresponding client on the gateway node. The deploy command overwrites the configurations of installed applications and incrementally installs new applications.

# Example: Add the FLINK client to an environment that already has HDFS and YARN.
emrcli gateway deploy \
  --clusterId <ClusterId> \
  --appNames HDFS,YARN,FLINK

If the update is successful, the following information is returned.

deployGateway success
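When the list of installed clients is kept in a script or variable, the value for `--appNames` can be assembled instead of typed by hand. The following sketch assumes a hypothetical helper named `merge_app_names`. It joins two comma-separated lists and removes duplicates while keeping the original order. It does not query the gateway or the cluster, so the first list must be supplied by the caller.

```shell
#!/bin/sh
# merge_app_names (hypothetical helper, not an EMR-CLI command):
# joins two comma-separated lists, drops empty entries and duplicates,
# and keeps the first-seen order.
merge_app_names() {
  printf '%s\n' "$1,$2" \
    | tr ',' '\n' \
    | awk 'NF && !seen[$0]++' \
    | paste -sd, -
}

# Usage:
#   apps=$(merge_app_names "HDFS,YARN" "FLINK")
#   emrcli gateway deploy --clusterId <ClusterId> --appNames "$apps"
```

Listing the already-installed applications again follows the example above. As described earlier, doing so overwrites their configurations with those of the cluster.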

Synchronize modified configurations from the EMR cluster

If the configurations of a service in the EMR cluster are changed, for example, if you modify core-site.xml in the EMR console, you must manually synchronize the new configurations to the gateway node.

Important

Synchronizing configurations overwrites the existing configurations on the gateway. Proceed with caution.

# Execute the synchronization command.
emrcli gateway refreshConfigs \
  --clusterId <ClusterId> \
  --appNames <ApplicationName> # Optional. Specify the applications to synchronize.

If the synchronization is successful, the following information is returned:

refreshConfiguration success
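Because synchronization overwrites the configurations on the gateway, it can help to keep a copy of the current files and compare them afterwards. The following is a minimal sketch: `snapshot_dir` is a hypothetical helper, and the use of /etc/taihao-apps as the configuration directory is an assumption based on the backup paths listed in the FAQ of this topic. Verify the actual location on your gateway before you rely on it.

```shell
#!/bin/sh
# snapshot_dir (hypothetical helper): copies a directory into a timestamped
# folder under a backup root and prints the path of the copy.
snapshot_dir() {
  src=$1
  backup_root=$2
  dest="$backup_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup_root" && cp -Rp "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (the directory path is an assumption, see the text above):
#   before=$(snapshot_dir /etc/taihao-apps /root/gateway-backup)
#   emrcli gateway refreshConfigs --clusterId <ClusterId>
#   diff -r "$before" /etc/taihao-apps   # lists the files that changed
```

The snapshot is a copy, not a move, so the gateway keeps working while the comparison is made.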

Manage EMR-CLI

View the EMR-CLI version

You can run the following command to view the EMR-CLI version information.

emrcli version

Information similar to the following is returned.

2.0.0

Upgrade EMR-CLI

To automatically upgrade EMR-CLI to the latest version, repeat the installation step described in the Deploy a gateway for the first time section.

FAQ

Q: How do I switch compute clusters?

A: To switch compute clusters, perform the following steps:

  1. Use the mv command to manually back up the files that were deployed for the old cluster (the cluster used before the switch) to prevent data loss. The files include the /opt/apps directory, the /etc/taihao-apps directory, and the /etc/profile.d/yarn.sh file.

  2. Redeploy the gateway for the new compute cluster by repeating the deployment steps in this topic.
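The backup in step 1 can be scripted so that all three paths are moved together. The following is a sketch under stated assumptions: `backup_gateway_files` is a hypothetical helper, the backup directory is chosen by the caller, and the optional second argument is a root prefix that exists only so that the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# backup_gateway_files (hypothetical helper): moves the paths named in
# step 1 into a backup directory. Paths that do not exist are skipped.
#   $1: backup directory (created if missing)
#   $2: optional root prefix for dry runs, empty for the real file system
backup_gateway_files() {
  backup_dir=$1
  root=${2:-}
  mkdir -p "$backup_dir" || return 1
  for path in /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh; do
    if [ -e "$root$path" ]; then
      # Flatten the path into a single name, e.g. /opt/apps -> opt_apps.
      name=$(printf '%s' "${path#/}" | tr '/' '_')
      mv "$root$path" "$backup_dir/$name" || return 1
    fi
  done
}

# Usage on the gateway machine (the backup location is an example):
#   backup_gateway_files "/root/gateway-backup-$(date +%Y%m%d)"
```

After the files are moved, the old client environment is no longer in place, so run the deployment for the new cluster right away.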