A gateway submits jobs to a compute cluster and provides security isolation. E-MapReduce provides a tool named EMR-CLI to simplify gateway deployment. This tool creates an Alibaba Cloud ECS instance and deploys a gateway environment on the instance. This topic describes how to deploy a gateway environment after you create a DataLake, Dataflow, or OLAP cluster.
Three Gateway deployment modes and selection guide
A Gateway is an EMR job submission isolation layer that provides the following core benefits:
Decoupling client workloads from core cluster services
It separates client operations, such as spark-submit, hive -f, and yarn application, from the master or ResourceManager nodes.
Implementing multi-tenant environment isolation
It lets you configure independent runtime environments for different users or departments.
Improving cluster stability and maintainability
It prevents issues such as high-frequency submissions, script debugging, environment conflicts, or resource contention from affecting key services such as YARN ResourceManager and Hadoop Distributed File System (HDFS) NameNode.
EMR offers three Gateway modes. Each mode is suitable for different cluster types, versions, and architectural requirements.
Type | Supported cluster types and version requirements | Deployment method and key features | Scenarios and recommendations |
Gateway node group | Only the following clusters are supported: | Add a node group directly to an existing cluster. For more information, see Manage node groups. | Recommended: Best for quickly adding a secure, isolated submission entry point to existing DataLake or Dataflow clusters. This option offers the lowest O&M costs and ensures high configuration consistency. |
Gateway environment | Supports DataLake, Dataflow, Custom, and OLAP clusters | Manually deploy on an ECS instance. For more information, see Use the EMR command-line interface (CLI) to customize a Gateway environment deployment. | A standard alternative when a cluster does not support Gateway node groups. |
Gateway cluster | Supports only Hadoop and Kafka clusters | | Suitable for Hadoop and Kafka clusters. |
Prerequisites
A DataLake, Dataflow, OLAP, or Custom compute cluster is created in E-MapReduce, and the cluster is in the Running state. For more information about how to create a cluster, see Create a cluster.
Limits
Cluster type: This method is only for deploying a gateway environment for DataLake, Dataflow, OLAP, or Custom clusters. If your cluster type and version are compatible, we recommend that you use a Gateway node group.
For more information about how to deploy a gateway environment for existing Hadoop and Kafka clusters, see Create a gateway cluster.
Note: After 17:00 on December 19, 2022 (UTC+8), you can create a Hadoop or Kafka cluster only if your Alibaba Cloud account was used to create Hadoop or Kafka clusters at or before that time.
Overwrite installation: EMR-CLI deploys the gateway client in overwrite mode. If you redeploy a gateway on an ECS instance where a gateway already exists, the new client overwrites the existing client in the same directory.
Independent deployment: Do not use an existing ECS instance in the EMR cluster, such as a Master, Core, or Task node, as the gateway machine. This practice prevents the client environment from interfering with the normal operation of cluster services.
Supported services: You can use this method to deploy clients for the following services: HDFS, YARN, HBase, HIVE, SPARK2, SPARK3, JINDOSDK, FLINK, SQOOP, IMPALA, PRESTO, HUDI, ICEBERG, TEZ, and DELTALAKE.
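Before you pass a list to --appNames, it can help to check the names against the supported services listed above. The following is a minimal sketch, not part of EMR-CLI: the function name validate_app_names is hypothetical, and the comparison is case-sensitive because this topic does not state whether EMR-CLI accepts other spellings.

```shell
# Services listed in this topic as supported by the gateway environment.
SUPPORTED_APPS="HDFS YARN HBase HIVE SPARK2 SPARK3 JINDOSDK FLINK SQOOP IMPALA PRESTO HUDI ICEBERG TEZ DELTALAKE"

# validate_app_names "HDFS,YARN"
# Succeeds if every comma-separated name is in SUPPORTED_APPS; otherwise
# prints each unsupported name to stderr and fails.
# The body runs in a subshell so that the IFS change does not leak out.
validate_app_names() (
  rc=0
  IFS=','
  for name in $1; do
    case " $SUPPORTED_APPS " in
      *" $name "*) ;;
      *) echo "unsupported application: $name" >&2; rc=1 ;;
    esac
  done
  exit "$rc"
)
```

For example, validate_app_names "HDFS,YARN,FLINK" succeeds, while a list that contains a name outside the supported set fails and reports that name.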
Deploy a gateway for the first time
Create an ECS instance in the ECS console. For more information, see Create an instance using the wizard.
Note: The created ECS instance does not need public network access.
The recommended parameter settings are as follows.
Parameter | Description
Region and zone | Must be the same as the region and zone of the EMR cluster.
Image | Must match the operating system of the EMR cluster's instances.
System disk | We recommend that you use an enterprise SSD (ESSD) of at least 60 GiB.
Network | Must be the same as the VPC of the EMR cluster.
Security group | Must be the same as the security group of the Master instance group of the EMR cluster. This ensures network connectivity between the ECS instance and the EMR cluster.
Create a dedicated ECS RAM role for the EMR Gateway.
Log on to the RAM console as a RAM administrator.
In the navigation pane on the left, go to the Roles page.
On the Roles page, click Create Role.
In the Create Role panel, set Trusted Entity Type to Alibaba Cloud Service and Trusted Service to ECS. Then, click OK.
Enter a Role Name, such as ECSForEMRGatewayRole, and then click OK.
Grant permissions to the RAM role.
On the Permission Management tab, click Grant Permission.
In the Grant Permission panel, select AliyunEMRFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess from System Policy, and then click OK.

Click Close.
Grant the RAM role to the ECS instance.
Log on to the ECS console.
In the navigation pane on the left, go to the instance list.
In the top navigation bar, select a region.
Find the new ECS instance, open its menu, and choose Instance Settings > Grant/Revoke RAM Role.
In the dialog box that appears, select the ECSForEMRGatewayRole role and click OK.
Connect to the ECS instance. For more information, see Connect to an ECS instance.
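Once connected, you may want to confirm from inside the instance that it is in the expected region, zone, and VPC and that the RAM role is attached. The following is a rough sketch under stated assumptions: only the region-id metadata path appears in this topic; the zone-id, vpc-id, and ram/security-credentials/ paths are assumed to be exposed by the same metadata service, and the helper names meta and check_equal are hypothetical.

```shell
# Base URL of the ECS metadata service (the region-id path is the one used
# by the installation command in this topic).
META_URL="http://100.100.100.200/latest/meta-data"

# meta <path>: print one metadata value, or nothing if the request fails.
meta() {
  curl -sf --connect-timeout 2 "$META_URL/$1" 2>/dev/null
}

# check_equal <label> <actual> <expected>: report whether two values match.
check_equal() {
  if [ "$2" = "$3" ]; then
    echo "OK $1: $2"
  else
    echo "MISMATCH $1: got '$2', expected '$3'"
    return 1
  fi
}

# Usage on the gateway ECS instance. The expected values are placeholders
# that you take from the EMR cluster's details:
#   check_equal region "$(meta region-id)" "<ClusterRegionId>"
#   check_equal zone   "$(meta zone-id)"   "<ClusterZoneId>"
#   check_equal vpc    "$(meta vpc-id)"    "<ClusterVpcId>"
#   check_equal role   "$(meta ram/security-credentials/)" "ECSForEMRGatewayRole"
```

A mismatch here usually points back to the ECS creation parameters or to the role assignment step, both of which are easier to correct before the gateway client is deployed.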
Run the following command to install EMR-CLI.
regionId=`curl http://100.100.100.200/latest/meta-data/region-id`; curl https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh -o /tmp/emrcli.sh; chmod 755 /tmp/emrcli.sh; sh /tmp/emrcli.sh install ${regionId}
If the installation is successful, the following information is returned.
install emrcli success
Run the following command to deploy the EMR Gateway client.
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames <ApplicationName>
Modify the following parameters as needed.
Parameter | Required | Description
clusterId | Yes | The ID of the cluster created in EMR.
appNames | No | The application name. To specify multiple applications, separate them with commas (,), for example, HDFS,YARN. If you do not specify this parameter, clients for all supported applications in the cluster, such as Hive and HDFS, are deployed by default.
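Because appNames is optional, a wrapper script has to add the flag only when a list is supplied. The following sketch shows one way to do that. The function name deploy_gateway and the DRY_RUN variable are hypothetical conveniences; only the emrcli gateway deploy command and its two parameters come from this topic.

```shell
# deploy_gateway <clusterId> [appNames]
# Builds the emrcli invocation. --appNames is added only when a list is given.
# With DRY_RUN=1 the command is printed instead of executed.
deploy_gateway() {
  if [ -z "${1:-}" ]; then
    echo "clusterId is required" >&2
    return 2
  fi
  # Replace the positional parameters with the final command line.
  set -- emrcli gateway deploy --clusterId "$1" ${2:+--appNames "$2"}
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage:
#   DRY_RUN=1 deploy_gateway <ClusterId> HDFS,YARN   # print the command only
#   deploy_gateway <ClusterId>                       # deploy all supported clients
```

Printing the command first is a cheap way to review what will run, which matters here because the deployment overwrites any existing client in the same directory.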
If the deployment is successful, the following information is returned.
deployGateway success
Important: After the gateway is installed, the JAVA_HOME system environment variable is changed to /usr/lib/jvm/java-1.8.0. You can modify the variable in the /etc/profile.d/emr_env.sh file. However, this modification may affect gateway features. Proceed with caution.
Log on to the ECS instance again for the changes to the system environment variable to take effect.
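After you log on again, a quick check can confirm that the new value is in effect. This is a small sketch; java_home_ok is a hypothetical helper, and the expected path is the one stated in this topic.

```shell
# Path that this topic says JAVA_HOME is changed to after installation.
EXPECTED_JAVA_HOME="/usr/lib/jvm/java-1.8.0"

# java_home_ok [value]: succeed if the value (default: the current JAVA_HOME)
# equals the expected path.
java_home_ok() {
  [ "${1:-${JAVA_HOME:-}}" = "$EXPECTED_JAVA_HOME" ]
}

# Usage in a fresh login shell:
#   if java_home_ok; then echo "JAVA_HOME is $JAVA_HOME"; else echo "unexpected JAVA_HOME: ${JAVA_HOME:-unset}"; fi
```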
Optional: Configure domain name resolution for the gateway node.
Important: This step is required if the gateway includes the Spark service.
Add a zone. For more information, see Add a built-in authoritative domain name.
Add a DNS record. For more information, see Add a DNS record.
The parameters are described in the following table.
Parameter | Description
Record Type | Use the default value A.
Host Record | Enter the hostname of the gateway machine. For example, iZ2zea8r0aht2vzbqci****. You can run the hostname command to get the hostname.
Record Value | Enter the private IP address of the gateway machine. You can view it on the node management page.
TTL Value | Use the default value.
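The two values that you have to look up for the DNS record can be collected on the gateway machine itself, and the record can be checked afterwards. The following is a sketch under assumptions: the private-ipv4 metadata path and the availability of getent on the image are assumed, and both function names are hypothetical.

```shell
# dns_record_fields <hostname> <private-ip>
# Prints the values to enter for the A record described in the table above.
dns_record_fields() {
  printf 'Record Type: A\nHost Record: %s\nRecord Value: %s\n' "$1" "$2"
}

# resolves_to <hostname> <ip>
# Succeeds if the name resolves to the given address. Run this from a cluster
# node rather than from the gateway itself, because the gateway may resolve
# its own name locally.
resolves_to() {
  getent hosts "$1" | awk '{print $1}' | grep -qxF "$2"
}

# Usage on the gateway machine:
#   ip=$(curl -sf http://100.100.100.200/latest/meta-data/private-ipv4)
#   dns_record_fields "$(hostname)" "$ip"
# Usage on a cluster node after the record is added:
#   resolves_to <GatewayHostname> <GatewayPrivateIp> && echo "record is visible"
```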
Manage the gateway environment
After you create the gateway, if new services are added to the associated compute cluster or its service configurations are changed, you can use the following commands to update client components or synchronize the latest configurations.
Update client components
If a new service, such as Flink, is added to the EMR cluster, you can incrementally install the corresponding client on the gateway node. The deploy command overwrites the configurations of installed applications and incrementally installs new applications.
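The example in this section passes the already installed applications together with the new one to --appNames. If you keep track of the installed list yourself, a small helper can build that value without duplicates. This is only a sketch: merge_app_names is a hypothetical name, and this topic does not state whether listing only the new application would work as well.

```shell
# merge_app_names <installed> <new>
# Prints the comma-separated union of both lists, keeping first-seen order
# and dropping empty entries and duplicates.
merge_app_names() {
  printf '%s,%s\n' "$1" "$2" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}

# Usage:
#   apps=$(merge_app_names "HDFS,YARN" "FLINK")
#   emrcli gateway deploy --clusterId <ClusterId> --appNames "$apps"
```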
# Example: Add the FLINK client to an environment that already has HDFS and YARN.
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames HDFS,YARN,FLINK
If the update is successful, the following information is returned.
deployGateway success
Synchronize modified configurations from the EMR cluster
If the configurations of a service in the EMR cluster are changed, for example, if you modify core-site.xml in the EMR console, you must manually synchronize the new configurations to the gateway node.
Synchronizing configurations overwrites the existing configurations on the gateway. Proceed with caution.
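Because the overwrite cannot be undone by EMR-CLI, you might archive the current client configurations first so that they can be compared or restored later. The following is a minimal sketch. It assumes that the client configurations live under /etc/taihao-apps, which this topic only names in the FAQ as a directory to back up; snapshot_conf and the backup location are hypothetical.

```shell
# snapshot_conf <conf-dir> <backup-dir>
# Archives <conf-dir> into a timestamped tar.gz under <backup-dir> and prints
# the archive path. The source directory is left untouched.
snapshot_conf() (
  src="$1"
  dest="$2"
  [ -d "$src" ] || { echo "not a directory: $src" >&2; exit 1; }
  mkdir -p "$dest" || exit 1
  archive="$dest/conf-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || exit 1
  echo "$archive"
)

# Usage before running the synchronization command:
#   snapshot_conf /etc/taihao-apps /root/gateway-conf-backups
```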
# Execute the synchronization command.
emrcli gateway refreshConfigs \
--clusterId <ClusterId> \
--appNames <ApplicationName> # Optional. Specify the applications to synchronize.
If the synchronization is successful, the following information is returned:
refreshConfiguration success
Manage EMR-CLI
View the EMR-CLI version
You can run the following command to view the EMR-CLI version information.
emrcli version
Information similar to the following is returned.
2.0.0
Upgrade EMR-CLI
To automatically upgrade EMR-CLI to the latest version, repeat the installation step described in the Deploy a gateway for the first time section.
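If you manage several gateway machines, a version comparison can help you decide where to rerun the installation step. The sketch below assumes that emrcli version prints a plain dotted version such as 2.0.0 and that sort supports the -V option (GNU coreutils). The function name version_lt and the MIN_VERSION threshold are hypothetical, not documented requirements.

```shell
# version_lt <a> <b>: succeed if version a is strictly older than version b.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   MIN_VERSION="2.0.0"   # a threshold that you choose
#   if version_lt "$(emrcli version)" "$MIN_VERSION"; then
#     echo "EMR-CLI is older than $MIN_VERSION; rerun the installation step"
#   fi
```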
FAQ
Q: How do I switch compute clusters?
A: To switch compute clusters, perform the following steps:
Use the mv command to manually back up the files of the old cluster (the cluster used before the switch) to prevent data loss. The files include the /opt/apps directory, the /etc/taihao-apps directory, and the /etc/profile.d/yarn.sh file.
Perform the operations in this topic again to redeploy the gateway for the new compute cluster.
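The backup step can be scripted so that all three paths are moved in one pass. The following is a sketch only: backup_gateway_files is a hypothetical name, the backup directory is your choice, and the optional second argument exists solely so that the function can be tried against a scratch directory tree instead of the real root.

```shell
# backup_gateway_files <backup-dir> [root]
# Moves the paths named in the answer above into <backup-dir>, skipping any
# that do not exist. Use a new, empty <backup-dir> for each switch.
backup_gateway_files() (
  dest="${1:-}"
  root="${2:-}"
  [ -n "$dest" ] || { echo "backup directory is required" >&2; exit 2; }
  mkdir -p "$dest" || exit 1
  for path in /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh; do
    if [ -e "$root$path" ]; then
      # Flatten the path into a single name, for example /opt/apps -> opt_apps.
      name=$(echo "${path#/}" | tr '/' '_')
      mv "$root$path" "$dest/$name" || exit 1
      echo "moved $root$path -> $dest/$name"
    fi
  done
)

# Usage on the gateway machine before switching clusters:
#   backup_gateway_files "/root/gateway-backup-$(date +%Y%m%d)"
```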