A gateway is used to submit jobs to E-MapReduce (EMR) clusters and isolate clusters in a secure manner. EMR provides the EMR-CLI tool that you can use to deploy a gateway on an Alibaba Cloud Elastic Compute Service (ECS) instance. You can perform operations in this topic to deploy a gateway for DataLake, Dataflow, or online analytical processing (OLAP) clusters.
Prerequisites
A DataLake cluster, a Dataflow cluster, or an OLAP cluster is created, and the cluster is in the Running state. For more information about how to create a cluster, see Create a cluster.
Limits
This topic is applicable only to scenarios in which you want to deploy gateways for DataLake clusters, Dataflow clusters, and OLAP clusters.
For more information about how to deploy gateways for Hadoop and Kafka clusters, see Create a gateway cluster.
Note: If this is the first time you create an EMR cluster after 17:00 (UTC+8) on December 19, 2022, you cannot create a Hadoop or Kafka cluster.
We recommend that you do not deploy a gateway on an ECS instance that hosts an EMR cluster. Otherwise, the environment in which the EMR cluster runs is affected by the gateway.
A gateway is deployed in overwrite mode by using EMR-CLI. If you deploy a gateway on an ECS instance on which another gateway already exists, the new gateway is installed in the same directory as the original gateway, and the original gateway is overwritten.
You can perform operations described in this topic to deploy a gateway for the following services: HDFS, YARN, HBase, Hive, Spark 2, Spark 3, JindoSDK, Flink, Sqoop, Impala, Presto, Hudi, Iceberg, Tez, and Delta Lake.
Deploy a gateway
Create an ECS instance in the ECS console. For more information, see Create an instance on the Custom Launch tab.
Note: The created ECS instance does not need to be accessible over the Internet.
The following table describes the parameter settings when you create an ECS instance.
Parameter: Description
Region and Zone: You must create the ECS instance in the same region and zone as your EMR cluster.
Image: The operating system specified by this parameter must match that of the EMR cluster.
System Disk: We recommend that you use an enhanced SSD (ESSD) that has a storage capacity of at least 60 GiB.
Network Type: You must select the virtual private cloud (VPC) in which the EMR cluster resides.
Security Group: You must select the security group to which the master node group of the EMR cluster belongs. This ensures that a network connection between the ECS instance and the EMR cluster is established.
Create a dedicated ECS RAM role.
Log on to the RAM console with an Alibaba Cloud account or a RAM user who has administrative rights.
In the left-side navigation pane, go to the Roles page, and then click Create Role.
In the Select Role Type step of the Create Role panel, select Alibaba Cloud Service for Select Trusted Entity and click Next. In the Configure Role step, configure the RAM Role Name parameter, select Elastic Compute Service from the Select Trusted Service drop-down list, and then click OK. For example, you can set RAM Role Name to ECSForEMRGatewayRole.
Attach policies to the RAM role.
In the Finish step of the Create Role panel, click Add Permissions to RAM Role.
On the page that appears, click Grant Permission.
In the Grant Permission panel, select System Policy and then the AliyunEMRFullAccess, AliyunOSSFullAccess, and AliyunDLFFullAccess policies and click OK.
Click Complete.
Assign the RAM role to the ECS instance.
Log on to the ECS console.
In the left-side navigation pane, go to the page that lists your ECS instances.
In the top navigation bar, select the region where the ECS instance resides.
Find the ECS instance, click the icon in the Actions column, and then click Attach/Detach RAM Role in the Instance Settings section.
In the Attach/Detach RAM Role dialog box, select ECSForEMRGatewayRole from the RAM Role drop-down list and click Confirm.
Connect to the ECS instance. For more information, see Connect to an instance.
Run the following command to install EMR-CLI:
regionId=$(curl http://100.100.100.200/latest/meta-data/region-id)
curl "https://ecm-repo-${regionId}.oss-${regionId}-internal.aliyuncs.com/emrcli/emrcli.sh" -o /tmp/emrcli.sh
chmod 755 /tmp/emrcli.sh
sh /tmp/emrcli.sh install "${regionId}"
If EMR-CLI is successfully installed, the following result is returned:
install emrcli success
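The installation command can also be split into steps so that a failure in any step stops the process. The following is a minimal sketch, not the official installer. The function names emrcli_script_url and install_emrcli are hypothetical. The metadata address, the repository URL pattern, and the install subcommand are taken from the command above.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical; the URLs come from the
# installation command in this topic.

# Build the download URL of the EMR-CLI installer for a region ID.
emrcli_script_url() {
  region_id="$1"
  printf 'https://ecm-repo-%s.oss-%s-internal.aliyuncs.com/emrcli/emrcli.sh\n' \
    "$region_id" "$region_id"
}

# Download and run the installer, stopping at the first failed step.
install_emrcli() {
  region_id=$(curl -s http://100.100.100.200/latest/meta-data/region-id) || return 1
  [ -n "$region_id" ] || { echo "could not read the region ID" >&2; return 1; }
  curl -f "$(emrcli_script_url "$region_id")" -o /tmp/emrcli.sh || return 1
  chmod 755 /tmp/emrcli.sh || return 1
  sh /tmp/emrcli.sh install "$region_id"
}

# On the ECS instance, call install_emrcli to run the installation.
```

Defining the steps as functions keeps the script safe to source. Nothing is downloaded until install_emrcli is called.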
Run the following command to deploy the gateway:
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames <ApplicationName>
Configure the parameters based on your business requirements. The following table describes the parameters.
Parameter (required or optional): Description
clusterId (required): The ID of the EMR cluster.
appNames (optional): The name of a service. Separate multiple services with commas (,), such as HDFS,YARN. If no service is specified, the gateway is deployed for all supported services of the EMR cluster, such as Hive and HDFS.
If the gateway is successfully deployed, the following result is returned:
deployGateway success
Important: After the gateway is deployed, the value of the JAVA_HOME system environment variable is changed to /usr/lib/jvm/java-1.8.0. You can change the value of the variable in the /etc/profile.d/emr_env.sh file. However, the change may affect the features of the gateway. Proceed with caution.
Log on to the ECS instance again to make the system environment variable take effect.
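After you log on to the ECS instance, you can confirm that the variable has the value stated above. The following is a minimal sketch. The check_java_home function is a hypothetical helper and is not part of EMR-CLI.

```shell
#!/bin/sh
# Expected value after deployment, as stated in this topic.
EXPECTED_JAVA_HOME=/usr/lib/jvm/java-1.8.0

# Hypothetical helper: report whether JAVA_HOME matches the expected value.
check_java_home() {
  if [ "${JAVA_HOME:-}" = "$EXPECTED_JAVA_HOME" ]; then
    echo "JAVA_HOME ok"
  else
    echo "JAVA_HOME is '${JAVA_HOME:-}', expected '$EXPECTED_JAVA_HOME'"
    return 1
  fi
}

# Usage in a new login session: check_java_home
```

If the check fails in a session that was opened before the deployment, open a new session and run it again.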
(Optional) Configure domain name resolution for the gateway.
Important: This step is required when the gateway contains the Spark service.
Add a zone. For more information, see Add a built-in authoritative zone.
Add a DNS record. For more information, see Add DNS records.
The following table describes the parameters.
Parameter: Description
Record Type: The type of the DNS record. Use the default value A.
Hostname: The hostname of the gateway, such as iZ2zea8r0aht2vzbqci****. You can run the hostname command to obtain the hostname of the gateway.
Record Value: The internal IP address of the gateway. You can view the internal IP address of the gateway on the Nodes tab of the new EMR console.
TTL Period: The time-to-live period. Use the default value.
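To collect the values for the DNS record in one place before you enter them in the console, you can use a small helper such as the following. This is only an illustrative sketch. The print_dns_record function is hypothetical, and the IP address must still be the internal IP address that you look up for the gateway.

```shell
#!/bin/sh
# Hypothetical helper: print the values to enter for the DNS record.
# $1 is the gateway hostname, $2 is the internal IP address of the gateway.
print_dns_record() {
  record_host="$1"
  record_ip="$2"
  printf 'Record Type: A\nHostname: %s\nRecord Value: %s\n' \
    "$record_host" "$record_ip"
}

# Usage on the gateway, with the IP address filled in by you:
#   print_dns_record "$(hostname)" "<internal IP address>"
```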
Manage the gateway
If new services are added to the EMR cluster that is associated with the gateway or service configurations of the EMR cluster are modified, you can add the services or synchronize the most recent configurations of services to the gateway.
Add a service to the gateway
If a new service is added to the EMR cluster, you can use EMR-CLI to add the service to the gateway.
The command that is used to add a service to the gateway is similar to the command that is used to deploy the gateway. You need to specify the name of the service that you want to add to the gateway in the appNames parameter. Existing services are not affected.
emrcli gateway deploy \
--clusterId <ClusterId> \
--appNames <ApplicationName>
If the service is successfully added to the gateway, the following result is returned:
deployGateway success
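When you add several services at a time, the appNames value must be a comma-separated list. The following sketch builds that list from separate arguments and prints the resulting command instead of running it. The join_app_names and print_deploy_command functions are hypothetical helpers, and c-example is a placeholder cluster ID.

```shell
#!/bin/sh
# Hypothetical helper: join service names with commas for --appNames.
# The function body runs in a subshell so that the IFS change stays local.
join_app_names() (
  IFS=,
  printf '%s\n' "$*"
)

# Hypothetical helper: print the deploy command for review (dry run).
# $1 is the cluster ID; the remaining arguments are service names.
print_deploy_command() {
  cluster_id="$1"
  shift
  printf 'emrcli gateway deploy --clusterId %s --appNames %s\n' \
    "$cluster_id" "$(join_app_names "$@")"
}

# Example with a placeholder cluster ID:
print_deploy_command c-example HDFS YARN
```

Printing the command first lets you check the service list before you run the command on the gateway.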
Synchronize the modified configurations of a service for the EMR cluster to the gateway
If you modify the configurations of a service for the EMR cluster, such as the configurations of the core-site.xml file, you can use EMR-CLI to synchronize the modified configurations to the gateway.
When the modified configurations are synchronized, the original configurations of the service for the gateway are overwritten. Proceed with caution.
emrcli gateway refreshConfigs \
--clusterId <ClusterId> \
--appNames <ApplicationName>
If the modified configurations are successfully synchronized to the gateway, the following result is returned:
refreshConfiguration success
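If you run the synchronization from a script, you can check the output for the success message shown above. The following is a minimal sketch. The run_and_verify function is a hypothetical helper. It checks only the printed text and makes no assumption about the exit code of EMR-CLI.

```shell
#!/bin/sh
# Hypothetical helper: run a command and check its output for a success message.
# $1 is the expected message; the remaining arguments are the command to run.
run_and_verify() {
  expected="$1"
  shift
  output=$("$@" 2>&1) || true
  printf '%s\n' "$output"
  case "$output" in
    *"$expected"*) return 0 ;;
    *) echo "expected '$expected' in the output" >&2; return 1 ;;
  esac
}

# Usage on the gateway, with your own cluster ID and service name:
#   run_and_verify "refreshConfiguration success" \
#     emrcli gateway refreshConfigs --clusterId <ClusterId> --appNames <ApplicationName>
```

The same helper can be used with the deploy command by passing deployGateway success as the expected message.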
Manage EMR-CLI
View the version of EMR-CLI
You can run the following command to view the version of EMR-CLI:
emrcli version
If the command is successfully run, a result similar to the following is returned:
2.0.0
Update EMR-CLI
To update EMR-CLI to the latest version, perform Step 6 in Deploy a gateway again. This step runs the EMR-CLI installation command.
FAQ
Q: How do I switch to another cluster and deploy a gateway for the cluster?
A: Perform the following steps to switch to another cluster and deploy a gateway for the cluster:
Run the mv command to back up the files of the original cluster to prevent data loss. The files include the files in the /opt/apps and /etc/taihao-apps directories, and the /etc/profile.d/yarn.sh file.
Perform the operations in this topic again to deploy a gateway for the new cluster.
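The backup step can be sketched as follows. The backup_path and backup_gateway_files functions and the example backup directory are hypothetical. The three paths are the ones listed in the answer above.

```shell
#!/bin/sh
# Hypothetical helper: move a file or directory into a backup directory.
# Paths that do not exist are skipped.
backup_path() {
  src="$1"
  backup_dir="$2"
  [ -e "$src" ] || { echo "skip $src (not found)"; return 0; }
  mkdir -p "$backup_dir" || return 1
  mv "$src" "$backup_dir"/ && echo "moved $src to $backup_dir"
}

# Hypothetical helper: back up the paths named in this FAQ.
backup_gateway_files() {
  target_dir="$1"
  for item in /opt/apps /etc/taihao-apps /etc/profile.d/yarn.sh; do
    backup_path "$item" "$target_dir" || return 1
  done
}

# Usage as a user with write access to these paths, with an example directory:
#   backup_gateway_files /root/gateway-backup
```

Because mv moves the files rather than copying them, the original locations are empty before the gateway for the new cluster is deployed.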