This topic describes how to connect to Hive from an E-MapReduce (EMR) cluster.
Prerequisites
An EMR cluster is created. For more information, see Create a cluster.
Precautions
To connect to Hive from an EMR cluster, you must configure the following parameters:
<Master node name>: You can go to the cluster details page in the EMR console and obtain the name of the master node on the Nodes tab. For more information, see Log on to a cluster.
cluster-xxx@EMR.xxx.COM: You must replace xxx with the hostname of the master node. You can run the hostname command on the master node to obtain the hostname.
By default, HiveServer2 does not verify the username and password. If you want the username and password to be authenticated, you can enable LDAP authentication. For more information, see Use LDAP authentication.
Method 1: Use the Hive client to connect to Hive
Common cluster
hive
High-security cluster
Run the following command to perform authentication:
kinit -kt /etc/emr/hive-conf/keytab/hive.keytab hive/<Master node name>.cluster-xxx@EMR.xxx.COM
You can also use the user management feature to add a user and authenticate as that user before you connect to Hive. For more information about how to add a user, see Manage users. In this case, run the following command and enter the password of the user when prompted:
kinit Username
Connect to Hive.
hive
Method 2: Use Beeline to connect to HiveServer2
Common cluster
beeline -u jdbc:hive2://<Master node name>:10000
For example, if the master node uses the default name, run one of the following commands based on your cluster type:
DataLake cluster
beeline -u jdbc:hive2://master-1-1:10000
Hadoop cluster
beeline -u jdbc:hive2://emr-header-1:10000
High-availability cluster
Run one of the following commands based on your cluster type:
DataLake cluster
Set serviceDiscoveryMode to zooKeeper.
beeline -u 'jdbc:hive2://master-1-1:2181,master-1-2:2181,master-1-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Set serviceDiscoveryMode to multiServers.
beeline -u 'jdbc:hive2://master-1-1:10000,master-1-2:10000,master-1-3:10000/default;serviceDiscoveryMode=multiServers'
Hadoop cluster
Set serviceDiscoveryMode to zooKeeper.
beeline -u 'jdbc:hive2://emr-header-1:2181,emr-header-2:2181,emr-header-3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
Set serviceDiscoveryMode to multiServers.
beeline -u 'jdbc:hive2://emr-header-1:10000,emr-header-2:10000,emr-header-3:10000/default;serviceDiscoveryMode=multiServers'
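The high-availability URLs above follow a regular pattern, so an application that connects over JDBC can assemble them from a list of master nodes. The following is a minimal sketch, not an official helper: the class name HiveHaJdbcUrls and its method names are hypothetical, and the ports, namespace, and parameters are copied from the Beeline commands above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that rebuilds the high-availability JDBC URLs shown above.
final class HiveHaJdbcUrls {

    // Joins the hosts as comma-separated host:port pairs.
    static String hostList(List<String> hosts, int port) {
        return hosts.stream()
                .map(host -> host + ":" + port)
                .collect(Collectors.joining(","));
    }

    // serviceDiscoveryMode=zooKeeper: the hosts are addressed on port 2181.
    static String zooKeeperUrl(List<String> hosts, String namespace) {
        return "jdbc:hive2://" + hostList(hosts, 2181)
                + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=" + namespace;
    }

    // serviceDiscoveryMode=multiServers: the hosts are addressed on port 10000.
    static String multiServersUrl(List<String> hosts, String database) {
        return "jdbc:hive2://" + hostList(hosts, 10000)
                + "/" + database + ";serviceDiscoveryMode=multiServers";
    }

    public static void main(String[] args) {
        // Use emr-header-1, emr-header-2, and emr-header-3 for a Hadoop cluster.
        List<String> masters = Arrays.asList("master-1-1", "master-1-2", "master-1-3");
        System.out.println(zooKeeperUrl(masters, "hiveserver2"));
        System.out.println(multiServersUrl(masters, "default"));
    }
}
```

The resulting strings can be passed to DriverManager.getConnection in the same way as the single-node URL in Method 3.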
High-security cluster
Run the following command to perform authentication:
kinit -kt /etc/ecm/hive-conf/hive.keytab hive/<Master node name>.cluster-xxx@EMR.xxx.COM
You can also use the user management feature to add a user and authenticate as that user before you connect to Beeline. For more information about how to add a user, see Manage users. In this case, run the following command and enter the password of the user when prompted:
kinit Username
Connect to HiveServer2.
beeline -u "jdbc:hive2://<Master node name>:10000/;principal=hive/<Master node name>.cluster-xxx@EMR.xxx.COM"
Note: The JDBC URL must be enclosed in a pair of double quotation marks (").
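The URL for a high-security cluster combines the master node name with the Kerberos principal of HiveServer2. The sketch below shows how the two parts fit together when the URL is built in Java, where it is passed as an ordinary string and needs no shell quoting. The class name HiveKerberosJdbcUrl and its method names are hypothetical, and cluster-xxx and EMR.xxx.COM are the same placeholders as in the commands above.

```java
// Hypothetical helper that assembles the principal and the JDBC URL shown above.
final class HiveKerberosJdbcUrl {

    // Builds hive/<Master node name>.<cluster suffix>@<realm>.
    static String principal(String masterNode, String clusterSuffix, String realm) {
        return "hive/" + masterNode + "." + clusterSuffix + "@" + realm;
    }

    // Builds jdbc:hive2://<Master node name>:<port>/;principal=<principal>.
    static String url(String masterNode, int port, String principal) {
        return "jdbc:hive2://" + masterNode + ":" + port + "/;principal=" + principal;
    }

    public static void main(String[] args) {
        // Placeholders: replace them with the values of your cluster.
        String hivePrincipal = principal("master-1-1", "cluster-xxx", "EMR.xxx.COM");
        System.out.println(url("master-1-1", 10000, hivePrincipal));
    }
}
```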
Method 3: Use Java to connect to HiveServer2
Before you perform the following steps, make sure that you have set up a Java environment, installed a Java programming tool, and configured environment variables.
Configure the project dependencies hadoop-common and hive-jdbc in the pom.xml file. Example:
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.3.9</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.5</version>
    </dependency>
</dependencies>
Write code to connect to HiveServer2 and perform operations on data of a Hive table. Sample code:
import java.sql.*;

public class App {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver. This fails if hive-jdbc is not on the classpath.
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        // 1. After the code is packaged to a JAR file, you must map master-1-1 to
        //    the public or internal IP address of the EMR cluster in the hosts file
        //    on the host for running the JAR file.
        // 2. For more JDBC connection strings, see
        //    "Method 2: Use Beeline to connect to HiveServer2".
        String url = "jdbc:hive2://master-1-1:10000";
        String sql = "select * from sample_tbl limit 10";

        // try-with-resources closes the result set, statement, and connection.
        try (Connection con = DriverManager.getConnection(url, "root", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery(sql)) {
            // Print the first two columns of each returned row.
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }
        }
    }
}
Package the project to generate a JAR file and upload the JAR file to the host for running the JAR file.
Important: The hadoop-common and hive-jdbc dependencies are required to run the JAR file. If the two dependencies are not configured in the environment variables on the host, you must download the dependencies and configure the environment variables on the host. Alternatively, you can package the two dependencies and the project to the same JAR file. If one of the dependencies is missing when you run the JAR file, an error message appears.
If hadoop-common is missing, the error message java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration appears.
If hive-jdbc is missing, the error message java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver appears.
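Both errors above mean that a class cannot be found on the classpath, so it can help to check for the two classes before the application opens a connection. The following is a minimal sketch of such a check. The class name DependencyCheck is hypothetical, and the two class names that it looks up are taken from the error messages above.

```java
// Hypothetical pre-flight check for the two dependencies named above.
final class DependencyCheck {

    // Returns true if the class can be located, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class provided by hive-jdbc.
        System.out.println("hive-jdbc: "
                + isOnClasspath("org.apache.hive.jdbc.HiveDriver"));
        // Class provided by hadoop-common.
        System.out.println("hadoop-common: "
                + isOnClasspath("org.apache.hadoop.conf.Configuration"));
    }
}
```

If either line prints false, add the corresponding dependency to the classpath or package it into the JAR file before you run the sample.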
In this example, the JAR file emr-hiveserver2-1.0.jar is generated and uploaded to the master-1-1 node of the EMR cluster.
Check whether the JAR file can run properly.
Important: We recommend that you run the JAR file on a host that is in the same virtual private cloud (VPC) and security group as the EMR cluster. Make sure that the host and the EMR cluster can communicate with each other. If the host and the EMR cluster are in different VPCs or of different network types, they can communicate only over the Internet, unless you connect them by using an Alibaba Cloud network service so that they can communicate over an internal network. Use the following methods to test the connectivity:
Internet:
telnet <Public IP address of the master-1-1 node> 10000
Internal network:
telnet <Internal IP address of the master-1-1 node> 10000
java -jar emr-hiveserver2-1.0.jar
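If telnet is not installed on the host, the same connectivity test can be approximated in Java by opening a TCP connection to port 10000 of the master node. The following is a minimal sketch. The class name PortCheck and the 3-second timeout are assumptions, and a successful connection shows only that the port is reachable, not that HiveServer2 accepts queries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical TCP reachability check, comparable to "telnet <IP address> 10000".
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java PortCheck <host> <port>");
            return;
        }
        int port = Integer.parseInt(args[1]);
        System.out.println(canConnect(args[0], port, 3000) ? "reachable" : "unreachable");
    }
}
```

For example, after you compile the class, run java PortCheck followed by the public or internal IP address of the master-1-1 node and the port number 10000.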