After you register an E-MapReduce (EMR) cluster to DataWorks, you can configure the Kyuubi connection information for the EMR cluster based on your business requirements. For example, you can log on to Kyuubi with a custom username and password to run related tasks. This topic describes how to configure the Kyuubi connection information for an EMR cluster in DataWorks.
Background information
Apache Kyuubi is a distributed, multi-tenant gateway that provides query services, such as SQL queries, for data lake query engines such as Spark, Flink, and Trino. For more information, see Overview.
Prerequisites
An EMR cluster is registered to DataWorks. For more information, see Register an EMR cluster to DataWorks.
Configure the Kyuubi connection information
Go to the Kyuubi configuration page.
Go to the Management Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the SettingCenter page, click Cluster Management. The Cluster Management page appears.
Find the desired EMR cluster and click the . The Kyuubi configuration page appears.
Configure the Kyuubi connection information.
Follow the on-screen instructions to set the Connection Mode parameter based on your business requirements. Valid values:
Connection Information of Alibaba Cloud EMR Cluster: If you select this connection mode, Kyuubi is logged on to by using the default access identity that you specified when you registered the EMR cluster. This is the default connection mode.
Custom Configuration Information: If you select this connection mode, a custom username and password are used to log on to Kyuubi. The value of the JDBC URL parameter is in the following format:
jdbc:hive2://host:port/;user=<Username for logon>;password=<Password for logon>
Note: The first time you select Custom Configuration Information, the JDBC URL parameter is automatically populated based on the account information that you configured when you registered the EMR cluster. You can modify the JDBC URL based on your business requirements.
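The following Python sketch shows how the parts of a custom JDBC URL fit together in the format described above. It only assembles the string; DataWorks populates and uses the URL itself. The function name build_custom_jdbc_url and the host, port, and credential values are hypothetical and for illustration only. The sketch also assumes that the username and password must not contain semicolons, because a semicolon is read as a parameter separator in a jdbc:hive2 URL.

```python
def build_custom_jdbc_url(host: str, port: int, username: str, password: str) -> str:
    """Assemble a URL in the jdbc:hive2://host:port/;user=...;password=... format."""
    # Assumption: a ';' inside a value would be parsed as a parameter separator,
    # so this sketch rejects such values instead of trying to escape them.
    for name, value in (("username", username), ("password", password)):
        if ";" in value:
            raise ValueError(f"{name} must not contain ';'")
    # user and password are passed as semicolon-separated parameters after the path.
    return f"jdbc:hive2://{host}:{port}/;user={username};password={password}"


if __name__ == "__main__":
    # Hypothetical host, port, and credentials for illustration only.
    print(build_custom_jdbc_url("kyuubi-host.example", 10009, "etl_user", "example_password"))
```

Rejecting a semicolon rather than escaping it keeps the sketch simple. Whether the driver supports an escape mechanism is not covered by this topic.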
If you selected Pass Proxy User Information when you registered the EMR cluster, the configuration information of hive.server2.proxy.user is concatenated to the JDBC URL when an EMR task is run in DataWorks. The following concatenation rules apply:
If the JDBC URL in the custom configuration information does not contain the DATAWORKS_PROXY_USER placeholder, the platform appends the configuration information of hive.server2.proxy.user to the end of the JDBC URL by default when the EMR task is run.
If the JDBC URL in the custom configuration information contains the DATAWORKS_PROXY_USER placeholder, the platform dynamically replaces the placeholder with the configuration information of hive.server2.proxy.user when the EMR task is run.
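The two concatenation rules for the proxy user can be sketched in Python as follows. This is an illustration under stated assumptions, not the DataWorks implementation. The function name apply_proxy_user is hypothetical. The sketch assumes that the configuration information of hive.server2.proxy.user takes the form hive.server2.proxy.user=<proxy user>, that it is appended after a semicolon, and that the DATAWORKS_PROXY_USER placeholder is replaced by that same key-value pair.

```python
PLACEHOLDER = "DATAWORKS_PROXY_USER"
PROXY_KEY = "hive.server2.proxy.user"


def apply_proxy_user(jdbc_url: str, proxy_user: str) -> str:
    """Apply the proxy-user concatenation rules to a custom JDBC URL (sketch)."""
    # Assumed form of the "configuration information of hive.server2.proxy.user".
    proxy_setting = f"{PROXY_KEY}={proxy_user}"
    if PLACEHOLDER in jdbc_url:
        # Rule: the placeholder is present, so replace it in place.
        return jdbc_url.replace(PLACEHOLDER, proxy_setting)
    # Rule: no placeholder, so append the setting to the end of the URL,
    # adding a ';' separator only if the URL does not already end with one.
    separator = "" if jdbc_url.endswith(";") else ";"
    return f"{jdbc_url}{separator}{proxy_setting}"


if __name__ == "__main__":
    # Hypothetical URLs and proxy user for illustration only.
    print(apply_proxy_user("jdbc:hive2://kyuubi-host.example:10009/;user=u;password=p", "alice"))
    print(apply_proxy_user("jdbc:hive2://kyuubi-host.example:10009/;DATAWORKS_PROXY_USER;user=u;password=p", "alice"))
```

The placeholder form is useful when the proxy-user setting must appear at a specific position in the URL rather than at the end.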
What to do next
For information about how to configure relevant component environments and perform data development operations in DataWorks, see General development process.