E-MapReduce: Manage Kyuubi engines

Last Updated: Aug 14, 2023

This topic describes Kyuubi engines and their share levels, and provides examples on how to start Kyuubi engines and submit jobs to Kyuubi engines.

Kyuubi engines

When you install Kyuubi in an E-MapReduce (EMR) cluster, make sure that the YARN and Spark 3.x services are installed in advance. Kyuubi in an EMR cluster supports Spark 3.x on YARN, but does not support the Flink, Trino, or Spark 2.x engines. The following examples use Spark 3.x engines to show how to manage Kyuubi engines. Each Spark 3.x engine corresponds to one Spark application on YARN. For more information, see Examples.
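
Because each Kyuubi engine runs as one YARN application, you can verify that an engine has started by listing the applications on the cluster. The following command is a minimal sketch; the exact application name varies by Kyuubi version.

# List the running Spark applications on YARN.
# Each active Kyuubi engine appears as one application.
yarn application -list -appTypes SPARK -appStates RUNNING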

Share levels

To configure the share level of Kyuubi engines, go to the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console and configure the kyuubi.engine.share.level parameter. The following list describes the available share levels; a configuration example follows the list.

Share level: CONNECTION
Description: One engine per session
Scenarios: Large-scale extract, transform, and load (ETL) jobs; ad-hoc queries
Isolation degree: High
Sharability: Low

Share level: USER
Description: One engine per user
Isolation degree: Medium
Sharability: Medium

Share level: GROUP
Description: One engine per resource group
Isolation degree: Low
Sharability: High

Share level: SERVER
Description: One engine per cluster
Scenarios: Administrators
Isolation degree: Highest for a high-security cluster, and lowest for a standard cluster
Sharability: Available only to administrators of high-security clusters
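
For example, to apply the USER share level across the cluster, set the parameter in kyuubi-defaults.conf as shown in the following minimal sketch. USER is also the default value of this parameter.

# In kyuubi-defaults.conf: start one engine per authenticated user.
kyuubi.engine.share.level=USER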

Examples

The following examples show how to manage Kyuubi engines at the USER share level. In the examples, the kyuubi.engine.share.level parameter is set to USER, and all users have passed Lightweight Directory Access Protocol (LDAP) or Kerberos authentication.
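
If your cluster uses Kerberos, each user typically obtains a ticket before connecting. The following command is a sketch only: EXAMPLE.COM is a placeholder realm, and a Kerberos-enabled JDBC URL may also require a principal parameter, which is omitted here.

# Obtain a Kerberos ticket before running kyuubi-beeline.
# EXAMPLE.COM is a placeholder realm; replace it with your own.
kinit user1@EXAMPLE.COM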

  1. Start a Kyuubi engine as required.

    If a new user named user1 needs to use a Spark 3.x engine, run the following command. After the job is submitted by using kyuubi-beeline, the Kyuubi server starts a new Spark 3.x engine to process it.

    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
      -f query1.sql

    If another new user named user2 needs to configure the resources used by Spark 3.x engines, use one of the following methods; a note on configuration precedence follows them:

    • Method 1: Configure resources such as Spark application executors in the Java Database Connectivity (JDBC) URL. Sample code:

      # Set user-level Spark configurations in the JDBC connection URL
      
      kyuubi-beeline -n user2 \
        -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
        -f query1.sql
    • Method 2: Configure the resources used by Spark 3.x engines in the kyuubi-defaults.conf configuration file. Sample code:

      # Set user default configurations in kyuubi-defaults.conf
      # ___user2___.spark.dynamicAllocation.enabled=false
      # ___user2___.spark.executor.memory=5g
      # ___user2___.spark.executor.cores=2
      # ___user2___.spark.executor.instances=10
      
      kyuubi-beeline -n user2 \
        -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
        -f query1.sql
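
    If the same Spark configuration appears both in kyuubi-defaults.conf and in the JDBC connection URL, the value in the URL takes precedence for that connection. Also note that engine resource settings take effect when an engine is launched; at the USER share level, they generally do not resize an engine that is already running.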
  2. Submit jobs to the specified Spark 3.x engine.

    After jobs are complete, the Spark 3.x engine started by the Kyuubi server keeps running for a period of time. If you submit other jobs to the engine during this period, the engine is reused and no new YARN application is launched, which improves the performance of jobs and SQL queries. If no job is submitted during this period, the engine automatically shuts down. The idle period is specified by the kyuubi.session.engine.idle.timeout parameter, which defaults to PT30M (30 minutes, in ISO-8601 duration format). To change the idle period, modify this parameter on the kyuubi-defaults.conf tab of the Kyuubi service page.
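
    For example, the following entry in kyuubi-defaults.conf shortens the idle period to 10 minutes. The PT10M value is an illustrative choice, not a recommendation.

    # Shut down idle engines after 10 minutes.
    # The value is an ISO-8601 duration; the default is PT30M.
    kyuubi.session.engine.idle.timeout=PT10M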

    Kyuubi allows you to create subdomains at the same share level. For example, if a new user named user4 needs to use different engine resources in different business scenarios, you can configure the kyuubi.engine.share.level.subdomain parameter in JDBC URLs and then submit jobs to different engines.

    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
      -f query1.sql
    
    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
      -f query2.sql
    
    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
      -f query3.sql
  3. Use a Spark 3.x engine in multiple Spark sessions.

    Kyuubi allows a Spark 3.x engine to be used in multiple Spark sessions. For example, if the user named user1 submits two jobs from two terminals at the same time, both jobs can be processed by the same Spark 3.x engine. Executors are allocated to the jobs based on the default scheduling rules of Spark. You can verify the sharing behavior by using the command that follows the example.

    # Console 1
    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/biz1" \
      -f query1.sql
    
    # Console 2
    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/biz2" \
      -f query2.sql
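
    To confirm that both consoles share one engine, list the RUNNING applications on YARN: a single Spark application should be serving user1. The name filter below is an assumption; engine application names vary by Kyuubi version.

    # Expect one RUNNING application to serve both consoles.
    # Filtering by "kyuubi" is an assumption; adjust it to your engine's actual name.
    yarn application -list -appStates RUNNING | grep -i kyuubi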
