This topic describes Kyuubi engines and their share levels, and provides examples on how to start Kyuubi engines and submit jobs to Kyuubi engines.
Kyuubi engines
When you install Kyuubi in an E-MapReduce (EMR) cluster, make sure that the YARN and Spark 3.x services are installed in advance. Kyuubi in an EMR cluster supports Spark 3.x on YARN, but does not support the Flink, Trino, or Spark 2.x engines. The following examples use Spark 3.x engines to describe how to manage Kyuubi engines. Each Spark 3.x engine corresponds to one Spark application on YARN. For more information, see Examples.
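A quick way to observe this correspondence, assuming the YARN CLI is available on a cluster node, is to list the running YARN applications. Each active Kyuubi Spark 3.x engine appears as one Spark application:

# List running YARN applications; each active Kyuubi engine corresponds to one Spark application
yarn application -list -appStates RUNNING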
Share levels
To configure the share levels of Kyuubi engines, go to the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console and configure the kyuubi.engine.share.level parameter. The following table describes the details of different share levels.
Share level | Description | Scenario | Isolation degree | Sharability
----------- | ----------- | -------- | ---------------- | -----------
CONNECTION | One engine per session | | High | Low
USER | One engine per user | | Medium | Medium
GROUP | One engine per resource group | | Low | High
SERVER | One engine per cluster | Administrators | Highest for a high-security cluster, lowest for a standard cluster | Available only to administrators in a high-security cluster
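For reference, a minimal kyuubi-defaults.conf entry that sets the share level used in the examples in this topic might look like the following:

# kyuubi-defaults.conf: start one engine per user
kyuubi.engine.share.level=USER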
Examples
The following examples use the share level USER to describe how to manage Kyuubi engines. In the examples, the kyuubi.engine.share.level parameter is set to USER, and all users have passed Lightweight Directory Access Protocol (LDAP) authentication or Kerberos authentication.
Start a Kyuubi engine as required.
If a new user named user1 needs to use Spark 3.x engines, run the following command. After the job is submitted by using kyuubi-beeline, the Kyuubi server starts a new Spark 3.x engine to process the job.
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
  -f query1.sql
If another new user named user2 needs to configure the resources used by Spark 3.x engines, use one of the following methods:
Method 1: Configure resources such as Spark application executors in the Java Database Connectivity (JDBC) URL. Sample code:
# Set user config via the JDBC connection URL
kyuubi-beeline -n user2 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
  -f query1.sql
Method 2: Configure the resources used by Spark 3.x engines in the kyuubi-defaults.conf configuration file. Sample code:
# Set user default config in kyuubi-defaults.conf:
# ___user2___.spark.dynamicAllocation.enabled=false
# ___user2___.spark.executor.memory=5g
# ___user2___.spark.executor.cores=2
# ___user2___.spark.executor.instances=10
kyuubi-beeline -n user2 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
  -f query1.sql
Submit jobs to the specified Spark 3.x engine.
After a job is complete, the Spark 3.x engine started by the Kyuubi server keeps running for a period of time. If you submit other jobs to the engine during this period, the jobs directly reuse the engine without launching new YARN applications, which improves the performance of jobs and SQL queries. If no job is submitted within this period, the engine automatically exits. The idle period is specified by the kyuubi.session.engine.idle.timeout parameter, whose default value is PT30M (30 minutes). To modify the period, change the value of this parameter on the kyuubi-defaults.conf tab of the Kyuubi service page.
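For example, to keep an idle engine alive for one hour instead of the default 30 minutes, you could add the following entry. PT1H is an illustrative ISO-8601 duration, not a recommended value:

# kyuubi-defaults.conf: extend the engine idle timeout from the default PT30M to one hour
kyuubi.session.engine.idle.timeout=PT1H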
Kyuubi allows you to create subdomains at the same share level. For example, if a new user named user4 needs to use different engine resources in different business scenarios, you can configure the kyuubi.engine.share.level.subdomain parameter in JDBC URLs and then submit jobs to different engines.
kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
  -f query1.sql

kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
  -f query2.sql

kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
  -f query3.sql
Use a Spark 3.x engine in multiple Spark sessions.
Kyuubi allows a Spark 3.x engine to be used in multiple Spark sessions. For example, if the user named user1 submits two jobs from two terminals at the same time, the two jobs can use the same Spark 3.x engine for computing. Executors are allocated to the jobs based on the default scheduling rules of Spark.
# Console 1
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/biz1" \
  -f query1.sql

# Console 2
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/biz2" \
  -f query2.sql