E-MapReduce: Manage Kyuubi engines

Last Updated: Aug 14, 2023

This topic describes Kyuubi engines and their share levels, and provides examples on how to start Kyuubi engines and submit jobs to Kyuubi engines.

Kyuubi engines

When you install Kyuubi in an E-MapReduce (EMR) cluster, make sure that the YARN and Spark 3.x services are installed in advance. Kyuubi in an EMR cluster supports Spark 3.x on YARN, but does not support the Flink, Trino, or Spark 2.x engines. The following examples use Spark 3.x engines to show how to manage Kyuubi engines. Each Spark 3.x engine corresponds to one Spark application on YARN. For more information, see Examples.
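
Because each Kyuubi engine runs as one YARN application, you can verify that an engine has started by listing the applications on the cluster. The following command is a minimal sketch; the exact application name varies by Kyuubi version.

# List the running Spark applications on YARN.
# Each active Kyuubi engine appears as one application.
yarn application -list -appTypes SPARK -appStates RUNNING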

Share levels

To configure the share level of Kyuubi engines, go to the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console and configure the kyuubi.engine.share.level parameter. The following list describes the available share levels; a configuration example follows the list.

Share level: CONNECTION
Description: One engine per session
Scenarios: Large-scale extract, transform, and load (ETL) jobs; ad-hoc queries
Isolation degree: High
Sharability: Low

Share level: USER
Description: One engine per user
Isolation degree: Medium
Sharability: Medium

Share level: GROUP
Description: One engine per resource group
Isolation degree: Low
Sharability: High

Share level: SERVER
Description: One engine per cluster
Scenarios: Administrators
Isolation degree: Highest for a high-security cluster, and lowest for a standard cluster
Sharability: Available only to administrators of high-security clusters
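
For example, to apply the USER share level across the cluster, set the parameter in kyuubi-defaults.conf as shown in the following minimal sketch. USER is also the default value of this parameter.

# In kyuubi-defaults.conf: start one engine per authenticated user.
kyuubi.engine.share.level=USER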

Examples

The following examples show how to manage Kyuubi engines at the USER share level. In the examples, the kyuubi.engine.share.level parameter is set to USER, and all users have passed Lightweight Directory Access Protocol (LDAP) or Kerberos authentication.
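
If your cluster uses Kerberos, each user typically obtains a ticket before connecting. The following command is a sketch only: EXAMPLE.COM is a placeholder realm, and a Kerberos-enabled JDBC URL may also require a principal parameter, which is omitted here.

# Obtain a Kerberos ticket before running kyuubi-beeline.
# EXAMPLE.COM is a placeholder realm; replace it with your own.
kinit user1@EXAMPLE.COM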

  1. Start a Kyuubi engine as required.

    If a new user named user1 needs to use a Spark 3.x engine, run the following command. After the job is submitted by using kyuubi-beeline, the Kyuubi server starts a new Spark 3.x engine to process it.

    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
      -f query1.sql

    If another new user named user2 needs to configure the resources used by Spark 3.x engines, use one of the following methods; a note on configuration precedence follows them:

    • Method 1: Configure resources such as Spark application executors in the Java Database Connectivity (JDBC) URL. Sample code:

      # Set user-level Spark configurations in the JDBC connection URL
      
      kyuubi-beeline -n user2 \
        -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
        -f query1.sql
    • Method 2: Configure the resources used by Spark 3.x engines in the kyuubi-defaults.conf configuration file. Sample code:

      # Set user default configurations in kyuubi-defaults.conf
      # ___user2___.spark.dynamicAllocation.enabled=false
      # ___user2___.spark.executor.memory=5g
      # ___user2___.spark.executor.cores=2
      # ___user2___.spark.executor.instances=10
      
      kyuubi-beeline -n user2 \
        -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
        -f query1.sql
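
    If the same Spark configuration appears both in kyuubi-defaults.conf and in the JDBC connection URL, the value in the URL takes precedence for that connection. Also note that engine resource settings take effect when an engine is launched; at the USER share level, they generally do not resize an engine that is already running.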
  2. Submit jobs to the specified Spark 3.x engine.

    After jobs are complete, the Spark 3.x engine started by the Kyuubi server keeps running for a period of time. If you submit other jobs to the engine during this period, the engine is reused and no new YARN application is launched, which improves the performance of jobs and SQL queries. If no job is submitted during this period, the engine automatically shuts down. The idle period is specified by the kyuubi.session.engine.idle.timeout parameter, which defaults to PT30M (30 minutes, in ISO-8601 duration format). To change the idle period, modify this parameter on the kyuubi-defaults.conf tab of the Kyuubi service page.
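
    For example, the following entry in kyuubi-defaults.conf shortens the idle period to 10 minutes. The PT10M value is an illustrative choice, not a recommendation.

    # Shut down idle engines after 10 minutes.
    # The value is an ISO-8601 duration; the default is PT30M.
    kyuubi.session.engine.idle.timeout=PT10M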

    Kyuubi allows you to create subdomains at the same share level. For example, if a new user named user4 needs to use different engine resources in different business scenarios, you can configure the kyuubi.engine.share.level.subdomain parameter in JDBC URLs and then submit jobs to different engines.

    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
      -f query1.sql
    
    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
      -f query2.sql
    
    kyuubi-beeline -n user4 \
      -u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
      -f query3.sql
  3. Use a Spark 3.x engine in multiple Spark sessions.

    Kyuubi allows a Spark 3.x engine to be used in multiple Spark sessions. For example, if the user named user1 submits two jobs from two terminals at the same time, both jobs can be processed by the same Spark 3.x engine. Executors are allocated to the jobs based on the default scheduling rules of Spark. You can verify the sharing behavior by using the command that follows the example.

    # Console 1
    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/biz1" \
      -f query1.sql
    
    # Console 2
    kyuubi-beeline -n user1 \
      -u "jdbc:hive2://master-1-1:10009/biz2" \
      -f query2.sql
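
    To confirm that both consoles share one engine, list the RUNNING applications on YARN: a single Spark application should be serving user1. The name filter below is an assumption; engine application names vary by Kyuubi version.

    # Expect one RUNNING application to serve both consoles.
    # Filtering by "kyuubi" is an assumption; adjust it to your engine's actual name.
    yarn application -list -appStates RUNNING | grep -i kyuubi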
