E-MapReduce: Overview

Last Updated: Aug 14, 2023

Apache Kyuubi is a distributed, multi-tenant gateway that provides query services, such as SQL queries, for data lake query engines such as Spark, Flink, and Trino.

Features

  • Multi-tenancy: Kyuubi provides end-to-end multi-tenancy for resource acquisition and for access to data and metadata through a unified authentication and authorization layer.

  • High availability: Kyuubi uses ZooKeeper for service discovery and load balancing across multiple Kyuubi server instances. This provides enterprise-level high availability and supports high client concurrency.

  • Multiple workloads: Kyuubi supports multiple workloads on a single platform, with one copy of data and one SQL interface.
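With ZooKeeper-based high availability, a client usually connects through a JDBC URL that points at the ZooKeeper quorum instead of a single Kyuubi server, and the driver looks up a registered Kyuubi instance from there. The following Python sketch only assembles such a URL string. It assumes Kyuubi's default ZooKeeper namespace `kyuubi`; the host names `zk1`, `zk2`, and `zk3` and the function name `kyuubi_ha_jdbc_url` are hypothetical, and the namespace configured on an EMR cluster may differ.

```python
from typing import Sequence


def kyuubi_ha_jdbc_url(zk_hosts: Sequence[str], namespace: str = "kyuubi",
                       database: str = "") -> str:
    """Build a JDBC URL that discovers Kyuubi servers through ZooKeeper.

    zk_hosts: "host:port" entries of the ZooKeeper quorum (hypothetical values).
    namespace: ZooKeeper namespace that Kyuubi servers register under;
               "kyuubi" is an assumption and may differ on an EMR cluster.
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(zk_hosts)
    # The client contacts ZooKeeper rather than one fixed Kyuubi server,
    # so any healthy registered Kyuubi instance can serve the session.
    return (f"jdbc:hive2://{quorum}/{database};"
            f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}")


if __name__ == "__main__":
    print(kyuubi_ha_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"]))
```

The resulting string can then be passed to Kyuubi Beeline, for example `beeline -u '<url>' -n <user>`. Because the URL lists the whole quorum, the client does not need to know which Kyuubi servers are currently running.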

Scenarios

  • Interactive analytics: Kyuubi helps you build an enterprise-level platform for visual, interactive analytics on big data. The platform supports common computing frameworks. Kyuubi provides Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) interfaces, so you can use SQL or business intelligence (BI) tools to access Kyuubi and run queries efficiently. Kyuubi caches backend engine instances at the user level, so connections from the same user share computing resources and get fast responses. This allows large amounts of data to be queried in parallel and results to be returned quickly.

  • Batch processing: Kyuubi provides an SQL interface for batch processing, especially for large-scale extract, transform, and load (ETL) jobs. Kyuubi and its engines work with independent storage and a variety of data sources. Kyuubi isolates backend engine instances at the connection level to improve resource isolation and stability.
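The two scenarios differ mainly in how backend engines are shared: user-level caching for interactive analytics and connection-level isolation for batch jobs. In Kyuubi this behavior is controlled by the `kyuubi.engine.share.level` setting, whose values include `USER` (the default), `CONNECTION`, `GROUP`, and `SERVER`. The Python sketch below is a hypothetical, simplified model of the `USER` and `CONNECTION` levels; the `EngineRegistry` class and the engine names are illustrative and are not part of Kyuubi. It only shows which sessions would end up on the same engine.

```python
import itertools
from enum import Enum


class ShareLevel(Enum):
    # Two of the share levels Kyuubi offers via kyuubi.engine.share.level.
    USER = "USER"
    CONNECTION = "CONNECTION"


class EngineRegistry:
    """Hypothetical model of how sessions are mapped to backend engines.

    This illustrates the share-level idea; it is not Kyuubi's implementation.
    """

    def __init__(self, share_level: ShareLevel) -> None:
        self.share_level = share_level
        self._engines = {}  # maps (scope, id) -> engine name
        self._ids = itertools.count(1)

    def engine_for(self, user: str, connection_id: str) -> str:
        if self.share_level is ShareLevel.USER:
            # All connections of one user reuse the same cached engine.
            key = ("user", user)
        else:
            # Every connection gets its own engine.
            key = ("connection", connection_id)
        if key not in self._engines:
            # "Launch" (here: just name) a new engine on first use.
            self._engines[key] = f"engine-{next(self._ids)}"
        return self._engines[key]


if __name__ == "__main__":
    shared = EngineRegistry(ShareLevel.USER)
    print(shared.engine_for("alice", "c1"), shared.engine_for("alice", "c2"))
    isolated = EngineRegistry(ShareLevel.CONNECTION)
    print(isolated.engine_for("alice", "c1"), isolated.engine_for("alice", "c2"))
```

In this model, user-level sharing amortizes engine startup across many short queries, which suits interactive use, while connection-level sharing pays the startup cost per connection in exchange for isolation, which suits long-running ETL jobs.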

Comparison of Kyuubi, Livy, and Spark Thrift Server

| Item | Kyuubi | Livy | Spark Thrift Server |
| --- | --- | --- | --- |
| Supported interfaces | SQL and Scala | SQL, Scala, Python, and R | SQL |
| Supported engines | Spark, Flink, and Trino | Spark | Spark |
| Spark version | Spark 3.x | Spark 2.x and Spark 3.x | Same version as the built-in Spark component |
| Supported protocols | Thrift and JDBC | HTTP, Thrift, and JDBC | Thrift and JDBC |
| Client | Kyuubi Beeline | HTTP client | Spark Beeline |
| High availability | Supported | Supported | Not supported |
| Resource isolation | Supported | Supported | Not supported |
| Lightweight Directory Access Protocol (LDAP) authentication | Supported | Supported | Supported |
| Alibaba Cloud EMR version | V3.42.0 and later; V5.8.0 and later | V3.40.0 and earlier; V5.6.0 and earlier | All versions |
