Apache Kyuubi is a distributed, multi-tenant gateway that provides query services, such as SQL queries, on top of data lake query engines such as Spark, Flink, and Trino.
Features
Multi-tenancy: Kyuubi provides end-to-end multi-tenancy for resource acquisition and for access to data and metadata through a unified authentication and authorization layer.
High availability: Kyuubi uses ZooKeeper for service discovery and load balancing across multiple Kyuubi server instances. This provides enterprise-level high availability and supports high client concurrency.
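With ZooKeeper-based discovery, clients connect to the ZooKeeper ensemble instead of to a single Kyuubi server, and the driver picks one of the registered server instances. The following minimal sketch builds such a connection URL in the Hive JDBC style. It assumes the `serviceDiscoveryMode=zooKeeper` and `zooKeeperNamespace` URL parameters and the default namespace `kyuubi`; the host names and the helper function are hypothetical.

```python
def kyuubi_zk_jdbc_url(zk_quorum, namespace="kyuubi", database=""):
    """Build a Hive-JDBC-style URL that discovers Kyuubi servers through ZooKeeper.

    zk_quorum: list of "host:port" strings for the ZooKeeper ensemble.
    namespace: ZooKeeper namespace under which Kyuubi servers register
               (assumed here to be the default, "kyuubi").
    database:  optional default database; empty means the server default.
    """
    if not zk_quorum:
        raise ValueError("at least one ZooKeeper address is required")
    hosts = ",".join(zk_quorum)  # all ensemble members, comma-separated
    return (f"jdbc:hive2://{hosts}/{database};"
            f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}")


# Hypothetical three-node ZooKeeper ensemble.
url = kyuubi_zk_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"])
print(url)
# jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
```

Because the URL names the ensemble rather than one server, a client can still connect when an individual Kyuubi server instance goes down.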
Multiple workloads: Kyuubi supports multiple workloads with one platform, one copy of data, and one SQL interface.
Scenarios
Interactive analytics: Kyuubi helps build an enterprise-level analytics platform for visualized, interactive analytics on big data. The platform supports common computing frameworks. Kyuubi supports the Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) interfaces, so you can use SQL or business intelligence (BI) tools to connect to Kyuubi and run queries efficiently. Kyuubi caches backend engine instances at the user level, so sessions from the same user share computing resources and do not wait for a new engine to start. As a result, large amounts of data can be queried in parallel and query results are returned quickly.
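User-level caching means the first connection from a user pays the engine start-up cost and later connections from the same user reuse that warm engine. The toy model below illustrates this idea only; the class and method names are hypothetical and are not Kyuubi's internal implementation. In Kyuubi, this behavior corresponds to the `kyuubi.engine.share.level` setting with the value `USER`.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Engine:
    """Stand-in for a backend compute engine (for example, a Spark application)."""
    engine_id: int
    owner: str


class UserLevelEngineCache:
    """Toy model of user-level engine sharing: one cached engine per user.

    Hypothetical names; this is not Kyuubi's internal implementation.
    """

    def __init__(self):
        self._engines = {}              # user -> Engine
        self._ids = itertools.count(1)  # monotonically increasing engine IDs
        self.launches = 0               # how many engines were actually started

    def open_session(self, user):
        engine = self._engines.get(user)
        if engine is None:
            # First connection for this user: pay the engine start-up cost once.
            engine = Engine(next(self._ids), user)
            self._engines[user] = engine
            self.launches += 1
        # Later connections from the same user reuse the warm engine.
        return engine


cache = UserLevelEngineCache()
a1 = cache.open_session("alice")
a2 = cache.open_session("alice")
b1 = cache.open_session("bob")
print(a1 is a2, a1 is b1, cache.launches)  # True False 2
```

Two users trigger two engine launches, while repeated connections from the same user trigger none, which is where the quick response for interactive queries comes from.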
Batch processing: Kyuubi provides an SQL interface for batch processing, especially for large-scale extract, transform, and load (ETL) processes. Kyuubi and its engines work with separately deployed storage and support a variety of data sources. Kyuubi isolates backend engine instances at the connection level, so each connection runs on its own engine. This improves resource isolation and stability.
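Connection-level isolation means every connection launches its own engine and that engine is stopped when the connection closes, so a failing or resource-heavy ETL job does not affect other jobs. The toy model below sketches this lifecycle with a context manager; the names are hypothetical and do not describe Kyuubi's internal code. In Kyuubi, this behavior corresponds to setting `kyuubi.engine.share.level` to `CONNECTION`.

```python
import itertools
from contextlib import contextmanager


class ConnectionLevelLauncher:
    """Toy model of connection-level isolation: each connection gets its own
    engine, and that engine is stopped when the connection closes.

    Hypothetical names; this is not Kyuubi's internal implementation.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.running = {}  # engine ID -> user, for engines that are currently alive

    @contextmanager
    def connect(self, user):
        engine_id = next(self._ids)  # a fresh engine for every connection
        self.running[engine_id] = user
        try:
            yield engine_id
        finally:
            # Stop the engine even if the job fails, so nothing leaks
            # into later connections.
            self.running.pop(engine_id, None)


launcher = ConnectionLevelLauncher()
with launcher.connect("etl") as e1, launcher.connect("etl") as e2:
    print(e1 != e2, len(launcher.running))  # True 2
print(len(launcher.running))  # 0
```

Compared with user-level sharing, this trades engine start-up latency for stronger isolation, which usually suits long-running batch jobs better than interactive queries.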
Comparison among Kyuubi, Livy, and Spark Thrift Server
Item | Kyuubi | Livy | Spark Thrift Server |
Supported interfaces | SQL and Scala | SQL, Scala, Python, and R | SQL |
Supported engines | Spark, Flink, and Trino | Spark | Spark |
Spark version | Spark 3.x | Spark 2.x and Spark 3.x | Built-in Spark components |
Supported protocols | Thrift and JDBC | HTTP, Thrift, and JDBC | Thrift and JDBC |
Client | Kyuubi Beeline | HTTP Client | Spark Beeline |
High availability | Supported | Supported | Not supported |
Resource isolation | Supported | Supported | Not supported |
Lightweight Directory Access Protocol (LDAP) authentication | Supported | Supported | Supported |
Alibaba Cloud EMR version | | | All versions |