This article focuses on the code structure of GalaxySQL (the PolarDB-X CN computing layer). It briefly reviews the PolarDB-X architecture, introduces the function of each module directory by directory, and lists some key interfaces that readers can use as starting points for debugging.
PolarDB-X consists of four core components. The Compute Node (CN) is responsible for computing. The Data Node (DN) is responsible for storage. The Global Meta Service (GMS) manages metadata and provides TSO services. Change Data Capture (CDC) generates change logs. Among them, CN serves as the service entry point and completes three tasks: protocol processing, query optimization, and interaction with DN.
The following sections explain the code, directories, and modules in terms of these three tasks and list some key interfaces for readers to explore.
The CN code is hosted on GitHub and split into two repositories, GalaxySQL and GalaxyGlue. The MySQL protocol implementation and the distributed query engine are in the GalaxySQL repository. For licensing reasons, the RPC protocol code that interacts with DN is kept separately in GalaxyGlue.
Before debugging the code, you need to download both repositories. We recommend cloning GalaxySQL first and adding GalaxyGlue as a Git submodule, which allows changes to the two repositories to be committed independently. Please see the contributing documentation for more information.
CN is a multi-module Java project. Modules expose services to one another through interfaces. Module relationships are recorded in pom.xml. You can view all dependencies with the mvn dependency:tree command.
Some interfaces use the SPI mechanism. For these interfaces, the specific implementation used by a module is declared in that module's src/main/resources/META-INF/polardbx directory.
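To make that lookup more concrete, here is a minimal sketch of reading a provider file of the kind found in such a directory. It assumes a format like the one used by java.util.ServiceLoader (one implementation class name per line, with # starting a comment). The exact file format and the loader class used by CN are not described in this article, so treat the format as an assumption. The class names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SpiConfigSketch {

    /**
     * Returns the implementation class names listed in the text of a
     * provider file. Assumed format: one class name per line, '#' starts
     * a comment, blank lines are ignored.
     */
    public static List<String> parseProviders(String fileText) {
        List<String> names = new ArrayList<>();
        for (String line : fileText.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical file content; the class names are made up.
        String text = "# hypothetical provider file\n"
                + "com.example.DefaultHandler\n"
                + "\n"
                + "com.example.FallbackHandler # second choice\n";
        System.out.println(parseProviders(text));
    }
}
```

When reading a module, opening the file named after the interface in that directory tells you which implementation class to set a breakpoint in.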
Please see the compilation/initialization documentation for more information about compilation and packaging. The main method of CN is in the TddlLauncher class of the polardbx-server module. Tddl stands for Taobao Distributed Data Layer and dates back to the PolarDB-X 0.5 era. The name has been retained because related systems depend on the class name.
The GalaxySQL project homepage shows many directories and files. After GalaxyGlue is added to the project and renamed polardbx-rpc, the project root directory contains 13 folders and nine files. Code directories start with polardbx. The docker_build.sh script is used to build a Docker image locally. The pom.xml file is the Maven project file for the entire project. The saveVersion.sh script is used to generate a version number suffix when building an RPM package.
In the root directory, each directory that starts with polardbx represents a separate module. Each module will be introduced below:
PolarDB-X is a complex system with a large codebase and many interfaces and modules, so reading the code takes some technique. We recommend first going through the code end to end to get the big picture and then going back to check the finer details.
Like all SQL databases, CN can be divided into a protocol layer, an optimizer, and an executor. A good way to start reading is with the input and output of each layer, to understand the overall flow from the user issuing a read or write request to receiving the result. Some key interfaces of each layer are introduced below.
The protocol layer implements the MySQL protocol. It is responsible for establishing connections, receiving the packets sent by users, assembling SQL statements, and passing parameters to the optimizer. By function, the protocol layer code can be divided into connection management, packet parsing, and protocol parsing. The connection management and packet parsing code is in the polardbx-net module, and the protocol parsing code is in the polardbx-server module.
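As a point of reference for packet parsing, the sketch below decodes the standard MySQL packet framing: a 3-byte little-endian payload length, a 1-byte sequence ID, and then the payload, where a COM_QUERY payload starts with the byte 0x03 followed by the SQL text. This is not code from polardbx-net. The class and method names are hypothetical, and UTF-8 is assumed for the SQL text, whereas a real server uses the connection character set.

```java
import java.nio.charset.StandardCharsets;

public class MysqlPacketSketch {
    static final int HEADER_SIZE = 4;        // 3 length bytes + 1 sequence byte
    static final int MAX_PAYLOAD = 0xFFFFFF; // largest payload of one packet
    static final int COM_QUERY = 0x03;       // command byte for a text query

    /** Payload length: first three bytes, little-endian. */
    public static int payloadLength(byte[] packet) {
        return (packet[0] & 0xFF)
                | ((packet[1] & 0xFF) << 8)
                | ((packet[2] & 0xFF) << 16);
    }

    /** Sequence ID: fourth byte of the header. */
    public static int sequenceId(byte[] packet) {
        return packet[3] & 0xFF;
    }

    /** A payload of exactly 0xFFFFFF bytes means another packet follows. */
    public static boolean hasContinuation(byte[] packet) {
        return payloadLength(packet) == MAX_PAYLOAD;
    }

    /** Extracts the SQL text from a complete single-packet COM_QUERY. */
    public static String queryText(byte[] packet) {
        int len = payloadLength(packet);
        if (len < 1 || packet.length < HEADER_SIZE + len
                || (packet[HEADER_SIZE] & 0xFF) != COM_QUERY) {
            throw new IllegalArgumentException("not a complete COM_QUERY packet");
        }
        return new String(packet, HEADER_SIZE + 1, len - 1, StandardCharsets.UTF_8);
    }

    /** Builds a single COM_QUERY packet, as a client would send it. */
    public static byte[] buildQueryPacket(String sql) {
        byte[] sqlBytes = sql.getBytes(StandardCharsets.UTF_8);
        int len = sqlBytes.length + 1; // +1 for the command byte
        if (len >= MAX_PAYLOAD) {
            throw new IllegalArgumentException("payload must be split across packets");
        }
        byte[] packet = new byte[HEADER_SIZE + len];
        packet[0] = (byte) len;
        packet[1] = (byte) (len >> 8);
        packet[2] = (byte) (len >> 16);
        packet[3] = 0; // the sequence ID restarts at 0 for each new command
        packet[4] = (byte) COM_QUERY;
        System.arraycopy(sqlBytes, 0, packet, HEADER_SIZE + 1, sqlBytes.length);
        return packet;
    }

    public static void main(String[] args) {
        byte[] packet = buildQueryPacket("select 1");
        System.out.println(payloadLength(packet) + " bytes, seq " + sequenceId(packet)
                + ": " + queryText(packet));
    }
}
```

The hasContinuation check reflects how the protocol handles oversized payloads: the sender splits them into packets of 0xFFFFFF bytes, and the receiver keeps reading until it sees a shorter packet.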
SQL processing includes syntax parsing, validation, logical plan generation, logical plan optimization, and physical plan optimization. Optimization produces a physical execution plan, which is passed to the executor. The optimizer is built on the Apache Calcite RBO/CBO framework: the framework code is in the polardbx-calcite module, and the optimizer itself is implemented in the polardbx-optimizer module. The optimizer entry point is Planner#plan. The key interfaces of each step are listed below.
Steps | Interfaces |
Parsing Syntax | FastsqlParser#parse |
Verification | SqlConverter#validate |
Logical Plan Generation | SqlConverter#toRel |
Logical Plan Optimization | Planner#optimizeBySqlWriter |
Physical Plan Optimization | Planner#optimizeByPlanEnumerator |
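The order of the steps in the table can be sketched as a simple chain of calls. The code below is a toy illustration only: every method is a stand-in that wraps a string, the class is hypothetical, and none of the signatures match the real interfaces, whose names appear only in the comments.

```java
public class PlannerPipelineSketch {

    // Stand-in for FastsqlParser#parse: SQL text to syntax tree.
    static String parse(String sql) {
        return "ast(" + sql.trim() + ")";
    }

    // Stand-in for SqlConverter#validate: checks the syntax tree.
    static String validate(String ast) {
        if (ast.equals("ast()")) {
            throw new IllegalArgumentException("empty statement");
        }
        return "validated(" + ast + ")";
    }

    // Stand-in for SqlConverter#toRel: syntax tree to logical plan.
    static String toRel(String validatedAst) {
        return "logical(" + validatedAst + ")";
    }

    // Stand-in for Planner#optimizeBySqlWriter: logical plan optimization.
    static String optimizeLogical(String logicalPlan) {
        return "optimized(" + logicalPlan + ")";
    }

    // Stand-in for Planner#optimizeByPlanEnumerator: physical plan optimization.
    static String optimizePhysical(String logicalPlan) {
        return "physical(" + logicalPlan + ")";
    }

    /** Runs the five steps in table order, like an entry point would. */
    public static String plan(String sql) {
        String ast = parse(sql);
        String validated = validate(ast);
        String logical = toRel(validated);
        String optimized = optimizeLogical(logical);
        return optimizePhysical(optimized);
    }

    public static void main(String[] args) {
        System.out.println(plan("select 1"));
    }
}
```

When debugging, setting a breakpoint on each listed interface and comparing its input with its output is a quick way to see what that step contributes.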
After receiving the physical execution plan, the executor determines the execution mode (cursor, local, or MPP) according to the plan type. The execution code for a given operator may differ between execution modes, so each operator must be bound to the execution code for the selected mode. During execution, the executor communicates with DN through the RPC interface, issues read and write requests, and aggregates the results. The key interfaces are listed below.
Steps | Interfaces |
Executor Entry | PlanExecutor#execute |
Execution Mode Selection | ExecutorHelper#execute |
Cursor mode: bind operators to execution code | AbstractGroupExecutor#executeInner |
Local mode: bind operators to execution code | LocalExecutionPlanner#plan |
MPP mode: split the execution plan | PlanFragmenter.Fragmenter#buildRootFragment |
Cursor mode: communicate with DN | MyJdbcHandler |
Local/MPP mode: communicate with DN | TableScanClient |
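For debugging, the table can be read as one path per execution mode. The sketch below encodes that reading as data: a hypothetical enum that maps each mode to its mode-specific step and its DN access class, plus a helper that lists the interfaces in the order a request would reach them. The enum and helper are illustrative and do not exist in the codebase. Only the interface names in the strings come from the table.

```java
import java.util.Arrays;
import java.util.List;

public class ExecModeSketch {

    /** Execution modes, each with its mode-specific step and DN access class. */
    public enum ExecMode {
        CURSOR("AbstractGroupExecutor#executeInner", "MyJdbcHandler"),
        LOCAL("LocalExecutionPlanner#plan", "TableScanClient"),
        MPP("PlanFragmenter.Fragmenter#buildRootFragment", "TableScanClient");

        final String modeSpecificStep;
        final String dnAccess;

        ExecMode(String modeSpecificStep, String dnAccess) {
            this.modeSpecificStep = modeSpecificStep;
            this.dnAccess = dnAccess;
        }
    }

    /** Interfaces to set breakpoints on, from executor entry to DN access. */
    public static List<String> breakpoints(ExecMode mode) {
        return Arrays.asList(
                "PlanExecutor#execute",   // executor entry
                "ExecutorHelper#execute", // execution mode selection
                mode.modeSpecificStep,
                mode.dnAccess);
    }

    public static void main(String[] args) {
        for (ExecMode mode : ExecMode.values()) {
            System.out.println(mode + ": " + breakpoints(mode));
        }
    }
}
```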
If you want to understand a module's code in depth, a better approach is to read it with questions in mind. For example, to understand the protocol layer code, first think about how the simplest query, select 1, is processed. Combined with the module and interface introductions above, the answer is easy to find by tracing and debugging. Then move on to further questions. How do other statements and protocol types (such as SET and prepared statements) differ? What happens if a received or returned packet is too large? How is SSL handled? Note: even when reading in depth, it is still worth prioritizing. We recommend focusing on the packages listed in the Directories and Modules section and skimming the content in other packages.
This article introduces the compute node code, which spans the two repositories GalaxySQL and GalaxyGlue. Its purpose is to help readers quickly understand the overall structure of the CN code. Functionally, CN completes three tasks: protocol processing, query optimization, and interaction with DN. Accordingly, the code can be divided into the protocol layer, the optimizer, and the executor. The article describes how the project is organized and points to the relevant documents for compilation and debugging. It then explains the function of each directory and module so that readers who need in-depth knowledge can quickly locate the relevant code. Finally, it offers suggestions for reading the code and lists the key interfaces in each module for debugging.