An Interpretation of PolarDB-X Source Codes (1): CN Code Structure

This article focuses on the code structure of GalaxySQL (the PolarDB-X CN computing layer). It briefly reviews the PolarDB-X architecture, describes the function of each module directory by directory, and lists some key interfaces that readers can use as entry points when debugging the code.

Overall Architecture

PolarDB-X consists of four core components. The Compute Node (CN) is responsible for computing. The Data Node (DN) is responsible for storage. The Global Meta Service (GMS) manages metadata and provides TSO services. Change Data Capture (CDC) generates change logs. Among them, CN serves as the service entry point and performs three tasks:

Receive user requests through the MySQL protocol and return results.
Act as a distributed query engine that is compatible with MySQL syntax and provides features such as distributed transactions, global indexes, and MPP.
Interact with DN through an RPC protocol, issue read and write instructions, and aggregate the results.

The following section illustrates the roles of code, directory, and modules in conjunction with the three tasks of CN and offers some key interfaces for readers to explore.

An Introduction to the Code

The CN code is hosted on GitHub and split into two repositories, GalaxySQL and GalaxyGlue. The MySQL protocol implementation and the distributed query engine are in the GalaxySQL repository. For licensing reasons, the RPC protocol code that interacts with DN is kept separately in GalaxyGlue.

Before debugging the code, you need to download both repositories. We recommend downloading the GalaxySQL code first and adding GalaxyGlue as a Git submodule, which allows changes to the two repositories to be committed independently. Please see the contributing documentation for more information.

CN is a multi-module Java project. Modules expose services to each other through interfaces. Module relationships are recorded in pom.xml. You can view all dependencies with the mvn dependency:tree command.

Some interfaces use the SPI mechanism. For these interfaces, check the src/main/resources/META-INF/polardbx directory of the module to see which implementation the current module uses.
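As a rough illustration of the SPI idea, the sketch below picks an implementation class by name from the text of a resource file and instantiates it by reflection. The file format (one implementation class name per line, in the style of Java's standard ServiceLoader files) is an assumption, and the names SimpleExtensionLoader, Greeter, and EnglishGreeter are hypothetical; the loader and resource format that GalaxySQL actually uses may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SimpleExtensionLoader {
    /** Hypothetical service interface exposed by a module. */
    public interface Greeter {
        String greet(String name);
    }

    /** Hypothetical implementation that a resource file selects. */
    public static class EnglishGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /**
     * Reads the resource content line by line, skips blank lines and '#' comments,
     * and instantiates the first class name it finds (assumed file format).
     */
    public static <T> T loadFirst(Class<T> type, String resourceContent)
            throws IOException, ReflectiveOperationException {
        try (BufferedReader reader = new BufferedReader(new StringReader(resourceContent))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                Class<?> impl = Class.forName(line);
                // cast() fails fast if the configured class does not implement the interface
                return type.cast(impl.getDeclaredConstructor().newInstance());
            }
        }
        throw new IllegalStateException("No implementation configured for " + type.getName());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the content of a file under META-INF/... in a module's resources
        String content = "# selected implementation\n" + EnglishGreeter.class.getName() + "\n";
        Greeter greeter = loadFirst(Greeter.class, content);
        System.out.println(greeter.greet("CN"));
    }
}
```

When reading the real code, the practical consequence is the same as in this sketch: the interface alone does not tell you which class runs, so the resource file is the place to look.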

Please see the compilation and initialization documentation for more information about compiling and packaging. The main method of CN is in the TddlLauncher class of the polardbx-server module. Tddl stands for Taobao Distributed Data Layer and dates back to the PolarDB-X 0.5 era. The name has been retained because associated systems depend on the class name.

Directories and Modules

Many directories and files can be seen on the GalaxySQL project homepage. After GalaxyGlue is added to the project under the name polardbx-rpc, the project root directory contains 13 folders and nine files. The code directories start with polardbx. The docker_build.sh script is used to build a Docker image locally. The pom.xml file is the project file for the entire project. The saveVersion.sh script is used to generate a version number suffix when you build an RPM package.

In the root directory, each directory that starts with polardbx represents a separate module. Each module will be introduced below:

Table_1

How to Get Started

PolarDB-X is a complex system with a large amount of code and many interfaces and modules. Reading the code takes some technique. We recommend first going through the code end to end to get the big picture, and then going back to study the finer details.

Overall Understanding

Like all SQL databases, CN can be divided into the protocol layer, optimizer, and executor. A good way to start reading is with the input and output of each layer, to understand the overall process from the user issuing a read or write request to receiving the results. Some key interfaces of each layer are introduced below.

Protocol Layer

The protocol layer implements the MySQL protocol. It is responsible for establishing connections, receiving the packets sent by users, assembling SQL statements, and passing parameters to the optimizer. By function, the protocol layer code can be divided into connection management, packet parsing, and protocol parsing. The connection management and packet parsing code is in the polardbx-net module, and the protocol parsing code is in the polardbx-server module.

You can learn about the connection management code by tracing how a connection is established. The entry is NIOAcceptor#accept.
Packet parsing is the process of converting raw data into protocol data objects. We recommend starting with the text protocol. The entry is AbstractConnection#read.
Protocol parsing is the process of dispatching protocol data objects to specific execution logic. The entry is FrontendCommandHandler#handle.
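To make packet parsing concrete, the sketch below decodes the general MySQL packet layout: a 3-byte little-endian payload length, a 1-byte sequence ID, and then the payload, whose first byte is the command (0x03 for COM_QUERY in the text protocol). It illustrates the wire format only; the class and method names are hypothetical and are not taken from polardbx-net.

```java
import java.nio.charset.StandardCharsets;

public class MySqlPacketSketch {
    /** Command byte of a text-protocol query. */
    public static final int COM_QUERY = 0x03;
    private static final int HEADER_SIZE = 4;

    /** Payload length: first three header bytes, little-endian. */
    public static int payloadLength(byte[] packet) {
        return (packet[0] & 0xFF) | ((packet[1] & 0xFF) << 8) | ((packet[2] & 0xFF) << 16);
    }

    /** Sequence ID: fourth header byte. */
    public static int sequenceId(byte[] packet) {
        return packet[3] & 0xFF;
    }

    /** Command: first byte of the payload. */
    public static int command(byte[] packet) {
        return packet[HEADER_SIZE] & 0xFF;
    }

    /** Extracts the SQL text of a COM_QUERY packet (the payload after the command byte). */
    public static String queryText(byte[] packet) {
        if (command(packet) != COM_QUERY) {
            throw new IllegalArgumentException("Not a COM_QUERY packet");
        }
        return new String(packet, HEADER_SIZE + 1, payloadLength(packet) - 1, StandardCharsets.UTF_8);
    }

    /** Builds a COM_QUERY packet, as a client would send it, for demonstration. */
    public static byte[] buildQueryPacket(String sql, int sequenceId) {
        byte[] text = sql.getBytes(StandardCharsets.UTF_8);
        int length = text.length + 1; // command byte + SQL text
        byte[] packet = new byte[HEADER_SIZE + length];
        packet[0] = (byte) (length & 0xFF);
        packet[1] = (byte) ((length >> 8) & 0xFF);
        packet[2] = (byte) ((length >> 16) & 0xFF);
        packet[3] = (byte) sequenceId;
        packet[HEADER_SIZE] = (byte) COM_QUERY;
        System.arraycopy(text, 0, packet, HEADER_SIZE + 1, text.length);
        return packet;
    }

    public static void main(String[] args) {
        byte[] packet = buildQueryPacket("select 1", 0);
        System.out.println("payload length = " + payloadLength(packet));
        System.out.println("sequence id    = " + sequenceId(packet));
        System.out.println("sql            = " + queryText(packet));
    }
}
```

With this layout in mind, a breakpoint at the packet parsing entry should show the raw bytes, and a breakpoint at the protocol parsing entry should show the command being dispatched.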

Optimizer

SQL processing includes syntax parsing, validation, logical plan generation, logical plan optimization, and physical plan optimization. Optimization produces a physical execution plan, which is passed to the executor. The optimizer uses the Apache Calcite RBO/CBO framework. Accordingly, the optimizer framework code is in the polardbx-calcite module, and the optimizer itself is implemented in the polardbx-optimizer module. The optimizer entry is Planner#plan. The key interfaces of each step are listed below.

Steps	Interfaces
Parsing Syntax	FastsqlParser#parse
Verification	SqlConverter#validate
Logical Plan Generation	SqlConverter#toRel
Logical Plan Optimization	Planner#optimizeBySqlWriter
Physical Plan Optimization	Planner#optimizeByPlanEnumerator

Executor

After receiving the physical execution plan, the executor chooses an execution mode (cursor, local, or mpp) according to the plan type. The execution code for an operator may differ between execution modes, so each operator must be bound to the execution code for the chosen mode. During execution, the executor communicates with DN through the RPC interface, issues read and write requests, and aggregates the results. The key interfaces are listed below.

Steps	Interfaces
Executor Entry	PlanExecutor#execute
Execution Mode Selection	ExecutorHelper#execute
Binding Operators to Execution Code (cursor mode)	AbstractGroupExecutor#executeInner
Binding Operators to Execution Code (local mode)	LocalExecutionPlanner#plan
Splitting the Execution Plan (mpp mode)	PlanFragmenter.Fragmenter#buildRootFragment
Communication with DN (cursor mode)	MyJdbcHandler
Communication with DN (local/mpp mode)	TableScanClient
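The idea of binding an operator to mode-specific execution code can be pictured as a lookup keyed by execution mode and operator type. The sketch below is a toy model of that idea only: the class OperatorBindingSketch, the Mode enum, and the operator names are hypothetical and do not reflect how the interfaces in the table are actually implemented.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class OperatorBindingSketch {
    /** The three execution modes named in the article. */
    public enum Mode { CURSOR, LOCAL, MPP }

    // mode -> operator name -> execution code (modeled as a function on a string input)
    private final Map<Mode, Map<String, Function<String, String>>> bindings =
            new EnumMap<>(Mode.class);

    /** Registers the execution code for an operator in a given mode. */
    public void bind(Mode mode, String operator, Function<String, String> executionCode) {
        bindings.computeIfAbsent(mode, m -> new HashMap<>()).put(operator, executionCode);
    }

    /** Looks up the execution code bound for this mode and operator, then runs it. */
    public String execute(Mode mode, String operator, String input) {
        Map<String, Function<String, String>> byOperator = bindings.get(mode);
        if (byOperator == null || !byOperator.containsKey(operator)) {
            throw new IllegalStateException("No binding for " + operator + " in mode " + mode);
        }
        return byOperator.get(operator).apply(input);
    }

    public static void main(String[] args) {
        OperatorBindingSketch registry = new OperatorBindingSketch();
        // The same logical operator gets different execution code per mode (hypothetical names)
        registry.bind(Mode.CURSOR, "Filter", in -> "cursor-filter(" + in + ")");
        registry.bind(Mode.LOCAL, "Filter", in -> "local-filter(" + in + ")");

        System.out.println(registry.execute(Mode.CURSOR, "Filter", "t1"));
        System.out.println(registry.execute(Mode.LOCAL, "Filter", "t1"));
    }
}
```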

In-Depth Understanding

If you want to understand a module in depth, a good approach is to read the code with specific questions in mind. For example, to understand the protocol layer, first think about how the simplest query, select 1, is processed. With the module and interface introduction above, the answer is easy to find by tracing and debugging. Then, consider how other statements and protocol types (such as SET and Prepared Statement) differ. What happens if a received or returned packet is too large? How is SSL handled?

Note: Even when reading in depth, it is still worth prioritizing. We recommend focusing on the packages listed in the Directories and Modules section and skimming the content in other packages.
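For the question about oversized packets, the MySQL protocol itself gives a starting point: the length field is 3 bytes, so a payload of 0xFFFFFF (16,777,215) bytes or more is sent as a series of packets, and a packet whose length is exactly 0xFFFFFF signals that another packet follows. The sketch below computes the packet lengths for a given payload size. It models the protocol rule only; the class name is hypothetical, and how CN implements this is left for the reader to trace.

```java
import java.util.ArrayList;
import java.util.List;

public class LargePacketSketch {
    /** Largest value the 3-byte length field can hold. */
    public static final int MAX_PAYLOAD = 0xFFFFFF;

    /**
     * Returns the payload length of each packet needed to carry the given total payload.
     * Every full chunk has length MAX_PAYLOAD; the final packet is shorter and may be
     * empty when the total is an exact multiple of MAX_PAYLOAD.
     */
    public static List<Integer> chunkLengths(long totalPayloadLength) {
        if (totalPayloadLength < 0) {
            throw new IllegalArgumentException("Length must not be negative");
        }
        List<Integer> lengths = new ArrayList<>();
        long remaining = totalPayloadLength;
        while (remaining >= MAX_PAYLOAD) {
            lengths.add(MAX_PAYLOAD);
            remaining -= MAX_PAYLOAD;
        }
        // The terminating packet tells the receiver that the payload is complete
        lengths.add((int) remaining);
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(chunkLengths(9));
        System.out.println(chunkLengths(MAX_PAYLOAD));
        System.out.println(chunkLengths(MAX_PAYLOAD + 5L));
    }
}
```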

Summary

This article introduces the compute node code, which spans two repositories, GalaxySQL and GalaxyGlue. The purpose is to help readers quickly understand the overall structure of the CN code. Functionally, CN performs three tasks: protocol processing, query optimization, and interaction with DN. Accordingly, the code can be divided into the protocol layer, optimizer, and executor. The article describes how the code is organized and points to the relevant documents for compilation and debugging. It then explains the function of each directory and module, so that readers who need in-depth knowledge can quickly locate the relevant code. Finally, it gives suggestions for reading the code and lists the key interfaces in each module for debugging.
