×
Community Blog How to Solve Multi-Tenant Data Isolation and Performance Problems? PolarDB for PostgreSQL Practical Guide

How to Solve Multi-Tenant Data Isolation and Performance Problems? PolarDB for PostgreSQL Practical Guide

This article provides a practical guide to multi-tenant data isolation and performance optimization using PolarDB for PostgreSQL Distributed Edition.

In the era of Software as a Service and AI, multi-tenancy architecture has become one of the core scenarios that databases must address. How to maximize hardware resource utilization and reduce O&M costs while ensuring multi-tenant data isolation is a question that every database team needs to answer. PolarDB for PostgreSQL Distributed Edition provides a mature, flexible solution that is compatible with the community edition of Citus.

Overview of PolarDB for PostgreSQL Distributed Architecture

PolarDB for PostgreSQL Distributed Edition uses a two-tier architecture that separates compute and storage. The Coordinator Node (CN) handles query logic and metadata management, and the Data Node (DN) handles the storage and compute of physical shards. The underlying PolarStore shared storage pool, combined with the multi-replica synchronization mechanism, ensures high availability while enabling on-demand billing — costs are calculated based on the single-replica edition only.

1_
PolarDB for PostgreSQL distributed architecture: consists of two layers of CN and DN, where each node corresponds to a centralized instance.

In terms of distributed table management, PolarDB for PostgreSQL Distributed Edition is fully compatible with the Citus ecosystem, including core functions such as create_distributed_table, create_reference_table, polar_cluster_move_shard_placement. Citus users can migrate at zero cost. The product also retains full PostgreSQL ecosystem compatibility, allowing all types of ORMs, graphical management tools, and drivers to be used directly.

Two Classic Patterns of Multi-tenancy Architecture

In multi-tenant scenarios, the data splitting policy directly determines the extensibility, fencing, and O&M complexity of the system. PolarDB for PostgreSQL Distributed Edition supports two mainstream partition modes:

Pattern 1: Vertical splitting — one schema per tenant

2
Vertical splitting by schema. In the graph, there are three tenants/schemas distributed on two data nodes.

The core idea of vertical splitting is to assign an independent schema to each tenant and map the schemas of different tenants to different data nodes. Typical scenarios include:

• Tens to hundreds of tenants (usually fewer than 200);

• Different tenants have different table schemas, such as optional attributes and custom attributes.

• Strong application layer fencing is required, and tenants are completely decoupled.

PolarDB for PostgreSQL Distributed Edition uses the built-in polar_cluster_schema_distribute feature to distribute schemas across DNs. Combined with the search_path configuration, the application can transparently access the specified tenant's data without explicitly specifying schemas in the business code.

-- Migrate a specific schema to the destination nodes
SELECT polar_cluster_schema_move('tenant_001_schema', 'dn-node-ip', 5432);

The core advantages of the solution include:

Table-level fencing: Each tenant has an independent schema. Table schemas, indexes, and permissions are completely isolated.

Exclusive resources: You can migrate the schemas of high-payload tenants to dedicated DN to completely eliminate resource contention.

Online migration: Schema migration is performed online without blocking read and write operations.

After the application specifies a tenant through search_path, query requests are precisely routed to a single DN, avoiding the network overhead caused by broadcast queries. The performance is close to that of standalone PostgreSQL. At the same time, you can enjoy the O&M advantages of distributed clusters and effectively use hardware resources through tenant migration.

In real customer scenarios, multiple leading enterprise-level Software as a Service customers have adopted this multi-tenancy practice, spanning industries such as the Internet, catering, gaming, and enterprise ERP.

Pattern 2: Horizontal splitting — tenant ID as the partition key

3
Horizontal splitting by tenant ID, where users and items are distributed tables partitioned by tenant_id

Horizontal splitting stores the data of all tenants in the same distributed table, uses the tenant ID (tenant_id) as the distribution key, and uses a hash algorithm to distribute the rows of different tenants across DN. Applicable scenarios include:

• A large number of tenants (tens of thousands to millions);

• All tenants have an identical table schema;

• Pursuit of optimal resource utilization and Auto Scaling capabilities.

In PolarDB for PostgreSQL, the create_distributed_table feature is used to create a distributed table:

-- Create a table and use tenant_id as the distribution key
CREATE TABLE orders (
    tenant_id   bigint NOT NULL,
    order_id    bigint NOT NULL,
    amount      decimal(10,2),
    created_at  timestamp DEFAULT now(),
    PRIMARY KEY (tenant_id, order_id)
);

-- Discretize by tenant_id
SELECT create_distributed_table('orders', 'tenant_id');

The core advantages of this solution include:

Scalability: When a new tenant is added, data is automatically routed to the appropriate DN through hash. You do not need to execute DDL statements to create new tables.

Affinity optimization: Through the colocate (affinity group) mechanism, multiple tables of the same tenant are distributed to the same node, ensuring that data is stored on the same node during cross-table JOINs and avoiding cross-node network transmission.

Replicated table assistance: Public data shared by tenants, such as configuration tables and dictionary tables, can be set as replicated tables (reference tables). Each DN stores a complete replica, and no node communication is required when performing table joins with distributed tables.

During a query, when the WHERE condition contains an equivalent condition for tenant_id, the optimizer automatically performs shard pruning to precisely locate the query to the shards that store the tenant data. The performance is the same as that of a single-node query.

-- Isolate a specific tenant to a separate shard
SELECT isolate_tenant_to_new_shard('orders', '10001', 'CASCADE');

-- Migrate the shard to the target node
SELECT polar_cluster_move_shard_placement(...);

For tenants with particularly high payloads, PolarDB for PostgreSQL Distributed Edition also provides the isolate_tenant_to_new_shard feature, which separates a specific tenant's data from the shared shard and migrates it to a dedicated DN. This ensures that even with a horizontal splitting architecture, vertical isolation level Resource Assurance can be provided for key tenants.

4
Split hot spot tenant 11 to a separate DN

In practice, multiple DingTalk businesses adopt this multi-tenancy practice, which aligns well with DingTalk's business properties — from large enterprises or organizations with tens of thousands of people to OPCs (One-Person Companies) or individual accounts, all can enjoy the complete DingTalk office suite.

Comparison of Two Patterns

Dimension Vertical splitting (Schema-per-tenant) Horizontal splitting (tenant_id partition key)
Tenant capacity Tens to hundreds Unlimited
Table schema difference Supported Must be consistent
Fencing Table-level fencing Implemented through shard Ingress
O&M complexity Medium (multiple schemas need to be managed) Low (unified table management)
Resource utilization May be unbalanced Hash discretization, load balancing
Tenant-level fencing Schema-level migration Shard-level fencing
Scenario Enterprise-level Software as a Service with customization requirements Internet Software as a Service with massive small and micro tenants

In practice, the two patterns can also be combined — horizontal splitting for most small and medium-sized tenants, and vertical splitting for key customers with customization requirements or high Service-Level Agreement requirements.

Key Attributes of Distributed Capabilities

Transactional Consistency

The PolarDB for PostgreSQL Distributed Edition uses the two-phase commit protocol (2PC) and the global timestamp mechanism to ensure the atomicity of distributed transactions across nodes. At the default Read Committed isolation level, applications can perform operations such as INSERT, UPDATE, and DELETE. After global consistency transactions are enabled, cross-node read requests also have snapshot consistency.

Parallel Data Redistribution

Data redistribution is one of the heaviest operations in distributed cluster O&M. To address the long duration of data redistribution, the PolarDB for PostgreSQL Distributed Edition supports copying data between nodes in parallel. This makes full use of the cluster I/O bandwidth and completes data redistribution as quickly as possible within a given time window, ensuring that normal traffic during peak hours is not affected.

Column Store Accelerated Analysis Query

PolarDB for PostgreSQL Distributed Edition has a built-in column store analysis DPI engine that improves analytical query performance by up to 60x. Column store data stays near-real-time consistent with row store data through asynchronous synchronization, without affecting online writes. The engine naturally adapts to multi-tenancy environments. Whether horizontally split by tenant ID or vertically split by schema, it accelerates global statistics and analysis across tenants.

CDC Data Change Synchronization

By configuring CDC (Change Data Capture), PolarDB for PostgreSQL Distributed Edition can stream data changes in real time to downstream systems such as Kafka and Elasticsearch. For CDC of distributed tables, consumers need to connect to each DN to obtain management events. For replicated tables, only the primary CN needs to be connected. With tools such as Flink CDC and Debezium, you can easily synchronize and analyze multi-tenancy data in real time.

Connection Pool

In PostgreSQL databases, connections often reach their limit, preventing new connections from being created or causing excessive memory usage that triggers out-of-memory errors and instance crashes. PolarDB for PostgreSQL Distributed Edition supports intelligent proxy and connection pool features, allowing more connections with automatic load balancing. You no longer need to worry about connection limits or out-of-memory issues.

Full-Text Index and AISearch

PolarDB for PostgreSQL distributed edition supports full-text index and AISearch features to meet common business retrieval requirements. Full-text index supports attributes such as BM25, multi-language tokenization, synonyms, and pinyin, along with a wide range of query types, and optimizes jieba memory usage. AISearch supports optimization in vector quantization, index building, query retrieval, and high-dimensional vectors. These features can be seamlessly combined with distributed capabilities and scale linearly as compute nodes are added.

Summary

PolarDB for PostgreSQL Distributed Edition provides two mature data splitting policies for multi-tenancy scenarios: vertical splitting is suitable for enterprise Software as a Service scenarios where the number of tenants is limited and table schemas may differ; horizontal splitting is suitable for scenarios with massive Internet tenants and unified table schemas, using distribution keys to automatically discretize data and enable Auto Scaling. Both support tenant-level resource isolation and can be used in combination.

With complete Citus compatibility, flexible query optimization, column store analysis acceleration, and CDC real-time synchronization capabilities, PolarDB for PostgreSQL Distributed Edition has become an ideal choice for SaaS enterprises to build multi-tenancy database services.

0 0 0
Share on

ApsaraDB

636 posts | 185 followers

You may also like

Comments