All Products
Search
Document Center

PolarDB:Global indexes

Last Updated:May 17, 2024

Global indexing is an indexing technique for partitioned tables. You can create global indexes on partitioned tables by using non-partition keys. Global indexes can provide unique constraints.

Prerequisites

The feature is supported on a PolarDB for PostgreSQL cluster that runs the following engine:

PostgreSQL 14 (revision version 14.6.4.0 or later)

Note

You can execute the following statement to view the revision version of your PolarDB for PostgreSQL cluster:

select version();

Background information

As business data increases, data partitioning becomes a key feature of enterprise-level databases and an important means to reduce data scale. A partitioned table is split into several independent sub-tables based on partition keys. The sub-tables are managed separately to improve manageability, overall performance, and load balancing.

PolarDB for PostgreSQL allows you to manage partitions in a partitioned table. Typically, partitions are managed in the dimension of time.

  • Time is the partition keys of partitioned tables.

  • New partitions are created periodically (weekly or monthly), and new data is added to new partitions.

  • Old partitions are archived on a regular basis to reduce O&M costs.

In the preceding scenario, time is the partition key, instead of the primary key or unique ID. This causes the following problems:

  • In queries on the data in non-partition keys, all partitions are scanned because it is not clear which partition the data is in.

  • If you modify the data in non-partition keys, the data may not be unique in the partitioned table.

PolarDB for PostgreSQL provides the global index feature. Global indexes are created on a partitioned table. Unlike local indexes that are created on each partition by default, global indexes are for the data of the entire partitioned table (each index corresponding to multiple partitions). This can provide globally unique constraints on non-partition keys and greatly improve the query performance of non-partition keys.

Limits

  • You still can attach or detach partitions in partitioned tables where global indexes are created.

  • To create a global index, add the GLOBAL keyword in the statement to create an index. If you do not add this keyword, a local index is created by default.

  • You can concurrently create global indexes by adding the CONCURRENTLY keyword.

  • You cannot create global indexes on non-partitioned tables or subpartitioned tables.

  • Global indexes cannot be expression indexes.

  • You cannot create global indexes on the partition key columns of partitioned tables.

Syntax

Create a global index.

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
[ GLOBAL/LOCAL ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

    Examples

    Accelerate non-partition key queries

    1. Create a partitioned table with a time column as the partition key.

      CREATE TABLE partition_range (
          id INT,
          a INT,
          b INT,
          created_date TIMESTAMP WITHOUT TIME ZONE
      ) PARTITION BY RANGE (created_date);
      
      CREATE TABLE partition_range_part01 PARTITION OF partition_range FOR VALUES FROM (MINVALUE) TO ('2020-01-01 00:00:00');
      CREATE TABLE partition_range_part02 PARTITION OF partition_range FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-02-01 00:00:00');
      CREATE TABLE partition_range_part03 PARTITION OF partition_range FOR VALUES FROM ('2020-02-01 00:00:00') TO ('2020-03-01 00:00:00');
    2. Query the partition table based on non-partition key conditions.

      EXPLAIN (COSTS OFF) SELECT * FROM partition_range WHERE id = 1;

      The results indicate that all partitions are scanned and that the partition pruning feature cannot be used.

                              QUERY PLAN
      ------------------------------------------------------------
       Append
         ->  Seq Scan on partition_range_part01 partition_range_1
               Filter: (id = 1)
         ->  Seq Scan on partition_range_part02 partition_range_2
               Filter: (id = 1)
         ->  Seq Scan on partition_range_part03 partition_range_3
               Filter: (id = 1)
      (7 rows)
    3. Create local indexes on the partitioned table and perform the query again.

      CREATE INDEX partition_range_idx_local ON partition_range(id);
      
      EXPLAIN (COSTS OFF) SELECT * FROM partition_range WHERE id = 1;

      The results indicate that the local indexes of all partitions are scanned because local indexes are created on individual partitions.

                                                  QUERY PLAN
      --------------------------------------------------------------------------------------------------
       Append
         ->  Index Scan using partition_range_part01_id_idx on partition_range_part01 partition_range_1
               Index Cond: (id = 1)
         ->  Index Scan using partition_range_part02_id_idx on partition_range_part02 partition_range_2
               Index Cond: (id = 1)
         ->  Index Scan using partition_range_part03_id_idx on partition_range_part03 partition_range_3
               Index Cond: (id = 1)
      (7 rows)

    4. Create a global index on the partitioned table by using the GLOBAL keyword and perform the query again.

      CREATE INDEX partition_range_idx_global ON partition_range(id) GLOBAL;
      
      EXPLAIN (COSTS OFF) SELECT * FROM partition_range WHERE id = 1;

      The results indicate that the global index is used to find the partition that contains the data.

                                    QUERY PLAN
      -----------------------------------------------------------------------
       Global Index Scan using partition_range_idx_global on partition_range
         Index Cond: (id = 1)
      (2 rows)

    Unique constraint for non-partition keys

    The partitioned table in the preceding example is still used. The partition key is created_date, but the unique constraint column is id.

    CREATE UNIQUE INDEX partition_range_id_unique_idx ON partition_range(id);
    ERROR:  unique constraint on partitioned table must include all partitioning columns
    DETAIL:  UNIQUE constraint on table "partition_range" lacks column "created_date" which is part of the partition key.
    Note

    If you create the unique constraint for a local index of a non-partition key, an error message is returned, indicating that a partition key must be included in the index.

    If you add a unique constraint to a global index, such a restriction is removed.

    CREATE UNIQUE INDEX partition_range_id_unique_idx ON partition_range(id) GLOBAL;

    Performance test

    Use pgbench to generate 80,000 rows of data, and create a partitioned table and a non-partitioned table.

    Point query performance on non-partition keys

    Item

    TPS

    Prepared Statement

    Not used

    Used

    Concurrency

    1

    32

    64

    1

    32

    64

    Common table

    27,732

    494,433

    430,848

    53,935

    985,880

    886,882

    Partitioned table and local index

    367

    4,155

    3,688

    856

    8,742

    6,790

    Partitioned table and global index

    19,006

    308,128

    262,941

    45,090

    820,924

    731,557

    TPC-B performance on non-partition keys

    Note

    Both point queries and DML are included.

    Item

    TPS

    Prepared Statement

    Not used

    Used

    Concurrency

    1

    32

    64

    1

    32

    64

    Common table

    1,115

    51,025

    60,409

    4,822

    90,312

    100,802

    Partitioned table and local index

    271

    2,903

    2,524

    550

    5,276

    4,237

    Partitioned table and global index

    Not supported

    4,334

    69,040

    75,232

    Conclusions

    Global indexes can exponentially improve performance for point queries and DML.