×
Community Blog Quick Creation of a Large Number of Partition Indexes in PostgreSQL

Quick Creation of a Large Number of Partition Indexes in PostgreSQL

This article answers the following question, “How can I quickly create indexes for all partition tables without blocking DML operations?”

By digoal

Background

Suppose I have a partition table with tens of thousands of partitions. If I need to create indexes on these partitions, what should I do?

  1. I am lazy.
  2. I'm afraid that index creation will affect my business. (DML blocking occurs if indexes are not created using the concurrently statement.)
  3. I don't want to write too many SQL statements.
  4. I want to do it quickly. As far as I know, concurrently has been improved after PostgreSQL 12. For earlier versions, if there are a large number of DML operations, indexes created using concurrently will be slow.
  5. There are many partitions. If only one do command is used, the relevant DML operations will be blocked from the beginning to the end. If it takes a long time, I am afraid of affecting my business.
  6. Each partition is small. It takes a few seconds to add indexes to a partition (a few seconds have less impact on the business).

How can I quickly create indexes for all partition tables without blocking DML operations (or by temporarily blocking DML operations)?

The answer is procedure (PostgreSQL 11 is used as an example):

set lock_timeout=xx;  -- 避免长时间等锁导致雪崩.  
loop xx..xxxx   
  create index [if not exists] [concurrently] ?;  -- 仅仅在这个过程中堵塞dml  
  commit or rollback;  -- 每创建一个索引后都结束事务, 释放这个索引相关的锁.   
end loop;  
reset lock_timeout;  

Example

If you want to create multiple indexes on one table, the steps are the same as creating indexes on multiple tables. You only need to modify the content of the stored procedure:

create table tab2(uid int, info text);  
insert into tab2 select generate_series(1,100), random()::text;  

Create a function to execute SQL statements and configure the lock timeout period:

create or replace function exec_sql(text,text) returns boolean as $$  
declare  
begin  
  execute format('set lock_timeout=%L', $1);  
  execute $2;  
  reset lock_timeout;  
  return true;  
exception when others then  
  reset lock_timeout;  
  return false;  
end;  
$$ language plpgsql strict;  

Create a stored procedure to create a bunch of indexes on tab2:

create or replace procedure do_sqls() as $$  
declare  
begin  
for i in 1..65 loop  
  if exec_sql('1s', format('create index IF NOT EXISTS idx_%s on tab2 (uid)', i)) then  
    raise notice 'success index: %', 'idx_'||i;   
    commit;  -- 每个分区索引创建后, 结束事务, 自动释放锁  
  else  
    rollback;  -- 每个分区索引创建后, 结束事务, 自动释放锁  
    raise notice 'not success, lock_timeout index: %', 'idx_'||i;   
  end if;  
  -- perform pg_sleep(5);  -- 加一个sleep可以拉长整个时间, 从而有时间窗口可以模拟dml不堵塞的情况.    
end loop;  
  -- reset lock_timeout;    
end;  
$$ language plpgsql;  

Check the notice. If there are partitions unsuccessfully added with indexes (for example, the operation may fail due to lock_timeout), repeat the process until all partitions are added with indexes.

Advantages:

  1. During operations, only the partition where indexes are being created will block DML operations related to this partition.
  2. Indexes are immediately visible after being created.

Note: If you are using versions earlier than PostgreSQL 12 and directly accessing the main table to write, there may still be blocking. Blocking will not occur if you directly access partitions where indexes are not under creation. You can also use versions later than PostgreSQL 12 to solve this problem.

0 1 0
Share on

digoal

282 posts | 24 followers

You may also like

Comments