本文介绍pg_pathman插件的一些常见用法。
背景信息
为了提高分区表的性能,PolarDB PostgreSQL版引入pg_pathman插件。该插件是一款分区管理插件,提供了分区优化机制。
创建pg_pathman插件
如您需要使用pg_pathman插件的分区管理功能,请联系我们操作。
CREATE EXTENSION IF NOT EXISTS pg_pathman;
创建插件成功后可通过以下SQL语句查看版本:
SELECT extname,extversion FROM pg_extension WHERE extname = 'pg_pathman';
返回结果如下:
extname | extversion
------------+------------
pg_pathman | 1.5
(1 row)
插件升级
PolarDB PostgreSQL版会定期对插件进行升级,以提供更优质的数据库服务。而当您需要升级插件版本时,需要将集群升级到最新版本。
插件特性
目前支持HASH分区、RANGE分区。
支持自动分区管理(通过函数接口创建分区,自动将主表数据迁移到分区表),或手工分区管理(通过函数实现,将已有的表绑定到分区表,或者从分区表剥离)。
支持的分区字段类型包括int、float、date以及其他常用类型,包括自定义的domain。
有效的分区表查询计划(JOINs、subselects 等)。
使用
RuntimeAppend
&RuntimeMergeAppend
自定义计划节点实现了动态分区选择。PartitionFilter
是一种有效的插入触发器替换方法。支持自动新增分区(目前仅支持RANGE分区表)。
支持
copy from/to
直接读取或写入分区表,提高效率。支持分区字段的更新,需要添加触发器,如果不需要更新分区字段,则不建议添加这个触发器,对应会产生一定的性能影响。
允许用户自定义回调函数,在创建分区时会自动触发。
非堵塞式创建分区表,以及后台自动将主表数据非堵塞式迁移到分区表。
支持FDW,通过配置参数
pg_pathman.insert_into_fdw=(disabled | postgres | any_fdw)
支持postgres_fdw或任意FDW。
插件用法
更多用法,请参见GitHub。
相关视图和表
pg_pathman利用函数来维护分区表,并创建了一些视图,以便于查看分区表的状态。具体如下:
pathman_config
CREATE TABLE IF NOT EXISTS pathman_config ( partrel REGCLASS NOT NULL PRIMARY KEY, -- 主表oid attname TEXT NOT NULL, -- 分区列名 parttype INTEGER NOT NULL, -- 分区类型(hash or range) range_interval TEXT, -- range分区的interval CHECK (parttype IN (1, 2)) /* check for allowed part types */ );
pathman_config_params
CREATE TABLE IF NOT EXISTS pathman_config_params ( partrel REGCLASS NOT NULL PRIMARY KEY, -- 主表oid enable_parent BOOLEAN NOT NULL DEFAULT TRUE, -- 是否在优化器中过滤主表 auto BOOLEAN NOT NULL DEFAULT TRUE, -- insert时是否自动扩展不存在的分区 init_callback REGPROCEDURE NOT NULL DEFAULT 0); -- create partition时的回调函数oid
pathman_concurrent_part_tasks
-- helper SRF function CREATE OR REPLACE FUNCTION show_concurrent_part_tasks() RETURNS TABLE ( userid REGROLE, pid INT, dbid OID, relid REGCLASS, processed INT, status TEXT) AS 'pg_pathman', 'show_concurrent_part_tasks_internal' LANGUAGE C STRICT; CREATE OR REPLACE VIEW pathman_concurrent_part_tasks AS SELECT * FROM show_concurrent_part_tasks();
pathman_partition_list
-- helper SRF function CREATE OR REPLACE FUNCTION show_partition_list() RETURNS TABLE ( parent REGCLASS, partition REGCLASS, parttype INT4, partattr TEXT, range_min TEXT, range_max TEXT) AS 'pg_pathman', 'show_partition_list_internal' LANGUAGE C STRICT; CREATE OR REPLACE VIEW pathman_partition_list AS SELECT * FROM show_partition_list();
分区管理
RANGE分区
有四个管理函数用来创建范围分区。其中两个可以指定起始值、间隔、分区个数,其函数定义如下:
create_range_partitions(relation REGCLASS, -- 主表OID
attribute TEXT, -- 分区列名
start_value ANYELEMENT, -- 开始值
p_interval ANYELEMENT, -- 间隔;任意类型,适合任意类型的分区表
p_count INTEGER DEFAULT NULL, -- 分多少个区
partition_data BOOLEAN DEFAULT TRUE) -- 是否立即将数据从主表迁移到分区, 不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )
create_range_partitions(relation REGCLASS, -- 主表OID
attribute TEXT, -- 分区列名
start_value ANYELEMENT, -- 开始值
p_interval INTERVAL, -- 间隔;interval 类型,用于时间分区表
p_count INTEGER DEFAULT NULL, -- 分多少个区
partition_data BOOLEAN DEFAULT TRUE) -- 是否立即将数据从主表迁移到分区, 不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )
另外两个可以指定起始值、终值、间隔,其定义如下:
create_partitions_from_range(relation REGCLASS, -- 主表OID
attribute TEXT, -- 分区列名
start_value ANYELEMENT, -- 开始值
end_value ANYELEMENT, -- 结束值
p_interval ANYELEMENT, -- 间隔;任意类型,适合任意类型的分区表
partition_data BOOLEAN DEFAULT TRUE) -- 是否立即将数据从主表迁移到分区, 不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )
create_partitions_from_range(relation REGCLASS, -- 主表OID
attribute TEXT, -- 分区列名
start_value ANYELEMENT, -- 开始值
end_value ANYELEMENT, -- 结束值
p_interval INTERVAL, -- 间隔;interval 类型,用于时间分区表
partition_data BOOLEAN DEFAULT TRUE) -- 是否立即将数据从主表迁移到分区, 不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )
示例如下所示:
创建需要分区的主表,并插入测试数据。
--- 创建需要分区的主表 CREATE TABLE part_test(id int, info text, crt_time timestamp not null); -- 分区列必须有not null约束 --- 插入测试数据,模拟已存在数据的主表 INSERT INTO part_test SELECT id,md5(random()::text),clock_timestamp() + (id||' hour')::interval from generate_series(1,10000) t(id);
查询主表的数据情况:
SELECT * FROM part_test limit 10;
返回结果如下:
id | info | crt_time ----+----------------------------------+---------------------------- 1 | 36fe1adedaa5b848caec4941f87d443a | 2016-10-25 10:27:13.206713 2 | c7d7358e196a9180efb4d0a10269c889 | 2016-10-25 11:27:13.206893 3 | 005bdb063550579333264b895df5b75e | 2016-10-25 12:27:13.206904 4 | 6c900a0fc50c6e4da1ae95447c89dd55 | 2016-10-25 13:27:13.20691 5 | 857214d8999348ed3cb0469b520dc8e5 | 2016-10-25 14:27:13.206916 6 | 4495875013e96e625afbf2698124ef5b | 2016-10-25 15:27:13.206921 7 | 82488cf7e44f87d9b879c70a9ed407d4 | 2016-10-25 16:27:13.20693 8 | a0b92547c8f17f79814dfbb12b8694a0 | 2016-10-25 17:27:13.206936 9 | 2ca09e0b85042b476fc235e75326b41b | 2016-10-25 18:27:13.206942 10 | 7eb762e1ef7dca65faf413f236dff93d | 2016-10-25 19:27:13.206947 (10 rows)
创建分区,每个分区包含1个月的跨度数据 。
--- 创建分区,每个分区包含1个月的跨度数据 SELECT create_range_partitions('part_test'::regclass, -- 主表OID 'crt_time', -- 分区列名 '2016-10-25 00:00:00'::timestamp, -- 开始值 interval '1 month', -- 间隔;interval 类型,用于时间分区表 24, -- 分多少个区 false) ; -- 不迁移数据
使用非堵塞式的迁移接口,迁移主表数据。
--- 未迁移数据时,数据还在主表 SELECT count(*) FROM ONLY part_test; count ------- 10000 (1 row) --- 非堵塞式的迁移接口 partition_table_concurrently(relation REGCLASS, -- 主表OID batch_size INTEGER DEFAULT 1000, -- 一个事务批量迁移多少记录 sleep_time FLOAT8 DEFAULT 1.0) -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务。 --- 使用非堵塞式的迁移接口,迁移主表数据 SELECT partition_table_concurrently('part_test'::regclass, 10000, 1.0); --- 迁移结束后,主表数据已不存在,所有数据均已转移至分区中 SELECT count(*) FROM ONLY part_test; count ------- 0 (1 row)
数据迁移完成后,建议禁用主表,以避免在执行计划中出现主表。
--- 禁用主表 SELECT set_enable_parent('part_test'::regclass, false); --- 验证 EXPLAIN SELECT * FROM part_test WHERE crt_time = '2016-10-25 00:00:00'::timestamp; QUERY PLAN --------------------------------------------------------------------------------- Append (cost=0.00..16.18 rows=1 width=45) -> Seq Scan on part_test_1 (cost=0.00..16.18 rows=1 width=45) Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone) (3 rows)
在RANGE分区表使用过程中,建议您遵循以下原则:
分区列必须有not null约束。
分区个数必须能覆盖已有的所有记录。
使用非堵塞式迁移接口。
数据迁移完成后,禁用主表。
HASH分区
有一个管理函数用来创建范围分区,可以指定起始值、间隔、分区个数,具体如下:
create_hash_partitions(relation REGCLASS, -- 主表OID
attribute TEXT, -- 分区列名
partitions_count INTEGER, -- 打算创建多少个分区
partition_data BOOLEAN DEFAULT TRUE) -- 是否立即将数据从主表迁移到分区, 不建议这么使用, 建议使用非堵塞式的迁移( 调用partition_table_concurrently() )
示例如下所示:
创建需要分区的主表,并插入测试数据。
--- 创建需要分区的主表 CREATE TABLE part_test(id int, info text, crt_time timestamp not null); -- 分区列必须有not null约束 --- 插入测试数据,模拟已存在数据的主表 INSERT INTO part_test SELECT id,md5(random()::text),clock_timestamp() + (id||' hour')::interval FROM generate_series(1,10000) t(id);
查询主表的数据情况:
SELECT * FROM part_test limit 10;
返回结果如下:
id | info | crt_time ----+----------------------------------+---------------------------- 1 | 29ce4edc70dbfbe78912beb7c4cc95c2 | 2016-10-25 10:47:32.873879 2 | e0990a6fb5826409667c9eb150fef386 | 2016-10-25 11:47:32.874048 3 | d25f577a01013925c203910e34470695 | 2016-10-25 12:47:32.874059 4 | 501419c3f7c218e562b324a1bebfe0ad | 2016-10-25 13:47:32.874065 5 | 5e5e22bdf110d66a5224a657955ba158 | 2016-10-25 14:47:32.87407 6 | 55d2d4fd5229a6595e0dd56e13d32be4 | 2016-10-25 15:47:32.874076 7 | 1dfb9a783af55b123c7a888afe1eb950 | 2016-10-25 16:47:32.874081 8 | 41eeb0bf395a4ab1e08691125ae74bff | 2016-10-25 17:47:32.874087 9 | 83783d69cc4f9bb41a3978fe9e13d7fa | 2016-10-25 18:47:32.874092 10 | affc9406d5b3412ae31f7d7283cda0dd | 2016-10-25 19:47:32.874097 (10 rows)
创建分区。
--- 创建128个分区 SELECT create_hash_partitions('part_test'::regclass, -- 主表OID 'crt_time', -- 分区列名 128, -- 计划创建分区个数 false) ; -- 不迁移数据
使用非堵塞式的迁移接口,迁移主表数据。
--- 未迁移数据时,数据还在主表 SELECT count(*) FROM ONLY part_test; count ------- 10000 (1 row) --- 非堵塞式的迁移接口 partition_table_concurrently(relation REGCLASS, -- 主表OID batch_size INTEGER DEFAULT 1000, -- 一个事务批量迁移多少记录 sleep_time FLOAT8 DEFAULT 1.0) -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务。 --- 使用非堵塞式的迁移接口,迁移主表数据 SELECT partition_table_concurrently('part_test'::regclass, 10000, 1.0); --- 迁移结束后,主表数据已不存在,所有数据均已转移至分区中 SELECT count(*) FROM ONLY part_test; count ------- 0 (1 row)
数据迁移完成后,建议禁用主表,以避免在执行计划中出现主表。
--- 禁用主表 SELECT set_enable_parent('part_test'::regclass, false);
验证执行计划操作:
--- 查询单个分区 EXPLAIN SELECT * FROM part_test WHERE crt_time = '2016-10-25 00:00:00'::timestamp; QUERY PLAN --------------------------------------------------------------------------------- Append (cost=0.00..1.91 rows=1 width=45) -> Seq Scan on part_test_122 (cost=0.00..1.91 rows=1 width=45) Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone) (3 rows)
分区表约束如下,可以看出pg_pathman自动完成了转换,如果是传统的继承,
SELECT * FROM part_test WHERE crt_time = '2016-10-25 00:00:00'::timestamp;
这种写法无法筛选分区。\d+ part_test_122 Table "public.part_test_122" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_122_3_check" CHECK (get_hash_part_idx(timestamp_hash(crt_time), 128) = 122) Inherits: part_test
在HASH分区表使用过程中,建议您:
分区列必须有not null约束。
使用非堵塞式迁移接口。
数据迁移完成后,禁用主表。
pg_pathman不会受制于表达式的写法,所以
select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp;
这样的写法也能用于HASH分区的。HASH分区列不局限于int类型的列,会使用HASH函数自动转换。
数据迁移到分区
如果创建分区表时,未将主表数据迁移到分区,那么可以使用非堵塞式的迁移接口,将数据迁移到分区。用法如下:
WITH tmp AS (DELETE FROM 主表 limit xx nowait returning *) INSERT INTO 分区 SELECT * FROM tmp;
或者使用如下语句进行标示,然后执行DELETE和INSERT。
SELECT array_agg(ctid) FROM 主表 limit xx FOR UPDATE nowati;
函数接口如下:
partition_table_concurrently(relation REGCLASS, -- 主表OID
batch_size INTEGER DEFAULT 1000, -- 一个事务批量迁移多少记录
sleep_time FLOAT8 DEFAULT 1.0) -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务。
示例如下所示:
SELECT partition_table_concurrently('part_test'::regclass,
10000,
1.0);
查看后台的数据迁移任务。
SELECT * FROM pathman_concurrent_part_tasks;
分裂范围分区
如果某个分区太大,想分裂为两个分区,可以使用如下方法(目前仅支持RANGE分区表):
split_range_partition(partition REGCLASS, -- 分区oid
split_value ANYELEMENT, -- 分裂值
partition_name TEXT DEFAULT NULL) -- 分裂后新增的分区表名
示例如下所示:
使用RANGE分区示例的分区表,分区表结构如下所示。
\d+ part_test Table "public.part_test" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Child tables: part_test_1, part_test_10, part_test_11, part_test_12, part_test_13, part_test_14, part_test_15, part_test_16, part_test_17, part_test_18, part_test_19, part_test_2, part_test_20, part_test_21, part_test_22, part_test_23, part_test_24, part_test_3, part_test_4, part_test_5, part_test_6, part_test_7, part_test_8, part_test_9 \d+ part_test_1 Table "public.part_test_1" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone) Inherits: part_test
分区分裂。
SELECT split_range_partition('part_test_1'::regclass, -- 分区oid '2016-11-10 00:00:00'::timestamp, -- 分裂值 'part_test_1_2'); -- 分区表名
分裂后的两个表如下:
\d+ part_test_1 Table "public.part_test_1" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-10 00:00:00'::timestamp without time zone) Inherits: part_test \d+ part_test_1_2 Table "public.part_test_1_2" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_1_2_3_check" CHECK (crt_time >= '2016-11-10 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone) Inherits: part_test
此时,数据会自动迁移到另一个分区。
SELECT count(*) FROM part_test_1; count ------- 373 (1 row) SELECT count(*) FROM part_test_1_2; count ------- 360 (1 row)
继承关系如下:
\d+ part_test Table "public.part_test" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Child tables: part_test_1, part_test_10, part_test_11, part_test_12, part_test_13, part_test_14, part_test_15, part_test_16, part_test_17, part_test_18, part_test_19, part_test_1_2, -- 新增的表 part_test_2, part_test_20, part_test_21, part_test_22, part_test_23, part_test_24, part_test_3, part_test_4, part_test_5, part_test_6, part_test_7, part_test_8, part_test_9
合并范围分区
目前仅支持RANGE分区,且必须为相邻分区。调用如下接口:
--- 指定两个需要合并分区
merge_range_partitions(partition1 REGCLASS, partition2 REGCLASS)
示例如下所示:
使用分裂范围分区示例,进行合并。
SELECT merge_range_partitions('part_test_1'::regclass, 'part_test_1_2'::regclass);
说明非相邻分区执行合并将报错,报错如下。
SELECT merge_range_partitions('part_test_2'::regclass, 'part_test_12'::regclass) ; ERROR: merge failed, partitions must be adjacent CONTEXT: PL/pgSQL function merge_range_partitions_internal(regclass,regclass,regclass,anyelement) line 27 at RAISE SQL statement "SELECT public.merge_range_partitions_internal($1, $2, $3, NULL::timestamp without time zone)" PL/pgSQL function merge_range_partitions(regclass,regclass) line 44 at EXECUTE
合并后,会删处其中一个分区表。
\d part_test_1_2 Did not find any relation named "part_test_1_2". \d part_test_1 Table "public.part_test_1" Column | Type | Modifiers ----------+-----------------------------+----------- id | integer | info | text | crt_time | timestamp without time zone | not null Check constraints: "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone) Inherits: part_test SELECT count(*) FROM part_test_1; count ------- 733 (1 row)
添加范围分区
如果主表已完成分区,将来若需增加分区,可以采用多种方法。以下将介绍三种方法:向后添加范围分区、向前添加范围分区以及指定分区起始值添加范围分区。
向后添加范围分区
向后添加范围分区时(即在末尾追加分区),会使用初次创建该分区表时的interval作为间隔。可以在pathman_config中查询每个分区表初次创建时的interval,如下:
SELECT * FROM pathman_config;
partrel | attname | parttype | range_interval
-----------+----------+----------+----------------
part_test | crt_time | 2 | 1 mon
(1 row)
添加分区接口(目前不支持指定表空间)。
append_range_partition(parent REGCLASS, -- 主表OID
partition_name TEXT DEFAULT NULL, -- 新增的分区表名, 默认不需要输入
tablespace TEXT DEFAULT NULL) -- 新增的分区表放到哪个表空间, 默认不需要输入
示例如下所示:
SELECT append_range_partition('part_test'::regclass);
\d+ part_test_25
Table "public.part_test_25"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_25_3_check" CHECK (crt_time >= '2018-10-25 00:00:00'::timestamp without time zone AND crt_time < '2018-11-25 00:00:00'::timestamp without time zone)
Inherits: part_test
\d+ part_test_24
Table "public.part_test_24"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_24_3_check" CHECK (crt_time >= '2018-09-25 00:00:00'::timestamp without time zone AND crt_time < '2018-10-25 00:00:00'::timestamp without time zone)
Inherits: part_test
向前添加范围分区
向前添加范围分区,接口如下:
prepend_range_partition(parent REGCLASS,
partition_name TEXT DEFAULT NULL,
tablespace TEXT DEFAULT NULL)
示例如下所示:
SELECT prepend_range_partition('part_test'::regclass);
\d+ part_test_26
Table "public.part_test_26"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_26_3_check" CHECK (crt_time >= '2016-09-25 00:00:00'::timestamp without time zone AND crt_time < '2016-10-25 00:00:00'::timestamp without time zone)
Inherits: part_test
\d+ part_test_1
Table "public.part_test_1"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)
Inherits: part_test
指定分区起始值添加分区
指定分区起始值的方式添加范围分区,只要新创建的分区与现有分区之间不存在数据重叠,即可成功创建。也就是说,采用这种方法并不要求强制创建连续的分区。例如已有分区覆盖了2010-2015的范围,您可以直接创建一个2020年的分区表,不需要覆盖2015到2020的范围。接口如下:
add_range_partition(relation REGCLASS, -- 主表OID
start_value ANYELEMENT, -- 起始值
end_value ANYELEMENT, -- 结束值
partition_name TEXT DEFAULT NULL, -- 分区名
tablespace TEXT DEFAULT NULL) -- 分区创建在哪个表空间下
示例如下所示:
SELECT add_range_partition('part_test'::regclass, -- 主表OID
'2020-01-01 00:00:00'::timestamp, -- 起始值
'2020-02-01 00:00:00'::timestamp); -- 结束值
\d+ part_test_27
Table "public.part_test_27"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_27_3_check" CHECK (crt_time >= '2020-01-01 00:00:00'::timestamp without time zone AND crt_time < '2020-02-01 00:00:00'::timestamp without time zone)
Inherits: part_test
删除分区
删除单个范围分区,接口如下:
drop_range_partition(partition TEXT, -- 分区名称 delete_data BOOLEAN DEFAULT TRUE) -- 是否删除分区数据,如果false,表示分区数据迁移到主表。 Drop RANGE partition and all of its data if delete_data is true.
删除所有分区,并且指定是否要将数据迁移到主表。接口如下:
drop_partitions(parent REGCLASS, delete_data BOOLEAN DEFAULT FALSE) Drop partitions of the parent table (both foreign and local relations). If delete_data is false, the data is copied to the parent table first. Default is false.
示例如下所示:
删除分区,数据迁移到主表。
SELECT drop_range_partition('part_test_1',false); SELECT drop_range_partition('part_test_2',false);
查询当前主表数据量:
SELECT count(*) FROM part_test; count ------- 10000 (1 row)
删除分区,同时删除分区数据,分区数据不迁移至主表。
SELECT drop_range_partition('part_test_3',true);
查询当前主表数据量:
SELECT count(*) FROM part_test; count ------- 9256 (1 row) SELECT count(*) FROM ONLY part_test; count ------- 1453 (1 row)
删除所有分区。
SELECT drop_partitions('part_test'::regclass, false); -- 删除所有分区表,并将数据迁移到主表
查询主表数据:
SELECT count(*) FROM part_test; count ------- 9256 (1 row)
绑定分区
将已有的表,绑定到已有的某个分区主表。已有的表与主表要保持一致的结构,包括dropped columns(查看pg_attribute的一致性)。接口如下:
attach_range_partition(relation REGCLASS, -- 主表OID
partition REGCLASS, -- 分区表OID
start_value ANYELEMENT, -- 起始值
end_value ANYELEMENT) -- 结束值
示例如下所示:
创建作为分区的表。
CREATE TABLE part_test_1 (like part_test including all);
将已有的表绑定至主表。
\d+ part_test Table "public.part_test" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | \d+ part_test_1 Table "public.part_test_1" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | SELECT attach_range_partition('part_test'::regclass, 'part_test_1'::regclass, '2019-01-01 00:00:00'::timestamp, '2019-02-01 00:00:00'::timestamp);
绑定分区时,将自动创建继承关系,自动创建约束。
\d+ part_test_1 Table "public.part_test_1" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_1_3_check" CHECK (crt_time >= '2019-01-01 00:00:00'::timestamp without time zone AND crt_time < '2019-02-01 00:00:00'::timestamp without time zone) Inherits: part_test
解绑分区
将分区从主表的继承关系中删除,不删除数据,删除继承关系,删除约束。接口如下:
detach_range_partition(partition REGCLASS) -- 指定分区名,转换为普通表
示例如下所示:
查询当前主表和分区表数据量。
SELECT count(*) FROM part_test; count ------- 9256 (1 row) SELECT count(*) FROM part_test_2; count ------- 733 (1 row)
解绑分区。
SELECT detach_range_partition('part_test_2');
查询当前主表和分区表数据量。
SELECT count(*) FROM part_test_2; count ------- 733 (1 row) SELECT count(*) FROM part_test; count ------- 8523 (1 row)
禁用pg_pathman
您可以针对单个分区主表禁用pg_pathman。接口函数如下:
disable_pathman_for
没有可逆操作,请慎使用。
\sf disable_pathman_for
CREATE OR REPLACE FUNCTION public.disable_pathman_for(parent_relid regclass)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
BEGIN
PERFORM public.validate_relname(parent_relid);
DELETE FROM public.pathman_config WHERE partrel = parent_relid;
PERFORM public.drop_triggers(parent_relid);
/* Notify backend about changes */
PERFORM public.on_remove_partitions(parent_relid);
END
$function$
示例如下所示:
SELECT disable_pathman_for('part_test');
\d+ part_test
Table "public.part_test"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Child tables: part_test_10,
part_test_11,
part_test_12,
part_test_13,
part_test_14,
part_test_15,
part_test_16,
part_test_17,
part_test_18,
part_test_19,
part_test_20,
part_test_21,
part_test_22,
part_test_23,
part_test_24,
part_test_25,
part_test_26,
part_test_27,
part_test_28,
part_test_29,
part_test_3,
part_test_30,
part_test_31,
part_test_32,
part_test_33,
part_test_34,
part_test_35,
part_test_4,
part_test_5,
part_test_6,
part_test_7,
part_test_8,
part_test_9
\d+ part_test_10
Table "public.part_test_10"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | not null | plain | |
Check constraints:
"pathman_part_test_10_3_check" CHECK (crt_time >= '2017-06-25 00:00:00'::timestamp without time zone AND crt_time < '2017-07-25 00:00:00'::timestamp without time zone)
Inherits: part_test
禁用pg_pathman插件后,继承关系和约束不会变化,只是pg_pathman插件不介入custom scan执行计划。禁用pg_pathman插件后的执行计划如下:
EXPLAIN SELECT * FROM part_test WHERE crt_time='2017-06-25 00:00:00'::timestamp;
QUERY PLAN
---------------------------------------------------------------------------------
Append (cost=0.00..16.00 rows=2 width=45)
-> Seq Scan on part_test (cost=0.00..0.00 rows=1 width=45)
Filter: (crt_time = '2017-06-25 00:00:00'::timestamp without time zone)
-> Seq Scan on part_test_10 (cost=0.00..16.00 rows=1 width=45)
Filter: (crt_time = '2017-06-25 00:00:00'::timestamp without time zone)
(5 rows)
高级分区管理
禁用主表
当主表的数据全部迁移到分区后,可以禁用主表。接口函数如下:
set_enable_parent(relation REGCLASS, value BOOLEAN)
Include/exclude parent table into/from query plan.
In original PostgreSQL planner parent table is always included into query plan even if it's empty which can lead to additional overhead.
You can use disable_parent() if you are never going to use parent table as a storage.
Default value depends on the partition_data parameter that was specified during initial partitioning in create_range_partitions() or create_partitions_from_range() functions.
If the partition_data parameter was true then all data have already been migrated to partitions and parent table disabled.
Otherwise it is enabled.
示例如下所示:
SELECT set_enable_parent('part_test', false);
自动扩展分区
范围分区表,允许自动扩展分区。如果新插入的数据不在已有的分区范围内,会自动创建分区。
set_auto(relation REGCLASS, value BOOLEAN)
Enable/disable auto partition propagation (only for RANGE partitioning).
It is enabled by default.
示例如下所示:
查询当前测试表格分区情况。
\d+ part_test Table "public.part_test" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Child tables: part_test_10, part_test_11, part_test_12, part_test_13, part_test_14, part_test_15, part_test_16, part_test_17, part_test_18, part_test_19, part_test_20, part_test_21, part_test_22, part_test_23, part_test_24, part_test_25, part_test_26, part_test_3, part_test_4, part_test_5, part_test_6, part_test_7, part_test_8, part_test_9 \d+ part_test_26 Table "public.part_test_26" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_26_3_check" CHECK (crt_time >= '2018-09-25 00:00:00'::timestamp without time zone AND crt_time < '2018-10-25 00:00:00'::timestamp without time zone) Inherits: part_test \d+ part_test_25 Table "public.part_test_25" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Check constraints: "pathman_part_test_25_3_check" CHECK (crt_time >= '2018-08-25 00:00:00'::timestamp without time zone AND crt_time < '2018-09-25 00:00:00'::timestamp without time zone) Inherits: part_test
插入一个不在已有分区范围的值,将根据创建分区时的interval自动扩展若干个分区,需要提醒您的是这个操作耗时可能较长。
INSERT INTO part_test VALUES (1,'test','2222-01-01'::timestamp);
查询此时的分区情况:
\d+ part_test Table "public.part_test" Column | Type | Modifiers | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+--------------+------------- id | integer | | plain | | info | text | | extended | | crt_time | timestamp without time zone | not null | plain | | Child tables: part_test_10, part_test_100, part_test_1000, part_test_1001, ......
不建议开启自动扩展范围分区,不合理的自动扩展可能会消耗大量的时间。
回调函数
回调函数是在每创建一个分区时会自动触发调用的函数。例如,可以用在DDL逻辑复制中,将DDL语句记录下来,存放到表中。回调函数如下:
set_init_callback(relation REGCLASS, callback REGPROC DEFAULT 0)
Set partition creation callback to be invoked for each attached or created partition (both HASH and RANGE).
The callback must have the following signature:
part_init_callback(args JSONB) RETURNS VOID.
Parameter arg consists of several fields whose presence depends on partitioning type:
/* RANGE-partitioned table abc (child abc_4) */
{
"parent": "abc",
"parttype": "2",
"partition": "abc_4",
"range_max": "401",
"range_min": "301"
}
/* HASH-partitioned table abc (child abc_0) */
{
"parent": "abc",
"parttype": "1",
"partition": "abc_0"
}
示例如下所示:
创建回调函数。
CREATE OR REPLACE FUNCTION f_callback_test(jsonb) RETURNS void AS $$ DECLARE BEGIN CREATE TABLE if NOT EXISTS rec_part_ddl(id serial primary key, parent name, parttype int, partition name, range_max text, range_min text); if ($1->>'parttype')::int = 1 then raise notice 'parent: %, parttype: %, partition: %', $1->>'parent', $1->>'parttype', $1->>'partition'; INSERT INTO rec_part_ddl(parent, parttype, partition) values (($1->>'parent')::name, ($1->>'parttype')::int, ($1->>'partition')::name); elsif ($1->>'parttype')::int = 2 then raise notice 'parent: %, parttype: %, partition: %, range_max: %, range_min: %', $1->>'parent', $1->>'parttype', $1->>'partition', $1->>'range_max', $1->>'range_min'; INSERT INTO rec_part_ddl(parent, parttype, partition, range_max, range_min) values (($1->>'parent')::name, ($1->>'parttype')::int, ($1->>'partition')::name, $1->>'range_max', $1->>'range_min'); END if; END; $$ LANGUAGE plpgsql strict;
准备测试表。
CREATE TABLE tt(id int, info text, crt_time timestamp not null); --- 设置测试表的回调函数 SELECT set_init_callback('tt'::regclass, 'f_callback_test'::regproc); --- 创建分区 SELECT create_range_partitions('tt'::regclass, -- 主表OID 'crt_time', -- 分区列名 '2016-10-25 00:00:00'::timestamp, -- 开始值 interval '1 month', -- 间隔;interval 类型,用于时间分区表 24, -- 分多少个区 false) ;
检查回调函数是否已调用。
SELECT * FROM rec_part_ddl;
返回结果如下:
id | parent | parttype | partition | range_max | range_min ----+--------+----------+-----------+---------------------+--------------------- 1 | tt | 2 | tt_1 | 2016-11-25 00:00:00 | 2016-10-25 00:00:00 2 | tt | 2 | tt_2 | 2016-12-25 00:00:00 | 2016-11-25 00:00:00 3 | tt | 2 | tt_3 | 2017-01-25 00:00:00 | 2016-12-25 00:00:00 4 | tt | 2 | tt_4 | 2017-02-25 00:00:00 | 2017-01-25 00:00:00 5 | tt | 2 | tt_5 | 2017-03-25 00:00:00 | 2017-02-25 00:00:00 6 | tt | 2 | tt_6 | 2017-04-25 00:00:00 | 2017-03-25 00:00:00 7 | tt | 2 | tt_7 | 2017-05-25 00:00:00 | 2017-04-25 00:00:00 8 | tt | 2 | tt_8 | 2017-06-25 00:00:00 | 2017-05-25 00:00:00 9 | tt | 2 | tt_9 | 2017-07-25 00:00:00 | 2017-06-25 00:00:00 10 | tt | 2 | tt_10 | 2017-08-25 00:00:00 | 2017-07-25 00:00:00 11 | tt | 2 | tt_11 | 2017-09-25 00:00:00 | 2017-08-25 00:00:00 12 | tt | 2 | tt_12 | 2017-10-25 00:00:00 | 2017-09-25 00:00:00 13 | tt | 2 | tt_13 | 2017-11-25 00:00:00 | 2017-10-25 00:00:00 14 | tt | 2 | tt_14 | 2017-12-25 00:00:00 | 2017-11-25 00:00:00 15 | tt | 2 | tt_15 | 2018-01-25 00:00:00 | 2017-12-25 00:00:00 16 | tt | 2 | tt_16 | 2018-02-25 00:00:00 | 2018-01-25 00:00:00 17 | tt | 2 | tt_17 | 2018-03-25 00:00:00 | 2018-02-25 00:00:00 18 | tt | 2 | tt_18 | 2018-04-25 00:00:00 | 2018-03-25 00:00:00 19 | tt | 2 | tt_19 | 2018-05-25 00:00:00 | 2018-04-25 00:00:00 20 | tt | 2 | tt_20 | 2018-06-25 00:00:00 | 2018-05-25 00:00:00 21 | tt | 2 | tt_21 | 2018-07-25 00:00:00 | 2018-06-25 00:00:00 22 | tt | 2 | tt_22 | 2018-08-25 00:00:00 | 2018-07-25 00:00:00 23 | tt | 2 | tt_23 | 2018-09-25 00:00:00 | 2018-08-25 00:00:00 24 | tt | 2 | tt_24 | 2018-10-25 00:00:00 | 2018-09-25 00:00:00 (24 rows)