By digoal
How can a single computer discover target customers across up to 100 TB of data in milliseconds? You can do this with PostgreSQL.
PostgreSQL is the most advanced open-source database globally and offers many options to solve our data-related problems. We have always been able to solve business problems based on advanced PostgreSQL features.
Compared with other index types, a Block Range Index (BRIN) is far more cost-effective, so it is the recommended choice in PostgreSQL. All it requires is that the data follow an ordering rule. Moreover, the data used in advertising services is generally append-only and static, so a data normalization step can be added to the business logic without complicating it, and this approach has been verified on the application side.
The business scenario is the same as that introduced in the previous two documents.
For example, how are a store's target customers discovered?
We first need to collect some data to answer this question. For example:
1) Users who browse and make purchases at this and other similar stores.
Browsing online stores generates behavior records, for instance, the stores and commodities we browse, the products we purchase, the stores where we make those purchases, and the times at which all of this happens.
Then for each store, we can extract a sample of users who browsed or made purchases.
2) After obtaining these user groups, we filter them for similar purchase intentions or other attributes.
We can analyze the attributes of these groups to form a larger target group and market to them.
1) 100 million daily active users
2) 100 million daily active stores
3) Each user visits 64 stores daily on average
4) 6.4 billion data points
1) 500 million weekly active users
2) 200 million weekly active stores
3) Each user views 256 stores each week on average
4) 128 billion data points in total
We assume that the volume may grow up to 10 times during promotional operations, reaching a maximum of about 1.28 trillion data points.
1) New data is generated in the user-visit statistics table every day by an analytics system. This data is imported in parallel into Alibaba Cloud RDS for PostgreSQL through its Object Storage Service (OSS) external table interface.
create table bi_user_tmall_vis(
uid int8, -- user ID
bid int8, -- store ID (product IDs are kept in other tables; their structure and query requirements are similar, so they are not covered here)
cnt int -- number of views (product view counts and purchase counts are kept in other tables; their structure and query requirements are similar, so they are not covered here)
);
1) Querying the user IDs that have browsed a specified store at, within, or above a given frequency.
This operation discovers the target customers of a specified store. The idea is that customers who have viewed your products must have some interest in them.
2) The same kind of condition as above, but with several branches combined by OR (for example, several stores at once).
The data must be normalized (physically reordered) by store ID and view count to make queries more efficient.
The normalization method is as follows:
insert into bi_user_tmall_vis1 select * from bi_user_tmall_vis order by bid,cnt;
After normalization, a composite BRIN index on bid and cnt is created.
create index idx_bi1 on bi_user_tmall_vis1 using brin (bid,cnt) WITH (pages_per_range='256');
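After the normalized load, it is worth confirming that the physical correlation of bid is actually close to 1, since that is what makes each BRIN block range tight. A minimal check against the standard planner statistics (nothing here is specific to this demo beyond the table name):
-- Refresh statistics, then inspect the physical correlation of the sort keys.
-- After loading with "order by bid, cnt", bid's correlation should be close to 1.
analyze bi_user_tmall_vis1;
select attname, n_distinct, correlation
from pg_stats
where tablename = 'bi_user_tmall_vis1'
  and attname in ('bid', 'cnt');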
We may also need to filter other customer attributes, such as gender, purchasing frequency, and age range.
That is to say, the real table structure is not as simple as the one above. Nevertheless, we have trimmed it down to keep the demo simple.
We can use three optimization methods when we have multiple query requirements:
1) Using a composite (multi-column) index
2) Creating multiple indexes and letting PostgreSQL combine them with bitmapAnd and bitmapOr, then scanning only the matching blocks
3) Using table partitioning, i.e., using the other query conditions as partitioning fields
All of these methods aim to reduce the scanning volume and improve query efficiency.
Alibaba Cloud AnalyticDB for PostgreSQL supports partitioning syntax on multiple levels.
PostgreSQL can achieve multi-level partitioning using multi-level inheritance. Meanwhile, both PostgreSQL 10 and the plug-in pg_pathman support multi-level partitioning.
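As a rough sketch only, this is what a two-level layout could look like with the declarative syntax available since PostgreSQL 10; the gender column and the partition boundaries are hypothetical, borrowed from the extra filter attributes mentioned above rather than from the demo table:
-- Hypothetical two-level layout: list-partition by gender, then range-partition by bid.
create table bi_user_tmall_vis_p (
  uid int8,
  bid int8,
  cnt int,
  gender int2
) partition by list (gender);

-- Each first-level partition is itself partitioned by store ID ranges.
create table bi_user_tmall_vis_p_m partition of bi_user_tmall_vis_p
  for values in (0) partition by range (bid);

create table bi_user_tmall_vis_p_m_1 partition of bi_user_tmall_vis_p_m
  for values from (0) to (50000000);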
Method 2 leverages PostgreSQL's built-in bitmapAnd and bitmapOr scans, which combine multiple indexes.
Example:
create table test(c1 int , c2 int, c3 int);
Index 1
create index idx1 on test (c1);
Index 2
create index idx3 on test (c3);
If you run a query based on these two indexes, PostgreSQL automatically combines them.
-- bitmapAnd scan
select * from test where c1 = ? and c3=?;
-- bitmapOr scan
select * from test where c1 = ? or c3=?;
1) Write 6.4 billion pieces of test data.
vi test.sql
insert into bi_user_tmall_vis select random()*2000000000,random()*100000000,random()*1000 from generate_series(1,10000);
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -t 20000
Table size:
postgres=# \dt+ bi_user_tmall_vis
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------------+-------+----------+--------+-------------
public | bi_user_tmall_vis | table | postgres | 311 GB |
(1 row)
2) Perform data normalization.
create table bi_user_tmall_vis1 (like bi_user_tmall_vis);
nohup psql -c "set work_mem='128GB';set maintenance_work_mem='128GB';insert into bi_user_tmall_vis1 select * from bi_user_tmall_vis order by bid,cnt;" >/dev/null 2>&1 &
3) Create a BRIN index.
create index idx_bi on bi_user_tmall_vis1 using brin (bid,cnt) WITH (pages_per_range='512');
Size of the index:
public | idx_bi | index | postgres | bi_user_tmall_vis1 | 3336 kB |
Only about 3 MB. Remarkable: for a 311 GB table, the index is just 3 MB in size.
4) Filter the users who have browsed a given store between N and M times.
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where bid=1 and cnt between 1 and 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bi_user_tmall_vis1 (cost=521.47..105255.40 rows=7351 width=20) (actual time=16.024..25.791 rows=4 loops=1)
Output: uid, bid, cnt
Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))
Rows Removed by Index Recheck: 80380
Heap Blocks: lossy=512
Buffers: shared hit=529 read=511
-> Bitmap Index Scan on idx_bi (cost=0.00..519.63 rows=80384 width=0) (actual time=16.010..16.010 rows=5120 loops=1)
Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))
Buffers: shared hit=528
Planning time: 0.238 ms
Execution time: 25.822 ms
(11 rows)
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) or (bid=12000 and cnt <10000);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.bi_user_tmall_vis1 (cost=1674.17..315338.06 rows=153721 width=20) (actual time=47.115..78.014 rows=138 loops=1)
Output: uid, bid, cnt
Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)) OR ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000)))
Rows Removed by Index Recheck: 241014
Heap Blocks: lossy=1536
Buffers: shared hit=2608 read=512
-> BitmapOr (cost=1674.17..1674.17 rows=241151 width=0) (actual time=47.099..47.099 rows=0 loops=1)
Buffers: shared hit=1584
-> Bitmap Index Scan on idx_bi (cost=0.00..519.63 rows=80384 width=0) (actual time=16.167..16.167 rows=5120 loops=1)
Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))
Buffers: shared hit=528
-> Bitmap Index Scan on idx_bi (cost=0.00..519.63 rows=80384 width=0) (actual time=15.494..15.494 rows=5120 loops=1)
Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))
Buffers: shared hit=528
-> Bitmap Index Scan on idx_bi (cost=0.00..519.63 rows=80384 width=0) (actual time=15.437..15.437 rows=5120 loops=1)
Index Cond: ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000))
Buffers: shared hit=528
Planning time: 0.145 ms
Execution time: 78.062 ms
(19 rows)
| Indexing Accuracy | Rows in a Single Table | Size of a Single Table | Index Size | 1 Condition | 2 Conditions | 3 Conditions |
| --- | --- | --- | --- | --- | --- | --- |
| pages_per_range=1 | 6.4 billion | 311 GB | 1.6 GB | 8.2 s | - | - |
| pages_per_range=128 | 6.4 billion | 311 GB | 13 MB | 62 ms | - | 191 ms |
| pages_per_range=256 | 6.4 billion | 311 GB | 6 MB | 33 ms | - | 105 ms |
| pages_per_range=512 | 6.4 billion | 311 GB | 3 MB | 25 ms | - | 78 ms |
| pages_per_range=sqrt(pg_class.relpages)=6384 | 6.4 billion | 311 GB | 300 KB | 97 ms | 112 ms | 139 ms |
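Each row above can be reproduced with the same approach as before; here is a sketch for pages_per_range=128 (rebuild the index at the desired granularity, then re-run the test queries):
drop index if exists idx_bi;
create index idx_bi on bi_user_tmall_vis1 using brin (bid,cnt) WITH (pages_per_range='128');
explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where bid=1 and cnt between 1 and 100;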
As explained above, the I/O cost for BRIN indexing is divided into two parts:
1) Scanning BRIN index blocks
2) Scanning HEAP table blocks
These two scans combined represent the I/O cost; the remaining cost comes from the CPU work needed to filter each record.
I) For a single-condition query, the number of HEAP data blocks that will be hit is determined by two factors; the greater of the two values is taken, which here works out to about 5,000 HEAP data blocks per query.
II) For a single-condition query, the number of BRIN index blocks that must be scanned is determined by the index size.
When pages_per_range=512, the BRIN index size is about 3 MB.
III) So the estimated I/O cost of a single-condition BRIN scan is reading about 3 MB of index blocks plus about 5,000 HEAP blocks.
IV) The estimation method for a multi-condition query is similar.
Taking a three-condition query as an example, the scanning cost (HEAP blocks + BRIN index blocks) is roughly three times that of a single condition.
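The first component can be sanity-checked straight from the catalog; the second is visible as the lossy "Heap Blocks" line in the EXPLAIN output above. A minimal check:
-- BRIN index size, also expressed in 8 KB blocks (about 3 MB here, i.e., a few hundred blocks).
select pg_size_pretty(pg_relation_size('idx_bi')) as brin_index_size,
       pg_relation_size('idx_bi') / 8192          as brin_index_blocks;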
So, how shall we select the BRIN index accuracy parameter pages_per_range?
For example, given a performance target such as millisecond-level execution, how do we compute the block cost of a ten-condition query?
1) Estimate how many records are matched by one equality condition (A):
a) When pg_stats.n_distinct >= 1: A = pg_class.reltuples / pg_stats.n_distinct
b) When pg_stats.n_distinct < 1: A = (pg_class.reltuples * pg_stats.n_distinct) / pg_class.reltuples
c) When pg_stats.n_distinct = -1: A = 1
2) Estimate the correlation (B):
B = abs(pg_stats.correlation)
3) Estimate how many HEAP blocks are occupied by an equivalent condition (C):
C = A / B
4) Estimate how many data blocks are occupied by the BRIN index when pages_per_range=1 (D):
D = pg_class.relpages/(pg_class.reltuples/pg_class.relpages)
5) Estimate how many data blocks are occupied by the BRIN index when pages_per_range=n (E):
E = D / n
6) Estimate how many BRIN index blocks need to be scanned in a ten-condition query when pages_per_range=n (F):
F = 10 * E
7) Estimate how many HEAP blocks need to be scanned in a ten-condition query when pages_per_range=n (G):
G = 10 * C
8) Estimate how many blocks in total need to be scanned in a ten-condition query when pages_per_range=n (H):
H = F + G
You can use these formulas to work out what pages_per_range should be set to in order to achieve millisecond-level responses for a ten-condition query.
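As a hedged sketch, the whole estimate can be computed in one catalog query. The table and column names are the ones from this article, n is set to 512 and the condition count to 10, and the n_distinct handling follows step 1 verbatim:
-- Note: assumes a non-zero correlation for the bid column.
with s as (
  select c.reltuples::numeric         as reltuples,
         c.relpages::numeric          as relpages,
         st.n_distinct::numeric       as n_distinct,
         abs(st.correlation)::numeric as b
  from pg_class c
  join pg_stats st on st.tablename = c.relname
  where c.relname = 'bi_user_tmall_vis1'
    and st.attname = 'bid'
), est as (
  select
    case
      when n_distinct >= 1 then reltuples / n_distinct   -- step 1a
      when n_distinct = -1 then 1                        -- step 1c
      else (reltuples * n_distinct) / reltuples          -- step 1b
    end as a,
    b,                                                   -- step 2
    relpages / (reltuples / relpages) as d               -- step 4
  from s
)
select a                             as a_rows_per_condition,
       b                             as b_correlation,
       a / b                         as c_heap_blocks_per_condition,  -- step 3
       d / 512                       as e_brin_blocks,                -- step 5, n = 512
       10 * (d / 512)                as f_brin_blocks_10_conditions,  -- step 6
       10 * (a / b)                  as g_heap_blocks_10_conditions,  -- step 7
       10 * (d / 512) + 10 * (a / b) as h_total_blocks_10_conditions  -- step 8
from est;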
According to the weekly statistics above, the volume is usually about 128 billion data points and can reach roughly 1.28 trillion during promotional operations.
We tested a single table with 6.4 billion rows above and got millisecond-level query responses.
So what about a table with 1.28 trillion data points? This can be handled with partition tables, partitioned by store or product ID.
Take hash partitioning as an example:
Hash-partition by store into 64 partitions, giving about 100 million data points per partition for the daily data.
Hash-partition by store into 640 partitions, giving about 200 million to 2 billion data points per partition for the weekly data, up to the operational peak.
A single table with 6.4 billion data points has already been tested and its performance proven, so there is nothing to worry about for partitions that each hold only a few hundred million data points.
For an introduction to how to use partitioning in PostgreSQL with pg_pathman, see: https://github.com/postgrespro/pg_pathman
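If pg_pathman is used, here is a hedged sketch of the 64-partition hash layout discussed above (create_hash_partitions is a documented pg_pathman function; by default it also moves the existing rows into the partitions):
create extension if not exists pg_pathman;
-- 64 hash partitions keyed on the store ID.
select create_hash_partitions('bi_user_tmall_vis1'::regclass, 'bid', 64);
-- Normalization and BRIN indexes can then be applied per partition as shown earlier.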
The following are the key technical points behind this document:
1) BRIN, which filters data efficiently at almost no storage cost and delivers millisecond-level responses when discovering the target customers of any given store in a single table with 6.4 billion data points.
2) Data normalization, which improves the linear (physical) correlation of fields and makes each BRIN block range almost perfectly tight.
Data rules and acceleration methods:
a) Writing the targets in order when importing.
b) PostgreSQL 11 parallel sorting (see the sketch after this list).
c) Partition table parallel sorting.
3) Partitioning + data normalization, a data optimization method used for volumes with 1 trillion or more data points.
4) HDB normalization + metascan. Metascan is similar to BRIN; it is a kernel feature added by Alibaba Cloud to HDB for PostgreSQL that the original Greenplum does not have.
5) Parallel append scan. After being split into multiple partition tables, PostgreSQL can then parallel-scan the partition tables. This can improve overall performance.
6) Multi-field indexing that improves data filter accuracy, reduces the amount scanned, and improves query performance by combining bitmapAnd and bitmapOr.
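For point 2b above, a minimal sketch of how the normalization sort could be parallelized: PostgreSQL 11 allows CREATE TABLE ... AS to use a parallel plan, so the ORDER BY may be served by a parallel sort. The settings below are illustrative, not tuning advice:
-- Illustrative settings only; size them to the machine.
set max_parallel_workers_per_gather = 16;
set work_mem = '1GB';
-- The sort feeding this CTAS may run in parallel on PostgreSQL 11.
create table bi_user_tmall_vis1 as
select * from bi_user_tmall_vis order by bid, cnt;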
Related articles:
1) Comparison Between PostgreSQL 10 and Greenplum in Spatiotemporal Object Selection and Data Pivoting
2) Spatial Search: Geometry and GiST Combination Outperforms Geohash and B-tree