By digoal
Scenario: N rows match a condition, and M of them must be returned at random. What are the requirements in this scenario, and which method is recommended?
1. BERNOULLI sampling: scans the whole table and returns each row with a probability of N%.
Adding LIMIT N to cut the result down to N rows improves efficiency, but the data is not random enough: it is concentrated in the blocks at the head of the data file:
postgres=# select ctid,* from tbl TABLESAMPLE BERNOULLI (1) limit 10;
ctid | id | loc | beginid | endid
---------+------+------+-----------+-----------
(1,6) | 76 | 2006 | 100034889 | 100035096
(3,8) | 218 | 3280 | 100105401 | 100105571
(4,19) | 299 | 708 | 100145769 | 100146449
(6,25) | 445 | 3195 | 100220431 | 100221192
(7,3) | 493 | 1867 | 100247252 | 100248048
(9,5) | 635 | 2125 | 100318350 | 100319087
(10,51) | 751 | 1151 | 100374936 | 100375883
(12,20) | 860 | 2302 | 100430532 | 100430674
(12,33) | 873 | 15 | 100438908 | 100439548
(17,15) | 1205 | 2540 | 100607913 | 100608198
(10 rows)
Time: 0.387 ms
postgres=# select ctid,* from tbl TABLESAMPLE BERNOULLI (1) limit 10;
ctid | id | loc | beginid | endid
---------+-----+------+-----------+-----------
(1,18) | 88 | 1195 | 100038559 | 100039159
(2,20) | 160 | 4137 | 100076107 | 100076919
(4,62) | 342 | 2480 | 100169826 | 100170454
(6,2) | 422 | 392 | 100209331 | 100209964
(6,4) | 424 | 4239 | 100210066 | 100211053
(8,52) | 612 | 3086 | 100304147 | 100304853
(10,9) | 709 | 2137 | 100354108 | 100354314
(10,45) | 745 | 1808 | 100371819 | 100372439
(13,19) | 929 | 3888 | 100471124 | 100471669
(13,27) | 937 | 609 | 100476280 | 100476579
(10 rows)
Time: 0.301 ms
postgres=# select ctid,* from tbl TABLESAMPLE BERNOULLI (1) limit 10;
ctid | id | loc | beginid | endid
---------+------+------+-----------+-----------
(0,38) | 38 | 486 | 100015564 | 100015931
(0,51) | 51 | 1244 | 100022368 | 100022860
(1,29) | 99 | 4326 | 100044116 | 100044999
(1,63) | 133 | 449 | 100061611 | 100062197
(4,10) | 290 | 755 | 100141357 | 100142271
(5,13) | 363 | 960 | 100178662 | 100179447
(5,64) | 414 | 3800 | 100204948 | 100205770
(11,22) | 792 | 831 | 100395860 | 100396113
(14,17) | 997 | 2487 | 100504353 | 100504505
(16,41) | 1161 | 4053 | 100587279 | 100587984
(10 rows)
Time: 0.507 ms
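Why LIMIT biases a BERNOULLI sample toward the head of the table: rows are scanned in physical order and each is emitted with probability N%, so LIMIT stops after the first few hits. A minimal Python simulation (not from the original article; the table size and percentage are illustrative):

```python
import random

random.seed(42)

def bernoulli_limit(n_rows, pct, limit):
    """Scan rows in physical order; keep each with probability pct and
    stop after `limit` hits -- mirrors BERNOULLI sampling plus LIMIT."""
    hits = []
    for rowid in range(n_rows):
        if random.random() < pct:
            hits.append(rowid)
            if len(hits) == limit:
                break
    return hits

sample = bernoulli_limit(n_rows=1_000_000, pct=0.01, limit=10)
# With p = 1%, the 10th hit is expected around row 1000 of 1,000,000,
# so the sample never leaves the head of the table.
print(sample)
```

This matches the ctid pattern above: every run returns rows from the first few blocks only.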
2. SYSTEM sampling: samples whole blocks, returning the rows of N% of the table's blocks. It is much more efficient than BERNOULLI sampling, but every record in each sampled block is returned.
With LIMIT N, the data is again not random enough: the N rows all come from the first sampled block(s):
postgres=# select ctid,* from tbl TABLESAMPLE system (5) limit 10;
ctid | id | loc | beginid | endid
---------+-----+------+-----------+-----------
(10,1) | 701 | 2675 | 100348960 | 100349937
(10,2) | 702 | 4307 | 100349937 | 100350353
(10,3) | 703 | 475 | 100350353 | 100351093
(10,4) | 704 | 1611 | 100351093 | 100351171
(10,5) | 705 | 4307 | 100351171 | 100351692
(10,6) | 706 | 2841 | 100351692 | 100352448
(10,7) | 707 | 3680 | 100352448 | 100353372
(10,8) | 708 | 1085 | 100353372 | 100354108
(10,9) | 709 | 2137 | 100354108 | 100354314
(10,10) | 710 | 3381 | 100354314 | 100354905
(10 rows)
Time: 0.547 ms
postgres=# select ctid,* from tbl TABLESAMPLE system (5) limit 10;
ctid | id | loc | beginid | endid
---------+------+------+-----------+-----------
(30,1) | 2101 | 2279 | 101047339 | 101047873
(30,2) | 2102 | 4113 | 101047873 | 101048273
(30,3) | 2103 | 47 | 101048273 | 101048691
(30,4) | 2104 | 3766 | 101048691 | 101049398
(30,5) | 2105 | 4133 | 101049398 | 101049842
(30,6) | 2106 | 2915 | 101049842 | 101050766
(30,7) | 2107 | 2489 | 101050766 | 101050799
(30,8) | 2108 | 4325 | 101050799 | 101051401
(30,9) | 2109 | 2204 | 101051401 | 101052374
(30,10) | 2110 | 1397 | 101052374 | 101052726
(10 rows)
Time: 0.457 ms
postgres=# select ctid,* from tbl TABLESAMPLE system (5) limit 10;
ctid | id | loc | beginid | endid
--------+----+------+-----------+-----------
(0,1) | 1 | 234 | 100000000 | 100000295
(0,2) | 2 | 1153 | 100000295 | 100000793
(0,3) | 3 | 4127 | 100000793 | 100001203
(0,4) | 4 | 3287 | 100001203 | 100001450
(0,5) | 5 | 2156 | 100001450 | 100002286
(0,6) | 6 | 1097 | 100002286 | 100002351
(0,7) | 7 | 2302 | 100002351 | 100002625
(0,8) | 8 | 1768 | 100002625 | 100003382
(0,9) | 9 | 39 | 100003382 | 100003630
(0,10) | 10 | 887 | 100003630 | 100004209
(10 rows)
Time: 0.427 ms
postgres=# select ctid,* from tbl TABLESAMPLE system (5) limit 10;
ctid | id | loc | beginid | endid
---------+------+------+-----------+-----------
(43,1) | 3011 | 4174 | 101493150 | 101493628
(43,2) | 3012 | 798 | 101493628 | 101494207
(43,3) | 3013 | 1334 | 101494207 | 101495189
(43,4) | 3014 | 4870 | 101495189 | 101495937
(43,5) | 3015 | 1920 | 101495937 | 101496065
(43,6) | 3016 | 2561 | 101496065 | 101496782
(43,7) | 3017 | 1617 | 101496782 | 101497469
(43,8) | 3018 | 3226 | 101497469 | 101497864
(43,9) | 3019 | 4582 | 101497864 | 101498854
(43,10) | 3020 | 2110 | 101498854 | 101499388
(10 rows)
Time: 1.528 ms
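The behavior above can be sketched in Python: SYSTEM sampling decides per block, so a LIMIT is served entirely from the first sampled block(s). The block and row counts below are illustrative assumptions, not taken from the test table:

```python
import random

random.seed(1)

def system_sample(n_blocks, rows_per_block, pct):
    """Pick each block with probability pct and return every row in it --
    mirrors TABLESAMPLE SYSTEM."""
    rows = []
    for blk in range(n_blocks):
        if random.random() < pct:
            rows.extend((blk, line) for line in range(1, rows_per_block + 1))
    return rows

rows = system_sample(n_blocks=200, rows_per_block=70, pct=0.05)
# LIMIT 10 would take the first 10 rows, all from one sampled block:
first_ten = rows[:10]
print(first_ten)
```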
3. SYSTEM sampling + a random-probability filter solves the problem that the records returned by SYSTEM sampling alone are not discrete enough.
Performance is also good, and the advantage grows with table size.
The dispersion is controlled by the random-probability condition, and the returned results are highly discrete:
postgres=# select ctid,* from tbl TABLESAMPLE system (5) where random()<0.01 limit 10;
ctid | id | loc | beginid | endid
----------+-------+------+-----------+-----------
(3,19) | 229 | 2106 | 100110347 | 100110461
(3,45) | 255 | 4334 | 100124594 | 100125503
(149,43) | 10473 | 4125 | 105216168 | 105216240
(160,12) | 11212 | 787 | 105591453 | 105591718
(160,36) | 11236 | 106 | 105600343 | 105600824
(160,47) | 11247 | 4053 | 105606511 | 105607286
(203,48) | 14258 | 3078 | 107103358 | 107104264
(278,29) | 19489 | 3268 | 109723508 | 109723920
(286,10) | 20030 | 1048 | 109999654 | 110000489
(315,31) | 22081 | 2826 | 111013999 | 111014368
(10 rows)
Time: 2.837 ms
postgres=# select ctid,* from tbl TABLESAMPLE system (5) where random()<0.01 limit 10;
ctid | id | loc | beginid | endid
----------+-------+------+-----------+-----------
(69,26) | 4856 | 7 | 102417440 | 102417597
(71,65) | 5035 | 1092 | 102510582 | 102511527
(77,39) | 5429 | 1120 | 102704924 | 102705238
(152,31) | 10671 | 1466 | 105323336 | 105324170
(176,27) | 12347 | 4757 | 106150494 | 106151243
(176,59) | 12379 | 1195 | 106167151 | 106167416
(206,70) | 14490 | 3052 | 107221746 | 107222087
(229,46) | 16076 | 463 | 107995891 | 107996698
(242,29) | 16969 | 1865 | 108461627 | 108461736
(242,62) | 17002 | 4863 | 108477461 | 108478270
(10 rows)
Time: 2.667 ms
postgres=# select ctid,* from tbl TABLESAMPLE system (5) where random()<0.01 limit 10;
ctid | id | loc | beginid | endid
----------+-------+------+-----------+-----------
(1,54) | 124 | 1346 | 100056875 | 100057832
(9,69) | 699 | 2227 | 100347977 | 100348861
(14,54) | 1034 | 3879 | 100520998 | 100521176
(22,60) | 1600 | 2226 | 100802698 | 100803154
(28,30) | 1990 | 1133 | 100996297 | 100996580
(28,57) | 2017 | 189 | 101009211 | 101009952
(28,69) | 2029 | 2109 | 101015988 | 101016238
(106,1) | 7421 | 2469 | 103702081 | 103702435
(130,67) | 9167 | 1801 | 104582977 | 104583754
(149,20) | 10450 | 4126 | 105204544 | 105204560
(10 rows)
Time: 3.157 ms
postgres=# explain select ctid,* from tbl TABLESAMPLE system (5) where random()<0.01 limit 10;
QUERY PLAN
--------------------------------------------------------------------
Limit (cost=0.00..0.92 rows=10 width=90)
-> Sample Scan on tbl (cost=0.00..1535.40 rows=16667 width=90)
Sampling: system ('5'::real)
Filter: (random() < '0.01'::double precision)
(4 rows)
Time: 0.858 ms
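As a rough model of why this combination disperses well: block sampling keeps I/O low, and the per-row random() filter thins each sampled block so that the first LIMIT survivors span many different blocks. A hedged Python sketch with illustrative parameters:

```python
import random

random.seed(7)

def system_sample_filtered(n_blocks, rows_per_block, block_pct, row_prob, limit):
    """TABLESAMPLE SYSTEM (block_pct) plus WHERE random() < row_prob:
    whole blocks are sampled, then each row is kept independently,
    so the first `limit` survivors come from many different blocks."""
    out = []
    for blk in range(n_blocks):
        if random.random() < block_pct:
            for line in range(1, rows_per_block + 1):
                if random.random() < row_prob:
                    out.append((blk, line))
                    if len(out) == limit:
                        return out
    return out

picked = system_sample_filtered(10_000, 70, 0.05, 0.01, 10)
blocks_hit = {blk for blk, _ in picked}
print(picked, len(blocks_hit))
```

With ~0.7 surviving rows per sampled block on average, 10 returned rows typically land in close to 10 distinct blocks, matching the scattered ctids in the transcripts above.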
4. Besides the built-in sampling methods, PostgreSQL supports extension-provided sampling methods: tsm_system_time samples by time (scanning for at most the specified number of milliseconds and returning the records scanned within that window), and tsm_system_rows samples by row count (for example, returning at most 100 rows).
See also: PostgreSQL Random Sampling Application - tablesample, tsm_system_rows, tsm_system_time
-- Up to 5 milliseconds
postgres=# select ctid,* from tbl TABLESAMPLE system_time (5) where random()<0.02 limit 10;
ctid | id | loc | beginid | endid
------------+--------+------+-----------+-----------
(13943,32) | 976042 | 786 | 587732405 | 587732831
(5014,17) | 350997 | 4515 | 275520535 | 275520831
(10371,58) | 726028 | 3201 | 462659708 | 462660690
(10371,67) | 726037 | 2160 | 462664042 | 462664194
(1442,8) | 100948 | 1134 | 150431878 | 150432150
(1442,22) | 100962 | 2839 | 150436680 | 150437122
(1442,36) | 100976 | 3027 | 150444363 | 150444375
(6799,50) | 475980 | 4172 | 337780753 | 337781376
(6799,51) | 475981 | 1971 | 337781376 | 337781830
(12156,31) | 850951 | 3287 | 525181282 | 525182019
(10 rows)
Time: 0.908 ms
postgres=# explain select ctid,* from tbl TABLESAMPLE system_time (5) where random()<0.02 limit 10;
QUERY PLAN
----------------------------------------------------------------
Limit (cost=0.00..0.92 rows=10 width=90)
-> Sample Scan on tbl (cost=0.00..10.75 rows=117 width=90)
Sampling: system_time ('5'::double precision)
Filter: (random() < '0.02'::double precision)
(4 rows)
Time: 0.697 ms
-- Up to 1000 rows
-- 1000 rows * probability 0.01 = 10 expected rows, so limit 10 can be omitted.
postgres=# select ctid,* from tbl TABLESAMPLE system_rows(1000) where random()<0.01 ;
ctid | id | loc | beginid | endid
-----------+--------+------+-----------+-----------
(8522,48) | 596588 | 2148 | 397979059 | 397980022
(8454,21) | 591801 | 670 | 395580178 | 395581156
(8454,70) | 591850 | 1441 | 395608906 | 395609174
(1209,13) | 84643 | 3924 | 142242550 | 142243369
(8318,36) | 582296 | 4625 | 390811869 | 390811932
(1141,5) | 79875 | 2093 | 139858769 | 139859220
(1141,70) | 79940 | 4611 | 139892888 | 139893837
(1005,14) | 70364 | 2328 | 135109401 | 135109605
(1005,23) | 70373 | 2382 | 135113320 | 135113475
(9 rows)
Time: 2.370 ms
postgres=# select count(*) from tbl;
count
---------
1000000
(1 row)
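The arithmetic behind omitting LIMIT here, as a quick check (values taken from the query above; the actual count fluctuates around the expectation, which is why the run above returned 9 rows):

```python
# system_rows(1000) caps the scan at 1000 rows, and random() < 0.01
# keeps each of them with probability 1%, so the expected result size is:
sampled_rows = 1000
keep_prob = 0.01
expected = sampled_rows * keep_prob
print(expected)  # 10.0
```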
Performance comparison: randomly sampling 10 rows from 1 million records
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./t1.sql -c 12 -j 12 -T 10
progress: 1.0 s, 11.9 tps, lat 685.136 ms stddev 65.024
progress: 2.0 s, 13.1 tps, lat 669.947 ms stddev 18.605
progress: 3.0 s, 22.9 tps, lat 708.591 ms stddev 43.357
progress: 4.0 s, 13.0 tps, lat 641.470 ms stddev 24.108
progress: 5.0 s, 22.9 tps, lat 676.400 ms stddev 37.549
progress: 6.0 s, 13.0 tps, lat 689.115 ms stddev 17.644
progress: 7.0 s, 18.9 tps, lat 767.281 ms stddev 22.536
progress: 8.0 s, 15.0 tps, lat 819.248 ms stddev 39.973
progress: 9.0 s, 13.0 tps, lat 856.244 ms stddev 29.988
progress: 10.0 s, 12.0 tps, lat 829.066 ms stddev 22.431
pgbench (PostgreSQL) 14.0
transaction type: ./t1.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 168
latency average = 749.340 ms
latency stddev = 101.206 ms
initial connection time = 11.026 ms
tps = 15.804188 (without initial connection time)
statement latencies in milliseconds:
749.340 select * from tbl order by random() limit 10;
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./t2.sql -c 12 -j 12 -T 10
progress: 1.0 s, 32119.4 tps, lat 0.367 ms stddev 0.932
progress: 2.0 s, 31335.6 tps, lat 0.384 ms stddev 1.286
progress: 3.0 s, 31556.0 tps, lat 0.376 ms stddev 1.679
progress: 4.0 s, 30996.3 tps, lat 0.390 ms stddev 1.413
progress: 5.0 s, 30622.0 tps, lat 0.390 ms stddev 1.006
progress: 6.0 s, 30778.7 tps, lat 0.384 ms stddev 2.160
progress: 7.0 s, 29766.9 tps, lat 0.405 ms stddev 2.657
progress: 8.0 s, 27425.5 tps, lat 0.443 ms stddev 2.396
progress: 9.0 s, 25860.8 tps, lat 0.462 ms stddev 3.315
progress: 10.0 s, 25695.2 tps, lat 0.462 ms stddev 2.708
pgbench (PostgreSQL) 14.0
transaction type: ./t2.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 296261
latency average = 0.404 ms
latency stddev = 2.059 ms
initial connection time = 13.047 ms
tps = 29641.685598 (without initial connection time)
statement latencies in milliseconds:
0.406 select ctid,* from tbl TABLESAMPLE SYSTEM (5) where random()<0.01 limit 10;
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./t3.sql -c 12 -j 12 -T 10
progress: 1.0 s, 28849.9 tps, lat 0.407 ms stddev 1.055
progress: 2.0 s, 28843.1 tps, lat 0.418 ms stddev 1.950
progress: 3.0 s, 27394.0 tps, lat 0.433 ms stddev 2.971
progress: 4.0 s, 28447.4 tps, lat 0.422 ms stddev 2.686
progress: 5.0 s, 26309.0 tps, lat 0.458 ms stddev 2.987
progress: 6.0 s, 29036.4 tps, lat 0.413 ms stddev 0.963
progress: 7.0 s, 29986.4 tps, lat 0.402 ms stddev 1.382
progress: 8.0 s, 29973.1 tps, lat 0.400 ms stddev 1.649
progress: 9.0 s, 25595.4 tps, lat 0.468 ms stddev 2.718
progress: 10.0 s, 26574.2 tps, lat 0.451 ms stddev 3.553
pgbench (PostgreSQL) 14.0
transaction type: ./t3.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 281140
latency average = 0.426 ms
latency stddev = 2.321 ms
initial connection time = 15.631 ms
tps = 28142.235104 (without initial connection time)
statement latencies in milliseconds:
0.429 select ctid,* from tbl TABLESAMPLE SYSTEM_ROWS (1000) where random()<0.01;
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./t4.sql -c 12 -j 12 -T 10
progress: 1.0 s, 26988.7 tps, lat 0.439 ms stddev 1.327
progress: 2.0 s, 24800.7 tps, lat 0.472 ms stddev 2.408
progress: 3.0 s, 25742.0 tps, lat 0.470 ms stddev 3.191
progress: 4.0 s, 29388.5 tps, lat 0.414 ms stddev 2.189
progress: 5.0 s, 29199.8 tps, lat 0.404 ms stddev 0.590
progress: 6.0 s, 25906.5 tps, lat 0.470 ms stddev 3.301
progress: 7.0 s, 29297.8 tps, lat 0.409 ms stddev 1.143
progress: 8.0 s, 29572.6 tps, lat 0.405 ms stddev 0.805
progress: 9.0 s, 26367.6 tps, lat 0.443 ms stddev 2.976
progress: 10.0 s, 25389.1 tps, lat 0.475 ms stddev 3.652
pgbench (PostgreSQL) 14.0
transaction type: ./t4.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 272779
latency average = 0.439 ms
latency stddev = 2.379 ms
initial connection time = 13.144 ms
tps = 27255.103831 (without initial connection time)
statement latencies in milliseconds:
0.441 select ctid,* from tbl TABLESAMPLE SYSTEM_time (5) where random()<0.01 limit 10;
Now consider a table of 10 million rows in which only 10,000 rows meet the condition. With block-level random sampling, many blocks may have to be scanned before enough matching rows are found.
Method 1: the traditional ORDER BY random() approach
create table t (id int primary key, c1 int, info text, crt_time timestamp);
insert into t select generate_series(1,10000000), random()*999, md5(random()::text), clock_timestamp();
create index idx_t_1 on t (c1);
postgres=# explain select * from t where c1=0 limit 10;
QUERY PLAN
-------------------------------------------------------------------------------
Limit (cost=0.43..11.11 rows=10 width=49)
-> Index Scan using idx_t_1 on t (cost=0.43..10623.39 rows=9947 width=49)
Index Cond: (c1 = 0)
(3 rows)
postgres=# explain select * from t where c1=0 order by random() limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------
Limit (cost=10863.21..10863.23 rows=10 width=57)
-> Sort (cost=10863.21..10888.07 rows=9947 width=57)
Sort Key: (random())
-> Index Scan using idx_t_1 on t (cost=0.43..10648.25 rows=9947 width=57)
Index Cond: (c1 = 0)
(5 rows)
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 10
progress: 1.0 s, 1165.9 tps, lat 9.960 ms stddev 14.434
progress: 2.0 s, 1298.1 tps, lat 9.319 ms stddev 12.472
progress: 3.0 s, 1313.8 tps, lat 9.137 ms stddev 11.276
progress: 4.0 s, 1161.6 tps, lat 10.068 ms stddev 13.447
progress: 5.0 s, 972.2 tps, lat 12.399 ms stddev 20.388
progress: 6.0 s, 999.8 tps, lat 12.249 ms stddev 17.811
progress: 7.0 s, 1153.3 tps, lat 10.366 ms stddev 13.479
progress: 8.0 s, 1180.6 tps, lat 9.992 ms stddev 12.386
progress: 9.0 s, 1029.6 tps, lat 11.916 ms stddev 17.798
progress: 10.0 s, 966.5 tps, lat 12.252 ms stddev 17.236
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 11257
latency average = 10.658 ms
latency stddev = 15.150 ms
initial connection time = 13.008 ms
tps = 1124.835901 (without initial connection time)
statement latencies in milliseconds:
11.003 select * from t where c1=0 order by random() limit 10;
Method 2: use the index to find the blocks that contain matching rows (this step is the most time-consuming), pick ten of those blocks at random, and return ten random records from the rows in those ten blocks.
select t.* from t where exists
( select 1 from
(select blkid from (select substring(ctid::text,'(\d*),') blkid from t where c1=0 group by 1) t -- obtain all block ids that contain matching rows
order by random() limit 10) t1 -- pick 10 random block ids
where t.ctid>=('('||t1.blkid||',0)')::tid -- lower bound of each block's tid range
and t.ctid< ('('||t1.blkid::int+1||',0)')::tid -- upper bound of each block's tid range
and t.c1=0 ) -- keep only rows with c1 = 0
order by random()
limit 10; -- return 10 rows
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=26432.50..26432.52 rows=10 width=57)
-> Sort (cost=26432.50..26435.26 rows=1106 width=57)
Sort Key: (random())
-> Nested Loop Semi Join (cost=11246.98..26408.60 rows=1106 width=57)
Join Filter: ((t.ctid >= ((('('::text || t1.blkid) || ',0)'::text))::tid) AND (t.ctid < ((('('::text || (((t1.blkid)::integer + 1))::text) || ',0)'::text))::tid))
-> Index Scan using idx_t_1 on t (cost=0.43..10633.09 rows=9957 width=55)
Index Cond: (c1 = 0)
-> Materialize (cost=11246.54..11246.72 rows=10 width=32)
-> Subquery Scan on t1 (cost=11246.54..11246.67 rows=10 width=32)
-> Limit (cost=11246.54..11246.57 rows=10 width=40)
-> Sort (cost=11246.54..11271.43 rows=9957 width=40)
Sort Key: (random())
-> Subquery Scan on t_1 (cost=10732.66..11031.37 rows=9957 width=40)
-> HashAggregate (cost=10732.66..10906.91 rows=9957 width=32)
Group Key: "substring"((t_2.ctid)::text, '(\d*),'::text)
-> Index Scan using idx_t_1 on t t_2 (cost=0.43..10707.77 rows=9957 width=32)
Index Cond: (c1 = 0)
(17 rows)
Time: 0.707 ms
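The block-selection logic of Method 2 can be sketched outside SQL. The ctid values below are hypothetical stand-ins for the output of the substring() subquery; the sketch shows extracting block numbers from ctid text and building the tid ranges [(B,0), (B+1,0)) that the query scans:

```python
import random
import re

random.seed(3)

# Hypothetical ctids of rows matching c1 = 0, standing in for:
#   select substring(ctid::text,'(\d*),') from t where c1 = 0
matching_ctids = ["(17,4)", "(17,9)", "(230,1)", "(981,55)", "(981,56)", "(4410,2)"]

def blkid(ctid):
    """Extract the block number from a ctid string like '(17,4)'."""
    return int(re.match(r"\((\d+),", ctid).group(1))

# Deduplicate block ids (the GROUP BY 1 step), then sample some at random.
blocks = sorted({blkid(c) for c in matching_ctids})
chosen = random.sample(blocks, k=min(3, len(blocks)))

# For each chosen block B, the SQL scans the tid range [(B,0), (B+1,0)):
ranges = [(f"({b},0)", f"({b + 1},0)") for b in chosen]
print(blocks, ranges)
```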
IT-C02YW2EFLVDL:~ digoal$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 10
progress: 1.0 s, 113.9 tps, lat 99.518 ms stddev 20.071
progress: 2.0 s, 125.1 tps, lat 95.576 ms stddev 10.811
progress: 3.0 s, 117.7 tps, lat 101.022 ms stddev 14.942
progress: 4.0 s, 95.2 tps, lat 124.592 ms stddev 26.282
progress: 5.0 s, 93.8 tps, lat 128.335 ms stddev 15.483
progress: 6.0 s, 95.9 tps, lat 128.530 ms stddev 14.861
progress: 7.0 s, 101.4 tps, lat 115.433 ms stddev 15.376
progress: 8.0 s, 116.0 tps, lat 104.154 ms stddev 10.846
progress: 9.0 s, 112.7 tps, lat 105.956 ms stddev 11.932
progress: 10.0 s, 110.3 tps, lat 107.870 ms stddev 11.686
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 10 s
number of transactions actually processed: 1095
latency average = 109.931 ms
latency stddev = 19.405 ms
initial connection time = 10.773 ms
tps = 108.826246 (without initial connection time)
statement latencies in milliseconds:
109.965 select t.* from t where exists
Alternatively, the matching rows can be materialized periodically at the business level: either cache them in the application tier and return random entries from there, or store them in the database as a staging table whose content is refreshed regularly.
postgres=# create unlogged table tmp_t_c1_1 (like t);
CREATE TABLE
Time: 6.524 ms
postgres=# insert into tmp_t_c1_1 select * from t where c1=1;
INSERT 0 10001
Time: 51.261 ms
postgres=# select * from tmp_t_c1_1 tablesample system_rows(1000) where random()<0.01 ;
id | c1 | info | crt_time
---------+----+----------------------------------+----------------------------
7869277 | 1 | b4483c7728566a238de08f77eba12774 | 2021-05-27 16:19:32.566097
7881151 | 1 | 49af1161c2e3a779328b32ddd058c7f0 | 2021-05-27 16:19:32.588799
1226520 | 1 | 46981759668b3a9acb953ab8b279f4ea | 2021-05-27 16:19:16.06893
1244566 | 1 | db248939b6c6647b25b1a32584747be8 | 2021-05-27 16:19:16.10505
1251111 | 1 | 9096bc3b83f80a20342b47c483c66b3b | 2021-05-27 16:19:16.118112
4677752 | 1 | 97dbe82b4f6476ee50d6a37775fb3e6f | 2021-05-27 16:19:24.337757
7984355 | 1 | 4f996ed3be9fb0fbd26b0444f44681d6 | 2021-05-27 16:19:32.817659
7996207 | 1 | bd4d80164abf2b17153e6b10fc871725 | 2021-05-27 16:19:32.840801
8020673 | 1 | 95765bac6bf7952bc59a6bc8d5db24f8 | 2021-05-27 16:19:32.887891
1322032 | 1 | 0f769db9cb6c5a920dba9ffd364b0a7d | 2021-05-27 16:19:16.285266
4734787 | 1 | 75f8b9e9ce72e7f5733efeaf0ed40215 | 2021-05-27 16:19:24.449741
8128961 | 1 | c356f3d289459dbe099769c8bd786e2c | 2021-05-27 16:19:33.124422
1492978 | 1 | 0619d948880cf16f7ce0c8e405525621 | 2021-05-27 16:19:16.678358
(13 rows)
Time: 4.226 ms
postgres=# select * from tmp_t_c1_1 tablesample system_rows(1000) where random()<0.01 ;
id | c1 | info | crt_time
---------+----+----------------------------------+----------------------------
3659031 | 1 | edb2c2ee58aa530fbe3ec8199c31914c | 2021-05-27 16:19:21.990993
3705687 | 1 | 47dd5dce35ffdf84792a5c64f573f5cb | 2021-05-27 16:19:22.086795
2578458 | 1 | bc97c41db6329221ae30fe604eb222a2 | 2021-05-27 16:19:19.382254
1518045 | 1 | 78f90fa17050b2eacd3222c9948606dd | 2021-05-27 16:19:16.727858
1553043 | 1 | 48d2363be6446e266d074c6e98a856d5 | 2021-05-27 16:19:16.796445
1587461 | 1 | 9329578a7b41b36b4919e4c25b9e1c31 | 2021-05-27 16:19:16.890558
410551 | 1 | 6184732511cfad0b05c75c1865a9e920 | 2021-05-27 16:19:13.974389
9438915 | 1 | cb0fc9af4da9259c3f32ec37180a6774 | 2021-05-27 16:19:36.463244
9501777 | 1 | b82cb22ba0e7620b21cb6e9219d6f425 | 2021-05-27 16:19:36.620395
7345643 | 1 | 9820565758529cfb752135cc5e523adf | 2021-05-27 16:19:31.304132
6273533 | 1 | a6cdf268caeeeeda3d766d2370e84655 | 2021-05-27 16:19:28.488774
6323667 | 1 | b000452c87b882ff00e4aff2abe34cc9 | 2021-05-27 16:19:28.613903
6346355 | 1 | 6810c56b96bdcaf2337ce23797b9d118 | 2021-05-27 16:19:28.669933
5240569 | 1 | 2e33eaf8ea4e12cf8e3bcae6022c48c3 | 2021-05-27 16:19:25.660275
5243275 | 1 | 6bcdd1dea42daf07302310c0edbfdaa7 | 2021-05-27 16:19:25.666971
4184755 | 1 | 8c04c876d1fe13b7754ceb8eb4782969 | 2021-05-27 16:19:23.177027
(16 rows)
Time: 0.732 ms
The underlying problem is randomly obtaining N records under a highly selective condition. Changing the kernel's index structure may solve it more effectively: today, all matching data must be scanned sequentially through the index's linked leaf pages before randomness can even be applied.