This topic describes the hll extension, which supports the HyperLogLog data type to estimate metrics such as page views (PVs) and unique visitors (UVs).
Prerequisites
The extension is supported on the PolarDB for PostgreSQL clusters that run the following engines:
PostgreSQL 14 (revision version 14.5.1.0 or later)
PostgreSQL 11 (revision version 1.1.28 or later)
You can execute the following statement to view the minor version that is used by PolarDB for PostgreSQL:
PostgreSQL 14:
SELECT version();
PostgreSQL 11:
SHOW polar_version;
Background information
The hll extension supports an extendable, set-resembled data type HyperLogLog (hll) to estimate DISTINCT elements under a specified accuracy. For example, you can use 1,280 bytes of hll data to accurately estimate billions of DISTINCT elements. The hll extension is suitable for industries that need estimation analysis, such as Internet advertisement analysis to estimate PVs and UVs.
Create the hll extension
CREATE EXTENSION hll;
Basic operations
Execute the following statement to create a table that contains hll fields:
CREATE TABLE agg (id INT PRIMARY KEY, userids hll);
Execute the following statement to convert INT data to hll_hashval data:
SELECT 1::hll_hashval;
The following result is returned:
hll_hashval ------------- 1 (1 row)
Basic operators
The hll data type supports the following operators:
=
!=
<>
||
#
Examples:
=
SELECT hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
The following result is returned:
?column? ---------- f (1 row)
||
SELECT hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
The following result is returned:
?column? ------------------------------------------ \x128b7f00000000000000010000000000000002 (1 row)
#
SELECT #hll_add_agg(1::hll_hashval);
The following result is returned:
?column? ---------- 1 (1 row)
The hll_hashval data type supports the following operators:
=
!=
<>
Examples:
=
SELECT 1::hll_hashval = 2::hll_hashval;
The following result is returned:
?column? ---------- f (1 row)
<>
SELECT 1::hll_hashval <> 2::hll_hashval;
The following result is returned:
?column? ---------- t (1 row)
Basic functions
The hll extension supports hash functions such as hll_hash_boolean, hll_hash_smallint, and hll_hash_bigint. Examples:
SELECT hll_hash_boolean(true);
The following result is returned:
hll_hash_boolean --------------------- 8849112093580131862 (1 row)
SELECT hll_hash_integer(1);
The following result is returned:
hll_hash_integer ---------------------- -8604791237420463362 (1 row)
The hll extension supports the hll_add_agg function to convert the data type from INT to hll. Example:
SELECT hll_add_agg(1::hll_hashval);
The following result is returned:
hll_add_agg -------------------------- \x128b7f0000000000000001 (1 row)
The hll extension supports the hll_union function to perform UNION operations on hll data. Example:
SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));
The following result is returned:
hll_union ------------------------------------------ \x128b7f00000000000000010000000000000002 (1 row)
The hll extension supports the hll_set_defaults function to set the accuracy. Example:
SELECT hll_set_defaults(15,5,-1,1);
The following result is returned:
hll_set_defaults ------------------ (11,5,-1,1) (1 row)
The hll extension supports the hll_print function to display debug information. Example:
SELECT hll_print(hll_add_agg(1::hll_hashval));
The following result is returned:
hll_print ----------------------------------------------------------------------------- EXPLICIT, 1 elements, nregs=32768, nbits=5, expthresh=-1(2560), sparseon=1:+ 0: 1 (1 row)
Delete the hll extension
DROP EXTENSION hll;
Examples
Prepare data.
CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll); INSERT INTO access_date SELECT current_date, hll_add_agg(hll_hash_integer(user_id)) FROM generate_series(1,10000) t(user_id); INSERT INTO access_date SELECT current_date-1, hll_add_agg(hll_hash_integer(user_id)) FROM generate_series(5000,20000) t(user_id); INSERT INTO access_date SELECT current_date-2, hll_add_agg(hll_hash_integer(user_id)) FROM generate_series(9000,40000) t(user_id);
Execute the following queries.
SELECT #userids FROM access_date WHERE acc_date=current_date;
Result:
?column? ------------------ 9725.852733707077 (1 row)
SELECT #userids FROM access_date WHERE acc_date=current_date-1;
Result:
?column? ------------------ 14968.65968832792 (1 row)
SELECT #userids FROM access_date WHERE acc_date=current_date-2;
Result:
?column? ------------------ 29361.520914991113 (1 row)