Real-time materialized views pre-aggregate the data of a base table and store the aggregated data. This way, the amount of computing workload is reduced, and the query performance is significantly improved by querying the materialized views. This topic describes how to use materialized views in Hologres.
Background information
You do not need to manually refresh materialized views in Hologres. When data is written to the base table, the data is aggregated to materialized views in real time and can immediately be queried by using the materialized views.
The table that is written in real time is called a base table. Users perform INSERT, UPDATE, and DELETE operations on the base table. Materialized views are defined based on aggregation rules. When the base table is modified, the changes are synchronized to the materialized views in real time. Only INSERT operation-related changes can be synchronized.
Limits
To use materialized views, do not perform DELETE or UPDATE operations on a base table. You need to configure the
appendonly
property for the base table. This way, if you perform DELETE or UPDATE operations on the base table, the error messageTable XXX is append-only
is returned. If you use Flink to write data to the base table in real time, you must set themutateType
parameter to InsertOrIgnore.You cannot create materialized views for an existing table. You need to define materialized views for a table when you create the table.
You can create materialized views only for a single table. Materialized views do not support common table expressions (CTEs), multi-table JOIN operations, subqueries, or WHERE, ORDER BY, LIMIT, and HAVING clauses.
The GROUP BY key and value of a materialized view cannot be expressions. For example,
SUM(CASE WHEN COND THEN A ELSE B END)
,SUM(col1 + col2)
, andGROUP BY date_trunc('hour', ts)
cannot be used as the GROUP BY key or value of a materialized view.You can create a maximum of 10 materialized views for each base table. More materialized views consume more resources.
If you create a materialized view for a partitioned table, the GROUP BY key of the materialized view must contain the partition key column of the partitioned table. You can create a materialized view only for a parent partitioned table, not child partitioned tables.
If you create a materialized view for a partitioned table, you cannot execute the
ATTACH PARTITION
statement to attach a partition to the parent partitioned table. However, you can execute theCREATE TABLE PARTITION OF
statement to create a partition.After you create materialized views for a base table, you cannot execute the
DROP COLUMN
statement to delete columns from the base table.The underlying data of a materialized view has the same time to live (TTL) as the base table for which the materialized view is created. Do not manually configure a TTL for the materialized view. Otherwise, the data in the materialized view may become inconsistent with that in the base table.
Supported aggregate functions
Materialized views support the following aggregate functions:
SUM
COUNT
AVG
MIN
MAX
RB_BUILD_CARDINALITY_AGG: Only the BIGINT data type is supported. In addition, you must create the roaring bitmap extension.
Sample SQL statements
Create a materialized view.
BEGIN; CREATE TABLE base_sales( day text not null, hour int , ts timestamptz, amount float, pk text not null primary key ); CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'appendonly'); -- After the materialized view is dropped, you can execute the following statement to cancel the appendonly property for the base table: --CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'none'); CREATE MATERIALIZED VIEW mv_sales AS SELECT day, hour, avg(amount) AS amount_avg FROM base_sales GROUP BY day, hour; COMMIT; insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),100,'pk1'); insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),200,'pk2'); insert into base_sales values(to_char(now(),'YYYYMMDD'),'12',now(),300,'pk3');
Create a materialized view for a partitioned table.
BEGIN; CREATE TABLE base_sales_p( day text not null, hour int, ts timestamptz, amount float, pk text not null, primary key (day, pk) ) partition by list(day); CALL SET_TABLE_PROPERTY('base_sales_p', 'mutate_type', 'appendonly'); -- day is a partition key column. It must be included in the condition of the GROUP BY clause of the materialized view. CREATE MATERIALIZED VIEW mv_sales_p AS SELECT day, hour, avg(amount) AS amount_avg FROM base_sales_p GROUP BY day, hour; COMMIT; create table base_sales_20220101 partition of base_sales_p for values in('20220101');
Query a materialized view.
SELECT * FROM mv_sales WHERE day = to_char(now(),'YYYYMMDD') AND hour = 12;
Drop a materialized view.
DROP MATERIALIZED VIEW mv_sales;
Query the storage space occupied by a materialized view.
select pg_relation_size('mv_sales');
Query the storage space occupied by all materialized views.
SELECT schemaname || '.' || matviewname AS mv_full_name, pg_size_pretty(pg_relation_size('"' || schemaname || '"."' || matviewname || '"')) AS mv_size, pg_relation_size('"' || schemaname || '"."' || matviewname || '"') AS order_size FROM pg_matviews ORDER BY order_size DESC;
Improve the performance of precise UV computing by using materialized views
Precise unique visitor (UV) computing is an operator with high computational complexity and is usually a system performance bottleneck. Hologres supports the RB_BUILD_CARDINALITY_AGG
aggregate function. By using the roaring bitmap data structure, Hologres can pre-aggregate data of the BIGINT type into materialized views. The BIGINT type is usually used by business ID fields. This implements real-time deduplication of UV statistics. You can create materialized views by executing the following statements. Only fields of the BIGINT type are supported for aggregation and deduplication.
-- UV computing depends on the roaring bitmap data type. You must create the roaring bitmap extension in advance.
CREATE EXTENSION if not exists roaringbitmap;
BEGIN;
CREATE TABLE base_sales_r(
day text not null,
hour int ,
ts timestamptz,
amount float,
userid bigint,
pk text not null primary key
);
CALL SET_TABLE_PROPERTY('base_sales_r', 'mutate_type', 'appendonly');
CREATE MATERIALIZED VIEW mv_sales_r AS
SELECT
day,
hour,
avg(amount) AS amount_avg,
rb_build_cardinality_agg(userid) as user_count
FROM base_sales_r
GROUP BY day, hour;
COMMIT;
insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),100,1,'pk1');
insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),200,2,'pk2');
insert into base_sales_r values(to_char(now(),'YYYYMMDD'),'12',now(),300,3,'pk3');
select user_count as UV from mv_sales_r where day = to_char(now(),'YYYYMMDD') AND hour = 12;
The RB_BUILD_CARDINALITY_AGG
aggregate function is used to compute the UV value. In the mv_sales_r
view, the user_count
field stores the UV value for the userid
field. You can query the user_count
field to obtain the UV value.
Perform multi-dimensional aggregate queries by using materialized views
In this example, the mv_sales
materialized view is used. The following table describes the data that the base_sales
base table contains.
Day | Hour | Amount | PK |
20210101 | 12 | 2 | pk1 |
20210101 | 12 | 4 | pk2 |
20210101 | 13 | 6 | pk3 |
The following result is returned if you directly query the sales_mv
materialized view:
postgres=> select * from mv_sales;
day | hour | amount_avg
-----------+---------+--------------
20210101 | 12 | 3
20210101 | 13 | 6
If you query the materialized view from an aggregation dimension, incorrect results are returned. For example, if you use the AVG function to aggregate data in the materialized view by day, an incorrect result is returned. This is because the average of average values does not equal the average of original values.
postgres=> select day, avg(amount_avg) from mv_sales group by day;
day | avg
-----------+--------
20210101 | 4.5
In this case, you can create another materialized view that is aggregated by day. However, this will cause the number of materialized views to expand. Hologres allows you to use only one materialized view to implement aggregate queries in different dimensions based on the aggregation result in the intermediate state. In the following example, the AVG aggregate function is used. Execute the following modified DDL statements to create the materialized view:
BEGIN;
CREATE TABLE base_sales(
day text not null,
hour int ,
ts timestamptz,
amount float,
pk text not null primary key
);
CALL SET_TABLE_PROPERTY('base_sales', 'mutate_type', 'appendonly');
CREATE MATERIALIZED VIEW mv_sales_partial AS
SELECT
day,
hour,
avg(amount) as avg,
avg_partial(amount) AS amt_avg_partial
FROM base_sales
GROUP BY day, hour;
COMMIT;
As shown in the statements, the AVG aggregate function is replaced with the AVG_PARTIAL aggregate function. The amount_avg_partial column stores the aggregation result in the intermediate state. You can use the AVG_FINAL aggregate function to query the final aggregation of the aggregate result in the intermediate state.
postgres=> select day, avg(avg) as avg_avg, avg_final(amt_avg_partial) as real_avg from mv_sales_partial group by day;
day | avg_avg | real_avg
-----------+-----------+----------
20210101 | 4.5 | 4
The following table describes the aggregate functions and their partial and final aggregate functions that are supported.
Aggregate function | Partial aggregate function | Final aggregate function |
AVG | AVG_PARTIAL | AVG_FINAL |
RB_BUILD_CARDINALITY_AGG | RB_BUILD_AGG | RB_OR_CARDINALITY_AGG |
About TTL
If a TTL is set for a base table and a materialized view is created, Hologres cannot ensure the consistency between the query results if you query the base table and the materialized view for data that is about to expire. The results of querying data that is about to expire from materialized views are undefined. In the following example, the base_sales_table
base table and the sales_mv
materialized view are used.
A TTL is configured for the base_sales_table
base table, and some data has expired. If you query the base table, the following query result is returned:
postgres=> SELECT
day,
hour,
avg(amount) AS amount_avg
FROM base_sales
GROUP BY day, hour;
-- Query result
day | hour | amount_avg
-----------+---------+--------------
20210101 | 12 | 4
20210101 | 13 | 6
However, the expired data has been materialized into the materialized view. The following query result may be returned if you query the materialized view:
postgres=> select * from mv_sales;
-- Query result
day | hour | amount_avg
-----------+---------+--------------
20210101 | 12 | 3
20210101 | 13 | 6
In this case, the query results are inconsistent. We recommend that you make the following improvements:
Do not set a TTL for the base table.
If a TTL is set for the base table and a time field is contained in the GROUP BY key of a materialized view, do not query the materialized view for data that is about to expire.
Create the base table as a partitioned table. Do not set a TTL for the base table. Delete expired data by deleting child partitioned tables.
Best practices for using real-time materialized views
When you create a table, we recommend that you set the GROUP BY key of a materialized view to the distribution key of the base table. This increases the data compression ratio and improves query performance.
We recommend that you put the filter conditions that are commonly used in queries of materialized views in the front of the GROUP BY key based on the leftmost matching principle of the clustering key.
Intelligent routing for materialized views
You do not need to explicitly specify the materialized view to query. You can perform queries based on the base table as before. If matching materialized views exist, the optimizer intelligently routes queries to the most appropriate materialized view to accelerate the queries. Hologres selects a materialized view based on the following rules:
Hologres selects the materialized view that contains all queried columns or the columns that can be used to calculate the values of the queried columns.
Hologres selects the materialized view whose GROUP BY key contains all the columns specified in the GROUP BY key of the original query.
If multiple materialized views meet the requirements, Hologres selects the materialized view that contains the fewest fields in the GROUP BY key.
The aggregate functions that support intelligent routing include SUM, COUNT, MIN, and MAX.