AnalyticDB for PostgreSQL provides real-time materialized views. Compared with standard materialized views, real-time materialized views can automatically refresh when data changes without the need to execute REFRESH statements.
You can create a real-time materialized view based on a base table, and then create another real-time materialized view based on the previously created real-time materialized view. When the base table changes, the two real-time materialized views can be automatically updated. This helps build a real-time extract-transform-load (ETL) processing chain for data analysis.
Real-time materialized views of AnalyticDB for PostgreSQL support only statement-level automatic refresh. When an update statement, such as INSERT, COPY, UPDATE, or DELETE, is executed on a base table, real-time materialized views that are created based on the base table are updated in real time to ensure strong data consistency.
For more information about standard materialized views, see Manage materialized views.
Statement-level refresh
When a statement is executed on a base table, real-time materialized views that are created based on the base table are updated in real time to ensure data consistency. When an update statement of the base table returns a result, the updates of the corresponding materialized views are complete. Real-time materialized views are updated based on the following logic:
The database engine updates base tables, and then updates the corresponding materialized views. If base tables fail to be updated, no data changes are made to their materialized views.
If materialized views fail to be updated, no data changes are made to their base tables and an error is returned for the executed statement.
For an explicit transaction that starts with the BEGIN TRANSACTION statement and ends with the COMMIT statement, after base tables are updated, data changes are also made to their materialized views in the transaction.
If AnalyticDB for PostgreSQL uses the default READ COMMITTED isolation level, the updates of materialized views are invisible to other transactions before the transaction is committed.
If the transaction is rolled back, base tables and their materialized views are also rolled back.
Limits
AnalyticDB for PostgreSQL imposes the following limits on the query statements that are used to create real-time materialized views:
If a query statement contains JOIN operations, INNER JOIN and OUTER JOIN can be used to join tables.
Query statements can contain most filtering and projection operations.
If a query statement contains an OUTER JOIN operation, only the AND operator can be used to join tables. OR and NOT are not supported. In addition, the query statement cannot contain non-equivalent conditions or include columns from the same table in equivalent conditions.
Only the following aggregate operations are supported in a query statement: COUNT, SUM, AVG, MAX, and MIN. The HAVING clause is not supported.
Only simple statements, FROM clauses, and UNION ALL statements are supported. Complex statements such as those that contain subqueries and common table expressions (CTEs) are not supported.
After you create a real-time materialized view based on a base table, DDL statements that you execute on the base table are subject to the following limits:
When you execute the TRUNCATE statement on the base table, the real-time materialized view is not synchronously updated. You must manually refresh the real-time materialized view or create another materialized view.
You must specify the CASCADE option to execute the DROP TABLE statement on the base table.
ALTER TABLE statements on the base table cannot be used to delete or modify fields that are referenced by the materialized view.
Scenarios
We recommend that you use real-time materialized views in scenarios that have the following characteristics:
Real-time materialized views can be nested to build a real-time ETL processing chain. You can create a real-time materialized view based on a base table, and then create another real-time materialized view based on the previously created real-time materialized view. This way, ETL processing results are produced in real time to accelerate data query and analysis.
Real-time materialized views can be used to significantly accelerate data query in various scenarios such as when the query result contains only a small number of rows or columns in the base table or large amounts of computations are required to obtain the query result. Sample scenarios:
Filter conditions effectively narrow down the query result.
Aggregation functions result in a significant concentration or grouping of values.
Semi-structured data is analyzed.
Aggregate operations require a long time to complete.
The base table contains a large amount of full data, and the amount of incremental data updates is much smaller than the amount of full data.
Real-time materialized views can be used in all scenarios for which materialized views are suitable. Unlike standard materialized views, data of real-time materialized views is highly consistent with the data in the corresponding base tables. When a base table changes, its real-time materialized views also update immediately with minimal performance cost in an incremental manner. However, when you use standard materialized views, you must fully update the materialized views to reflect the changes to the base tables. As a result, real-time materialized views perform far better than standard materialized views in scenarios that involve significant data changes or streaming updates.
Disadvantages of real-time materialized views
Real-time materialized views are similar to indexes that are maintained in real time. Real-time materialized views can significantly optimize query performance but may reduce write performance. The following elements can affect the write performance of real-time materialized views:
The complexity of the statement that is used to create real-time materialized views and the number of nested layers. If a single-layer real-time materialized view involves a single table and simple JOIN operations, tens to hundreds of thousands of rows can be written per second based on the instance configurations. If a nested real-time materialized view involves complex JOIN operations, data writes consume large amounts of instance computing resources and degrade performance proportionally.
JOIN operations. Real-time materialized views that involve JOIN operations may result in write amplification. For example, a fact table that contains one billion rows of data is joined with a real-time materialized view of a dimension table that contains 10,000 rows of data. In most cases, the fact table achieves high write performance. However, the dimension table degrades write performance proportionally due to write amplification that is caused by incremental computation.
Data writes in batch mode. Batch data writes help reduce the maintenance overheads of real-time materialized views. When you execute COPY or INSERT statements, we recommend that you increase the number of batch processed rows within a single statement. This significantly reduces the maintenance overheads of real-time materialized views.
The amount of resources in AnalyticDB for PostgreSQL instances. Incremental computation and writes of real-time materialized views consume instance resources. The amount of consumed instance resources affects the write performance of real-time materialized views. If the write performance does not meet your business requirements, you can increase computing resources to improve write performance.
Create or delete a real-time materialized view
Execute the following
CREATE INCREMENTAL MATERIALIZED VIEW
statement to create a real-time materialized view namedmv
:CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM base WHERE id > 40;
Execute the following
DROP MATERIALIZED VIEW
statement to delete themv
materialized view:DROP MATERIALIZED VIEW mv;
Examples
Execute the following statement to create a base table:
CREATE TABLE test (a int, b int) DISTRIBUTED BY (a);
Execute the following statement to create a real-time materialized view:
CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM TEST WHERE b > 40 DISTRIBUTED BY (a);
Execute the following statement to insert data to the base table:
INSERT INTO test VALUES (1, 30), (2, 40), (3, 50), (4, 60);
Execute the following statement to view data in the base table:
SELECT * FROM test;
Sample result:
a | b ---+---- 1 | 30 2 | 40 3 | 50 4 | 60 (4 rows)
Execute the following statement to view data in the materialized view:
SELECT * FROM mv;
The materialized view is updated. Sample result:
a | b ---+---- 3 | 50 4 | 60 (2 rows)