All Products
Search
Document Center

AnalyticDB:Real-time materialized views

Last Updated:Dec 02, 2024

AnalyticDB for PostgreSQL provides real-time materialized views. Compared with standard materialized views, real-time materialized views can automatically refresh when data changes without the need to execute REFRESH statements.

Real-time materialized views in AnalyticDB for PostgreSQL support incremental updates. When the base table of a real-time materialized view changes, the real-time materialized view is accordingly updated in an incremental manner. You can use real-time materialized views to perform data processing similar to real-time extract, transform, load (ETL) jobs. You can also create another real-time materialized view based on the previously created real-time materialized view. When the base table changes, the two real-time materialized views can be automatically updated. This way, you can build real-time ETL jobs to perform data analysis.

Real-time materialized views in AnalyticDB for PostgreSQL support two modes: synchronous mode and asynchronous mode.

  • In synchronous mode, real-time materialized views are automatically updated based on SQL statements. When an SQL statement, such as INSERT, COPY, UPDATE, and DELETE, is executed on a base table, the real-time materialized view created based on the base table is updated in real time to ensure strong data consistency. If the update of the real-time materialized view is not complete, a write transaction cannot be committed.

  • In asynchronous mode, after you perform write operations on a base table, write transactions are committed. Then, the database system automatically schedules incremental updates of the real-time materialized view created based on the base table in the background. If the system load is within the expected range, data updates of real-time materialized views can be complete within seconds.

For more information about standard materialized views, see Manage materialized views.

Synchronous mode

In synchronous mode, real-time materialized views in AnalyticDB for PostgreSQL implement data consistency based on SQL statements. When an SQL statement is executed on a base table, the real-time materialized view created based on the base table is updated in real time. The following section describes the update logic:

  • The database system performs an update on the base table, and then on the materialized view. If the base table fails to be updated, no data is changed in the materialized view.

  • If the materialized view fails to be updated, the base table also fails to be updated. No data is changed in the base table, and an error message is returned for the update statement.

If you use the BEGIN and COMMIT statements to commit an explicit transaction, data updates of the materialized view are also included in the transaction after the update statement is executed on the base table.

  • If AnalyticDB for PostgreSQL uses the READ COMMITTED isolation level, data updates of the materialized view are invisible to other transactions before the transaction is committed.

  • If the transaction is rolled back, the base table and the materialized view are also rolled back.

Supported versions

By default, AnalyticDB for PostgreSQL instances of the following versions use real-time materialized views in synchronous mode:

  • AnalyticDB for PostgreSQL V7.0 instances earlier than V7.0.6.9.

  • AnalyticDB for PostgreSQL V6.0 instances earlier than V6.6.2.5.

To change the default mode of a real-time materialized view, submit a ticket.

Asynchronous mode

In asynchronous mode, after you perform write operations on a base table, write transactions are committed but data of the real-time materialized view is not immediately updated. After the transactions are committed, the database system automatically schedules incremental updates of the real-time materialized view in the background and processes concurrent write transactions. When the real-time materialized view is updated, multiple update operations on the base table may be performed at a time. The database system batch processes all concurrent write operations on the base table to ensure ultimate data consistency. This way, even if concurrent write operations are performed in asynchronous mode, the database system ensures that the data in the real-time materialized view is eventually consistent with the data in the base table. If the system load is within the expected range, data updates of real-time materialized views can be complete within seconds. Therefore, the asynchronous mode can meet requirements of most real-time data warehouse scenarios.

Supported versions

By default, AnalyticDB for PostgreSQL instances of the following versions use new real-time materialized views in asynchronous mode:

  • AnalyticDB for PostgreSQL V7.0 instances of V7.0.6.9 or later.

  • AnalyticDB for PostgreSQL V6.0 instances of V6.6.2.5 or later.

To change the default mode of a real-time materialized view, submit a ticket.

Limits

AnalyticDB for PostgreSQL imposes the following limits on the query statements that are used to create real-time materialized views:

  • Query statements can contain most filtering operations, projection operations, built-in functions of PostgreSQL, and user-defined functions (UDFs).

  • Query statements can contain most aggregate functions and window functions.

  • If a query statement contains JOIN operations, you can use INNER JOIN and OUTER JOIN to join tables.

  • If a query statement contains an OUTER JOIN operation, you can use only the AND operator to join tables. You cannot use the OR operator. In addition, the query statement cannot include columns from the same table in equivalent conditions.

  • Only simple statements, FROM clauses, and UNION ALL clauses are supported. Other clauses and common table expressions (CTEs) are not supported.

After you create a real-time materialized view based on a base table, the following limits are imposed on the DDL statements that you execute on the base table:

  • When you execute the TRUNCATE statement on the base table, the real-time materialized view is not synchronously updated. You must manually refresh the real-time materialized view or create another real-time materialized view.

  • You can execute the DROP TABLE statement on the base table only by specifying the CASCADE option.

  • When you execute the ALTER TABLE statement on the base table, the fields of the base table referenced in the materialized view cannot be deleted or modified.

Scenarios

We recommend that you use real-time materialized views in scenarios that have the following characteristics:

  • You can use the incremental update and nested cascade capabilities of real-time materialized views to build real-time ETL jobs without the need for an additional scheduling system to process dependencies. You can create a real-time materialized view based on a base table, and then create another real-time materialized view based on the previously created real-time materialized view. This way, ETL processing results, such as real-time wide tables and aggregations, are produced in real time to accelerate data query and analysis.

  • You can use real-time materialized views to significantly accelerate data query in various scenarios, such as when the query result contains only a small number of rows or columns in the base table or large amounts of computations are required to obtain the query result. Sample scenarios:

    • Filter conditions effectively narrow down the query result.

    • Aggregation functions result in a significant concentration or grouping of values.

    • Semi-structured data is analyzed.

    • Aggregate operations require a long time to complete.

  • The base table contains a large amount of full data, and the amount of incremental data updates is much smaller than the amount of full data.

You can use real-time materialized views in all scenarios for which standard materialized views are suitable. Unlike standard materialized views, the data of real-time materialized views is highly consistent with the data in the corresponding base tables. When a base table changes, the real-time materialized view incrementally updates with low performance cost. However, each time a base table changes, you must manually perform a full update on the standard materialized view with high performance cost to ensure data consistency. Real-time materialized views perform far better than standard materialized views in scenarios that involve significant data changes or streaming updates.

Disadvantages of real-time materialized views

Real-time materialized views are similar to indexes that are maintained in real time. Real-time materialized views can significantly optimize query performance but may reduce write performance. The following elements can affect the write performance of real-time materialized views:

  • The complexity of the query statement that is used to create real-time materialized views and the number of nested layers. If a single-layer real-time materialized view involves a single table or a simple JOIN operation, tens to hundreds of thousands of rows can be written per second based on the instance configurations. If a nested real-time materialized view involves complex JOIN operations, data writes consume large amounts of instance computing resources and proportionally degrade performance.

  • JOIN operations. Real-time materialized views that involve JOIN operations may result in write amplification. For example, a real-time materialized view involves a JOIN operation between a fact table that contains one billion rows of data and a dimension table that contains 10,000 rows of data. In most cases, the fact table achieves high write performance. However, the dimension table proportionally degrades write performance due to write amplification that is caused by incremental computation.

  • The amount of resources in AnalyticDB for PostgreSQL instances. Incremental computation and write operations of real-time materialized views consume instance resources. The amount of instance resources affects the write performance of real-time materialized views. If the write performance does not meet your business requirements, you can increase computing resources to improve the performance.

Create or delete a real-time materialized view

  • Execute the following CREATE INCREMENTAL MATERIALIZED VIEW statement to create a real-time materialized view named mv:

    CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM base WHERE id > 40;
  • Execute the following DROP MATERIALIZED VIEW statement to delete the mv materialized view:

    DROP MATERIALIZED VIEW mv;

Examples

  1. Create a base table.

    CREATE TABLE test (a int, b int) DISTRIBUTED BY (a);
  2. Create a real-time materialized view.

    CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM TEST WHERE b > 40 DISTRIBUTED BY (a);
  3. Insert data into the base table.

    INSERT INTO test VALUES (1, 30), (2, 40), (3, 50), (4, 60);
  4. View data in the base table.

    SELECT * FROM test;

    Sample result:

     a | b
    ---+----
     1 | 30
     2 | 40
     3 | 50
     4 | 60
    (4 rows)
  5. View data in the materialized view.

    SELECT * FROM mv;

    The materialized view is updated. Sample result:

     a | b
    ---+----
     3 | 50
     4 | 60
    (2 rows)