Multi-table joins and complex aggregate calculations can significantly tax computing resources and lead to extended query durations. AnalyticDB for MySQL materialized views can mitigate these challenges. By precomputing and storing user-defined queries, materialized views enable the direct retrieval of precomputed results when queried, thereby enhancing query response times. This topic primarily discusses the attributes of materialized views and the process for their creation.
What is a Materialized View
A materialized view is a specialized table within AnalyticDB for MySQL that contains the precomputed outcomes of queries, whether they are from a single table or multiple tables. By querying a materialized view, the database can immediately return these precomputed results, bypassing the need to access the underlying base tables, which greatly enhances the speed of query responses.
Only the XIHE compute engine can create materialized views. The Spark engine does not support creating materialized views.
Furthermore, materialized views can improve query performance even when users directly query the base tables. If a user's query matches the query in a materialized view either fully or partially, AnalyticDB for MySQL automatically redirects the query from the base tables to the materialized view, leveraging precomputed results for faster query execution.
However, it is important to note that a materialized view reflects the data status of the last refresh and may not represent the latest data in the base tables. Materialized views do not automatically refresh data with each query. You can set scheduled auto-refresh, auto-refresh upon base table overwrite, or manually refresh to ensure the timeliness and accuracy of the materialized view data.
The following table introduces the features supported by materialized views.
Refresh Policy | Base Table Type | Refresh Trigger Mechanism | Query Type | Query Rewrite |
Full Refresh |
|
| Any query | Supports query rewrite (V3.1.4) |
Incremental Refresh (V3.1.9.0) |
| Scheduled auto-refresh |
|
For more information about the limitations of materialized views and version information for each feature, see Limits and Feature Change Log.
Scenarios Suitable for Creating Materialized Views
Materialized views are ideal for accelerating queries that need to be executed periodically or repeatedly.
-
Multiple queries contain the same subquery
If multiple queries share an identical subquery that employs aggregate functions with significant computing resource demands, it's advisable to create a materialized view. This view precomputes and stores the subquery's results. When a query includes this subquery, AnalyticDB for MySQL's query rewrite feature will automatically utilize the materialized view. This results in accelerated query performance and less computing resource usage.
-
Periodic multi-table joins
If there is a need for periodic joins of multiple tables, you can create a materialized view for these table join operations. The materialized view will precompute and store the join results and periodically auto-refresh the data. When a query contains these table join operations, the results will be directly returned from the materialized view without re-executing the multi-table join, significantly improving query response times.
-
Data lake query acceleration
When users initiate queries on external tables, AnalyticDB for MySQL can automatically convert queries on external tables into queries on the materialized view, enhancing data lake analysis efficiency.
Prerequisites
The kernel version is 3.1.3.4 or later.
Create a Materialized View
You can create a materialized view using internal tables, external tables, existing materialized views, and views from AnalyticDB for MySQL.
Permission Requirements
The user creating the materialized view must have all of the following permissions:
CREATE permission is required for the database that contains the materialized view.
SELECT permission on all relevant columns (or entire tables) associated with the base tables of the materialized view is required.
To create a materialized view with auto-refresh, you also need the following two permissions:
-
You need permission to connect to
'%'
AnalyticDB for MySQL from any IP address. -
You must have INSERT permission for all tables in the database where the materialized view is located, or specifically for the materialized view itself, to ensure that the data can be refreshed.
-
Prepare Base Tables for Creating a Materialized View
The following uses the customer table and sales table as examples to guide you in quickly creating a materialized view.
CREATE TABLE customer (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(255),
is_vip Boolean
);
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
customer_id INT,
price DECIMAL(10, 2),
quantity INT,
sale_date TIMESTAMP
);
The following example of creating a materialized view does not specify a resource group. When no resource group is specified, the default resource group (Interactive resource group) is used to create and refresh the materialized view. If you want to understand the differences between using Interactive resource groups and Job resource groups to create and refresh materialized views, or if you want to use Job resource groups to create and refresh materialized views, see Use Elastic Resources to Create or Refresh Materialized Views.
Create a Fully Refreshed Materialized View
Create a fully refreshed materialized view named join_mv
based on the customer
and sales
tables, setting the refresh trigger mechanism to manual.
CREATE MATERIALIZED VIEW join_mv
REFRESH COMPLETE ON DEMAND
AS
SELECT
sale_id,
SUM(price * quantity) AS price
FROM customer
INNER JOIN (SELECT sale_id,customer_id,price,quantity FROM sales) sales ON customer.customer_id = sales.customer_id
GROUP BY customer.customer_id;
To refresh the data in the materialized view, manually execute the command REFRESH MATERIALIZED VIEW join_mv;
.
Create an Incrementally Refreshed Materialized View
The following will guide you in creating a multi-table associated incrementally refreshed materialized view.
Before creating an incrementally refreshed materialized view, complete the following preparations:
Ensure that the cluster's kernel version is 3.1.9.0 or higher.
Enable the cluster-level Binlog feature along with the Binlog function for the base tables.
SET ADB_CONFIG BINLOG_ENABLE=true; ALTER TABLE customer binlog=true; ALTER TABLE sales binlog=true;
Create a materialized view named sales_mv_incre
that is incrementally refreshed. This view is limited to query computations on the sales
table.
CREATE MATERIALIZED VIEW sales_mv_incre
REFRESH FAST NEXT now() + INTERVAL 3 minute
AS
SELECT
sale_id,
SUM(price * quantity) AS price
FROM sales
GROUP BY sale_id;
If the cluster's kernel version is 3.2.1.0 or higher, you can create a multi-table materialized view called join_mv_incre
that supports incremental refreshes and is set to automatically refresh every 3 minutes.
CREATE MATERIALIZED VIEW join_mv_incre
REFRESH FAST NEXT now() + INTERVAL 3 minute
AS
SELECT
customer.customer_id,
SUM(sales.price) AS price
FROM customer
INNER JOIN (SELECT customer_id,price FROM sales) sales ON customer.customer_id = sales.customer_id
GROUP BY customer.customer_id;
For more details on the syntax of materialized views and more examples, see CREATE MATERIALIZED VIEW.
View Materialized Views Being Created
You can retrieve a list of materialized views currently being created by executing SHOW PROCESSLIST WHERE info LIKE '%CREATE MATERIALIZED VIEW%';
.
Each row in the returned results corresponds to a materialized view in the process of being created. The user
field indicates the database account that is creating the materialized view, the status
field shows the creation status, and the Info
field contains the statement used to create the materialized view. For more information about the fields in the returned results, see SHOW PROCESSLIST.
If SHOW PROCESSLIST
returns no records, this signifies that the materialized view was created successfully, complete with its table structure and initial data loaded.
Use Elastic Resources to Create or Refresh Materialized Views
Elastic resources pertain to the computing resources within Job resource groups. The creation and refresh of materialized views utilize the computing resources from the cluster. If no resource group is specified (meaning MV_PROPERTIES='{"mv_resource_group":"job_rg_name"}'
is undefined), the creation and refresh of materialized views will tap into the reserved computing resources of the default resource group, known as the user_default Interactive resource group.
When creating a materialized view, if a Job resource group is specified, both the creation and subsequent refresh operations will utilize that Job resource group. Additionally, you can specify the resource allocation for the materialized view by using "mv_refresh_hints":{"elastic_job_max_acu":"value"}
. For more information on usage, see mv_properties.
The difference between using the default Interactive resource group and the Job resource group is that using the Job resource group does not require purchasing resources in advance, effectively avoiding resource waste and thus reducing costs. However, the refresh performance of materialized views is not as good as that of the Interactive resource group because each time a materialized view is refreshed, the Job resource group needs to temporarily pull up computing resources, which takes about seconds or minutes.
If the cluster meets the following conditions, you can use the elastic resources of the Job resource group to create and refresh materialized views.
The cluster's product edition can be .
Ensure the cluster's kernel version is 3.1.9.3 or later.
Take the customer
table as an example: you can utilize the elastic resources from the Job resource group my_job_rg to create and refresh a materialized view at a high priority.
CREATE MATERIALIZED VIEW job_mv
MV_PROPERTIES='{
"mv_resource_group":"my_job_rg",
"mv_refresh_hints":{"query_priority":"HIGH"}
}'
REFRESH COMPLETE ON DEMAND
START WITH now()
NEXT now() + INTERVAL 1 DAY
AS
SELECT * FROM customer;
Refresh Trigger Mechanism of Materialized Views
Materialized views reflect the data status of the last refresh and may not represent the latest data in the base tables. Materialized views do not automatically refresh data with each query. To ensure the timeliness and accuracy of materialized view data, you can set materialized views to refresh automatically at scheduled intervals, refresh automatically upon base table overwrite, or refresh manually.
When refreshing a materialized view, the refresh strategy determines whether to refresh full data or incremental data.
For more details on the differences between refresh trigger mechanisms and refresh strategies, application scenarios, and more, see Refresh Materialized Views.
Limits
You cannot execute
INSERT
,DELETE
, orUPDATE
operations on materialized views.You cannot delete or rename base tables or their columns when they are part of a materialized view. To make changes to the base tables, you must first delete the materialized views.
By default, an AnalyticDB for MySQL cluster supports a maximum number of materialized views as follows:
-
If the kernel version is 3.1.4.7 or higher, you can create up to 64 materialized views.
-
If the kernel version is lower than 3.1.4.7, you can create up to 8 materialized views.
NoteShould you reach the limit of materialized views, please contact technical support to request an increase.
-
The following are limitations for incrementally refreshed materialized views:
-
Clusters with kernel versions earlier than 3.2.3.0 cannot use partitioned tables as the base for incrementally refreshed materialized views.
-
Clusters with kernel versions earlier than 3.2.3.1 do not support the INSERT OVERWRITE and TRUNCATE operations on base tables of incrementally refreshed materialized views, and attempting these operations will result in an error.
-
Incremental refresh is limited to scheduled auto-refresh and does not support manual refresh. The minimum refresh interval for scheduled auto-refresh is 5 seconds, while the maximum is 5 minutes.
-
For incrementally refreshed materialized views, the query_body has certain restrictions:
-
To ensure complete consistency with the base tables and to accommodate any DML changes, not all query_body configurations can be incrementally refreshed. If the materialized view cannot support incremental refresh, an error will occur.
-
Nondeterministic expressions, such as
now()
andrand()
, cannot be used as conditions. -
Only the following aggregate functions are supported: COUNT, SUM, MAX, MIN, AVG, APPROX_DISTINCT, and COUNT(DISTINCT).
-
When using MAX, MIN, APPROX_DISTINCT, or COUNT(DISTINCT) aggregate functions in the query_body, the base tables of the incrementally refreshed materialized view are limited to INSERT operations only. Operations that may result in data deletion, such as DELETE, UPDATE, REPLACE, and INSERT ON DUPLICATE KEY UPDATE, are not allowed.
-
Aggregate functions, except for COUNT(DISTINCT), do not support the DISTINCT keyword.
-
COUNT(DISTINCT) is only compatible with the INTEGER data type.
-
The AVG function does not support the DECIMAL data type.
-
Aggregate operations do not support the HAVING keyword.
-
Window functions are not supported.
-
Sorting operations are not supported.
-
UNION, EXCEPT, INTERSECT, and other set operations are not supported.
-
-
Additional limitations for multi-table incrementally refreshed materialized views include the following:
-
Multi-table materialized views currently only support INNER JOIN operations.
-
By default, multi-table materialized views can include up to 5 tables. If you need to exceed this limit, you can to request an increase based on cluster specifications.
-
The fields used for joining tables in multi-table materialized views must be the original fields from those tables, and the data types of the join fields must match, with INDEX indexes.
-
-
The limitations for elastic materialized views, which utilize Job resource groups, include the following:
-
The cluster's product edition must be .
-
The cluster's kernel version must be 3.1.9.3 or higher.
-
Important Feature Change Log of Materialized Views
Version 3.1.4 and later support the enabling of query rewrite for materialized views.
Versions 3.1.4.7 and later:
-
Supports the OR REPLACE keyword, allowing for the deletion of an existing materialized view with the same name and the creation of a new one in its place.
-
The maximum number of materialized views has been increased from 8 to 64.
-
Version 3.1.9.0 and later support incremental refresh of single-table materialized views.
From version 3.1.9.3 onwards, the Enterprise Edition, Basic Edition, and Data Lakehouse Edition offer support for elastic materialized views, which allows for the creation and refresh of materialized views using Job resource groups.
Starting with version 3.2.0.0, support is provided for incrementally refreshing multi-table materialized views.
Starting with version 3.2.2.1, incrementally refreshed materialized views support functions such as MAX, MIN, AVG, APPROX_DISTINCT, and COUNT(DISTINCT) within the query_body.
Starting with version 3.2.3.0, incrementally refreshed materialized views can utilize partitioned tables as base tables.
Starting with version 3.2.3.1, incrementally refreshed materialized views support INSERT OVERWRITE and TRUNCATE operations on base tables.
FAQ
How to Retain Only the Data of the Last Year in a Materialized View?
When defining a materialized view, use a date column as the partition key (PARTITION BY) and set the lifecycle (LIFECYCLE) to retain only the data of the last year.
Assuming that the sales table generates new sales data every day, you can use the sales date (sales_date) as the partition key for the materialized view and set the lifecycle to 365 (i.e., retain up to 365 partitions) to retain only the data of the last year.
CREATE MATERIALIZED VIEW sales_mv_lifecycle
PARTITION BY VALUE(DATE_FORMAT(sale_date, '%Y%m%d')) LIFECYCLE 365
REFRESH FAST NEXT now() + INTERVAL 100 second
AS
SELECT
sale_date,
SUM(price * quantity) AS price
FROM sales
GROUP BY sale_date;
Troubleshoot Common Errors
Query Execution Error: Query Exceeded Reserved Memory Limit
Cause of Error: The query occupies more memory on a single node than allowed.
Solution: SQL queries with high memory usage typically contain aggregation operators, TopN operators, window operators, and join operators. You can use the SQL Diagnosis feature to query the stages and operators with high memory usage and then perform SQL tuning on the operators with high memory usage. For details, see SQL Diagnosis and Memory Metrics and Analyze Queries Using Stage and Task Details.
References
CREATE MATERIALIZED VIEW: Discover the syntax specifics for creating materialized views.
Refresh Materialized Views: Discover refresh strategies, trigger mechanisms, and methods for manually refreshing materialized views.
Manage Materialized Views: This feature allows you to query the definitions and refresh records of materialized views, retrieve a list of all materialized views, and delete materialized views.
Query Materialized Views: This function allows you to query materialized views.