All Products
Search
Document Center

MaxCompute:Materialized view operations

Last Updated:Nov 08, 2024

A materialized view is a database object that stores the pre-calculation results of time-consuming queries, such as JOIN and AGGREGATE. You can directly reuse the pre-calculation results when you want to perform the same queries. This accelerates the speed of queries. This topic describes the statements and syntax that are related to materialized views and provides examples of using materialized views.

Background information

A view is a stored query accessible as a virtual table. Each time you query a view, the query statement is converted into the SQL statement that is used to define the view. A materialized view is a special physical table that occupies storage resources to store real data. For more information about the billing rules of materialized views, see the Billing rules section in this topic.

Materialized views are suitable for the following queries:

  • Queries that are in a fixed mode and are frequently executed

  • Queries that involve time-consuming operations, such as JOIN or AGGREGATE

  • Queries that involve only a small portion of data in a table

The following table compares traditional queries and queries based on materialized views.

Item

Traditional query

Query based on a materialized view

Query statements

SQL statements are used to query data.

select empid, deptname  
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2018-01-01';

Data is queried based on the materialized view that you created.

Statement that is used to create a materialized view:

create materialized view mv
as select empid, deptname, hire_date 
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2016-01-01';

Statement that is used to query data based on the materialized view that you created:

select empid, deptname from mv 
where hire_date >= '2018-01-01';

If the query rewrite feature is enabled for the materialized view, data is directly obtained from the query result that is contained in the materialized view when you execute the first SQL statement in the following code:

select empid, deptname 
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2018-01-01';
-- The preceding statement is equivalent to the following statement: 
select empid, deptname from mv 
where hire_date >= '2018-01-01';

Query characteristics

Queries involve table reading, JOIN operations, and filter operations that are performed by using WHERE clauses. If the source table contains a large amount of data, the query speed is slow. The query operation is complex and the query efficiency is low.

Queries involve table reading and filter operations. JOIN operations are not involved. MaxCompute automatically matches the optimal materialized view and reads data from the optimal materialized view. This greatly improves query efficiency.

The following table describes the operations that you can perform on materialized views by executing statements.

Operation

Description

Authorized user

Platform

Create a materialized view that supports clustering or partitioning

Creates a materialized view based on a query statement.

Users who have the CREATE TABLE permission on a project

You can perform the operations by using the following platforms:

Update a materialized view

Updates an existing materialized view.

Users who have the Alter permission on tables

Change the lifecycle of an existing materialized view

Changes the lifecycle of an existing materialized view.

Users who have the Alter permission on tables

Enable or disable the lifecycle feature for an existing materialized view

Enables or disables the lifecycle feature for an existing materialized view.

Users who have the Alter permission on tables

Query information about a materialized view

Queries the basic information about a materialized view.

Users who have the Describe permission on the metadata of a table

Query the status of a materialized view

Queries the status of a materialized view.

Users who have the Describe permission on the metadata of a table

Drop an existing materialized view

Drops an existing materialized view.

Users who have the Drop permission on tables

Drop partitions from an existing materialized view

Drops partitions from an existing materialized view.

Users who have the Drop permission on tables

Perform a penetration query based on a materialized view

Queries data from the source partitioned table if the partition data that you want to query is not contained in the materialized view.

Users who have the Write and CreateTable permissions on a project

Perform a scheduled update on an existing materialized view

Performs a scheduled update on the data of an existing materialized view.

Users who have the Alter permission on tables

Limits

Before you use materialized views, take note of the following limits:

  • Window functions are not supported.

  • User-defined table-valued functions (UDTFs) are not supported.

  • By default, non-deterministic functions, such as user-defined scalar functions (UDFs) and user-defined aggregate functions (UDAFs), are not supported. If you must use non-deterministic functions, run the set odps.sql.materialized.view.support.nondeterministic.function=true; command at the session level.

Create a materialized view that supports clustering or partitioning

You can create a materialized view that supports clustering or partitioning based on the data for materialized view scenarios.

  • Limits

    • The name of the materialized view that you want to create cannot be the same as the name of an existing table, view, or materialized view. You can run the show tables; command to view the names of all tables and materialized views in a project.

    • You cannot create a materialized view based on an existing materialized view.

    • You cannot create a materialized view based on an external table.

  • Usage notes

    • If the query statement based on which you create a materialized view fails to be executed, you cannot create the materialized view.

    • Partition key columns in a materialized view must be derived from a source table. The sequence and number of the columns in the materialized view must be the same as the sequence and number of the columns in the source table. Column names can be different.

    • You must specify comments for all columns, including partition key columns. If you specify comments only for some columns, an error is returned.

    • You can specify both the partitioning and clustering attributes for a materialized view. In this case, the data in each partition has the specified clustering attribute.

    • If the query statement based on which you create a materialized view contains operators that are not supported by the materialized view, an error is returned. For more information about the operators that are supported by materialized views, see Perform a query rewrite operation based on a materialized view.

    • By default, MaxCompute does not allow you to create materialized views by using non-deterministic functions, such as UDFs or UDAFs. If you must use non-deterministic functions, run the set odps.sql.materialized.view.support.nondeterministic.function=true; command at the session level.

    • If the source table of a materialized view contains an empty partition, you can refresh the materialized view to generate an empty partition in the materialized view.

  • Syntax

    CREATE MATERIALIZED VIEW [IF NOT EXISTS][project_name.]<mv_name>
    [LIFECYCLE <days>]    -- The lifecycle of the materialized view.
    [BUILD DEFERRED]    -- Specifies that only the schema is created and no data is updated when you create the materialized view.
    [(<col_name> [COMMENT <col_comment>],...)]    -- The column comments.
    [DISABLE REWRITE] -- Specifies whether to disable the query rewrite operation that is performed based on the materialized view.
    [COMMENT 'table comment']    -- The materialized view comments.
    [PARTITIONED ON/BY (<col_name> [, <col_name>, ...])    -- The partitions in the materialized view. This parameter is required when you create a partitioned materialized view.
    [CLUSTERED BY|RANGE CLUSTERED BY (<col_name> [, <col_name>, ...])
         [SORTED BY (<col_name> [ASC | DESC] [, <col_name> [ASC | DESC] ...])]
                     INTO <number_of_buckets> BUCKETS]    -- The shuffle and sort attributes of the materialized view. This parameter is required when you create a clustered materialized view.
    [TBLPROPERTIES("compressionstrategy"="normal/high/extreme",    -- The compression policy for data storage of the materialized view.
                    "enable_auto_substitute"="true",    -- Specifies whether to query data from the source partitioned table if the partition data that you want to query is not contained in the materialized view.
                    "enable_auto_refresh"="true",    -- Specifies whether to enable the scheduled update feature.
                    "refresh_interval_minutes"="120",    -- The update interval.
                    "only_refresh_max_pt"="true"    -- Specifies whether to automatically update the latest partition data in the source table to the materialized view.
                    )]
    AS <select_statement>;
  • Parameters

    • if not exists: optional. If you do not specify if not exists and the materialized view that you want to create already exists, an error is returned.

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view that you want to create.

    • days: optional. The lifecycle of the materialized view that you want to create. Unit: days.

    • BUILD DEFERRED: optional. If this keyword is added, only the schema is created and the data is not updated when you create the materialized view.

    • col_name: optional. The name of a column in the materialized view that you want to create.

    • col_comment: optional. The comment on a column in the materialized view that you want to create.

    • DISABLE REWRITE: optional. This parameter specifies whether to disable the query rewrite operation by using the materialized view. If you do not configure this parameter, the query rewrite feature is enabled for the materialized view. You can run the alter materialized view [project_name.]<mv_name> disable rewrite; command to disable this feature for the materialized view. You can also run the alter materialized view [project_name.]<mv_name> enable rewrite; command to enable this feature for the materialized view.

    • partitioned on: optional. The partition key columns in the materialized view that you want to create. If you want to create a partitioned materialized view, you must configure this parameter.

    • clustered by|range clustered by: optional. The shuffle attribute of the materialized view that you want to create. If you want to create a clustered materialized view, you must specify the clustered by or range clustered by parameter.

    • sorted by: optional. The sort attribute of the materialized view that you want to create. If you want to create a clustered materialized view, you must configure this parameter.

    • number_of_buckets: optional. The number of buckets in the materialized view that you want to create. If you want to create a clustered materialized view, you must specify this parameter.

    • tblproperties: optional. compressionstrategy specifies the compression policy for data storage of the materialized view that you want to create. Valid values: normal, high, and extreme. enable_auto_substitute specifies whether to automatically query data from the source table if the materialized view does not contain the partition from which you want to query data. For more information, see Perform a penetration query based on a materialized view. enable_auto_refresh: optional. Set this parameter to true if you want the system to automatically update data. refresh_interval_minutes: conditionally optional. If enable_auto_refresh is set to true, you need to configure this parameter. Unit: minutes. only_refresh_max_pt: optional. This parameter is valid for partitioned materialized views. If this parameter is set to true, the latest partition data in the source table is updated to the materialized view.

    • select_statement: required. The SELECT statement. For more information about the syntax of the SELECT statement, see SELECT syntax.

  • Examples

    • Example 1: Create a materialized view. Sample statements:

      -- Create a table named mf_t.
      create table if not exists mf_t(
           id     bigint,
           value   bigint,
           name   string)
      partitioned by (ds STRING);
      alter table mf_t add partition(ds='1');
      -- Insert partition data into the created table.
      insert into mf_t partition (ds='1') values(1,10,'kyle'),(2,20,'xia') ;
      -- Query data from the partitioned table.
      select * from mf_t where ds ='1';
      +------------+------------+------------+------------+
      | id         | value      | name       | ds         |
      +------------+------------+------------+------------+
      | 1          | 10         | kyle       | 1          |
      | 2          | 20         | xia        | 1          |
      +------------+------------+------------+------------+
      
      -- Create a table named mf_t1.
      create table if not exists mf_t1(
           id     bigint,
           value   bigint,
           name   string)
      partitioned by (ds STRING);
      alter table mf_t1 add partition(ds='1');
      -- Insert partition data into the created table.
      insert into mf_t1 partition (ds='1') values(1,10,'kyle'),(3,20,'john') ;
      -- Query data from the partitioned table.
      select * from mf_t1 where ds ='1';
      +------------+------------+------------+------------+
      | id         | value      | name       | ds         |
      +------------+------------+------------+------------+
      | 1          | 10         | kyle       | 1          |
      | 3          | 20         | john       | 1          |
      +------------+------------+------------+------------+
      
      
      -- Sample 1: Create a materialized view that contains a partition key column named ds.
      create materialized view mf_mv lifecycle 7
      (
        key comment 'unique id',
        value comment 'input value',
        ds comment 'partitiion'
        )
      partitioned on (ds)
      as select t1.id as key, t1.value as value, t1.ds as ds
           from mf_t as t1 join mf_t1 as t2
             on t1.id = t2.id and t1.ds=t2.ds and t1.ds='1';
      -- Query data from the created materialized view.
      select * from mf_mv where ds =1;
      +------------+------------+------------+
      | key        | value      | ds         |
      +------------+------------+------------+
      | 1          | 10         | 1          |
      +------------+------------+------------+
      
      -- Sample 2: Create a non-partitioned materialized view that is clustered.
      create materialized view mf_mv2 lifecycle 7
      clustered by (key) sorted by (value) into 1024 buckets
      as  select t1.id as key, t1.value as value, t1.ds as ds
            from mf_t as t1 join mf_t1 as t2
              on t1.id = t2.id and t1.ds=t2.ds and t1.ds='1';
      
      -- Sample 3: Create a partitioned materialized view that is clustered.
      create materialized view mf_mv3 lifecycle 7
      partitioned on (ds)
      clustered by (key) sorted by (value) into 1024 buckets
      as  select t1.id as key, t1.value as value, t1.ds as ds
            from mf_t as t1 join mf_t1 as t2
              on t1.id = t2.id and t1.ds=t2.ds and t1.ds='1';
                                  
    • Example 2: Update a materialized view whose source table contains an empty partition. An empty partition is generated in the materialized view.

      create table mf_blank_pts(id bigint ,name string) partitioned by (ds bigint);
      alter table mf_blank_pts add partition (ds = 1);
      alter table mf_blank_pts add partition (ds = 2);
      insert into table mf_blank_pts partition(ds=1) values(1,"aba"),(2,"cbd");
      
      -- Initialize partition data.
      set odps.sql.materialized.view.initial.partition={"ds": 1};
      create materialized view if not exists mf_mv_blank_pts partitioned on (ds)
      as select id,name,ds from mf_blank_pts;
      
      alter materialized view mf_mv_blank_pts rebuild partition (ds>0);
      
      show partitions mf_mv_blank_pts;
      -- The partition ds=2 in the source table does not contain any data. After you execute an ALTER statement that contains the expression ds>0, an empty partition is generated in the materialized view.
      ds=1
      ds=2
      select * from mf_mv_blank_pts where ds>0;
      -- The returned result shows that only the ds=1 partition has data.
      +------------+------------+------------+
      | id         | name       | ds         |
      +------------+------------+------------+
      | 1          | aba        | 1          |
      | 2          | cbd        | 1          |
      +------------+------------+------------+

Update a materialized view

If you perform operations, such as insert, overwrite, update, and delete operations on the table or partition that corresponds to a materialized view, the materialized view becomes invalid and cannot be used for query rewrite operations. You can check the status of a materialized view. If the materialized view is invalid, you must update the materialized view. For more information about how to query the status of a materialized view, see Query the status of a materialized view.

  • Usage notes

    • Data in a materialized view or a partition in a materialized view is updated only when data in the source table or a partition in the source table changes.

    • You can trigger scheduled updates on a materialized view in the DataWorks console. For more information about how to trigger scheduling operations in the DataWorks console, see Scheduling configuration.

  • Syntax

    alter materialized view [<project_name>.]<mv_name>
          rebuild [partition(<ds>=max_pt(<table_name>),<expression1>...)];
  • Parameters

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view whose lifecycle you want to update.

    • ds: optional. The name of the partition field in the materialized view.

    • max_pt: optional. The maximum partition value of the table or materialized view that is specified by table_name.

    • expression: optional. The expression that is used to specify the partitions that you want to update. If you want to update a partitioned materialized view, you must configure this parameter.

  • Examples

    • Example 1: Update a non-partitioned materialized view. Sample statement:

      alter materialized view count_mv rebuild;
    • Example 2: Update a partition of a partitioned materialized view. Sample statement:

      alter materialized view mv rebuild partition (ds='20210101');
    • Example 3: Update the partitions of a partitioned materialized view. The partitions meet the specified conditions. Sample statement:

      alter materialized view mv rebuild partition(ds>='20210101', ds<='20210105');
    • Example 4: Update the latest partition data to a partitioned materialized view. Sample statements:

      -- Create a partitioned table.
      create table if not exists sale_detail_jt
      (shop_name STRING ,
      customer_id STRING ,
      total_price DOUBLE )
      PARTITIONED BY (sale_date STRING ,region STRING );
      
      alter TABLE  sale_detail_jt ADD PARTITION (sale_date='2013',region='china');
      
      INSERT INTO  sale_detail_jt PARTITION (sale_date='2013',region='china') VALUES 
          ('s1','c1',100.1),
          ('s2','c2',100.2),
          ('s3','c3',100.3);
      
      alter TABLE  sale_detail_jt ADD PARTITION (sale_date='2013',region='en');
      INSERT INTO  sale_detail_jt PARTITION (sale_date='2013',region='en') VALUES 
          ('t1','c5',200.0),
          ('t2','c6',300.0);
          
      -- View partition data.
      select * from sale_detail_jt where sale_date='2013' and region='china';
      +-----------+-------------+-------------+-----------+--------+
      | shop_name | customer_id | total_price | sale_date | region |
      +-----------+-------------+-------------+-----------+--------+
      | s1        | c1          | 100.1       | 2013      | china  |
      | s2        | c2          | 100.2       | 2013      | china  |
      | s5        | c2          | 100.2       | 2013      | china  |
      +-----------+-------------+-------------+-----------+--------+
      -- View partition data.
      select * from sale_detail_jt where sale_date='2013' and region='en';
      +-----------+-------------+-------------+-----------+--------+
      | shop_name | customer_id | total_price | sale_date | region |
      +-----------+-------------+-------------+-----------+--------+
      | t1        | c5          | 200.0       | 2013      | en     |
      | t2        | c6          | 300.0       | 2013      | en     |
      +-----------+-------------+-------------+-----------+--------+
      
      -- Create a materialized view.
      CREATE MATERIALIZED VIEW mv_deferred
      BUILD DEFERRED
      AS select * from sale_detail_jt;
      
      -- Query data from the mv_deferred materialized view.
      select * from mv_deferred;
      -- The following result is returned:
      +-----------+-------------+-------------+-----------+--------+
      | shop_name | customer_id | total_price | sale_date | region |
      +-----------+-------------+-------------+-----------+--------+
      +-----------+-------------+-------------+-----------+--------+
      
      -- Create a partitioned table.
      create table mf_part (id bigint,name string) partitioned by (dt string);
      -- Insert data.
      insert into mf_part partition(dt='2013') values(1,'name1'),(2,'name2');
      -- Query data from the partitioned table.
      select * from mf_part where dt='2013';
      -- The following result is returned:
      +------------+------+----+
      | id         | name | dt |
      +------------+------+----+
      | 1          | name1 | 2013 |
      | 2          | name2 | 2013 |
      +------------+------+----+
      -- Create a partitioned materialized view.
      create materialized view mv_rebuild
      BUILD DEFERRED
      partitioned on (dt) as
      select * from mf_part;
      
      -- Query data from the materialized view.
      select * from mv_rebuild where dt='2013';
      -- The following result is returned:
      +------------+------+----+
      | id         | name | dt |
      +------------+------+----+
      +------------+------+----+
      -- Update the latest partition data to the materialized view.
      alter materialized view mv_rebuild
            rebuild partition(dt=max_pt('mf_part'));
      
      -- Query data from the materialized view.
      select * from mv_rebuild where dt='2013';
      -- The following result is returned:
      +------------+------+----+
      | id         | name | dt |
      +------------+------+----+
      | 1          | name1 | 2013 |
      | 2          | name2 | 2013 |
      +------------+------+----+

Change the lifecycle of an existing materialized view

You can change the lifecycle of an existing materialized view.

  • Syntax

    alter materialized view [<project_name>.]<mv_name> set lifecycle <days>;
  • Parameters

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view whose lifecycle you want to update.

    • days: required. The new lifecycle of the materialized view. Unit: days.

  • Examples

    -- Change the lifecycle of an existing materialized view to 10 days. 
    alter materialized view mv set lifecycle 10;

Enable or disable the lifecycle feature for an existing materialized view

You can enable or disable the lifecycle feature for an existing materialized view.

  • Syntax

    alter materialized view [<project_name>.]<mv_name> [<pt_spec>] enable|disable lifecycle;
  • Parameters

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view for which you want to enable or disable the lifecycle feature.

    • pt_spec: optional. The partition information of the materialized view for which you want to enable or disable the lifecycle feature. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format. partition_col specifies the column name, and partition_col_value specifies the column value.

    • enable|disable: required. The enable option specifies that the lifecycle feature is enabled for a materialized view or a partition of a materialized view. The disable option specifies that the lifecycle feature is disabled for a materialized view or a partition of a materialized view. If you disable the lifecycle feature, lifecycle management is not required for the materialized view or partition.

  • Examples

    • Example 1: Enable the lifecycle feature for a materialized view. Sample statement:

      alter materialized view mv partition (ds='20210101') enable lifecycle;
    • Example 2: Disable the lifecycle feature for a materialized view. Sample statement:

      alter materialized view mv partition (ds='20210101') disable lifecycle;

Query the information about a materialized view

You can query the information about a materialized view, including the schema and modification time of the materialized view.

  • Syntax

    DESC EXTENDED [<project_name>.]<mv_name>;
  • Parameters

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view that you want to query.

  • Examples

    DESC EXTENDED mv;

    The following code shows a sample query result:

    Note

    The following sample result can be returned only if the version of the MaxCompute client is V0.43 or later. For more information, see the "Version updates" section in MaxCompute client (odpscmd).

    +------------------------------------------------------------------------------------+
    | Owner:                    ALIYUN$$****@***.aliyunid.com                       |
    | Project:                  m****                                                |
    | TableComment:                                                                      |
    +------------------------------------------------------------------------------------+
    | CreateTime:               2023-05-30 13:16:07                                      |
    | LastDDLTime:              2023-05-30 13:16:07                                      |
    | LastModifiedTime:         2023-05-30 13:16:07                                      |
    +------------------------------------------------------------------------------------+
    | MaterializedView: YES                                                              |
    | ViewText: select id,name from mf_refresh                                           |
    | Rewrite Enabled: true                                                              |
    | AutoRefresh Enabled: true                                                          |
    | Refresh Interval Minutes: 10                                                       |
    +------------------------------------------------------------------------------------+
    | Native Columns:                                                                    |
    +------------------------------------------------------------------------------------+
    | Field    | Type   | Label | ExtendedLabel | Nullable | DefaultValue | Comment      |
    +------------------------------------------------------------------------------------+
    | id       | bigint |       |               | true     | NULL         |              |
    | name     | string |       |               | true     | NULL         |              |
    +------------------------------------------------------------------------------------+
    | Extended Info:                                                                     |
    +------------------------------------------------------------------------------------+
    | IsOutdated:               false                                                    |
    | TableID:                  569ec712873e44b3868e79b7a8beabab                         |
    | IsArchived:               false                                                    |
    | PhysicalSize:             1875                                                     |
    | FileNum:                  2                                                        |
    | StoredAs:                 CFile                                                    |
    | CompressionStrategy:      normal                                                   |
    | odps.timemachine.retention.days: 1                                                        |
    | ColdStorageStatus:        N/A                                                      |
    | encryption_enable:        false                                                    |
    +------------------------------------------------------------------------------------+
    | AutoRefresh History:                                                               |
    +------------------------------------------------------------------------------------+
    | InstanceId                | Status     | StartTime           | EndTime             |
    +------------------------------------------------------------------------------------+
    | 20230619070546735ghwl1****** | TERMINATED | 2023-06-19 15:05:46 | 2023-06-19 15:05:47 |
    | 20230619065545586gwllc****** | TERMINATED | 2023-06-19 14:55:45 | 2023-06-19 14:55:46 |
    | 20230619064544463gcjgom****** | TERMINATED | 2023-06-19 14:45:44 | 2023-06-19 14:45:45 |
    | 20230619063543334gzxs2d****** | TERMINATED | 2023-06-19 14:35:43 | 2023-06-19 14:35:44 |
    | 2023061906254257gi21w2****** | TERMINATED | 2023-06-19 14:25:42 | 2023-06-19 14:25:43 |
    | 20230619061540813giacg8****** | TERMINATED | 2023-06-19 14:15:41 | 2023-06-19 14:15:41 |
    | 20230619060539674gswjq9****** | TERMINATED | 2023-06-19 14:05:39 | 2023-06-19 14:05:40 |
    | 20230619055538578gvdjk****** | TERMINATED | 2023-06-19 13:55:38 | 2023-06-19 13:55:40 |
    | 20230619054537356glqdne****** | TERMINATED | 2023-06-19 13:45:37 | 2023-06-19 13:45:38 |
    | 2023061905353687gcc5pl****** | TERMINATED | 2023-06-19 13:35:36 | 2023-06-19 13:35:37 |
    +------------------------------------------------------------------------------------+

Query the status of a materialized view

You can query the status of a materialized view. This operation allows you to view changes to the source table and determines whether the materialized view is valid. A materialized view can be in one of the following states:

  • Valid

    When you execute a query statement, MaxCompute queries data from the materialized view instead of querying data from the source table.

  • Invalid

    When you execute a query statement, MaxCompute cannot directly query data from the materialized view. In this case, MaxCompute queries data from the source table. As a result, the query speed is not accelerated.

You can use a function to check whether data in a materialized view is valid.

  • Syntax

    Boolean materialized_view_is_valid(<mv_name>,<partition_value>);
  • Examples

    Check whether data in the materialized view mf_mv_refresh4 is consistent with the latest data in the source table. If the data is consistent, true is returned. Otherwise, false is returned.

    • select materialized_view_is_valid("mf_mv_refresh4");
    • select materialized_view_is_valid("mf_mv_refresh_pts","20220810");

Drop an existing materialized view

You can drop an existing materialized view.

  • Syntax

    drop materialized view [if exists] [<project_name>.]<mv_name> [purge];
  • Parameters

    • if exists: optional. If you do not specify if exists and the materialized view that you want to drop does not exist, an error is returned.

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the materialized view that you want to drop.

    • purge: optional. If you specify the purge parameter, data in a materialized view is deleted when you drop the materialized view.

  • Examples

    • Drop the mv materialized view.

      drop materialized view mv;
    • Drop the mv materialized view and delete the data in the materialized view at the same time.

      drop materialized view mv purge;

Drop partitions from an existing materialized view

You can drop one or more partitions from an existing materialized view.

  • Syntax

    alter materialized view [<project_name>.]<mv_name> drop [if exists] partition <pt_spec> [partition <pt_spec>, partition <pt_spec>....];
  • Parameters

    • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. To view the name of a MaxCompute project, perform the following steps: Log on to the MaxCompute console. In the top navigation bar, select a region. View the name of the MaxCompute project on the Projects page.

    • mv_name: required. The name of the partitioned materialized view from which you want to drop partitions.

    • if exists: optional. If you do not specify if exists and the materialized view that you want to drop does not exist, an error is returned.

    • pt_spec: The partitions that you want to drop. You must specify at least one partition. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format. partition_col specifies the column name, and partition_col_value specifies the column value.

  • Examples

    • Example 1: Drop a partition from a partitioned materialized view. Sample statement:

      alter materialized view mv drop partition (ds='20210101');
    • Example 2: Drop the partitions that meet the specific conditions from a partitioned materialized view. Sample statement:

      alter materialized view mv drop partition (ds>='20210101', ds<='20210105');

Perform a query rewrite operation based on a materialized view

The most important feature of materialized views is to perform query rewrite operations on query statements. To perform query rewrite operations on a query statement based on a materialized view, you must add set odps.sql.materialized.view.enable.auto.rewriting=true; before the query statement. If a materialized view is invalid, the materialized view cannot be used for query rewrite operations. In this case, data is queried from the source table, and the query speed is not accelerated.

Note

By default, a MaxCompute project can use only its own materialized views for query rewrite operations. If you need to perform query rewrite operations on query statements based on the materialized views of other MaxCompute projects, you must add set odps.sql.materialized.view.source.project.white.list=<project_name1>,<project_name2>,<project_name3>; before the query statements to specify the MaxCompute projects. If the query statement of a materialized view contains the LEFT/RIGHT JOIN or UNION ALL keyword, you must add the set odps.sql.materialized.view.enable.substitute.rewriting=true; flag before the query statement.

The following table describes the types of operators that are supported by query rewrite operations of materialized views in MaxCompute and other services.

Operator

Type

MaxCompute

BigQuery

Amazon RedShift

Hive

FILTER

Exact match of expressions

Supported

Supported

Supported

Supported

Partial match of expressions

Supported

Supported

Supported

Supported

AGGREGATE

Single AGGREGATE operation

Supported

Supported

Supported

Supported

Multiple AGGREGATE operations

Not supported

Not supported

Not supported

Not supported

JOIN

JOIN operation

INNER JOIN

Not supported

INNER JOIN

INNER JOIN

Single JOIN operation

Supported

Not supported

Supported

Supported

Multiple JOIN operations

Supported

Not supported

Supported

Supported

AGGREGATE+JOIN

-

Supported

Not supported

Supported

Supported

The query rewrite operations based on a materialized view require that the data in a query statement be obtained from the materialized view. The data includes output columns, the columns required by filter operations, the columns required by aggregate functions, and the columns required by JOIN operations. If the columns that are required in the query statement are not included in the materialized view or are not supported by the aggregate functions, you cannot perform query rewrite operations based on the materialized view. Configuration examples:

  • Rewrite a query statement that contains filter conditions. The following statement creates a materialized view.

    create materialized view mv as select a,b,c from src where a>5;

    Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

    Query statement before the query rewrite operation

    Query statement after the query rewrite operation

    select a,b from src where a>5;

    select a,b from mv;

    select a, b from src where a=10;

    select a,b from mv where a=10;

    select a, b from src where a=10 and b=3;

    select a,b from mv where a=10 and b=3;

    select a, b from src where a>3;

    (select a,b from src where a>3 and a<=5) union (select a,b from mv);

    select a, b from src where a=10 and d=4;

    The query rewrite operation fails because Column d does not exist in mv.

    select d, e from src where a=10;

    The query rewrite operation fails because Column d and Column e do not exist in mv.

    select a, b from src where a=1;

    The query rewrite operation fails because data that meets the expression of a=1 does not exist in mv.

  • Rewrite a query statement with aggregate functions

    • If the SQL statement of a materialized view has the same aggregate key as the query statement, the query statement with any aggregate functions can be rewritten. If different aggregate keys are used, the query statement with the aggregate functions SUM, MIN, or MAX can be rewritten.

      The following statement creates a materialized view.

      create materialized view mv as 
      select a, b, sum(c) as sum, count(d) as cnt from src group by a, b;

      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation

      Query statement after the query rewrite operation

      select a, sum(c) from src group by a;

      select a, sum(sum) from mv group by a;

      select a, count(d) from src group by a, b;

      select a, cnt from mv;

      select a, count(b) from (select a, b from src group by a, b) group by a;

      select a,count(b) from mv group by a;

      select a, count(b) from src group by a;

      The query rewrite operation fails because Column a and Column b have been aggregated in the materialized view. As a result, Column b cannot be re-aggregated.

      select a, count(c) from src group by a;

      The query rewrite operation fails because the COUNT function does not support re-aggregation.

    • If an aggregate function includes DISTINCT, the query rewrite operation is supported when the SQL statement of the materialized view has the same aggregate key as the query statement. Otherwise, the query rewrite operation is not supported.

      The following statement creates a materialized view.

      create materialized view mv as 
      select a, b, sum(distinct c) as sum, count(distinct d) as cnt from src group by a, b;

      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation

      Query statement after the query rewrite operation

      select a, count(distinct d) from src group by a, b;

      select a, cnt from mv;

      select a, count(c) from src group by a, b;

      The query rewrite operation fails because the COUNT function does not support re-aggregation.

      select a, count(distinct c) from src group by a;

      The query rewrite operation fails because data in Column a needs to be re-aggregated.

  • Rewrite a query statement that contains a JOIN clause

    • Data input by using JOIN clauses

      The following statements create materialized views.

      create materialized view mv1 as select a, b from j1 where b > 10;
      create materialized view mv2 as select a, b from j2 where b > 10;

      Execute a query statement based on the created materialized views. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation

      Query statement after the query rewrite operation

      select j1.a,j1.b,j2.a from (select a,b from j1 where b > 10) j1 join j2 on j1.a=j2.a;

      select mv1.a, mv1.b, j2.a from mv1 join j2 on mv1.a=j2.a;

      select j1.a,j1.b,j2.a from(select a,b from j1 where b > 10) j1join(select a,b from j2 where b > 10) j2on j1.a=j2.a;

      select mv1.a,mv1.b,mv2.a from mv1 join mv2 on mv1.a=mv2.a;

    • JOIN clause with filter conditions

      The following statements create materialized views.

      -- Create non-partitioned materialized views. 
      create materialized view mv1 as select j1.a, j1.b from j1 join j2 on j1.a=j2.a;
      create materialized view mv2 as select j1.a, j1.b from j1 join j2 on j1.a=j2.a where j1.a > 10;
      -- Create a partitioned materialized view. 
      create materialized view mv lifecycle 7 partitioned on (ds) as select t1.id, t1.ds as ds from t1 join t2 on t1.id = t2.id;

      Execute a query statement based on the created materialized views. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation

      Query statement after the query rewrite operation

      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a=4;

      select a, b from mv1 where a=4;

      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 20;

      select a,b from mv2 where a>20;

      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 5;

      (select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 5 and j1.a <= 10) union select * from mv2;

      select key from t1 join t2 on t1.id= t2.id where t1.ds='20210306';

      select key from mv where ds='20210306';

      select key from t1 join t2 on t1.id= t2.id where t1.ds>='20210306';

      select key from mv where ds>='20210306';

      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j2.a=4;

      The query rewrite operation fails because the materialized view does not have Column j2.a.

    • JOIN clause for joining tables

      The following statement creates a materialized view.

      create materialized view mv as select j1.a, j1.b from j1 join j2 on j1.a=j2.a;

      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation

      Query statement after the query rewrite operation

      select j1.a, j1.b from j1 join j2 join j3 on j1.a=j2.a and j1.a=j3.a;

      select mv.a, mv.b from mv join j3 on mv.a=j3.a;

      select j1.a, j1.b from j1 join j2 join j3 on j1.a=j2.a and j2.a=j3.a;

      select mv.a,mv.b from mv join j3 on mv.a=j3.a;

    Note
    • The preceding types of statements can be used together. You can rewrite a query statement that meets rewrite conditions.

    • MaxCompute selects the optimal rewrite rule to execute query statements. If operations are added after a query statement is rewritten, MaxCompute does not select the query statement after the rewrite because this query statement is not optimal.

  • Rewrite a query statement that contains a LEFT JOIN clause

    The following statement creates a materialized view.

    create materialized view mv lifecycle 7(
            user_id,
            job,
            total_amount
    ) as select t1.user_id, t1.job, sum(t2.order_amount) as total_amount 
          from user_info as t1 left join sale_order as t2 on t1.user_id=t2.user_id group by t1.user_id;

    Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

    Query statement before the query rewrite operation

    Query statement after the query rewrite operation

    select t1.user_id, sum(t2.order_amout) as total_amount from user_info as t1 left join sale_order as t2 on t1.user_id=t2.user_id group by t1.user_id;

    select user_id, total_amount from mv;

  • Rewrite a query statement that contains a UNION ALL clause

    The following statement creates a materialized view.

    create materialized view mv lifecycle 7(
            user_id,
            tran_amount,
            tran_date
    ) as select user_id, tran_amount, tran_date from alipay_tran union all
     select user_id, tran_amount, tran_date from unionpay_tran;

    Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.

    Query statement before the query rewrite operation

    Query statement after the query rewrite operation

    select user_id, tran_amount from alipay_tran union all select user_id, tran_amount from unionpay_tran;

    select user_id, total_amount from mv;

Perform a penetration query based on a materialized view

A partitioned materialized view does not contain the data of all partitions in a source partitioned table if only the latest partition data is updated to the partitioned materialized view. If the partition data that you want to query does not exist in a partitioned materialized view, the system performs a penetration query to query data from the source partitioned table. The following figure shows how to perform a penetration query.

查询透穿图示

To allow a materialized view to support penetration query, you must configure the following parameters:

  • Set odps.optimizer.cbo.rule.filter.black to re at the session level to enable constant folding for partition key columns.

  • When you create a materialized view, add "enable_auto_substitute"="true" to tblproperties.

The following example shows how to perform a penetration query based on a materialized view.

  1. Create a partitioned materialized view that supports penetration query. Sample statements:

    -- Enable constant folding for partition key columns in partitions to ensure that partition data is dynamically changed. 
    set odps.optimizer.cbo.rule.filter.black=re;
    -- Initialize the data of the 20210101 partition. 
    set odps.sql.materialized.view.initial.partition={"dt": "20210101"};
    -- Create a partitioned materialized view that supports penetration query. 
    create materialized view if not exists mv lifecycle 7 
    tblproperties("enable_auto_substitute"="true") 
    partitioned on (dt)
    as select id, name, dt from src;
  2. Query the data of the 20210101 partition in the src table from the mv materialized view. Sample statement:

    select * from mv where dt='20210101';
  3. Query the data of the 20210102 partition in the src table from the mv materialized view. The mv materialized view does not contain the partition data. Therefore, a penetration query is performed to query the partition data from the src table. Sample statement:

    select * from mv where dt = '20210102';
    -- The preceding statement is equivalent to the following statement because the mv materialized view does not contain the data of the 20210102 partition and the partition data needs to be queried from the src table.
    select * from (select id, name, dt from src where dt='20210102') t;
  4. Query the data of partitions 20201230 to 20210102 from the mv materialized view. The mv materialized view does not contain all partition data. Therefore, a penetration query is performed to obtain the data that is not contained in the mv materialized view, and a UNION operation is performed on the obtained data and the data that is queried from the mv materialized view to return the final result.

    select * from mv where dt >= '20201230' and dt<='20210102' and id='5';
    -- The mv materialized view does not contain the data of the 20210102 partition and a penetration query is performed to query the data of the 20210102 partition from the source table. The preceding statement is equivalent to the following statement:
    select * from
    (select id, name, dt from src where dt='20211231' or dt='20210102';
     union all 
     select * from mv where dt='20210101'
    ) t where id = '5';

Perform a scheduled update on an existing materialized view

  • Usage notes

    • A scheduled update of materialized views is essentially a task submitted by the system background. The update completion time varies based on the amount of available cluster resources and the size of the task.

    • When a scheduled update is triggered, all data is updated.

      • If data in a partition of the partitioned base table that corresponds to the partition key columns in a partitioned materialized view changes, only data in the associated partition of the materialized view is updated. Data in other partitions is not affected. If data in other tables associated with the partitioned materialized view changes, full data in the materialized view is updated.

      • If you enable the scheduled update feature for a non-partitioned materialized view, the system automatically checks whether an update operation is required. If data in the source table is not updated, the update operation is skipped.

    • If a scheduled update is not complete before the next update time, the system automatically cancels the next scheduled update and triggers a new update at the latest scheduled time.

      For example, a scheduled update is triggered at an interval of 20 minutes. If the first update is scheduled at 10:00:00, the second update is planned to be scheduled at 10:20:00, and the third update is planned to be scheduled at 10:40:00.

      If the first update starts at 10:00:00 and ends at 10:30:00, the update that is planned to be scheduled at 10:20:00 is canceled, and the next update is scheduled at 10:40:00.

    • MaxCompute cannot detect materialized views that are scheduled to be updated in real time. The update detection latency is within 30 minutes.

  • Enable the scheduled update feature for a materialized view.

    You can enable the scheduled update feature for a materialized view in MaxCompute by using one of the following methods.

    • Method 1: Use the EVERY parameter.

      When you create a materialized view, specify the interval at which a material view is updated. The interval can be N minutes, hours, or days. The next update time is the previous update time plus N minutes, hours, or days.

      • Syntax

        CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_name.]mv_name
        [LIFECYCLE days] -- The lifecycle of the materialized view.
        [(col_name [COMMENT col_comment], ...)] -- The column comments.
        [DISABLE REWRITE] -- Specifies whether to disable the query rewrite operation that is performed based on the materialized view.
        [PARTITIONED ON (col_name [, col_name, ...]) -- The partitions in the materialized view. This parameter is required when you create a partitioned materialized view.
        [CLUSTERED BY|RANGE CLUSTERED BY (col_name [, col_name, ...])
        [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])]
        INTO number_of_buckets BUCKETS]-- The shuffle and sort attributes of the materialized view. This parameter is required when you create a clustered materialized view.
        
        [REFRESH EVERY xx MINUTES/HOURS/DAYS] -- The scheduled update interval of the materialized view.
        TBLPROPERTIES("enable_auto_refresh"="true") -- Enable the scheduled update feature for the materialized view.
        
        AS <select_statement>;
      • Examples

        Create a materialized view that is updated at an interval of 10 minutes.

        --create table
        create table mf_refresh (id bigint, name string);
        --insert values
        insert into table mf_refresh values(1,'kyle'),(2,'tom');
        --create mv
        create materialized view if not exists mf_mv_refresh
        refresh every 10 minutes
        tblproperties("enable_auto_refresh"="true")
        as select id,name from mf_refresh;
        --select data from mv
        select * from mf_mv_refresh;
        +------------+------------+
        | id         | name       |
        +------------+------------+
        | 1          | kyle       |
        | 2          | tom        |
        +------------+------------+
        
        --after created mv,insert value to table
        insert into table mf_refresh values(3,'jen'),(4,'cathy');
        
        --after 10m, select the data from mv
        select * from mf_mv_refresh;
        +------------+------------+
        | id         | name       |
        +------------+------------+
        | 1          | kyle       |
        | 2          | tom        |
        | 3          | jen        |
        | 4          | cathy      |
        +------------+------------+
    • Method 2: Use the refresh_interval_minutes parameter in tblproperties.

      When you create a materialized view, specify the scheduled update interval. For example, set the scheduled update interval to N minutes. The next update time point is the previous update time point plus N minutes.

      • Syntax

        CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_name.]mv_name
        [LIFECYCLE days] -- The lifecycle of the materialized view.
        [(col_name [COMMENT col_comment], ...)] -- The column comments.
        [DISABLE REWRITE] -- Specifies whether to disable the query rewrite operation that is performed based on the materialized view.
        [PARTITIONED ON (col_name [, col_name, ...]) -- The partitions in the materialized view. This parameter is required when you create a partitioned materialized view.
        [CLUSTERED BY|RANGE CLUSTERED BY (col_name [, col_name, ...])
        [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])]
        INTO number_of_buckets BUCKETS]-- The shuffle and sort attributes of the materialized view. This parameter is required when you create a clustered materialized view.
        TBLPROPERTIES("enable_auto_refresh"="true", "refresh_interval_minutes"="xx")
        -- Enable the scheduled update feature.
        AS <select_statement>;
      • Examples

        Create a materialized view that is updated at an interval of 20 minutes.

        --create mv with fresh data by20 min
        create materialized view if not exists mf_mv_refresh2
        tblproperties("enable_auto_refresh"="true", "refresh_interval_minutes"="20")
        as select id,name from mf_refresh;
        
        --select data from mv
        select * from mf_mv_refresh2;
        +------------+------------+
        | id         | name       |
        +------------+------------+
        | 1          | kyle       |
        | 2          | tom        |
        | 3          | jen        |
        | 4          | cathy      |
        +------------+------------+
        --after mv,insert value to table
        insert into table mf_refresh values(5,'roger'),(6,'david');
        
        -- after 30m, select data from mv
        -- The following result is returned:
        +------------+------------+
        | id         | name       |
        +------------+------------+
        | 1          | kyle       |
        | 2          | tom        |
        | 3          | jen        |
        | 4          | cathy      |
        | 5          | roger      |
        | 6          | david      |
        +------------+------------+
    • Method 3: Use the refresh_cron parameter in tblproperties.

      You can use a Cron expression to specify the interval or time point for scheduled updates.

      • Syntax

        CREATE MATERIALIZED VIEW [IF NOT EXISTS] [project_name.]mv_name
        [LIFECYCLE days] -- The lifecycle of the materialized view.
        [(col_name [COMMENT col_comment], ...)] -- The column comments.
        [DISABLE REWRITE] -- Specifies whether to disable the query rewrite operation that is performed based on the materialized view.
        [PARTITIONED ON (col_name [, col_name, ...]) -- The partitions in the materialized view. This parameter is required when you create a partitioned materialized view.
        [CLUSTERED BY|RANGE CLUSTERED BY (col_name [, col_name, ...])
        [SORTED BY (col_name [ASC | DESC] [, col_name [ASC | DESC] ...])]
        INTO number_of_buckets BUCKETS]-- The shuffle and sort attributes of the materialized view. This parameter is required when you create a clustered materialized view.
        
        TBLPROPERTIES("enable_auto_refresh"="true", "refresh_cron"="xx")
        -- Enable the scheduled update feature.
        AS <select_statement>;

        The refresh_cron parameter is a string in the Quartz Cron format. For more information, see Cron expression examples.

      • Examples

        • Create a materialized view that is updated at an interval of 5 minutes.

          --cron base create mv
          create materialized view if not exists mf_mv_refresh3
          tblproperties("enable_auto_refresh"="true", "refresh_cron"="* */5 * * * ? *")
          as select id,name from mf_refresh;
          --select data from mv
          select * from mf_mv_refresh3;
          -- The following result is returned:
          +------------+------------+
          | id         | name       |
          +------------+------------+
          | 1          | kyle       |
          | 2          | tom        |
          | 3          | jen        |
          | 4          | cathy      |
          | 5          | roger      |
          | 6          | david      |
          +------------+------------+
          --insert value to table
          insert into table mf_refresh values(7,'ethan'),(8,'cal');
          --after 5min,select data from mv
          select * from mf_mv_refresh3;
          -- The following result is returned:
          +------------+------------+
          | id         | name       |
          +------------+------------+
          | 1          | kyle       |
          | 2          | tom        |
          | 7          | ethan      |
          | 8          | cal        |
          | 3          | jen        |
          | 4          | cathy      |
          | 5          | roger      |
          | 6          | david      |
          +------------+------------+
                                                          
        • Create a materialized view that is updated at 20:00 every night.

          --cron base create mv
          create materialized view if not exists mf_mv_refresh4
          tblproperties("enable_auto_refresh"="true","refresh_cron"="0 0 20 * * ? *")
          as select id,name from mf_refresh;
          --select data from mv
          select * from mf_mv_refresh4;
          -- The following result is returned:
          +------------+------------+
          | id         | name       |
          +------------+------------+
          | 1          | kyle       |
          | 2          | tom        |
          | 7          | ethan      |
          | 8          | cal        |
          | 3          | jen        |
          | 4          | cathy      |
          | 5          | roger      |
          | 6          | david      |
          +------------+------------+
          --insert value to table
          insert into table mf_refresh values(9,'annie'),(10,'zoe');
          --when 20:00,select data from mv
          select * from mf_mv_refresh4;
          -- The following result is returned:
          +------------+------------+
          | id         | name       |
          +------------+------------+
          | 1          | kyle       |
          | 2          | tom        |
          | 9          | annie      |
          | 10         | zoe        |
          | 7          | ethan      |
          | 8          | cal        |
          | 3          | jen        |
          | 4          | cathy      |
          | 5          | roger      |
          | 6          | david      |
          +------------+------------+
                                                          
  • Change the interval for scheduled updates of a materialized view.

    Execute one of the following statements to change the interval for scheduled updates of a materialized view.

    ALTER MATERIALIZED VIEW mv SET TBLPROPERTIES("enable_auto_refresh"="true", "refresh_interval_minutes"="xx");
    ALTER MATERIALIZED VIEW mv SET TBLPROPERTIES("enable_auto_refresh"="true", "refresh_cron"="xx");
  • Disable the scheduled update feature.

    Execute the following statement to disable the scheduled update feature for a materialized view:

    ALTER MATERIALIZED VIEW mv SET TBLPROPERTIES("enable_auto_refresh"="false");
  • View the update history of a materialized view.

    Execute the following statement to view the update history of a materialized view:

    DESC EXTENDED mv;

    For information about the returned result, see Query information about a materialized view.

Examples

Scenario: A page visit table named visit_records contains the page ID, user ID, and visit time of each user. The numbers of visits to different pages need to be queried and analyzed. The following code shows the structure of the visit_records table.

+------------------------------------------------------------------------------------+
| Field           | Type       | Label | Comment                                     |
+------------------------------------------------------------------------------------+
| page_id         | string     |       |                                             |
| user_id         | string     |       |                                             |
| visit_time      | string     |       |                                             |
+------------------------------------------------------------------------------------+

You can create a materialized view for the visit_records table to collect the number of visits to each page based on the page ID. Then, you can perform subsequent query operations based on the materialized view.

  1. Execute the following statement to create a materialized view:

    create materialized view count_mv as select page_id, count(*) from visit_records group by page_id;
  2. Execute the following query statements:

    set odps.sql.materialized.view.enable.auto.rewriting=true; 
    select page_id, count(*) from visit_records group by page_id;

    When you execute the query statement, MaxCompute can match the materialized view count_mv and read the aggregated data from count_mv.

  3. Execute the following statement to check whether the query statement matches the materialized view:

    explain select page_id, count(*) from visit_records group by page_id;

    The following result is returned:

    job0 is root job
    
    In Job job0:
    root Tasks: M1
    
    In Task M1:
        Data source: doc_test_dev.count_mv
        TS: doc_test_dev.count_mv
            FS: output: Screen
                schema:
                  page_id (string)
                  _c1 (bigint)
    
    
    OK

    In the returned result, the value of Data source contains count_mv. This indicates that the materialized view is valid and the query is rewritten.

Billing rules

When you use materialized views, you are charged for the following items:

  • Storage

    Materialized views occupy physical storage space. You are charged for the physical storage space that is occupied by materialized views. For more information about storage pricing, see Storage pricing (pay-as-you-go).

  • Computing

    Data is queried when you create, update, and query materialized views, and rewrite queries (if materialized views are valid). These operations consume computing resources and generate computing costs.

    • If your MaxCompute project uses the subscription billing method, no extra computing costs are generated.

    • If your MaxCompute project uses the pay-as-you-go billing method, fees are calculated based on the SQL complexity and the amount of input data. For more information about billing, see the "Billing for standard SQL jobs" section in Computing pricing. Take note of the following points:

      • The SQL statements used to update a materialized view are the same as the SQL statements used to create a materialized view. If the project to which the materialized view belongs is bound to a subscription resource group, the purchased subscription resources are used, and no additional fees are generated. If the project is bound to a pay-as-you-go resource group, the fees vary based on the amount of input data and complexity of the SQL statements. After you update a materialized view, you are charged storage fees based on the storage space that is used.

      • If the materialized view is valid, data is read from the materialized view when the query rewrite operation is performed. The amount of input data that is read from the materialized view of the query statement is related to the materialized view and is irrelevant to the source table of the materialized view. If the materialized view is invalid, the query rewrite operation cannot be performed, and data is queried from the source table. The amount of input data for the query statement is related to the source table. For more information about how to query the status of a materialized view, see Query the status of a materialized view in this topic.

      • Data bloat may occur if a materialized view is generated based on the association of multiple tables. Therefore, the amount of data read by a materialized view may not be absolutely less than the amount of data in the source table. MaxCompute cannot ensure that data reading from a materialized view costs less than data reading from the source table.