All Products
Search
Document Center

MaxCompute:Partition operations

Last Updated:Dec 16, 2024

MaxCompute allows you to perform operations on partitions in an existing table. For example, you can add a partition, drop a partition, and change a value in a partition key column based on your business requirements.

The following table describes the operations on partitions in a MaxCompute table.

Operation

Description

Authorized user

Operation platform

Add partitions

Adds partitions to an existing partitioned table.

Users who have the Alter permission on tables

You can execute the statements that are described in this topic on the following platforms:

Drop partitions

Drops partitions from an existing partitioned table.

Change the value of LastModifiedTime for a partition

Changes the value of LastModifiedTime for a partition in a partitioned table.

Change a value in a partition key column

Changes a value in a partition key column.

Merge partitions

Merges multiple partitions of a partitioned table into one partition. This operation deletes the dimension information about the partitions that are merged and transfers the partition data to the specified partition.

Clear data from a partition

Clears data from the specified partition.

Limits

  • Only columns of the TINYINT, SMALLINT, INT, BIGINT, CHAR, VARCHAR, and STRING data types can be configured as partition key columns in a MaxCompute table.

  • A MaxCompute table can contain a maximum of six levels of partitions.

  • A MaxCompute table can contain a maximum of 60,000 partitions.

  • MaxCompute allows you to query data from a maximum of 10,000 partitions at a time.

  • You cannot merge partitions of a partitioned transactional table.

  • You cannot clear data in a non-partitioned Delta table or change the write.bucket.num property of a Delta table.

Add partitions

Adds partitions to an existing partitioned table.

  • Limits

    • A MaxCompute table can have a maximum of 60,000 partitions.

    • To add the values of partition key columns to a table that has multi-level partitions, you must specify all the partitions.

    • This operation can add only the values of partition key columns. The names of partition key columns cannot be added.

  • Syntax

    ALTER TABLE <table_name> ADD [if NOT EXISTS] PARTITION <pt_spec> [PARTITION <pt_spec> PARTITION <pt_spec>...];
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the partitioned table to which you want to add partitions.

    if not exists

    No

    If you do not specify the if not exists option and a partition with the same name already exists, this operation fails and an error is returned.

    pt_spec

    Yes

    The partitions that you want to add. The value of this parameter is in the (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...) format. partition_col indicates the names of the partition key columns. partition_col_value indicates the values of the partition key columns. The names of partition key columns are not case sensitive, but their values are case sensitive.

  • Examples:

    • Example 1: Add a partition to the sale_detail table. The partition stores the sales records in the China (Hangzhou) region in December 2013.

      ALTER TABLE sale_detail ADD if NOT EXISTS PARTITION (sale_date='201312', region='hangzhou');
    • Example 2: Add two partitions to the sale_detail table. The partitions store the sales records in the China (Beijing) and China (Shanghai) regions in December 2013.

      ALTER TABLE sale_detail ADD if NOT EXISTS PARTITION (sale_date='201312', region='beijing') partition (sale_date='201312', region='shanghai');
    • Example 3: Add a partition to the sale_detail table and specify only the partition key column sale_date for the partition. An error is returned because you must specify the two partition key columns sale_date and region.

      ALTER TABLE sale_detail ADD if NOT EXISTS PARTITION (sale_date='20111011');
    • Example 4: Add a partition to a Delta table.

      -- Create a Delta table.
      CREATE TABLE mf_tt (pk bigint NOT NULL PRIMARY key, 
                          val bigint NOT NULL) 
                   partitioned BY (dd string, hh string) 
                   tblproperties ("transactional"="true"); 
                   
      -- Add a partition to the table.           
      ALTER TABLE mf_tt ADD PARTITION (dd='01', hh='01');
    • Example 5: Change the properties of a Delta table.

      -- Change the write.bucket.num property of a Delta table. You can change only the property of a partitioned Delta table. You cannot change the property of a non-partitioned Delta table.
      ALTER TABLE mf_tt3 SET tblproperties("write.bucket.num"="64");
      
      -- Change the acid.data.retain.hours property of a Delta table.
      ALTER TABLE mf_tt3 SET tblproperties("acid.data.retain.hours"="60");

Drop partitions

Drops partitions from an existing partitioned table.

MaxCompute allows you to drop partitions based on the specified filter condition. If you want to drop multiple partitions that meet the specified filter condition at a time, you can use an expression to specify the filter condition, use the filter condition to match partitions, and drop the partitions at a time.

  • Limits

    • You can specify the information of only one partition key column in a PARTITION (<partition_filtercondition>) clause.

    • If you use an expression to specify PARTITION (<partition_filtercondition>), the function used in the expression must be a built-in scalar function.

  • Precautions

    • After you drop a partition from a table in your MaxCompute project, the volume of stored data in your MaxCompute project decreases.

    • You can specify a lifecycle for a partitioned table. This way, MaxCompute automatically reclaims partitions whose data is not updated within the time specified by the lifecycle. For more information about the lifecycle, see Lifecycle.

  • Syntax

    • The filter condition is not specified.

      -- Drop one partition at a time. 
      ALTER TABLE <table_name> DROP [if EXISTS] PARTITION <pt_spec>;
      -- Drop multiple partitions at a time. 
      ALTER TABLE <table_name> DROP [if EXISTS] PARTITION <pt_spec>,PARTITION <pt_spec>[,PARTITION <pt_spec>....];
    • The filter condition is specified.

      ALTER TABLE <table_name> DROP [if EXISTS] PARTITION <partition_filtercondition>;
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the partitioned table from which you want to drop partitions.

    if exists

    No

    If you do not specify the if exists option and the specified partition does not exist, an error is returned.

    pt_spec

    Yes

    The partitions that you want to drop. Format: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). partition_col indicates the names of the partition key columns. partition_col_value indicates the values of the partition key columns. The names of partition key columns are not case sensitive, but their values are case sensitive.

    partition_filtercondition

    No

    The filter condition. This parameter is required when you specify the filter condition. It is not case-sensitive. Format:

    partition_filtercondition
        : PARTITION (<partition_col> <relational_operators> <partition_col_value>)
        | PARTITION (scalar(<partition_col>) <relational_operators> <partition_col_value>)
        | PARTITION (<partition_filtercondition1> AND|OR <partition_filtercondition2>)
        | PARTITION (NOT <partition_filtercondition>)
        | PARTITION (<partition_filtercondition1>)[,PARTITION (<partition_filtercondition2>), ...]

    Parameters:

    • partition_col: the name of the partition key column.

    • relational_operators: the relational operator. For more information, see Operator.

    • partition_col_value: A comparison value or regular expression. The data type of this value must be the same as the data type of the partition key column.

    • scalar(): a scalar function. The scalar function generates a scalar based on the input value, processes the values in the column specified by partition_col, and uses relational operators specified by relational_operators to compare the processed values with the value specified by partition_col_value.

    • The filter conditions support the logical operators NOT, AND, and OR. You can use PARTITION (NOT <partition_filtercondition>) to obtain the complementary set of the filter conditions that you specified. You can use PARTITION (<partition_filtercondition1> AND|OR <partition_filtercondition2>) to obtain the condition that is used to match partitions.

    • Multiple PARTITION (<partition_filtercondition>) clauses are supported. If these clauses are separated by commas (,), the logical relationship between the clauses is OR. The filter condition is obtained based on the OR logical relationship and used to match partitions.

  • Examples:

    • The filter condition is not specified.

      -- Drop a partition from the sale_detail table. The partition stores the sales record in the China (Hangzhou) region in December 2013. 
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date='201312',region='hangzhou'); 
      -- Drop two partitions from the sale_detail table. The partitions store the sales records in the China (Hangzhou) and China (Shanghai) regions in December 2013. 
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date='201312',region='hangzhou'),PARTITION(sale_date='201312',region='shanghai');
    • The filter condition is specified.

      -- Create a partitioned table.
      CREATE TABLE if NOT EXISTS sale_detail(
      shop_name     STRING,
      customer_id   STRING,
      total_price   DOUBLE)
      partitioned BY (sale_date STRING);
      -- Add partitions to the table.
      ALTER TABLE sale_detail ADD if NOT EXISTS
      PARTITION (sale_date= '201910')
      PARTITION (sale_date= '201911')
      PARTITION (sale_date= '201912')
      PARTITION (sale_date= '202001')
      PARTITION (sale_date= '202002')
      PARTITION (sale_date= '202003')
      PARTITION (sale_date= '202004')
      PARTITION (sale_date= '202005')
      PARTITION (sale_date= '202006')
      PARTITION (sale_date= '202007');
      -- Drop multiple partitions from the table at a time.
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date < '201911');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date >= '202007');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date LIKE '20191%');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date IN ('202002','202004','202006'));
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date BETWEEN '202001' AND '202007');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(substr(sale_date, 1, 4) = '2020');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date < '201912' OR sale_date >= '202006');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date > '201912' AND sale_date <= '202004');
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(NOT sale_date > '202004');
      -- Drop partitions by using multiple PARTITION (<partition_filtercondition>) clauses. The logical relationship between these clauses is OR.
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date < '201911'), PARTITION(sale_date >= '202007');
      -- Add partitions in other formats.
      ALTER TABLE sale_detail ADD IF NOT EXISTS
      PARTITION (sale_date= '2019-10-05') 
      PARTITION (sale_date= '2019-10-06') 
      PARTITION (sale_date= '2019-10-07');
      -- Use regular expressions to match the partitions that you want to drop and drop these partitions at a time. 
      ALTER TABLE sale_detail DROP if EXISTS PARTITION(sale_date RLIKE '2019-\\d+-\\d+');
      -- Create a table named region_sale_detail. The table contains multi-level partitions. 
      CREATE TABLE if NOT EXISTS region_sale_detail(
      shop_name     STRING,
      customer_id   STRING,
      total_price   DOUBLE)
      partitioned BY (sale_date STRING , region STRING );
      -- Add partitions to the table. 
      ALTER TABLE region_sale_detail ADD IF NOT EXISTS
      PARTITION (sale_date= '201910',region = 'shanghai')
      PARTITION (sale_date= '201911',region = 'shanghai')
      PARTITION (sale_date= '201912',region = 'shanghai')
      PARTITION (sale_date= '202001',region = 'shanghai')
      PARTITION (sale_date= '202002',region = 'shanghai')
      PARTITION (sale_date= '201910',region = 'beijing')
      PARTITION (sale_date= '201911',region = 'beijing')
      PARTITION (sale_date= '201912',region = 'beijing')
      PARTITION (sale_date= '202001',region = 'beijing')
      PARTITION (sale_date= '202002',region = 'beijing');
      -- Execute the following statement to drop multi-level partitions at a time. The logical relationship between the two PARTITION (<partition_filtercondition>) clauses is OR. After the statement is executed, the partitions in which the value of the sale_date column is earlier than 201911 or the partitions in which the value of the region column is beijing are dropped. 
      ALTER TABLE region_sale_detail DROP if EXISTS PARTITION(sale_date < '201911'),PARTITION(region = 'beijing');
      -- Execute the following statement to drop partitions in which the value of the sale_date column is earlier than 201911 and the value of the region column is beijing. 
      ALTER TABLE region_sale_detail DROP if EXISTS PARTITION(sale_date < '201911', region = 'beijing');

      When you drop multi-level partitions at a time, you cannot specify a filter condition that is based on multiple partition key columns in one PARTITION (<partition_filtercondition>) clause. Otherwise, the following error is returned: FAILED: ODPS-0130071:[1,82] Semantic analysis exception - invalid column reference region, partition expression must have one and only one column reference.

      -- If you specify the information of multiple partition key columns in a PARTITION (<partition_filtercondition>) clause, an error is returned. Example of incorrect usage: 
      ALTER TABLE region_sale_detail DROP if EXISTS PARTITION(sale_date < '201911' AND region = 'beijing');

Change the value of LastModifiedTime for a partition

Changes the value of LastDataModifiedTime for a partition in a partitioned table by executing the TOUCH statement that is provided by MaxCompute SQL. This statement changes the value of LastModifiedTime to the current time. In this case, MaxCompute considers that data is updated, and a new lifecycle of the partition starts from the time specified by LastModifiedTime.

  • Limits

    If a table contains multi-level partitions, you must specify all partition levels when you change the value of LastModifiedTime for a partition.

  • Syntax

    ALTER TABLE <table_name> touch PARTITION (<pt_spec>);
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the partitioned table for which you want to change the value of LastModifiedTime. If the table does not exist, an error is returned.

    pt_spec

    Yes

    The partition for which you want to change the value of LastModifiedTime. Format: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). partition_col indicates the names of the partition key columns. partition_col_value indicates the values of the partition key columns. If the specified column name or column value does not exist, an error is returned.

  • Examples

    -- Change the value of LastModifiedTime for the partition in which the value of the sale_date column is 201312 and the value of the region column is shanghai in the sale_detail table. 
    ALTER TABLE sale_detail touch PARTITION (sale_date='201312', region='shanghai');

Change a value in a partition key column

Changes a value in a partition key column. MaxCompute SQL allows you to execute the RENAME statement to change values in partition key columns.

  • Limits

    • The RENAME statement can change values in partition key columns but cannot change the column names.

    • If a table contains multi-level partitions, you must specify all partition levels when you change the value of LastModifiedTime for a partition.

  • Syntax

    ALTER TABLE <table_name> PARTITION (<pt_spec>) rename TO PARTITION (<new_pt_spec>);
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the table in which you want to change a value in a partition key column.

    pt_spec

    Yes

    The partition in which you want to change partition key column values. Format: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). partition_col indicates the names of the partition key columns. partition_col_value indicates the values of the partition key columns. If the specified column name or column value does not exist, an error is returned.

    new_pt_spec

    Yes

    The new partition information. Format: (partition_col1 = new_partition_col_value1, partition_col2 = new_partition_col_value2, ...). partition_col indicates the names of the partition key columns. new_partition_col_value indicates the new values of the partition key columns.

  • Examples

    -- Change the partition key column values of the sale_detail table. 
    ALTER TABLE sale_detail PARTITION (sale_date = '201312', region = 'hangzhou') rename TO PARTITION (sale_date = '201310', region = 'beijing');

Merge partitions

Merges multiple partitions of a partitioned table into one partition. MaxCompute SQL allows you to execute the MERGE PARTITION statement to merge multiple partitions of a table into one partition. This operation deletes the dimension information about the merged partitions and migrates the partition data to the specified partition.

  • Limits

    • Partitions of external tables cannot be merged. After partitions of a clustered table are merged into one partition, the partition does not have the clustering attribute.

    • You can merge a maximum of 4,000 partitions at a time.

  • Syntax

    ALTER TABLE <table_name> MERGE [if EXISTS] PARTITION (<predicate>) [, PARTITION(<predicate2>) ...] overwrite PARTITION (<fullpartitionSpec>) [purge];
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the partitioned table whose partitions you want to merge.

    if exists

    No

    If you do not specify the if exists option and the specified partition does not exist, this operation fails and an error is returned. If if exists is specified but no partitions meet the merge condition, a new partition cannot be generated after merging. If the source data is modified by an operation such as insert, rename, or drop during the merging process, an error is returned even if if exists is specified.

    predicate

    Yes

    The condition that is used to match the partitions that you want to merge.

    fullpartitionSpec

    Yes

    The partition that is generated after merging.

    purge

    No

    Optional keyword. If you configure this parameter, the session directory is cleared. By default, logs that are generated in the last three days are deleted. For more information, see Purge.

  • Examples

    • Example 1: Merge the partitions that meet the specified condition into the destination partition.

      -- View the partitions in a partitioned table. 
      SHOW partitions intpstringstringstring;
      
      ds=20181101/hh=00/mm=00
      ds=20181101/hh=00/mm=10
      ds=20181101/hh=10/mm=00
      ds=20181101/hh=10/mm=10
      
      -- Merge all partitions that meet the hh='00' condition into the ds='20181101', hh='00', mm='00' partition. 
      ALTER TABLE intpstringstringstring MERGE PARTITION(hh='00') overwrite PARTITION(ds='20181101', hh='00', mm='00');
      
      -- View the partition that is generated after merging. 
      SHOW partitions intpstringstringstring;
      
      ds=20181101/hh=00/mm=00
      ds=20181101/hh=10/mm=00
      ds=20181101/hh=10/mm=10                    
    • Example 2: Merge specified partitions into one partition.

      -- Merge specified partitions into one partition. 
      ALTER TABLE intpstringstringstring MERGE if EXISTS PARTITION(ds='20181101', hh='00', mm='00'), PARTITION(ds='20181101', hh='10', mm='00'),  partition(ds='20181101', hh='10', mm='10') overwrite partition(ds='20181101', hh='00', mm='00') purge;
      -- View the partitions in the partitioned table. 
      SHOW partitions intpstringstringstring;
      
      ds=20181101/hh=00/mm=00

Clear data from a partition

Clears the data from the specified partition of a partitioned table.

MaxCompute allows you to clear the data in partitions that meet the specified filter condition. If you want to drop one or more partitions that meet a filter condition at a time, you can use an expression to specify the condition, use the condition to match partitions, and then clear data from the matched partitions.

  • Syntax

    • The filter condition is not specified.

      TRUNCATE TABLE <table_name> PARTITION <pt_spec>[, PARTITION <pt_spec>....];
    • The filter condition is specified.

      TRUNCATE TABLE <table_name> PARTITION <partition_filtercondition>;
  • Parameters

    Parameter

    Required

    Description

    table_name

    Yes

    The name of the partitioned table of which you want to clear partition data.

    pt_spec

    Yes

    The partition in which you want to clear data. Format: (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...). partition_col indicates the names of the partition key columns. partition_col_value indicates the values of the partition key columns. The names of partition key columns are not case sensitive, but their values are case sensitive.

    partition_filtercondition

    No

    The filter condition. This parameter is required when you specify the filter condition. It is not case-sensitive. Format:

    partition_filtercondition
        : PARTITION (<partition_col> <relational_operators> <partition_col_value>)
        | PARTITION (scalar(<partition_col>) <relational_operators> <partition_col_value>)
        | PARTITION (<partition_filtercondition1> AND|OR <partition_filtercondition2>)
        | PARTITION (NOT <partition_filtercondition>)
        | PARTITION (<partition_filtercondition1>)[,PARTITION (<partition_filtercondition2>), ...]

    Parameters:

    • partition_col: the name of the partition key column.

    • relational_operators: the relational operator. For more information, see Operator.

    • partition_col_value: A comparison value or regular expression. The data type of this value must be the same as the data type of the partition key column.

    • scalar(): a scalar function. The scalar function generates a scalar based on the input value, processes the values in the column specified by partition_col, and uses relational operators specified by relational_operators to compare the processed values with the value specified by partition_col_value.

    • The filter conditions support the logical operators NOT, AND, and OR. You can use PARTITION (NOT <partition_filtercondition>) to obtain the complementary set of the filter conditions that you specified. You can use PARTITION (<partition_filtercondition1> AND|OR <partition_filtercondition2>) to obtain the condition that is used to match partitions.

    • Multiple PARTITION (<partition_filtercondition>) clauses are supported. If these clauses are separated by commas (,), the logical relationship between the clauses is OR. The filter condition is obtained based on the OR logical relationship and used to match partitions.

  • Examples

    • The filter condition is not specified.

      -- Clear the data from a partition of the sale_detail table. The partition stores the sales record in the China (Hangzhou) region in December 2013. 
      TRUNCATE TABLE sale_detail PARTITION(sale_date='201312',region='hangzhou');
      
      -- Clear the data from two partitions of the sale_detail table. The partitions store the sales records in the China (Hangzhou) and China (Shanghai) regions in December 2013. 
      TRUNCATE TABLE sale_detail PARTITION(sale_date='201312',region='hangzhou'),  PARTITION(sale_date='201312',region='shanghai');
    • The filter condition is specified.

      -- Clear the data from multiple partitions of the sale_detail table. The partitions store the sales records whose sales dates start with 2013 in the China (Hangzhou) region. 
      TRUNCATE TABLE sale_detail PARTITION(sale_date LIKE '2013%' AND region='hangzhou');
    • Clear the data from a non-partitioned Delta table.

      -- Clear the data from a non-partitioned Delta table. The table must be a non-partitioned table. Otherwise, an error is reported.
      TRUNCATE TABLE mf_tt2; 

References

For more information about table operations, see the following topics: