All Products
Search
Document Center

Hologres:Update the data storage format of existing column-oriented tables

Last Updated:Sep 02, 2024

For Hologres instances of V0.10 or later, the data storage format of column-oriented tables is updated to AliORC. AliORC provides better performance in terms of the volume of stored data and thus can reduce storage costs. This topic describes how to update the data storage format of existing column-oriented tables in Hologres.

Limits

Take note of the following limits when you update the data storage format:

  • This feature is supported only in Hologres V0.10 and later. You can view the version of your Hologres instance on the instance details page in the Hologres console. If the version of your Hologres instance is earlier than V0.10, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Upgrade instances. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

  • The data storage format of existing column-oriented tables is segment, whereas the data storage format of row-oriented tables is sst. For column-oriented tables that are newly created in Hologres instances of V0.10 or later, the data storage format is updated to AliORC. The field value orc indicates the AliORC format. Hologres V0.10 and later allow you to update the data storage format of existing column-oriented tables, but the following limits are imposed:

    • In Hologres V0.10 and later, you can update the data storage format of only one column-oriented table at a time. Batch operations are not supported.

    • If a column-oriented table is a partitioned table and you want to update the data storage format of the table, you need to only update the data storage format of the relevant parent table.

    • After you perform the update operation, data of the column-oriented table in the previous storage format is asynchronously switched to a table in the updated storage format by using background threads. Such asynchronous switchover performed at the backend consumes system resources, such as CPU, I/O, and network resources. During the switchover, the performance of online tasks may be affected. In addition, the time required for the switchover varies based on the size of existing data and the size of data written in real time. Therefore, we recommend that you update the data storage format during off-peak hours.

  • You cannot convert a column-oriented table to a row-oriented table by updating the data storage format of the original table. If you want to use a row-oriented table, you must create a table by using the required schema.

Query the data storage format

You can execute the following SQL statement to query the data storage format of each table in the specified database.

  • Sample statement

    SELECT  *
    FROM    hologres.hg_table_properties
    WHERE   property_key = 'storage_format'
    ;
  • Query result

     table_namespace |     table_name     |  property_key  | property_value 
    -----------------+--------------------+----------------+----------------
     public          | part               | storage_format | segment
     public          | partsupp           | storage_format | segment
     public          | customer           | storage_format | segment
     public          | orders_row         | storage_format | sst
     public          | sp_orders          | storage_format | segment
     public          | sp_orders_20161231 | storage_format | segment
     public          | sp_orders_20171231 | storage_format | segment
     public          | sp_orders_20181231 | storage_format | segment
     public          | lineitem           | storage_format | segment
     public          | nation             | storage_format | orc
     public          | region             | storage_format | orc
     public          | supplier           | storage_format | orc
     public          | orders             | storage_format | orc
    (13 rows)
  • Description of field values

    Field value

    Description

    segment

    The default data storage format of column-oriented tables in Hologres instances of versions earlier than V0.10.

    sst

    The default data storage format of row-oriented tables.

    orc

    The default data storage format of column-oriented tables in Hologres instances of V0.10 and later. If the value of the property_value field is orc, the data storage format of the table is AliORC.

Update the data storage format

After you upgrade your Hologres instance to V0.10, newly created column-oriented tables in the instance adopt the AliORC format by default. However, you must execute the following statement to update the data storage format of existing column-oriented tables:

call set_table_property ('table_name', 'storage_format', 'orc');

Replace the table_name parameter with the name of a table. For example, you can specify the table named part. After you execute the following statement, the data storage format of the part table is updated to AliORC.

call set_table_property ('public.part', 'storage_format', 'orc');

Convert the segment format of column-oriented tables to the AliORC format in batches

Hologres V1.3 and later do not allow you to set the storage format of column-oriented tables to segment when you create the tables. We recommend that you convert the storage format to AliORC at the earliest opportunity. Format conversion of column-oriented tables involves the following scenarios:

  • Newly created column-oriented tables

    In Hologres V1.3.29 and later, if the format of newly created column-oriented tables is specified as segment, Hologres automatically converts the segment format of the tables to the AliORC format. Therefore, when you execute statements to create the tables in the segment format, the following message is returned:

    NOTICE:storage format 'segment' is no longer supported,converted to 'orc'
  • Existing column-oriented tables in the segment format

    For existing column-oriented tables in the segment format, you can perform the following steps to convert the segment format of the tables to the AliORC format in batches:

    1. Query existing column-oriented tables in the segment format in the current database.

      Note

      If the query result is empty, the column-oriented tables in the current database do not require format conversion, or the column-oriented tables that require format conversion are being processed at the backend.

      select * from hologres.hg_table_properties
      where property_key = 'storage_format' and property_value like '%segment%';
    2. Convert the segment format of the existing column-oriented tables to the AliORC format in batches.

      In Hologres V1.3.35 and later, you can execute the following statement to convert the segment format of existing tables into the AliORC format in batches:

      call hg_convert_segment_orc([<BATCH_SIZE>]);

      The BATCH_SIZE parameter is optional. This parameter specifies the number of tables for which the data storage format is converted at the same time. The value of this parameter can be left empty, and the default value is 100.

      Statement description:

      • The call hg_convert_segment_orc statement is asynchronously executed. Therefore, after you execute this statement, a success message is returned but the format is being converted at the backend. Format conversion consumes CPU and memory resources and affects the online business. Therefore, we recommend that you perform format conversion during off-peak hours. Format conversion takes several minutes to several hours to complete. You can check the CPU utilization and memory usage metrics in the Hologres console. If the values of the metrics decrease, the format conversion process is complete.

      • If the query result is empty, the tables in the current database do not require format conversion, or the tables that require format conversion are being processed at the backend.

      • If a message similar to the message continue to call hg_convert_segment_orc() if you want to convert all tables in current database is returned, some tables require format conversion. In this case, you can continue to execute the statement to convert the segment format of the column-oriented tables based on the cluster load until the query result is empty.