Starting from Hologres V0.10, the data storage format for column-oriented tables is upgraded to AliORC. This format improves data compression and reduces storage costs. This topic describes how to change the data storage format of a column-oriented table in Hologres.
Limits
The following limits apply when you change the data storage format of a column-oriented table in Hologres:
-
This feature is supported only in Hologres V0.10 and later. Check your instance version on the instance details page in the Hologres console. If your instance version is earlier than V0.10, you must upgrade it. If you encounter errors during the upgrade, see Common errors during upgrade preparation or join the Hologres DingTalk group for feedback. For more information, see How do I get more online support?.
-
Before the upgrade to V0.10, the default storage format for column-oriented tables was segment. After the upgrade, new column-oriented tables default to the AliORC format, for which the parameter value is orc. Row-oriented tables default to the sst format. You can convert historical data from before the upgrade, but the following limits apply:
-
You can convert the storage format for only one table at a time. Batch conversion is not supported.
-
For a partitioned table, convert the data storage format of the parent table only.
-
After you change the storage format, a background thread asynchronously converts the data to the new format. This asynchronous background process consumes system resources, such as CPU, I/O, and network bandwidth. The conversion may affect the performance of online tasks. The duration of the conversion depends on the amount of historical data and the real-time write volume. Perform this operation during off-peak hours.
-
-
You cannot convert a column-oriented table into a row-oriented table by changing the storage format. To do this, create a new table with the new table schema.
View the data storage format
Use the following SQL statement to view the storage format of each table in the database:
-
Syntax example
SELECT * FROM hologres.hg_table_properties WHERE property_key = 'storage_format' ; -
Query result
table_namespace | table_name | property_key | property_value -----------------+--------------------+----------------+---------------- public | part | storage_format | segment public | partsupp | storage_format | segment public | customer | storage_format | segment public | orders_row | storage_format | sst public | sp_orders | storage_format | segment public | sp_orders_20161231 | storage_format | segment public | sp_orders_20171231 | storage_format | segment public | sp_orders_20181231 | storage_format | segment public | lineitem | storage_format | segment public | nation | storage_format | orc public | region | storage_format | orc public | supplier | storage_format | orc public | orders | storage_format | orc (13 rows) -
Metric description
Parameter
Description
segment
The default format for column-oriented tables in earlier versions of Hologres.
sst
The default format for row-oriented tables.
orc
The default format for column-oriented tables in Hologres V0.10 and later. A table with a `property_value` of `orc` uses the AliORC format for data storage.
Update the data storage format of a table
After you upgrade a Hologres instance to V0.10, new tables use the AliORC format by default. You must manually change the storage class for existing tables. Use the following command syntax:
call set_table_property ('table_name', 'storage_format', 'orc');
Replace table_name with the name of your table. The following example changes the storage class for the part table to the AliORC format.
call set_table_property ('public.part', 'storage_format', 'orc');
Batch convert tables from Segment to ORC format
Hologres V1.3 and later no longer support creating tables with the Segment storage format. Convert tables from Segment to ORC format as soon as possible. This change affects the following scenarios.
-
New tables
Starting from Hologres V1.3.29, if you attempt to create a table with the Segment format, the system automatically converts the format to ORC. When you execute the CREATE TABLE statement, you receive the following notice:
NOTICE:storage format 'segment' is no longer supported,converted to 'orc' -
Existing Segment tables
For existing tables that use the Segment format, follow these steps to perform a batch conversion to the ORC format.
-
View the tables that use the Segment format in the current database.
NoteIf the query returns an empty result, no tables in the current database need to be converted, or the conversion is already in progress for all required tables. No further action is required.
select * from hologres.hg_table_properties where property_key = 'storage_format' and property_value like '%segment%'; -
You can batch convert from Segment format to ORC format.
Hologres V1.3.35 and later support batch conversion of tables from Segment format to ORC format. The command syntax is as follows.
call hg_convert_segment_orc([<BATCH_SIZE>]);BATCH_SIZEis an optional parameter. It specifies the number of tables to convert concurrently. If you do not specify this parameter, the default value of 100 is used.Command description:
-
This command runs asynchronously. It returns a success message immediately while the format conversion runs in the background. The conversion process consumes CPU and memory resources, which causes related metrics to rise. This can affect your online services. Therefore, run the conversion during off-peak hours. The process usually takes from several minutes to several hours. You can monitor CPU and memory metrics. A drop in these metrics indicates that the conversion is complete.
-
If the command returns an empty result, no tables in the current database need to be converted, or the conversion is already in progress for all required tables. No further action is required. You can then proceed to the next database.
-
If the command returns a message such as
continue to call hg_convert_segment_orc() if you want to convert all tables in current database., more tables need to be converted. Continue to run the command based on your cluster's load until the command returns an empty result.
-
-