This topic describes how to optimize storage costs in terms of data partitions, table lifecycles, and the periodic deletion of deprecated tables.
- Properly configure data partitions.
- Configure reasonable lifecycles for tables.
- Periodically delete deprecated tables.
Properly configure data partitions
- If the minimum period for data collection is one day, we recommend that you use the date field as a partition field. The system migrates data to the specified partitions every day. Then, it reads the data from the specified partitions for subsequent operations.
- If the minimum period for data collection is one hour, we recommend that you use the combination of the date and hour fields as a partition field. The system migrates data to the specified partitions every hour. Then, it reads the data from the specified partitions for subsequent operations. If data that is collected on an hourly basis is partitioned based on dates, data in each partition is appended every hour. As a result, the system reads large amounts of unnecessary data, which increases storage costs.
You can use partition fields based on your business needs. In addition to the date and time fields, you can use other fields that have a relatively fixed number of enumerated values, such as channel, country, or province. Alternatively, you can use a combination of time and other fields as a partition field. We recommend that you specify two levels of partitions in a table. Each table supports a maximum of 60,000 partitions.
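The partitioning choices above can be sketched in MaxCompute SQL as follows. The table names, column names, and partition values (log_daily, log_hourly, log_by_channel, ds, hr, channel) are hypothetical and only illustrate the pattern.

```sql
-- Daily collection: one partition per date.
CREATE TABLE IF NOT EXISTS log_daily (msg STRING)
PARTITIONED BY (ds STRING);

-- Hourly collection: date plus hour, so each hour lands in its own partition.
CREATE TABLE IF NOT EXISTS log_hourly (msg STRING)
PARTITIONED BY (ds STRING, hr STRING);

-- Time combined with an enumerated business field (two partition levels).
CREATE TABLE IF NOT EXISTS log_by_channel (msg STRING)
PARTITIONED BY (ds STRING, channel STRING);

-- Filtering on both partition fields limits the read to a single partition.
SELECT msg FROM log_hourly WHERE ds = '20240101' AND hr = '13';
```

A table partitioned by date and hour gains 24 partitions per day, or 8,760 per year, so keep the 60,000-partition limit in mind when you choose how fine-grained the partitions should be.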
Configure reasonable lifecycles for tables
When you create a table, you can configure its lifecycle based on how long the data is used. MaxCompute automatically deletes data that exceeds the lifecycle threshold, which saves storage space. For example, the following statement creates a table with a lifecycle of 100 days:
CREATE TABLE test3 (key boolean) PARTITIONED BY (pt string, ds string) LIFECYCLE 100;
The lifecycle is applied at the partition level. If some partitions in a partitioned table reach the lifecycle threshold, only those partitions are deleted. Partitions that have not reached the threshold are not affected. To change the lifecycle of an existing table, run the following statement, where days is the new lifecycle in days:
ALTER TABLE table_name SET lifecycle days;
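As a minimal sketch, assuming the test3 table created above, the following statements shorten its lifecycle and exempt one partition from reclamation. The value 30 and the partition values are illustrative assumptions, and the DISABLE LIFECYCLE form should be checked against the MaxCompute SQL reference for your version.

```sql
-- Change the lifecycle of the existing table to 30 days.
ALTER TABLE test3 SET LIFECYCLE 30;

-- Keep one partition regardless of the table lifecycle.
-- Assumes this partition already exists; the values are hypothetical.
ALTER TABLE test3 PARTITION (pt = 'a', ds = '20240101') DISABLE LIFECYCLE;

-- Check the table metadata to confirm the new setting.
DESC test3;
```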
Periodically delete deprecated tables
Review your tables periodically and consider deleting the following types of deprecated tables:
- Tables that are not accessed within the last three months
- Non-partitioned tables that are not accessed within the last month
- Tables that do not consume storage resources
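Once a table matches one of the criteria above, the cleanup itself can be sketched as follows. The table names and partition value are hypothetical, and how you determine the last access time of a table is not covered by this sketch.

```sql
-- Review the metadata of a candidate table before removing it.
DESC tmp_report_old;

-- Drop a table that is confirmed to be unused.
DROP TABLE IF EXISTS tmp_report_old;

-- For a partitioned table that is still in use, drop only the stale partitions.
ALTER TABLE log_archive DROP IF EXISTS PARTITION (ds = '20230101');
```

Dropping a table removes its data, so confirm with the table owner before you run these statements in a production project.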