Causes and troubleshooting of OOM errors in Hologres - Hologres

If the memory resources that are consumed by queries exceed the system quota, the system reports an out of memory (OOM) error. This topic describes how to query memory usage, analyze high memory usage, and identify OOM errors. This topic also provides causes of and solutions to OOM errors.

How do I analyze memory usage?

View the memory usage
- You can view the total memory usage of an instance in the Hologres console. The memory usage values of all nodes in a Hologres instance are aggregated. For more information, see Hologres metrics.
- The memory_bytes field in a slow query log displays the memory usage of a single query. The field values are only for reference and may be imprecise. For more information, see Query and analyze slow query logs.
Deal with high memory usage
You can view the Memory Usage and Instance Memory Distribution Usage metrics in the Hologres console. For more information, see Hologres metrics. The memory usage of an instance is considered high if the memory usage remains over 80% for a long period of time. In Hologres, memory resources are reserved. Even if no queries are performed, some metadata, index data, and cached data still consume memory resources. The metadata helps improve the computing speed. In this case, a memory usage of 30% to 50% is normal. If the memory usage continues to rise or even approaches 100%, system performance, instance stability, and instance performance may degrade. This section describes the possible causes, impacts, and solutions to this issue:
- Causes
  - Metadata occupies excessive memory.
    As the number of tables and data volume grow, the memory resources that are consumed by metadata also increase. In this case, even if no queries are performed, the memory usage still grows. We recommend that you include no more than 10,000 tables, including child partitioned tables but not foreign tables, in each table group. The more shards a table group contains, the more parts and metadata are generated, and the more memory resources are consumed by metadata.
  - Excessive memory resources are consumed for computing.
    A large amount of data needs to be scanned or the computing logic is extremely complex in the following scenarios: a large number of Count Distinct operations, complex JOIN operations, GROUP BY operations based on multiple fields, and window operations. In this case, excessive memory resources are consumed for computing.
- Impacts
  - Impacts on stability
    If a large amount of metadata occupies the memory space available for queries, errors such as SERVER_INTERNAL_ERROR, ERPC_ERROR_CONNECTION_CLOSED, and Total memory used by all existing queries exceeded memory limitation may occur during queries.
  - Impacts on performance
    If a large amount of metadata occupies the cache space available for queries, the number of cache hits decreases, and the query latency increases.
- Solutions
  - If the memory usage of your instance is high due to excessive metadata, manage tables based on the hg_table_info table. For more information, see Query and analyze table statistics. We recommend that you delete unnecessary data and tables and reduce table partitioning operations to free up memory resources.
  - If the memory usage of your instance is high due to complex computing, we recommend that you separately optimize the SQL statements for data reads and data writes. For more information, see the What can I do if an OOM error is reported for a query? and What can I do if an OOM error is reported during data import or export? sections in this topic.
  - Scale out your instance to increase the computing and storage resources. For more information, see Instance list.

How do I identify OOM errors?

If the size of memory resources consumed for computing reaches or exceeds 20 GB, an OOM error is reported. The following error message is an example:

Total memory used by all existing queries exceeded memory limitation. 
memory usage for existing queries=(2031xxxx,184yy)(2021yyyy,85yy)(1021121xxxx,6yy)(2021xxx,18yy)(202xxxx,14yy); Used/Limit: xy1/xy2 quota/sum_quota: zz/100

The error contains the following parts:

queries=(2031xxxx,184yy)
This part is in the format of queries=(query_id,Consumed memory size). For example, queries=(2031xxxx,18441803528) indicates that the query with the ID 2031xxxx consumes 18 GB of memory of a node. The error information displays the IDs of top five queries that consume the most memory resources. You can view details about the queries by following the instructions in Query and analyze slow query logs.
Used/Limit: xy1/xy2
In this part, Used/Limit represents Consumed memory resources for computing/Maximum memory resources for computing of a node. The two values are both in bytes. Consumed memory resources for computing indicates the total memory resources that are consumed by all ongoing queries for computing on the node. For example, Used/Limit: 33288093696/33114697728 indicates that all ongoing queries consume 33.2 GB of memory of the node and the upper limit on memory resources for computing in the node is 33.1 GB. As a result, an OOM error occurs.
quota/sum_quota: zz/100
In this part, quota indicates a resource group, and zz indicates the percent of resources that are allocated to the resource group. For example, quota/sum_quota: 50/100 indicates that a resource group is configured for a Hologres instance and 50% of the total instance resources are allocated to the resource group.

What is the cause of an OOM error?

Some systems may cache data to the disk to implement the Spill to Disk mechanism if memory resources are insufficient. To improve query efficiency, Hologres uses memory resources for all operators in queries. As a result, OOM errors occur due to insufficient memory resources.

Allocation of and limits on memory resources

A Hologres instance is a distributed system that contains multiple nodes. The number of nodes in an instance varies based on the instance type. For more information, see Instance specifications.

Each node in a Hologres instance is allocated with 16 CPU cores and 64 GB of memory. If memory resources of a node are insufficient during a query, an OOM error occurs. Memory resources are consumed by queries, backend processes, cached data, and metadata. In Hologres versions earlier than V1.1.24, up to 20 GB of memory can be used for a worker node. From Hologres V1.1.24, this limit is removed, and the system dynamically adjusts the memory resources. The system regularly checks the memory usage of nodes and allocates available memory resources for queries as much as possible if a small amount of metadata exists. This way, sufficient memory resources can be reserved for queries.

What can I do if an OOM error is reported for a query?

In queries, an OOM error may be reported due to one of the following reasons:
- The execution plan is improper. The collected statistics are invalid, or the join order is invalid.
- A large number of queries are executed at the same time, and each query consumes a large amount of memory resources.
- The query is complex, or a large amount of data needs to be scanned.
- The query contains the UNION ALL keyword, which increases executor parallelism.
- A resource group is configured for the instance, but only few resources are allocated to the resource group.
- A data skew issue occurs or shard pruning is enabled, which leads to an unbalanced load. Some nodes have insufficient memory resources.

This section provides more details of each cause and the relevant solutions:

Insufficient resources allocated to a resource group
Use the Serverless Computing feature to perform queries. The Serverless Computing feature allows you to use serverless computing resources rather than instance exclusive resources to perform queries. The Serverless Computing feature provides more computing resources than the amount of exclusive resources of an instance, and queries do not need to compete for resources. This helps resolve OOM issues. For more information about the Serverless Computing feature, see Overview of Serverless Computing. For more information about how to use the Serverless Computing feature, see User guide on Serverless Computing.

Improper execution plan

Type 1: Invalid statistics
Execute the explain <SQL> statement to query the execution plan. rows=1000 shown in the following figure indicates that statistics are missing or invalid. In this case, the generated execution plan is improper, and a large amount of memory resources are consumed for computing. As a result, an OOM error occurs.
Solutions:
- Execute the analyze <tablename> statement to update statistics.
- Use the auto-analyze feature to enable the automatic update of statistics. For more information, see ANALYZE and auto-analyze.

Type 2: Invalid join order

When you perform a hash join operation on two tables, it is reasonable to generate the hash table based on the table with a smaller amount of data. You can execute the explain <SQL> statement to query the execution plan. If the table with a smaller amount of data is displayed above the table with a larger amount of data, as shown in the following figure, the table with a larger amount of data is used to generate the hash table. In this case, the join order is invalid, and an OOM error is likely to occur. Possible causes:

Table statistics are not updated in a timely manner. In the example shown in the following figure, the number of data rows of the first table in a red rectangle is not updated before the execution plan is generated. As a result, rows=1000 is displayed, which is invalid.
The optimizer fails to generate the optimal execution plan.

Solutions:

Execute the analyze <tablename> statement on tables that you want to join to update statistics in time. This way, a valid join order can be generated.

Modify the relevant Grand Unified Configuration (GUC) parameter if the generated join order is still invalid after you execute the analyze <tablename> statement. For example, you can configure the optimizer_join_order=query setting to enable the optimizer to determine the join order based on the SQL statement. This method is suitable for complex queries.

SET optimizer_join_order = query;
SELECT * FROM a JOIN b ON a.id = b.id; -- Use the table b as the build side of the hash table.

You can also adjust the join order based on your business requirements.

Syntax

Description

set optimizer_join_order = <value>

Specifies the join order algorithm to be used by the optimizer. Valid values:

query: uses the join order that is specified in the SQL statement. This algorithm has the lowest optimizer overheads.
greedy: uses a greedy algorithm to obtain a better join order. This algorithm has moderate optimizer overheads.
exhaustive: uses a dynamic planning algorithm to obtain the optimal join order. This is the default value and generates the optimal execution plan. However, this algorithm has the highest optimizer overheads.

Type 3: Inaccurate data size estimation
A join operation usually uses a small table or a subquery with a small data size as the build side to generate a hash table. This optimizes the performance and saves memory resources. However, if a query is too complex or statistics are invalid, the estimated data size is inaccurate, and a large table or a subquery with a large data size is used as the build side. In this case, excessive memory resources are consumed to generate a hash table, and an OOM error occurs.
In the following figure, Hash (cost=727353.45..627353.35 , rows=970902134 width=94) indicates the build side, and rows=970902134 indicates the estimated data size that is used to generate a hash table. If the actual data size is smaller, the estimation result is inaccurate.
Solutions:
- Check whether statistics are updated for the table on which the subquery is performed or whether the existing statistics are valid. If they are invalid, execute the analyze <tablename> statement.
- Use the following GUC parameter to disable data size estimation for hash tables.
  Note
  By default, this estimation is disabled. It may be enabled for optimization purposes. You can disable it as required.
```
SET hg_experimental_enable_estimate_hash_table_size =off;
```
Type 4: Broadcasting of a large table
A broadcast operation replicates the data of a table to all shards. The Broadcast Motion operator is superior only in scenarios where both the number of shards and the number of broadcast tables are small. In the execution plan of join operations, a broadcast operation is performed first. A hash table is generated after the build side data is broadcast. In this case, the data that is used to generate the hash table in each shard is the full build side data. If a large number of shards exist or the data size is large, excessive memory resources are consumed, which leads to an OOM error.
In the execution plan shown in the following figure, the estimated number of data rows is 1 and 80 data rows are used for broadcasting. However, the table contains 80,000,000 data rows. All these data rows are broadcast during actual operations, which consumes excessive memory resources. As a result, an OOM error occurs.
Solutions:
- Check whether the estimated number of data rows in the execution plan is valid. If the estimated number is invalid, execute the analyze <tablename> statement to update statistics.
- Use the following GUC parameter to disable broadcasting and use the redistribution operator instead.
```
SET optimizer_enable_motion_broadcast = off;
```

High query concurrency
The query concurrency is high if only a small size of memory is consumed for a single query but the queries per second (QPS) metric shows a prominent value increase or an error similar to the following one is returned: HGERR_detl memory usage for existing queries=(2031xxxx,184yy)(2021yyyy,85yy)(1021121xxxx,6yy)(2021xxx,18yy)(202xxxx,14yy);. In this case, you can use the following solutions:
- Reduce the concurrency of data writes. For more information, see the What can I do if an OOM error is reported during data import or export? section in this topic.
- Implement multi-instance high-availability deployment. For more information, see Configure read/write splitting for primary and secondary instances (shared storage).
- Increase the number of worker nodes for your instance.
Complex queries
If an OOM error occurs for a single query because the query is complex or a large amount of data needs to be scanned, you can use the following solutions:
- Write cleansed data to Hologres to implement pre-computing. This prevents extract, transform, and load (ETL) operations on a large amount of data in Hologres.
- Add filter conditions to the query.
- Use a fixed plan or the COUNT DISTINCT operator to optimize SQL statements. For more information, see Optimize query performance.
UNION ALL
The following sample statement contains a large number of subqueries that use the UNION ALL keyword. In this case, the executor needs to concurrently process each subquery, which requires excessive memory resources. As a result, an OOM error occurs.
```
subquery1 UNION ALL subquery2 UNION ALL subquery3 ...
```
You can use the following GUC parameters to force the executor to perform serial execution. This lowers the risk of OOM but reduces the query speed.
```
SET hg_experimental_hqe_union_all_type=1;
SET hg_experimental_enable_fragment_instance_delay_open=on;
```
Improper resource group configurations
If an error message similar to the following one is returned, the resource group settings are improper: memory usage for existing queries=(3019xxx,37yy)(3022xxx,37yy)(3023xxx,35yy)(4015xxx,30yy)(2004xxx,2yy); Used/Limit: xy1/xy2 quota/sum_quota: zz/100. In the sample error message shown in the following figure, the value of zz is 10. This indicates that the resource group has only 10% of the instance resources.
Solution: Adjust the resource group quota to ensure that each resource group can have at least 30% of the instance resources.
Data skew or shard pruning
An OOM error may occur even if the overall memory usage of an instance is acceptable. This may be caused by high memory usage of one or more nodes in the instance when data skew or shard pruning occurs.
Note
Shard pruning is a pruning feature for queries. After shard pruning is enabled, only data in partial shards are scanned.
- Execute the following SQL statement to identify whether a data skew error occurs. In this statement, hg_shard_id is a built-in hidden field in a table and indicates the shard in which the table resides.
```
SELECT hg_shard_id, count(1) FROM t1 GROUP BY hg_shard_id;
```
- View the information about shard pruning from the execution plan. In this example, the value of Shard Selector is l0[1]. This indicates that only one shard is selected for the query.
```
-- The distribution key is x. You can quickly locate the shard based on the x=1 filter condition.
SELECT count(1) FROM bbb WHERE x=1 GROUP BY y;
```
Solutions:
- Use an appropriate distribution key to prevent data skew.
- Rebuild your business if a data skew error occurs.
High-cardinality multi-stage GROUP BY
In Hologres V3.0 and later, if you aggregate high-cardinality data in multiple stages and the distribution key is not a subset of the GROUP BY key, a large hash table is generated on each concurrent instance in the low-stage aggregation after GROUP BY key-based aggregations. As a result, the memory usage increases, and OOM issues may occur. You can configure the GUC parameter to allow aggregation operations in different stages.
```
-- Use the GUC parameter hg_experimental_partial_agg_hash_table_size to configure the maximum number of rows in a hash table. The maximum value is 8192. The default value is 0, which indicates that the number of rows in a hash table is not limited. 
SET hg_experimental_partial_agg_hash_table_size = 8192;
```

What do I do if an OOM error is reported during data import or export?

You can import and export data between two Hologres tables or between an internal table and a foreign table. For example, you can import data from a MaxCompute table to a Hologres table. In such scenarios, OOM errors may occur.

Data import or export by using the Serverless Computing feature
The Serverless Computing feature allows you to use serverless computing resources rather than exclusive resources of an instance to import or export data. The Serverless Computing feature provides more computing resources than the amount of exclusive resources of an instance, and tasks do not need to compete for resources. This helps resolve OOM issues. For more information about the Serverless Computing feature, see Overview of Serverless Computing. For more information about how to use the Serverless Computing feature, see User guide on Serverless Computing.

Tables with many columns or tables with large-sized columns and a high scan concurrency

When you import data from MaxCompute, if the source MaxCompute table contains a large number of columns or the table contains large-sized columns and has a high scan concurrency, an OOM error occurs. You can use GUC parameters to control the concurrency of data import and lower the risk of OOM.

Tables with many columns (common)

Note

Use the following GUC parameters together with your SQL statements. We recommend that you configure at least the first two parameters. If an OOM error persists, specify smaller parameter values.

-- Configure the maximum parallelism to read a MaxCompute table. The default value is the same as the number of CPU cores of the instance. The maximum value is 128. We recommend that you do not use a large value. This prevents your queries from affecting other queries and causing errors due to system overloading, especially in data import scenarios. This parameter takes effect in Hologres V1.1 and later. 
SET hg_foreign_table_executor_max_dop = 32;

-- Configure the amount of data to be read at a time from a MaxCompute table. Default value: 8192. 
SET hg_experimental_query_batch_size = 4096;

-- Configure the maximum number of DML statements that can be executed at the same time when you read data from a foreign table. Default value: 32. This parameter is specifically provided for data import and export scenarios to prevent import operations from occupying excessive system resources. This parameter takes effect in Hologres V1.1 and later. 
SET hg_foreign_table_executor_dml_max_dop = 16;

-- Configure the size of each shard of a MaxCompute table. Default value: 64. Unit: MB. The shard size affects the concurrency. If a table is large in size, you can increase the value of this parameter to prevent excessive shards from degrading query performance. This parameter takes effect in Hologres V1.1 and later. 
SET hg_foreign_table_split_size = 128;

Tables with large-sized columns and a high scan concurrency

If you have adjusted the values of the preceding parameters, but an OOM error persists, check whether the table has large-sized columns. If yes, adjust the parameter configurations.

-- Adjust the maximum number of batches that can be processed by a single shuffle operation. This parameter can reduce the amount of accumulated data in large-sized columns.
SET hg_experimental_max_num_record_batches_in_buffer = 32;

-- Adjust the amount of data to be read at a time from a MaxCompute table. Default value: 8192. 
SET hg_experimental_query_batch_size=128;

Excessive duplicate data in foreign tables
If a foreign table contains excessive duplicate data, data import is slow, and an OOM error may occur. You can determine the threshold based on your business requirements. For example, you have a foreign table that contains 100,000,000 data rows. You can determine that the table contains excessive duplicate data if 80,000,000 data rows are duplicate.
Solution: Deduplicate the data in the foreign table before data import or import the data in batches. This prevents a large amount of duplicate data from being imported at the same time.