What is the resource manager feature - PolarDB - Alibaba Cloud Documentation Center

This topic describes the resource manager feature of PolarDB for PostgreSQL.

Prerequisites

Your PolarDB for PostgreSQL cluster runs one of the following engine versions:

PostgreSQL 14 (revision version 14.5.1.0 or later)
PostgreSQL 11 (revision version 1.1.1 or later)

Note

You can run one of the following statements to view the revision version of your PolarDB for PostgreSQL cluster:

PostgreSQL 14
```
select version();
```
PostgreSQL 11
```
show polar_version;
```
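
Once the revision version is known, it can be compared against the minimum listed above. The following is a minimal sketch, assuming the revision has already been extracted as a dotted string such as `14.5.1.0`; the helper name `meets_minimum` is hypothetical and not part of PolarDB.

```python
def meets_minimum(revision: str, minimum: str) -> bool:
    """Compare two dotted revision strings numerically, field by field."""
    rev = [int(part) for part in revision.split(".")]
    low = [int(part) for part in minimum.split(".")]
    # Pad the shorter list with zeros so "14.5.1" and "14.5.1.0" compare as equal.
    width = max(len(rev), len(low))
    rev += [0] * (width - len(rev))
    low += [0] * (width - len(low))
    return rev >= low


# Minimum revisions taken from the prerequisites above.
print(meets_minimum("14.5.1.0", "14.5.1.0"))
print(meets_minimum("1.1.0", "1.1.1"))
```

A numeric comparison is used because a plain string comparison would rank `14.10` below `14.5`.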

Background information

The memory of a PolarDB for PostgreSQL cluster contains the following parts:

Shared memory
Dynamic shared memory areas
Process global areas

The dynamic shared memory areas and process global areas are dynamically allocated, and their usage depends on the workloads of the cluster. Excessive usage of dynamic shared memory areas may cause the operating system memory limit to be reached. This can trigger the kernel memory limit mechanism, crash cluster processes, restart the cluster, and even make the cluster unavailable.

The memory context in a process global area can be further divided into the following parts:

Work-area memory: the memory required to run business operations. This memory affects whether workloads run normally.
Cache memory: the memory in which the process stores part of the internal metadata of the database. This memory affects only database performance.

To resolve the preceding issues, PolarDB for PostgreSQL provides the resource manager feature, which periodically checks resource usage while the cluster is running. If a process exceeds the resource threshold, a resource limit is imposed to reduce the risk of cluster unavailability.

The resource manager feature is designed to limit memory, CPU, and I/O resources. Currently, only memory limits are supported.

How it works

Memory limits depend on cgroups. If no cgroup is available, memory limits cannot take effect. The resource manager runs as a background process of PolarDB for PostgreSQL. It periodically reads memory usage data from cgroups and uses the data as the criterion for memory limits. When the memory usage of specific processes exceeds the specified threshold, the resource manager reads the memory usage records of each process, sorts the processes by memory size, and sends the terminate-process signal (SIGTERM) or the cancel-operation signal (SIGINT), in turn, to the processes whose memory usage exceeds the specified threshold.
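
The selection step described above can be sketched as follows. This is an illustration only, not the PolarDB implementation: the function name `pick_victims`, the per-process memory map, and the rule of stopping once the projected usage falls back to the threshold are all assumptions made for the example.

```python
def pick_victims(proc_mem: dict, usage_bytes: int, threshold_bytes: int) -> list:
    """Return pids to signal, largest memory consumer first.

    proc_mem maps pid -> memory used by that process, in bytes.
    Assumption: signalling stops as soon as the projected total
    usage is back at or below the threshold.
    """
    victims = []
    projected = usage_bytes
    # Sort processes by memory size, largest first.
    for pid, mem in sorted(proc_mem.items(), key=lambda kv: kv[1], reverse=True):
        if projected <= threshold_bytes:
            break
        victims.append(pid)
        projected -= mem
    return victims


# Hypothetical snapshot: three backends, 1000 bytes used, threshold 600 bytes.
print(pick_victims({101: 300, 102: 500, 103: 100}, 1000, 600))
```

In a real daemon each selected pid would then receive SIGTERM or SIGINT, for example through `os.kill(pid, signal.SIGTERM)`; the sketch returns the list instead so that it has no side effects.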

Memory limit method

The resource manager daemon is created when the cluster starts and runs on the primary, read-only, and secondary nodes. You can change its behavior by modifying parameters.

To release memory, the resource manager sends SIGTERM signals to terminate processes whose memory usage exceeds the threshold specified by the resource manager parameters. The following table describes the parameters.

Parameter	Description
enable_resource_manager	Specifies whether to enable the resource manager feature. Valid values: on and off. Default value: on.
stat_interval	The interval at which memory usage is checked. Unit: milliseconds. Valid values: 10 to 10000. Default value: 500.
total_mem_limit_rate	The memory usage threshold of the cluster, as a percentage. When memory usage reaches the specified percentage, a memory limit is imposed. Default value: 95.
total_mem_limit_remain_size	The reserved memory size of the cluster. When the remaining memory drops to the specified value, a memory limit is imposed. Unit: KB. Valid values: 131072 to MAX_KILOBYTES (the maximum integer value). Default value: 524288.
mem_release_policy	The policy that is used to limit memory resources. Valid values: `none`: takes no action. `default`: interrupts idle processes and then active processes. This is the default policy. `cancel_query`: interrupts active processes. `terminate_idle_backend`: interrupts idle processes. `terminate_any_backend`: interrupts all processes. `terminate_random_backend`: interrupts random processes.
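
To make the parameters more concrete, the following sketch shows one way the two thresholds and the release policy could be evaluated. It is a hedged illustration: the function names are hypothetical, the assumption that either threshold alone triggers a limit is not confirmed by the source, and the `(pid, is_idle)` input shape is invented for the example.

```python
import random


def limit_triggered(total_kb, used_kb,
                    total_mem_limit_rate=95,
                    total_mem_limit_remain_size=524288):
    """Return True when a memory limit should be imposed.

    Assumption: reaching either threshold is enough (logical OR).
    """
    over_rate = used_kb * 100 >= total_kb * total_mem_limit_rate
    low_remaining = (total_kb - used_kb) <= total_mem_limit_remain_size
    return over_rate or low_remaining


def order_candidates(backends, policy="default"):
    """Return pids in the order the given policy would interrupt them.

    backends is a list of (pid, is_idle) pairs.
    """
    idle = [pid for pid, is_idle in backends if is_idle]
    active = [pid for pid, is_idle in backends if not is_idle]
    if policy == "none":
        return []
    if policy == "default":
        return idle + active          # idle processes first, then active ones
    if policy == "cancel_query":
        return active
    if policy == "terminate_idle_backend":
        return idle
    if policy == "terminate_any_backend":
        return idle + active if False else [pid for pid, _ in backends]
    if policy == "terminate_random_backend":
        pids = [pid for pid, _ in backends]
        return random.sample(pids, len(pids))  # random order
    raise ValueError(f"unknown mem_release_policy: {policy}")


# 16 GB cluster with 15 GB used: below 95% and more than 512 MB remaining.
print(limit_triggered(16 * 1024 * 1024, 15 * 1024 * 1024))
print(order_candidates([(1, True), (2, False), (3, True)], "default"))
```

With the default values, a small cluster can hit the reserved-size threshold well before it reaches 95% usage, which is why the sketch checks both conditions.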

Examples

When a session process receives the SIGTERM signal, the process is terminated and the termination information is written to the logs. Sample logs:

2022-11-28 14:07:56.929 UTC [18179] LOG:  [polar_resource_manager] terminate process 13461 release memory 65434123 bytes
2022-11-28 14:08:17.143 UTC [35472] FATAL:  terminating connection due to out of memory
2022-11-28 14:08:17.143 UTC [35472] BACKTRACE:
        postgres: primary: postgres postgres [local] idle(ProcessInterrupts+0x34c) [0xae5fda]
        postgres: primary: postgres postgres [local] idle(ProcessClientReadInterrupt+0x3a) [0xae1ad6]
        postgres: primary: postgres postgres [local] idle(secure_read+0x209) [0x8c9070]
        postgres: primary: postgres postgres [local] idle() [0x8d4565]
        postgres: primary: postgres postgres [local] idle(pq_getbyte+0x30) [0x8d4613]
        postgres: primary: postgres postgres [local] idle() [0xae1861]
        postgres: primary: postgres postgres [local] idle() [0xae1a83]
        postgres: primary: postgres postgres [local] idle(PostgresMain+0x8df) [0xae7949]
        postgres: primary: postgres postgres [local] idle() [0x9f4c4c]
        postgres: primary: postgres postgres [local] idle() [0x9f440c]
        postgres: primary: postgres postgres [local] idle() [0x9ef963]
        postgres: primary: postgres postgres [local] idle(PostmasterMain+0x1321) [0x9ef18a]
        postgres: primary: postgres postgres [local] idle() [0x8dc1f6]
        /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f888afff445]
        postgres: primary: postgres postgres [local] idle() [0x49d209]
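
When auditing such logs, the resource manager entries can be picked out with a small script. The following is a minimal sketch that assumes the message format matches the first sample line above exactly; the function name `parse_release` is hypothetical, and other PolarDB versions may word the message differently.

```python
import re

# Pattern derived from the sample log line; the format is an assumption.
_RELEASE = re.compile(
    r"\[polar_resource_manager\] terminate process (\d+) release memory (\d+) bytes"
)


def parse_release(line: str):
    """Return (pid, released_bytes) for a resource manager log line, else None."""
    match = _RELEASE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = ("2022-11-28 14:07:56.929 UTC [18179] LOG:  [polar_resource_manager] "
          "terminate process 13461 release memory 65434123 bytes")
print(parse_release(sample))
```

Lines that do not come from the resource manager, such as the FATAL and BACKTRACE entries, yield `None` and can be skipped.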