The memory diagnostics feature helps you identify common memory issues in Container Service for Kubernetes (ACK) clusters, including memory leaks, memory fragmentation, and out of memory (OOM) errors. Diagnostic results are displayed in charts and tables. The feature also shows the container caches and shared memory occupied by the files in each folder, which gives you insight into overall memory usage and simplifies your O&M work. This topic introduces memory diagnostics.
Memory diagnostics consists of three parts: memory overview, memory analysis, and OOM analysis. You can view the memory usage of nodes and pods.
Memory overview
The memory overview feature displays diagnostic items related to memory risks. The following table describes the diagnostic items.
Diagnostic item | Description |
---|---|
Leaked Memory | Checks for system kernel memory leaks in the Slab, Vmalloc, and buddy system (allocpage). |
Memory Usage | Displays the utilization of system memory. |
Memcg | Evaluates whether the unreleased memory cgroups compromise system performance and cause statistical errors. |
Memory Fragmentation | Checks for memory fragmentation, which compromises system performance. |
THPZeroPage | Evaluates the ratio of THP waste. |
System memory is divided into the following categories:
- Kernel memory (kernel): the total amount of memory used by the operating system kernel.
- Application memory (app): the total amount of memory used by programs in user mode.
- Free memory (free): the amount of free system memory.
Terms
Term | Description |
---|---|
memory leaks | A memory leak occurs when memory that is dynamically allocated to a program is not released after use, which causes system memory utilization to keep increasing. Memory leaks can compromise the performance of programs or even cause the system to crash. |
memory utilization | Memory utilization is calculated by using the following formula: Memory utilization = (Total memory - Free memory) × 100%/Total memory. File caches are counted as free memory and therefore do not increase memory utilization. |
unreleased Memcg | Memory cgroups that are not released due to system exceptions. These memory cgroups may compromise system performance. |
memory fragmentation | Memory fragmentation occurs when, after the system has been running for a long period of time, free memory is split into blocks that are too small to fulfill contiguous memory allocation requests. Failed requests delay memory allocation and cause business jitters. |
ratio of THP waste | Transparent Huge Pages (THPs) are huge pages whose size is 2 MiB or 1 GiB in the kernel. The size of a subpage is 4 KiB. When THPs are enabled, the kernel dynamically allocates THPs to reduce Translation Lookaside Buffer (TLB) misses and improve application performance. However, THPs may cause memory bloat: the kernel allocates memory as 2 MiB THPs, each equivalent to 512 subpages, even when an application needs far less. This wastes memory and can result in memory overcommitment and OOM errors. For example, when an application that requests only 8 KiB of memory (2 subpages) is assigned a 2 MiB THP, the remaining 510 subpages are zero subpages, which waste resident set size (RSS) and may cause an OOM error. Ratio of THP waste = Number of zero THPs × 100%/Total number of THPs |
buddy system | The buddy system is an algorithm used by the Linux kernel to manage memory pages. It divides memory blocks into 11 groups. In most cases, a memory page is 4 KB in size. The number of pages in a block doubles from one group to the next, so the block sizes are 4 KB, 8 KB, 16 KB, 32 KB, ..., 4 MB. |
Slab | A memory allocator that allocates small pieces of memory based on the buddy system of Linux. |
Vmalloc | A memory allocator that uses nonlinear mapping based on the buddy system of Linux. |
filecache | When Linux reads or writes a file, it caches the file content in memory. This way, programs can read or write the content directly in memory, which is much faster than accessing the file on disk. |
anonymous memory | Anonymous memory is dynamically allocated to the heap and stack of a process through new, malloc, or mmap. Anonymous memory is not backed by a file system. |
shared memory | A memory block shared by two or more processes for communication. |
tmpfs | A temporary file system of Linux based on memory. The file system caches the content that it reads or writes in memory. |
hugetlb | The amount of memory consumed by huge pages in a file system. |
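The two formulas and the buddy system block sizes described in the table above can be sketched in Python. This is a minimal illustration and not the implementation used by the diagnostics feature. The function names and sample values are hypothetical.

```python
from typing import List

PAGE_SIZE_KB = 4    # typical page size, as stated in the table above
BUDDY_GROUPS = 11   # number of buddy system groups


def memory_utilization(total_kb: int, free_kb: int) -> float:
    """Memory utilization (%) = (Total memory - Free memory) * 100 / Total memory.

    free_kb is assumed to already include file caches, because the
    definition above counts file caches as free memory.
    """
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    return (total_kb - free_kb) * 100 / total_kb


def thp_waste_ratio(zero_thps: int, total_thps: int) -> float:
    """Ratio of THP waste (%) = Number of zero THPs * 100 / Total number of THPs."""
    if total_thps == 0:
        return 0.0  # no THPs allocated, so nothing is wasted
    return zero_thps * 100 / total_thps


def buddy_block_sizes_kb() -> List[int]:
    """Block size of each buddy group: 4 KB, 8 KB, ..., 4 MB (4096 KB)."""
    return [PAGE_SIZE_KB << order for order in range(BUDDY_GROUPS)]
```

For example, `memory_utilization(16_000_000, 4_000_000)` evaluates to `75.0`, and the last entry of `buddy_block_sizes_kb()` is `4096`, which matches the 4 MB upper bound in the table.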
Kernel memory
In most cases, abnormal memory usage of Sunreclaim or the buddy system indicates a kernel memory leak. Pay close attention to their memory usage in kernel mode.
Metric | Description |
---|---|
Sreclaimable | Slab memory that can be reclaimed. |
Sunreclaim | Slab memory that cannot be reclaimed. |
PageTables | Memory occupied by kernel page tables. |
Vmalloc | Memory allocated by calling the Vmalloc function. |
KernelStack | Total memory occupied by the kernel stacks of all tasks. |
AllocPages | Memory allocated from the buddy system by calling functions such as alloc_pages. This memory is not reported in any node file, so excessive use shows up as a memory black hole. |
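As a rough way to cross-check most of these metrics on a Linux node, the sketch below parses text in the `/proc/meminfo` format. The mapping from the metric names in the table to `/proc/meminfo` fields is an assumption made for illustration, and the sample values are made up. AllocPages is left out because, as noted above, it is not reported in any node file.

```python
from typing import Dict

# Assumed mapping: metric name in the table -> /proc/meminfo field.
KERNEL_FIELDS = {
    "Sreclaimable": "SReclaimable",
    "Sunreclaim": "SUnreclaim",
    "PageTables": "PageTables",
    "Vmalloc": "VmallocUsed",
    "KernelStack": "KernelStack",
}


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kernel_memory_kb(meminfo: Dict[str, int]) -> Dict[str, int]:
    """Pick the kernel memory metrics out of parsed meminfo values."""
    return {metric: meminfo.get(field, 0) for metric, field in KERNEL_FIELDS.items()}


# Made-up sample in /proc/meminfo format, for illustration only.
SAMPLE = """\
MemTotal:       16000000 kB
SReclaimable:     300000 kB
SUnreclaim:       120000 kB
KernelStack:       16000 kB
PageTables:        40000 kB
VmallocUsed:       50000 kB
"""

kernel = kernel_memory_kb(parse_meminfo(SAMPLE))
```

On a real node, you can pass the content of `/proc/meminfo` to `parse_meminfo` instead of `SAMPLE`. A Sunreclaim value that keeps growing over time is the kind of signal that this section suggests watching.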
Application memory
When you view the memory usage of applications in user mode, pay close attention to anonymous memory, shared memory, and file caches.
Metric | Description |
---|---|
filecache | File caches, which can be reclaimed by dropping caches. |
anon | The anonymous memory occupied by the heap and stack of a program. If a large amount of anonymous memory is occupied, you need to check for memory leaks in the process and check whether THPs are enabled. |
mlock | Memory locked by the system. |
huge | Memory occupied by huge pages. |
buffer | The memory occupied by the metadata of the block device and file system. |
shmem | Shared memory (tmpfs). If the tmpfs file is not deleted after the process is terminated or the tmpfs file is deleted while the file is open, shared memory leaks occur. |
Memory analysis
Memory analysis consists of process memory analysis and pod memory analysis.
Process memory
Memory usage information is displayed by process, including anonymous memory, file caches, and shared memory.
Pod memory
The pod memory analysis feature allows you to view the files that occupy the file caches and shared memory of containers and pods, the ratio of active caches, and the ratio of inactive caches.
Diagnostic item | Description |
---|---|
Pod | The name of the pod. |
Container | The name of the container. |
File | The full path of the file, which includes the file name. |
Cache | The file cache (filecache) occupied by the file. |
Container Cache | The container cache occupied by the file. Different processes in a container may manage the same file. |
Active Cache | The file cache that is in use. |
Inactive Cache | The file cache that is not in use. |
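To illustrate the active and inactive cache ratios, the following sketch computes them for a whole cgroup from text in the format of the cgroup `memory.stat` file, which reports `active_file` and `inactive_file` in bytes. This is an assumption about how such ratios can be derived and not the method used by the feature. The per-file breakdown shown in the table is not available from `memory.stat`, and the sample values are made up.

```python
from typing import Dict, Tuple


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse cgroup memory.stat-style text ("key value" per line)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def cache_ratios(stats: Dict[str, int]) -> Tuple[float, float]:
    """Return (active %, inactive %) of the file cache tracked by the cgroup."""
    active = stats.get("active_file", 0)
    inactive = stats.get("inactive_file", 0)
    total = active + inactive
    if total == 0:
        return 0.0, 0.0
    return active * 100 / total, inactive * 100 / total


# Made-up sample in memory.stat format, for illustration only.
SAMPLE_STAT = """\
anon 8388608
active_file 3145728
inactive_file 1048576
"""

active_pct, inactive_pct = cache_ratios(parse_memory_stat(SAMPLE_STAT))
```

With the sample above, 3 MiB of the 4 MiB file cache is active, so the ratios are 75% active and 25% inactive.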
OOM analysis
The OOM analysis feature can quickly diagnose OOM errors and display the following diagnostic items.
Diagnostic item | Description |
---|---|
OS OOM Count | The total number of OOM errors that have occurred from the time when the host starts up to the time when the diagnostic is performed. |
Available Memory | The amount of free system memory. |
Low Watermark | The low watermark for free memory. When the amount of free memory drops below this watermark, an asynchronous memory reclaim operation is triggered. |
Container | The name of the pod, ID of the container, or name of the cgroup. |
limit | The memory limit of the container. |
usage | The amount of memory used by the container. |
OOM Count | The total number of OOM errors that have occurred in the container. |
OOM Type | The type of OOM error, which can be Host or cgroup. |
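The distinction between the Host and cgroup OOM types can be illustrated with a small sketch that classifies kernel log lines. The patterns are an assumption based on common Linux kernel messages, whose wording varies between kernel versions. The sample lines are hypothetical, and this is not how the feature itself collects OOM data.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed message patterns. Check the cgroup pattern first, because the
# cgroup message also contains the words "out of memory".
CGROUP_OOM = re.compile(r"Memory cgroup out of memory")
HOST_OOM = re.compile(r"Out of memory: Kill(ed)? process")


def classify_oom(line: str) -> Optional[str]:
    """Return "cgroup", "Host", or None if the line is not an OOM kill message."""
    if CGROUP_OOM.search(line):
        return "cgroup"
    if HOST_OOM.search(line):
        return "Host"
    return None


def count_ooms(lines: Iterable[str]) -> Counter:
    """Count OOM kill messages by type."""
    counts = Counter()
    for line in lines:
        oom_type = classify_oom(line)
        if oom_type is not None:
            counts[oom_type] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    "[  100.1] Out of memory: Killed process 1234 (stress)",
    "[  200.2] Memory cgroup out of memory: Killed process 2345 (java)",
    "[  300.3] Memory cgroup out of memory: Killed process 3456 (python)",
    "[  400.4] eth0: link up",
]

oom_counts = count_ooms(SAMPLE_LOG)
```

A Host OOM means that the node as a whole ran out of memory, whereas a cgroup OOM means that a container reached its own memory limit, which corresponds to the limit and usage columns in the table.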