Alibaba Cloud Linux: Code duptext feature

Last Updated: Mar 14, 2024

In a non-uniform memory access (NUMA) architecture, especially on an Arm-based Elastic Compute Service (ECS) instance, each NUMA node has its own local memory. When a program or process that runs on one NUMA node accesses code snippets, that is, the executable code of the program, that are stored on another NUMA node, the cross-node access adds latency and performance overhead. To resolve this issue, you can use the code duptext feature to copy the code snippets from the remote node to the local node.

Background information

  • Code snippet cross-node access and impacts on the system

    • Code snippet cross-node access

      NUMA is a computer architecture that is designed to resolve issues such as memory access latency and bandwidth load imbalance in multiprocessor systems. In the NUMA architecture, a node consists of processor cores and memory subsystems. Each node has local memory. One NUMA node may run an application or process whose code snippets are stored on another NUMA node. This occurs because the operating system may allocate memory to multiple nodes to resolve insufficient memory issues or meet load balancing requirements. In the preceding cross-node access scenario, the application or process must obtain instructions from the code snippets that are stored on another node.

    • Impacts on the system

      Cross-node access to code snippets may cause performance issues, such as increased access latency, reduced execution efficiency, and increased memory bandwidth usage. These issues occur because cross-node access requires data transfer over a bus or interconnection network and memory access on a remote node. Both operations are slower than local memory access. We recommend that you avoid operations that require cross-node access.

  • How the code duptext feature resolves issues caused by cross-node access to code snippets

    When cross-node access to a code snippet is detected, the code duptext feature copies the code snippet from the remote node and stores the copy on the local node. This way, an application or process on the local node can access the code snippet copy without cross-node access, which prevents the additional latency and memory bandwidth overhead. The following figure shows the workflow of the code duptext feature.

    1. Process 1 on node 1 accesses a libc.so snippet on node 0, or process 0 on node 0 accesses a test snippet on node 1.

    2. The code duptext feature is used to create a libc.so copy on node 1 and a test copy on node 0.

    3. Process 1 can access the libc.so copy, and process 0 can access the test copy, which eliminates cross-node access.

    Note

    The kernel manages memory at page granularity. Each page is 4 KB in size. The code snippets of a program are stored on one or more memory pages. When the code snippets of a program are read from disk into memory for the first time, they are added to the page cache. A page cache page that stores the code snippets is called the main page. If the kernel detects that a program on the local node accesses code snippets on another node, the kernel uses the code duptext feature to create a copy of the code snippets on the local node, page by page. A page that stores a code snippet copy is called a subpage.

    [Figure: workflow of the code duptext feature]
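To make the page-level granularity described in the preceding note concrete, the following sketch estimates how many pages a block of code occupies, and therefore at most how many subpages one additional node would need. The function name pages_needed and the 10,000-byte size are illustrative assumptions, and the calculation assumes that the code starts at a page boundary.

```shell
#!/bin/sh
# pages_needed BYTES [PAGE_SIZE]
# Prints how many pages are needed to hold BYTES bytes, rounded up.
# PAGE_SIZE defaults to the page size reported by the running system.
pages_needed() {
    bytes=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
}

# Hypothetical example: 10,000 bytes of code with 4 KB (4096-byte) pages.
pages_needed 10000 4096
```

With 4 KB pages, 10,000 bytes of code span three pages, so a full copy on one more node would use at most three subpages (12 KB).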

Limits

Only the following instance types and images support the code duptext feature:

  • Instance type: ECS Bare Metal Instance families. For more information, see Overview.

  • Image: Alibaba Cloud Linux 3 images that run kernel version 5.10.112-11 or later.

    Note

    To query the kernel version of an image, run the uname -r command.
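The version requirement can also be checked with a short script. The following is a minimal sketch, assuming that the release string starts with a part such as 5.10.112-11 and that the sort command supports the -V (version sort) option. The helper names kernel_base and version_ge are hypothetical.

```shell
#!/bin/sh
MIN_KERNEL="5.10.112-11"   # minimum kernel version stated in the Limits section

# kernel_base RELEASE
# Keeps only the leading "x.y.z-n" part of a kernel release string.
kernel_base() {
    printf '%s\n' "$1" | sed 's/^\([0-9.]*-[0-9]*\).*/\1/'
}

# version_ge A B
# Succeeds if version A is equal to or later than version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

current=$(kernel_base "$(uname -r)")
if version_ge "$current" "$MIN_KERNEL"; then
    echo "Kernel $current meets the minimum version $MIN_KERNEL."
else
    echo "Kernel $current is earlier than $MIN_KERNEL."
fi
```

The script checks only the kernel version. It does not check the image or the instance type.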

Enable or disable the code duptext feature

The code duptext feature is controlled by a global switch and by a switch for each memory control group (memcg). The kernel uses the code duptext feature for a process only when both the global switch and the switch of the memcg to which the process belongs are turned on.

Switch

Description

/sys/kernel/mm/duptext/enabled

The global switch controls whether the code duptext feature is enabled system-wide. Valid values: 0 and 1. Default value: 0.

  • 1: enables the code duptext feature.

  • 0: disables the code duptext feature.

    Note

    When the code duptext feature is disabled, the kernel automatically clears all subpages on an instance.

/sys/fs/cgroup/memory/<memcg directory name>/memory.allow_duptext

When the global switch is turned on, the memcg switch controls whether the code duptext feature is enabled for the processes in the memcg. Valid values: 0 and 1. Default value: 0.

  • 1: enables the code duptext feature for the processes in the memcg.

  • 0: disables the code duptext feature for the processes in the memcg.
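Because both switches must be set to 1, it can help to read them together. The following is a minimal sketch under the assumption that the memcg directory is named test, which is only an example name. The helper names read_switch and duptext_active are hypothetical.

```shell
#!/bin/sh
GLOBAL_SWITCH=/sys/kernel/mm/duptext/enabled
# "test" is only an example memcg directory name. Replace it with your own.
MEMCG_SWITCH=/sys/fs/cgroup/memory/test/memory.allow_duptext

# read_switch FILE
# Prints the content of a switch file, or "missing" if it cannot be read.
read_switch() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "missing"
    fi
}

# duptext_active GLOBAL_FILE MEMCG_FILE
# Succeeds only if both switch files contain 1.
duptext_active() {
    [ "$(read_switch "$1")" = "1" ] && [ "$(read_switch "$2")" = "1" ]
}

echo "global switch: $(read_switch "$GLOBAL_SWITCH")"
echo "memcg switch:  $(read_switch "$MEMCG_SWITCH")"
if duptext_active "$GLOBAL_SWITCH" "$MEMCG_SWITCH"; then
    echo "The code duptext feature can be used for processes in this memcg."
else
    echo "The code duptext feature is not active for processes in this memcg."
fi
```

A result of "missing" suggests that the kernel does not provide the switch or that the memcg directory does not exist.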

Note

In addition to using the preceding switches, you can use the following methods to query subpage statistics.

  • Query the nr_duptext field in the /proc/vmstat file or the DupText field in the /proc/meminfo file to view the subpage statistics on an instance.

    • nr_duptext indicates the number of subpages marked as duptext in the kernel.

    • DupText indicates the amount of memory that stores duptext data, in KB. A typical memory page is 4 KB in size.

  • Query the /proc/<pid>/smaps file to view the subpage statistics of a process. Replace <pid> with the ID of the process.
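The two instance-level counters can be read side by side to check that they agree. The following is a minimal sketch, assuming that /proc/vmstat lists nr_duptext as a "name value" pair and that /proc/meminfo lists DupText in the usual "Name: value kB" layout. The helper names are hypothetical.

```shell
#!/bin/sh
# nr_duptext_pages FILE
# Prints the nr_duptext counter from a vmstat-format file (0 if absent).
nr_duptext_pages() {
    awk '$1 == "nr_duptext" { n = $2 } END { print n + 0 }' "$1"
}

# duptext_kb FILE
# Prints the DupText value, in KB, from a meminfo-format file (0 if absent).
duptext_kb() {
    awk '$1 == "DupText:" { n = $2 } END { print n + 0 }' "$1"
}

if [ -r /proc/vmstat ] && [ -r /proc/meminfo ]; then
    pages=$(nr_duptext_pages /proc/vmstat)
    page_kb=$(( $(getconf PAGESIZE) / 1024 ))
    echo "nr_duptext: $pages pages (about $(( pages * page_kb )) KB)"
    echo "DupText:    $(duptext_kb /proc/meminfo) KB"
fi
```

If the page size is 4 KB, the DupText value is expected to be about four times the nr_duptext value.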

Use the code duptext feature

In this example, a test program named test.c is compiled and executed on an ECS instance that has two NUMA nodes.
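The source code of test.c is not part of this topic. If you do not have a test program, the following sketch writes a hypothetical test.c that only prints its process ID and sleeps in a loop, so that the process stays alive while its statistics are queried. The function name write_test_c and the content of the program are assumptions made for illustration.

```shell
#!/bin/sh
# write_test_c DIR
# Writes a hypothetical test.c into DIR. The program loops forever so that
# the process keeps running while its /proc entries are inspected.
write_test_c() {
    cat > "$1/test.c" <<'EOF'
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        printf("test is running, pid %d\n", (int)getpid());
        fflush(stdout);
        sleep(5);
    }
    return 0;
}
EOF
}

# Create test.c in the current directory.
write_test_c .
```

Press Ctrl+C to stop the program after you finish the procedure.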

  1. Connect to the ECS instance.

    For more information, see Connect to a Linux instance by using a password or key.

  2. (Optional) Run the following command to view information about the NUMA nodes of the ECS instance:

    numactl -H

    Note

    If the numactl tool is not installed, run the sudo yum install numactl command to install the tool.

    The following figure shows that the instance has two NUMA nodes: node 0 and node 1.

    [Figure: output of the numactl -H command, which lists node 0 and node 1]

  3. Run the following command to compile the test program and generate an executable file.

    In this example, the numactl command binds gcc to node 1 when test.c is compiled. As a result, the page cache of the generated executable file named test is allocated on node 1.

    numactl -N 1 -m 1 gcc test.c -o test

  4. Run the following command to turn on the global switch for the code duptext feature:

    sudo sh -c 'echo 1 > /sys/kernel/mm/duptext/enabled'

  5. Run the following commands to create a memcg directory and enable the duptext feature for the memcg:

    sudo mkdir /sys/fs/cgroup/memory/test
    sudo sh -c 'echo 1 > /sys/fs/cgroup/memory/test/memory.allow_duptext'

  6. Run the following command to use the code duptext feature to avoid cross-node access.

    In this example, the cgexec tool runs the executable file named test in the test memcg, and the numactl tool binds the process to node 0. Because the page cache of the test file resides on node 1, a copy of the code snippets of test is created on node 0. The test program then accesses the code snippet copy on node 0 without cross-node access.

    sudo cgexec -g "memory:test" numactl -N 0 -m 0 ./test

    Note

    If the cgexec tool is not installed, run the sudo yum install -y libcgroup-tools command to install the tool.

  7. While the test program is still running, open another terminal session and run the following command to view statistics about subpages of the test program:

    sudo cat /proc/$(pidof test)/smaps

    The following sample command output shows statistics about the subpages of the test program. A code copy of the test program is generated on node 0.

    [Figure: sample smaps output that shows the subpage statistics of the test program]

    Note

    You can also run the following commands to view the subpage statistics on the instance:

    cat /proc/vmstat | grep -i duptext
    cat /proc/meminfo | grep -i duptext
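The smaps output is long, so it can be convenient to total the duptext lines for each process. The following is a sketch only. It assumes that the subpage statistics appear in smaps as lines in the form "Name: value kB" whose field name contains "duptext" in any letter case. The exact field name is not stated in this topic, and the helper name smaps_duptext_kb is hypothetical.

```shell
#!/bin/sh
# smaps_duptext_kb FILE
# Sums the kB values of all lines in an smaps-format file whose field name
# contains "duptext" (case-insensitive). The field name is an assumption.
smaps_duptext_kb() {
    awk 'tolower($1) ~ /duptext/ { sum += $2 } END { print sum + 0 }' "$1"
}

# Report every running process named "test". Run the script as root so that
# the smaps files are readable.
if command -v pidof >/dev/null 2>&1; then
    for pid in $(pidof test); do
        if [ -r "/proc/$pid/smaps" ]; then
            echo "pid $pid: $(smaps_duptext_kb "/proc/$pid/smaps") kB of subpages"
        fi
    done
fi
```

The loop also covers the case in which pidof returns more than one process ID, which the single cat command in step 7 does not handle.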

Disable the code duptext feature

You can disable the code duptext feature based on your business requirements. When the code duptext feature is disabled, the kernel automatically clears all subpages on the instance.

  1. Connect to the ECS instance.

    For more information, see Connect to a Linux instance by using a password or key.

  2. Run the following command to disable the code duptext feature:

    sudo sh -c 'echo 0 > /sys/kernel/mm/duptext/enabled'

  3. Run the following command to verify that the code duptext feature is disabled:

    cat /proc/vmstat | grep -i duptext
    cat /proc/meminfo | grep -i duptext

    The following sample command output shows that all subpages on the instance are cleared, which indicates that the code duptext feature is disabled.

    [Figure: sample command output in which the duptext counters are 0]
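The verification can also be scripted. The following is a minimal sketch that polls the nr_duptext counter until it reads 0. The polling is a precaution based on the assumption that the cleanup might not finish immediately, because this topic states only that the kernel clears the subpages automatically. The helper names are hypothetical.

```shell
#!/bin/sh
# nr_duptext_pages FILE
# Prints the nr_duptext counter from a vmstat-format file (0 if absent).
nr_duptext_pages() {
    awk '$1 == "nr_duptext" { n = $2 } END { print n + 0 }' "$1"
}

# wait_until_cleared FILE TRIES
# Checks FILE up to TRIES times, one second apart, and succeeds as soon as
# nr_duptext reads 0.
wait_until_cleared() {
    tries=$2
    while [ "$tries" -gt 0 ]; do
        [ "$(nr_duptext_pages "$1")" -eq 0 ] && return 0
        tries=$(( tries - 1 ))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

if [ -r /proc/vmstat ]; then
    if wait_until_cleared /proc/vmstat 5; then
        echo "All subpages on the instance are cleared."
    else
        echo "Subpages are still present."
    fi
fi
```

On a kernel that does not report nr_duptext, the script also prints that all subpages are cleared, because a missing counter is treated as 0.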