By Chao Qian (Xixie)
Memory is an essential part of a server, regardless of whether it is a physical machine or virtual machine. But the question is, how is memory performance measured?
Currently, most new CPUs have three levels of cache, including L1 cache (32 KB - 256 KB), L2 cache (128 KB - 2 MB), and L3 cache (1 MB - 32 MB). Cache sizes are getting bigger. A CPU searches the caches for data first. If the data is unavailable there, it then searches the memory.
CPUs can obtain data faster from closer locations. LMBench can be used to test the data reading latency.
The preceding figure shows that:
This means that if you want your service code to run more efficiently, keep the data it works on as close to the CPU as possible. However, the preceding figure also indicates that memory latency is measured in nanoseconds, whereas actual service response times are measured in milliseconds. Optimization should therefore focus first on the operations that take milliseconds; memory latency optimization belongs to the long tail.
Memory latency is closely related to caches. Without a good understanding of memory latency, you may mistake cache latency for memory latency. Likewise, if memory bandwidth tests are performed incorrectly, you may end up measuring cache bandwidth instead.
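To make the cache-versus-memory distinction concrete, here is a minimal sketch of pointer chasing, the general technique that latency tools in the style of LMBench rely on. Each load depends on the result of the previous one, so the hardware cannot overlap or prefetch the accesses. The function names (single_cycle, chase, ns_per_hop) are hypothetical and not taken from LMBench. Note that in Python the interpreter overhead dominates the timing, so the printed numbers only illustrate the method; a real measurement would use a compiled tool.

```python
import random
import time


def single_cycle(n, seed=0):
    """Return a list nxt where following i -> nxt[i] visits all n slots once.

    Uses Sattolo's algorithm, which always produces one cycle of length n,
    so the chase cannot get stuck in a short loop that fits in cache.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for the given number of dependent hops."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def ns_per_hop(n, steps=200_000):
    """Average time per dependent hop over a working set of n slots."""
    nxt = single_cycle(n)
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) / steps * 1e9


if __name__ == "__main__":
    # Hypothetical working-set sizes; small sets stay in cache, large ones do not.
    for n in (1 << 10, 1 << 15, 1 << 20):
        print(f"{n:>8} slots: {ns_per_hop(n):.1f} ns/hop (includes interpreter overhead)")
```

The key design point is the working-set size: if the chain fits in a cache level, the test reports that cache's latency rather than the memory's.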
To understand memory bandwidth, it is necessary for us to learn about the architecture of memory and CPUs. In the past, CPUs were connected to the memory through a northbridge. Now, CPUs directly read data from memory using the integrated memory controller (IMC) of the CPU.
You can test memory bandwidth using various tools. On Linux systems, the STREAM benchmark is generally used for this testing. The STREAM algorithm is briefly described as follows:
According to the preceding figure, the principle of the STREAM algorithm is very simple. Data is read from one memory block and written to another after a simple computation. The memory bandwidth is calculated using this formula: Size of data moved / Time elapsed. An appropriate test over the entire machine can reveal the bandwidth of the IMC. The following output shows the memory bandwidth data of a cloud product:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          128728.5     0.134157     0.133458     0.136076
Scale:         128656.4     0.134349     0.133533     0.137638
Add:           144763.0     0.178851     0.178014     0.181158
Triad:         144779.8     0.178717     0.177993     0.180214
-------------------------------------------------------------
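The formula can be checked against the output above. The sketch below assumes the usual STREAM accounting: Copy and Scale touch two arrays of 8-byte doubles, Add and Triad touch three, and 1 MB is 10^6 bytes. The array length of 2^30 elements is an assumption inferred from the numbers, not something the article states; with it, the best rate for every kernel is reproduced from its minimum time.

```python
# Arrays read or written per element by each STREAM kernel:
#   Copy:  c[i] = a[i]             -> 2 arrays
#   Scale: b[i] = q * c[i]         -> 2 arrays
#   Add:   c[i] = a[i] + b[i]      -> 3 arrays
#   Triad: a[i] = b[i] + q * c[i]  -> 3 arrays
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
DOUBLE_BYTES = 8

# ASSUMPTION: array length inferred from the table, not given in the article.
ARRAY_ELEMENTS = 2**30

# Values copied from the output above.
MIN_TIME_S = {"Copy": 0.133458, "Scale": 0.133533,
              "Add": 0.178014, "Triad": 0.177993}
REPORTED_MB_S = {"Copy": 128728.5, "Scale": 128656.4,
                 "Add": 144763.0, "Triad": 144779.8}


def best_rate_mb_s(kernel, elements, min_time_s):
    """Bandwidth = size of data moved / time elapsed, in MB/s (1 MB = 1e6 bytes)."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * DOUBLE_BYTES * elements
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    for kernel, t in MIN_TIME_S.items():
        rate = best_rate_mb_s(kernel, ARRAY_ELEMENTS, t)
        print(f"{kernel:<6} computed {rate:10.1f} MB/s, reported {REPORTED_MB_S[kernel]:10.1f} MB/s")
```

Under this assumption each array is 8 GiB, far larger than any cache, which is consistent with the test measuring memory rather than cache bandwidth. The best rate uses the minimum time, so it reflects the fastest of the repeated runs.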
Memory bandwidth is undoubtedly important, as it indicates the maximum data throughput of the memory. However, correct and suitable testing is very important. You need to pay attention to the following points:
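One point that commonly comes up in this context is array sizing: if the test arrays fit in the cache, the result reflects cache bandwidth. The STREAM source comments recommend that each array be at least four times the size of the available cache. The sketch below applies that rule; the helper names are hypothetical, and the 32 MB figure is simply the upper L3 size mentioned earlier in this article.

```python
DOUBLE_BYTES = 8  # STREAM arrays hold 8-byte doubles by default


def min_stream_elements(cache_bytes, factor=4):
    """Smallest array length (in doubles) that is at least factor x the cache size."""
    min_array_bytes = factor * cache_bytes
    return -(-min_array_bytes // DOUBLE_BYTES)  # ceiling division


def is_large_enough(elements, cache_bytes, factor=4):
    """True if one array of this length meets the sizing rule."""
    return elements * DOUBLE_BYTES >= factor * cache_bytes


if __name__ == "__main__":
    l3_bytes = 32 * 1024 * 1024  # example: a 32 MB L3 cache
    n = min_stream_elements(l3_bytes)
    print(f"Each array needs at least {n} elements ({n * DOUBLE_BYTES} bytes)")
```

On a multi-socket machine the relevant cache size would be the total across all sockets used by the test, so the minimum grows accordingly.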