By Hao Tang
Part 1 of this 3-part series introduced the basic concepts of the Z Garbage Collector (ZGC) and Alibaba's large-scale ZGC practice. Alibaba's business teams and cloud customers benefited from the improved response times that ZGC brings, but they also ran into some practical problems. To use ZGC well, you need to understand its principles and learn to analyze ZGC logs in order to tune it.
From a macro perspective, ZGC is a concurrent, compacting GC algorithm.
Parallel GC and G1 pause Java threads for around 100 milliseconds, and CMS never solved the fragmentation problem. Compared with these collectors, the concurrent and compacting ZGC is a major leap forward in Java GC capability: while GC threads compact memory, Java threads can continue to run.
ZGC uses a mark-compact strategy to collect the Java heap. ZGC first concurrently marks the live objects in the heap and then concurrently relocates those live objects so that they sit together in a smaller set of regions. Unlike the earlier Java GCs, ZGC is a single-generational garbage collector that traverses all objects in the heap during the marking phase.
This raises a question: how does ZGC achieve concurrent marking and relocating? The answer lies in the two core technologies of ZGC: the load barrier and the colored pointer.
The load barrier of ZGC inserts processing logic that runs each time a pointer is loaded.
When GC threads and Java threads run concurrently, the load barrier ensures that every pointer load reaches the correct object.
The colored pointer of ZGC uses the otherwise unused upper bits of a pointer as its color, which indicates the state of the pointer. When the load barrier processes a pointer, it can therefore read the state directly from the pointer and decide how to handle it. The product-ready ZGC supports an address space of 2^44 bytes = 16 TB. A colored pointer uses 44 + 4 = 48 bits: the lower 44 bits hold the address, and the upper 4 bits hold the color. The colored pointer and the load barrier work together so that part of the conditional logic in the load barrier becomes a check of the pointer color. If the pointer color is wrong, the load barrier corrects the pointer.
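The interplay between the colored pointer and the load barrier can be sketched as a toy model on plain long values. This is only an illustration under stated assumptions: the class name, the color names COLOR_A and COLOR_B, the exact bit positions, and the fixAddress callback are all invented here and do not reflect the actual HotSpot implementation.

```java
import java.util.function.LongUnaryOperator;

/** Toy model of a colored pointer and its load barrier; not HotSpot code. */
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;                        // 2^44 bytes = 16 TB
    static final int COLOR_BITS = 4;                           // upper 4 of the 48 bits
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long COLOR_MASK = ((1L << COLOR_BITS) - 1) << ADDRESS_BITS;

    // Hypothetical colors, one bit each; the names are invented for this sketch.
    static final long COLOR_A = 1L << ADDRESS_BITS;
    static final long COLOR_B = 1L << (ADDRESS_BITS + 1);

    static long address(long pointer) { return pointer & ADDRESS_MASK; }

    static long color(long pointer) { return pointer & COLOR_MASK; }

    static long withColor(long address, long color) {
        return (address & ADDRESS_MASK) | color;
    }

    /**
     * Load barrier: the fast path is a single color check. On the slow path,
     * fixAddress stands in for whatever work yields the correct address,
     * and the result is re-colored with the good color.
     */
    static long loadBarrier(long pointer, long goodColor, LongUnaryOperator fixAddress) {
        if (color(pointer) == goodColor) {
            return pointer;                                    // fast path: color is good
        }
        long fixed = fixAddress.applyAsLong(address(pointer)); // slow path: correct the pointer
        return withColor(fixed, goodColor);
    }

    public static void main(String[] args) {
        long stale = withColor(0x1000L, COLOR_A);
        // Pretend the object moved by 0x100 bytes while COLOR_B became the good color.
        long healed = loadBarrier(stale, COLOR_B, addr -> addr + 0x100L);
        System.out.printf("address=0x%x goodColor=%b%n",
                address(healed), color(healed) == COLOR_B);
    }
}
```

The point of the design is that the common case costs one mask and one comparison, and only pointers with a wrong color pay for the slow path.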
Three short pauses are required during the actual execution of a single ZGC cycle. Each pause is followed by several concurrent phases.
[2020-12-23T13:30:57.402+0800] GC(10) Garbage Collection (Allocation Rate)
[2020-12-23T13:30:57.408+0800] GC(10) Pause Mark Start 2.918ms
[2020-12-23T13:30:58.083+0800] GC(10) Concurrent Mark 674.216ms
[2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms
[2020-12-23T13:30:58.105+0800] GC(10) Concurrent Process Non-Strong References 18.293ms
[2020-12-23T13:30:58.111+0800] GC(10) Concurrent Reset Relocation Set 5.533ms
[2020-12-23T13:30:58.111+0800] GC(10) Concurrent Destroy Detached Pages 0.001ms
[2020-12-23T13:30:58.121+0800] GC(10) Concurrent Select Relocation Set 10.148ms
[2020-12-23T13:30:58.130+0800] GC(10) Concurrent Prepare Relocation Set 9.083ms
[2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms
[2020-12-23T13:30:58.203+0800] GC(10) Concurrent Relocate 66.595ms
... (Omit some data statistics here)
[2020-12-23T13:30:58.203+0800] GC(10) Garbage Collection (Allocation Rate) 62020M(76%)->41270M(50%)
The GC log above shows a typical ZGC cycle. Each row that starts with Pause is a pause phase. The three pause phases are Pause Mark Start, Pause Mark End, and Pause Relocate Start.
In the GC log above, each of the three ZGC pauses is well under 10 ms. These pause phases are mainly responsible for marking and relocating the GC roots and for synchronizing threads at the end of marking.
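As a rough aid for this kind of log analysis, the sketch below pulls the pause phases and their durations out of log lines shaped like the ones shown above. The class name, method names, and the regular expression are assumptions made for illustration; the pattern only targets the "GC(n) Pause ... x.xxxms" shape seen in this log, and the map assumes the lines come from a single cycle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts pause phases from ZGC log lines of one cycle; an illustrative sketch. */
final class ZgcPauseScan {
    // Matches e.g. "GC(10) Pause Mark Start 2.918ms": phase name, then duration in ms.
    private static final Pattern PAUSE =
            Pattern.compile("GC\\(\\d+\\) (Pause [A-Za-z ]+?) (\\d+(?:\\.\\d+)?)ms");

    /** Returns phase name -> duration in milliseconds, in log order. */
    static Map<String, Double> pauses(List<String> logLines) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                result.put(m.group(1), Double.parseDouble(m.group(2)));
            }
        }
        return result;
    }

    /** Longest pause in milliseconds, or 0.0 if no pause line was found. */
    static double maxPauseMs(Map<String, Double> pauses) {
        return pauses.values().stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "[2020-12-23T13:30:57.408+0800] GC(10) Pause Mark Start 2.918ms",
                "[2020-12-23T13:30:58.083+0800] GC(10) Concurrent Mark 674.216ms",
                "[2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms",
                "[2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms");
        Map<String, Double> found = pauses(lines);
        System.out.println(found + " max=" + maxPauseMs(found) + "ms");
    }
}
```

Concurrent phases are deliberately ignored, because only the Pause rows stop Java threads.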
The phases that start with Concurrent are the concurrent phases, which follow the pauses. The two core concurrent phases are Concurrent Mark and Concurrent Relocate.
The other concurrent phases are mainly preparatory work for Concurrent Relocate.
An Illustration of the ZGC Stages
Currently, the Concurrent Mark phase of ZGC marks all live objects in the entire heap. This makes ZGC a single-generation GC, unlike generational GCs such as G1, CMS, and Parallel GC. During Concurrent Mark, wrong pointers in the heap are also corrected. To reduce the burden of relocating objects, ZGC relocates only the regions whose degree of fragmentation reaches a threshold (ZFragmentationLimit), which is similar to the Garbage First strategy of G1.
The following part describes the tuning details related to ZGC. At a minimum, users should complete the basic tuning.
Basic ZGC tuning usually means setting the heap space size (-Xmx) and the number of concurrent GC threads (-XX:ConcGCThreads). All ZGC users should enable GC logs with -Xlog:gc*:gc.log:time to record more ZGC details.
A GC usually requires the developer to specify the heap space size. The value must be greater than the total size of the live objects in the heap, and the higher the proportion of spare space, the better the GC performs. For example, if the estimated total size of live objects is 32 GB, the heap space size could be set with -Xmx40g, which gives the heap 40 GB.
ZGC differs from traditional GC. While ZGC collects objects, Java threads are also allocating new objects. Therefore, ZGC requires a higher proportion of redundant space than traditional GCs.
The total size of the objects allocated during one ZGC cycle can be estimated as the allocation speed multiplied by the duration of a single ZGC cycle. The heap space should therefore be greater than the total size of live objects plus the total size of objects allocated during a single ZGC cycle.
You can find both the allocation speed and the duration of a single ZGC cycle in the GC logs.
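The sizing rule above can be written as a small calculation. In the sketch below, the class and method names are invented, the 32 GB of live objects comes from the earlier example, the 0.8 s cycle time is roughly what the sample log shows between the start and end of GC(10), and the 2000 MB/s allocation speed is a purely hypothetical input.

```java
/** Lower bound for the heap size from the rule in the text; an illustrative sketch. */
final class ZgcHeapEstimate {
    /**
     * @param liveMb            estimated total size of live objects, in MB
     * @param allocRateMbPerSec allocation speed read from the GC log, in MB/s
     * @param cycleSeconds      duration of a single ZGC cycle, in seconds
     * @return live objects plus objects allocated during one cycle, in MB
     */
    static double minHeapMb(double liveMb, double allocRateMbPerSec, double cycleSeconds) {
        return liveMb + allocRateMbPerSec * cycleSeconds;
    }

    public static void main(String[] args) {
        double liveMb = 32 * 1024;       // 32 GB of live objects, as in the example above
        double allocRate = 2000;         // hypothetical allocation speed in MB/s
        double cycleSeconds = 0.8;       // roughly the cycle length in the sample log
        double bound = minHeapMb(liveMb, allocRate, cycleSeconds);
        System.out.printf("heap should exceed about %.0f MB%n", bound);
    }
}
```

With these inputs the bound is about 34368 MB, so the -Xmx40g setting from the example leaves some headroom above it. The result is a floor, not a recommendation: allocation spikes and slower cycles both argue for extra room.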
The default number of concurrent GC threads in ZGC is one-eighth of the number of CPU cores. For example, on a 16-core machine, if ConcGCThreads is not specified, ZGC uses two concurrent GC threads.
If Allocation Stall appears frequently in the GC logs, collection cannot keep up with allocation, and ConcGCThreads may need to be increased. However, ConcGCThreads cannot be increased indefinitely, because too many concurrent GC threads take CPU resources away from Java threads and affect their normal execution.
Note: Concurrent GC threads (ConcGCThreads) are different from parallel GC threads (ParallelGCThreads). The former run concurrently with Java threads, and the latter are the GC threads that run during GC pauses.
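To judge whether Allocation Stall appears frequently, the stall entries in a log can simply be counted. The sketch below assumes that a stall is logged as a line of the form "Allocation Stall (thread name) x.xxxms"; that shape, the class name, and the sample lines in main are assumptions for illustration, so check them against your own gc.log before relying on the count.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Counts per-thread Allocation Stall entries in GC log lines; an illustrative sketch. */
final class AllocationStallScan {
    // Assumed shape: "Allocation Stall (<thread name>) <duration>ms".
    // A cycle header such as "Garbage Collection (Allocation Rate)" does not match.
    private static final Pattern STALL =
            Pattern.compile("Allocation Stall \\(.*\\) \\d+(?:\\.\\d+)?ms");

    static long countStalls(List<String> logLines) {
        return logLines.stream().filter(line -> STALL.matcher(line).find()).count();
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from a real log.
        List<String> lines = List.of(
                "[2020-12-23T13:31:02.000+0800] Allocation Stall (worker-1) 12.345ms",
                "[2020-12-23T13:31:02.001+0800] GC(11) Garbage Collection (Allocation Rate)",
                "[2020-12-23T13:31:02.010+0800] Allocation Stall (worker-2) 3.210ms");
        System.out.println("stalls: " + countStalls(lines));
    }
}
```

A count that keeps growing across cycles is the signal discussed above: consider more concurrent GC threads or a larger heap, while watching CPU usage.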
The product-ready ZGC also supports several advanced tuning options. Please see the instructions for Alibaba Dragonwell 11.0.11.7 for more information.
The core of advanced tuning is controlling when GC is triggered. Because Java threads keep allocating objects while ZGC collects, ZGC must trigger a GC some time in advance rather than when the heap space is already full. That way, the heap space does not fill up during a ZGC cycle and cause an Allocation Stall or OOM. However, if ZGC is triggered too frequently, it consumes more CPU resources and reduces throughput.
Dragonwell 11 supports the following options related to GC trigger timing:
ZGC is triggered as long as one of the conditions above for GC trigger timing is met.
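The idea of triggering GC ahead of time based on allocation speed (the sample log above shows "Allocation Rate" as the cycle's trigger) can be illustrated with a simplified rule: start a cycle once the heap would fill up before a cycle could finish. This is a sketch under assumptions, not the actual ZGC heuristic. The class and parameter names are invented, and spikeTolerance is only a stand-in for a safety factor in the spirit of ZAllocationSpikeTolerance.

```java
/** Simplified allocation-rate trigger; not the real ZGC heuristic. */
final class AllocationRateTriggerSketch {
    /**
     * @param freeMb            free heap space in MB
     * @param allocRateMbPerSec observed allocation speed in MB/s
     * @param spikeTolerance    hypothetical safety factor applied to the allocation speed
     * @param expectedGcSeconds expected duration of one GC cycle in seconds
     * @return true if a cycle should start now to finish before the heap is full
     */
    static boolean shouldStartGc(double freeMb, double allocRateMbPerSec,
                                 double spikeTolerance, double expectedGcSeconds) {
        double worstCaseRate = allocRateMbPerSec * spikeTolerance;
        if (worstCaseRate <= 0) {
            return false;                       // nothing is being allocated
        }
        double secondsUntilFull = freeMb / worstCaseRate;
        return secondsUntilFull <= expectedGcSeconds;
    }

    public static void main(String[] args) {
        // 1000 MB free at 1000 MB/s with factor 2.0: full in 0.5 s, a cycle takes 0.8 s.
        System.out.println(shouldStartGc(1000, 1000, 2.0, 0.8));
        // 10000 MB free under the same load: full in 5 s, so there is no need to start yet.
        System.out.println(shouldStartGc(10000, 1000, 2.0, 0.8));
    }
}
```

A larger safety factor starts cycles earlier, which lowers the risk of Allocation Stall at the cost of more frequent GC and more CPU usage, the same trade-off described above.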
The SoftMaxHeapSize option sets a soft upper limit for the ZGC heap space, between Xms and Xmx. The ZAllocationSpikeTolerance, ZProactive, and ZHighUsagePercent options above all use the SoftMaxHeapSize value as the soft upper limit of the ZGC heap space. When allocation is fast, the heap space can expand up to Xmx. When allocation is slow, the heap space can shrink down to Xms. SoftMaxHeapSize usually needs -XX:+ZUncommit to be enabled.
In addition, there are some useful advanced tuning features:
ZHighUsagePercent, ZUnloadClassesFrequency, and ZRelocationReservePercent above are unique options for Dragonwell 11. If you switch to other versions of OpenJDK, avoid using these options.
Part 3 of this 3-part series will introduce Alibaba Dragonwell 11 in terms of its production-ready transformation for ZGC.
Dragonwell has joined the Java language and virtual machine SIG in the Anolis community (OpenAnolis). At the same time, Anolis operating system (Anolis OS) 8 supports Dragonwell cloud-native Java. You are welcome to join the SIG and help build the community.
Hao Tang joined the Alibaba Cloud programming language and compiler team in 2019 and is currently engaged in JVM memory management optimization.
SIG Address: https://openanolis.cn/sig/java/doc/216166872482840581
Alibaba Dragonwell ZGC – Part 1: New Garbage Collector ZGC Unboxing and the First Experience of ZGC
Alibaba Dragonwell ZGC – Part 3: How Does Dragonwell 11 Transform the New Garbage Collector ZGC?