By Hao Tang
What scenarios is ZGC in Dragonwell 11 suited for, and how was this new garbage collector made product-ready?
This article is Part 3 of the 3-part series on Alibaba Dragonwell ZGC. It focuses on the product-ready transformation of ZGC in Dragonwell 11, which addresses the risks of the experimental ZGC in OpenJDK 11 (described in Part 1 of this series). The article ends with a summary of the series and an outlook on future work.
Part 1 of this series mentioned that Alibaba transformed ZGC in Dragonwell 11 into a product-ready version. It also mentioned that OpenJDK 12 to 16 are not long-term support (LTS) versions and are difficult to deploy at scale in production. This raises two questions: why use the ZGC of Dragonwell 11 instead of the ZGC of OpenJDK 11, and which scenarios is the ZGC of Dragonwell 11 suited for?
We believe that if you are on Java 11, you should choose the product-ready ZGC of Dragonwell 11 over the experimental ZGC of OpenJDK 11. One of the most important reasons is that the experimental ZGC can crash (see Part 1 of this series), a problem that is fixed in the product-ready ZGC. The ZGC of Dragonwell 11 also adds many features that make ZGC usable in more scenarios.
The product-ready ZGC of Dragonwell 11 has many advantages:
Dragonwell 11 has ported most of the ZGC-related code from OpenJDK 15 (the first JDK release in which ZGC is a product feature). This code adds ZGC features, supports more platforms, and fixes major load barrier bugs.
Part 1 mentioned that a GC pause may occur between a ZGC load barrier and the load operation it guards. The ZGC load barrier is generated by the C2 JIT compiler as platform-specific machine code at runtime, and the way OpenJDK 11 models the load barrier in C2 allows this situation to arise, which can cause errors. Following the redesign of the ZGC C2 load barrier in OpenJDK 14, we removed the dedicated load barrier nodes from C2, modified the C2 nodes for load operations, and made C2 emit the load barrier together with its load operation during machine code generation.
Our subsequent experience shows that this C2 load barrier rework can eliminate these ZGC crashes and improve the availability of ZGC in production.
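The ordering constraint behind this bug can be illustrated with a toy model of a colored-pointer load barrier. It is a sketch under stated assumptions, not HotSpot code: the bit layout and the names `g_good_color`, `slow_path`, and `load_with_barrier` are hypothetical, and the slow path only recolors the pointer instead of marking or relocating anything. What it shows is the contract the C2 rework protects: the color check must apply to exactly the value the load returned, with no GC pause in between, because the collector changes the "good" color only during pauses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy colored-pointer layout (assumption for illustration; real ZGC bit
// positions differ across versions and platforms).
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << 42) - 1;
constexpr std::uint64_t kMarked0 = std::uint64_t{1} << 42;
constexpr std::uint64_t kMarked1 = std::uint64_t{1} << 43;
constexpr std::uint64_t kRemapped = std::uint64_t{1} << 44;
constexpr std::uint64_t kAllColors = kMarked0 | kMarked1 | kRemapped;

// The "good" color for the current GC phase. The collector flips it only
// inside a pause, which is why no pause may fall between load and check.
std::uint64_t g_good_color = kRemapped;

// Slow path: a real collector would mark, remap, or relocate the object here;
// this toy only rewrites the color bits.
std::uint64_t slow_path(std::uint64_t ptr) {
  return (ptr & kAddressMask) | g_good_color;
}

// Load barrier: test the color of the value that was just loaded and, if it
// is stale, repair it and write the repaired pointer back ("self-healing").
std::uint64_t load_with_barrier(std::atomic<std::uint64_t>& field) {
  std::uint64_t ptr = field.load(std::memory_order_acquire);
  const std::uint64_t bad_mask = kAllColors & ~g_good_color;
  if ((ptr & bad_mask) == 0) {
    return ptr;  // fast path: pointer already has the good color (or is null)
  }
  const std::uint64_t healed = slow_path(ptr);
  // If another thread changed the field meanwhile, leave its value in place.
  field.compare_exchange_strong(ptr, healed);
  return healed;
}
```

In this model, if a pause (and a color flip) could occur between the load and the check, the check would compare the pointer against the wrong phase's mask; emitting the load and the barrier together at machine code generation time rules that out.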
Dragonwell 11 adds support for AArch64/Linux. Many Alibaba business units and cloud customers want to use ZGC on the AArch64 platform while relying on the long-term support of OpenJDK 11. Dragonwell 11 has therefore ported the product-ready code for ZGC and AArch64, broadening the range of machines ZGC can run on.
Class unloading is an integral part of a complete GC: it unloads classes that are no longer live. Many Java applications generate large numbers of classes. Classes and objects that are no longer live must be collected in time; otherwise, the metaspace that holds class metadata fills up and subsequent code execution is affected.
OpenJDK added concurrent class unloading to ZGC in OpenJDK 12. This required making a large number of shared (non-ZGC) data structures safe for concurrent access, which involved over one hundred code changes. Backporting them is risky: the slightest misstep can introduce hard-to-control defects and raise the cost of later synchronization with upstream.
Dragonwell 11 instead builds on the existing class unloading code used by the other GCs and combines it with the ZGC class unloading code from OpenJDK 12 to implement class unloading in ZGC. This class unloading is not concurrent, but our current practice shows that it keeps the pause at the 10ms level. Since class unloading is infrequent for many workloads, Dragonwell 11 adds the ZUnloadClassesFrequency option to adjust how often classes are unloaded.
Compared with OpenJDK 11, Dragonwell 11 ZGC adds support for returning physical memory, expansion of memory usage, and parallel pre-touching, which make it applicable to more scenarios.
Returning Physical Memory: This feature is useful when multiple instances are deployed on the same machine. In Dragonwell 11 ZGC, it is enabled with the ZUncommit option. Developers only need to set the heap's upper limit (Xmx), lower limit (Xms), and SoftMaxHeapSize. Normally, the application's heap stays around SoftMaxHeapSize. When burst traffic arrives, the application can temporarily grow the heap to absorb it; once the burst passes, the temporarily unused memory is returned to the operating system.
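The heap-sizing behavior described above can be sketched as a few predicates. This is a simplified model under stated assumptions, not Dragonwell's code: `HeapLimits`, `above_soft_limit`, `can_allocate`, and `uncommittable_bytes` are hypothetical names, and the idle-time check is an assumption modeled on the uncommit delay of upstream ZGC. It only encodes the rules the text states: the heap normally stays near SoftMaxHeapSize, may grow up to Xmx under burst traffic, and afterwards gives unused memory back without going below Xms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// Heap limits as described in the text: Xms (lower limit),
// SoftMaxHeapSize (normal target), and Xmx (hard upper limit).
struct HeapLimits {
  std::uint64_t xms;
  std::uint64_t soft_max;
  std::uint64_t xmx;
};

// In this model, usage above the soft limit means the collector should work
// to bring the heap back down, but allocation itself is still allowed.
bool above_soft_limit(std::uint64_t used, const HeapLimits& l) {
  return used > l.soft_max;
}

// A burst may grow the heap past SoftMaxHeapSize, but never past Xmx.
bool can_allocate(std::uint64_t used, std::uint64_t size, const HeapLimits& l) {
  return used + size <= l.xmx;
}

// After the burst, committed memory that has been idle for at least the
// (assumed) uncommit delay can be returned to the OS, but never below Xms
// and never below what is still in use.
std::uint64_t uncommittable_bytes(std::uint64_t committed, std::uint64_t used,
                                  std::uint64_t idle_seconds,
                                  std::uint64_t uncommit_delay_seconds,
                                  const HeapLimits& l) {
  assert(l.xms <= l.soft_max && l.soft_max <= l.xmx);
  if (idle_seconds < uncommit_delay_seconds) return 0;
  const std::uint64_t keep = std::max(l.xms, used);
  return committed > keep ? committed - keep : 0;
}
```

For example, with Xms=2GB, SoftMaxHeapSize=4GB, and Xmx=8GB, a burst can push the committed heap to 6GB; once usage falls back to 1GB and the memory has been idle long enough, this model returns 4GB and keeps 2GB committed.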
Expansion of Memory Usage: Dragonwell 11 extends the range of heap sizes ZGC supports, from an ultra-small heap of 8MB up to a super-large heap of 16TB. This makes it easier to deploy the same service on machines with different specifications.
Parallel Pre-Touching: Pre-touching the heap (enabled with -XX:+AlwaysPreTouch) touches all heap memory at startup, so that first-touch page faults do not inflate the application's response time (RT) later. However, ZGC pre-touching in JDK 11 is single-threaded, which slows application startup: pre-touching a large heap can take minutes. Dragonwell 11 parallelizes pre-touching, which speeds up the startup of large-heap applications.
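A minimal sketch of parallel pre-touching follows, assuming a simple static split of pages across `std::thread` workers. `parallel_pretouch` is a hypothetical name, and Dragonwell's implementation may divide the work differently (for example, by ZGC granule or through its GC worker threads). Each worker writes one byte per page, which is what forces the operating system to back that page with physical memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Touch one byte in every page of [base, base + size), splitting the pages
// evenly across num_threads workers.
void parallel_pretouch(char* base, std::size_t size, std::size_t page_size,
                       unsigned num_threads) {
  assert(page_size > 0 && num_threads > 0);
  const std::size_t num_pages = (size + page_size - 1) / page_size;
  const std::size_t pages_per_thread = (num_pages + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t first = t * pages_per_thread;
    if (first >= num_pages) break;  // fewer pages than threads
    const std::size_t last = std::min(num_pages, first + pages_per_thread);
    workers.emplace_back([=] {
      for (std::size_t p = first; p < last; ++p) {
        // volatile keeps the compiler from eliding the store.
        *static_cast<volatile char*>(base + p * page_size) = 0;
      }
    });
  }
  for (auto& w : workers) w.join();
}
```

On a real heap the range would be freshly reserved memory, where each first write triggers a page fault; splitting the pages across threads lets those faults be serviced concurrently instead of one after another.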
Response time: this section covers factors other than GC pauses that affect RT P99/P999.
When running ZGC on JDK 11 in production, we may see Page Cache Flush entries in the GC log:
[2019-09-05T14:14:04.242+0800] GC(10816) Page Cache Flushed: 28M requested, 28M(11424M->11396M) flushed
[2019-09-05T14:14:04.248+0800] Page Cache Flushed: 32M requested, 32M(11928M->11896M) flushed
[2019-09-05T14:14:04.259+0800] Page Cache Flushed: 32M requested, 32M(11912M->11880M) flushed
[2019-09-05T14:14:04.271+0800] Page Cache Flushed: 32M requested, 32M(11878M->11846M) flushed
[2019-09-05T14:14:04.276+0800] Page Cache Flushed: 32M requested, 32M(11846M->11814M) flushed
... (35 more "Page Cache Flushed" lines omitted)
[2019-09-05T14:14:04.462+0800] Page Cache Flushed: 32M requested, 32M(10596M->10564M) flushed
[2019-09-05T14:14:04.467+0800] Page Cache Flushed: 32M requested, 32M(10564M->10532M) flushed
[2019-09-05T14:14:04.471+0800] Page Cache Flushed: 32M requested, 32M(10522M->10490M) flushed
[2019-09-05T14:14:04.477+0800] GC(10816) Page Cache Flushed: 32M requested, 32M(10490M->10458M) flushed
At the same time, application monitoring shows RT P99 rising above 200ms. As shown in the log above, Page Cache Flush occurred many times in a row and spanned more than 200ms (from 14:14:04.242 to 14:14:04.477). During this time, Page Cache Flush blocks threads: dozens of object allocation threads wait on the same lock.
This happens because ZGC divides the heap into ZPages, similar to regions in G1, in three size classes: small (2MB), medium (32MB), and large (N × 2MB). Each object is allocated in a ZPage of the class that matches its size. The Page Cache is a data structure that holds free ZPages.
In production, the allocation rates of objects in different size classes fluctuate. When many medium-sized objects are allocated, free medium ZPages can run out, and cached small or large ZPages must be converted into medium ZPages. This conversion is the Page Cache Flush. It is slow because it requires multiple mmap system calls, and it is performed while holding the global ZPage allocation lock, so other allocating threads block until it finishes.
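To make this failure mode concrete, here is a toy model of a page cache keyed by page size with a single global lock. It is an illustration under stated assumptions, not ZGC's actual Page Cache: `ToyPageCache` and its methods are hypothetical, sizes are in MB, and the expensive mmap work is reduced to a comment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Toy cache of free pages keyed by page size (MB). It only reproduces the
// flush behavior described above.
class ToyPageCache {
 public:
  // Put a free page of the given size back into the cache.
  void free_page(std::uint64_t size_mb) {
    std::lock_guard<std::mutex> guard(lock_);
    ++free_pages_[size_mb];
  }

  // Allocate a page of size_mb and return how many MB had to be flushed
  // (0 when a cached page of the same size could be reused).
  std::uint64_t alloc_page(std::uint64_t size_mb) {
    // One global lock: while a flush runs, every other allocating thread waits.
    std::lock_guard<std::mutex> guard(lock_);
    auto it = free_pages_.find(size_mb);
    if (it != free_pages_.end() && it->second > 0) {
      --it->second;
      return 0;  // cache hit: cheap
    }
    // Cache miss: flush cached pages of other sizes until enough memory is
    // gathered. In the real collector this step involves mmap system calls,
    // which is what makes it slow while the lock is held.
    std::uint64_t flushed = 0;
    for (auto& [page_size, count] : free_pages_) {
      while (count > 0 && flushed < size_mb) {
        --count;
        flushed += page_size;
      }
      if (flushed >= size_mb) break;
    }
    return flushed;
  }

  // Total MB currently held in the cache.
  std::uint64_t cached_mb() const {
    std::lock_guard<std::mutex> guard(lock_);
    std::uint64_t total = 0;
    for (const auto& [page_size, count] : free_pages_) total += page_size * count;
    return total;
  }

 private:
  mutable std::mutex lock_;
  std::map<std::uint64_t, std::uint64_t> free_pages_;  // page size (MB) -> free page count
};
```

With twenty cached 2MB pages and a request for one 32MB page, sixteen small pages are flushed under the lock, mirroring the "32M requested, 32M flushed" entries in the log.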
Dragonwell 11 addresses this in two ways. First, it ports the upstream improvement to ZPage allocation concurrency, which avoids the global ZPage allocation lock where possible and performs mmap asynchronously. Second, it allows the object size range for medium ZPages (originally 256KB to 4MB) to be adjusted with the new ZMediumObjectUpperBound option; for example, -XX:ZMediumObjectUpperBound=10MB changes the medium object range to 256KB to 10MB. In practice, these changes reduce thread blocking caused by Page Cache Flush and improve RT P99/P999.
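The effect of ZMediumObjectUpperBound on where objects are placed can be sketched as a size-class function. The 256KB and 4MB defaults and the 2MB granularity of large pages come from the text; the names `SizeLimits`, `page_type_for`, and `large_page_size` are hypothetical, and treating each boundary as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kKB = 1024;
constexpr std::uint64_t kMB = 1024 * kKB;

enum class PageType { Small, Medium, Large };

// Object size limits. medium_upper_bound stands in for ZMediumObjectUpperBound.
struct SizeLimits {
  std::uint64_t small_limit = 256 * kKB;
  std::uint64_t medium_upper_bound = 4 * kMB;
};

// Pick the ZPage size class for an object of the given size.
PageType page_type_for(std::uint64_t object_size, const SizeLimits& limits) {
  assert(limits.small_limit < limits.medium_upper_bound);
  if (object_size <= limits.small_limit) return PageType::Small;
  if (object_size <= limits.medium_upper_bound) return PageType::Medium;
  return PageType::Large;
}

// A large object gets a page of its own, rounded up to a multiple of 2MB.
std::uint64_t large_page_size(std::uint64_t object_size) {
  const std::uint64_t granule = 2 * kMB;
  return (object_size + granule - 1) / granule * granule;
}
```

With the default bounds, an 8MB object gets its own large page; with the upper bound raised to 10MB, the same object is placed in a shared 32MB medium page instead.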
In production, ZGC may also suffer from insufficient throughput, which shows up in two ways: Allocation Stall and Out of Memory (OOM).
An Allocation Stall occurs when garbage collection cannot keep up with the allocation rate, so allocating threads must wait for the GC to free memory.
Developers can alleviate this problem by increasing the heap size (Xmx) or the number of concurrent GC threads (ConcGCThreads). However, machine resources are limited, and neither the heap size nor the thread count can be increased indefinitely. Therefore, the timing of ZGC triggering must also be considered:
As for OOM: ZGC reserves a fixed amount of space for object relocation. However, Java threads also relocate objects when they access them through the load barrier, so if they access objects that need relocating very quickly, relocation can proceed faster than expected. The reserved space then runs out, which eventually leads to OOM and a crash.
Dragonwell 11 provides the ZRelocationReservePercent option, which reserves ZRelocationReservePercent% of the heap for relocation and greatly reduces the risk of OOM.
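The text gives ZRelocationReservePercent a simple meaning: that percentage of the heap is set aside for relocation. The arithmetic, and the trade-off it implies, can be sketched as follows. `HeapBudget`, `split_heap`, and `reserve_is_sufficient` are hypothetical, and the model deliberately ignores how Dragonwell actually accounts for the reserve, which this article does not describe.

```cpp
#include <cassert>
#include <cstdint>

// How the heap is split in this sketch: a relocation reserve of
// reserve_percent% of the heap, and the rest available for allocation.
struct HeapBudget {
  std::uint64_t relocation_reserve;
  std::uint64_t allocatable;
};

HeapBudget split_heap(std::uint64_t heap_bytes, unsigned reserve_percent) {
  assert(reserve_percent <= 100);
  const std::uint64_t reserve = heap_bytes * reserve_percent / 100;
  return {reserve, heap_bytes - reserve};
}

// In this simplified model, relocation runs out of room when the objects
// being relocated need more space than the reserve provides; the text
// describes that situation as eventually leading to OOM.
bool reserve_is_sufficient(std::uint64_t bytes_to_relocate, const HeapBudget& b) {
  return bytes_to_relocate <= b.relocation_reserve;
}
```

The trade-off is visible in the arithmetic: a larger percentage makes it less likely that relocation runs out of room, but leaves less of the heap for allocation, which can make allocation stalls appear sooner.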
Dragonwell 11 improves the details of the GC log, including correcting erroneous live object information and displaying statistics for each ZPage size class.
Dragonwell 11 also introduces ZGC-related JFR events: ZAllocationStall, ZPageAllocation, ZRelocationSet, ZRelocationSetGroup, ZUncommit, and ZUnmap. These events expose the current state of ZGC and help troubleshoot abnormal ZGC behavior. Dragonwell 11 also updates the ZGC-related GarbageCollectorMXBeans to report two kinds of ZGC metrics: ZGC cycles and ZGC pauses.
Alibaba Dragonwell 11 selectively ports the product-ready ZGC code and adapts it carefully. As a result, Dragonwell 11 offers the ZGC capabilities of OpenJDK 15 while keeping the stable quality of the long-term-supported OpenJDK 11.
We noticed that porting all the ZGC code indiscriminately would require modifying a large amount of common (non-ZGC) code in Dragonwell 11. The consequences include:
Therefore, we need to control code risk: keep the product-ready changes within ZGC code as much as possible, and select only the ZGC code most relevant to product readiness for porting and adaptation.
We use compile-time and runtime checks to keep the changes within ZGC code and to ensure that the ported ZGC code does not affect common code. (This approach follows the backport of Shenandoah GC to JDK 11.)
The compile-time check uses macro isolation: when ZGC is excluded from the build, the isolated code is not compiled at all, so it cannot cause problems in that build, and it only takes effect in builds with ZGC enabled. Macro isolation wraps code in macros such as:
#if INCLUDE_ZGC … #endif
ZGC_ONLY( … )
The runtime check uses conditional isolation: ported code is guarded by an if statement, so when another GC is selected, it never executes that code, further reducing risk for Dragonwell 11:
if (UseZGC) { … }
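The two isolation techniques can be combined in a small, self-contained C++ sketch. The macro definitions below re-create the `INCLUDE_ZGC` and `ZGC_ONLY` pattern locally (HotSpot has its own definitions), a plain global `bool UseZGC` stands in for the `-XX:+UseZGC` flag, and `zgc_specific_step` and `common_gc_step` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Build-time switch, normally supplied by the build system (e.g. -DINCLUDE_ZGC=0).
#ifndef INCLUDE_ZGC
#define INCLUDE_ZGC 1
#endif

// Compile-time isolation: ZGC_ONLY(x) expands to x only when ZGC is built in.
#if INCLUDE_ZGC
#define ZGC_ONLY(x) x
#else
#define ZGC_ONLY(x)
#endif

// Runtime switch, standing in for the -XX:+UseZGC flag.
bool UseZGC = false;

#if INCLUDE_ZGC
// Ported ZGC-specific helper: it does not exist at all in a build without ZGC.
std::string zgc_specific_step() { return "zgc"; }
#endif

// Shared (common) code path touched by the port.
std::string common_gc_step() {
  std::string trace = "common";
  ZGC_ONLY(
    // Runtime isolation: even in a ZGC-enabled build, other GCs skip this.
    if (UseZGC) {
      trace += "+" + zgc_specific_step();
    }
  )
  return trace;
}
```

Building with -DINCLUDE_ZGC=0 removes both the helper and the guarded block, so the common path compiles exactly as before; in a ZGC-enabled build, the UseZGC check keeps other collectors on the original path.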
We have open-sourced the ZGC product-ready work on GitHub, where it is tracked in a milestone containing more than 200 ZGC-related patches. Each patch was carefully reviewed by Alibaba experts.
We maintain a nightly build and test pipeline that checks every night that the code builds on the x64 and AArch64 platforms and passes the OpenJDK tests with ZGC both enabled and disabled.
Some recent advances in ZGC can further optimize the performance of ZGC, including:
We also evaluated ZGC's counterpart, Shenandoah GC. Our preliminary evaluation shows that Shenandoah GC performs well on heaps up to 32GB, mainly because it supports compressed pointers, which ZGC does not.
The 3-part Alibaba Dragonwell ZGC series covers GC concepts, ZGC and its applicable scenarios, and the product-ready transformation of ZGC in Dragonwell 11.
This work upgraded ZGC in Dragonwell 11 to a product-ready version while maintaining the stability of Dragonwell 11: it fixes major ZGC defects, adds support for the AArch64 platform, and adds many new features. Dragonwell 11 has also added several common features to meet the needs of Alibaba's internal and cloud customers.
We will update the Alibaba Dragonwell ZGC series from time to time to share our experience with ZGC and our contributions to OpenJDK.
Hao Tang joined the Alibaba Cloud programming language and compiler team in 2019 and is currently engaged in JVM memory management optimization.