Alibaba Dragonwell ZGC – Part 3: How Does Dragonwell 11 Transform the New Garbage Collector ZGC?

By Hao Tang

What are the specific applicable scenarios for ZGC in Dragonwell 11? How can we transform the new garbage collector ZGC?

This article is Part 3 of the 3-part series on Alibaba Dragonwell ZGC. It focuses on the product-ready transformation of ZGC in Dragonwell 11, which deals with the risks of the experimental ZGC in OpenJDK 11 (mentioned in Part 1 of this series). The article closes with a summary of the Dragonwell ZGC series and an outlook on future development.

Alibaba Dragonwell ZGC – Part 1: New Garbage Collector ZGC Unboxing and the First Experience of ZGC
Alibaba Dragonwell ZGC – Part 2: The Principles and Tuning of ZGC | A New Garbage Collector

Product-Ready Transformation of ZGC

Part 1 of this 3-part series mentioned that Alibaba transformed ZGC in Dragonwell 11 into a product-ready version. It also mentioned that OpenJDK 12 through 16 are not long-term support versions and are difficult to deploy on a large scale in production. This raises two questions: why choose the ZGC of Dragonwell 11 instead of the ZGC of OpenJDK 11, and what are the specific applicable scenarios for the ZGC of Dragonwell 11?

We believe that anyone adopting Java 11 should choose the product-ready ZGC of Dragonwell 11 instead of the experimental ZGC of OpenJDK 11. One of the most important reasons is that the experimental ZGC can crash. (Please refer to Part 1 of this 3-part series.) The product-ready ZGC addresses this problem. The ZGC of Dragonwell 11 also improves many features, which opens ZGC up to more scenarios.

The product-ready ZGC of Dragonwell 11 has many advantages:

Solving problems encountered in practice with improved ZGC features
Maintaining the stable quality of long-term support of Java 11
Having complete open-source and testing processes

Improved ZGC Features

Dragonwell 11 has ported most of the ZGC-related code of OpenJDK 15 (the first official version of JDK supporting product-ready ZGC). The ported code improves ZGC features, supports more platforms, and fixes major load barrier bugs.

ZGC Reconstructing C2 Load Barrier

Part 1 mentioned that a GC pause may occur between the ZGC load barrier and the load operation. This happens because C2 compiles the ZGC load barrier into platform-specific code at run time, and the C2 load barrier of OpenJDK 11 can end up separated from its load operation, which causes an error. Following the ZGC C2 load barrier rework in OpenJDK 14, we removed the C2 nodes of the load barrier, modified the C2 nodes related to the load operation, and made C2 generate the correct load barrier together with the load operation in the machine code generation phase.

Our subsequent practice shows that C2 load barrier reconstruction can eliminate ZGC crashes and improve the availability of the ZGC in production practice.

The Multi-Platform Support of ZGC

Dragonwell 11 adds support for AArch64/Linux. Many Alibaba businesses and many of its cloud customers want to use ZGC on the AArch64 platform, and they also need the long-term support of OpenJDK 11. Therefore, Dragonwell 11 has ported the product-ready code related to ZGC on AArch64, which broadens the range of machines ZGC can run on.

The Support of ZGC Class Unloading

Class unloading is an integral part of a complete GC and is responsible for unloading classes that are no longer active in Java. Many Java programs generate a large number of classes. Classes and objects that are no longer active need to be collected in time. Otherwise, the metaspace that holds class information fills up and subsequent code execution is affected.

OpenJDK completed the concurrent class unloading feature of ZGC in OpenJDK 12. That work came with the concurrent transformation of a large number of public data structures (non-ZGC code), amounting to over one hundred code changes. Porting these changes is risky: the slightest misstep can introduce uncontrollable code risks and increase the subsequent cost of synchronizing with upstream.

Dragonwell 11 therefore builds on the existing GC class unloading code and combines it with the ZGC class unloading code in OpenJDK 12 to implement class unloading for ZGC. This class unloading is not concurrent, but our practice so far shows that it keeps the pause at the 10ms level. Since class unloading does not occur frequently in many applications, Dragonwell 11 supports the ZUnloadClassesFrequency option to adjust the frequency of class unloading.

The Optimization of ZGC Memory Usage

Compared with OpenJDK 11, Dragonwell 11 ZGC adds support for returning physical memory, expansion of memory usage, and parallel pre-touching. These additions let Dragonwell 11 ZGC be applied to more specialized scenarios.

Returning Physical Memory: This feature of ZGC applies to scenarios where multiple instances are deployed on the same machine. Dragonwell 11 ZGC enables the return of physical memory through the ZUncommit option. Developers only need to set the upper limit Xmx, the lower limit Xms, and the SoftMaxHeapSize of the heap. In normal operation, the heap size of the Java application stays around SoftMaxHeapSize. When burst traffic arrives, the application can temporarily expand the heap to cope with it. When the burst traffic passes, the application returns the temporarily unused memory to the operating system.

Expansion of Memory Usage: Dragonwell 11 expands the heap sizes that ZGC supports, from a super-large heap of 16TB down to an ultra-small heap of 8MB. This makes it more convenient to deploy the same business on machines with different specifications.

Parallel Pre-Touching: The pre-touching capability of GC (enabled with -XX:+AlwaysPreTouch) touches the heap memory at start-up so that memory touching does not affect the RT of the application later. However, ZGC pre-touching in JDK 11 is single-threaded, so the application takes a long time to start. Pre-touching a large heap can take minutes. Dragonwell 11 parallelizes the pre-touching process, which improves the start-up speed of large-heap applications.
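One way to check whether memory is really returned after a burst is to watch the committed heap size from inside the application. The following is a minimal sketch that uses only the standard java.lang.management API. The class name HeapCommitWatch, its helper methods, and the five one-second samples are illustrative choices, not part of Dragonwell.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative helper (hypothetical name): samples heap usage so that a drop in
// "committed" after a traffic burst becomes visible.
final class HeapCommitWatch {

    /** Bytes the heap may still grow by; -1 when the JVM reports no maximum. */
    static long growthHeadroom(long committedBytes, long maxBytes) {
        if (maxBytes < 0) {
            return -1;
        }
        return Math.max(0L, maxBytes - committedBytes);
    }

    /** Formats one sample in mebibytes. */
    static String describe(long usedBytes, long committedBytes, long maxBytes) {
        long headroom = growthHeadroom(committedBytes, maxBytes);
        String headroomText = headroom < 0 ? "undefined" : (headroom >> 20) + "M";
        return "used=" + (usedBytes >> 20) + "M committed=" + (committedBytes >> 20)
                + "M headroom=" + headroomText;
    }

    public static void main(String[] args) throws InterruptedException {
        // Five samples, one second apart (an arbitrary choice for this sketch).
        for (int i = 0; i < 5; i++) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.println(describe(heap.getUsed(), heap.getCommitted(), heap.getMax()));
            Thread.sleep(1000L);
        }
    }
}
```

If memory is being returned, the committed value should fall back after the burst while the maximum stays at the Xmx setting.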

The Optimization of ZGC Response Time

This section covers the non-pause factors that affect the response time (RT) at P99/P999.

When running ZGC on JDK 11, we may see Page Cache Flush entries in the GC log:

[2019-09-05T14:14:04.242+0800] GC(10816) Page Cache Flushed: 28M requested, 28M(11424M->11396M) flushed
[2019-09-05T14:14:04.248+0800] Page Cache Flushed: 32M requested, 32M(11928M->11896M) flushed
[2019-09-05T14:14:04.259+0800] Page Cache Flushed: 32M requested, 32M(11912M->11880M) flushed
[2019-09-05T14:14:04.271+0800] Page Cache Flushed: 32M requested, 32M(11878M->11846M) flushed
[2019-09-05T14:14:04.276+0800] Page Cache Flushed: 32M requested, 32M(11846M->11814M) flushed
... (35 "Page Cache Flushed" lines omitted)
[2019-09-05T14:14:04.462+0800] Page Cache Flushed: 32M requested, 32M(10596M->10564M) flushed
[2019-09-05T14:14:04.467+0800] Page Cache Flushed: 32M requested, 32M(10564M->10532M) flushed
[2019-09-05T14:14:04.471+0800] Page Cache Flushed: 32M requested, 32M(10522M->10490M) flushed
[2019-09-05T14:14:04.477+0800] GC(10816) Page Cache Flushed: 32M requested, 32M(10490M->10458M) flushed

At the same time, the application monitoring shows RT P99 rising above 200ms. As the log above shows, Page Cache Flushed occurred many times in a row, and the whole burst lasted more than 200ms. During this time, the Page Cache Flush blocks threads, and dozens of object-allocating threads wait on the same lock.

This happens because ZGC divides the heap into ZPages, a concept similar to the Region of G1, in three specifications: small (2MB), medium (32MB), and large (2*N MB). Objects are allocated to ZPages of the corresponding specification according to their size. The Page Cache is a data structure that stores free ZPages.

We encountered a problem during real-world operations: the allocation speed of objects with different specifications is unstable. Sometimes a surge of medium-sized objects uses up the free medium-sized ZPages, and small or large ZPages need to be transformed into medium-sized ZPages. This transformation is the Page Cache Flush. It takes a long time and requires multiple mmap system calls (large overhead). In addition, the flush holds the global ZPage allocation lock while it runs, which is why its impact is so great.
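The length of the burst can be read from the timestamps of the first and last log lines above. A minimal sketch of that calculation follows. The class name FlushBurst and its methods are illustrative, and the timestamp pattern is inferred from the log lines shown above.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper (hypothetical name): measures how long a run of
// "Page Cache Flushed" log lines lasted.
final class FlushBurst {

    // Matches timestamps such as 2019-09-05T14:14:04.242+0800.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSxx");

    /** Extracts the timestamp between the first '[' and the following ']'. */
    static OffsetDateTime timestampOf(String logLine) {
        int open = logLine.indexOf('[');
        int close = logLine.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no [timestamp] in: " + logLine);
        }
        return OffsetDateTime.parse(logLine.substring(open + 1, close), LOG_TIME);
    }

    /** Milliseconds between the first and the last line of a burst. */
    static long burstMillis(String firstLine, String lastLine) {
        return Duration.between(timestampOf(firstLine), timestampOf(lastLine)).toMillis();
    }

    public static void main(String[] args) {
        String first = "[2019-09-05T14:14:04.242+0800] GC(10816) Page Cache Flushed: "
                + "28M requested, 28M(11424M->11396M) flushed";
        String last = "[2019-09-05T14:14:04.477+0800] GC(10816) Page Cache Flushed: "
                + "32M requested, 32M(10490M->10458M) flushed";
        System.out.println(burstMillis(first, last) + " ms");
    }
}
```

For the log above, the two timestamps are 235ms apart, which matches the RT P99 spike of more than 200ms.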

The solution of Dragonwell 11 is to port the feature that improves the concurrency of ZPage allocation. This feature avoids the global ZPage allocation lock as much as possible and executes mmap asynchronously. Another solution of Dragonwell 11 is to make the object size threshold for medium-sized ZPages adjustable. (The original range is 256KB to 4MB.) We add the ZMediumObjectUpperBound option. For example, -XX:ZMediumObjectUpperBound=10MB changes the range of objects placed in medium-sized ZPages to 256KB to 10MB. In our practice, Dragonwell 11 reduces the thread blocking caused by Page Cache Flush and thus improves RT P99/P999.
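The effect of moving the medium-object upper bound can be illustrated with a small size classifier. This is a sketch, not ZGC code: the class ZPageSizing and its methods are hypothetical, the boundaries are treated as inclusive by assumption, and large pages are assumed to be rounded up to a multiple of 2MB, following the "2*N MB" description above.

```java
// Illustrative model (hypothetical names) of which ZPage specification an object
// of a given size lands in, based on the thresholds quoted in the text.
final class ZPageSizing {

    enum PageType { SMALL, MEDIUM, LARGE }

    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long SMALL_OBJECT_LIMIT = 256 * KB;        // lower end of the medium range
    static final long DEFAULT_MEDIUM_UPPER_BOUND = 4 * MB;  // original upper end
    static final long LARGE_PAGE_GRANULE = 2 * MB;          // assumed large-page step

    /** Picks the page specification; boundaries are inclusive by assumption. */
    static PageType classify(long objectBytes, long mediumUpperBound) {
        if (objectBytes <= SMALL_OBJECT_LIMIT) {
            return PageType.SMALL;
        }
        if (objectBytes <= mediumUpperBound) {
            return PageType.MEDIUM;
        }
        return PageType.LARGE;
    }

    /** Size of a large page for the object: the next multiple of 2MB. */
    static long largePageBytes(long objectBytes) {
        long granules = (objectBytes + LARGE_PAGE_GRANULE - 1) / LARGE_PAGE_GRANULE;
        return granules * LARGE_PAGE_GRANULE;
    }

    public static void main(String[] args) {
        long fiveMb = 5 * MB;
        System.out.println(classify(fiveMb, DEFAULT_MEDIUM_UPPER_BOUND)); // LARGE
        System.out.println(classify(fiveMb, 10 * MB));                    // MEDIUM
        System.out.println(largePageBytes(fiveMb) / MB + "MB");           // 6MB
    }
}
```

In this model, a 5MB object needs its own 6MB large page with the original 4MB bound, but fits into a shared medium-sized ZPage once the bound is raised to 10MB.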

The Solution for ZGC Throughput Rate

ZGC may run into insufficient throughput in production practice, which shows up as two phenomena: Allocation Stall and Out of Memory (OOM).

Phenomenon 1: Allocation Stall

Allocation Stall means the collection speed cannot keep up with the allocation speed.

Developers can alleviate this problem by increasing the heap size (Xmx) or the number of concurrent GC threads (ConcGCThreads). However, the computing resources of the machine are limited, and it is impossible to increase the heap size and the number of threads indefinitely. The next step is to tune when ZGC is triggered:

ZAllocationSpikeTolerance: ZGC in JDK 11 already supports this option. Setting it helps handle spikes in allocation speed, but it is not suitable for everyday use, because triggering ZGC too often consumes too much CPU.
ZHighUsagePercent: The online monitoring of some applications raises an alert when heap usage is too high. Experimental ZGC has no absolute limit on heap usage. Product-ready ZGC sets 95% as the highest level of the heap. In Dragonwell 11, ZHighUsagePercent adjusts this level, and ZGC is triggered when heap usage exceeds ZHighUsagePercent%.
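The high-usage rule can be sketched as a simple threshold check. This is only an illustration of the rule as described above, not the ZGC heuristic itself. The class HighUsageTrigger and its methods are hypothetical names.

```java
// Illustrative model (hypothetical names) of a usage-based GC trigger.
final class HighUsageTrigger {

    /** Heap usage as a percentage of the maximum heap size. */
    static double usagePercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** True when usage strictly exceeds the configured percentage. */
    static boolean shouldTrigger(long usedBytes, long maxBytes, double highUsagePercent) {
        return usagePercent(usedBytes, maxBytes) > highUsagePercent;
    }

    public static void main(String[] args) {
        long max = 10L << 30; // a 10GB heap, chosen only for the example
        long used = 9L << 30; // 90% used
        System.out.println(shouldTrigger(used, max, 95.0)); // false
        System.out.println(shouldTrigger(used, max, 85.0)); // true
    }
}
```

A lower percentage starts collection earlier and keeps usage below the alert level, at the cost of more frequent GC cycles.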

Phenomenon 2: OOM

ZGC reserves a fixed space as the area for object relocation. However, if Java threads access objects too fast, objects may also be relocated too fast, and the reserved space may still be insufficient, which eventually leads to OOM and a program crash.

Dragonwell 11 adds the ZRelocationReservePercent parameter, which reserves ZRelocationReservePercent% of the heap for relocation and makes OOM less likely.
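The reserve comes out of the heap, so a larger percentage leaves less room for regular allocation. A short worked calculation, with hypothetical class and method names, makes the trade-off concrete:

```java
// Illustrative arithmetic (hypothetical names): how much heap a percentage-based
// relocation reserve takes, and how much remains for regular allocation.
final class RelocationReserve {

    /** Bytes set aside for relocation, rounded down. */
    static long reservedBytes(long heapBytes, int reservePercent) {
        if (heapBytes < 0 || reservePercent < 0 || reservePercent > 100) {
            throw new IllegalArgumentException("invalid heap size or percentage");
        }
        return heapBytes * reservePercent / 100;
    }

    /** Bytes left for regular object allocation. */
    static long usableBytes(long heapBytes, int reservePercent) {
        return heapBytes - reservedBytes(heapBytes, reservePercent);
    }

    public static void main(String[] args) {
        long heap = 10L << 30; // a 10GB heap, chosen only for the example
        System.out.println((reservedBytes(heap, 10) >> 20) + "M reserved"); // 1024M reserved
        System.out.println((usableBytes(heap, 10) >> 20) + "M usable");     // 9216M usable
    }
}
```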

Upgrading Monitoring of ZGC

Dragonwell 11 updates the details of the GC log, including corrections to erroneous live-object information and statistics for the different ZPage specifications.

Dragonwell 11 also introduces ZGC-related JFR events: ZAllocationStall, ZPageAllocation, ZRelocationSet, ZRelocationSetGroup, ZUncommit, and ZUnmap. These JFR events can monitor the current status of ZGC and help troubleshoot abnormal conditions that occur in the ZGC. Dragonwell 11 also updates ZGC-related GarbageCollectorMXBean to monitor two types of ZGC metrics: ZGC cycle and ZGC pause.
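The GarbageCollectorMXBean metrics can be read with the standard java.lang.management API. The sketch below lists every collector bean that the running JVM exposes. The class name GcBeanReport is illustrative, and the exact bean names that Dragonwell 11 reports for ZGC cycles and pauses are not assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): prints the collection count and the
// accumulated collection time of every garbage collector MXBean.
final class GcBeanReport {

    /** One line per collector bean registered in this JVM. */
    static List<String> lines() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(bean.getName() + ": count=" + bean.getCollectionCount()
                    + " timeMs=" + bean.getCollectionTime());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : lines()) {
            System.out.println(line);
        }
    }
}
```

Sampling these values periodically and taking differences gives the number of collections and the time spent in them per interval.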

Maintaining the Stable Quality

Alibaba Dragonwell 11 selectively ports the code of product-ready ZGC and reasonably transforms the code. Therefore, Dragonwell 11 has the ZGC capabilities of OpenJDK 15 and enjoys the stable quality of the long-term support of OpenJDK 11.

We have noticed that porting all ZGC code indiscriminately would modify a large amount of code in the common part of Dragonwell 11. The consequences include:

Difficulties in Subsequent Upgrades: Dragonwell 11 regularly synchronizes the latest OpenJDK 11 code from upstream. If an OpenJDK 11 update and the Dragonwell 11 ZGC transformation modify the same part of the code, that code becomes difficult to maintain, which increases the risk of code errors.
Impact on the Correctness of Other Parts of Code in Dragonwell 11: ZGC depends on changes to common code, including changes to some class loading and C2 common code. Other GCs (including G1/CMS) and the rest of the JDK call this part of the code. If the changes to common code are not ported carefully, or are not confirmed to preserve correctness, users may encounter unexpected risks.

Therefore, we need to control code risks. We keep the product-ready transformation within the scope of ZGC code as much as possible and port only the ZGC code that is most relevant to product readiness.

We adopt compile-time and runtime checks to keep the transformation within the scope of ZGC code and to ensure the ZGC transformation code does not pollute the common code. (This part of the work follows the approach of the Shenandoah GC backport to JDK 11.)

The compile-time check uses macro isolation. When ZGC is disabled at compile time, the ported code is not compiled at all, so it cannot introduce problems into the build. When ZGC is enabled at compile time, the code is compiled and checked as usual. Macro isolation wraps the ported code in macros:

#if INCLUDE_ZGC … #endif
ZGC_ONLY( … )

The runtime check uses conditional isolation to ensure that the ported code is not executed when another GC is enabled, further reducing the risk for Dragonwell 11. Conditional isolation wraps the code in an if statement:

if (UseZGC) { … }

Open-Source and Test Process

We have made the process of the ZGC product-ready transformation open source on GitHub, where it is documented in the milestone. The milestone includes more than 200 patches related to ZGC. Each patch has been carefully reviewed by experts from Alibaba.

We maintain a nightly build pipeline that verifies every night that the code compiles on the x64 and AArch64 platforms and that the OpenJDK tests pass with ZGC both enabled and disabled.

Outlook

Some recent advances in ZGC can further optimize the performance of ZGC, including:

Compressed Class Pointers: Our internal experiments show that compressed class pointers significantly improve ZGC performance (although object pointers are not compressed). Since porting the code affects the stability of JDK 11, this work has not been open-sourced yet.
In-Place Object Relocation: JDK 16 ZGC relocates objects in place to avoid OOM.
Sub-Millisecond Pause: JDK 16 ZGC supports concurrent thread stack processing, so GC roots are also processed concurrently, which brings the pause time below 1ms.
Improvement in Throughput Rate: The ZGC project has recently published the code of generational ZGC in its code base, which is expected to improve the throughput rate of ZGC.

We also evaluated a peer of ZGC: Shenandoah GC. According to our preliminary evaluation, Shenandoah GC works well on heaps within 32GB, and the most important factor is its support for pointer compression.

Summary

The 3-part Alibaba Dragonwell ZGC series discusses the GC concept, ZGC and its applicable scenarios, and the product-ready transformation of ZGC by Dragonwell 11.

This work has maintained the stability of Dragonwell 11 while upgrading ZGC to a product-ready ZGC: it repairs major defects in ZGC, adds support for the AArch64 platform, and improves many features. Dragonwell 11 has also added several common features to meet the needs of Alibaba's internal and cloud customers.

We will update the Alibaba Dragonwell ZGC series from time to time to share our experience using ZGC and contributions to OpenJDK.

About the Author

Hao Tang joined the Alibaba Cloud programming language and compiler team in 2019 and is currently engaged in JVM memory management optimization.
