Saturday, January 11, 2014

JRockit: Out-of-the-Box Behavior of Four Generational GC's

In [2], we presented the four Generational GC's available in JRockit:
  • -Xgc:gencon
  • -Xgc:genpar
  • -Xgc:genconpar
  • -Xgc:genparcon
In this article, we will examine their out-of-the-box (OOTB)[7] behavior, with a focus on adaptive (or automatic) memory management in JRockit.

Adaptive Memory Management


JRockit was the first JVM to recognize that adaptive optimizations based on feedback could be applied to all subsystems in the runtime,[1] which include:
  • Code Generation
  • Memory Management
  • Threads and Synchronization

In this article, we will focus mainly on memory management in R28. In R28, JRockit still adaptively modifies many aspects of garbage collection, but to a lesser extent than R27 did.[1]

Adaptive optimizations based on runtime feedback work in this way:[1] in the beginning, strategy changes are fairly frequent, but after a warm-up period and sustained steady-state behavior, the idea is that the JVM should settle on an optimal algorithm. If, after a while, the steady-state behavior changes from one kind to another, the JVM may once again switch strategies to a more optimal one.

So, JRockit may heuristically change garbage collection behavior at runtime, based on feedback from the memory system, by:
  • Changing GC strategies
  • Automatic heap resizing
  • Getting rid of memory fragmentation at the right intervals
  • Recognizing when it is appropriate to "stop the world"
  • Changing the number of garbage collecting threads

-Xverbose:gc flag


To investigate the OOTB behavior of the four Generational GC's mentioned above (see also [2]), we used:
  • -Xverbose:gc flag
Typically, the log shows things such as garbage collection strategy changes and heap size adjustments, as well as when a garbage collection takes place and for how long.
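For example, each run can be started with verbose GC logging enabled and one of the strategies above selected explicitly; the class name and heap sizes below are placeholders for our benchmark setup:
  • java -Xverbose:gc -Xgc:gencon -Xms2g -Xmx2g MyBenchmark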

OOTB Behavior


The OOTB behavior of the four Generational GC's was investigated using one of our benchmarks, which has the following characteristics (see also [3,4]):
  • High churning rate
  • Allocating large objects
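
To give a concrete feel for this kind of workload, below is a minimal, hypothetical Java sketch of an allocation pattern with a sizable long-lived set plus a stream of large, short-lived objects; the class name, sizes, and loop counts are illustrative only and scaled down from the actual benchmark.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical sketch of the workload shape, not the actual benchmark.
  public class ChurnWorkload {
      // Long-lived objects keep the live data set large.
      private static final List<byte[]> LIVE_SET = new ArrayList<>();

      public static void main(String[] args) {
          // Retain a sizable live set (scaled down here to roughly 200 MB).
          for (int i = 0; i < 200; i++) {
              LIVE_SET.add(new byte[1_000_000]);
          }
          // High churn rate: large temporary objects that die almost immediately,
          // so each collection has plenty of garbage to reclaim.
          long checksum = 0;
          for (int i = 0; i < 500_000; i++) {
              byte[] temp = new byte[128 * 1024]; // large, short-lived object
              temp[0] = (byte) i;
              checksum += temp[0];
          }
          System.out.println(checksum); // keep the loops observable
      }
  }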

Here are the Average Response Times (ART; on a relative scale) of the four GC strategies:
  • -Xgc:genpar: baseline
  • -Xgc:gencon: -3.21%
  • -Xgc:genconpar: -36.56%
  • -Xgc:genparcon: -17.60%
The tests show that the default throughput GC (i.e., genpar) performs best, while the low-pause-time GC's lag behind. There are a couple of reasons for this:
  • Large live data set
    • Our benchmark has a large live data set (i.e., 1,471,425 KB), and we assigned it a relatively tight heap (i.e., 2 GB).
  • The concurrent sweep phase cannot keep up with the work generated by the mark phase
    • This is manifested by the following facts:
      • An emergency parallel sweep was requested for both gencon (i.e., genconcon) and genparcon
      • But not for genconpar

Changing GC Strategies at Runtime


The mark-and-sweep algorithm consists of two phases:[1,5,8]
  1. Mark phase
    • In which the collector finds and marks all reachable (live) objects
  2. Sweep phase
    • In which the collector scans through the heap and reclaims all unmarked objects
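To illustrate these two phases, here is a toy Java sketch of mark-and-sweep over an explicit object graph; it is only a conceptual model (names and data structures are made up for this example), not JRockit's actual implementation:

  import java.util.ArrayDeque;
  import java.util.ArrayList;
  import java.util.Deque;
  import java.util.List;

  // Toy model: a "heap" of nodes with a mark bit and outgoing references.
  final class ToyMarkAndSweep {
      static final class Node {
          boolean marked;
          final List<Node> refs = new ArrayList<>();
      }

      final List<Node> heap = new ArrayList<>();   // every allocated object
      final List<Node> roots = new ArrayList<>();  // root set (stacks, statics, registers)

      // 1. Mark phase: find and mark every object reachable from the roots.
      //    Cost grows with the amount of live data.
      void mark() {
          Deque<Node> pending = new ArrayDeque<>(roots);
          while (!pending.isEmpty()) {
              Node n = pending.pop();
              if (!n.marked) {
                  n.marked = true;
                  pending.addAll(n.refs);
              }
          }
      }

      // 2. Sweep phase: scan the whole heap and reclaim every unmarked object;
      //    mark bits are cleared on survivors for the next cycle. Cost grows
      //    with the heap size.
      void sweep() {
          heap.removeIf(n -> !n.marked);
          for (Node n : heap) {
              n.marked = false;
          }
      }

      void collect() {
          mark();
          sweep();
      }
  }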
In the GC logs of both gencon and genparcon, we saw the following messages:

gencon
  • [INFO ][memory ][Thu Jan 9 06:02:12 2014][1389247332488][25536] [OC#6] Changing GC strategy from: genconcon to: genconpar, reason: Emergency parallel sweep requested.

genparcon
  • [INFO ][memory ][Wed Jan 8 21:15:00 2014][1389215700143][24163] [OC#1] Changing GC strategy from: genparcon to: genparpar, reason: Emergency parallel sweep requested.


A possible explanation is as follows:
Sweeping and compaction (JRockit uses partial compaction to avoid fragmentation) tend to be more troublesome to parallelize. Allowing Java threads to run concurrently with the sweep phase makes sweeping take longer, because more bookkeeping and/or synchronization is needed. In addition, using fewer GC threads means the garbage collector may not be able to keep up with the growing set of dead objects.
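Note that if an emergency parallel sweep is expected for a given workload, one option could be to select a parallel-sweep variant up front rather than letting the JVM switch strategies mid-run (the class name and heap sizes are placeholders):
  • java -Xverbose:gc -Xgc:genconpar -Xms2g -Xmx2g MyBenchmark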

Conclusions


Ideally, an adaptive runtime would never need tuning at all, as runtime feedback alone would determine how the application should behave in any given scenario at any given time. However, the computational complexity of the mark-and-sweep algorithm is a function of both the amount of live data on the heap (for the mark phase) and the actual heap size (for the sweep phase). Depending on the amount of live data on the heap and the system configuration, the OOTB behavior of the chosen Generational GC's may or may not be able to keep up with garbage collection.

It is often argued that automatic memory management can slow down execution for certain applications to such an extent that it becomes impractical. This is because automatic memory management can introduce a high degree of non-determinism to a program that requires short response times. There is also extra bookkeeping overhead; for example, an old collection (OC) is needed before the garbage collection strategy can be changed. To avoid these issues, some manual tuning may be needed to get good application performance.
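For this benchmark, for instance, manual tuning could simply mean pinning the best-performing strategy and fixing the initial and maximum heap sizes so that the JVM does not have to adapt them at runtime (the values and class name are placeholders, not recommendations):
  • java -Xgc:genpar -Xms2g -Xmx2g MyBenchmark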

Before attempting to tune the performance of an application, it is important to know where the bottlenecks are. That way no unnecessary effort is spent on adding complex optimizations in places where it doesn't really matter.

References

  1. Oracle JRockit: The Definitive Guide
  2. JRockit: Parallel vs Concurrent Collectors (Xml and More)
  3. JRockit: A Case Study of Thread Local Area (TLA) Tuning (Xml and More)
  4. JRockit: Thread Local Area Size and Large Objects (Xml and More)
  5. Mark-and-Sweep Garbage Collection
  6. JRockit: All Posts on "Xml and More"
  7. The Unspoken - The Why of GC Ergonomics (Jon Masamitsu's Weblog)
  8. Mark and Sweep (MS) Algorithm (Xml and More)
