Understanding JVM Garbage Collection – Part 9 Garbage First (G1) Garbage Collector (GC)

Garbage First (G1) Garbage Collector (GC)

The G1GC is a server-style generational, incremental, parallel, mostly concurrent, stop-the-world evacuating garbage collector which is suitable for multiprocessor machines with a large amount of memory. G1GC aims to meets garbage collection pause-time goals with high probability, while achieving high throughput. G1GC monitors its stop the world pauses in order to try and achieve it pause time goals. Initially the main goals for G1GC where:

  • Easy to tune especially in comparison to CMS
  • Predictable GC pause times avoiding long GC pauses
  • Scales better with larger heap. Older garbage collectors would scale linearly i.e. the larger the heap the greater the pause time. G1GC aims to solve this problem.
  • Eliminate full STW

Over time G1GC has evolved into a general-purpose collector. In fact OpenJDK 9 has made G1GC its default garbage collector. The main motivation for switching to G1GC as a default collector was:

"Limiting GC pause times is, in general, more important than maximising throughput. Switching to a low-pause collector such as G1 should provide a better overall experience, for most users, than a throughput-oriented collector such as the Parallel GC, which is currently the default."

Garbage First (G1) Garbage Collector (G1)
Garbage First (G1) Garbage Collector (G1))

As with other GC algorithms G1GC has 2 algorithms one for young generation and the other for the old generation. The young generation algorithm is an stop the world, parallel, copying GC algorithm. On the other hand the old generation algorithm is a mostly concurrent marking algorithm with an incremental compacting. Note G1GC does not do any sweeping The incremental compacting phase piggyback on the young generation GC.

G1GC Regions
G1GC Regions

The G1GC divides the heap into small areas. These small areas are know as regions i.e. a small independent unit where memory can be allocated and reclaimed. The concept of regions enables G1GC to collect a subset of the heap instead of collecting the entire heap. G1GC heap is generational but generations are no longer contiguous. A region can be assigned to hold objects of any generation. Although non contiguous, regions still logically combined into Eden, Survivor and Tenured. A region can be free or assigned a particular generation i.e. young, survivor, tenured or humongous. Humongous regions are for objects that span more than one region. The number of regions per generation can change between collection sizes. G1GC adaptively resizes the number of regions per generation based on statistics collected. By default each region is one MB in size but get bigger for larger heaps. Regions size is also configurable. Permissible values include 1, 2, 4, 8, 16, 32, or 64 MB. A region size must be a power of two. By default the G1GC is looking for close to 2048 regions. The number of regions is calculated at startup. The calculation is done by dividing the total heap by permissible sizes and settling on an appropriate value which is close to or greater than 2000. Regions sizes can also be explicitly set using -XX:G1HeapRegionSize.

G1GC Collection Set
G1GC Collection Set

A regional garbage collector need to identify a subset of regions that need to be collected. This enables the GC to meet pause time goals even on large heaps. Regions grouped together for a collection cycle are called collection sets (CSets) i.e. a set of regions from where live objects are evacuated into new regions. During a young collection a CSet is made up of only young regions. G1GC introduced a new concept called mixed collections. Unlike the Parallel collector the G1GC old generations collection is are incrementally and concurrently collected. Mixed collections are made up of the all young regions and a few candidate old regions with the most amount of garbage.

Remember Set
Remember Set

Since G1 runs GC on select regions the garbage collector must be able to ascertain references between regions. This is a similar problem to tracking old generation to young generation references. As mentioned before old generation to young generation pointers are done via a card table. G1G1 uses a similar concept called remember set (RSet) to track these references. Just like card tables a RSet track references into a particular region. RSets enable us to identify references without having to scan the entire heap. RSets are used to track pointers from old generation regions to young generation regions and from old generation regions to other old generation regions. Since the young generation is collected a whole there are no young region to young region Rset. During a collection the CSets and RSets are scanned to determine inter region reference.

All free regions are added to a linked list called the "free list". Free regions are assigned as thread-local allocation buffer (TLAB) to a mutating thread. G1GC uses a compare and swap method to ensure that a TLAB is assigned to a single thread in a safe manner. The mutating thread is free to assign objects into their allocated region/TLAB without the need for any synchronisation. The thread is allocated new regions as its previously allocated region is exhausted. When eden fills up a young generation collection is triggered.

Young Generation Collection
G1GC Young Generation Collection

The young generation is collected as a whole and live objects are moved from the Eden into a survivor space. Objects in the survivor space are moved from one survivor region to another survivor region. G1GC collects statistics for each GC cycle. This enables it to resize the Eden and survivor spaces to meet pause time goals. Eden can range from 5 to 60% for the heap memory and get dynamically adjusted. By default 10% for memory must always be free thus ensuring enough survivor space. Reserved free memory space is tuneable via -XX:G1ReservePercent VM argument. Regions offer a host of benefits which include:

  • Enables Tenured to be collected in small chuck to meet latency goals
  • Generation can be resized as necessary
  • Heap defragmentation is not required with an evacuating collector. G1GC reclaims space by copying live data out of existing regions into empty regions.

Young generation G1GC is very similar to young generation GC in other collectors. The main difference is that G1GC operates on on non contiguous memory i.e. regions instead of a contiguous memory space. Young generation GC is done using a evacuating collectors as opposed to copying collectors employed by CMS or the parallel collector. Young generation garbage collection is a stop the world event. Live objects are moved from Eden to Survivor and between the two Survivor spaces. Objects from different Eden regions are compacted into survivor spaces. Objects that meet an aging threshold are promoted to old generation. The other key differences between G1GC young generation collection and other young generation collectors is that G1GC collects stats and resizes the young generation to meet pause time goals. In summary Young generation GC:

  • Operates on non-contiguous regions logically. Similar to other Young generation GC collectors the Young generation is split into Eden and two survivor spaces.
  • Young generation garbage collection is a stop the world events.
  • Young generation garbage collection is done using multiple threads.
  • Live objects are copied to survivor or old generation regions depending on the objects age.
  • Young generation size is recalculated based on
    • Young generation collection time.
    • The size of the RSet
    • Min and Max young generation limits
    • And last but not least the pause time threshold.

Over time objects get promoted into old regions. This leads to increasing heap occupancy. Humongous objects allocations also lead to increasing heap occupancy. To avoid running out of space the JVM triggers GC that not only covers the young generation but also the old generation. Old generation collection is done under the umbrella of a mixed cycle. G1 triggers a mixed generation cycle when initiating heap occupancy percent is met. The initiating heap occupancy percent is the heap size threshold that triggers a marking cycle. By default this value is 45 but can be tuned using the -XX:InitiatingHeapOccupancyPercent VM arguments.

G1GC employs an algorithm called Snapshot-At-The-Beginning (SATB). The algorithm can only identify garbage that existed at the point of the snapshot. The GC algorithm works on a virtual snapshot of the heap taken during initial mark phase. Objects considered live at the time of the snapshot are considered live for the purpose of the whole reclamation run. This provides better latency but comes at a cost i.e. retaining some memory that could actually be freed. This memory will be freed during the next GC cycle. During the marking phase the state of the references can change as application threads are working in conjunction with GC threads. G1GC uses pre-write barriers that intercept and records object reference mutation. The object reference modifications are appended to a link list called the update write buffer. The update write buffer is processed to ensure that only dead objects are collected. The update write buffers are processed using the concurrent refinement threads. The concurrent refinement threads a set of threads dedicated to processing the update write buffers and updating the remember set for the appropriate region.

The old generation collection is similar to the CMS old generation collection phases i.e. some of the old generations phases are done in conjunction with young generation phases. Old generation collections have two phases:

  • A concurrent marking cycle. The goal of the concurrent marking cycle is to finish marking before the heap is at capacity. Most of the cycle takes place concurrently while mutator threads are in progress.
  • A mixed collection where all young generations regions are collected and some and old generation regions are collected. Old generation collections are reported as G1GC mixed collections.

The G1GC mixed collection phases are:

  • Initial Mark (STW) - The Initial mark phase is a stop the world event. The initial mark phase identifies objects reachable from its GC roots. The main goal of the phase is to identifies survivor regions (not objects but regions) that might have root references to objects in the old generation. This phase is done in conjunction with the Young Generation garbage collection.
  • Scan root Regions (Concurrent phase) - Scan survivor regions to identify references to old generation objects. This is a concurrent phase that happens in conjunction with running application threads. This phase must complete before the next young generation GC.
  • Concurrent Mark (Concurrent phase) - G1 has concurrent marking phase which marks live objects found in the old generation regions. This phase may be interrupted by a young generation garbage collector. The concurrent marking extends from the end of a evacuation pause (where the initial marking work is done) to the remark phase.
  • Remark (STW) - This phase helps complete the concurrent mark phase. It completes the marking of live objects. During this phase the GC traces all remaining live objects, performs references processing (process strong and weak references) and SATB buffers.
  • Cleanup (STW) - This is a stop the world phase which scrubs the RSet and does other accounting tasks i.e. identify completely free regions. Empty regions are returned to the free list.
  • Space-reclamation phase - This phase is called the mixed collection phase. The mixed collection phase evacuates object from both young and old generation regions. The cycle evacuates all young generation regions and some old generation regions. The main goal is to maintain pause time goals. Thus statistics and pause time goals determines the number the number of old generation regions that need to be evacuated to yield enough free space. Old generation regions are selected based on the heap space that can be reclaimed and the number of mixed cycles configured. The space reclamation phase can span many cycles i.e. mixed cycles continue to happen until enough space is reclaimed or the number of configured mixed cycles is exhausted. The number of mixed cycles is configured via -XX:G1MixedGCCountTarget and defaults to 8. While -XX:G1HeapWastePercent determines if enough space has been reclaimed. This phase ends as soon as G1 determines that it has evacuated enough old generation regions. Essentially the G1 algorithm wants to ensure that GC is freeing objects faster than the allocation rate.

G1GC also has the concept of Full GC. Full GC are triggered when:

  • There are evacuation failures i.e. one cannot find free regions even after expanding the heap to its maximum size.
  • The heap size reaches its configurable hard-margin (G1ReservePercent).
  • It is unable to allocate a humongous object because it cannot find enough continuous regions to do the allocation.
  • Collection rate cannot keep up with allocation rate i.e. the garbage collector runs out of free regions before the marking phase has completed.

Frequent full GC's indicate a need for further tuning. The objective is to avoid a full GCs at all costs. Full GC's are analogous to Concurrent Mode Failures in CMS. To make matters worse until Java 9 Full GC's are single threaded STW collector i.e Serial GC collector. This makes full GC's expensive. Java 10 onwards have switched to using a parallel collector for full GC's. A full GC results in compacting the entire heap thus freeing up the most amount of memory possible.

No comments yet.

Leave a Reply

6 + eleven =