JVM GC Jargon
The jargon used to describe GC algorithms can get confusing. This section describes commonly used terms and their meaning.
Live set – The number of live objects in a heap. The live set determines the amount of work and GC pause times for the garbage collector. Modern GC’s are able to de-link live set size and GC pause times.
Heap size – Refers to the total amount of memory reserved for the heap. A garbage collectors performance is potentially linked to its heap size. The chosen garbage collection algorithm is responsible for cleaning the entire heap or a subset of the heap. The size of the heap could have a direct impact on the GC collection and stop the world pause times. Naturally, larger heaps tend to have a higher collection and stop the world pause times. This was especially true for older GC algorithms. Modern GC algorithms use innovative methods so that heap size and stop the world pause times are not linearly correlated.
Heap Fragmentation and compaction – Memory allocation and deallocation will create gaps in memory. Gaps in the heap memory are called heap fragmentation or simply fragmentation. If fragmentation is not fixed, you land up in a situation where you have plenty of space on the heap but are unable to allocate memory for new objects. New objects cannot be allocated because of a lack of contiguous free space. Fragmentation is solved using compaction. Compaction relocates live objects so that the heap has free contiguous space.
Mutation/Mutating Thread/Application Threads: GC literature often uses the term mutating threads. Mutating threads refers to application thread that is changing object references. A mutating thread is a thread that is performing the mutation, i.e. modifying object references and thus modifying the object graph. The mutation rate is the rate at which references are updated in memory. GC literature also refers to these threads as application threads.
Stop-the-world – A Stop-the-world pause is a complete halt of all application threads to run a GC cycle. A number of GC’s algorithms require stop-the-world pauses so that they can garbage collect without the application code interfering with the GC cycle. A Stop-the-world pause guarantees that the heap is unmodified by the application while a GC is in progress.
Safe point – A safe point is a mechanism by which application threads can be brought to a know well-defined point vis-à-vis its interaction with the heap. In simple words, the thread is no longer executing application code. This enables the JVM to do work that would be difficult to do while application threads are running and mutating the heap. The GC implementation signals applications threads to pause by issuing a safe point call. A safe point provides the GC algorithm with a slot to safely perform GC. Application threads performed regular safe point checks to determine if they should pause. Once a safe point call is issued, all application threads must block at a safe point. Safe point checking is done via polling. Every method regularly checks a particular memory location to determine if a safe point has been issued. The JVM injects safe point checks into every method. Safe point checks are usually performed just before a method returns. At a safe point, all heap objects are in a consistent state, and all GC roots are known. Once all application threads have been blocked at a safe point, GC can be performed. Safe point’s are used in a number of situations, i.e., not just GC. These include:
- GC pauses
- Code deoptimisation
- Flushing code cache
- Hot swapping of classes.
- During various debug operation
The “Safe Point” diagram illustrates the safe point mechanism in a JVM. Running application threads are signalled to a safe point. All application threads eventually come to a halt. Notice application threads can take a varying amount of time to halt. Once all application threads are halted, GC takes place. Once GC has finished, the safe point is removed, and all application threads can continue.
Time to Safe point – The time taken for all application threads to come to a stop is known as Time To Safe point (TTSP). It is the difference between when the safe point call is issued and when the application threads come to a halt.
Moving Collectors – Moving collectors move objects into new areas and update all references to an object. This happens after a GC cycle. A non-moving collector will leave all live objects in the same memory location. This often leads to memory fragmentation. There are two types of moving collectors:
- Evacuating/Copying Collector – A evacuating collector is a type of moving collector. An evacuating GC is one where the memory region collected is left completely empty. All live objects are moved to another memory region. Evacuating collectors perform well since there is no memory compaction overhead. Evacuating collectors naturally defragment memory.
- Compacting Collector – A compacting collector is also a type of moving collector. The main goal of a compacting collector is to eliminate memory fragmentation. A compacting GC eliminates memory fragmentation by ensuring that all allocated memory is arranged in a single contiguous region at the end of a GC cycle.
Incremental Collector – Incremental GC performs a single GC cycle in phases. Incremental GC’s interleave a GC cycle in favour of application progress. Incremental GC’s run for a specified period doing part of the GC work. Incremental collectors may pause due to various conditions. The collector could terminate due to numerous reasons such as a time budget or a higher priority GC phase that needs carrying out. Incremental collectors would have done some productive work in the allocated time. This is in contrast to a phase that needs to be fully complete for it to have been productive.
Serial vs Parallel vs Concurrent Collectors – It is important to understand the difference between Serial, Parallel and concurrent collectors in order to appreciate various GC algorithms. These three concepts are essential concepts and must be internalised.
Serial collector – A serial collector is a single-threaded garbage collector. Application threads are paused while a GC cycle takes places. All GC phases take place using a single thread.
Parallel collectors – A parallel collector uses multiple GC threads to carry out the various phases in a GC cycle. Unlike serial collectors, parallel collectors use multiple threads for garbage collection. Both serial and parallel collectors do not work in conjunction with application threads. Application threads are paused so that GC threads can collect garbage. Usually, the number of threads used is tuneable via an application argument.
Concurrent collectors – Unlike parallel collectors the concurrent collectors work in conjunction with the application. GC work is carried out while application threads are running. Concurrent collectors may or may not use multiple GC threads for doing their work. In fact, the number of threads, used by these collectors can change depending on the GC cycle phase they are in. Just like the parallel collector the number of threads used is tuneable via application arguments.
NUMA-aware – Modern multi-socket machines increasingly implement a Non-Uniform Memory Access (NUMA) architecture. This is in stark contrast to previous architectures that implemented Uniform Memory Access (UMA). NUMA architecture divides memory into separate regions called NUMA domains. A NUMA domain is a set of CPU’s/Cores and its locally connected memory. Memory access latency is dependent on the memory location relative to the processor. Processors can access their own local NUMA domain faster than a remote domain. Garbage collectors that are aware of NUMA architectures and provide memory placement taking memory location into account and called NUMA-aware GC’s. Frequently used memory is locally located, while rarely used memory is remotely located. The diagram above illustrates the difference between UMA and NUMA architectures.
UMA vs NUMA
GC Invariant – The term invariant gets widely used when describing GC algorithms. The term can be confusing for the uninitiated. An invariant is a condition that must be met before a procedure/operation is called and must continue to hold true after the operations have been completed. GC invariants ensure that the program is in a particular state before after an operation. This helps reduce side effects as a result of the GC operation.