Understanding JVM Garbage Collection Part 10 – C4 (Azul Zing)

Azul Systems offers a commercial, proprietary JVM called Zing. Zing, which is only available on Linux, uses a proprietary virtual machine that is compatible with the appropriate version of the Java class libraries. Porting a Java application to Zing is as simple as running it on Zing. Azul initially went into the hardware appliance business and released an appliance called Vega. These appliances were massive systems with hundreds of GB of RAM and thousands of processors, tailored to run hundreds of Java applications. Azul’s hardware appliance business was not successful and the company pivoted by releasing Zing. It used the insights gained from building and running massive Java systems to release an excellent JVM with a superior garbage collector.

Zing provides a pauseless, generational, region-based, always-compacting garbage collector called C4. The name stands for Continuously Concurrent Compacting Collector, after the algorithm’s four main features. Azul Zing was specifically designed to be insensitive to fragmentation, allocation rates, mutation rates, use of soft or weak references, and heap topology [^15]. Unlike other generational garbage collectors, C4 supports simultaneous generational concurrency, i.e. both the young and old generations are collected concurrently using the same algorithm. Each generation is collected by a concurrent (non-stop-the-world) mechanism, and these mechanisms can be active simultaneously and independently.

C4 is not ‘mostly’ concurrent but fully concurrent: it never falls back to a stop-the-world compaction, even during sustained periods of high memory allocation, and it keeps performing concurrent young generation collections throughout. This is done without sacrificing response times. Concurrent compaction is C4’s shining feature and is responsible for significantly reducing GC pauses. Just like other concurrent collectors, C4 is region based; in C4 regions are called pages.

Although the C4 algorithm is pauseless, its implementation does require safepoints. Thus, like other concurrent collectors, C4 does have STW phases, but these are very short. The safepoints occur when the GC algorithm transitions from one phase to another. The pause time does not depend on the heap size, the allocation rate or the shape of the object graph. In most cases pauses are less than 1 ms.

Azul Zing takes a different approach to memory management than a regular JVM. To understand Zing’s approach let’s quickly review virtual memory basics. Virtual memory is an abstraction that enables the OS to present a much larger memory address space than it physically possesses. In simple terms, virtual memory provides an illusion of physical memory that does not actually exist. The Memory Management Unit (MMU) enables the OS to translate virtual addresses to physical addresses. The virtual address space is divided into sections called pages, which are usually 4 KB in size. These pages are swapped in and out of physical memory as required. If an application accesses an address that is not backed by physical memory, a page fault is triggered. When physical memory is full, a page replacement algorithm determines which page needs to be swapped out. Usually a Least Recently Used (LRU) algorithm or a similar variant is used for this.
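
The page fault and replacement behaviour described above can be sketched with a toy simulation. This is only an illustration: the class name `LruPageTable`, the frame count and the access pattern are assumptions made for the example, and a real OS uses cheaper approximations of LRU rather than an exactly ordered map.

```java
import java.util.LinkedHashMap;

// Toy model of demand paging with LRU replacement (illustrative names only).
class LruPageTable {
    private final int frames;                              // physical frames available
    private final LinkedHashMap<Long, Boolean> resident;   // resident virtual pages
    private int pageFaults = 0;

    LruPageTable(int frames) {
        this.frames = frames;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.resident = new LinkedHashMap<>(16, 0.75f, true);
    }

    // Touch a virtual page; returns true if the access caused a page fault.
    boolean access(long virtualPage) {
        if (resident.get(virtualPage) != null) {
            return false;                                  // hit: get() refreshed recency
        }
        pageFaults++;
        if (resident.size() >= frames) {
            // Evict the least recently used page (first key in access order).
            Long victim = resident.keySet().iterator().next();
            resident.remove(victim);
        }
        resident.put(virtualPage, Boolean.TRUE);
        return true;
    }

    int pageFaults() { return pageFaults; }

    // containsKey() does not change the access order, so this is a pure query.
    boolean isResident(long virtualPage) { return resident.containsKey(virtualPage); }
}
```

With two frames, touching pages 1, 2, 1, 3 evicts page 2 rather than page 1, because page 1 was used more recently.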

Physical memory (RAM) on a machine is almost never enough to accommodate the large number of applications handled by an OS. Virtual memory enables the OS to run a large number of applications that would never fit into RAM at the same time. The size of the virtual address space is determined by the CPU’s address width (32 or 64 bit), while the amount that can actually be backed is limited by RAM plus the disk space available for swapping. In theory a 64-bit address space spans 16 EB; in practice most current x86-64 CPUs implement 48-bit virtual addresses, which gives 256 TB of virtual address space.
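
The 256 TB figure corresponds to a 48-bit virtual address, which is what most current x86-64 processors implement. The arithmetic is simple enough to check in a few lines; the class and method names below are made up for the example.

```java
// Address-space arithmetic (illustrative helper, not part of any JVM API).
class AddressSpace {
    // Bytes addressable with the given number of address bits (valid for bits < 63).
    static long bytesFor(int bits) {
        return 1L << bits;
    }

    // Same quantity expressed in units of 2^40 bytes.
    static long terabytesFor(int bits) {
        return bytesFor(bits) >> 40;
    }

    public static void main(String[] args) {
        System.out.println(terabytesFor(48));     // 256
        System.out.println(bytesFor(32) >> 30);   // 4, i.e. 4 GB for a 32-bit address
    }
}
```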

Apart from its main benefit, providing a much larger memory address space, virtual memory has a whole host of other benefits. These include:

  • Reduces Fragmentation – The use of virtual memory reduces both internal and external fragmentation in RAM. Internal fragmentation occurs when the memory allocated to a process exceeds its current needs but the unused part cannot be used by another process. External fragmentation occurs when a new process cannot be accommodated even though enough free memory is available in total, because that memory is not contiguous.
  • Efficient use of RAM – At any given point in time a process only uses a fraction of its code and data. Virtual memory makes efficient use of physical memory because only the code and data that are actively being used are loaded. Additionally, physical memory is assigned to the process that actually needs it at a particular point in time.
  • Provides a sandboxed address space – Virtual memory enables the OS to give each process a sandboxed address space, thus preventing malicious or unintended access to another process’s address space.

The quintessential virtual memory page has a 1 -> 0..1 mapping to physical memory: a virtual memory page is either backed by a physical page or has no corresponding physical page in memory. There is also an interesting alternative: two virtual memory pages can be backed by the same physical page.
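
Two virtual pages sharing one physical page can be observed from plain Java by mapping the same file region twice. The sketch below is an assumption-laden illustration rather than anything Zing-specific: it relies on the OS backing both shared mappings with the same page-cache page, which is the behaviour on Linux but is not guaranteed by the Java specification. The class name `DoubleMapping` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Maps one 4 KB file region at two different virtual addresses.
class DoubleMapping {
    // Writes through the first mapping and reads the byte back through the second.
    static byte writeFirstReadSecond(byte value) {
        try {
            Path file = Files.createTempFile("double-mapping", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Two independent mappings of the same region of the same file.
                MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                first.put(0, value);
                return second.get(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeFirstReadSecond((byte) 42));
    }
}
```

On a system where both mappings share the same physical page, the value written through `first` is immediately visible through `second`, even though the two buffers live at different virtual addresses.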

Most JVMs behave just like any other process, i.e. they allocate memory via system calls. These system calls allocate space in the OS virtual memory system. It is the responsibility of the OS’s virtual memory subsystem to swap pages in and out of physical memory. For this reason you should generally configure your heap to be smaller than physical memory. One can allocate a heap that exceeds physical memory, but this will almost always perform worse than having the entire heap in memory because of the paging activity that occurs once the heap exceeds physical memory.

Azul Zing does not use the OS memory subsystem. It has its own kernel-level software, Zing System Tools (ZST), that enables it to tailor virtual memory to the JVM’s needs. ZST provides enhancements to Linux virtual and physical memory management. It enables the JVM to bypass the Linux memory subsystem and to manipulate virtual memory rapidly. ZST is C4’s secret sauce: it enables the C4 algorithm to optimise virtual and physical memory for its needs, which in turn enables C4 to sustain a high object allocation rate without noticeable pauses.

The C4 algorithm is a mark-and-compact algorithm with three main operations (more detail on these phases later):

  • Mark (Mark Phase) – Identifying live objects.
  • Relocation (Compact Phase) – Moving live objects to a new location.
  • Remapping (Compact Phase) – Fixing references to the objects that have been moved.

C4 guarantees single-pass marking and single-pass reference remapping. Central to these guarantees is the Load Value Barrier (LVB), which C4 uses to support concurrent marking, compaction, and remapping. The LVB is a read barrier which ensures that references are “sane” when loaded, before any mutator thread sees them. In other words, the LVB helps synchronize concurrent threads throughout the GC process. The LVB enforces invariants when a reference is loaded, i.e. conditions that must be satisfied before the object can be accessed. The LVB ensures two main things:

  • The LVB ensures that the reference has been marked through. In C4 lingo a “marked through” reference is one that has already been traversed by the marker during the marking phase. Similarly, a “not marked through” reference is one that has not yet been traversed during the marking phase. The LVB ensures that the reference state indicates that it has been “marked through” if a mark phase is in progress.
  • The LVB also ensures that the reference is not pointing to an object that has been relocated, i.e. the object has been moved but the reference still points to the object’s original memory location. If a reference does point to a relocated object, it is fixed immediately so that it points to the location in memory where the object has been relocated to. This trap is triggered during the relocate and remap phases.

If the above conditions are not met a “trap condition” is triggered, i.e. the reference is corrected to meet the conditions before it is made visible to application code. The trap either marks the object or corrects the reference to the relocated object. These traps give the C4 algorithm a self-healing property: a given object reference is corrected once, and all application threads benefit from the correction. This ensures that there is a bounded amount of work to be done during the mark and relocation phases.
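
The self-healing behaviour can be illustrated with a small model of the relocation trap. This is a sketch under simplifying assumptions, not Zing’s implementation: references are plain `long` addresses, the heap is an array of slots, the names `SelfHealingBarrier`, `recordRelocation` and `loadReference` are hypothetical, and the healing store is a plain write where a real barrier would need an atomic update.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a self-healing load barrier (illustrative names, not Zing APIs).
class SelfHealingBarrier {
    // Forwarding table: old address -> new address of a relocated object.
    private final Map<Long, Long> forwarding = new HashMap<>();
    // "Heap slots" holding references, modelled as plain addresses.
    private final long[] slots;
    private int trapsTaken = 0;

    SelfHealingBarrier(long[] slots) {
        this.slots = slots;
    }

    void recordRelocation(long from, long to) {
        forwarding.put(from, to);
    }

    // Loads a reference from a slot through the barrier.
    long loadReference(int slotIndex) {
        long ref = slots[slotIndex];
        Long forwarded = forwarding.get(ref);
        if (forwarded != null) {          // trap: the reference is stale
            trapsTaken++;
            ref = forwarded;
            slots[slotIndex] = ref;       // heal the slot so later loads skip the trap
        }
        return ref;
    }

    int trapsTaken() {
        return trapsTaken;
    }
}
```

The first load of a stale slot takes the trap and rewrites the slot; every later load of the same slot returns the corrected address without trapping, which is what bounds the total amount of fix-up work.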

Zing stores the state of a pointer inside the reference itself. This is known as a coloured pointer. On a 64-bit system a reference has spare bits that can be used to store its state: the actual address only takes up around 44 bits, and the additional bits hold metadata. These bits store the generation of the object, the Not Marked Through (NMT) bit and whether the reference has been remapped. While the algorithm is in progress, the state of the reference is checked against a global expected state to determine whether the trap should be taken. This global state is kept in a register. If the trap is not triggered, execution proceeds along the normal fast path; otherwise the trap handler corrects the reference before execution continues. Note that the loaded value needs to be masked before use as an address, because it contains more than just the object’s memory location.
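
A coloured pointer can be modelled as a `long` whose low bits hold the address and whose higher bits hold the metadata. The layout below is an assumption chosen for illustration (44 address bits followed by one bit each for NMT, remapped and generation); the exact bit positions Zing uses are not given here, and the class `ColouredPointer` is hypothetical.

```java
// Hypothetical coloured-pointer layout; bit positions are assumptions for illustration.
class ColouredPointer {
    static final int ADDRESS_BITS = 44;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long NMT_BIT = 1L << 44;        // Not Marked Through state
    static final long REMAPPED_BIT = 1L << 45;   // reference already remapped
    static final long OLD_GEN_BIT = 1L << 46;    // generation of the target object

    static long encode(long address, boolean nmt, boolean remapped, boolean oldGen) {
        long ref = address & ADDRESS_MASK;
        if (nmt) ref |= NMT_BIT;
        if (remapped) ref |= REMAPPED_BIT;
        if (oldGen) ref |= OLD_GEN_BIT;
        return ref;
    }

    // Strips the metadata bits to recover the plain object address.
    static long address(long ref) {
        return ref & ADDRESS_MASK;
    }

    static boolean nmt(long ref) {
        return (ref & NMT_BIT) != 0;
    }

    static boolean remapped(long ref) {
        return (ref & REMAPPED_BIT) != 0;
    }

    static boolean oldGen(long ref) {
        return (ref & OLD_GEN_BIT) != 0;
    }

    // Trap check: the reference's NMT bit is compared with the globally expected value.
    static boolean needsMarkTrap(long ref, boolean expectedNmt) {
        return nmt(ref) != expectedNmt;
    }
}
```

In this model the barrier’s fast path is a single comparison of the reference’s state bits against the expected global value, and `address` shows the masking step needed before the value is used as a memory location.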

A> Zing does not use a Tri-Coloured abstraction. Please do not confuse coloured pointers with a Tri-Coloured abstraction.

C4 has three main stages during a GC cycle:

  • Mark Phase – The entire object graph is marked. The marking is what is called precise wavefront marking: simply put, it marks the object graph as if the graph were unmodified throughout the marking process. There is no need for a final remark to catch reference modifications made by mutator threads. C4 uses the LVB to ensure that marking is correct even though mutator threads are concurrently updating the object graph, which allows C4 marking to be single pass. The LVB makes the process self-healing: it traps and marks any reference that is loaded but has not yet been marked through, so every modification to the object graph is caught and marked appropriately.
  • Relocation Phase – The relocation phase moves live objects so that sparsely populated pages can be freed. Sparsely populated pages are those with few live objects and a large number of dead objects. C4 follows a garbage-first philosophy: it selects a set of virtual pages to release, ordered by the amount of garbage each contains, starting with the page that has the most. Each selected page is protected so that it cannot be accessed directly. The page being released is called the “from” page and its live objects are relocated to a “to” page. Forwarding information is stored outside the from page in forwarding tables. As soon as relocation is done the from page’s physical memory is released. Although the physical page is released, the virtual page that was mapped to it is still needed because stale object references into the page may remain. Any access through such a reference is intercepted by the LVB.
  • Remap Phase – The remap phase fixes references that are no longer valid because of the relocation phase. Remapping already happens lazily via the LVB: as mutator threads load stale references, the LVB traps and fixes them. However, to complete a GC cycle all stale references must be fixed, so during this phase every remaining stale reference is updated to point to the relocated object. At the end of the remap phase no stale references exist and the “from” virtual pages can be recycled. C4 is in no hurry to perform the remap phase because no critical physical memory is being held up. Because the remap phase and the mark phase traverse the same object graph, C4 fuses the remap phase with the next GC cycle’s mark phase. In other words, the last phase of one GC cycle is the start of the next.
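
The page selection step of the relocation phase can be sketched as a simple ordering by garbage. This is a minimal illustration under assumed names and parameters: the `Page` type, the `maxPages` limit and the byte counts are invented for the example, and the heuristics C4 really applies when choosing pages are not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative "most garbage first" selection of from-pages (not Zing's code).
class RelocationPlanner {
    // A heap page with its size and the live bytes found by the mark phase.
    static final class Page {
        final int id;
        final long sizeBytes;
        final long liveBytes;

        Page(int id, long sizeBytes, long liveBytes) {
            this.id = id;
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return sizeBytes - liveBytes;
        }
    }

    // Returns at most maxPages pages, most garbage first, skipping fully live pages.
    static List<Page> selectFromPages(List<Page> pages, int maxPages) {
        List<Page> candidates = new ArrayList<>();
        for (Page p : pages) {
            if (p.garbageBytes() > 0) {
                candidates.add(p);
            }
        }
        // Descending order by garbage: compare b against a.
        candidates.sort((a, b) -> Long.compare(b.garbageBytes(), a.garbageBytes()));
        int count = Math.min(maxPages, candidates.size());
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

Relocating the sparsest pages first frees the most physical memory for the least copying, since few live bytes have to be moved per page released.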