GC Mark, Sweep, and Compact Basics
The Mark and Sweep algorithm is the basis for garbage collection in Java. Although the actual algorithms used by the JVM are considerably more complex, the mark and sweep algorithm forms the basis of garbage collection in the JVM and must be understood.
As you might have guessed, there are two main phases in a mark and sweep GC cycle, i.e. Mark and Sweep.
- Mark Phase - The mark phase involves tracing the entire heap for live objects. The JVM does the following during the mark phase:
- The GC holds a pointer to every object allocated on the heap, known as the "allocation list", aka "ordinary object pointer (oop) table". Garbage collection iterates through every pointer in the allocation list and clears the mark bit, i.e. makes them eligible for collection.
- Find all live objects by navigating from their corresponding GC root and setting the mark bit.
- Sweep Phase - The sweep phase identifies dead objects by:
- Looping through the allocation list again
- Free memory where the mark bit is clear
- Removing identified dead objects from the allocation list.[^4]
- Looping through the allocation list again
The above two phases by themselves will leave memory fragmented. In order to remove fragmentation, a number of GC algorithms have a compaction phase. This phase relocates objects, so that allocated memory is contiguous. Since objects are relocated in memory all pointers/references to that object are also updated.
|Mark Sweep Compact
The diagram above illustrates the three phase. In phase one, we allocate five objects, i.e. A, B, C, D, and E. A has a reference to B, D, and E, while B has a reference to C. Let's assume an application threads removes the reference between A and B. When a GC cycle is triggered, the mark phase sets all the mark bits to zero. It then walks the object graph from the GC roots flipping the mark bit to 1 for all live objects. Objects with the mark bit set to zero are dead objects and must be removed. The sweep phase removes all objects with the mark bit as zero. Notice the sweep phase leaves memory fragmented. The compact phase rearranges objects so that we have contiguous free memory blocks.
A> The Mark, Sweep, and Compact phases described here are just the basics. The actual GC algorithms used by the JVM are much more complicated.[^4]
To run the garbage collection safely (without any concurrency issues) older GC algorithms stop all application threads (Stop the World Pauses) so that memory can be safely reclaimed. Thus a GC cycle does the following:
- Stops all application threads
- Runs garbage collection
- Starts all application threads again
|Stop The World Pauses
Newer GC algorithms (concurrent algorithms) employ various techniques to minimise STW GC pauses. Concurrent algorithms have managed to do most of the work concurrently but still require small GC pauses.
The above steps are a very simplistic approach to GC. GC algorithms employ various variations to the above three steps depending on what they are trying to achieve. Popular variations of these phases are:
Mark Sweep - This is the classic mark-and-sweep approach described above. This combination chooses to only execute the mark and sweep phase in a GC cycle. Thus dead objects are identified (mark) and removed (sweep), but memory is not compacted. Memory is recycled/reused in place. There is no memory movement to build contiguous memory blocks. The natural consequence of this approach is memory fragmentation. The CMS collector is an excellent example of a GC algorithm that only employs the mark and sweep phases. The diagram below illustrates the memory layout before and after a mark and sweep cycle. The key thing to notice is memory fragmentation after the dead objects are collected.
Mark Sweep And Compact - This variation adds a compact phase to the classic mark-and-sweep algorithm. The GC cycle compacts memory after the sweep phase. This helps provide a contiguous free memory block for future allocations. Objects are relocated, and references are updated to build these free contiguous blocks of memory. GC algorithms employing all three phases are known as compacting collectors. The SerialGC and ParallelGC are good examples of collectors that use the Mark, Sweep, and Compact phases.
Mark Sweep Compact
Mark and Copy aka Copying Collectors - The mark and copy is a variation of the mark, sweep and compact algorithm described above. Instead of sweeping or compacting dead objects, the algorithm copies live objects to a new area of the heap. The old area of the heap is now considered empty and can be used for future allocations. Both the mark and copy happen during the marking phase, i.e. copying live objects as they are encountered. The entire GC cycle can take place in a single tracing pass. This makes it a good fit for sparsely populated heaps. GC algorithms employing the mark and copy algorithm are known as copying collectors. The main disadvantage of a copying collector is its memory requirements. Theoretically, a copying collector requires a separate additional "to-space", which is the same size as its "from-space". The diagram below illustrates a copying collector's memory layout before and after a GC cycle.
Mark and Compact aka Evacuating Collectors - The mark and compact variations omit the sweep phase. Instead of removing dead objects (sweeping) live objects are moved into a contiguous heap area aka regions. GC algorithms employing the mark and compact approach are known as evacuating collectors. This strategy seems very similar to the mark-and-copy variation. The main difference is that mark-and-copy targets a different memory region and empties the current region. The Mark and Compact target's the same memory region. Mark-and-copy is designed to work on sparsely populated heaps. In contrast, mark-and-compact is designed to suit any kind of heap. The diagram below illustrates an evacuating collectors memory layout before and after a GC cycle.
The key thing to notice is that an evacuating collector is moving objects in the same heap area. The heap is divided into regions and objects are moved from one region to another.