Garbage Collection in Java

  • Garbage collection is the process of automatically reclaiming memory that is no longer used at runtime. In other words, it is a way to destroy unused objects.
  • Unlike C and C++, where the developer has to free memory manually, Java does it automatically.

Advantages of Garbage Collection

  • It makes Java memory-efficient, because the garbage collector removes unreferenced objects from heap memory.
  • It is done automatically by the garbage collector (a part of the JVM), so we don’t need to make extra efforts.

Why do you need to study garbage collection?

  • The garbage collection algorithm runs in the background alongside all Java threads, and it consumes CPU time.
  • If your program discards objects that it will soon need again (so the garbage collector frees them), then:
    • CPU time is spent freeing heap memory with the GC algorithm.
    • Your program keeps re-creating the same objects in memory shortly afterwards (after garbage collection).
    • Roughly twice the CPU work is spent creating and garbage-collecting the same objects, which slows execution (and uses memory inefficiently).
    • You have to allocate more RAM to the JVM to handle large data that is blindly garbage collected, which is bad practice.

How can an object be unreferenced?

  • By nulling the reference (e1 = null;)
  • By assigning one reference to another (e1 = e2;)
  • By using an anonymous object (new Employee();)
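
A minimal sketch of the three cases above. The Employee class and its name field are hypothetical, introduced only for illustration. Nothing here forces a collection; the code only makes objects unreachable, which is what makes them eligible.

```java
// Hypothetical Employee class, used only for illustration.
class Employee {
    final String name;

    Employee(String name) {
        this.name = name;
    }
}

class UnreferenceDemo {
    // Way 1: nulling the reference.
    static void nullingTheReference() {
        Employee e1 = new Employee("Alice");
        e1 = null; // the "Alice" object now has no references
    }

    // Way 2: assigning one reference to another.
    static String reassigningTheReference() {
        Employee e1 = new Employee("Alice");
        Employee e2 = new Employee("Bob");
        e1 = e2; // the "Alice" object now has no references
        return e1.name; // both variables now point to the "Bob" object
    }

    // Way 3: an anonymous object is never stored in a variable.
    static void anonymousObject() {
        new Employee("Carol"); // unreachable as soon as this statement completes
    }

    public static void main(String[] args) {
        nullingTheReference();
        System.out.println(reassigningTheReference()); // prints "Bob"
        anonymousObject();
    }
}
```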

finalize() method

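The finalize() method is invoked by the garbage collector before an object's memory is reclaimed (at most once per object, and there is no guarantee that it runs at all). It can be used to perform cleanup processing. This method is defined in the Object class and has been deprecated since Java 9.

protected void finalize() throws Throwable {}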

gc() method

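The gc() method requests that the JVM run the garbage collector to perform cleanup processing. gc() is found in the System and Runtime classes:

public static void gc()   // java.lang.System
public void gc()          // java.lang.Runtime, called as Runtime.getRuntime().gc()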

Example:

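public class TestGarbage1 {
	// Called before the object's memory is reclaimed; deprecated since Java 9,
	// used here only to make the collection visible
	@Override
	protected void finalize() throws Throwable {
		System.out.println("object is garbage collected");
		super.finalize();
	}

	public static void main(String[] args) throws InterruptedException {
		TestGarbage1 s1 = new TestGarbage1();
		TestGarbage1 s2 = new TestGarbage1();
		s1 = null; // the first object is now unreachable
		s2 = null; // the second object is now unreachable
		System.gc(); // only a request; the JVM may ignore it
		Thread.sleep(100); // give the finalizer thread a chance to run before main exits
	}
}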

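Output (typical; not guaranteed, since the JVM may ignore the request or exit before the finalizers run):

object is garbage collected
object is garbage collected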

How does garbage collection work internally in Java?

  • It finds the unused objects.
  • It deletes them to free up the memory.

The garbage collection mechanism uses several GC algorithms to perform these tasks effectively.

Garbage Collector Overview

  • When a program executes in Java, it uses memory in different ways.
  • The heap is the part of memory where objects live.
  • The heap is the only part of memory involved in the garbage collection process.
  • It is also known as the garbage-collectible heap.
  • Garbage collection tries to keep as much free space in the heap as possible.
  • The function of the garbage collector is to find and delete the objects that cannot be reached.

Important Points About Garbage Collector

  • It is carried out by JVM thread(s) known as the garbage collector.
  • Java provides two methods, System.gc() and Runtime.getRuntime().gc(), that send a request to the JVM for garbage collection. Remember, garbage collection is not guaranteed to happen.
  • Java programmers are freed from manual memory management. We cannot force the garbage collector to collect garbage; that decision belongs to the JVM.
  • If the heap is full and the garbage collector cannot free enough space, the JVM will not allow a new object to be created and throws java.lang.OutOfMemoryError.
  • Before the garbage collector reclaims an object that overrides finalize(), that object's finalize() method is called first; only then is the memory reclaimed.
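
The sketch below sends both kinds of request and compares heap usage before and after, using the standard Runtime methods totalMemory(), freeMemory() and maxMemory(). The class name GcRequestDemo is hypothetical, and the numbers it prints vary between runs and JVMs: the drop after the request may be large, small, or zero, because the JVM is free to ignore it.

```java
class GcRequestDemo {
    // Used heap = heap currently committed by the JVM minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Create short-lived garbage: each array is unreachable after its iteration.
        for (int i = 0; i < 1_000; i++) {
            byte[] garbage = new byte[10_000];
            garbage[0] = 1;
        }
        long before = usedHeapBytes();

        // Both calls only *request* a collection; the JVM may ignore them
        // (for example when started with -XX:+DisableExplicitGC).
        System.gc();
        Runtime.getRuntime().gc();

        long after = usedHeapBytes();
        System.out.println("Used heap before request: " + before + " bytes");
        System.out.println("Used heap after request:  " + after + " bytes");
        System.out.println("Max heap:                 " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```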

Object Allocation

  • First, the JVM checks whether the object is small or large.
  • The size limit between small and large objects is usually between 2 KB and 128 KB, depending on the JVM, heap size and platform.
  • Allocating large objects usually requires synchronization between threads.
  • Small objects are stored in a Thread Local Area (TLA).
    • TLA: a free chunk of the heap reserved for one thread; when it is full, the thread requests a new TLA.
  • Large objects are allocated directly in the heap.
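
To make the small/large split concrete, here is a toy model (not how any particular JVM is implemented) of bump-pointer allocation with thread-local areas. ToyHeap is a hypothetical class, offsets stand in for memory addresses, and the 2 KB threshold and 16 KB TLA size are assumed values. What it shows is that only the shared pointer needs a lock; allocating inside a thread's own TLA does not.

```java
// A toy model of heap allocation with thread-local areas (TLAs).
// Sizes are assumptions for illustration; real JVMs use different, tunable values.
class ToyHeap {
    static final int LARGE_OBJECT_THRESHOLD = 2 * 1024; // assumed small/large cut-off (bytes)
    static final int TLA_SIZE = 16 * 1024;              // assumed TLA chunk size (bytes)

    private final long capacity;
    private long top = 0; // shared bump pointer, guarded by this object's lock

    // Per-thread TLA state: [0] = next free offset in the TLA, [1] = end of the TLA.
    private final ThreadLocal<long[]> tla = ThreadLocal.withInitial(() -> new long[] {0, 0});

    ToyHeap(long capacity) {
        this.capacity = capacity;
    }

    // Shared allocation: synchronized because every thread bumps the same pointer.
    synchronized long allocateShared(long size) {
        if (top + size > capacity) {
            throw new OutOfMemoryError("toy heap is full");
        }
        long address = top;
        top += size;
        return address;
    }

    long allocate(int size) {
        if (size > LARGE_OBJECT_THRESHOLD) {
            return allocateShared(size); // large object: straight into the shared heap
        }
        long[] area = tla.get();
        if (area[0] + size > area[1]) {
            // TLA exhausted (or not yet created): request a fresh chunk, the only locked step.
            long start = allocateShared(TLA_SIZE);
            area[0] = start;
            area[1] = start + TLA_SIZE;
        }
        long address = area[0];
        area[0] += size; // no lock needed: only this thread touches its TLA
        return address;
    }
}
```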

When an Object becomes eligible for Garbage Collection?

  • If the reference of that object is explicitly set to null.
  • The object also becomes eligible if it is created inside a block and its reference goes out of scope once control exits the block.
  • Each Java program inside the JVM has multiple threads, and each thread has its own execution stack. If no thread can reach the object from its execution stack, the object becomes eligible for garbage collection.
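
One way to observe eligibility is a java.lang.ref.WeakReference, which does not keep its referent alive. In this sketch (EligibilityDemo is a hypothetical name) the only strong reference lives inside a block, so once the block ends the object is eligible. Whether it is actually collected after System.gc() depends on the JVM, so either message may appear.

```java
import java.lang.ref.WeakReference;

class EligibilityDemo {
    // Creates an object inside a block; its only strong reference dies with the block.
    static WeakReference<Object> createInBlock() {
        WeakReference<Object> weak;
        {
            Object local = new Object();        // strongly reachable inside this block
            weak = new WeakReference<>(local);  // a weak reference does not keep it alive
        }                                       // 'local' is out of scope from here on
        return weak;
    }

    public static void main(String[] args) {
        WeakReference<Object> ref = createInBlock();
        System.gc(); // a request only; collection is not guaranteed
        // get() returns null once the referent has been collected.
        System.out.println(ref.get() == null
                ? "object was collected"
                : "object is eligible but has not been collected yet");
    }
}
```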

Who controls the garbage collector?

  • The JVM controls the garbage collector.
  • The JVM decides when to perform garbage collection.
  • You can also request that the JVM run the garbage collector, but there is no guarantee that the JVM will comply.
  • The JVM runs the garbage collector if it senses that memory is running low.

Types of Garbage Collection

  • Serial GC: It uses a mark-and-sweep-based approach for both the young and old generations (minor and major GC), on a single thread.
  • Parallel GC: It is similar to serial GC, except that it spawns N threads (by default based on the number of CPU cores in the system) for young generation garbage collection.
  • Parallel Old GC: It is similar to parallel GC, except that it uses multiple threads for both generations.
  • Concurrent Mark Sweep (CMS) Collector: It does the garbage collection for the old generation. You can limit the number of threads in the CMS collector using the -XX:ParallelCMSThreads=n JVM option. It is also known as the Concurrent Low Pause Collector.
  • G1 Garbage Collector: It was introduced in Java 7. Its objective is to replace the CMS collector. It is a parallel, concurrent, and incrementally compacting collector. Instead of fixed, contiguous young and old generation spaces, it divides the heap into many equal-sized regions. It first collects the regions with the least live data.
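
To check which of these collectors a running JVM is actually using, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector. The sketch below (ActiveCollectors is a hypothetical name) prints their names and statistics. The names are JVM-specific; HotSpot with G1, for example, reports names such as "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Returns the names of the collectors the running JVM is using.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() may return -1 if undefined.
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it with different flags (for example -XX:+UseSerialGC or -XX:+UseG1GC) is a quick way to confirm which collector was selected.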

GC Customization Options

Flag | Description
-Xms2048m -Xmx3g | Sets the initial and maximum heap size (young space plus tenured space).
-XX:+DisableExplicitGC | Causes the JVM to ignore any System.gc() calls made by the application.
-XX:+UseGCOverheadLimit | Enables a policy that limits the time spent in garbage collection before an OutOfMemoryError is thrown.
-XX:GCTimeLimit=95 | Limits the proportion of time spent in garbage collection before an OutOfMemoryError is thrown. Used with GCHeapFreeLimit.
-XX:GCHeapFreeLimit=5 | Sets the minimum percentage of free space after a full garbage collection before an OutOfMemoryError is thrown. Used with GCTimeLimit.
-XX:InitialHeapSize=3g | Sets the initial heap size (young space plus tenured space).
-XX:MaxHeapSize=3g | Sets the maximum heap size (young space plus tenured space).
-XX:NewSize=128m | Sets the initial size of the young space.
-XX:MaxNewSize=128m | Sets the maximum size of the young space.
-XX:SurvivorRatio=15 | Sets the ratio of Eden space to a single survivor space (here, each survivor space is 1/15 of Eden).
-XX:PermSize=512m | Sets the initial size of the permanent space (Java 7 and earlier; replaced by Metaspace in Java 8).
-XX:MaxPermSize=512m | Sets the maximum size of the permanent space (Java 7 and earlier).
-Xss512k | Sets the stack size of each thread.
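
A quick way to confirm which of these options took effect is to read them back from inside the application. This sketch (HeapSettings is a hypothetical name) prints the JVM options via RuntimeMXBean.getInputArguments() and the heap limits via Runtime. maxMemory() approximates -Xmx but may report slightly less, and totalMemory() is the currently committed heap, which starts near -Xms and can grow.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapSettings {
    // Returns the JVM options (such as -Xmx3g) this process was started with.
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("JVM options:  " + jvmOptions());
        System.out.println("Max heap:     " + rt.maxMemory() / mb + " MB");   // roughly -Xmx
        System.out.println("Current heap: " + rt.totalMemory() / mb + " MB"); // starts near -Xms
        System.out.println("Free in heap: " + rt.freeMemory() / mb + " MB");
    }
}
```

For example, compile it and run java -Xms256m -Xmx512m HeapSettings, then compare the printed values with the flags.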

GC logging flags

Flag | Description
-verbose:gc or -XX:+PrintGC | Prints basic garbage collection information.
-XX:+PrintGCDetails | Prints more detailed garbage collection information.
-XX:+PrintGCTimeStamps | Prints a timestamp for each garbage collection event, in seconds since JVM start.
-XX:+PrintGCDateStamps | Prints a date stamp for each garbage collection event.
-Xloggc:<file> | Redirects garbage collection output to a file instead of the console.
-XX:+PrintTenuringDistribution | Prints detailed information about the young space after each collection cycle.
-XX:+PrintTLAB | Prints TLAB allocation statistics.
-XX:+PrintReferenceGC | Prints the time spent processing references (weak, soft, and so on) during stop-the-world pauses.
-XX:+HeapDumpOnOutOfMemoryError | Creates a heap dump file when an OutOfMemoryError occurs.

Note: from Java 9 onward, GC logging moved to unified logging (for example, -Xlog:gc*), and most of the Print* flags above were removed or deprecated.

GC Log Rotation

Flag | Description
-XX:+UseGCLogFileRotation | Enables log file rotation.
-XX:NumberOfGCLogFiles=<number of files> | Must be >= 1; the default is 1.
-XX:GCLogFileSize=<number>M (or K) | The default is 512K.

GC Algorithms

1 – Mark and Sweep

  • It is the initial and most basic algorithm, and it runs in two stages:
    1. Marking live objects – find out all objects that are still alive.
    2. Removing unreachable objects – get rid of everything else, i.e. the supposedly dead and unused objects.
  • To start with, the GC defines some specific objects as garbage collection roots, e.g. local variables and input parameters of the currently executing methods, active threads, static fields of loaded classes, and JNI references. The GC then traverses the whole object graph in memory, starting from those roots and following references from the roots to other objects. Every object the GC visits is marked as alive.

Note: The application threads need to be stopped for marking to happen, because the GC cannot reliably traverse the graph while it keeps changing. This is called a stop-the-world pause.
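
The marking stage is a graph traversal. Below is a minimal sketch over a hypothetical Node type standing in for heap objects; mark() follows references from a list of roots and returns the set of live objects. It assumes the graph does not change while it runs, which is exactly why real marking needs the stop-the-world pause described in the note. The unreachable cycle C <-> D is still garbage: reachability from the roots, not the mere existence of references, decides liveness.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MarkPhase {
    // A toy heap object: a name plus outgoing references to other objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Marks every object reachable from the roots (iterative depth-first traversal).
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (marked.add(n)) { // add() returns false if n was already marked
                for (Node child : n.refs) {
                    stack.push(child);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b); // A -> B
        c.refs.add(d); // C -> D, but no root reaches C
        d.refs.add(c); // D -> C: a cycle that is still garbage
        Set<Node> live = mark(List.of(a)); // A is the only root
        for (Node n : List.of(a, b, c, d)) {
            System.out.println(n.name + (live.contains(n) ? " is alive" : " is garbage"));
        }
    }
}
```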

The second stage gets rid of unused objects to free up memory. This can be done in a variety of ways, e.g.:

Normal deletion (mark-sweep algorithm)

Normal deletion removes unreferenced objects to free space and leaves referenced objects and pointers in place. The memory allocator holds references to blocks of free space (for example, in a free list) where new objects can be allocated.

[Figure: Normal Deletion – Mark and Sweep]
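
A toy sketch of the sweep step and a free-list allocator, assuming the heap is a list of address-ordered blocks that the marking phase has already flagged. MarkSweep, Block and FreeChunk are hypothetical names (records need Java 16 or later). Live blocks stay where they are; dead blocks and already-free neighbors are merged into free chunks, and allocation does a first-fit search of the free list.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweep {
    // One block of the toy heap, covering [start, start + size).
    record Block(int start, int size, boolean inUse, boolean marked) {}

    // A free region the allocator can hand out.
    record FreeChunk(int start, int size) {}

    // Sweep: unmarked blocks become free; live blocks are not moved.
    // Adjacent free blocks are merged into one chunk.
    static List<FreeChunk> sweep(List<Block> heap) {
        List<FreeChunk> freeList = new ArrayList<>();
        for (Block b : heap) {
            boolean live = b.inUse() && b.marked();
            if (live) {
                continue;
            }
            int last = freeList.size() - 1;
            if (last >= 0 && freeList.get(last).start() + freeList.get(last).size() == b.start()) {
                FreeChunk prev = freeList.remove(last); // merge with the preceding free chunk
                freeList.add(new FreeChunk(prev.start(), prev.size() + b.size()));
            } else {
                freeList.add(new FreeChunk(b.start(), b.size()));
            }
        }
        return freeList;
    }

    // First-fit lookup: start of the first chunk big enough, or -1 if none fits.
    static int findFit(List<FreeChunk> freeList, int size) {
        for (FreeChunk c : freeList) {
            if (c.size() >= size) {
                return c.start();
            }
        }
        return -1;
    }
}
```

For example, with live 10-unit blocks at 0 and 30, dead blocks at 10 and 20, and a free 5-unit block at 40, sweep() yields free chunks of 20 and 5 units. That is 25 free units in total, yet findFit(freeList, 25) returns -1: the fragmentation problem described next.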

Deletion with compacting (mark-sweep-compact algorithm)

  • Only removing unused objects is not efficient, because the blocks of free memory end up scattered across the storage area. If a newly created object is big enough that no single free block can hold it, this can cause an OutOfMemoryError.
  • To solve this issue, after deleting unreferenced objects, the remaining referenced objects are compacted. Compacting refers to moving the referenced objects together, which makes new memory allocation much easier and faster.
[Figure: Deletion with compacting]
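
Compaction can be sketched as computing new addresses for the live blocks so that they slide toward the start of the heap in their original order. In this toy version (MarkCompact and Block are hypothetical names; records need Java 16 or later), the heap is assumed to be a list of address-ordered blocks, and compact() returns a forwarding table from old to new addresses. A real collector would then move the objects and rewrite every reference using such a table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MarkCompact {
    // One block of the toy heap, covering [start, start + size); marked = live.
    record Block(int start, int size, boolean marked) {}

    // Slides every live block toward address 0, keeping their order.
    // Returns the forwarding table: old start address -> new start address.
    static Map<Integer, Integer> compact(List<Block> heap) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int next = 0; // next free address in the compacted heap
        for (Block b : heap) {
            if (b.marked()) {
                forwarding.put(b.start(), next);
                next += b.size();
            }
        }
        return forwarding;
    }

    // After compaction, all free space is one contiguous chunk starting here.
    static int compactedTop(List<Block> heap) {
        int top = 0;
        for (Block b : heap) {
            if (b.marked()) {
                top += b.size();
            }
        }
        return top;
    }
}
```

For example, with live 10-unit blocks at addresses 0 and 30 in a 45-unit heap, the block at 30 moves to 10 and the free space becomes one chunk from 20 to 45, so a 25-unit allocation now fits.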

Deletion with copying (mark-copy algorithm)

  • It is very similar to the mark-and-compact approach, as it too relocates all live objects. The important difference is that the target of relocation is a different memory region.
[Figure: Deletion with copying]
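
Copying can be sketched with a Cheney-style breadth-first copy: live objects are copied into an empty to-space, and a scan index walks the copies to redirect their references. Obj and MarkCopy are hypothetical names, and this is an illustration of the technique rather than any JVM's implementation. Unreachable objects are never visited, so the whole from-space can be reused afterwards without a separate sweep; a real collector would also update the roots to point at the copies.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class MarkCopy {
    // A toy heap object with outgoing references.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Copies everything reachable from the roots into a fresh to-space.
    static List<Obj> copy(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        for (Obj root : roots) {
            forward(root, toSpace, forwarding);
        }
        // 'scan' walks the to-space; copies appended during the walk get scanned too.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj current = toSpace.get(scan);
            for (int i = 0; i < current.refs.size(); i++) {
                // Redirect each reference from the old object to its copy.
                current.refs.set(i, forward(current.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    // Returns the copy of 'old', creating it on the first visit.
    private static Obj forward(Obj old, List<Obj> toSpace, Map<Obj, Obj> forwarding) {
        Obj existing = forwarding.get(old);
        if (existing != null) {
            return existing;
        }
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs); // still points at old objects until scanned
        forwarding.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }
}
```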

2 – Concurrent Mark and Sweep (CMS)

  • CMS garbage collection is essentially an upgraded mark-and-sweep method.
  • It scans heap memory using multiple threads, which lets it take advantage of multi-core systems.
  • It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads.
  • It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.
  • To use CMS GC, use the JVM argument -XX:+UseConcMarkSweepGC (CMS was deprecated in Java 9 and removed in Java 14).

CMS GC Optimization Options

Flag | Description
-XX:+UseCMSInitiatingOccupancyOnly | Indicates that you want to use occupancy as the only criterion for starting a CMS collection.
-XX:CMSInitiatingOccupancyFraction=70 | Sets the CMS generation occupancy percentage at which a CMS collection cycle starts.
-XX:CMSTriggerRatio=70 | Sets the percentage of MinHeapFreeRatio in the CMS generation that is allocated before a CMS cycle starts.
-XX:CMSTriggerPermRatio=90 | Sets the percentage of MinHeapFreeRatio in the CMS permanent generation that is allocated before a CMS collection cycle starts.
-XX:CMSWaitDuration=2000 | Specifies how long (in milliseconds) CMS is allowed to wait for a young collection.
-XX:+UseParNewGC | Uses the parallel algorithm for young space collection.
-XX:+CMSConcurrentMTEnabled | Enables the use of multiple threads for the concurrent phases.
-XX:ConcGCThreads=2 | Sets the number of parallel threads used for the concurrent phases.
-XX:ParallelGCThreads=2 | Sets the number of parallel threads used for the stop-the-world phases.
-XX:+CMSIncrementalMode | Enables the incremental CMS (iCMS) mode.
-XX:+CMSClassUnloadingEnabled | Enables class unloading in CMS; if it is not enabled, CMS will not clean the permanent space.
-XX:+ExplicitGCInvokesConcurrent | Lets System.gc() trigger a concurrent collection instead of a full garbage collection cycle.

3 – Serial garbage collection

  • This algorithm uses mark-copy for the Young Generation and mark-sweep-compact for the Old Generation. It works on a single thread. While it runs, it freezes all application threads until garbage collection has finished.
  • Because serial garbage collection freezes the application, it is practical mainly for small programs.
  • To use Serial GC, use JVM argument: -XX:+UseSerialGC

4 – Parallel garbage collection

  • It is the default GC up to Java 8.
  • Similar to serial GC, it uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation, but multiple threads are used for the marking and copying/compacting phases. You can configure the number of threads using the -XX:ParallelGCThreads=N option.
  • Parallel Garbage Collector is suitable on multi-core machines in cases where your primary goal is to increase throughput by efficient usage of existing system resources. Using this approach, GC cycle times can be considerably reduced.

5 – G1 garbage collection

  • The G1 (Garbage First) garbage collector became available in Java 7 and is designed to be the long-term replacement for the CMS collector. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.
  • This approach splits the heap into many small regions (typically around 2048). Each region is marked as either young generation (further divided into eden regions or survivor regions) or old generation. This lets the GC avoid collecting the entire heap at once and instead approach the problem incrementally: only a subset of the regions is considered at a time.
  • G1 keeps track of the amount of live data that each region contains. This information is used to determine which regions contain the most garbage, so that they are collected first. That’s why it is named garbage-first collection.
  • Just like other algorithms, the compacting operation unfortunately takes place using the stop-the-world approach. But as per its design goal, you can give it specific performance goals. You can configure the pause duration, e.g. no more than 10 milliseconds in any given second. Garbage-First GC will do its best to meet this goal with high probability (but not with certainty, which would require hard real-time guarantees that OS-level thread management does not provide).
  • If you want to use it on Java 7 or Java 8, use the JVM argument -XX:+UseG1GC.
[Figure: Memory regions marked – G1]
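
The garbage-first idea can be sketched as a greedy choice: sort regions by reclaimable space and take them until the pause-time target would be exceeded. The Region record, its fields and the per-region cost estimates are hypothetical (records need Java 16 or later); real G1 builds its collection set with pause-prediction models, so treat this only as an illustration of the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GarbageFirst {
    // Hypothetical per-region bookkeeping: live bytes, region size,
    // and an estimated cost (ms) to evacuate the region.
    record Region(int id, int liveBytes, int regionBytes, double estimatedMillis) {
        int garbageBytes() {
            return regionBytes - liveBytes;
        }
    }

    // Picks regions with the most garbage first, stopping before the pause target is exceeded.
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseTargetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt(Region::garbageBytes).reversed());
        List<Region> chosen = new ArrayList<>();
        double budget = pauseTargetMillis;
        for (Region r : sorted) {
            if (r.estimatedMillis() > budget) {
                break; // the next region would push the pause past the target
            }
            chosen.add(r);
            budget -= r.estimatedMillis();
        }
        return chosen;
    }
}
```

For example, with a 10 ms target and regions holding 1000, 900, 500 and 100 bytes of garbage (costing 3, 4, 4 and 5 ms), the sketch picks the 1000-byte and 900-byte regions and stops, because the third would exceed the target.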

G1 Optimization Options

Flag | Description
-XX:G1HeapRegionSize=16m | Sets the size of a heap region. The value is a power of two and can range from 1 MB to 32 MB. The goal is to have around 2048 regions based on the minimum Java heap size.
-XX:MaxGCPauseMillis=200 | Sets a target for the maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.
-XX:G1ReservePercent=5 | Sets the percentage of the heap kept free as a reserve, to reduce the risk of to-space overflow.
-XX:G1ConfidencePercent=75 | Sets the confidence coefficient for the pause prediction heuristics.
-XX:GCPauseIntervalMillis=200 | Sets the pause interval time slice per MMU, in milliseconds.
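
The region-size rule in the table can be worked through with a small helper. This is a simplified sketch that follows the table's description (about 2048 regions, a power of two, clamped to 1 MB to 32 MB); G1RegionSize is a hypothetical name, and HotSpot's exact formula differs between JDK versions (for example, in which heap size it divides).

```java
class G1RegionSize {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    // Simplified rule: aim for ~2048 regions, round down to a power of two,
    // and clamp the result to the 1 MB .. 32 MB range.
    static long regionSize(long heapBytes) {
        long ideal = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(ideal); // largest power of two <= ideal
        return Math.min(Math.max(powerOfTwo, MIN_REGION), MAX_REGION);
    }
}
```

For a 4 GB heap this gives 2 MB regions; for a 3 GB heap it gives 1 MB (1.5 MB rounded down to a power of two); and for a 100 GB heap it is capped at 32 MB.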

Summary

  1. The object life cycle is divided into 3 phases: object creation, object in use, and object destruction.
  2. How the mark-sweep, mark-sweep-compact and mark-copy mechanisms work.
  3. Different single-threaded and concurrent GC algorithms.
  4. Up to Java 8, parallel GC was the default algorithm.
  5. Since Java 9, G1 has been the default GC algorithm.
  6. Various flags to control the garbage collection algorithm’s behavior and log useful information for any application.
