Sunday 15 May 2011

Java Garbage Collection Process

Efficient memory management is important for running a software system smoothly. In this article I will describe my understanding of the Java garbage collection process. Any feedback is welcome. I hope that you will enjoy reading it.

So, what is garbage collection?

An application can create a large number of short-lived objects during its life span. These objects consume memory, and memory is not unlimited. Garbage collection (GC) is the process of reclaiming memory occupied by objects that are no longer needed (garbage) and making it available for new objects. An object is considered garbage when it can no longer be reached from any pointer in the running program.
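
To make "reachable" concrete, here is a toy sketch. It is not how HotSpot is implemented: objects are modeled as nodes in a small graph, and anything that cannot be reached from a set of roots counts as garbage. The names `ReachabilityDemo`, `Node` and `reachableFrom` are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of reachability; the real JVM works on raw memory, not on a graph like this.
final class ReachabilityDemo {

    // A stand-in for a heap object that may point to other objects.
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every node that can be reached by following references from the roots.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (seen.add(current)) {
                pending.addAll(current.references);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.references.add(b);   // a -> b
        c.references.add(a);   // c -> a, but nothing points to c

        Set<Node> live = reachableFrom(Arrays.asList(a));
        for (Node n : Arrays.asList(a, b, c)) {
            System.out.println(n.name + (live.contains(n) ? " is reachable" : " is garbage"));
        }
    }
}
```

Note that `c` is garbage even though it holds a reference to a live object: what matters is whether something still points to it, not what it points to.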

The heap plays a very important role in this process, because objects are allocated on the heap. In fact, to understand how Java garbage collection works we need to know how the heap is laid out in the Java Virtual Machine (JVM).

Heap

The heap is the memory area within the virtual machine where objects are born, live and die. It is divided into two parts:

(a) First part - young space contains recent objects, also called children.
(b) Second part - tenured space holds objects with a long life span, also called ancestors.

There is another memory area next to the heap, called Perm, in which the binary code of each loaded class is stored.

[Figure: heap layout showing the Young space (Eden and two Survivor spaces), the Tenured space and Perm, with their virtual regions.]

The young space is divided into Eden and two survivor spaces. Whenever a new object is allocated on the heap, the JVM puts it in Eden. The GC treats the two survivor spaces as temporary storage buckets. The young space (or generation) is for recent objects and the tenured space (or generation) is for old objects. Both the young and the tenured space contain a virtual space: a zone of memory available to the JVM but free of any data. This means those spaces can grow and shrink over time.

How the garbage collection (GC) process works:

Memory is managed in “generations”, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. The vast majority of objects are allocated in a pool dedicated to young objects (the young generation/space), and most objects die there. When the young generation fills up, it causes a “minor collection”. During a minor collection, the GC examines the objects in both Eden and the occupied survivor space to determine which ones are still alive, in other words which ones are still referenced from elsewhere in the program. Each live object is then copied into the empty survivor space.

At the end of a minor collection, both Eden and the explored survivor space are considered empty. As minor collections are performed, living objects move from one survivor space to the other. When an object reaches a given age, defined dynamically at runtime by HotSpot, or when the survivor space is too small to hold it, it is copied to the tenured space. Still, most objects are born and die right in the young space.
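
The flow described above can be sketched as a toy simulation. This is only a model under simplifying assumptions: liveness is a boolean flag set by hand, the survivor capacity is counted in objects rather than bytes, and the tenuring threshold is a fixed constant, whereas HotSpot chooses it dynamically. All names (`MinorCollectionDemo`, `Obj`, `TENURING_THRESHOLD`, `SURVIVOR_CAPACITY`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the young space; sizes and the threshold are made-up numbers.
final class MinorCollectionDemo {

    static final class Obj {
        final String name;
        boolean alive = true;   // stands in for "still reachable"
        int age = 0;            // number of minor collections survived
        Obj(String name) { this.name = name; }
    }

    static final int TENURING_THRESHOLD = 2;  // assumption; HotSpot picks this at runtime
    static final int SURVIVOR_CAPACITY = 2;   // assumption; counted in objects, not bytes

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();  // the occupied survivor space
    List<Obj> toSurvivor = new ArrayList<>();    // the empty survivor space
    final List<Obj> tenured = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);            // new objects always start in Eden
        return o;
    }

    void minorCollection() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.alive) continue;              // dead objects are simply not copied
            o.age++;
            boolean oldEnough = o.age >= TENURING_THRESHOLD;
            boolean survivorFull = toSurvivor.size() >= SURVIVOR_CAPACITY;
            if (oldEnough || survivorFull) {
                tenured.add(o);                  // promote to the tenured space
            } else {
                toSurvivor.add(o);               // copy to the empty survivor space
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // Swap roles: the survivor space just filled becomes the occupied one.
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    public static void main(String[] args) {
        MinorCollectionDemo heap = new MinorCollectionDemo();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        b.alive = false;                         // b dies young
        heap.minorCollection();
        heap.allocate("c");
        heap.minorCollection();                  // a reaches the threshold here
        System.out.println("survivor: " + heap.fromSurvivor.size()
                + ", tenured: " + heap.tenured.size());
    }
}
```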

Eventually, the tenured space/generation will fill up and must be collected, resulting in a major collection, in which the entire heap is collected. This is done with the help of the Mark-Sweep-Compact algorithm. During this process the GC runs through the objects in the heap, marks the ones that are still alive so that the memory of all the others can be reclaimed, and then runs through the heap again to compact the remaining objects and avoid memory fragmentation. At the end of this cycle, all living objects sit side by side in the tenured space.
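
As a rough illustration of the mark and compact steps, the sketch below treats the heap as an array of slots, where `null` means a free slot. It is a simplified model, not the HotSpot algorithm: the set of live objects is passed in instead of being discovered by tracing references, and the names `MarkCompactDemo` and `markSweepCompact` are invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy mark-sweep-compact over an array of slots (null = free slot).
final class MarkCompactDemo {

    // Returns a new "heap" in which live objects are slid to the front and the rest is free.
    static String[] markSweepCompact(String[] heap, Set<String> liveNames) {
        // Mark: flag the slots that hold live objects.
        boolean[] marked = new boolean[heap.length];
        for (int i = 0; i < heap.length; i++) {
            marked[i] = heap[i] != null && liveNames.contains(heap[i]);
        }
        // Sweep and compact: copy marked objects side by side; unmarked slots are dropped.
        String[] compacted = new String[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                compacted[next++] = heap[i];
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        String[] heap = { "a", null, "b", "c", null, "d" };
        Set<String> live = new HashSet<>(Arrays.asList("a", "c", "d"));
        System.out.println(Arrays.toString(markSweepCompact(heap, live)));
    }
}
```

After compaction the free space forms one contiguous block at the end, which is what avoids fragmentation.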

Performance

Throughput and pauses are the two primary measures of garbage collection performance.

Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation.
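
The definition can be turned into a small calculation. The sketch below reads the accumulated collection time through the standard `java.lang.management` API and divides it by the JVM uptime. It is an approximation: the reported time is what each collector accounts for, which for a concurrent collector is not the same thing as pause time. The class and method names (`GcThroughput`, `throughputPercent`, `totalGcMillis`) are made up.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcThroughput {

    // Throughput as a percentage: the share of total time NOT spent in garbage collection.
    static double throughputPercent(long totalMillis, long gcMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    // Sums the approximate accumulated collection time reported by each collector.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if the collector does not report it
            if (t > 0) {
                sum += t;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gc = totalGcMillis();
        System.out.printf("uptime=%d ms, gc=%d ms, throughput=%.2f%%%n",
                uptime, gc, throughputPercent(uptime, gc));
    }
}
```

For instance, 50 ms of collection in 1000 ms of run time gives a throughput of 95%.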

Pauses are the times when an application appears unresponsive because garbage collection is happening.

For example: in an interactive graphics program even short pauses may negatively affect the user experience, whereas pauses during garbage collection may be tolerable in a web server.

Two other issues should be taken into consideration: footprint and promptness.

Footprint is the working set of a process, measured in pages and cache lines.

Promptness is the time between when an object becomes dead and when its memory becomes available.

A very large young generation may maximize throughput at the expense of footprint, promptness and pause times. On the other hand, young generation pauses can be minimized by using a small young generation, at the expense of throughput.

There is no one right way to size generations. The best choice is determined by the way the application uses memory as well as user requirements.

Available Collectors

The Java HotSpot VM includes three different collectors, each with different performance characteristics:

(1) Serial Collector

  • it uses a single thread to perform all garbage collection work.
  • there is no communication overhead between threads.
  • it is best suited to single-processor machines.
  • it can be useful on multiprocessors for applications with small data sets (up to approximately 100MB).
  • it can be explicitly enabled with the option -XX:+UseSerialGC.

(2) Parallel/Throughput Collector

  • it performs minor collections in parallel, which can significantly reduce garbage collection overhead.
  • it is useful for applications with medium- to large-sized data sets that are run on multiprocessor or multithreaded hardware.
  • it can be explicitly enabled with the option -XX:+UseParallelGC.

(3) Concurrent Collector

  • it performs most of its work concurrently (i.e. while the application is still running) to keep garbage collection pauses short.
  • it is designed for applications with medium- to large-sized data sets for which response time is more important than overall throughput.
  • it can be explicitly enabled with the option -XX:+UseConcMarkSweepGC.
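
To check which collector a running JVM actually uses, one option is to ask the standard management API for the names of the active collectors. This is a small sketch; the class name `ActiveCollectors` is made up, and the reported names differ between collectors and JVM versions, so they are best treated as informational.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {

    // Names of the collectors registered in this JVM, as reported by the JVM itself.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running it once as `java -XX:+UseSerialGC ActiveCollectors` and once with a different collector option should print different names, provided the JVM in use supports those options.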

Default Settings

In the J2SE platform version 1.4.2, the following selections were made by default:

    - Serial Garbage Collector
    - Initial heap size of 4 Mbyte
    - Maximum heap size of 64 Mbyte
    - Client runtime compiler

In the J2SE platform version 1.5, a class of machine referred to as a server-class machine was defined as a machine with:

    1.  >= 2 physical processors
    2.  >= 2 Gbytes of physical memory

Default settings for this type of machine:

    - Throughput Garbage Collector
    - Initial heap size of 1/64 of physical memory up to 1 Gbyte
    - Maximum heap size of ¼ of physical memory up to 1 Gbyte
    - Server runtime compiler
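
The rules above can be written down as a few lines of arithmetic. This sketch only encodes the two conditions and the two heap defaults exactly as stated in this article; it ignores any platform-specific exceptions, and it takes the physical memory as a parameter because the standard API does not expose it. The names `ErgonomicsDemo`, `isServerClass`, `defaultInitialHeap` and `defaultMaxHeap` are hypothetical.

```java
final class ErgonomicsDemo {

    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Server-class as defined above: at least 2 processors and at least 2 Gbytes of memory.
    static boolean isServerClass(int processors, long physicalBytes) {
        return processors >= 2 && physicalBytes >= 2 * GB;
    }

    // Initial heap: 1/64 of physical memory, capped at 1 Gbyte.
    static long defaultInitialHeap(long physicalBytes) {
        return Math.min(physicalBytes / 64, GB);
    }

    // Maximum heap: 1/4 of physical memory, capped at 1 Gbyte.
    static long defaultMaxHeap(long physicalBytes) {
        return Math.min(physicalBytes / 4, GB);
    }

    public static void main(String[] args) {
        long physical = 2 * GB;   // example value, not read from the machine
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("server-class: " + isServerClass(processors, physical));
        System.out.println("initial heap: " + defaultInitialHeap(physical) / MB + " MB");
        System.out.println("maximum heap: " + defaultMaxHeap(physical) / MB + " MB");
    }
}
```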

Some of the HotSpot VM options can be used for tuning:

-Xms<size>

    - specifies the initial (minimum) size of the heap.
    - this option is used to avoid frequent resizing of the heap when the application needs a lot of memory.

-Xmx<size>

    - specifies the maximum size of the heap.
    - this option is used mainly by server-side applications that sometimes need several gigabytes of memory.

So the heap is allowed to grow and shrink between these two values defined by -Xms and -Xmx.
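
The current position of the heap between these two bounds can be observed from inside a program with `java.lang.Runtime`. A minimal sketch, with the made-up class name `HeapBounds`:

```java
final class HeapBounds {

    // Returns { used, committed, max } in bytes, as seen by the running JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();   // heap currently reserved by the JVM
        long free = rt.freeMemory();         // unused part of the committed heap
        long max = rt.maxMemory();           // upper bound the heap may grow to
        return new long[] { committed - free, committed, max };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = snapshot();
        System.out.println("used:      " + s[0] / mb + " MB");
        System.out.println("committed: " + s[1] / mb + " MB");
        System.out.println("max:       " + s[2] / mb + " MB");
    }
}
```

Started as `java -Xms64m -Xmx256m HeapBounds`, the committed value should start near the -Xms setting and the max value should be close to the -Xmx setting; the exact figures depend on the JVM.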

-XX:NewRatio=<a number>

    - specifies the size ratio between the tenured and young space.

For example: -XX:NewRatio=2 would yield a 64 MB tenured space and a 32 MB young space, together a 96 MB heap.
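
The arithmetic behind this example: with a ratio of tenured to young of N, the young space gets 1/(N+1) of the heap and the tenured space gets the rest. A small sketch that ignores any rounding or alignment the JVM may apply (class and method names are made up):

```java
final class NewRatioDemo {

    // Young space: one part out of (newRatio + 1) parts of the heap.
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    // Tenured space: whatever is left, i.e. newRatio parts.
    static long tenuredSize(long heap, int newRatio) {
        return heap - youngSize(heap, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 96;
        System.out.println("young:   " + youngSize(heapMb, 2) + " MB");
        System.out.println("tenured: " + tenuredSize(heapMb, 2) + " MB");
    }
}
```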

-XX:SurvivorRatio=<a number>

    - specifies the size ratio between Eden and one survivor space.

For example: with a ratio of 2 and a young space of 64 MB, Eden will need 32 MB of memory whereas each survivor space will use 16 MB.
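
Since there are two survivor spaces, a ratio of N splits the young space into N + 2 equal parts: N for Eden and one for each survivor space. A small sketch of that calculation, again ignoring any rounding the JVM may apply (names are made up):

```java
final class SurvivorRatioDemo {

    // Each survivor space: one part out of (survivorRatio + 2) parts of the young space.
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    // Eden: the young space minus the two survivor spaces.
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long youngMb = 64;
        System.out.println("eden:          " + edenSize(youngMb, 2) + " MB");
        System.out.println("each survivor: " + survivorSize(youngMb, 2) + " MB");
    }
}
```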

-XX:+PrintGCDetails

    - causes additional information about the collections to be printed.

-XX:MaxPermSize=<N>

    - using this option the maximum size of the permanent generation can be increased.
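
To see how the memory areas described in this article show up in a running JVM, the standard management API can list the memory pools with their current usage and limits. This is a sketch with the made-up class name `MemoryPools`. Pool names depend on the JVM version and the collector: on the JVMs this article covers a permanent generation pool is typically listed, while Java 8 and later replaced it with Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class MemoryPools {

    // One line per memory pool: name, type, used size and maximum size.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // may be null if the pool is not valid
            if (usage == null) {
                continue;
            }
            String max = usage.getMax() < 0
                    ? "undefined"
                    : (usage.getMax() / (1024 * 1024)) + " MB";
            lines.add(pool.getName() + " [" + pool.getType() + "]: used "
                    + (usage.getUsed() / 1024) + " KB, max " + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```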

References:

1. Know Your Worst Friend, the Garbage Collector by Romain Guy.

2. Virtual Machine Garbage Collection Tuning

3. Ergonomics in the 5.0 Java™ Virtual Machine
