Log Management in the Cloud Age

In traditional systems, logs are lines of text intended for offline human consumption. With the advent of Cloud and Big Data, there is a paradigm shift in what can be logged. Systems can now log any piece of structured or unstructured data, application logs, transactions, audit logs, alarms, statistics or even tweets. Add to this the scale of logs. The earlier methodology of human analysis would not work in this kind of scenario. There has to be some automated mechanism for log analysis and deciphering useful information from them.

The trio of Logstash, Kibana and Elasticsearch is one of the most popular open source solutions for log management. Together, the three products are known as the ELK stack.

Elasticsearch is a distributed, flexible, RESTful search and analytics engine built on Apache Lucene. It goes beyond simple full-text search. It organizes data into indices, each of which can be divided into shards (comparable to partitions in an RDBMS), and each shard can have zero or more replicas. This helps in providing near real-time search. Elasticsearch provides a robust set of APIs and a query DSL, in addition to clients for most popular programming languages.

Elasticsearch was built from the ground up to handle any kind of data. It can slice and aggregate data on the fly, based on any field in the logs, which turns raw logs into valuable insights.

Kibana is a data visualization engine used along with Elasticsearch. It lets you interact with all the data in Elasticsearch through dynamic, shareable and exportable dashboards. Kibana’s user interface makes on-the-fly data analysis a breeze, using pre-designed or custom dashboards that update in real time. Kibana is easy to set up and integrates with different log aggregators such as Logstash and Apache Flume. See below for a sample Kibana dashboard:

kibana

Logstash is one of the most popular open source shippers/processors for logs and events. It takes logs and other time-based events from any system as input, processes them, and stores the data in a single place for additional processing. It scrubs the logs and parses all data sources into an easy-to-read JSON format. This means that your logging data can be analyzed in near real time. You can then use Kibana to explore and monitor the analytics. The Logstash, Elasticsearch and Kibana pipeline is illustrated below:

file-logstash-es-kibana

The ELK stack is a very powerful tool for monitoring and analyzing cloud-scale logs.

Posted in Cloud

Java 8 – What’s it got for you

Java 8 was released in March 2014 and promises to be revolutionary in nature. It is packed with some really exciting features at both the JVM and language level. Java 8 encompasses both Java SE 8 and Java ME 8 and might just be the most significant expansion of the Java platform yet. Lambda expressions along with the Stream API increase the expressive power of the platform and make it easier to develop applications for modern, multicore processors. Compact Profiles in Java SE 8 are a key step towards convergence of Java SE and Java ME and allow developers to use just a subset of the platform. Java 8 lets the same skills be used in diverse scenarios, from embedded Internet of Things (IoT) devices to enterprise servers in the cloud.

Let’s now look at some key features of Java 8.

Language Features

Lambda Expressions

The most dominant feature of Java 8 is the support for lambda expressions (closures). This addition brings a functional programming style to Java, a style already familiar from JVM languages such as Scala and Clojure. Lambda expressions are anonymous methods that provide developers with a simple and compact means of representing behavior as data. This helps in the development of libraries that abstract behavior, leading to more expressive, less error-prone code. Lambda expressions can replace most uses of anonymous inner classes that implement a single method. Consider the example below:

processSomething(new Foo() {
    public boolean isSuitable(int value) {
        return value == 11;
    }
});

This now gets simplified to:

processSomething(result -> result == 11);
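The two forms above can be compared side by side in a small, self-contained sketch. The Foo interface and the two-argument processSomething method here are hypothetical stand-ins (the original does not show their definitions); the point is only that the anonymous class and the lambda behave identically.

```java
import java.util.Arrays;
import java.util.List;

class LambdaDemo {
    // Hypothetical single-method interface standing in for Foo.
    interface Foo {
        boolean isSuitable(int value);
    }

    // Hypothetical helper: counts how many values the given Foo accepts.
    static int processSomething(List<Integer> values, Foo foo) {
        int count = 0;
        for (int v : values) {
            if (foo.isSuitable(v)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 11, 7, 11);

        // Pre-Java 8 style: anonymous inner class.
        int viaAnonymousClass = processSomething(values, new Foo() {
            @Override
            public boolean isSuitable(int value) {
                return value == 11;
            }
        });

        // Java 8 style: the same behavior as a lambda expression.
        int viaLambda = processSomething(values, value -> value == 11);

        System.out.println(viaAnonymousClass + " " + viaLambda);
    }
}
```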

Functional Interfaces

Java 8 introduces the notion of a functional interface, defined as an interface with exactly one abstract method (a single abstract method, or SAM, interface). This applies even to interfaces that were created with previous versions of Java, such as Runnable and Comparator. A functional interface can have other methods (default or static), but only one abstract method.

There are several new functional interfaces introduced in the package java.util.function.

  • Function<T,R> – takes an object of type T and returns R.
  • Supplier<T> – just returns an object of type T.
  • Predicate<T> – returns a boolean value based on input of type T.
  • Consumer<T> – performs an action with given object of type T.
  • BiFunction<T,U,R> – like Function but with two parameters.
  • BiConsumer<T,U> – like Consumer but with two parameters.

There are several interfaces for primitive types like:

  • IntConsumer
  • IntFunction
  • IntPredicate
  • IntSupplier

The most interesting part of functional interfaces is that anything that fulfills their contract can be assigned to them, whether a lambda expression or a method reference. See the following code sample:

Function<String, String> newAddress = (address) -> {return "@" + address;};
Function<String, Integer> size = (address) -> address.length();
Function<String, Integer> size2 = String::length;

The first line defines a function that adds “@” as a prefix to a String. The last two lines define functions that do the same thing: get the length of a String. Based on the context, the Java compiler converts the method reference to String’s length() method into a Function (functional interface) whose apply method takes a String and returns an Integer. For example:

for (String s : args) System.out.println(size2.apply(s));

This would print out the lengths of the given strings.

You can also define custom functional interfaces. Use the @FunctionalInterface annotation to ensure the contract is honored: the compiler reports an error if the annotated interface does not have exactly one abstract method.
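As a minimal sketch of a custom functional interface, consider the following. The Transformer interface and its andThen method are made up for illustration; they are not part of the JDK. The default method does not count against the single-abstract-method rule.

```java
class CustomFunctionalInterfaceDemo {
    // Hypothetical custom functional interface (not a JDK type).
    @FunctionalInterface
    interface Transformer {
        String transform(String input); // the single abstract method

        // Default methods are allowed and do not break the SAM contract.
        default Transformer andThen(Transformer next) {
            return s -> next.transform(transform(s));
        }
    }

    // Builds a small chain from method references and applies it.
    static String run(String input) {
        Transformer trim = String::trim;
        Transformer upper = String::toUpperCase;
        return trim.andThen(upper).transform(input);
    }

    public static void main(String[] args) {
        System.out.println(run("  elk stack "));
    }
}
```

Adding a second abstract method to Transformer would make the @FunctionalInterface annotation fail at compile time.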

Method References

Method references let us reuse an existing method as a lambda expression. This helps as a lambda expression is like an object-less method. For example, let’s say you want to find out file permissions. Assume you have the following set of methods for determining permissions:

public class FilePermissions {
    public static boolean fileIsReadOnly(File file) {/*code*/}
    public static boolean fileIsReadWrite(File file) {/*code*/}
    public static boolean fileIsWriteOnly(File file) {/*code*/}
}

If you want to find out permissions for a list of files, you can use method references (assuming you have already defined a method getFiles() that returns a Stream<File>):

Stream<File> readfs = getFiles().filter(FilePermissions::fileIsReadOnly);
Stream<File> readwritefs = getFiles().filter(FilePermissions::fileIsReadWrite);
Stream<File> writefs = getFiles().filter(FilePermissions::fileIsWriteOnly);

Default Methods

In order to leverage functional interfaces and lambda expressions in the collections framework, Java needed another new feature: default methods (also known as defender methods or virtual extension methods). Default methods provide a mechanism to add new methods to existing interfaces without breaking backwards compatibility. Any interface (even a functional interface) can have default methods. If a class implements the interface but does not override the method, it gets the default implementation. This gives Java multiple inheritance of behavior, in addition to multiple inheritance of types. Note that this differs from languages like C++, which also provide multiple inheritance of state.

As an example, see the default method added to the Set interface.

public interface Set<E> extends Collection<E> {

    ... // The existing Set methods

    default Spliterator<E> spliterator() {
        return Spliterators.spliterator(this, Spliterator.DISTINCT);
    }
}
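The inheritance rule for default methods can be illustrated with a small sketch. The Greeter interface and both implementing classes are hypothetical names invented for this example: one class inherits the default implementation, the other overrides it.

```java
class DefaultMethodDemo {
    // Hypothetical interface with one abstract and one default method.
    interface Greeter {
        String name();

        default String greet() {
            return "Hello, " + name();
        }
    }

    // Does not override greet(), so it gets the default implementation.
    static class Plain implements Greeter {
        @Override
        public String name() {
            return "Plain";
        }
    }

    // Overrides greet(), so the default implementation is not used.
    static class Custom implements Greeter {
        @Override
        public String name() {
            return "Custom";
        }

        @Override
        public String greet() {
            return "Hi, " + name();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Plain().greet());
        System.out.println(new Custom().greet());
    }
}
```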

Static Methods

Just like default methods, interfaces can now have static methods as well. Static methods cannot be abstract. Even functional interfaces can have static methods. For example, the Predicate interface declares:

static <T> Predicate<T> isEqual(Object target) {
    return (null == target)
            ? Objects::isNull
            : object -> target.equals(object);
}

Annotations on Java Types

Prior to Java 8, annotations could be applied only to declarations: classes, methods, fields, parameters and variable definitions. With Java 8, annotations can also be applied wherever a type is used, e.g. in generic type arguments, casts and throws clauses. This lets pluggable type checkers detect errors such as null pointer dereferences and race conditions at compile time. The JDK itself does not ship a non-null annotation; the sample below assumes an @NonNull annotation supplied by such a checker:

public void compute(@NonNull Set info) {...}

Core Libraries

Streams

Although lambdas are a great way to represent behavior as a parameter, on their own they don’t go beyond providing a simpler syntax for anonymous inner classes. To extract the power of lambdas, we need to use them along with extension methods that enhance the Collections APIs, as well as new classes. In Java 8, this is provided by the Streams API which, along with lambdas, brings a more functional style of programming to Java.

Streams represent a sequence of objects, somewhat like the Iterator interface. However, a stream is not a data structure: it does not store its elements but carries them from a source through a pipeline of operations. A stream can also be infinite (a stream of prime numbers does not need to end). In the same way that it is very easy to write an infinite loop in Java, it’s possible to use an infinite stream and never terminate its use.

Streams can be either sequential or parallel. They start off as one and may be switched to the other using stream.sequential() or stream.parallel(). The actions of a sequential stream occur sequentially on one thread. The actions of a parallel stream may happen simultaneously on multiple threads.

The Stream interface supports the map/filter/reduce pattern and does lazy execution. You can think of a stream as a pipeline with three parts:

  • The source that provides a stream of objects to be processed.
  • Zero or more intermediate operations that take the input stream and generate a new output stream. The output stream may be the same as the input stream, reduced or enlarged in size, or be objects of a different type. It is completely flexible.
  • A terminal operation. This is one that takes an input stream and either produces a result or has some kind of side-effect. Printing a message to the screen is an example where no result is produced, but there is a side effect.

The following example shows the different parts of the pipeline.

int sum = transactions.stream()                              // Source
    .filter(t -> t.getBuyer().getCity().equals("London"))   // Intermediate operation
    .mapToInt(Transaction::getPrice)                        // Intermediate operation
    .sum();                                                 // Terminal operation

The filter and mapToInt methods don’t really do any work. They just set up a pipeline of operations and return a new Stream. All work happens when we get to the sum() operation. The filter()/mapToInt()/sum() operations are fused into one pass on the data for both sequential and parallel pipelines.
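The lazy behavior described above can be observed directly. The sketch below uses a simplified, hypothetical Transaction class (with the city stored on the transaction itself rather than on a buyer) and counts how often the filter lambda runs before and after the terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazyPipelineDemo {
    // Simplified, hypothetical stand-in for the Transaction type.
    static final class Transaction {
        private final String city;
        private final int price;

        Transaction(String city, int price) {
            this.city = city;
            this.price = price;
        }

        String getCity() { return city; }
        int getPrice() { return price; }
    }

    static List<Transaction> sample() {
        return Arrays.asList(
                new Transaction("London", 100),
                new Transaction("Paris", 50),
                new Transaction("London", 25));
    }

    // Source -> filter -> mapToInt -> sum, as in the pipeline above.
    static int sumFor(List<Transaction> txs, String city) {
        return txs.stream()
                .filter(t -> t.getCity().equals(city))
                .mapToInt(Transaction::getPrice)
                .sum();
    }

    // Returns how many times the filter ran BEFORE the terminal operation.
    static int filterCallsBeforeTerminal(List<Transaction> txs) {
        AtomicInteger calls = new AtomicInteger();
        IntStream pending = txs.stream()
                .filter(t -> { calls.incrementAndGet(); return true; })
                .mapToInt(Transaction::getPrice);
        int before = calls.get(); // pipeline is only set up, nothing has run
        pending.sum();            // the terminal operation triggers the work
        return before;
    }

    public static void main(String[] args) {
        System.out.println(sumFor(sample(), "London"));
        System.out.println(filterCallsBeforeTerminal(sample()));
    }
}
```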

Streams can be created in several ways:

    • From collections and arrays
      • Collection.stream()
      • Collection.parallelStream()
      • Arrays.stream(T[] array) or Stream.of()
    • Static factories
      • IntStream.range()
      • Files.walk()
    • Custom
      • An implementation of java.util.Spliterator

Stream sources manage the following aspects:

      • Access to stream elements
      • Stream sources allow for the decomposition of a stream so that it can be processed in parallel. This is done internally using fork-join framework.
      • Stream characteristics
        • ORDERED – The stream has a defined order (it does not imply that it is sorted, just that the order is fixed). For example a List has a fixed order to the elements of that list. A HashMap does not have a defined order.
        • DISTINCT – No two elements in the stream are equal.
        • SORTED – The stream has an order that has been sorted according to some criteria, e.g. alphabetical, numeric. Any stream that is sorted is also ORDERED.
        • SIZED – The stream is finite, i.e. has a known size.
        • SUBSIZED – If this stream is split for the purposes of parallel processing the size of the child streams will be known. A balanced tree is an example where this is not true. The size of the tree itself is known, but the size of subtrees may not be.
        • NONNULL – There will be no null elements in the stream
        • IMMUTABLE – The stream cannot be structurally modified (no elements can be inserted, replaced or deleted)
        • CONCURRENT – The contents of the stream may be modified concurrently in a thread-safe way
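These characteristics can be inspected on the Spliterator that a collection hands to its stream. The following is a small sketch using Spliterator.hasCharacteristics; the helper method has is introduced here only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {
    // Illustrative helper: does the collection's spliterator report the flag?
    static boolean has(Collection<?> c, int characteristic) {
        return c.spliterator().hasCharacteristics(characteristic);
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        Collection<Integer> hashSet = new HashSet<>(Arrays.asList(3, 1, 2));
        Collection<Integer> treeSet = new TreeSet<>(Arrays.asList(3, 1, 2));

        System.out.println("ArrayList ORDERED: " + has(list, Spliterator.ORDERED));
        System.out.println("ArrayList SIZED:   " + has(list, Spliterator.SIZED));
        System.out.println("HashSet ORDERED:   " + has(hashSet, Spliterator.ORDERED));
        System.out.println("HashSet DISTINCT:  " + has(hashSet, Spliterator.DISTINCT));
        System.out.println("TreeSet SORTED:    " + has(treeSet, Spliterator.SORTED));
    }
}
```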

Stream Intermediate Operations – Intermediate operations can affect the characteristics of the stream. For example when you apply a map to a stream it will not change the number of elements (each element in the input stream is mapped to an element in the output stream), but it could affect the DISTINCT or SORTED characteristics. The intermediate operations use lazy evaluation wherever possible.

Stream Terminal Operations – Invoking a terminal operation executes the pipeline. The pipeline is constructed from the source and any intermediate operations; processing can then start either sequentially or in parallel. Lazy evaluation is used, so elements are only processed as they are required. Terminal operations can take advantage of pipeline characteristics. For example, if a stream is SIZED when the toArray method is used, an array of the correct size can be allocated at the start rather than having to be resized as more elements are received.

Optional

Java 8 comes with the Optional class in the java.util package, which helps in avoiding null return values (and thus NullPointerException). An Optional is like a stream that can hold either zero or one element. To create an Optional, use the of or ofNullable methods (there are no public constructors). See the example below:

Optional<SalesData> maybeSales = Optional.of(salesData);
maybeSales = Optional.ofNullable(salesData);

maybeSales.ifPresent(SalesData::printRevenue);

SalesData sales = maybeSales.orElse(new SalesData());

maybeSales.filter(g -> g.lastRead() < 2).ifPresent(SalesData::display);

If salesData in the first line is null, a NullPointerException will be thrown immediately. If salesData might be null, use ofNullable, which returns an empty Optional in that case. Rather than using an explicit if (salesData != null)… you can use ifPresent() with a lambda expression (in this case a method reference). If the Optional is empty nothing will be printed, as the lambda is only used if there is a value.

When we want to use the value of an Optional we can also provide a way of generating a value if one is not present, using orElse (returns the parameter if empty), orElseThrow (throws a specified type of exception if empty) or orElseGet (which uses a Supplier if empty).

Optionals can also be filtered using a Predicate. In the example the filter would only be used if a value is present. Filter returns an Optional that either contains the value of maybeSales (if the Predicate evaluates to true) or is empty (Predicate evaluates to false). ifPresent will then handle this Optional in the same way as before.
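The Optional operations discussed above can be tried out with plain strings instead of the SalesData type, whose definition is not shown. This is a minimal sketch; the describe and ofRejectsNull helpers are names invented for the example.

```java
import java.util.Optional;

class OptionalDemo {
    // ofNullable + filter + map + orElse: falls back to "n/a" when the
    // input is null or empty, otherwise returns the upper-cased value.
    static String describe(String maybeNull) {
        return Optional.ofNullable(maybeNull)
                .filter(s -> !s.isEmpty())
                .map(String::toUpperCase)
                .orElse("n/a");
    }

    // Optional.of rejects null immediately with a NullPointerException.
    static boolean ofRejectsNull() {
        try {
            Optional.of((String) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("q3 revenue"));
        System.out.println(describe(null));
        System.out.println(ofRejectsNull());

        // ifPresent runs the action only when a value exists.
        Optional.ofNullable("present").ifPresent(System.out::println);
        Optional.<String>empty().ifPresent(System.out::println);
    }
}
```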

Concurrency Updates

There are certain updates related to concurrency.

  • Scalable update variables – The new variables like DoubleAccumulator, DoubleAdder, etc are good for multi-threaded applications where there is contention between threads for access to a variable that accumulates or adds values. They are good for frequent updates, infrequent reads.
  • ConcurrentHashMap updates – Improved scanning support, key computation
  • ForkJoinPool improvements – Completion based design for IO bound applications. Thread that is blocked hands work to thread that is running.
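As an illustration of the scalable update variables in the list above, the sketch below uses LongAdder (a sibling of DoubleAdder in java.util.concurrent.atomic) as a counter updated from a parallel stream. The countInParallel helper is a name made up for this example.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

class LongAdderDemo {
    // Many threads increment the adder; the total is read once at the end,
    // which matches the "frequent updates, infrequent reads" usage pattern.
    static long countInParallel(int increments) {
        LongAdder counter = new LongAdder();
        IntStream.range(0, increments)
                .parallel()
                .forEach(i -> counter.increment());
        // forEach has completed, so no updates are in flight when we read.
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(100_000));
    }
}
```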

Date and Time APIs

Java 8 introduces a new java.time API that is thread-safe, easier to read and more comprehensive than the previous API. It provides excellent support for the international ISO 8601 time standard that global businesses use and also supports the frequently used Japanese, Minguo, Hijrah, and Thai Buddhist calendars. Some of its key classes are:

  • Clock – Access to the current instant, date and time
  • ZoneId – Represents a time zone
  • LocalTime – A time without a time zone
  • LocalDateTime – An immutable date and time without a time zone

Each of these classes has a specific purpose and has explicitly defined behavior without side effects. The types are immutable, which simplifies concurrency issues in multithreaded environments. The API is extensible, so new calendars, units and date and time fields can be added.
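A short sketch of how these classes fit together is given below. The nowAt and inZone helpers are names made up for this example; a fixed Clock is used so that the result does not depend on when the code runs.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class DateTimeDemo {
    // Reads the "current" date and time from the supplied clock.
    static LocalDateTime nowAt(Clock clock) {
        return LocalDateTime.now(clock);
    }

    // Attaches a time zone to a zone-less LocalDateTime.
    static ZonedDateTime inZone(LocalDateTime local, String zone) {
        return local.atZone(ZoneId.of(zone));
    }

    public static void main(String[] args) {
        // A fixed clock makes the example deterministic.
        Clock fixed = Clock.fixed(Instant.parse("2014-03-18T10:15:30Z"), ZoneId.of("UTC"));

        LocalDateTime release = nowAt(fixed);
        LocalDateTime nextDay = release.plusDays(1); // returns a new object

        System.out.println(release);  // unchanged by plusDays
        System.out.println(nextDay);
        System.out.println(inZone(release, "Asia/Tokyo"));
    }
}
```

Because the types are immutable, plusDays leaves the original value untouched and returns a new instance.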

Platform

Compact Profiles

With compact profiles, three well-defined API subsets of the Java 8 specification have been introduced. They offer a convergence of the Java ME Connected Device Configuration (CDC) with Java SE 8. With full Java 8 language and API support, developers now have a single specification that will support the Java ME CDC class of devices under the Java SE umbrella. Compact Profiles enable the creation of applications that do not require the entire platform to be deployed and run on small devices with limited storage.

The different profiles are as follows:

  • Compact 1 – Smallest subset of packages that supports the Java language. Includes logging and SSL. This is the migration path for people currently using the Connected Device Configuration (CDC). Size is 11 MB.
  • Compact 2 – Adds support for XML, JDBC and RMI (specifically JSR 280, JSR 169 and JSR 66). Size is 16 MB.
  • Compact 3 – Adds management, naming, more security and compiler support. Size is 30 MB.

None of the compact profiles includes any UI APIs; they are all headless.

JavaFX 8

JavaFX 8 is now integrated with Java 8 and works well with lambda expressions, which simplify many kinds of code, such as event handlers and cell value factories on TableView. Key features are:

  • New Stylesheet – JavaFX 8 includes a new stylesheet, named Modena. This provides a more modern look to JavaFX 8. The older, Caspian, stylesheet can still be used.
  • Improvements To Full Screen Mode – For applications like kiosks and terminals it is now possible to configure special key sequences to exit full screen mode, or even to disable the ability to exit full screen mode altogether.
  • New Controls – JavaFX 8 includes a few new controls. Most notable of these is the date picker, which has been something people have been asking for. A combination table and tree view has also been included.
  • Touch Support – Gestures like swipe, scroll, zoom and rotate are supported as event listeners and touch specific events can be detected as well as multiple touch points.
  • 3D Support – All the features required for 3D support are present: basic shapes that can be combined to form more complex shapes, as well as the ability to construct shapes of arbitrary complexity using triangle meshes.

Virtual Machine

Nashorn JavaScript Engine

Nashorn is a complete rewrite of the JavaScript engine included in the JDK and replaces the old Rhino version. JavaScript applications can be run on their own using the new jjs command, or integrated into Java code using the existing javax.script API. Some of the key features are:

  • Lightweight, high-performance JavaScript engine
  • Use existing javax.script API
  • ECMAScript-262 Edition 5.1 language specification compliance
  • New command-line tool, jjs to run JavaScript
  • Internationalised error messages and documentation

Removal Of The Permanent Generation

Tuning the permanent generation, which holds data used by the VM such as Class and Method objects as well as interned strings, has been problematic in the past. Java 8 removes the permanent generation, with class metadata moving to native memory (Metaspace), which simplifies tuning.

Summary

In summary, Java 8 offers a new opportunity for enhanced innovation for Java developers who work on anything from tiny devices to cloud-based systems. Developer productivity and application performance should both increase through the reduced boilerplate code and increased parallel programming that lambdas offer. Java 8 also offers strong diagnostics, with a complete tool chain to continuously collect low-level and detailed runtime information.

As you can see, Java 8 adds powerful new features to all areas of the core Java platform: language, libraries and VM, along with client UI enhancements in JavaFX. By bringing the advantages of Java SE to embedded development, it lets developers transfer their Java skills to fast-evolving new realms in the IoT, enabling Java to support devices of any size. It is an exciting time to be a Java developer.

Posted in Java

Book Review: Kanban in Action

Good Introduction to Kanban

If you want an introduction to Kanban, then this book is a must read. It uses an innovative story-telling technique to introduce the various concepts. Human beings generally like to read stories, and this keeps them engaged. Almost all the Kanban concepts have been explained in a condensed form in the story. The authors have an elegant and lucid style of explaining concepts. However, it should have been mentioned at the beginning that reading chapter 1 is a must; otherwise readers will not be able to relate to the Kanbaneros team later on.

Some of the facets of the book which I liked are:

  • Toyota Visualization analogy
  • Making different types of stickies using avatars.
  • Good techniques to set WIP limits.
  • Swarming comes as a good technique.
  • Excellent notes on bottlenecks, standups, constraints.
  • Good insight into planning and estimates.
  • The fun part is towards the end where different games are introduced.

However, I feel there could have been more real life case studies which can help beginners to avoid the pitfalls of Kanban.

Overall a good attempt at introducing Kanban.

This review is also available at Amazon

Posted in Agile

Java Garbage Collectors – Moving to Java7 Garbage-First (G1) Collector

One of the key strengths of the JVM is automatic memory management (garbage collection). Understanding how it works helps in writing better applications. This becomes all the more important as enterprise server applications have large amounts of live heap data and many parallel threads. Until recently, the main collectors were the parallel collector and the concurrent mark-sweep (CMS) collector. This blog introduces the various garbage collectors and compares the CMS collector against its replacement, a new implementation in Java 7: G1. G1 is characterized by a single contiguous heap which is split into same-sized regions. In fact, if your application is still running on the 1.5 or 1.6 JVM, a compelling argument to upgrade to Java 7 is to leverage G1.

The Java programming language is widely used in large server applications, which are characterized by large amounts of live heap data and considerable thread-level parallelism. These applications are often run on high-end multiprocessors. Although throughput is important for such applications, they are often also sensitive to latency. This matters, for example, in telecommunication and call-processing applications, where delays of even milliseconds in setting up calls can adversely affect the user experience. The Java virtual machine (JVM) specification mandates that any JVM implementation must include a garbage collector (GC) to reclaim unused memory (i.e., unreachable objects). However, the behavior and efficiency of a garbage collector can heavily influence the performance and responsiveness of any application that relies on it.

HotSpot JVM Architecture

The HotSpot JVM architecture supports a strong foundation of features and capabilities that help in realizing high performance and massive scalability. The main components of the JVM include the class loader, the runtime data areas, and the execution engine.

1-jvmcomponents

The three main components of JVM responsible for application performance are heap, JIT compiler and Garbage Collector. All the object data is stored in heap. This area is then managed by the garbage collector selected at startup. Most tuning options help in sizing the heap and choosing the most appropriate garbage collector. The JIT compiler also has a big impact on performance but rarely requires tuning with the newer versions of the JVM.

While tuning a Java application, the key factors to consider are Responsiveness, Throughput and Footprint.

Responsiveness – Indicates how quickly an application or system responds with a requested piece of data. For applications that intend to be responsive, large pause times are not desirable. The aim is to respond in short periods of time. Examples include:

  • How quickly a desktop UI renders pages.
  • How prompt a website is.
  • How quickly a database responds to a query.

Throughput – Maximizing the amount of work by an application in a specific period of time. High pause times may be acceptable for applications that focus on throughput. Throughput may be measured by the following criteria:

  • Number of transactions completed in a given time.
  • Number of jobs executed by a batch program in an hour.
  • The number of database queries completed in given time.

Footprint – The amount of memory (heap) occupied by an application.

Typically, when tuning a given application, two of the three factors above are chosen and worked upon. For example, if high throughput with minimal footprint is required, then the application has to compromise on responsiveness. This is because keeping the footprint small requires frequent garbage collection, which may pause the application while it runs.

The Java HotSpot VM garbage collectors are based on the generational hypothesis, which rests on the following observations:

  • Most objects die young
    • Only a few live very long
    • The longer they live, the more likely they are to live longer
  • Old objects rarely reference young objects

These two observations are collectively known as the weak generational hypothesis, which generally holds true for Java applications. To take advantage of this hypothesis, the Java HotSpot VM splits the heap into three physical areas, as depicted in the figure below:

2-vmstructure

Young Generation – This is the place where most new objects are allocated. It is typically small and collected frequently. Since most objects in young generation are expected to die quickly, the number of objects that survive a young generation collection (also referred to as a minor collection) is expected to be low. Minor collections tend to be very efficient as they concentrate on a space that is usually small and is likely to contain a lot of garbage objects. Young generation is further compartmentalized into an area called Eden plus two smaller survivor spaces. Most objects are initially allocated in Eden. The survivor spaces hold objects that have survived at least one young generation collection and have thus been given additional chances to die before being considered “old enough” to be promoted to the old generation. At any given time, one of the survivor spaces (labeled From in the figure) holds such objects, while the other is empty and remains unused until the next collection.

Old Generation – Objects that are too big to fit in the young generation are allocated directly in the old generation. Similarly, objects that are longer-lived are promoted (or tenured) to the old generation. The old generation is typically larger than the young generation and fills up more slowly. As a result, old generation collections (also referred to as major collections) are infrequent but lengthy.

Permanent Generation – The Permanent generation contains JVM metadata which describes the classes and methods used in the application. It is populated by the JVM at runtime based on classes in use by the application. In addition, Java SE library classes and methods may be stored here.

Different collectors follow different strategies for collecting these generations.

Serial Collector – The Serial Collector does collection for both young and old generation, serially. There is a stop-the-world pause during both minor and major collections. The application processing resumes once the collection is finished. The Serial Collector works fine for client side applications that do not have low pause time requirements.

Parallel Collector – These days, machines with a lot of physical memory and multiple CPUs are quite common. The parallel collector takes advantage of multiple CPUs rather than keeping most of them idle while only one does garbage collection work. It uses a parallel version of the young generation collection algorithm utilized by the serial collector. It is still a stop-the-world, copying collector, but performing the young generation collection in parallel decreases garbage collection overhead and increases application throughput. Server-side applications that run on machines with multiple CPUs and don’t have pause time constraints benefit from the parallel collector.

Concurrent Mark-Sweep (CMS) Collector – In many cases, end-to-end throughput is not as important as fast response time. Young generation collections do not cause long pauses most of the time. However, old generation collections, though infrequent, can cause long pauses, especially when large heaps are involved. To address this issue, the HotSpot JVM includes a collector called the concurrent mark-sweep (CMS) collector, also known as the low-latency collector. It collects the young generation the same way the Parallel and Serial Collectors do. The old generation, however, is collected concurrently with the application threads. This results in shorter pauses.
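To check which collectors and memory areas a given JVM is actually using, the standard java.lang.management API can be queried at runtime. The sketch below only lists names; the exact names printed depend on the JVM version and on the collector selected at startup (on HotSpot of this era, with flags such as -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC), so no particular output is assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class GcInfoDemo {
    // Names of the collectors active in this JVM (typically one for the
    // young generation and one for the old generation).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Names of the memory pools (heap areas and non-heap areas) in this JVM.
    static List<String> memoryPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors:   " + collectorNames());
        System.out.println("Memory pools: " + memoryPoolNames());
    }
}
```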

Let us now see how CMS does garbage collection for both the young and old generations.

Young Generation Collection by CMS Collector – CMS collects young generation using multiple threads just like Parallel Collector. Figure below illustrates a typical heap ready for young generation collection:

3cms1

The young generation comprises one Eden and two Survivor spaces. The live objects in Eden are copied to the initially empty survivor space, labeled S1 in the figure, except for ones that are too large to fit comfortably in the S1 space. Such objects are directly copied to the old generation. The live objects in the occupied survivor space (labeled S0) that are still relatively young are also copied to the other survivor space, while objects that are relatively old are copied to the old generation. If the S1 space becomes full, the live objects from Eden or S0 that have not been copied to it are tenured, regardless of their age. Any objects remaining in Eden or the S0 space after live objects have been copied are not live and need not be examined. The figure below illustrates the heap after young generation collection:

3cms2

The young generation collection causes a stop-the-world pause. After collection, Eden and one survivor space are empty. Now let’s see how CMS handles old generation collection. It essentially consists of two major steps: marking all live objects and then sweeping away the unmarked (garbage) ones.

Old Generation Collection in CMS

3cms3

The marking is done in stages. At the beginning there is a short stop-the-world pause, called initial mark, which identifies the set of objects that are immediately reachable from outside the heap. Thereafter, during the concurrent marking phase, the collector marks all live objects that are transitively reachable from this set. Since the application is running and updating reference fields (hence, modifying the object graph) while the marking phase is taking place, not all live objects are guaranteed to be marked at the end of the concurrent marking phase. To take care of this, the application stops again for a second pause, called remark, which finalizes marking by revisiting any objects that were modified during the concurrent marking phase. As the remark phase is a substantial stop-the-world pause, multiple threads are used to increase its efficiency. At the end of the remark phase, all live objects in the heap are guaranteed to have been marked. Since revisiting objects during the remark phase increases the amount of work the collector has to do, its overhead increases as well. This is a typical trade-off for most collectors that attempt to reduce pause times.

The heap structure after mark phase(s) can be seen below where the live objects are in light blue colored blocks.

3cms4

3cms5

After all live objects in the old generation have been marked, the concurrent sweeping phase sweeps over the heap, de-allocating garbage objects in place without relocating the live ones. As the figure illustrates, objects marked with a dark color are assumed to be garbage. After the sweeping phase, the dark colored objects are removed and only blue colored (live) objects remain. The free space is not contiguous, so the collector needs to employ a data structure (free lists, in this case) that records which parts of the heap contain free space. As a result, allocation into the old generation is more expensive. This imposes extra overhead on minor collections, as most allocations in the old generation take place when objects are promoted during minor collections.

Another disadvantage of CMS is that it typically has larger heap requirements. There are a few reasons for this. First, a concurrent marking cycle lasts longer than that of a stop-the-world collector, and it is only during the sweeping phase that space is actually reclaimed. Given that the application is allowed to run during the marking phase, it is also allowed to allocate memory, hence the occupancy of the old generation will potentially grow during the marking phase and drop only during the sweeping phase. Additionally, although the collector guarantees to identify all live objects during the marking phase, it does not guarantee that it will identify all objects that are garbage. Objects that become garbage during the marking phase may or may not be reclaimed during the cycle. If they are not, they will be reclaimed during the next cycle. Garbage objects that are wrongly identified as live are usually referred to as floating garbage.

Because there is no compaction, the heap becomes fragmented, which can also prevent the collector from using the available space as efficiently as possible.
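To make the cost of sweeping without compaction concrete, here is a minimal sketch of a first-fit free list. It is a toy model under stated assumptions, not the allocator CMS actually uses: `FreeListSketch`, its methods and the integer "addresses" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy first-fit free list over integer addresses (illustrative only). */
public class FreeListSketch {

    /** A free chunk covering addresses [start, start + size). */
    static final class Chunk {
        int start;
        int size;
        Chunk(int start, int size) { this.start = start; this.size = size; }
    }

    private final List<Chunk> free = new ArrayList<>();

    /** Records a swept range as free space. Nothing is moved or merged. */
    public void release(int start, int size) {
        free.add(new Chunk(start, size));
    }

    /** First-fit allocation: returns a start address, or -1 if no single chunk is large enough. */
    public int allocate(int size) {
        for (int i = 0; i < free.size(); i++) {
            Chunk c = free.get(i);
            if (c.size >= size) {
                int addr = c.start;
                c.start += size;
                c.size -= size;
                if (c.size == 0) {
                    free.remove(i);
                }
                return addr;
            }
        }
        return -1;
    }

    public int totalFree() {
        int sum = 0;
        for (Chunk c : free) {
            sum += c.size;
        }
        return sum;
    }

    public static void main(String[] args) {
        FreeListSketch heap = new FreeListSketch();
        // Three separate holes left behind by a sweep.
        heap.release(0, 4);
        heap.release(10, 4);
        heap.release(20, 4);
        System.out.println("total free  = " + heap.totalFree());  // 12
        System.out.println("allocate(6) = " + heap.allocate(6));  // -1: fragmented
        System.out.println("allocate(3) = " + heap.allocate(3));  // 0
    }
}
```

Twelve units are free in total, yet a request for six fails because no single hole is big enough. Every allocation also has to search the list, which is why promotion into the old generation costs more than bumping a pointer.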

Finally, it is very tedious to tune the CMS collector. There are lots of options and it takes a lot of experimentation to arrive at the best configuration for a particular application.

Introducing Garbage-First (G1) Collector

To overcome the shortcomings of CMS without compromising throughput, the Garbage-First (G1) collector was introduced. The G1 collector is a server-style garbage collector, targeted at multiprocessor machines with large memories, that meets a soft real-time pause goal with high probability while achieving high throughput. G1 is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is suitable for applications that:

  • Need a collector that works concurrently with application threads, like the CMS collector.
  • Need free space compacted without lengthy GC-induced pause times.
  • Require more predictable GC pause durations.
  • Want reasonable throughput performance.
  • Do not want a much larger Java heap.

G1 is intended to be the long-term replacement for the Concurrent Mark-Sweep Collector (CMS), and it offers many benefits in comparison. First and foremost, G1 is a compacting collector. It compacts sufficiently to eliminate potential fragmentation issues to a large extent. G1 also offers more predictable garbage collection pauses than the CMS collector and allows users to specify desired pause targets. The Refinement, Marking and Cleanup phases are concurrent, while the young generation collection is done using multiple threads in parallel. The full garbage collection continues to be single threaded, but a properly tuned application should avoid full GCs. Another major benefit is that G1 is easy to use and tune.

When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty, and it collects these regions first. This usually yields a large amount of free space, which is why this method of garbage collection is called Garbage-First.

G1 Heap Overview

3gone1

The G1 heap is organized differently from that of earlier generational collectors. The heap is one large contiguous space partitioned into a set of equal-sized heap regions (approximately 2000 in number). The region size is chosen at startup and varies from 1 MB to 32 MB. There is no physical separation between the young and old generations. A region may act as either Eden, survivor or old generation space. This provides greater flexibility in memory usage. Objects are moved between regions during collections. For large objects (> 50% of region size), humongous regions are used. Currently, G1 is not optimized for collecting large objects in humongous regions.
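The relationship between heap size, region size and the humongous threshold can be sketched with a little arithmetic. This is only an approximation built from the figures above: the target of 2048 regions and the rounding down to a power of two are assumptions made for illustration, and the JVM's real heuristic may differ. The class and method names are hypothetical.

```java
/** Back-of-the-envelope G1 region sizing (assumed heuristic, illustrative only). */
public class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;    // lower bound stated in the text
    static final long MAX_REGION = 32 * MB;   // upper bound stated in the text
    static final long TARGET_REGIONS = 2048;  // assumption: "approximately 2000" regions

    /** Region size: heap / target count, rounded down to a power of two, clamped to 1..32 MB. */
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(MAX_REGION, Math.max(MIN_REGION, powerOfTwo));
    }

    /** An object is humongous when it is larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long[] heapsInMb = {974, 4096, 16384, 131072};
        for (long heapMb : heapsInMb) {
            long region = regionSize(heapMb * MB);
            System.out.printf("heap %d MB -> region %d MB, humongous above %d KB%n",
                    heapMb, region / MB, region / 2 / 1024);
        }
    }
}
```

Under these assumptions a heap of roughly 1 GB gets 1 MB regions, so any object above 512 KB, such as a 4 MB array, would be treated as humongous.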

Young Generation Collection in G1

4gone2

Live objects are evacuated (copied/moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old generation regions. The collection involves a stop-the-world (STW) pause and is done in parallel with multiple threads to shorten the pause time. Eden and survivor sizes are calculated for the next young GC, taking the pause time goal into consideration. Because the young generation is just a set of regions, this approach makes it very easy to resize it, making it bigger or smaller as needed.

Old Generation Collection in G1

4gone3

The old generation collection starts with the Initial Marking phase, which is piggybacked on a young generation collection. It is a stop-the-world event in which survivor regions (root regions) that may hold references to objects in the old generation are marked.

4gone4

In the Concurrent Marking phase, liveness information per region is determined while the application is running. Live objects are found over the entire heap. This activity may be interrupted by young generation collections. Any empty regions found (denoted as X) are removed immediately in the Remark phase.

4gone5

The Remark phase completes the marking of live objects in the heap. The G1 collector uses an algorithm called snapshot-at-the-beginning (SATB), which is much faster than the one used in the CMS collector. Empty regions are removed and reclaimed, and region liveness is now calculated for all regions. This is a stop-the-world event.

4gone6

In the Copying/Cleanup phase, G1 selects the regions with the lowest “liveness”. These regions can be collected fastest, and they are collected at the same time as a young GC. So both young and old generations are collected at the same time.

4gone7

After the cleanup phase, the selected regions have been collected and compacted. This is represented by the dark blue and dark green regions shown in the figure. Some garbage objects may be left in old generation regions; they may be collected later based on future liveness, the pause time target and the number of unused regions.
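The "least live regions first" selection can be sketched as a sort followed by a cut-off. This is a simplified illustration, not G1's actual policy: the `Region` class, the liveness threshold semantics and the fixed region budget (standing in for the pause time target) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of choosing old regions for collection by liveness (illustrative only). */
public class CSetSketch {

    static final class Region {
        final String name;
        final int livePercent; // share of the region occupied by live objects
        Region(String name, int livePercent) {
            this.name = name;
            this.livePercent = livePercent;
        }
    }

    /**
     * Picks old regions with the least live data first. Regions at or above the
     * liveness threshold are skipped, and at most maxRegions are chosen
     * (a stand-in for the pause time budget).
     */
    static List<Region> selectCandidates(List<Region> oldRegions, int liveThresholdPercent, int maxRegions) {
        List<Region> sorted = new ArrayList<>(oldRegions);
        sorted.sort(Comparator.comparingInt((Region r) -> r.livePercent));
        List<Region> chosen = new ArrayList<>();
        for (Region r : sorted) {
            if (chosen.size() == maxRegions || r.livePercent >= liveThresholdPercent) {
                break;
            }
            chosen.add(r);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> old = Arrays.asList(
                new Region("A", 90), new Region("B", 10), new Region("C", 45),
                new Region("D", 0), new Region("E", 70));
        for (Region r : selectCandidates(old, 65, 3)) {
            System.out.println(r.name + " (" + r.livePercent + "% live)");
        }
    }
}
```

In this example regions D, B and C are chosen in that order, while E and A stay behind because evacuating them would mean copying mostly live data for little gain.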

G1 Old Generation GC Summary

  • Concurrent Marking Phase
    • Calculates liveness information per region, concurrently while the application is running
    • Identifies best regions for subsequent evacuation phases
    • No corresponding sweeping phase
  • Remark Phase
    • Different marking algorithm than CMS
    • Uses Snapshot-at-the-beginning (SATB) which is much faster than what was being used in CMS
  • Copying/Cleanup Phase
    • Completely empty regions are reclaimed
    • Young generation and Old generation reclaimed at the same time
    • Old generation regions selected based on their liveness

G1 and CMS Comparison

Features | G1 GC | CMS GC
Concurrent and Generational | Yes | Yes
Releases Maximum Heap memory after usage | Yes | No
Low Latency | Yes | Yes
Throughput | Higher | Lower
Compaction | Yes | No
Predictability | More | Less
Physical Separation between Young and Old | No | Yes

Footprint Overhead

For the same application, the heap size is likely to be larger with G1 than with CMS because of additional accounting data structures:

Remembered Sets (RSets / RSet) – An RSet tracks object references into a given region, and there is one RSet per region. This enables parallel and independent collection of a region, as there is no need to scan the whole heap to find references into it. The footprint overhead due to RSets is less than 5%. More inter-region references lead to bigger Remembered Sets, which in turn slow down GC.

Collection Sets (CSets / CSet) – The CSet is the set of regions that will be collected in a GC cycle. It can contain Eden and survivor regions and, optionally after (concurrent) marking, some old generation regions. All live data in a CSet is evacuated (copied/moved) during the GC. It has a footprint overhead of less than 1%.

Command Line Options

To start using the G1 Garbage Collector use the following option:

-XX:+UseG1GC

To set target for the maximum GC pause time, use the following option:

-XX:MaxGCPauseMillis=200
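One quick way to confirm which collector a JVM actually picked up from these flags is to ask the standard management beans at runtime. The sketch below uses only `java.lang.management`; the class name `ActiveCollectors` is hypothetical, and the exact bean names printed depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the garbage collectors registered in the running JVM. */
public class ActiveCollectors {

    /** Names of the collectors the JVM exposes through its management interface. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time since JVM start.
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it once with `-XX:+UseG1GC` and once with `-XX:+UseConcMarkSweepGC` should show different collector names, which is a cheap sanity check before starting a longer measurement.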

Tuning Options

The main goal of G1 GC is to reduce latency. If latency is not a problem, then Parallel GC can be used. A related goal is simplified tuning. The most important tuning option is -XX:MaxGCPauseMillis=200 (default value = 200 ms). It influences the maximum amount of work done per collection. However, it is a best-effort target only, not a guarantee.

The heap occupancy at which a concurrent GC cycle is triggered can be set with the option -XX:InitiatingHeapOccupancyPercent=n. The value is a percentage of the entire heap, not just of the old generation. Automatic resizing of the young generation has lower and upper bounds of 20% and 80% of the Java heap, respectively. One should be cautious with this option, as too low a value can lead to unnecessary GC overhead and too high a value can lead to space overflow.

The liveness threshold for an old region to be included in a Collection Set can be specified by -XX:G1OldCSetRegionLiveThresholdPercent=n. One should be cautious with this option, as too high a value can lead to more aggressive collecting and too low a value can lead to heap wastage.

The target number of mixed GCs following a concurrent marking cycle can be specified using -XX:G1MixedGCCountTarget=n. Too high a value can lead to unnecessary overhead and too low a value can lead to longer pauses.

Care must be taken if the young generation size is fixed using the -Xmn option. G1 then no longer respects the pause time target, and the young generation size stays fixed even if the heap expands.

Sample Application Test

Let’s create a sample application to measure the performance and behavior of the CMS and G1 collectors. The basic algorithm is described below:

  • Create and add 190 float arrays to an ArrayList
  • Each float array reserves 4 MB of memory, i.e. 1 x 1024 x 1024 floats x 4 bytes = 4 MB
  • 4 MB x 190 = 760 MB
  • After each iteration the arrays are released and the application sleeps for some time
  • The same steps are repeated a certain number of times
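The steps above could be implemented roughly as follows. This is a reconstruction for illustration, not the original test program: only the class name `GCTest` and the array-count argument come from the command lines used in the test, while the iteration count and sleep time are assumed values.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the allocation test described above (reconstruction, not the original source). */
public class GCTest {
    static final int FLOATS_PER_ARRAY = 1024 * 1024; // 1 x 1024 x 1024 floats x 4 bytes = 4 MB
    static final int ITERATIONS = 10;                // assumption: repeat count is not given
    static final long SLEEP_MILLIS = 5000;           // assumption: sleep time is not given

    /** Allocates the requested number of 4 MB float arrays and keeps them reachable. */
    static List<float[]> allocate(int arrayCount) {
        List<float[]> arrays = new ArrayList<>(arrayCount);
        for (int i = 0; i < arrayCount; i++) {
            arrays.add(new float[FLOATS_PER_ARRAY]);
        }
        return arrays;
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java GCTest <arrayCount>   (e.g. 190 for about 760 MB)");
            return;
        }
        int arrayCount = Integer.parseInt(args[0]);
        for (int i = 1; i <= ITERATIONS; i++) {
            List<float[]> arrays = allocate(arrayCount);
            System.out.println("iteration " + i + ": holding " + arrays.size() * 4 + " MB");
            arrays.clear();   // release the arrays so they become garbage
            arrays = null;
            Thread.sleep(SLEEP_MILLIS);
        }
    }
}
```

Passing 190 as the argument reproduces the 760 MB working set, so the JVM needs a maximum heap comfortably above that for the run to complete.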

I ran this application on a Windows 7 machine and used VisualVM to analyze the GC behavior.

CMS Collector Results

Command Line Arguments to test CMS Collector:

java -server -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:CMS.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\ GCTest 190

Observations with VisualVM

5cmsvm

G1 Collector Results

Command Line Arguments to test G1 Collector:

java -server -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:G1GC.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\ GCTest 190

Observations with VisualVM

5g1vm

Results Comparison

Parameters | G1 GC | CMS GC
Time Taken for Execution | 7 min 5 sec | 7 min 56 sec
Max CPU Usage | 27.3% | 70.2%
Max GC Activity | 2% | 24%
Max Heap Size | 974 MB | 974 MB
Max Used Heap Size | 763 MB | 779 MB

  • G1 GC is able to release heap memory after growing to the max heap size
    • CMS is not able to do so
  • Lower CPU utilization for G1 collection
  • G1 heap grows to its max size in three distinct jumps
    • CMS seems to reach its max heap size in the initial jump

Should You Move to G1 GC

Now that we have covered all the good things about G1 GC, the most pertinent question is when it should be used. I would suggest cautious optimism. Don’t rush blindly to embrace it, as it also has some costs (a larger heap size) and may not be a good fit for certain scenarios. Evaluate all other options before moving to G1 GC.

If you don’t need low latency, then you are better off using Parallel GC. If you don’t need a big heap, then use a small heap and Parallel GC. If you need a big heap, then first try the CMS collector. If CMS is not performing well, then try to tune it. If all attempts at tuning CMS fail to pay dividends, then consider using G1 GC.
