
Hunting Lost Treasures: Understanding and Finding Memory Leaks

Week 5 of our 2010 Application Performance Almanac

Searching for memory leaks can easily become an adventure – fighting through a jungle of objects and references. When the leak occurs in production, time is short and you have to act fast. As in a treasure hunt, we have to interpret signs and unravel mysteries to finally find the “lost” memory.

Memory leaks – together with inefficient object creation and incorrect garbage collector configuration – are the top memory problems. While they typically surface at runtime, their analysis and resolution falls to developers. In this post I will therefore focus on how to find memory problems and provide some insight into the anatomy of memory leaks.

Packing Our Equipment
What do we need for effective memory diagnosis? We need a heap analyzer for analyzing heap content and a console to collect and visualize runtime performance metrics. Then we are well equipped for our expedition. Which tools you choose depends on the platform you are using, the money you want to spend and your personal preferences. The range goes from the JVM's own tools to open-source tools to professional performance management solutions.

The Heap Dump
A heap dump allows us to get a snapshot of the JVM memory and analyze its content. Heap dumps can be triggered in multiple ways. The JVM parameter -XX:+HeapDumpOnOutOfMemoryError will trigger a heap dump in case of an OutOfMemoryError. Unfortunately this option is not enabled by default; however, I recommend switching it on everywhere. There is nothing more frustrating than trying to reproduce a problem just because you failed to collect all necessary information upfront. Alternatively, you can trigger a heap dump while the JVM is running; the tools that do this use the JVM Tool Interface (JVMTI) to retrieve the information from the JVM.
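To make this concrete, here is a minimal sketch of triggering a dump programmatically on a HotSpot JVM via the HotSpotDiagnosticMXBean management interface (the class name and output file are just illustrative choices, not part of the original article):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;

public class HeapDumper {
    // Triggers a heap dump on a HotSpot JVM; 'live' restricts the dump
    // to objects that are still reachable from GC roots.
    public static void dumpHeap(String filePath, boolean live) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(filePath, live);
    }

    public static void main(String[] args) throws Exception {
        dumpHeap("heap.hprof", true); // example file name
    }
}
```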

The biggest issue with heap dumps is that their format is not standardized and differs between JVMs. JSR 326 is working on a standardized way to access heap information. Defining a standardized API to access heap dump information should enable a single tool to work with different heap dump formats. If you cannot wait for the JSR to be implemented, you have to choose a tool which supports the required formats or use tools like dynaTrace which access the heap information directly and therefore work across JVM implementations.

What We Get
The information within a heap dump varies with the JVM and the JVM version. However, certain information is contained in every heap dump: the objects on the heap, their classes, and the references between them. Additionally we get information about the size of each object. This size is often referred to as the shallow size – the size of the object itself without any referenced data structures. Newer JVMs additionally support collecting the values of primitive fields and of simple objects like Strings and Integers. Some tools can also report the number of Garbage Collection cycles an object has survived by performing special heap profiling.

Depending on the size of your JVM's heap, the amount of information can be huge. This affects the heap dump creation time as well as the processing time of the dump, not to mention that the analysis itself gets more complex. Therefore some tools provide a means to collect only the number of objects per class. While providing less detail, this approach has the advantage of being much faster. By creating a series of dumps over time and then comparing the object counts, we can immediately see which objects are growing.
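As an illustration, a small sketch of such a comparison, assuming we have already parsed two dumps (or, for example, two jmap -histo outputs) into maps from class name to instance count:

```java
import java.util.Map;

public class HistogramDiff {
    // Compares two class histograms (class name -> instance count) taken
    // at different points in time and prints classes whose count grew.
    public static void printGrowingClasses(Map<String, Long> earlier,
                                           Map<String, Long> later) {
        for (Map.Entry<String, Long> entry : later.entrySet()) {
            Long before = earlier.get(entry.getKey());
            long growth = entry.getValue() - (before == null ? 0L : before);
            if (growth > 0) {
                System.out.println(entry.getKey() + " grew by " + growth);
            }
        }
    }
}
```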

Naturally this will show a lot of primitive data types and a number of classes we might never have seen before because they are internal to the JDK or other libraries we are using. We skip those and look for objects of our own classes which grow. This already provides a good indication of a potential memory leak. If we can additionally see the allocation stacks of these objects, we might be able to identify the memory leak without even analyzing a full heap dump.

JVM Metrics
In addition to heap dumps we will also use the JMX-based memory metrics of the JVM to monitor the heap at runtime. Based on the change in memory consumption over time we can see whether there is a memory leak at all. Monitoring memory usage of the JVM is essential in any diagnosis process. These metrics should – no, must – be collected by default during load tests and also in production. Relating them to other monitoring data – like the types of requests at a certain time – is also a good indicator of potential memory problems. While monitoring will not prevent you from running into OutOfMemoryErrors, it can help you resolve performance problems proactively.
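For illustration, a minimal sketch of reading the heap usage via the standard MemoryMXBean; a real setup would feed these samples into a monitoring system rather than print them:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used: %d MB of %d MB committed%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getCommitted() / (1024 * 1024));
            Thread.sleep(60000); // sample once per minute
        }
    }
}
```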

I recall a customer situation where we saw sudden spikes in heap usage. While they never caused an OutOfMemoryError, they still made us feel uncomfortable. We correlated this information with other monitoring data to find out what was different when the spikes occurred, and realized that some bulk processing operations were going on. Diagnosing these transactions we found that the submitted XML was transformed into a DOM tree for further processing. As processing time depended on the amount of data, these objects potentially stayed in memory for minutes – or longer. The issue could then be fixed, tested and deployed into production without users ever being affected by it.

The only potential shortcoming of monitoring heap usage is that slowly-growing memory leaks might be more difficult to spot, especially if you happen to look at the data in the wrong granularity. To overcome this issue I use two different charting intervals: the last 32 days for visualizing long-term trends and the last 72 hours for short-term, more fine-granular information.

Besides potential memory leaks, JVM metrics also help us spot potential Garbage Collector configuration problems. Our primary metrics here are the number and duration of Garbage Collections.
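Both metrics are exposed through the standard GarbageCollectorMXBeans; a minimal sketch of reading them:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Prints the cumulative collection count and time for each collector
    // (e.g. young and old generation collectors on HotSpot).
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```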

Let’s Go Hunting
As I’ve already discussed in another post, memory leaks in Java are not “classical” leaks. The Garbage Collector automatically frees unreferenced objects, taking this burden away from us. However, we as developers have to ensure that all references to objects we no longer need are released. While this sounds very simple, it turns out to be quite difficult in reality.

Looking a bit closer at the problem, we realize that a specific kind of reference causes memory leaks. Every object we allocate within the scope of a method will be freed automatically after we leave that scope. Memory leaks are therefore caused by references which exist beyond the current execution scope – Servlet sessions, caches, and any objects stored in static references.

A central concept in understanding the origins of memory leaks is that of Garbage Collection roots. A GC root is a reference which has only outgoing and no incoming references. Every live object on the heap is reachable from at least one GC root. If an object is no longer reachable from any GC root, it is marked as unreachable and becomes eligible for Garbage Collection. There are three main types of GC roots:

  • Temporary variables on the stacks of threads
  • Static fields of classes
  • Native references in JNI
Figure: Garbage collection roots and other heap objects

A single object, however, will not cause a memory leak. For the heap to fill up continuously, more and more objects must be added over time. Collections are the critical part here, as they can grow continuously while holding an ever-increasing number of references. Consequently, most memory leaks are caused by collections which are directly or indirectly referenced by static fields.
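A minimal, deliberately broken sketch of this pattern: a collection held by a static field that is filled but never cleared (the class and field names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class SessionRegistry {
    // A static field is a GC root, so this map and everything it
    // references stays reachable for the lifetime of the class.
    private static final Map<String, byte[]> CACHE = new HashMap<String, byte[]>();

    public static void onRequest(String sessionId) {
        // Entries are added per session but never removed, so the
        // collection grows without bound: a classic memory leak.
        CACHE.put(sessionId, new byte[1024]);
    }
}
```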

Enough of theory; let's look at an example. The figure below shows the reference chain of an HTTP session object – specifically its implementation in Apache Tomcat. The session object is a key in a ConcurrentHashMap, which is referenced by the ThreadLocal storage of the Servlet threads. The threads are kept within a Thread array, which is again part of a ThreadGroup. The ThreadGroup is in turn referenced by the Thread class itself. You can see even more details in the figure below.

Figure: Heap root walk of an HTTP session object

This shows that most memory problems can be traced back to a specific object on the heap. In this context you will often hear about the concept of dominators or the dominator tree in memory analysis.

The concept of a dominator comes from graph theory and is defined as follows: a node dominates another node if every path to that node passes through it. For memory management this means that A is a dominator of B if B is only reachable via A. A dominator tree is then a tree of objects in which this holds for the root object and all objects it references. The image below shows an example of a dominator tree. (You might want to get a coffee now and think about this :- )).
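Stated formally, with r denoting a GC root and paths running through the object reference graph:

```latex
% A dominates B: every path from a GC root r to B passes through A
\[
  A \;\operatorname{dom}\; B \iff \forall\, p : r \rightsquigarrow B,\ A \in p
\]
```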

Figure: Dominator tree example

Once there are no more references to a dominator object, all objects it dominates will be freed as well. Large dominator trees are therefore good candidates for memory leaks.

Post Mortem versus Runtime Analysis

When diagnosing memory leaks we can follow two basic approaches; which one to choose depends mostly on the situation. If we have already run into an OutOfMemoryError, we can only perform a post-mortem analysis – provided we started the JVM with the proper argument as described above. While this option was added in Java 6, JVM vendors have also back-ported the functionality to older JVM versions, so you should check whether your JVM supports it.

The “advantage” of post-mortem memory dumps is that the leak is already contained in the dump and you do not need to spend a lot of time reproducing it. Especially with slowly-appearing memory leaks, or problems which occur only in very specific situations, it can be close to impossible to reproduce the problem. Having a dump available right after the error occurred can save a lot of time (and nerves).

The biggest disadvantage – besides a crashed production system – is that you miss a lot of additional runtime information. The dominator tree, however, is highly valuable for finding the objects responsible for the memory leak more or less easily. This information, combined with good knowledge of the source code, often suffices to resolve the problem.

Alternatively, continuously increasing memory consumption already indicates a memory leak before the crash. This does not change the fact that the JVM would eventually crash, but we can start to search for the leak proactively. We can also prevent users from being affected, for example by restarting the JVM in time.

As creating heap dumps requires all running threads to be suspended, it is good advice to redirect user traffic to other JVMs first. Very often the collected data will be sufficient for identifying the leak. Additionally we can create a number of snapshots to identify objects that grow continuously. Solutions like dynaTrace additionally allow tracking the size of specific collections, including information about where they were instantiated. This information very often helps experienced developers identify the problem without extensive heap analysis.

Size Does Matter
A central factor in heap dump analysis is the heap size, and bigger does not mean better. 64-bit JVMs represent a special challenge here. The huge number of objects results in more data to be dumped. This means that dumps take longer to create and more space is required for storing the output. At the same time the analysis of dumps takes longer as well; in particular, algorithms for calculating garbage collection sizes or dynamic sizes of objects show decreasing runtime performance on bigger heaps. Some tools – at least in my experience – already have problems opening dumps bigger than about 6 GB. The generation of heap dumps also requires memory within the JVM itself. In the worst case this can mean that generating a dump is no longer possible at all. The main reason lies in the implementation of the JVMTI heap dump methods.

First, every object needs a unique tag. This tag is later used to analyze which objects are referenced by which others. The tag is of the JNI type jlong, which is 8 bytes in size. On top of that comes the memory consumption of JVM-internal structures, whose size depends on the JVM implementation and can be up to 40 bytes per object. For a heap with 100 million objects, the tags alone consume roughly 800 MB, and the internal structures up to 4 GB more. This is why we at dynaTrace specifically focus on supporting the analysis of bigger and bigger heap dumps.

The general advice is to work with smaller heaps. They are easier to manage and, in case of errors, easier to analyze; memory problems also show up faster than in large JVMs. If possible it is better to work with a number of smaller JVMs instead of one huge one. If you have to work with a large JVM, however, it is indispensable to test in advance whether you can analyze its memory dumps at all. A good test is to create a heap dump from a production-sized instance and calculate the GC size of all HTTP sessions. If you have trouble solving this simple task, you should either upgrade your tooling or decrease your heap size. Otherwise you might end up in a situation where you have no means of diagnosing a memory leak in your application.

The best memory leak is the one you do not have, so the best approach is to test for potential memory leaks already during development. The best means are long-running load tests. As our goal is less about getting performance results and more about finding potential problems, we can work with smaller test environments; it might even be enough to have the application and the load generator on the same machine. We should, however, ensure that we cover all major use cases. Some memory leaks might only occur in special situations and are therefore hard to find in testing environments. Regularly capturing heap dumps during the test run and comparing them to find growing objects helps to identify potential leaks.

Figure: Comparison of heap dumps over time
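Such periodic dump capture during a soak test is easy to automate. A sketch, reusing the hypothetical HeapDumper helper from the earlier example:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LoadTestDumps {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // Dump every 30 minutes; the timestamp in the file name lets us
        // line the dumps up as a series and compare object counts later.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HeapDumper.dumpHeap("heap-" + System.currentTimeMillis()
                        + ".hprof", true);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 30, TimeUnit.MINUTES);
    }
}
```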

Memory leaks are amongst the top performance-related problems in application development. At first, analysis might look extremely complex, but a proper understanding of the “anatomy” of a memory leak helps to find these problems easily, as they follow common patterns. We do, however, have to ensure that we can work with the information when we need it: dumps have to be generated and we must be able to analyze them. Long-term testing also does a good job of finding leaks proactively. Increasing the heap is not a solution at all; it might even make the problem worse. There are a lot of tools out there that support memory analysis, each with its strengths and weaknesses. I might be a bit biased here, but for a general overview of available functionality I recommend looking at memory diagnosis in dynaTrace. It provides a good overview of different approaches to memory analysis.

This article is based on the performance series I did with Mirko Novakovic of codecentric. Mirko also did a great post on OutOfMemoryErrors!

About the Author

Alois Reitbauer is Chief Technical Strategist at Dynatrace. He has spent most of his career building monitoring tools and fine-tuning application performance. A regular conference speaker, blogger, author, and sushi maniac, Alois currently shares his professional time between Linz, Boston, and San Francisco.
