Hunting Lost Treasures: Understanding and Finding Memory Leaks

Week 5 of our 2010 Application Performance Almanac

Searching for memory leaks can easily become an adventure: you fight through a jungle of objects and references. When a leak occurs in production, time is short and you have to act fast. As in a treasure hunt, we have to interpret signs and unravel mysteries to finally find the “lost” memory.

Memory leaks, together with inefficient object creation and incorrect garbage collector configuration, are the top memory problems. While they are a typical runtime problem, their analysis and resolution falls to developers. In this post I will therefore focus on how to analyze memory problems, covering how to find them and providing some insight into the anatomy of memory leaks.

Packing Our Equipment
What do we need for effective memory diagnosis? We need a heap analyzer for inspecting heap content and a console to collect and visualize runtime performance metrics; then we are well equipped for our expedition. Which tools you choose depends on the platform you are using, the money you want to spend, and your personal preferences. The range goes from JVM tools to open source tools to commercial performance management solutions.

The Heap Dump
A heap dump allows us to take a snapshot of the JVM memory and analyze its content. Heap dumps can be triggered in multiple ways. The JVM parameter -XX:+HeapDumpOnOutOfMemoryError triggers a heap dump when an OutOfMemoryError occurs. Unfortunately this option is not enabled by default; I recommend always switching it on. There is nothing more frustrating than trying to reproduce a problem only because you failed to collect all the necessary information upfront. Alternatively you can trigger a heap dump while the JVM is running, using tools that rely on the JVM Tool Interface (JVMTI) to retrieve the information from the JVM.
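
If you want to trigger a dump from within the application itself, for example right before a suspicious bulk operation, a minimal sketch for a HotSpot JVM could look like the following (the HotSpotDiagnosticMXBean is HotSpot-specific, and the output file name is just an example):

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch: trigger a heap dump programmatically on a HotSpot JVM.
    public class DumpHeap {
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // 'true' means only live objects are dumped, i.e. a GC runs first
            diag.dumpHeap("app-heap.hprof", true);
        }
    }

JDK command-line tools such as jmap can produce the same kind of dump from a running process without touching the application code.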

The biggest issue with heap dumps is that their format is not standardized and differs between JVMs. JSR 326 is working on a standardized way to access heap information; defining a standard API for heap dump data should make it possible to work with different heap dump formats from a single tool. If you cannot wait for the JSR to be implemented, you have to choose a tool which supports the required formats, or use tools like dynaTrace which access the heap information directly and therefore work across JVM implementations.

What We Get
The information within a heap dump varies with the JVM vendor and version, but certain information is contained in every dump. We get information about the objects on the heap, their classes, and the references between them. Additionally we get the size of each object. This size is often referred to as the shallow size: the size of the object itself without any referenced data structures. Newer JVMs additionally support collecting primitive values, so the contents of objects like Strings or Integers can be inspected as well. Some tools can also report the number of Garbage Collection cycles an object has survived by performing special heap profiling.

Depending on the size of your JVM’s heap, the amount of information can be huge. This affects the heap dump creation time as well as the processing time of the dump, not to mention that the analysis itself gets more complex. Therefore some tools provide a means to collect only the number of objects per class. While providing less detail, this approach has the advantage of being much faster. By creating a series of dumps over time and then comparing their object counts, we can immediately see which objects are growing.

Naturally this will show a lot of primitive data types and a number of classes we may never have seen before, because they are internal to the JDK or to libraries we are using. We skip those classes and look for objects of our own classes which keep growing. This already provides a good indication of a potential memory leak. If we can additionally see the allocation stacks of these objects, we might be able to identify the memory leak without having to analyze a full heap dump.
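
As a sketch of this dump-series workflow, the following program diffs two class histograms taken at different times (assuming the classic four-column output of HotSpot's jmap -histo redirected to files; the file names and output format are assumptions, not part of the original post):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: compare two 'jmap -histo' snapshots and print growing classes.
    // Assumed line format: '<rank>: <#instances> <#bytes> <class name>'
    public class HistoDiff {
        static Map<String, Long> load(Path file) throws IOException {
            Map<String, Long> counts = new HashMap<>();
            for (String line : Files.readAllLines(file)) {
                String[] f = line.trim().split("\\s+");
                if (f.length >= 4 && f[0].endsWith(":")) {
                    counts.put(f[3], Long.parseLong(f[1]));
                }
            }
            return counts;
        }

        public static void main(String[] args) throws IOException {
            Map<String, Long> before = load(Paths.get(args[0]));
            Map<String, Long> after = load(Paths.get(args[1]));
            after.forEach((cls, count) -> {
                long growth = count - before.getOrDefault(cls, 0L);
                if (growth > 0) {
                    System.out.println(cls + " grew by " + growth + " instances");
                }
            });
        }
    }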

JVM Metrics
In addition to heap dumps we will also use JMX-based memory metrics of the JVM to monitor the heap at runtime. The change in memory consumption over time tells us whether there is a memory leak at all. Monitoring memory usage of the JVM is essential in any diagnosis process. These metrics should (no: must) be collected by default during load tests and also in production. Relating these metrics to other monitoring data, such as the types of requests served at a certain time, is also a good indicator of potential memory problems. While monitoring will not prevent you from running into OutOfMemoryErrors, it can help you resolve performance problems proactively.
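
As an illustration, here is a minimal sketch of sampling these heap metrics from inside the JVM via the standard MemoryMXBean; monitoring consoles read the same values remotely over JMX, and the ten-second interval is an arbitrary example:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Sketch: periodically sample heap usage via the platform MXBean.
    public class HeapMonitor {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("heap used: %d MB of %d MB%n",
                        heap.getUsed() >> 20, heap.getMax() >> 20);
                Thread.sleep(10_000); // sample every 10 seconds
            }
        }
    }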

I recall a customer situation where we saw sudden spikes in heap usage. While they never caused an OutOfMemoryError, they still made us feel uncomfortable. We correlated this information with other monitoring data to find out what was different when the spikes occurred, and realized that bulk processing operations were running at those times. Diagnosing these transactions, we found that the submitted XML was transformed into a DOM tree for further processing. As processing time depended on the amount of data, these objects potentially stayed in memory for minutes, or longer. The issue could then be fixed, tested and deployed into production without users ever being affected by it.

The only potential shortcoming of monitoring heap usage is that slowly-growing memory leaks might be more difficult to spot, especially if you happen to look at the data in the wrong granularity. To overcome this issue I use two different charting intervals: the last 32 days for visualizing long-term trends and the last 72 hours for short-term, more fine-grained information.

Besides revealing potential memory leaks, JVM metrics also help us to spot Garbage Collector configuration problems. Our primary metrics here are the number of Garbage Collections and the time they take.
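
Both metrics are exposed per collector by the standard GarbageCollectorMXBean; a minimal sketch of reading them:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch: print collection count and accumulated collection time per GC.
    public class GcStats {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }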

Let’s Go Hunting
As I’ve already discussed in another post, memory leaks in Java are not “classical” leaks. Since the Garbage Collector automatically frees unreferenced objects, that burden has been taken away from us. However, we as developers have to ensure that all references to objects we no longer need are released. While this sounds very simple, it turns out to be quite difficult in practice.

Looking a bit closer at the problem, we realize that a specific kind of reference causes memory leaks. Every object we allocate within the scope of a method is freed automatically after we leave that scope. Memory leaks are therefore caused by references which live beyond the current execution scope: Servlet sessions, caches, and any objects stored in static references.

A central concept for understanding the origins of memory leaks is the Garbage Collection root. A GC root is a reference which only has outgoing and no incoming references. Every live object on the heap is reachable from at least one GC root. If an object is no longer reachable from any GC root, it is marked as unreachable and ready for Garbage Collection. There are three main types of GC roots; a short code illustration follows the list below.

  • Temporary variables on the stacks of threads
  • Static fields of classes
  • Native references in JNI
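
To make these root types concrete, here is a small illustrative snippet (the class and field names are made up; native references can only be created from JNI code and are therefore only hinted at in a comment):

    import java.util.ArrayList;
    import java.util.List;

    public class GcRootsDemo {
        // A static field is reachable from a GC root: CACHE and everything
        // reachable from it stay alive as long as the class is loaded.
        static final List<byte[]> CACHE = new ArrayList<>();

        public static void main(String[] args) {
            // A local variable on a live thread's stack acts as a GC root too:
            // 'temp' is reachable only until the method returns.
            byte[] temp = new byte[1024];
            CACHE.add(new byte[1024]); // now reachable via the static field
            System.out.println(temp.length + " / " + CACHE.size());
            // The third root type, native references, would be created from
            // JNI code via NewGlobalRef and cannot be shown in pure Java.
        }
    }
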
[Figure: Garbage collection roots and other heap objects]

A single object, however, will not cause a memory leak. For the heap to fill up continuously, more and more objects have to be added over time. Collections are the critical part here, as they can grow continuously while holding an ever-increasing number of references. Consequently, most memory leaks are caused by collections which are directly or indirectly referenced by static fields.
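
A minimal sketch of this pattern, with made-up names, shows how innocently such a leak can read:

    import java.util.HashMap;
    import java.util.Map;

    public class LeakyCache {
        // The static field keeps the map reachable from a GC root forever.
        private static final Map<String, byte[]> CACHE = new HashMap<>();

        public static byte[] lookup(String key) {
            // Every miss adds an entry and nothing is ever evicted,
            // so the map grows without bound over time.
            return CACHE.computeIfAbsent(key, k -> new byte[10 * 1024]);
        }
    }

The usual fix is to bound the collection, evict entries explicitly, or use reference types such as WeakHashMap where their semantics fit the use case.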

Enough of theory; let’s look at an example. The figure below shows the reference chain of an HTTP session object, specifically its implementation in Apache Tomcat. The session object is a key in a ConcurrentHashMap, which is referenced by the ThreadLocal storage of the Servlet threads. The threads are in turn kept within a Thread array, which is part of a ThreadGroup, and the ThreadGroup is referenced by the Thread class itself. You can see even more details in the figure below.

[Figure: Heap root walk of an HTTP session object]

This shows that most memory problems can be traced back to a specific object on the heap. In this context you will often hear about the concept of dominators, or the dominator tree.

The concept of a dominator comes from graph theory and is defined as follows: a node A dominates a node B if B can only be reached via A. For memory management this means that A is a dominator of B if B is only reachable through A. A dominator tree is then a whole tree of objects where this holds between the root object and all referenced objects. The image below shows an example of a dominator tree. (You might want to get a coffee now and think about this :-).)
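
For the curious, here is a toy fixpoint computation of dominator sets on a tiny made-up reference graph; real heap analyzers use far more efficient algorithms, but the sketch shows the definition in action:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Sketch: naive dominator computation via the classic fixpoint
    // dom(v) = {v} + intersection of dom(p) over all predecessors p of v.
    public class DominatorSketch {
        public static void main(String[] args) {
            // Graph: root -> a, root -> b, a -> c, b -> c, c -> d
            Map<String, List<String>> preds = Map.of(
                    "a", List.of("root"), "b", List.of("root"),
                    "c", List.of("a", "b"), "d", List.of("c"));
            Set<String> all = Set.of("root", "a", "b", "c", "d");
            Map<String, Set<String>> dom = new HashMap<>();
            dom.put("root", Set.of("root"));
            for (String v : preds.keySet()) dom.put(v, new HashSet<>(all));
            boolean changed = true;
            while (changed) {
                changed = false;
                for (String v : preds.keySet()) {
                    Set<String> d = new HashSet<>(all);
                    for (String p : preds.get(v)) d.retainAll(dom.get(p));
                    d.add(v);
                    if (!d.equals(dom.get(v))) { dom.put(v, d); changed = true; }
                }
            }
            // dom(d) contains c: every path to d passes through c, so c
            // dominates d, and freeing c frees d as well.
            dom.forEach((v, d) -> System.out.println("dom(" + v + ") = " + d));
        }
    }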

[Figure: Dominator tree example]

Once there are no more references to a dominator object, all objects it dominates are freed up as well. Large dominator trees are therefore good candidates for memory leaks.

Post Mortem versus Runtime Analysis

When diagnosing memory leaks we can follow two basic approaches; which one to choose depends mostly on the situation. If we have already run into an OutOfMemoryError, we can only perform a post-mortem analysis, and only if we started our JVM with the proper argument as stated above. While this option was added in Java 6, JVM vendors have back-ported the functionality to older JVM versions, so you should check whether your JVM version supports it.

The “advantage” of post-mortem memory dumps is that the leak is already contained in the dump, so you need not spend a lot of time reproducing it. Especially with slowly-growing memory leaks, or problems which occur only in very specific situations, reproducing the problem can be close to impossible. Having a dump available right after the error occurred can save a lot of time (and nerves).

The biggest disadvantage, besides a crashed production system, is that you will miss a lot of additional runtime information. The dominator tree, however, is highly valuable for finding the objects responsible for the memory leak more or less easily. This information, combined with good knowledge of the source code, often helps to resolve the problem.

Alternatively, continuously increasing memory consumption at runtime already indicates a memory leak. This does not change the fact that the JVM would eventually crash, but we can start searching for the leak proactively, and we can prevent users from being affected by the leak, for example by restarting the JVM in time.

As creating these heap dumps requires all running threads to be suspended, it is good advice to redirect user traffic to other JVMs first. Very often the collected data will be sufficient for identifying the leak. Additionally we can create a series of snapshots to identify objects that grow continuously. Solutions like dynaTrace additionally allow tracking the size of specific collections, including information about where they were instantiated. This information very often helps experienced developers identify the problem without extensive heap analysis.

Size Does Matter
A central factor in heap dump analysis is the heap size; bigger does not mean better. 64-bit JVMs represent a special challenge here. The huge number of objects results in more data to be dumped, which means that dumps take longer to create and require more space to store. The analysis of the dumps takes longer as well; in particular, algorithms for calculating garbage collection sizes or dynamic sizes of objects show decreasing runtime performance on bigger heaps. Some tools, at least in my experience, already have problems opening dumps bigger than about 6 GB. The generation of heap dumps also requires memory within the JVM itself; in the worst case this can mean that generating a dump is not possible at all. The main reason lies in the implementation of the JVMTI heap dump methods.

First, every object needs a unique tag. This tag is later used to analyze which objects are referenced by which others. The tag is of the JNI type jlong, which is 8 bytes in size. On top of that comes the memory consumption of JVM-internal structures, whose size depends on the JVM implementation and can be up to 40 bytes per object. This is why we at dynaTrace specifically focus on supporting the analysis of ever-bigger heap dumps.
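
To get a feeling for the scale, here is a rough back-of-the-envelope calculation based on the numbers above (the object count is just an example): a heap holding 100 million objects needs 100,000,000 x 8 bytes, roughly 800 MB, for the tags alone; with up to 40 bytes of internal structures per object, the worst case adds about another 4 GB before a single byte of the dump is written.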

The general advice is to work with smaller heaps. They are easier to manage and, in case of errors, easier to analyze; memory problems also show up faster than in large JVMs. If possible, it is better to work with a number of smaller JVMs than with one huge JVM. If you have to work with a large JVM, it is indispensable to test in advance whether you can analyze a memory dump at all. A good test is to create a heap dump from a production-sized instance and calculate the GC size of all HTTP sessions. If you have trouble solving this simple task, you should either upgrade your tooling or decrease your heap size. Otherwise you might end up in a situation where you have no means to diagnose a memory leak in your application.

Prevention
The best memory leak is the one you do not have, so the best approach is to test for potential memory leaks already during development. The best means are long-running load tests. As the goal is less about getting performance results than about finding potential problems, we can work with smaller test environments; it might even be enough to have the application and the load generator on the same machine. We should, however, ensure that we cover all major use cases. Some memory leaks only occur in special situations and are therefore hard to find in test environments. Regularly capturing heap dumps during the test run and comparing them to find growing objects helps to identify potential leaks.

[Figure: Comparison of heap dumps over time]
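
A crude version of this comparison can even be automated as part of a soak test. The following sketch runs a scenario many times and compares the settled heap usage before and after (runUseCase is a placeholder for the scenario under test, and the iteration count and threshold are arbitrary examples):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class LeakCheck {
        public static void main(String[] args) throws Exception {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            long before = settledHeap(memory);
            for (int i = 0; i < 100_000; i++) {
                runUseCase();
            }
            long growth = settledHeap(memory) - before;
            System.out.println("heap growth after run: " + (growth >> 20) + " MB");
            if (growth > 10 * 1024 * 1024) {
                throw new AssertionError("possible leak: heap grew by " + growth + " bytes");
            }
        }

        // System.gc() is only a hint, so this is a heuristic, not a guarantee.
        static long settledHeap(MemoryMXBean memory) throws InterruptedException {
            System.gc();
            Thread.sleep(500);
            return memory.getHeapMemoryUsage().getUsed();
        }

        static void runUseCase() {
            // placeholder: exercise the application scenario under test
        }
    }

Since System.gc() is only a hint to the JVM, such a check can flag suspicious growth but cannot prove or disprove a leak on its own.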

Conclusion
Memory leaks are among the top performance-related problems in application development. At first their analysis might look extremely complex, but a proper understanding of the “anatomy” of a memory leak helps to find these problems quickly, as they follow common patterns. We do, however, have to ensure that we can work with the information when we need it: dumps have to be generated, and we must be able to analyze them. Long-running tests also do a good job of finding leaks proactively. Increasing the heap is not a solution at all; it might even make the problem worse. There are a lot of tools out there that support memory analysis, each with its strengths and weaknesses. I might be a bit biased here, but for a general overview of available functionality I recommend looking at memory diagnosis in dynaTrace. It provides a good overview of different approaches towards memory analysis.

Credits
This article is based on the performance series I did with Mirko Novakovic of codecentric. Mirko also did a great post on OutOfMemoryErrors!


About the Author

Alois Reitbauer is Chief Technical Strategist at Dynatrace. He has spent most of his career building monitoring tools and fine-tuning application performance. A regular conference speaker, blogger, author, and sushi maniac, Alois currently shares his professional time between Linz, Boston, and San Francisco.
