Java Memory Problems

Memory problems in Java applications are manifold

Memory leaks and other memory-related problems are among the most prominent performance and scalability problems in Java. Reason enough to discuss this topic in more detail.

Java's automatic memory management, or more specifically the garbage collector, has solved many memory problems. At the same time new ones have been created. Especially in Java EE environments with a large number of parallel users, memory is increasingly becoming a critical resource. In times of cheap memory, 64-bit JVMs and modern garbage collection algorithms this might sound strange at first sight.

So let us now take a closer look at Java memory problems. They can be categorized into four groups:

  • Memory leaks in Java are created by referencing objects that are no longer used. This easily happens when multiple references to an object exist and developers forget to clear them when the object is no longer needed.
  • Unnecessarily high memory usage is caused by implementations consuming too much memory. This is very often a problem in web applications where a large amount of state information is managed for “user comfort”. When the number of active users increases, memory limits are reached very fast. Unbounded or inefficiently configured caches are another source of constantly high memory usage.
  • Inefficient object creation easily results in a performance problem when user load increases, as the garbage collector must constantly clean up the heap. This leads to unnecessarily high CPU consumption by the garbage collector. As the CPU is blocked by garbage collection, application response times often increase even under moderate load. This behaviour is also referred to as GC thrashing.
  • Inefficient garbage collector behaviour is caused by missing or wrong configuration of the garbage collector. The garbage collector takes care that objects are cleaned up. How and when this should happen must, however, be configured by the programmer or system architect. Very often people simply “forget” to properly configure and tune the garbage collector. I was involved in a number of performance workshops where a “simple” parameter change resulted in a performance improvement of up to 25 percent.
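
The unbounded cache mentioned in the list above is perhaps the easiest of these problems to illustrate. The following is only a minimal sketch of one way to put an upper limit on a cache, built on the standard `LinkedHashMap`. The class name `BoundedCache` and the entry limit are hypothetical and not taken from any application discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache: drops the least recently used entry once the limit is exceeded. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in least-recently-used order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after every insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

Note that `LinkedHashMap` is not thread-safe, so in a server this sketch would need external synchronization, for example via `Collections.synchronizedMap`.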

In most cases memory problems affect not only performance but also scalability. The more memory is consumed per request, user or session, the fewer parallel transactions can be executed. In some cases memory problems also affect availability. When the JVM runs out of memory or gets close to its memory limits, it will fail with an OutOfMemoryError. This is when management enters your office and you know you are in serious trouble.

Memory problems are often difficult to resolve for two reasons. First, the analysis can get complex and difficult, especially if you are missing the right methodology. Second, their root is often in the architecture of the application, so simple code changes will not resolve them.

To make life easier, I present a couple of memory antipatterns which are often found in real-world applications. Knowing these patterns should help to avoid memory problems already during development.

HTTP Session as Cache

This antipattern refers to the misuse of the HttpSession object as a data cache. The session object serves as a means to store information which should “survive” a single HTTP request. This is also referred to as conversational state, meaning data is stored over a couple of requests until it is finally processed. This approach can be found in any non-trivial web application. Web applications have no other means than storing this information on the server. Well, some information can be put into a cookie, but this has a number of other implications.

It is important to keep as little data as possible in the session, and to keep it there for as short a time as possible. It can easily happen that the session contains megabytes of object data. This immediately results in high heap usage and memory shortages. At the same time the number of parallel users is very limited: the JVM will respond to an increasing number of users with an OutOfMemoryError. Large user sessions have other performance penalties as well. With session replication in clusters, the increased serialization and communication effort will result in additional performance and scalability problems.

In some projects the answer to this kind of problem is increasing the amount of memory and switching to 64-bit JVMs. Teams cannot resist the temptation of just increasing the heap size to several gigabytes. However, this often only hides the symptoms rather than curing the real problem. This “solution” is only temporary and also introduces a new problem: bigger and bigger heaps make it more difficult to find “real” memory problems. Memory dumps for very large heaps (greater than 6 gigabytes) cannot be processed by most available analysis tools. We at dynaTrace invested a lot of R&D effort to be able to efficiently analyze large memory dumps. As this problem is gaining more and more importance, a new JSR specification is also addressing it.

Session caching problems often arise because the application architecture has not been clearly defined. During development, data is simply put into the session because it is convenient. This very often happens in an “add and forget” manner, as nobody ensures that this data is removed when it is no longer needed. Normally, unneeded session data should be cleaned up by the session timeout. In enterprise applications which are constantly under heavy use, relying on the session timeout will not work. Additionally, very high session timeouts of up to 24 hours are often used to provide additional “comfort” to users, so that they do not have to log in again.

A practical example is putting selection choices for lists, which have to be fetched from the database, into the session. The intention is to avoid unnecessary database queries. (Smells like premature optimization, doesn’t it?) This results in several kilobytes being put into the session object for every single user. While it is reasonable to cache this information, the user session is definitely the wrong place for it.
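
One possible alternative, sketched here under assumptions, is to hold such selection lists once per application instead of once per user. The class `SelectionListCache` and its `loader` function (which stands in for the database query) are hypothetical names introduced only for illustration. `List.copyOf` requires Java 10 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical application-wide cache for selection lists, shared by all user sessions. */
final class SelectionListCache {
    private final Map<String, List<String>> lists = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loader; // stands in for the database query

    SelectionListCache(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    List<String> get(String listName) {
        // Load each list once, then hand the same immutable instance to every caller.
        return lists.computeIfAbsent(listName, name -> List.copyOf(loader.apply(name)));
    }
}
```

With this shape the memory cost of a list is paid once, regardless of the number of active sessions. A real implementation would also need some form of invalidation when the underlying data changes, which this sketch leaves out.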

Another example is abusing the Hibernate session for managing conversational state. The Hibernate session object is simply put into the HTTP session to have fast access to data. This, however, results in much more data being stored than necessary, and the memory consumption per user rises significantly.

In modern AJAX applications conversational state can also be managed on the client side. Ideally this leads to a stateless or nearly stateless server application, which also scales significantly better.

ThreadLocal Memory Leak

ThreadLocal variables are used in Java to bind variables to a specific thread. This means every thread gets its own single instance. This approach is used to handle status information within a thread; an example would be user credentials. The lifecycle of a ThreadLocal variable is, however, tied to the lifecycle of the thread. Unless explicitly removed by the programmer, ThreadLocal variables are only cleaned up when the thread has terminated and is removed by the garbage collector.

Forgotten ThreadLocal variables can easily result in memory problems, especially in application servers. Application servers use thread pools to avoid the constant creation and destruction of threads. An HttpServletRequest, for example, gets a free thread assigned at runtime, which is passed back to the thread pool after execution. If the application logic uses ThreadLocal variables and forgets to explicitly remove them, the memory will not be freed.

Depending on the pool size (in production systems this can be several hundred threads) and the size of the objects referenced by the ThreadLocal variable, this can lead to problems. A pool of 200 threads and a ThreadLocal size of 5 MB will in the worst case lead to 1 GB of unnecessarily occupied memory. This will immediately result in high GC activity, leading to bad response times and potentially to an OutOfMemoryError.
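
The effect can be reproduced with a pool of a single thread. The following is a minimal sketch, not code from any application server: `ThreadLocalCleanupDemo`, `REQUEST_DATA` and `leaksAfterRequest` are hypothetical names, and the 1 MB array merely stands in for per-request state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: a pooled thread keeps ThreadLocal values alive between tasks unless remove() is called. */
final class ThreadLocalCleanupDemo {
    // Hypothetical per-request state; in a real application this might be a parsed document.
    static final ThreadLocal<byte[]> REQUEST_DATA = new ThreadLocal<>();

    /** Runs one "request" on the pool, then reports whether the worker thread still holds the data. */
    static boolean leaksAfterRequest(boolean cleanUp) {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one worker thread, reused for every task
        try {
            pool.submit(() -> {
                REQUEST_DATA.set(new byte[1024 * 1024]);
                try {
                    // ... request processing would happen here ...
                } finally {
                    if (cleanUp) {
                        REQUEST_DATA.remove(); // drops the reference held by this worker thread
                    }
                }
            }).get();
            // The next task runs on the same worker thread and sees whatever was left behind.
            return pool.submit(() -> REQUEST_DATA.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + leaksAfterRequest(false));
        System.out.println("with remove():    " + leaksAfterRequest(true));
    }
}
```

Without the cleanup, the second task still finds the data set by the first one. Calling `remove()` in a `finally` block ensures the reference is dropped even if request processing throws an exception.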

A practical example was a bug in jbossws version 1.2.0 which was fixed in version 1.2.1 – “DOMUtils doesn’t clear thread locals”.  The problem was a ThreadLocal variable which referenced a parsed document having a size of 14 MB.

Large Temporary Objects

Large temporary objects can in the worst case also lead to OutOfMemoryErrors, or at least to high GC activity. This will happen, for example, if very big documents (XML, PDF, images, …) have to be read and processed. In one specific case the application was unresponsive for a couple of minutes, or performance was so limited that it was not practically usable. The root cause was the garbage collection going crazy. A detailed analysis led to the following code for reading a PDF document.

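// Antipattern: reads from the stream bis and grows the buffer by only 1 KB at a time
byte tmpData[] = new byte[1024];
int offs = 0;
do {
  int readLen = bis.read(tmpData, offs, tmpData.length - offs);
  if (readLen == -1)
    break;
  offs += readLen;
  if (offs == tmpData.length) {
    // Buffer full: allocate an array 1 KB larger and copy everything read so far
    byte newres[] = new byte[tmpData.length + 1024];
    System.arraycopy(tmpData, 0, newres, 0, tmpData.length);
    tmpData = newres;
  }
} while (true);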

The documents which were read using this method had a size of several megabytes. They were read into the byte array and then sent to the user’s browser. Several parallel requests rapidly led to a full heap. The problem got even worse due to the highly inefficient algorithm for reading the document. The idea is that an initial byte array of 1 KB is created. If this array is full, a new array which is 1 KB larger is created and the old array is copied into the new one. This means that when reading the document a new array is created and copied for each KB read. This results in a huge number of temporary objects and a memory consumption which is two times the actual amount of data, as the data is permanently copied.

When working with large amounts of data, optimization of the processing logic is crucial to performance. In this case a simple load test would have revealed the problem.
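
To put a number on the reading loop above: a document of n KB causes n array allocations and copies about n(n+1)/2 KB in total, so a 4 MB document (n = 4096) means 4,096 allocations and roughly 8 GB of copied bytes. Since the data was only passed on to the browser, one alternative is to stream it through a single fixed-size buffer. This is a minimal sketch and not the fix that was actually applied; `StreamCopy` and the 8 KB buffer size are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Sketch: copies a stream through one reusable buffer instead of growing an array. */
final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192]; // the only allocation, independent of document size
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With this approach the memory needed per request no longer depends on the size of the document. On Java 9 and later, `InputStream.transferTo` offers the same behaviour out of the box.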

Bad Garbage Collector Configuration

In the scenarios presented so far the problem was caused by the application code. In a lot of cases, however, the root cause is wrong or missing configuration of the garbage collector. I frequently see people trusting the default settings of their application servers and believing these application server guys know best what is ideal for their application. The configuration of the heap, however, strongly depends on the application and the actual usage scenario. Depending on the scenario, parameters have to be adapted to get a well-performing application. An application processing a high number of short-lived requests has to be configured completely differently from a batch application which is executing long-running tasks. The actual configuration additionally depends on the JVM used. What works fine for the Sun JVM might be a nightmare for the IBM JVM (or at least not ideal).

Misconfigured garbage collectors are often not immediately identified as the root cause of a performance problem (unless you monitor GC activity anyway). Often the visible manifestation of the problem is bad response times. Understanding the relation of garbage collector activity to response times is not obvious. If garbage collection times cannot be correlated to response times, people find themselves hunting a very complex performance problem. Problems with response times and execution times will show up across the application, at different places and without an obvious pattern behind them.
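
As a rough illustration of such a correlation, the standard `java.lang.management` API exposes the accumulated collection time of each collector. The sketch below samples it before and after a unit of work. `GcTimeProbe` and its methods are hypothetical names, and the approach is far cruder than what a monitoring tool does.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: compares elapsed time of a unit of work with the GC time accumulated meanwhile. */
final class GcTimeProbe {
    /** Sum of the accumulated collection time in milliseconds across all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report a time
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Runs the work and returns { elapsedMillis, gcMillisDuringWork }. */
    static long[] measure(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { elapsedMillis, totalGcMillis() - gcBefore };
    }
}
```

The collection time is counted for the whole JVM, so under parallel load the GC share cannot be attributed to a single request. For concurrent collectors the reported time is also not identical to the time the application was paused.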

The figure below shows transaction metrics correlated with garbage collection times in dynaTrace. I found cases where optimizing the garbage collector settings solved, within minutes, performance problems which people had been hunting for weeks.

Transaction Times correlated to Runtime Suspensions

ClassLoader Leaks

When talking about memory leaks most people primarily think about objects on the heap. Besides objects, classes and constants are also managed on the heap. Depending on the JVM they are put into specific areas of the heap. The Sun JVM, for example, uses the so-called permanent generation or PermGen. Classes are very often put on the heap several times, simply because they have been loaded by different classloaders. The memory occupied by loaded classes can be up to several hundred MB in modern enterprise applications.

The key is to avoid unnecessarily increasing the size of classes. A good example is the definition of a large number of String constants, for example in GUI applications. Here all texts are often stored in constants. While using constants for Strings is in principle a good design approach, the memory consumption should not be neglected. In a real-world case all constants were defined in one class per language in an internationalized application. A not obviously visible coding error resulted in all of these classes being loaded. The result was a JVM crash with an OutOfMemoryError in the PermGen of the application.

Application servers suffer additional problems with classloader leaks. These leaks are caused when classloaders cannot be garbage collected because an object of one of the classes of this classloader is still alive. As a result the memory occupied by these classes will not be freed. While this problem is handled well by Java EE application servers today, it seems to appear more often in OSGi-based application environments.

Conclusion

Memory problems in Java applications are manifold and easily lead to performance and scalability problems. Especially in Java EE applications with a high number of parallel users, memory management must be a central part of the application architecture.

While the garbage collector takes care that unreferenced objects are cleaned up, the developer is still responsible for proper memory management. In addition to application design, memory management is a central part of application configuration.

Credits

This article is based on the Performance Antipatterns Series I am working on together with Mirko Novakovic of codecentric.


More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
