Diagnosing Application Failures in WebLogic

Imagine. You're designing and developing a highly complex Web-based application. This app will serve thousands, or even millions, of customers. It will be deployed on hundreds of servers. Web and application servers will interact with a multitude of third-party services, no doubt accessing internal and legacy applications as well as issuing queries to a variety of databases. Furthermore, this system will require a great many tools and frameworks (from different software vendors, no less) that will perform different tasks and interoperate with each other in a highly dynamic manner.

In a standard development cycle you'll do your best to collect all known and necessary requirements, perform system analysis, design database schemas, and define integration points. Throughout this cycle, of course, you follow existing best practice recommendations. All of this is a prelude to your application design. This process can require thousands of person-hours, and will have to be tracked through thousands of milestones, big and small. You are confident that all of the difficult decisions have been anticipated and ameliorated, that the compromises made by the myriad independent groups within your company and your outside development partners have been managed. All of us are very familiar with this process, one that requires equal parts genius and patience.

The application is then deployed.

Problems occur.

After the initial panic subsides, you begin the triage process. Unfortunately, you don't always know why the application failed, for whom it failed, or the exact nature of the failure. Your application users may be based primarily in Japan, but your servers are hosted in Arizona, the support team is on the U.S. East Coast, and development is in California. The simple fact is that even identifying exactly what happened could take days or weeks, not to mention determining the implications of the problem to the application's users.

WebLogic is one of the best homes for complex J2EE applications on the market today. It makes application development, deployment, and management very convenient. However, the application server itself can address only very common problems during the application life cycle. The reality of today's complex applications typically requires better diagnostic capabilities, particularly in post-production.

Problem
After going through this scenario more than once - being frustrated by bugs, glitches, service outages, and other problems - application architects eventually decide to refocus their priorities toward production diagnostics. After all, the medical community has taught us that prevention is simpler than treatment. The same applies to software development: it is far more efficient to prepare for problem resolution than to wait until problems occur. The question is, when we do enter the treatment phase, how can we avoid time-consuming triage of an application that has already been shipped and is in use by customers?

This requires design of a diagnostic subsystem with the following characteristics:

  • Descriptive and unified logging for any kind of message: This logging should cover not only the applications themselves, but also all services and components the application might access. Recording of such messages should be based on a common logging framework, application code instrumentation, and binary code instrumentation, as well as normalization of third-party component log data into a format that allows those messages to be correlated with the messages arriving from the native logging framework.
  • Good granularity of messages: When an error occurs it is very important to precisely identify the components that caused the problem, as well as the implications for the users affected by the service outage. It is equally important to be able to troubleshoot each and every application module in detail, despite the degree of complexity inherent in the app.
  • Highly flexible logging configuration: Easy-to-use, descriptive views into the data collected by the different management components. During the troubleshooting phase, the amount of data under analysis must be kept manageable; it is all too common to be buried in volumes of data and still lose track of the pertinent information. Usability is a crucial factor in the success of any application logging strategy - nobody wants to work with disorganized, inconveniently collected data. Every application support layer must be able to escalate intelligently from triage to problem resolution, filtering out irrelevant information, so that the relevant information reaches next-stage support in a manageable amount and work toward resolution can begin with no additional requests.
  • Ability to support proactive problem identification and resolution scenarios: Through real-time alerts and notifications (e-mail, IM, etc.).
  • Strong integration of logging data: Logging data should feed the incident-tracking system, including descriptive reports and views.
  • Ability to support reactive problem mining: Running any business today requires compliance with high-quality standards that demand not only fixing a bug, but also determining the implications of such a bug. This includes uncovering each problem aspect, correlating the problem to all affected users, and providing necessary service to them. This goal is best addressed via highly tunable queries and data mining tools that will make all logging information collected by all your applications easily accessible.

    Solution
    The most common approach to triage and problem identification today is generic logging. Typically, the application architect selects the logging strategy that developers use to issue logging messages to event-recording subsystems directly from application code. It is possible to use publicly available logging frameworks, such as log4j or java.util.logging, or to develop one in-house. Regardless of the approach taken, it is necessary to unify this logging framework across all applications, as well as to align output from different loggers into a common format. For example, using log4j it is easy to standardize the logging infrastructure, create a hierarchy of loggers, extend the logging subsystem with appenders, and manage log sources. It is critical to have control over the output of each and every logger and to be able to feed them into a single searchable repository, while preserving all-important structured data such as timestamps, server and application names, component names, and so on. Moreover, if the application uses third-party components and services, it is good practice to make those logs searchable as well. In this way, it's possible to correlate problems the application reports with records in those logs.
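
    For example, a minimal sketch of this kind of log4j standardization might look like the following. The pattern, file path, and logger names are illustrative assumptions, not requirements of the framework:

    import java.io.IOException;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    // A minimal sketch: one layout and one appender shared by the whole logger
    // hierarchy, with per-component loggers obtained by name.
    public class LoggingBootstrap {
        public static void configure(String serverName, String appName) throws IOException {
            // Common format: timestamp, server, application, logger (component) name, message
            PatternLayout layout =
                new PatternLayout("%d{ISO8601} [" + serverName + "] [" + appName + "] %c - %m%n");
            FileAppender appender = new FileAppender(layout, "logs/unified.log", true);

            // Attaching the appender at the root makes every logger in the hierarchy use it
            Logger.getRootLogger().addAppender(appender);
        }

        public static void main(String[] args) throws IOException {
            configure("server01", "orders");
            // Component-level loggers form a hierarchy, e.g. "orders.web" and "orders.ejb"
            Logger webLog = Logger.getLogger("orders.web");
            webLog.info("request received"); // lands in the unified, structured log
        }
    }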

    The question is: Is it possible to come up with a logging strategy and logging messages in the application code once and forever? A well-designed logging subsystem cannot address all possible failure points in a complex application without the risk of being overwhelmed with logging data, or of overloading the application code with diagnostic instrumentation. Even prudently designed log levels would be insufficient to separate the valuable information from the "noise." Fortunately, it's possible to construct a dynamic logging solution using on-demand automatic code instrumentation. Currently, a wide variety of tools and frameworks allow different ways to instrument the code: from source code extensions via aspect-oriented pointcuts, to bytecode patching using BCEL, to other tools such as OC Systems' Aprobe. These technologies address architect and developer concerns at almost every stage of the application life cycle by reducing logging data to a desirable amount. Using the tools mentioned above, or equivalent techniques, it's easy to develop a set of components (or probes) that address different types of concerns within an application and apply them only when necessary and only to the important application parts. In other words, instrumentation achieves the necessary logging granularity at both the component and code levels. At the same time, code complexity stays constant: developers can clearly separate diagnostic tasks from the actual business logic, keeping those tasks loosely coupled within the application code.

    The other side of the problem within the enterprise infrastructure (and probably the most mysterious one) is application user behavior. Typically, use cases describe numerous scenarios of how applications are intended to be used, but there are always unanticipated ones - a user finds a way to bypass the tested paths and is met with an application error. That's why it's very important to incorporate user session context into the design and implementation of any logging task. Modern application servers are well designed to isolate concurrent users from each other, along with the activities they perform. WebLogic Server has proved to be the fastest J2EE server on the market, which means it efficiently manages underlying hardware resources such as memory, threads, and sockets. Poorly written applications, however, will seriously impact even the best application server's performance.

    Starting with version 2.3, the servlet specification offers a good hint at a solution to the user-context problem. The chapter named "Filtering" gives a good idea of logging and auditing filters. It's really up to architects, site administrators, and application support to determine how much information about user interactions needs to be logged. Servlet 2.3 filters allow you to dump certain request fields and parameters, along with the response body, without modifying any of your servlets and JSPs. For example, TeaLeaf Technology's J2EE filter collects all information about request parameters, attributes, and other request-specific data, along with the whole response body, and writes it to the TeaLeaf RealiTea server for storage and post-processing. Along with collecting data, the filter creates unique context IDs that can be reused by downstream components for binding additional logging data. These unique IDs create a context that can be used to group all logging activity across all application components, which allows you to isolate the session that caused the problem and replay the steps that led to it.

    Making All Logging Techniques Work Together
    Servlet 2.3 API Filter
    The Servlet 2.3 specification introduced filters as an integral part of the servlet container. While the specification itself states that filters are an ideal place for logging different sorts of user interaction, it is still difficult to find good examples of how to do this type of logging. Without diving into many details deserving of a separate article, our proposed logging framework is going to use Servlet 2.3 filters for the following purposes:

  • To log request information
  • To log response information
  • To introduce a unique hit ID that will be passed to the WebLogic internal tracing API

    Each of these steps is fairly straightforward: collect data by calling request object methods, wrap the response object and its underlying output stream, generate a globally unique hit ID (based, perhaps, on unique hardware attributes of the machine where the program runs, combined with the precise moment of the call), and finally append this ID to the running thread by calling the WebLogic tracing API.

    weblogic.trace.Trace.beginTrace(uniqueId); // uniqueId is the byte array created for each hit
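
    Putting these steps together, a minimal sketch of such a filter might look like the following. The class name, the hit-ID generation scheme, and the 32-character session ID plus 32-character hit ID layout are illustrative assumptions (chosen to match the instrumentation example later in this article); response wrapping and error handling are omitted:

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;

    // Sketch of a Servlet 2.3 logging filter that binds a tracing context per hit.
    public class TracingLogFilter implements Filter {
        private static long hitCounter = 0; // illustrative only; not a robust unique-ID source

        public void init(FilterConfig config) {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest hreq = (HttpServletRequest) req;

            // 1. Log request information (URI, query string, and so on)
            System.out.println("REQUEST " + hreq.getRequestURI() + "?" + hreq.getQueryString());

            // 2. Build a 64-byte context: 32 characters of session ID + 32 characters of hit ID
            String sid = pad(hreq.getSession(true).getId(), 32);
            String hid = pad(Long.toHexString(System.currentTimeMillis()) + "-" + (++hitCounter), 32);
            byte[] uniqueId = (sid + hid).getBytes();

            // 3. Bind the context to the running thread via the WebLogic tracing API
            weblogic.trace.Trace.beginTrace(uniqueId);

            // 4. Continue the chain; downstream servlets, JSPs, and EJBs can now read the context
            chain.doFilter(req, res);
        }

        private static String pad(String s, int len) {
            StringBuffer sb = new StringBuffer(s);
            while (sb.length() < len) sb.append(' ');
            return sb.substring(0, len);
        }
    }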

    WebLogic Tracing
    Internal WebLogic tracing allows the user to propagate any byte array data throughout calling chains inside a single JVM or across different JVMs. As I stated earlier, I will use this feature to associate each particular Web site hit with all calls that will be executed by downstream components while generating an appropriate response. In other words, it is a matter of a single function call from any method in your application to reach information about the exact Web hit that caused this method to run.

    byte uid[] = weblogic.trace.Trace.currentTrace(); // retrieve the hit ID recorded in the previous step
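
    For example, a business method deep in the calling chain (here a hypothetical OrderManager class, using log4j) can tag its own log records with the hit ID bound by the filter:

    import org.apache.log4j.Logger;

    public class OrderManager {
        private static final Logger log = Logger.getLogger(OrderManager.class);

        public void placeOrder(String item) {
            // Retrieve the hit ID bound by the filter, even though we are far
            // downstream from the servlet that handled the request.
            byte[] uid = weblogic.trace.Trace.currentTrace();
            String hit = (uid != null) ? new String(uid) : "no-trace";

            log.info("hit=" + hit + " placing order for " + item);
            // ... business logic ...
        }
    }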

    Instrumentation
    It's now possible to make our logging infrastructure dynamic by using tools like BCEL, OC Systems' Aprobe, or AspectJ to instrument particular parts of the application with logging messages. These messages fetch the hit ID from the tracing context described in the previous sections and add as much information as needed. Tools such as Aprobe and AspectJ let a developer define exactly where logging messages should be inserted: in Aprobe, a callback class implements the published interface and lists the methods to be instrumented; in AspectJ, a pointcut definition specifies the methods into which the advice code will be woven.
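
    As one sketch of the AspectJ route (using the annotation style, which assumes AspectJ 5 or later; the probe class and pointcut are illustrative and not taken from this article), a probe could log every session-attribute insertion together with the current hit ID:

    import org.apache.log4j.Logger;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;

    @Aspect
    public class SessionAttributeProbe {
        private static final Logger log = Logger.getLogger(SessionAttributeProbe.class);

        // Runs before every call to HttpSession.setAttribute made from application code
        @Before("call(* javax.servlet.http.HttpSession.setAttribute(..)) && args(name, value)")
        public void logAttribute(String name, Object value) {
            byte[] uid = weblogic.trace.Trace.currentTrace(); // hit ID bound by the filter
            log.info("hit=" + (uid != null ? new String(uid) : "n/a")
                    + " setAttribute(" + name + ", "
                    + (value != null ? value.getClass().getName() : "null") + ")");
        }
    }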

    Glue
    The final, and most important, component of the whole infrastructure is the system that stores all of the data produced by all of the loggers and makes it searchable. You can select different strategies, from storing data in text files and then importing them into a relational database, to logging directly to a database, to using optimized logging solutions that store and query this data in real time (such as TeaLeaf RealiTea). This "glue server" is very important because the tracing context we inserted with the servlet filter, propagated by WebLogic tracing and used by the logging code, becomes relevant only in a central location where you can correlate data that came from a servlet hosted on Machine A with data from an EJB hosted on Machine B.
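
    As one hedged illustration of the "log directly to a database" strategy, log4j's JDBCAppender can feed every logger into a single queryable table; the driver, URL, credentials, and table layout below are assumptions made for the sketch:

    import org.apache.log4j.Logger;
    import org.apache.log4j.jdbc.JDBCAppender;

    // Sketch: route the whole logger hierarchy into one central APP_LOG table.
    public class GlueAppenderSetup {
        public static void configure() {
            JDBCAppender db = new JDBCAppender();
            db.setDriver("oracle.jdbc.driver.OracleDriver");   // assumed JDBC driver
            db.setURL("jdbc:oracle:thin:@gluehost:1521:LOGS"); // assumed database URL
            db.setUser("loguser");
            db.setPassword("logpass");
            // One row per logging event; %d, %p, %c, and %m are standard pattern conversions
            db.setSql("INSERT INTO APP_LOG (LOGGED_AT, PRIORITY, CATEGORY, MESSAGE) "
                    + "VALUES ('%d{ISO8601}', '%p', '%c', '%m')");

            // Every logger in the hierarchy now also writes to the central table
            Logger.getRootLogger().addAppender(db);
        }
    }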

    Real-World Scenario
    Let's imagine that our application is deployed to several cluster nodes. After running successfully for a period of time, we suddenly receive several serialization notifications from the replication subsystem.

    Those messages indicate that some object cannot be serialized. Standard logs provide little insight into the exact context of this problem: we don't know what servlet or JSP code attempted to insert the invalid attribute into the session context or, more important, exactly which object is not serializable. To simulate the problem we can use a simple chunk of code that counts visits to a page, inserting a serializable object into the session context on each access to increase the counter, and only on hit #4 inserting an object that has a nonserializable member:

    <%@ page import="dummy.*" %>
    <%
        Integer i = (Integer)session.getAttribute("counter");
        int iv = (i == null) ? 1 : i.intValue() + 1;
        session.setAttribute("counter", new Integer(iv));
        if (iv == 4) {
            session.setAttribute("BADATTRIBUTE", new MyData());
        }
    %>
    <html><body><h1>Counter = <%=iv%></h1></body></html>

    Let's make our example a little more complex by making one of the MyData members (MyInternal) nonserializable:

    package dummy;

    import java.io.Serializable;

    public class MyData implements Serializable {
        String str = "dummy string";
        MyInternal mi = new MyInternal();
    }

    And MyInternal source code:

    package dummy;

    public class MyInternal {
        Object o = new Object(); // this is a non-serializable member
        String str = "string";
    }

    Real-world situations can easily be even more complicated - dynamic page code could be much more complex; session attribute insertion could be less straightforward; the application could use a controller that glues together different presentation, model, and data access components. All that could tremendously complicate problem diagnosis and resolution.

    To solve this problem we can use TeaLeaf's filter component in conjunction with WebLogic's tracing capabilities and code instrumentation. The only steps required are to install the filter on our application (this step has to be performed only once per application), run WLS with tracing enabled (-Dweblogic.TracingEnabled=true), and configure the instrumentation patch. The filter in this scenario performs the following tasks:

  • HTTP traffic logging (request and response data, including HTML/XML, served back to the client);
  • Creating and binding tracing context for application servlet and downstream components that will be called from the servlet;
  • Feeding all data into the TeaLeaf RealiTea Server, allowing the user session to be replayed and making the entire session searchable and accessible in real time. Alternatively, the RealiTea Server could record all of the data into a log file locally on the server.

    WebLogic tracing provides logging code and instrumentation patches with unique context identifiers that can be accessed from any downstream component and that can propagate the context across JVM boundaries in case of RMI calls.

    For code patching in this example it is possible to use OC Systems' Aprobe, BCEL, or even AspectJ. The main idea is to add a check to the method that replaces each and every object before writing it to a stream (weblogic.common.internal.RemoteObjectReplacer.replaceObject): verify whether the object is serializable, and issue an event if it is not:

    ...
    if (returnValue instanceof Serializable)
        return; // serializable objects pass through untouched; anything else triggers an event
    ...

    When an object fails this check, it's necessary to extract the current call trace context using the WebLogic tracing API and join a logging message to the request/response data logged by the filter:

    byte b[] = weblogic.trace.Trace.currentTrace(); // get the logging context
    if (b != null && b.length == 64) {
        String sid = new String(b, 0, 32);  // unique session ID
        String hid = new String(b, 32, 32); // unique hit ID
        // record an event with your favorite logging API
        logger().error(sid, hid, "SERIAL_FAIL",
            SymbolTable.getPrintableMethodName(super.methodId));
    }
    ...

    Finally, we have to instrument the application and let it run. After executing a four-hit session we can find the following information logged (in the example I'm using RealiTea Viewer, TeaLeaf Event API and capture filter; see Figure 1).

    Now we find that the problem is happening exactly on hit number 4 of this session. We also clearly see that the object that cannot be serialized is an instance of MyInternal class, and that all request and response data sent to the WebLogic Server by the browser and generated by the servlet is captured automatically via the filter. In short, we can now easily find each and every user affected by the same problem after the fact (see Figure 2).

    Conclusion
    Logging filters installed and running on your production servers, a centralized logging system, and custom logging instrumentation will together significantly reduce the time required to diagnose serious application problems. The process is further improved by constantly monitoring complex Web applications and initiating proactive alerts and notifications. A full record of user interactions, moreover, helps address not only the technical problems but also the business implications of how an application failure affects the site's users.

    Vitaliy Stulski is a Java developer and architect at TeaLeaf Technologies, a provider of solutions that help businesses ensure the accuracy of their Web applications.

    He has extensive experience in the design and development of large, highly scalable, distributed enterprise applications based on the J2EE platform. Vitaliy also has several years of experience with BEA products, especially the WebLogic Application Server.

    Vitaliy holds a Master's degree in mathematics from a university in Minsk, Belarus.
