Finding Production System Performance Problems

This article demonstrates how Wily Technology's Introscope can be used to reach accurate conclusions about, and resolve, a typical Java application performance problem. It will be useful for architects, operations managers, testers, and developers responsible for WebLogic application performance, and will give readers a better understanding of practical approaches to analyzing, improving, and managing production performance without developing monitoring code by hand.

A business-to-business catalog and ordering system had been running in production for several months without much promotion and had served increasing numbers of customers reliably and quickly.

Recently, though, new features of the system, such as the ability to evaluate alternative items while browsing the catalog, had been promoted by the marketing organization, which caused more intensive use of the system. Unfortunately, users complained about slow performance of the application while searching and browsing for items as well as while building an order.

Operations and business managers came to the development organization with a critical mission to resolve these performance problems quickly. While developers had many hypotheses about the source of the problem, they were frustrated that they had no effective way to find it. The company chose to use Wily's Introscope to tackle their problem.

Using Introscope in the load-testing environment, the development group identified a number of bottlenecks and eliminated other possible causes of the performance problems. With this information, developers could quickly implement focused changes in the application code that improved the system's performance and eliminated the problems. The operations and development groups also realized that using Introscope to monitor applications in production would make both groups more productive, allowing them to catch and fix performance problems before they adversely affected customers.

The System and Its Symptoms
Understanding the system

This business-to-business system is a moderately complex application developed by a third party as a work-for-hire and turned over to the company to manage and maintain. The System Architecture diagram (Figure 1) gives a graphic representation of the parts of the system and how they interact. All of the components in the diagram are Java components.

The Controller receives incoming requests for application services and routes them to the appropriate components. It also manages user permissions and profiles, relying on an external Authentication Service.

The Catalog Browser implements business logic relating to searching for and browsing items. It relies on the Customers subsystem to retrieve information related to customers, such as preferred brands and pricing models; the Item Catalog subsystem for detailed item information, including pictures and descriptions, pricing, and combinations; and the Inventory system to show item availability while reviewing product information.

The Order Builder might also be called the shopping cart component. It combines user selections of products, quantities, and destinations with the Customers' pricing models, Item Catalog's prices and combinations, and the Inventory's on-hand information to create an order.

When an order is built, the Order Placer does the work of committing the order. It checks a Credit Verification Service if the customer's profile indicates it, updates the Customers and Inventory subsystems with the new order information, and has the Orders subsystem begin the process of fulfilling the order and billing the customer.

The Customers, Item Catalog, Inventory, and Orders components map the system's Java representations onto the back-end systems' representations and act as clients of the back-end systems.

The Customer System, Item Database, Inventory System, and Order System exist in the Java system as connectors to those back-ends provided by their suppliers. The actual back-end systems run on different platforms and are shared by different systems, such as point-of-sale systems.

Seeing the symptoms but not the cause(s)
The development organization has a load-testing environment, which allows them to reproduce problems that only appear under heavy load. However, because the system has many interacting components that cannot be measured directly, they could only verify the problem's existence under certain conditions, not isolate its cause.

The back-end systems used by the production application are also used by the load-testing environment, as well as by many other systems, such as point-of-sale and telesales systems. Because they are shared, moderately used, and performing reliably, monitoring them directly does not provide useful information to troubleshoot the Java application's problems.

Individual developers frequently use a profiler to get very detailed information about the code they have written. However, because running the application under a profiler slows it tremendously and produces enormous amounts of trace data, profiling has not proven useful for understanding this performance problem under production-level load.

The development organization considered writing their own logging code into the application, but decided against it for several reasons. Foremost was the cost of diverting developers' time from creating new code that provides business value to writing troubleshooting code and building the infrastructure to store and present its output. Other reasons included the risks of introducing and managing code changes in so many parts of the application and the damage to developers' morale from being assigned "grunt" work.
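To make the trade-off concrete, here is a minimal sketch of the kind of hand-rolled instrumentation the team would have had to repeat throughout the application; the class, method, and logger names are hypothetical and not taken from the actual system.

```java
import java.util.logging.Logger;

// Hypothetical sketch of hand-rolled instrumentation the team chose to avoid.
// Each monitored method needs timing boilerplate like this, plus infrastructure
// to collect, store, and report the resulting log output.
public class HandInstrumentedComponent {
    private static final Logger log = Logger.getLogger("app.performance");

    public String lookupItem(String itemId) {
        long start = System.currentTimeMillis();
        try {
            return doLookup(itemId); // the real business logic
        } finally {
            log.info("lookupItem(" + itemId + ") took "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }

    // Stand-in for the actual catalog lookup.
    private String doLookup(String itemId) {
        return "description of " + itemId;
    }

    public static void main(String[] args) {
        new HandInstrumentedComponent().lookupItem("item-42");
    }
}
```

Multiplied across dozens of components, each such wrapper also needs somewhere to send its output and a way to summarize it, which is exactly the infrastructure the team did not want to build.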

At the same time, there were many hypotheses about the source or sources of the performance problems. Some said it was the back-end systems' slow response, others believed that faster server hardware was needed, and others attributed the problem to the Web server.

In short, the development organization needed a way to get component-level performance information from the application while it ran under load, without substantially changing its performance characteristics and without having to write the instrumentation themselves. They also needed it right away, since customer complaints were increasing daily. They discussed these needs with their BEA representatives, who suggested that they contact Wily Technology (a BEA partner) about its product, Introscope.

Introscope
Component-level monitoring

Introscope provides component-level performance information about live production Java applications as well as applications running under load in a testing environment. It monitors any Java application running in any contemporary JVM (JDK 1.1.3 or later) on any hardware and operating system platform. Moreover, installation and administration of Introscope with WebLogic Server 5.1 and later is particularly easy with a feature called AutoProbe Integration.

Out of the box, Introscope monitors many common Java and J2EE components such as servlets, JSPs, EJBs, and JDBC and Socket activity. In addition, users can configure Introscope to monitor any class or method that they have built themselves or integrated from a third party using the Custom Tracing features. Users can also change the components they are monitoring even after the application is deployed, as their needs for performance information change. More importantly, monitoring choices are made without the need to access or change source code.

Introscope measures the average response time and the responses per second for most of the components it monitors. For other components, measurements include the bytes per second coming into or leaving the Java system and CPU utilization. In addition to each component's individual performance information, it keeps track of component interaction and attributes the performance of each to the component that caused or called it (a feature called "Blame Technology"). These performance measurements are useful for understanding how an application's components are performing while under load. The "Blamed" measurements make bottlenecks in component interactions easy to identify.

Introscope uses a number of techniques to ensure that the overhead of collecting performance information remains low. Introscope is selective about the components it monitors, and places lightweight monitors on relatively heavyweight component activity. The Introscope Agent collects summary information about component performance and reports that information asynchronously to a separate Enterprise Manager component, which handles more CPU-intensive tasks such as storing the data and making data available to the Workstation, Introscope's GUI.

Historical data stored for analysis and reports
Introscope stores performance data in a JDBC-accessible database and/or comma-separated value (CSV) text files. The user controls exactly which data is stored and the frequency at which it is recorded. Once stored, the historical data can be viewed in the Workstation, or by using any technique that can query or report on the JDBC database or CSV files. Introscope includes sample component performance, service-level, and capacity-planning Crystal Reports.
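Because the historical store is an ordinary JDBC-accessible database (or a set of CSV files), it can be queried with standard tools. The sketch below uses plain JDBC; the driver URL, credentials, and the table and column names (METRIC_DATA, METRIC, TS, VALUE) are assumptions for illustration only and should be replaced with the schema Introscope actually creates in a given installation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal sketch of reporting against the historical data with plain JDBC.
// The JDBC URL, credentials, and the table/column names are hypothetical;
// consult the Introscope documentation for the actual schema in use.
public class HistoricalReport {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:yourdriver://dbhost/introscope"; // assumed connection details
        try (Connection conn = DriverManager.getConnection(url, "introscope", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT TS, VALUE FROM METRIC_DATA WHERE METRIC = ? ORDER BY TS")) {
            stmt.setString(1, "Business Logic|Order Builder:Average Response Time (ms)");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getTimestamp("TS") + "  " + rs.getDouble("VALUE"));
                }
            }
        }
    }
}
```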

Alerts for operations
Since Introscope is designed to manage Java systems in production, it can perform actions when performance measurements cross user-defined thresholds. Actions commonly triggered by Alerts include sending an e-mail, showing a dialog box in the Workstation, sending a message to a pager, writing to a log file, reconfiguring or restarting an application, and sending a message to another enterprise management system. In addition, an Introscope Alert can trigger any executable or shell script.
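As a small illustration of the "any executable" option, the following stand-alone Java program could be wired to an Alert; it simply appends a timestamped line to a notification log that operations already watches. The program name, and the idea that the Alert passes details as command-line arguments, are assumptions for illustration rather than documented Introscope behavior.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Date;

// Hypothetical stand-alone executable that an Alert could be configured to launch.
// It appends a timestamped line (plus any arguments the alert passes, if any)
// to a log file watched by the operations center.
public class AlertNotifier {
    public static void main(String[] args) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter("alert-notifications.log", true))) {
            out.println(new Date() + " alert fired " + Arrays.toString(args));
        }
    }
}
```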

Customizable views
The Workstation is an application that allows users to view and manage their systems' component performance. Particularly useful is the ability for users to create customized Dashboards to present performance data graphically for different users' needs. One Dashboard might show an overview of a system with colored lights indicating system status, while another might show detailed performance information for a component and the services it uses.

Monitoring clusters
Application instances can be monitored individually or as a cluster. Many Agents (whether on one machine or many and whether in a cluster or working as different tiers) can report performance information to the same Enterprise Manager. The data from each Agent is handled separately but can easily be monitored by an Alert or displayed together in a Dashboard. Additionally, aggregates for a cluster can easily be set up, for example, to provide the combined average response time for a Servlet in all instances in a cluster.

The Approach
The development and operations groups arranged for a Wily performance consultant to come in for five days and work with Introscope on their system in their environment. The goals of the work were to improve the system's performance and to understand how Introscope could manage this and other systems in production.

Installing Introscope with AutoProbe
On the first day, Introscope was set up quickly on the WebLogic machines in the load-testing environment by using the AutoProbe Integration feature with WebLogic Server. The Enterprise Manager was installed on a separate, shared, low-end box in the testing environment. An existing database server had database structures and a user added for Introscope to use. The Workstation was installed on several machines, including one in the testing environment, one in the operations management environment, and two in the development organization. That afternoon, the team was already viewing live component information with Introscope.

Customizing the monitoring environment
As with most systems, this one is made up of both J2EE components (which Introscope monitors out-of-the-box) and a number of custom components (which must be configured for Introscope to monitor). Introscope uses text files to configure which custom components to monitor, referencing the package, class, and method names of the primary ways the components are accessed. Based on discussions with the company's system architect, the package, class, and method names for Business Logic, External Service Provider, and Business Data Access Components were collected and used to create directive files for Introscope. This code snippet is a line from a custom directives file:

TraceOneMethodOfClass: com.company.onlinesales.logic.OrderBuilder addItem BlamedMethodTimer "Business Logic|Order Builder:Average Response Time (ms)"
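This directive tells Introscope to time each call to the addItem method of com.company.onlinesales.logic.OrderBuilder with a blamed timer and to report the results under the metric name given at the end of the line. For orientation, a sketch of the kind of class the directive targets appears below; the method signature and body are hypothetical, and, as noted above, no access to or change of this source is required for the metric to appear.

```java
package com.company.onlinesales.logic;

// Hypothetical sketch of the class named in the directive above.
// Introscope instruments the class based on the directive file, so no timing
// or logging code has to be added to this source.
public class OrderBuilder {

    /** Adds an item to the in-progress order; the signature shown here is assumed. */
    public void addItem(String itemId, int quantity) {
        // ... look up pricing and availability, then add the line item ...
    }
}
```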

The application server was restarted with this updated configuration and the performance information about these components became visible in Introscope.

Monitoring components and interactions under load
Moderate load was run against the system to begin to understand what performance information could be displayed and how it is represented.

From the load generator's point of view, the application behaved the same as before Introscope was introduced. There was no discernible difference in the performance of the application running under load with Introscope. This finding was crucial. First, the development organization needed to be confident that analysis with Introscope would not change the nature of the performance problem. Second, the operations group and system business sponsor would balk at the possibility of introducing large overhead, which would require additional server investment.

In Introscope, a component hierarchy of the application is visible in the Explorer window. It shows both the performance of individual components specified earlier in the configuration, as well as the performance of related components that are involved during the course of a component's work.

Figure 2 shows the performance information for the Catalog Browser and the Inventory components by themselves, as well as the Item Catalog performance when it is working on behalf of the Catalog Browser. As previously mentioned, Introscope's ability to associate performance information about one component with other components is called Blame Technology. Because the Inventory component works for several other components, it is extremely useful to be able to differentiate the performance of each component by its context. Without this feature, it would be difficult to find problems and bottlenecks that are caused by particular components' interactions rather than their aggregate performance. Another useful aspect of the Blame Technology is that Introscope does not have to be configured in advance as to which interactions to monitor.

Browsing Introscope's Explorer tree confirms that under heavy load the Catalog Browser and Order Builder components respond slowly. New information is now apparent: the Item Catalog and Inventory components are busier and slower, while other components do not appear to slow down much. Figure 3 shows the Inventory Average Response Time with moderate and heavy load.

Looking at the performance information for the monitored components, it is evident that in order to understand the performance of the Item Catalog and Inventory components, it is also important to monitor the Business Data System or DB components to see how the Item Catalog and Inventory components are using them.

Being able to view different performance information side-by-side would make this correlation and analysis easier than browsing in the Explorer. Introscope's Dashboards show selected performance information on the same screen. That customization is discussed below.

Homing in on the Problems
On day two of the project, Introscope's directive files were updated to include the Business Data System or DB components, and the application was restarted and run under load again. In the Explorer, the newly configured components appeared both as top-level components and as called resources under the Business Data Access Components.

To create Dashboards that conveniently show this information side-by-side, component metrics were dragged from the Explorer tree onto new Dashboards, automatically creating graph views, which were then labeled and organized in Panels. The Business Logic components overview is shown in Figure 4. It shows the average response times and responses per second of the four Business Logic components. In this Dashboard, which shows the transition from moderate to heavy load, the much slower response times of all the Business Logic components except the Order Placer are clearly evident.

On the Catalog Browser Dashboard, shown in Figure 5, it appears that under higher load the Catalog Browser responds more slowly because it relies more frequently on the Item Catalog and the Inventory components, which are also much slower.

The analogous pattern is evident on the Order Builder Dashboard: the Item Catalog and Inventory components are both busier and much slower under heavier load.

From the Item Catalog Dashboard, it appears that under higher load, the Item Database component is also used much more frequently, but responds quickly. This implies that the bottleneck is not in the back-end system but in the Item Catalog component or the way it is used. A corresponding pattern is evident on the Inventory Dashboard, which shows the Item Database is used more often while still responding quickly.

The Results
Identifying and fixing problems

On day three of the engagement, conclusions about the sources of the performance problems became clear.

External circumstances
As suggested initially, two primary external circumstances contributed to the slowdown: more users, and more searches for related items by the Catalog Browser. The relationship between the number of users and the activity of the two bottleneck components was expected to be linear, and previous log analysis suggested this was true. With promotion of the "find related" feature, however, usage of the Inventory and Item Catalog components increased at a much greater rate than the number of users, which stressed the system and slowed the application generally.

Inventory and Item Catalog component bottlenecks

The analysis showed that the Inventory and Item Catalog components were accessed every time the Catalog Browser returned product description information for an item and every time the Order Builder added an item to an order. The combination of additional users and their use of the "find related" feature meant that there were many more Inventory and Item Catalog look-ups. The number of Inventory System and Item Database lookups was also greater, but the Inventory System and Item Database themselves did not slow noticeably. This suggested that the Inventory and Item Catalog components should be used less often, or they could cache their results in order to respond more quickly.
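As one way to picture the caching option, here is a minimal sketch of a bounded, time-limited result cache in front of an expensive lookup; the class and method names, the eviction policy, and the time-to-live are illustrative assumptions, not a description of the changes the team actually made.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the caching idea for the Item Catalog / Inventory
// components: keep recent lookup results for a short time-to-live so repeated
// "find related" requests do not hit the component (and the back end) again.
// Class, method, and field names are hypothetical.
public class CachingItemLookup {
    private static final long TTL_MS = 30_000;      // assumed freshness window
    private static final int MAX_ENTRIES = 10_000;  // assumed size bound

    private static class Entry {
        final String value;
        final long loadedAt;
        Entry(String value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    // Access-order LinkedHashMap evicts the least recently used entry when full.
    private final Map<String, Entry> cache =
        new LinkedHashMap<String, Entry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    public synchronized String lookup(String itemId) {
        Entry cached = cache.get(itemId);
        if (cached != null && System.currentTimeMillis() - cached.loadedAt < TTL_MS) {
            return cached.value; // fresh hit: no call to the back-end component
        }
        String value = loadFromItemDatabase(itemId); // the expensive path
        cache.put(itemId, new Entry(value, System.currentTimeMillis()));
        return value;
    }

    // Stand-in for the real Item Database / Inventory System call.
    private String loadFromItemDatabase(String itemId) {
        return "details for " + itemId;
    }
}
```

Whether a cache of this kind is acceptable depends on how stale inventory and catalog data may be; a short time-to-live trades a small amount of staleness for far fewer calls into the back-end systems.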

Eliminating alternative explanations
Many other possible sources of problems (other components, back-end systems, networking, memory) were quickly eliminated as suspects, reducing both the time taken to come to conclusions and the amount of work needed to reach them.

New release moved into production
Based on the conclusions made possible by Introscope, useful, localized changes could quickly be made in several components with a high degree of confidence that those changes would have a substantial beneficial effect on the application's performance under load.

Operations considers Introscope
On day four (while development worked on system code changes), Wily's performance consultant worked with operations line and management personnel to understand how Introscope could be deployed and used in the company's production environment to monitor their various Java systems. Sample Alerts and Dashboards were set up and test events were sent into the company's existing management framework.

Figure 6 shows one of the Alerts that operations set up. Figure 7 shows a resulting Alert message and the detailed information that Introscope provides when thresholds are crossed.

The operations group has an operations center in which monitoring consoles run all the time. Figure 8 shows an example of a Dashboard that might be displayed in the operations center, providing system status information at a glance.

Operations also spent some time understanding the database structure in which Introscope stores historical data. Some sample reports were run which showed how Introscope could be used both as a source of benchmarking reports during performance testing and for service-level and trend analysis reporting over time.

With Introscope's functionality, the operations and development groups expect to better understand how their systems are performing, respond to problems quickly when they occur, and involve the development group only in exceptional circumstances. When those circumstances do occur, operations and development are confident that they will be able to share a view into the running application and avoid guessing and finger-pointing about the causes of problems.

Improved performance
By the end of the week, development had made the indicated changes and began to test the performance under load. Preliminary results indicated that the large slowdowns had disappeared. Preparations were begun to deploy the updated application to production with Introscope.

Conclusion
The causes of the performance problems were quickly identified. Much time-consuming investigation, and potentially expensive purchases of server hardware, were avoided. The development group could promptly make well-targeted changes to improve the application's performance for their customers. To learn more about Introscope, call 1-800-GETWILY or visit www.wilytech.com.

About the Author

Carl Seglem is a member of the Wily Technology Services team and has worked with Introscope at dozens of Fortune 1000 customers. Before joining Wily, he worked on information systems development and management at Scudder Kemper Investments and KPMG. He can be reached at [email protected].
