Holistic Infrastructure Monitoring and Management

WebLogic Server, like most applications, provides robust and detailed monitoring tools bundled with the basic application. The embedded monitoring and management provided by the WebLogic Console is extremely useful when diagnosing and repairing a problem once it has been isolated in the WebLogic Server. But this embedded point solution is of limited use in most real-world situations where the application server is just a single component in a system of components that are all vital to providing the end-user application. When trying to quickly diagnose a general problem with the end-user application, it is much more powerful and effective to view data from each component holistically as a part of the system rather than evaluating each component by itself.

This holistic view becomes even more powerful when correlated with end-user application and business transaction performance. If business priorities and processes can be included in this view of the infrastructure, operations management becomes more valuable to the enterprise.

Point Solutions
A typical Web infrastructure is composed of a multitude of hardware and software components: hosts running various operating systems, Web servers, application servers, databases, legacy systems, and network devices. Each element of the infrastructure usually includes a point monitoring and management solution.

The command line utilities available to operating systems in the Unix family are typical point solutions. Point solutions provide power and flexibility to the experienced administrator. At the application level, vendors package tools for the monitoring and management of their applications. The WebLogic Server Console provides detailed information about many aspects of WebLogic (current connections, JVM memory usage, database connection pool load, etc.).

The common benefit of these point solutions is that they enable real-time system monitoring, troubleshooting, and resolution by the experienced administrator. Moreover, these tools are ultimately used to solve problems once they have been identified.

One difficulty with point solutions, however, lies in their heterogeneity. An experienced Unix administrator may not know how to identify an errant process on a Windows platform, even though the task is conceptually the same on both systems. Similarly, an administrator familiar with the WebLogic Server Console might have difficulty using the point solution for a Web server or another application server. Point monitoring solutions demand a high level of specialization and domain expertise from administrators.

A second difficulty is in their dispersion. With point solutions there is no visibility across the components, which can be of various types. Unfortunately, an operational problem with WebLogic Server may be related to an underlying problem with the host, the database server it's connected to, or the network layer. Because specialists are hired to fix particular problems, each using their own tools, problem isolation is slow and labor intensive, especially in large and highly specialized IT organizations.

A third drawback to point solutions is that they tend to be reactive. They're used when the system is already down or performing poorly and the administrator is responding in real time to the problem as the business suffers the consequences. Lacking an aggregate tool, inventive teams have crafted scripts and other tools to facilitate these activities, but the maintenance of these tools requires even more expertise.

Infrastructure as a Whole
Infrastructure managers quickly realized that the information they were getting from their various point solutions was much more valuable when viewed together as part of a larger whole. They realized that fault isolation and correlation became much easier once they had this larger view. In the network world, SNMP became the de facto standard for aggregating all of the element data and enabled master management consoles.

Suddenly, infrastructure managers could do more to support the enterprise because they were spending less time in the isolation phase of problem resolution. Infrastructure management was still principally reactive, but it was faster. For the most part, though, these goals were only achieved at the network layer. State models of IP-based networks are much easier to understand and manipulate than what is happening at the application level (see Figure 1).

As network management solutions attempted to integrate infrastructure management and monitoring above OSI Layer 3, the problem of resolving a common view of the health of the infrastructure from disparate data sources became even more pronounced. Application vendors extended SNMP support to their products (WebLogic, for example, can be monitored via SNMP) in an attempt to enable a common infrastructure view.

But SNMP can suffer from reliability and complexity issues, and some components in the infrastructure support it only incompletely. Access to command-line utilities and Windows tools like Performance Monitor also remains critical to the complete management of infrastructure resources. In addition, databases typically make most statistics available only via SQL.

Agent-based systems are often proposed as an alternative comprehensive monitoring technique, but are difficult to manage and tax system resources unnecessarily. They also often lack access to some of the components you might need.

Common access to system information from heterogeneous sources is vital. Most existing approaches to consolidation involve a Manager of Managers (MoM) system, which receives monitoring data from multiple sources. These systems tend to be prohibitively expensive to purchase, customize, and support. They also take considerable time to implement and require significant training to be used effectively.

NOCpulse Command Center uses a plug-in framework to separate monitoring data from the access methods used to gather that data, providing the benefits of agent-based and MoM systems without their disadvantages. No cross-industry standard reconciles the very different issues encountered when monitoring an application like WebLogic versus an operating system, and such a standard seems unlikely even in the distant future. In its absence, a flexible, extensible product-based standard (like NOCpulse Command Center's plug-in framework) is the next best thing.
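The idea of separating monitoring data from the access method can be illustrated with a minimal Python sketch. This is not NOCpulse Command Center's actual plug-in API, which the article does not describe; the names `Metric`, `Probe`, `SqlProbe`, `CallableProbe`, and `poll` are hypothetical and exist only for illustration. Each plug-in hides one access method (here SQL, or any function that returns readings) and emits records in one common shape.

```python
import sqlite3
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metric:
    """Common record format, independent of how the value was obtained."""
    component: str
    name: str
    value: float
    timestamp: float


class Probe(ABC):
    """Hypothetical plug-in interface: one subclass per access method."""

    def __init__(self, component: str):
        self.component = component

    @abstractmethod
    def collect(self) -> List[Metric]:
        """Return the current readings for this component."""


class SqlProbe(Probe):
    """Reads a single statistic that is only reachable through SQL."""

    def __init__(self, component: str, connection: sqlite3.Connection,
                 metric_name: str, query: str):
        super().__init__(component)
        self.connection = connection
        self.metric_name = metric_name
        self.query = query

    def collect(self) -> List[Metric]:
        (value,) = self.connection.execute(self.query).fetchone()
        return [Metric(self.component, self.metric_name, float(value), time.time())]


class CallableProbe(Probe):
    """Wraps any other access method behind a function returning name -> value."""

    def __init__(self, component: str, reader: Callable[[], Dict[str, float]]):
        super().__init__(component)
        self.reader = reader

    def collect(self) -> List[Metric]:
        now = time.time()
        return [Metric(self.component, name, float(value), now)
                for name, value in self.reader().items()]


def poll(probes: List[Probe]) -> List[Metric]:
    """Gather readings from every plug-in into one flat list."""
    results: List[Metric] = []
    for probe in probes:
        results.extend(probe.collect())
    return results
```

In this sketch, the code that consumes `poll()` never needs to know whether a value came from a database query, a command-line utility, or an SNMP request; adding a new access method means adding one more `Probe` subclass.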

Also key is a common data repository and interface. Command Center plug-ins access required metrics via multiple protocols, but the results are presented in a common format via a single Web-based user interface. Performance metrics collected from the infrastructure are gathered in a common data store, allowing easy data mining, historical event correlation, and root cause analysis through a shared report engine (see Figure 2).
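To make the value of a common data store concrete, here is a small sketch using Python's built-in `sqlite3` module. The table layout and the function names `open_store`, `record`, and `samples_near` are assumptions made for this example, not a description of Command Center's repository. Because every component writes to the same table, one query can pull all readings taken around the time of an incident, whichever component produced them.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one table shared by all components."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metric (ts REAL, component TEXT, name TEXT, value REAL)"
    )
    return db


def record(db: sqlite3.Connection, ts: float, component: str,
           name: str, value: float) -> None:
    """Store one reading in the common format."""
    db.execute("INSERT INTO metric VALUES (?, ?, ?, ?)",
               (ts, component, name, value))


def samples_near(db: sqlite3.Connection, ts: float, window: float):
    """Return every reading, from any component, within `window` seconds of `ts`."""
    cursor = db.execute(
        "SELECT ts, component, name, value FROM metric "
        "WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (ts - window, ts + window),
    )
    return cursor.fetchall()
```

Calling `samples_near(db, incident_time, 60)` would then list, side by side, what the Web server, application server, and database reported in the minute before and after a slowdown, which is the starting point for historical event correlation.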

The result is a holistic view of all the components that make up an end-user application: vertically up the stack, from the operating system through the application layer to the network, and horizontally from server to server.

Infrastructure and the End User
Too often the focus of monitoring is attention to system problems without regard to the real end-user impact. What is ultimately important is not the specific health of all of the individual components of an Internet infrastructure, but the performance of the application at the other end. Can our customers currently purchase a CD from our site? Is our billing system too slow? Does the customer service section of our site offer the help our customers need? The era of Web applications has given rise to point solutions for end-user monitoring. These products either quantify end-user experience or monitor site accessibility.

Unfortunately, the limited end-user monitoring approach ignores the infrastructure. Web site slow? Administrators must turn to another solution to find the cause. Web infrastructures grow quickly in complexity, and an administrator might not be able to quickly and effectively correlate end-user performance issues with a particular component of the infrastructure.

Holistic monitoring requires a common interface to both infrastructure health and end-user application performance. NOCpulse Command Center provides this common interface and allows a user to model a multi-step browser-based transaction through a point-and-click configuration tool for both remote and local monitoring.
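A multi-step transaction can be modeled as an ordered list of named steps, each timed separately. The sketch below is only an illustration of that idea: `StepResult` and `run_transaction` are hypothetical names, and each step's action is an ordinary function supplied by the caller (in practice it would issue an HTTP request and check the response), so nothing here reflects how the product's point-and-click tool works internally.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float


def run_transaction(steps: List[Tuple[str, Callable[[], bool]]],
                    clock: Callable[[], float] = time.perf_counter
                    ) -> List[StepResult]:
    """Run the steps in order, timing each one, and stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = clock()
        try:
            ok = bool(action())
        except Exception:
            # A step that raises is treated as a failed step, not a crash.
            ok = False
        results.append(StepResult(name, ok, clock() - start))
        if not ok:
            break
    return results
```

Stopping at the first failed step mirrors what a customer experiences: if adding an item to the cart fails, the checkout step is never reached, and the per-step timings show where the transaction slowed down or broke.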

It may be of interest to see the performance of my e-commerce site from a sample of locations on the public network, but this needs to be triangulated against local performance. Suppose customers were unable to place orders for a time. Do I need to complain to my network provider, or do I need to scale up my Web server farm? Point end-user monitoring solutions cannot answer this question; a holistic approach that correlates user experience with infrastructure health can.
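The triangulation described here can be reduced to a simple decision rule, sketched below in Python. The function name `locate_slowdown`, the single shared threshold, and the use of the median across remote locations are all assumptions chosen to keep the example short; a real deployment would need more nuanced thresholds and more data points.

```python
from statistics import median
from typing import Dict


def locate_slowdown(local_ms: float, remote_ms: Dict[str, float],
                    threshold_ms: float) -> str:
    """Compare a local measurement with remote ones to suggest where to look.

    local_ms     response time measured next to the Web servers
    remote_ms    response times measured from locations on the public network
    threshold_ms response time above which the site counts as slow
    """
    if local_ms > threshold_ms:
        # Slow even without the public network in the path.
        return "infrastructure"
    if median(remote_ms.values()) > threshold_ms:
        # Fast locally but slow from outside: suspect the network path.
        return "network"
    return "healthy"
```

With a local response time of 250 ms and remote times around 3,000 ms, the rule points at the network provider; if the local time is itself 2,800 ms, it points at the server farm instead.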

Infrastructure and the Enterprise
Once we have a single view of the infrastructure that can be correlated to the performance of our end-user applications and business transactions, resources can be efficiently applied to quickly solve problems. The infrastructure can be tuned to provide better performance with a strong feedback loop of the metrics that matter, ensuring that changes actually have the intended effect. Service Level Agreements can be managed proactively; problems become defined by what the end user is experiencing or by metrics associated with the business transaction rather than by an arbitrary definition of "problem" at the infrastructure level.

The final stage in the development of infrastructure management involves connecting business management to the infrastructure. It involves making the priorities and concerns of the business transparent within the view of the infrastructure. Now precious operations resources are not only able to resolve problems quickly, but they can be applied efficiently to where they matter most: to the most urgent problem, where urgency is determined by business priorities. Fundamentally, it amounts to quickly getting the right people, with the right tools and information, to the most important problem (as defined by the priorities of the business).

For example, we might have problems with two applications: our CRM system is down and our credit card approval process is running slowly. From an operational perspective, the first problem may seem more severe, but when tied to business priorities, the risk of lost revenue mandates that effort be applied to the second problem first. A truly holistic management solution enables these types of decisions automatically.

NOCpulse Command Center allows users to build arbitrary groups of components that correspond to business processes, transactions, customers, or end-user applications. The behaviors of each of these groups (thresholds, notification destinations, escalation procedures, etc.) can be set in accordance with the importance of each group to the business. Critical customer issues get raised and resolved while lower priority or tolerable problems wait until people are free to deal with them.
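The CRM versus credit card example can be expressed as a short sketch in which each group carries a business priority and open problems are ordered by that priority before technical severity. The class names `Group` and `Problem`, the numeric scales, and the `work_queue` function are hypothetical and chosen for illustration; they do not describe how Command Center stores its groups.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    name: str
    business_priority: int                 # 1 = most important to the business
    notify: List[str] = field(default_factory=list)


@dataclass
class Problem:
    group: Group
    description: str
    technical_severity: int                # 1 = outage, 2 = degraded


def work_queue(problems: List[Problem]) -> List[Problem]:
    """Order problems by business priority first, technical severity second."""
    return sorted(
        problems,
        key=lambda p: (p.group.business_priority, p.technical_severity),
    )


crm = Group("CRM", business_priority=2, notify=["crm-oncall"])
cards = Group("Credit card approval", business_priority=1,
              notify=["payments-oncall"])

queue = work_queue([
    Problem(crm, "system down", technical_severity=1),
    Problem(cards, "approvals running slowly", technical_severity=2),
])
for problem in queue:
    print(problem.group.name, "-", problem.description)
```

Although the CRM outage is the more severe fault in purely technical terms, the slow credit card approval process comes first in the queue because its group has the higher business priority.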

Fundamentally, infrastructure management adds more and more value to the enterprise as it evolves from an inefficient, slow, reactive, element-focused approach to an efficient, responsive, proactive approach that is able to see infrastructure holistically and understand the relative importance of each end-user application to the business. When that level is achieved, infrastructure management becomes a true business enabler, allowing service level management, efficient customer problem reporting and resolution, and prioritization of fault response in accordance with business priorities. These benefits require a holistic operational view that provides the ability to correlate what is happening at the infrastructure level with what is happening at the business level.

More Stories By Greg Peters

Greg Peters, VP of Engineering at NOCpulse, has over 10 years of experience in software development, system engineering, and program management. He leads the company's core product development strategy and engineering teams.

More Stories By Lance Peterson

Lance Peterson, program manager at NOCpulse, is responsible for interface design and engineering program management. Previously, he worked as a programmer analyst at Emory University and in an earlier life taught literature, film, and [email protected]
