Welcome!

Weblogic Authors: Yeshim Deniz, Elizabeth White, Michael Meiner, Michael Bushong, Avi Rosenthal

Related Topics: Weblogic

Weblogic: Article

Failover and Recovery of Enterprise Applications - Part 1

High availability - moving beyond clustering

In enterprise application architecture, it is naïve to assume that none of the software/hardware components will go down. In fact, most of the IT managers and architects acknowledge this. However, a well-tested and robust recovery procedure continues to take a back seat when designing and implementing software projects. In several scenarios, administrators end up performing basic failover testing by shutting down the processes and verifying that the subsequent requests succeeded.

Although this level of testing can satisfy the failover requirements for the records, more robust failover testing needs to be performed to ensure a proper recovery if failures do occur. One of the primary reasons for the lack of robust recovery and well-tested procedures can be attributed to a high degree of reliance on the application server's in-built failover capabilities. While it is true that most of the popular application servers boost their clustering and associated high-availability features, it is equally important that system architects need to go beyond those features for an end-to-end high availability of the application. This article suggests how good planning, robust scripting, and a well- thought-out application design, along with application server facilities, can provide a highly available environment for the entire application. In the first article in a series of three, I will cover the basic high-availability options available while deploying a J2EE application on WebLogic Server.

Before diving into specifics, it is important to understand some of the commonly used terms when one talks about high-availability.

High Availability - So, what does high availability really mean? From a user's point of view, it means an application that is always available to perform the user's transactions. In other words, even though one or more of the back-end system becomes unavailable, the application may be architected in such a manner that these failures are not visible to the end user. High availability is a relative term - for some applications, there is high availability only if the site is never down - or zero downtime. For some applications it should be available, without interruptions - during the normal business operations. How and when a system is called highly available is typically defined by SLA - Service Level Agreements. Each organization may have its own system level agreements for the nonavailability of the applications during certain maintenance windows.

Transparent Failover - A highly available system does not necessarily imply that failover is always transparent. In other words, when one of the components within the application fails, the subsequent requests get routed to other available servers, thereby making it highly available. However it is possible for users to lose their sessions. There may even be a loss of data during the failure and things of that nature. In these cases, even though the system is available for subsequent requests, none of the above scenarios truly represents a transparent failover within an enterprise application.

Fault Tolerance - Fault tolerance is the system's ability to keep the system available and maintain data consistency in case of failure in one or more of its software components.

Clustering - Perhaps the most common term used in conjunction with HA, clustering is essentially a service provided by application servers in which multiple servers run as a group. The clustering service is responsible for load balancing the user requests and also for rerouting the incoming traffic to available servers should one of the servers in the cluster become unavailable.

Recovery - The ability of the system to recover itself from a failure condition as well as to bring the state of the objects into their pre-failure condition.

Failure Points
In an enterprise application, there are several points of failure - both within each individual component and at integration touch points with other components and interfaces. As the number of interfaces increases, the failover and recovery procedures become increasingly complex due to interdependencies involved. Although each component within the application may have recovery procedures for itself, the problem arises when one component or a combination of components within the system fails simultaneously.

So, a first step in designing a highly available enterprise system is identifying all of the possible permutations and combinations of failure points within the system. The identification process should involve all of the components within the system - application, network, hardware, and database - each and every component involved. As one begins to create such a matrix, it becomes overly complicated as the number of components and products that are mixing increases. This article will serve to focus failover and recovery of components and applications within the WebLogic Platform.

WebLogic Platform
WebLogic Server - WebLogic Server provides a number of J2EE services that include Java Messaging Service (JMS), Transactions (JTS/JTA) and Security, Servlet Engine, EJB container, and so forth. When designing enterprise applications for high availability it is crucial that architects have a good understanding of how these services failover when one of the instances hosting them goes down. A thorough understanding of the failover and recovery procedures of these services is also critical in automating the recovery of a downed instance or a system.

Clustering Overview - Perhaps no discussion on WLS high availability is complete without touching on WebLogic Clustering. An application that is deployed homogenously in a cluster is generally highly available. When one of the server instances goes down, the request is routed to the other available server in the cluster. In fact, the concept of the cluster is not new to J2EE architects and WebLogic Server administrators. Readers of this article are encouraged to review the WebLogic Server clustering documentation (http://e-docs.bea.com/wls/docs81/cluster/index.html). Essentially, most of the services provided by WebLogic can be categorized as either clusterable (i.e., multiple instances of these run on different services on multiple servers), or nonclusterable (there may be multiple instances of these services configured, but only one is active at a given time within the cluster). More information on which services can be clustered is provided in WebLogic Server documentation. Table 1 lists the clustered and nonclustered services in WebLogic Platform 8.1.

More Stories By Sudhir Upadhyay

Sudhir Upadhyay is currently with Architecture and Shared services at JP Morgan Chase where he is an application architect. Prior to joining JPMorgan, he was a principal consultant with BEA Professional Services where he helped customers design and implement enterprise J2EE solutions. He is a BEA Certified WebLogic Developer and a Sun Certified Java Developer.

Comments (2) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Thirunavukarasu 02/04/06 06:31:00 AM EST

A neatly explained technical atricle. Very useful, very crisp. Thanks .

Viktor Lioutyi 08/12/05 10:34:27 AM EDT

Hi,

How to test failover in automatic manner?

"In several scenarios, administrators end up performing basic failover testing by shutting down the processes and verifying that the subsequent requests succeeded.

Although this level of testing can satisfy the failover requirements for the records, more robust failover testing needs to be performed to ensure a proper recovery if failures do occur."

We did the manual testing and failover worked. But we would like to do automatic testing of failover to make sure that it works for all our 1000+ pages. BEA does not have any tool for such testing.

There are different reasons why someone may want to test all pages for failover.

1) WebLogic only replicates attributes that were modified. Call of session's setAttribute() method is an indication for WebLogic that attribute was modified. This call may be done explicitly or implicitly when jsp tags are used. It is possible that on some pages members of complex attributes were modified but WebLogic was not notified about it, so it will not replicate such attributes.

2) Complex attributes may reference other objects and attributes. After replication these references may be broken. For example, attribute A and B references object C. Only attribute A was modified, so only A will be replicated. After the replication A and B may point to different copies of C and program may not work correctly anymore.

3) Some objects are assumed to be singletons. Developer needs to provide special implementation for serialization to support replication of singleton objects. If this implementation is omitted, then replication may create copies of a singleton object.

4) Transient fields are not going to be replicated but there should be a recovery code that restores values of these fields after replication. Without testing we do not know if all our recovery code works correctly or not.

There are probably other reasons too.

Does anybody know about any tool for automatic testing of failover (or at least just session replication) for WebLogic and/or WebSphere?

Thanks,
Viktor

@ThingsExpo Stories
From 2013, NTT Communications has been providing cPaaS service, SkyWay. Its customer’s expectations for leveraging WebRTC technology are not only typical real-time communication use cases such as Web conference, remote education, but also IoT use cases such as remote camera monitoring, smart-glass, and robotic. Because of this, NTT Communications has numerous IoT business use-cases that its customers are developing on top of PaaS. WebRTC will lead IoT businesses to be more innovative and address...
In his session at @ThingsExpo, Sudarshan Krishnamurthi, a Senior Manager, Business Strategy, at Cisco Systems, discussed how IT and operational technology (OT) work together, as opposed to being in separate siloes as once was traditional. Attendees learned how to fully leverage the power of IoT in their organization by bringing the two sides together and bridging the communication gap. He also looked at what good leadership must entail in order to accomplish this, and how IT managers can be the ...
SYS-CON Events announced today that Secure Channels, a cybersecurity firm, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Secure Channels, Inc. offers several products and solutions to its many clients, helping them protect critical data from being compromised and access to computer networks from the unauthorized. The company develops comprehensive data encryption security strategie...
"DX encompasses the continuing technology revolution, and is addressing society's most important issues throughout the entire $78 trillion 21st-century global economy," said Roger Strukhoff, Conference Chair. "DX World Expo has organized these issues along 10 tracks with more than 150 of the world's top speakers coming to Istanbul to help change the world."
Recently, WebRTC has a lot of eyes from market. The use cases of WebRTC are expanding - video chat, online education, online health care etc. Not only for human-to-human communication, but also IoT use cases such as machine to human use cases can be seen recently. One of the typical use-case is remote camera monitoring. With WebRTC, people can have interoperability and flexibility for deploying monitoring service. However, the benefit of WebRTC for IoT is not only its convenience and interopera...
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
What sort of WebRTC based applications can we expect to see over the next year and beyond? One way to predict development trends is to see what sorts of applications startups are building. In his session at @ThingsExpo, Arin Sime, founder of WebRTC.ventures, discussed the current and likely future trends in WebRTC application development based on real requests for custom applications from real customers, as well as other public sources of information.
SYS-CON Events announced today that App2Cloud will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. App2Cloud is an online Platform, specializing in migrating legacy applications to any Cloud Providers (AWS, Azure, Google Cloud).
SYS-CON Events announced today that Calligo has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Calligo is an innovative cloud service provider offering mid-sized companies the highest levels of data privacy. Calligo offers unparalleled application performance guarantees, commercial flexibility and a personalized support service from its globally located cloud platform...
IoT is at the core or many Digital Transformation initiatives with the goal of re-inventing a company's business model. We all agree that collecting relevant IoT data will result in massive amounts of data needing to be stored. However, with the rapid development of IoT devices and ongoing business model transformation, we are not able to predict the volume and growth of IoT data. And with the lack of IoT history, traditional methods of IT and infrastructure planning based on the past do not app...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. Jack Norris reviews best practices to show how companies develop, deploy, and dynamically update these applications and how this data-first...
Intelligent Automation is now one of the key business imperatives for CIOs and CISOs impacting all areas of business today. In his session at 21st Cloud Expo, Brian Boeggeman, VP Alliances & Partnerships at Ayehu, will talk about how business value is created and delivered through intelligent automation to today’s enterprises. The open ecosystem platform approach toward Intelligent Automation that Ayehu delivers to the market is core to enabling the creation of the self-driving enterprise.
Internet-of-Things discussions can end up either going down the consumer gadget rabbit hole or focused on the sort of data logging that industrial manufacturers have been doing forever. However, in fact, companies today are already using IoT data both to optimize their operational technology and to improve the experience of customer interactions in novel ways. In his session at @ThingsExpo, Gordon Haff, Red Hat Technology Evangelist, shared examples from a wide range of industries – including en...
Consumers increasingly expect their electronic "things" to be connected to smart phones, tablets and the Internet. When that thing happens to be a medical device, the risks and benefits of connectivity must be carefully weighed. Once the decision is made that connecting the device is beneficial, medical device manufacturers must design their products to maintain patient safety and prevent compromised personal health information in the face of cybersecurity threats. In his session at @ThingsExpo...
"The Striim platform is a full end-to-end streaming integration and analytics platform that is middleware that covers a lot of different use cases," explained Steve Wilkes, Founder and CTO at Striim, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We're a cybersecurity firm that specializes in engineering security solutions both at the software and hardware level. Security cannot be an after-the-fact afterthought, which is what it's become," stated Richard Blech, Chief Executive Officer at Secure Channels, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
SYS-CON Events announced today that Massive Networks will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Massive Networks mission is simple. To help your business operate seamlessly with fast, reliable, and secure internet and network solutions. Improve your customer's experience with outstanding connections to your cloud.
Everything run by electricity will eventually be connected to the Internet. Get ahead of the Internet of Things revolution and join Akvelon expert and IoT industry leader, Sergey Grebnov, in his session at @ThingsExpo, for an educational dive into the world of managing your home, workplace and all the devices they contain with the power of machine-based AI and intelligent Bot services for a completely streamlined experience.
With tough new regulations coming to Europe on data privacy in May 2018, Calligo will explain why in reality the effect is global and transforms how you consider critical data. EU GDPR fundamentally rewrites the rules for cloud, Big Data and IoT. In his session at 21st Cloud Expo, Adam Ryan, Vice President and General Manager EMEA at Calligo, will examine the regulations and provide insight on how it affects technology, challenges the established rules and will usher in new levels of diligence a...