Transactions: How Big Are Your Atoms?

This month's article is again inspired by an interesting design discussion posted on the weblogic.developer.transaction newsgroup. (Ever get the feeling I'm running short of inspiration? Ideas for new articles always welcome!)

Since the problem described is a common one in transactional design, I thought it might be valuable to review the design, the problems with it, and some solutions.

The problem was stated on the newsgroup thus:

We have a Session Bean method (with a "Required" TX attribute) that creates an entity Bean, and then fires a JMS message that indicates that it was created. There is an MDB that listens for this message. When it hears it, it looks up the entity bean.

The problem is that sometimes the lookup of this entity bean will throw an ObjectNotFoundException. We have ensured that the JMS message firing uses the transaction context of the method, so that the creation of the entity bean and the firing of the message all takes place within the same transaction (we did this by using the "javax.jms.TopicConnectionFactory", and using a JMS session that was not transacted). Also, we have verified that the entity that gets created exists in the database (at least it does sometime after the lookup by the MDB fails).

So, what's going on here? The creation of the entity bean and the sending of the JMS message are in the same transaction, and we know therefore that the message will not be dispatched until the transaction is committed, so why can't the logic in the MDB see the new entity bean? The transaction manager is broken, right?

Well, no. In order to understand this situation, you need to take a step back and think about how the transaction manager actually implements a commit. From the 10,000 ft level, things should be working: the devil must be in the detail... Let's go diving!

Let's Dive for the Devil
A transaction encompasses the entity creation and the sending of the JMS message, so they will complete as an atomic unit - either the message is sent and the entity is created, or the whole thing fails. That's what the transaction manager is giving us. However, from an implementation perspective, we need to look more closely at exactly when the transaction is complete. It can't be when the application (or the EJB container) calls commit - we know that this just initiates a set of dialogues between the transaction manager and the resource managers, which is bound to take some time. The completion will happen some time later, when these dialogues are done.

Diving even deeper, you may recall that these dialogues fall into two categories - the two phases of the transaction (it's called two-phase commit, after all). Looking at the XA specification, you'll find that once a resource manager has replied affirmatively to a prepare, it is undertaking to guarantee to make whatever updates were in the scope of the transaction at some time in the future. Now we're getting somewhere - we've found a period of time over which things will be happening behind the scenes; maybe these asynchronous things are causing our problem.

From a high-level perspective, given the XA guarantee, the transaction can be assumed to be complete once the prepare calls have all succeeded. From an implementation level, until the commit calls are processed by all the resource managers we cannot be certain that we will be able to access the updated database state, and we have no way of knowing exactly when these commits will happen - commit processing is going on in the background, and the time it takes to perform a commit will vary depending on factors such as system load, resource-manager locality, the order in which the transaction manager sent out commit messages, and so on.

(This completely ignores the possibility of failure; imagine the database manager crashing after a prepare. The commit can't be processed until it is brought back online. How long will that take? Well, it depends on how long it takes to fix the problem - if the crash is caused by a faulty power supply in a machine, then it could take days waiting for a spare part. This whole parenthetic discussion then leads into one of my favorite subjects, the transaction abandonment timeout.)

So, the moral of this story is that you cannot rely on an atomic transaction being truly atomic in time - it will complete as a logically atomic unit, sure, but there will always be some amount of timing jitter involved in making its results visible across all the resources it touched.
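The window between the two phases can be made concrete with a toy model. What follows is only a sketch: ToyResource and TwoPhaseCommitWindow are made-up names, there is no real transaction manager or XA driver involved, and the commit order (JMS before the database) is simply assumed in order to show the awkward case.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy resource manager (hypothetical): an update is visible only after commit(). */
class ToyResource {
    final String name;
    private boolean prepared;
    private boolean committed;

    ToyResource(String name) { this.name = name; }

    /** Phase 1: promise to apply the update at some later time. */
    boolean prepare() { prepared = true; return true; }

    /** Phase 2: apply the update and make it visible. */
    void commit() {
        if (!prepared) throw new IllegalStateException(name + " was not prepared");
        committed = true;
    }

    boolean updateVisible() { return committed; }
}

class TwoPhaseCommitWindow {
    /** Runs both phases and records what an observer would see along the way. */
    static List<String> run() {
        ToyResource jms = new ToyResource("JMS");
        ToyResource db = new ToyResource("DB");
        List<String> log = new ArrayList<>();

        // Phase 1: every resource votes; logically the outcome is now decided.
        boolean allPrepared = jms.prepare() && db.prepare();
        log.add("prepared=" + allPrepared);

        // Phase 2: commits arrive one at a time, in an order the application
        // does not control. Assumption for this sketch: JMS goes first.
        jms.commit();
        log.add("message=" + jms.updateVisible() + " row=" + db.updateVisible());

        db.commit();
        log.add("message=" + jms.updateVisible() + " row=" + db.updateVisible());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The middle log entry is the interesting one: the message can already be delivered while the row is not yet visible, even though the transaction as a whole is guaranteed to commit.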

Danger: Mixed Synchronicity!
It is clear now what the problem is with the design stated on the newsgroup. The assumption has been made that this asynchronous commit processing doesn't happen. A race has been set up between the JMS and the database resource managers to commit the transaction. When JMS wins, the message-processing logic assumes the database has committed too, but it hasn't - the commit processing is still going on in the background, and the ObjectNotFoundException is thrown.

So much for the theory. How can we fix the design? There are (as always in architecture of this kind) a few options, ranging from the hacky workaround to the elegant rearchitecture.

The hacky workarounds involve coding around the problem, either with JMS message birth times or with defensive coding in the MDB.

If the code that creates the JMS message sets the birth time to some time in the future, the JMS system will introduce a delay into the processing path before it releases the message. This delay should give enough time for the commit processing to complete. That's a great theory as far as it goes, but how long should the delay be? As I already said, the required window will depend on system load and physical architecture, and it might vary radically in some failure conditions. Using this method for a production system will sooner or later lead to sporadic failures as loads and deployment vary, and will incur a support cost and a reliability loss.

So, on to the defensive coding. A simple-minded approach might be to roll back the MDB, have the message redelivered, and try again; or simply to try again after a pause in the MDB logic itself. That's well and good, but what if the scenario isn't object creation, but modification? Now you can't be certain that the data you're updating is the current data (at least, you'll not be certain that you're certain - it depends on the database's locking strategy). To code around this, you add a version field to the object and implement some kind of optimistic concurrency so that the MDB can wait until it's sure it's operating on the right version of the data.
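To give a feel for how much code the defensive route drags in, here is a minimal sketch of the "wait for the right version" idea. VersionedStore and DefensiveConsumer are hypothetical stand-ins for the entity lookup and the MDB logic, and the sketch assumes the sender puts the expected version number in the message; none of this is a WebLogic or JMS API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the committed database state the MDB can see. */
class VersionedStore {
    private final Map<String, Integer> committedVersions = new ConcurrentHashMap<>();

    /** Simulates the database resource manager finishing its commit. */
    void commit(String key, int version) { committedVersions.put(key, version); }

    Optional<Integer> findVersion(String key) {
        return Optional.ofNullable(committedVersions.get(key));
    }
}

class DefensiveConsumer {
    /**
     * Polls until the store shows at least the version named in the message.
     * Returns the number of attempts used, or -1 if the expected version
     * never became visible (or the thread was interrupted while pausing).
     */
    static int awaitVersion(VersionedStore store, String key, int expectedVersion,
                            int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<Integer> seen = store.findVersion(key);
            if (seen.isPresent() && seen.get() >= expectedVersion) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // give the commit time to land
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1; // caller must decide: roll back for redelivery, or give up
    }
}
```

Note that the sketch still has to pick a maximum number of attempts and a pause, which is the same "how long is long enough?" question that makes the birth-time approach fragile.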

The fact that you're doing all this frantic coding to work around this issue should be ringing alarm bells - clearly the architecture of the application does not mesh well with the architecture of the infrastructure. The best solution is to get to the bottom of why...

It's Not a Mesh, It's a Mess!
JMS is all about allowing processes to run asynchronously with respect to one another. JTA is all about making updates that logically execute atomically, which in turn implies synchronously (or as near synchronously as reality allows). In this scenario, an attempt is being made to use JMS as a synchronous calling mechanism - the operations on the data are clearly related to one another (synchronous) but for some reason we have interposed an asynchronous messaging system into the processing flow.

Maybe the most elegant solution would be to implement the next processing step as an Entity EJB, call it via RMI, and have it participate in the original transaction. All the updates would be visible to all the processing steps then, since the updated data is visible before the commit within the transaction. But what if there's another requirement that necessitates the asynchronous path to the "stage 2 processing"? Well, wrap the Entity EJB you created in this use case in an MDB facade and the logic can then be executed synchronously or asynchronously, depending on the use case (even better, maybe the "stage 2 Entity" only offers a local interface).
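The shape of that facade arrangement can be sketched in plain Java. The class names below (StageTwoProcessor, StageOne, StageTwoMessageFacade) are invented for illustration, the EJB and JMS plumbing is deliberately left out, and onMessage here takes a String rather than a real javax.jms.Message; the point is only that the stage 2 logic lives in one place and has two entry points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stage 2 business logic, written exactly once. */
class StageTwoProcessor {
    final List<String> processed = new ArrayList<>();

    void process(String entityId) { processed.add(entityId); }
}

/** Synchronous path: stage 1 calls stage 2 directly, inside the same transaction. */
class StageOne {
    private final StageTwoProcessor stageTwo;

    StageOne(StageTwoProcessor stageTwo) { this.stageTwo = stageTwo; }

    void createAndProcess(String entityId) {
        // ... entity creation would happen here ...
        // Same transaction, so stage 2 sees the new entity before commit.
        stageTwo.process(entityId);
    }
}

/** Asynchronous path: a thin facade that unpacks the message and delegates. */
class StageTwoMessageFacade {
    private final StageTwoProcessor stageTwo;

    StageTwoMessageFacade(StageTwoProcessor stageTwo) { this.stageTwo = stageTwo; }

    /** Plays the role of an MDB's onMessage; the payload is assumed to be the entity id. */
    void onMessage(String payload) { stageTwo.process(payload.trim()); }
}
```

With this split, the use case from the newsgroup would take the synchronous path through StageOne, and only callers that genuinely need decoupling would go through the message facade.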

As a parting observation, this kind of tricky asynchronous corner case is not at all uncommon in building transactional systems - in fact, it's more like the norm. TP systems like Tuxedo, CICS, and others all offer facilities analogous to the design pattern I just described to handle this kind of thing. So does the BEA WebLogic Workshop framework - it builds in this style atop J2EE and provides a natural, event-driven programming model while taking care of this kind of implementation detail in the framework, again demonstrating the power and potential of using such a framework to simplify implementation while increasing reliability.


More Stories By Peter Holditch

Peter Holditch is a senior presales engineer in the UK for Azul Systems. Prior to joining Azul he spent nine years at BEA Systems, starting as one of their first Professional Services consultants in Europe and finishing up as a principal presales engineer. He has an R&D background (originally having worked on BEA's Tuxedo product) and his technical interests are in high-throughput transaction systems. "Off the pitch" Peter likes to brew beer, build furniture, and undertake other ludicrously ambitious projects - but (generally) not all at the same time!
