
Transactions: How Big Are Your Atoms?


This month's article is again inspired by an interesting design discussion posted on the weblogic.developer.transaction newsgroup. (Ever get the feeling I'm running short of inspiration? Ideas for new articles always welcome!)

Since the problem described is a common one in transactional design, I thought it might be valuable to review the design, the problems with it, and some solutions.

The problem was stated on the newsgroup thus:

We have a Session Bean method (with a "Required" TX attribute) that creates an entity Bean, and then fires a JMS message that indicates that it was created. There is an MDB that listens for this message. When it hears it, it looks up the entity bean.

The problem is that sometimes the lookup of this entity bean will throw an ObjectNotFoundException. We have ensured that the JMS message firing uses the transaction context of the method, so that the creation of the entity bean and the firing of the message all takes place within the same transaction (we did this by using the "javax.jms.TopicConnectionFactory", and using a JMS session that was not transacted). Also, we have verified that the entity that gets created exists in the database (at least it does sometime after the lookup by the MDB fails).

So, what's going on here? The creation of the entity bean and the sending of the JMS message are in the same transaction, and we know therefore that the message will not be dispatched until the transaction is committed, so why can't the logic in the MDB see the new entity bean? The transaction manager is broken, right?

Well, no. In order to understand this situation, you need to take a step back and think about how the transaction manager is implemented. From the 10,000 ft level, things should be working: the devil must be in the detail... Let's go diving!

Let's Dive for the Devil
A transaction encompasses the entity creation and the sending of the JMS message, so they will complete as an atomic unit - either the message is sent and the entity created, or both fail. That's what the transaction manager is giving us. However, from an implementation perspective, we need to look more closely at exactly when the transaction is complete. It can't be when the application (or the EJB container) calls commit - we know that this just initiates a set of dialogues between the transaction manager and the resource managers, which is bound to take some time. The completion will happen some time later, when these dialogues are done.

Diving even deeper, you may recall that these dialogues fall into two categories - the two phases of the transaction (it's called two-phase commit, after all). Looking at the XA specification, you'll find that once a resource manager has replied affirmatively to a prepare, it is guaranteeing to make whatever updates were in the scope of the transaction at some time in the future. Now we're getting somewhere - we've found a period of time over which things will be happening behind the scenes; maybe these asynchronous things are causing our problem.

From a high-level perspective, given the XA guarantee, the transaction can be assumed to be complete once the prepare calls have all succeeded. From an implementation perspective, until the commit calls have been processed by all the resource managers we cannot be certain that we will be able to access the updated database state, and we have no way of knowing exactly when these commits will happen - commit processing goes on in the background, and the time it takes to perform a commit will vary depending on factors such as system load, resource-manager locality, the order in which the transaction manager sent out its commit messages, and so on. (This ignores completely the possibility of failure; imagine the database manager crashing after a prepare. The commit can't be processed until it is brought back online. How long will that take? Well, it depends on how long it takes to fix the problem - if the crash was caused by a faulty power supply in a machine, it could take days waiting for a spare part. This whole parenthetic discussion leads into one of my favorite subjects, the transaction abandonment timeout.)
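To make that window concrete, here is a minimal sketch of the two-phase protocol in plain Java. The `ToyResource` class and the background threading are illustrative assumptions, not WebLogic's implementation; the point is simply that between the last successful prepare and the last commit, some resources' updates are visible and others' are not.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// A toy resource manager: it votes in phase one and makes its updates
// visible in phase two. Real XA resource managers (a database, a JMS
// provider) do far more, but the shape of the protocol is the same.
class ToyResource {
    final String name;
    final long commitDelayMillis;           // simulates a slow commit
    final AtomicBoolean visible = new AtomicBoolean(false);

    ToyResource(String name, long commitDelayMillis) {
        this.name = name;
        this.commitDelayMillis = commitDelayMillis;
    }

    boolean prepare() { return true; }      // vote "yes" and harden the vote

    void commit() {
        try { Thread.sleep(commitDelayMillis); } catch (InterruptedException ignored) {}
        visible.set(true);                  // updates now readable by others
    }
}

class TwoPhaseCommitSketch {
    // Phase 1: collect the votes. Phase 2: fire the commits, here on
    // background threads as a real transaction manager may. Once every
    // prepare() has returned true the outcome is decided, but each
    // resource's updates only become visible when its own commit()
    // completes - and those commits finish at different times.
    static void commitAll(List<ToyResource> resources) {
        for (ToyResource r : resources) {
            if (!r.prepare()) return;       // a "no" vote would mean rollback
        }
        ExecutorService pool = Executors.newFixedThreadPool(resources.size());
        CountDownLatch done = new CountDownLatch(resources.size());
        for (ToyResource r : resources) {
            pool.submit(() -> { r.commit(); done.countDown(); });
        }
        try { done.await(); } catch (InterruptedException ignored) {}
        pool.shutdown();
    }
}
```

An observer polling the slow resource before its `commit()` has finished sees exactly the ObjectNotFoundException scenario from the newsgroup post.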

So, the moral of this story is that you cannot rely on an atomic transaction being truly atomic in time - it will complete as a logically atomic unit, sure, but there will always be some amount of timing jitter involved in making its results visible across all the resources it touched.

Danger: Mixed Synchronicity!
It is clear now what the problem is with the design stated on the newsgroup. The assumption has been made that this asynchronous transaction processing doesn't happen. A race has been set up between the JMS and the database resource managers to commit the transaction. When JMS wins, the message-processing logic assumes the database has committed too, but it hasn't - the commit processing is still going on in the background, and so ObjectNotFoundException is thrown.

So much for the theory; how can we fix the design? There are (as always in architecture of this kind) a few options, ranging from the hacky workaround to the elegant rearchitecture.

The hacky workarounds involve coding around the problem, either with JMS message birth times or with defensive coding in the MDB. If the code that creates the JMS message sets the birth time to some time in the future, the JMS system will introduce a delay into the processing path before it releases the message. This delay should give the commit processing enough time to complete. That's a fine theory as far as it goes, but how long should the delay be? As I've already said, the required window will depend on system load and physical architecture, and it might vary radically under some failure conditions. Using this method in a production system will sooner or later lead to sporadic failures as loads and deployments vary, and will incur a support cost and a reliability loss.

So, on to the defensive coding. A simple-minded approach might be to roll back the MDB, have the message redelivered, and try again; or simply to retry after a pause in the MDB logic itself. That's all well and good, but what if the scenario isn't object creation but modification? Now you can't be certain that the data you're updating is the current data (at least, you'll not be certain that you're certain - it depends on the database's locking strategy); to code around this, you add a version field to the object and implement some kind of optimistic concurrency so that the MDB can wait until it's sure it's operating on the right version of the data.
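The defensive-coding variant can be sketched without any container APIs. `EntityStore`, `RedeliverException`, and the version field are hypothetical stand-ins for the entity bean's persistence layer and the container's rollback-and-redeliver machinery:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store standing in for the entity bean's persistence layer.
// It maps an entity id to the version of the row currently visible.
class EntityStore {
    final Map<String, Integer> rows = new ConcurrentHashMap<>();
    Optional<Integer> findVersion(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Marker for "roll back and let the container redeliver the message".
class RedeliverException extends RuntimeException {}

class DefensiveHandler {
    final EntityStore store;
    DefensiveHandler(EntityStore store) { this.store = store; }

    // The message carries the id it expects to find and the version it
    // expects to operate on. If either the row or the right version is
    // not yet visible (the database commit is still in flight), throw
    // so the container rolls back and redelivers - effectively retrying
    // until the commit completes.
    void onMessage(String id, int expectedVersion) {
        int current = store.findVersion(id)
                           .orElseThrow(RedeliverException::new);
        if (current < expectedVersion) {
            throw new RedeliverException();   // stale data: retry later
        }
        // ... now safe to process the entity ...
    }
}
```

Note that this is exactly the "frantic coding" the next paragraph warns about: the retry and version plumbing exists only to paper over the timing window.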

The fact that you're doing all this frantic coding to work around this issue should be ringing alarm bells - clearly the architecture of the application does not mesh well with the architecture of the infrastructure. The best solution is to get to the bottom of why...

It's Not a Mesh, It's a Mess!
JMS is all about allowing processes to run asynchronously with respect to one another. JTA is all about making updates that logically execute atomically, which in turn implies synchronously (or as near synchronously as reality allows). In this scenario, an attempt is being made to use JMS as a synchronous calling mechanism - the operations on the data are clearly related to one another (synchronous), but for some reason we have interposed an asynchronous messaging system into the processing flow. Maybe the most elegant solution would be to implement the next processing step as an Entity EJB, call it via RMI, and have it participate in the original transaction. All the updates would then be visible to all the processing steps, since updated data is visible within the transaction before the commit. But what if there's another requirement that necessitates the asynchronous path to the "stage 2 processing"? Well, wrap the Entity EJB you created in this use case in an MDB facade, and the logic can then be executed synchronously or asynchronously, depending on the use case (even better, maybe the "stage 2 Entity" only offers a local interface).
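The facade idea can be sketched with plain classes standing in for the Entity EJB and the MDB (all names here are illustrative, not J2EE APIs):

```java
// The "stage 2" logic lives in exactly one place...
interface Stage2 {
    String process(String entityId);
}

class Stage2Logic implements Stage2 {
    public String process(String entityId) {
        // Placeholder for the real business logic; in the article's
        // scenario this would run inside the caller's transaction.
        return "processed:" + entityId;
    }
}

// ...and an MDB-style facade simply delegates to it. Callers can then
// choose the synchronous path (a direct call, participating in the
// original transaction) or the asynchronous path (a message that the
// listener handles in its own transaction) per use case, without
// duplicating the logic.
class Stage2MessageFacade {
    private final Stage2 delegate = new Stage2Logic();

    String onMessage(String entityIdFromMessage) {
        return delegate.process(entityIdFromMessage);
    }
}
```

The design choice here is that asynchrony becomes a property of the *caller's* path, not something baked into the business logic itself.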

As a parting observation, this kind of tricky asynchronous corner case is not at all uncommon in building transactional systems - in fact, it's more like the norm. TP systems like Tuxedo, CICS, and others all offer facilities analogous to the design pattern I just described to handle this kind of thing. So does the BEA WebLogic Workshop framework - it builds in this style atop J2EE and provides a natural, event-driven programming model while taking care of this kind of implementation detail in the framework, again demonstrating the power and potential of using such a framework to simplify implementation while increasing reliability.


More Stories By Peter Holditch

Peter Holditch is a senior presales engineer in the UK for Azul Systems. Prior to joining Azul he spent nine years at BEA Systems, starting as one of their first Professional Services consultants in Europe and finishing up as a principal presales engineer. He has an R&D background (having originally worked on BEA's Tuxedo product) and his technical interests are in high-throughput transaction systems. Off the pitch, Peter likes to brew beer, build furniture, and undertake other ludicrously ambitious projects - but (generally) not all at the same time!

