Design for Production Meets the Application Delivery Process

Lessons from the world of manufacturing

An anecdote that an engineer shared with me recently reminded me of a long-standing concept in manufacturing, design for production (DFP).

The concept has to do with evaluating how a given operation - production line, supply chain, or an entire factory - is performing. DFP ratings are supposed to help product development teams pinpoint bottlenecks and decide what steps might be needed to raise productivity and profitability. DFP isn't a term you hear very often in the application delivery space, but it is interesting to imagine what criteria would provide a holistic assessment of how the application life cycle is working, and help determine which tools or processes might improve it.

The anecdote concerned a large shipping/logistics company, where the engineer had been involved in the final delivery of new and re-worked WebLogic applications to production. The company had a farm of 600 Linux machines, and his role was to automate new WebLogic deployments as much as possible and troubleshoot any issues that arose. The automation portion of the job was fairly routine: back up the config.xml, stop the admin server, zero out the file, regenerate a new config.xml for use as a template on a bare-bones administrative server install, and finish by restarting the server and configuring all of the components necessary to run the clusters in a given domain. With that many hosts, constantly revising the scripts that automated new WebLogic deployments was no trivial job; however, the main challenges he faced were always on the troubleshooting side.
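
The first and third steps of that routine (back up config.xml, then zero out the file) might look something like the sketch below. It is only an illustration: the function name and the backup naming scheme are assumptions, and stopping or restarting the admin server is left out because the article does not describe the scripts the engineer used for that.

```python
import shutil
import time
from pathlib import Path


def backup_and_reset(config_path: Path) -> Path:
    """Copy config.xml to a timestamped backup, then truncate the original.

    Hypothetical helper: stopping and restarting the admin server is
    deliberately not handled here, since that depends on site-specific
    scripts the article does not describe.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Assumed naming scheme: config.xml.<timestamp>.bak next to the original.
    backup_path = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup_path)  # keep contents and file metadata
    config_path.write_text("")              # "zero out" the live file
    return backup_path
```

Returning the backup path makes it easy for a calling script to log where the previous configuration went, or to restore it if regeneration fails.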

One day he received a page around 6:30 p.m. and returned to the office to find that one of the company's primary shipping acknowledgement systems had completely crashed. The fact that it happened late in the evening was helpful from a severity standpoint, but going over the entire infrastructure stack line-by-line, hunting for configuration errors, made for a very long evening - and represented a task he had encountered many times previously. What was the issue? One of the engineers working on a major upgrade to the application went in and accidentally modified loads of configurations on the wrong host. He applied the changes to production instead of staging. It's easy to see what happened - the standard way for developers to compare their staging environment to production is to bring up two administration consoles side-by-side, and scan line-by-line for discrepancies. While reviewing the settings, this developer simply got confused and made his edits in the wrong browser window.

It is an easy mistake to make. The only way to tell the browser windows apart is to look up at the host names of the preproduction and production machines at the top. This is the kind of problem that often is easily dismissed with, "Sorry, I messed up," until the loss of revenue and business impact results in a painful escalation. The fact - and often the challenge - remains that developers have to access real-time production settings. Even if access were barred to live servers, errors such as this are common, even in IT. Who hasn't heard of "fat fingering" while manually updating configuration settings in production? (see Figure 1)
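
The side-by-side, line-by-line comparison described above can be approximated programmatically once the settings of each environment have been exported into a flat key/value form. The following is a minimal sketch under that assumption; the setting names and values are invented for illustration and do not come from any real WebLogic domain.

```python
MISSING = "<missing>"  # marker for a key that exists on only one side

# Hypothetical, flattened settings for two environments.
STAGING = {
    "server.listen_port": 7001,
    "jdbc.pool.max_capacity": 50,
    "jms.redelivery_limit": 3,
}
PRODUCTION = {
    "server.listen_port": 7001,
    "jdbc.pool.max_capacity": 25,
    "log.level": "WARN",
}


def diff_settings(staging: dict, production: dict) -> dict:
    """Return {key: (staging_value, production_value)} for every mismatch."""
    diffs = {}
    for key in sorted(staging.keys() | production.keys()):
        s = staging.get(key, MISSING)
        p = production.get(key, MISSING)
        if s != p:
            diffs[key] = (s, p)
    return diffs


if __name__ == "__main__":
    for key, (s, p) in diff_settings(STAGING, PRODUCTION).items():
        print(f"{key}: staging={s!r} production={p!r}")
```

Because the report is read-only and names each environment explicitly, nobody has to keep two editable consoles open just to spot the differences.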

All this leads me to muse about what sort of criteria might be useful to assess the comparative health of an application delivery system, similar to "design for production" standards. My goal is to help managers assess weaknesses in the overall application delivery chain, and provide areas to target as they go about the work of implementing improvements. Here are five areas for consideration at a high level:

  1. How good are you at centralizing all of your configuration artifacts? In my own experience, this is an extremely tall order. The design of a repository is not so difficult, but the actual will to stick to using it on an ongoing basis is next to impossible. Configuration artifacts come from all over the stack - Web servers, application servers, clusters, middleware, LDAP servers, databases. You might be able to assemble everything into a single place for a week or so, but how do you make the metadata usable? For this, you need some sort of UI that can expose the settings in a normalized way, thereby allowing the database administrator, the system administrator, the WebLogic administrator, etc., to make sense of the configuration metadata and feel comfortable using it themselves.
  2. How well can you gauge what has changed as an application progresses across the life cycle? This includes detecting out-of-band changes to running application configurations, and monitoring the steady stream of changes that occur along the application's path to production. By enhancing the ability to compare environments up and down the infrastructure stack - and determining good and bad changes when an application moves from QA out to the staging environment - companies can gain more confidence when it comes time to unleash a new application on the production environment. More often than not, the inability to model the impact of changes before committing them to production leads IT to simply implement the change and "watch for smoke" - which has always struck me as a pretty risky way to run a business.
  3. The third set of criteria that I think would be valuable in a DFP assessment - and one that consistently comes up among managers of IT infrastructure - is the extent to which standards are defined and enforced around changes to the IT infrastructure stack. Standards have a general appeal, especially now that change management and reporting have become such hot topics. However, enforcing standards around changes to the IT infrastructure stack represents more than just trying to set up a common way of reviewing or reporting on the changes. It requires setting a reliable, consistent process for how to make changes, and ideally, a set of rules that declare which changes are considered valid or invalid. If applied effectively, standards can be enormously useful in helping streamline tasks, as they establish procedures for reusing processes that are known to work. In the long run, standards also lay the foundation for a much-desired state within IT - true automation of time-consuming tasks and processes.
  4. The fourth DFP yardstick, in my opinion, should be evaluating how well the organization eliminates wasted effort. This is an area that probably has the biggest impact on the bottom line, and seems quite hard to remedy. I have a hypothesis about why this is so: it's the "blow up the data center" mentality. For reasons I've never been clear about, architects and other senior IT people seem to always conclude the worst about how inefficient business areas need to be addressed. "Our processes are so inefficient, we should just start over" is often the response. It is of course optimistic to plan to solve these problems in one big push, but it just doesn't seem realistic to me. Instead of ripping everything apart and rebuilding, how about defining a solution that solves 80 percent of the pain, and trying to execute on that as a starting point? One way to begin is to look across silos of responsibility and identify tasks that are constantly being repeated. By addressing these repetitive tasks, managers can increase application quality and throughput, and also lay the groundwork for further savings through automation.
  5. My final candidate for judging an organization's DFP state of health has to do with relative consistency. In a real-world environment of hot-fixes, patches, upgrades, and new releases, the IT infrastructure team is typically very hard-pressed to combat configuration "drift." It is commonly accepted that as a particular server in one area of the staging environment or the data center undergoes a series of consecutive changes, it will naturally drift out of alignment with the proper standard or policy. To make a lasting impact on the overall health of the application delivery process and achieve a high level of consistency and reliability across all stages of the application life cycle - from development, QA, stress/performance testing, to staging and production - the life-cycle environments have to remain in sync and be verifiable within the correct constraints set up by the IT department.
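
The idea in the third point, a set of rules that declare which changes are valid or invalid, can be sketched in a few lines. This is an illustrative assumption rather than a description of any particular product: the `Rule` type, the rule names, the setting keys, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str                          # human-readable rule identifier
    applies_to: str                    # setting key the rule governs
    is_valid: Callable[[object], bool]  # returns True if the value is allowed


# Hypothetical standards an IT department might agree on.
RULES = [
    Rule("unprivileged-port", "server.listen_port",
         lambda v: isinstance(v, int) and 1024 <= v <= 65535),
    Rule("pool-size-cap", "jdbc.pool.max_capacity",
         lambda v: isinstance(v, int) and 1 <= v <= 200),
]


def validate_change(changes: Dict[str, object], rules: List[Rule]) -> List[str]:
    """Return one message per proposed setting that violates a rule."""
    violations = []
    for rule in rules:
        if rule.applies_to in changes:
            value = changes[rule.applies_to]
            if not rule.is_valid(value):
                violations.append(f"{rule.name}: {rule.applies_to}={value!r}")
    return violations
```

Running the same rule list against proposed changes in development, QA, staging, and production is one way to apply a single policy across every stage, and an empty result gives a simple gate before a change is committed.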

The anecdote related by the engineer that prompted this line of thought wasn't unusual. It was merely a symptom of an IT infrastructure approach that is significantly handicapped. In order to push higher-quality applications out faster, IT has to expose the underlying configuration settings that power the infrastructure stack in a transparent way, and provide the right degree of access to these same artifacts across teams of stakeholders who are not in a position to share this type of access today. The WebLogic administration console is very well suited to managing configuration changes to the application server layer, but it cannot single-handedly automate repetitive processes across the entire application life cycle. Handling that burden requires introducing tools and technologies capable of normalizing metadata from a broad range of target systems and allowing individual administrators to manage the data more efficiently.

Once new approaches to standards are implemented and repetitive tasks have been rolled into automated actions, it is reasonable to expect that IT will have a solid foundation from which to easily push out new applications as fast as developers can deliver them. The point is not that mistakes should never happen in a highly automated IT organization; rather, the best way to guard against human error is to put in place a system of checks and balances that can take into account the broad base of specialized, granular system knowledge that exists across the entire application life cycle. In agreeing on such a foundation, it is vitally important to apply the same policies across the entire IT infrastructure, from development to QA to staging and out to production. In this way, the foundation for automation is laid far in advance of the production environment, and the whole application delivery mechanism improves steadily over time. After all, just as in a manufacturing line, the IT infrastructure stack is never better than the sum of its parts, and when one silo of technology gets something wrong, the entire value chain suffers.

More Stories By Raman Sud

Raman Sud is the vice president of engineering for mValent, developer of mValent Integrity. Sud has 20 years of experience delivering mission-critical software for enterprises and telecommunication service providers leveraging distributed development and building integrated teams in the US and India.
