By Raman Sud
January 30, 2006 06:00 PM EST
An anecdote that an engineer shared with me recently reminded me of a long-standing concept in manufacturing, design for production (DFP).
The concept has to do with evaluating how a given operation - production line, supply chain, or an entire factory - is performing. DFP ratings are supposed to help product development teams pinpoint bottlenecks and decide what steps might be needed to raise productivity and profitability. DFP isn't something you hear about very often in the application delivery space, but it is interesting to imagine what criteria would provide a holistic assessment of how well the application life cycle is working, and help determine which tools or process changes might improve it.
The anecdote concerned a large shipping/logistics company, where the engineer had been involved in the final delivery of new and reworked WebLogic applications to production. The company had a farm of 600 Linux machines, and his role was to automate new WebLogic deployments as much as possible and troubleshoot any issues that arose. The automation portion of the job was fairly routine: back up the config.xml, stop the admin server, zero out the file, generate a new config.xml for use as a template on a bare-bones administrative server install, and finish by restarting the server and configuring all of the components necessary to run the clusters in a given domain. With that many hosts, constantly revising the scripts that automated new WebLogic deployments was no trivial job; however, the main challenges he faced were always on the troubleshooting side.
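The file-handling part of that routine is easy to script. The following is a minimal Python sketch, assuming the admin server has already been stopped and that you pass in the path to `config.xml`; the function name `backup_and_reset_config` and the timestamped backup naming are illustrative choices, not the engineer's actual scripts. Stopping, regenerating, and restarting the server are WebLogic-specific and are deliberately left out.

```python
import shutil
import time
from pathlib import Path


def backup_and_reset_config(config_path: Path, backup_dir: Path) -> Path:
    """Back up config.xml with a timestamp, then truncate the original.

    Assumes the admin server is already stopped. Regenerating the template
    and restarting the server are WebLogic-specific and not shown here.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{config_path.name}.{stamp}.bak"
    # Never silently overwrite an earlier backup.
    if backup_path.exists():
        raise FileExistsError(backup_path)
    # copy2 preserves file metadata (e.g. modification time) along with contents.
    shutil.copy2(config_path, backup_path)
    # "Zero out" the file: truncate it to 0 bytes but leave it in place.
    config_path.write_bytes(b"")
    return backup_path
```

Returning the backup path lets a calling script log it or restore from it if a later step fails.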
One day he received a page around 6:30 p.m. and returned to the office to find that one of the company's primary shipping acknowledgement systems had completely crashed. The fact that it happened late in the evening helped limit the severity, but going over the entire infrastructure stack line by line, hunting for configuration errors, made for a very long evening - and it was a task he had performed many times before. What was the issue? One of the engineers working on a major upgrade to the application had accidentally modified a large number of configuration settings on the wrong host: he applied the changes to production instead of staging. It's easy to see how it happened. The standard way for developers to compare their staging environment to production is to bring up two administration consoles side by side and scan them line by line for discrepancies. While reviewing the settings, this developer simply got confused and made his edits in the wrong browser window.
It is an easy mistake to make. The only way to tell the browser windows apart is to check the host names of the preproduction and production machines shown at the top of each window. This is the kind of problem that is often dismissed with, "Sorry, I messed up," until the lost revenue and business impact result in a painful escalation. The fact - and often the challenge - remains that developers need access to real-time production settings. And even if developers were barred from live servers, errors such as this would still be common, because IT staff make them too. Who hasn't heard of "fat fingering" while manually updating configuration settings in production? (see Figure 1)
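One way to avoid eyeballing two consoles is to export the settings from each environment and compare them programmatically. The sketch below assumes each environment's configuration has already been flattened into a map of setting names to string values (how you export them is not shown); `diff_settings` and the sample keys are hypothetical. Because the comparison only reads the two maps, there is no editable window to confuse with the other.

```python
from typing import Dict, List, Tuple

Settings = Dict[str, str]
MISSING = "<missing>"


def diff_settings(staging: Settings, production: Settings) -> List[Tuple[str, str, str]]:
    """Return (key, staging_value, production_value) for every discrepancy.

    Keys are reported in sorted order; a key present on only one side is
    shown with MISSING on the other side. Neither input is modified.
    """
    diffs = []
    for key in sorted(staging.keys() | production.keys()):
        s_val = staging.get(key, MISSING)
        p_val = production.get(key, MISSING)
        if s_val != p_val:
            diffs.append((key, s_val, p_val))
    return diffs
```

For example, comparing staging `{"pool.max_connections": "50", "jms.queue": "ack"}` against production `{"pool.max_connections": "30", "jms.queue": "ack", "debug": "off"}` returns `[("debug", "<missing>", "off"), ("pool.max_connections", "50", "30")]`; the matching `jms.queue` setting is not reported.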
All this leads me to muse about what sort of criteria might be useful to assess the comparative health of an application delivery system, similar to "design for production" standards. My goal is to help managers assess weaknesses in the overall application delivery chain, and provide areas to target as they go about the work of implementing improvements. Here are five areas for consideration at a high level:
- How good are you at centralizing all of your configuration artifacts? In my own experience, this is an extremely tall order. Designing a repository is not so difficult, but sustaining the discipline to use it on an ongoing basis is next to impossible. Configuration artifacts come from all over the stack - Web servers, application servers, clusters, middleware, LDAP servers, databases. You might be able to assemble everything in a single place for a week or so, but how do you make the metadata usable? For this you need some sort of UI that exposes the settings in a normalized way, allowing the database administrator, the system administrator, the WebLogic administrator, and others to make sense of the configuration metadata and feel comfortable using it themselves.
- How well can you gauge what has changed as an application progresses across the life cycle? This includes detecting out-of-band changes to running application configurations, and monitoring the steady stream of changes that occur along the application's path to production. By improving their ability to compare environments up and down the infrastructure stack - and to distinguish good changes from bad ones when an application moves from QA to the staging environment - companies can gain more confidence when it comes time to unleash a new application on the production environment. More often than not, the inability to model the impact of changes before committing them to production leads IT to simply implement the change and "watch for smoke" - which has always struck me as a pretty risky way to run a business.
- The third set of criteria that I think would be valuable in a DFP assessment - and one that consistently comes up among managers of IT infrastructure - is the extent to which standards are defined and enforced around changes to the IT infrastructure stack. Standards have a general appeal, especially now that change management and reporting have become such hot topics. However, enforcing standards around changes to the IT infrastructure stack means more than just setting up a common way of reviewing or reporting on changes. It requires establishing a reliable, consistent process for making changes and, ideally, a set of rules that declare which changes are valid and which are not. Applied effectively, standards can be enormously useful in streamlining tasks, because they establish procedures for reusing processes that are known to work. In the long run, standards also lay the foundation for a much-desired state within IT - true automation of time-consuming tasks and processes.
- The fourth DFP yardstick, in my opinion, should be how well the organization eliminates wasted effort. This is the area that probably has the biggest impact on the bottom line, and it seems quite hard to remedy. I have a hypothesis about why: the "blow up the data center" mentality. For reasons I've never been clear about, architects and other senior IT people always seem to reach for the most drastic remedy when a business area proves inefficient. "Our processes are so inefficient, we should just start over" is a common response. Solving these problems in one big push is certainly an ambitious plan, but it just doesn't seem realistic to me. Instead of ripping everything apart and rebuilding, how about defining a solution that solves 80 percent of the pain, and executing on that as a starting point? One way to begin is to look across silos of responsibility and identify tasks that are constantly being repeated. By addressing these repetitive tasks, managers can increase application quality and throughput, and also lay the groundwork for further savings through automation.
- My final candidate for judging an organization's DFP state of health has to do with relative consistency. In a real-world environment of hot fixes, patches, upgrades, and new releases, the IT infrastructure team is typically hard-pressed to combat configuration "drift." It is commonly accepted that as a particular server in the staging environment or the data center undergoes a series of changes, it will naturally drift out of alignment with the proper standard or policy. To make a lasting impact on the overall health of the application delivery process, and to achieve a high level of consistency and reliability across all stages of the application life cycle (development, QA, stress/performance testing, staging, and production), the life-cycle environments have to remain in sync and be verifiable against the constraints set by the IT department.
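Detecting out-of-band changes and drift, the second and fifth areas above, can start with something as simple as fingerprinting each server's configuration when a change is approved and re-checking it later. This is a minimal sketch, assuming settings have already been collected as name/value maps per host; `fingerprint`, `find_drifted`, and the host names in the usage note are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

Settings = Dict[str, str]


def fingerprint(settings: Settings) -> str:
    """SHA-256 of a settings map; key order does not affect the result."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drifted(approved: Dict[str, str], current: Dict[str, Settings]) -> List[str]:
    """Return hosts whose current settings no longer match the approved fingerprint.

    approved: host -> fingerprint recorded when the last change was approved.
    current:  host -> settings as read from the running server now.
    A host with no approved fingerprint is reported too, since nothing vouches for it.
    """
    return [
        host
        for host in sorted(current)
        if approved.get(host) != fingerprint(current[host])
    ]
```

If `app01` and `app02` were approved with identical settings and `app02` has since been edited by hand, `find_drifted` reports `app02`, along with any newly appeared host such as `app03` that was never approved. A fingerprint only says that something changed, not what; pairing it with a key-by-key comparison would show the specifics.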
Once new approaches to standards are implemented and repetitive tasks have been rolled into automated actions, it is reasonable to expect that IT will have a solid foundation from which to push out new applications as fast as developers can deliver them. The point is not that a highly automated IT organization will never make mistakes; rather, the best way to guard against human error is to put in place a system of checks and balances that draws on the broad base of specialized, granular system knowledge that exists across the entire application life cycle. In agreeing on such a foundation, it is vitally important to apply the same policies across the entire IT infrastructure, from development to QA to staging and out to production. In this way, the foundation for automation is laid well before the production environment, and the whole application delivery mechanism improves steadily over time. After all, just as in a manufacturing line, the IT infrastructure stack is never better than the sum of its parts, and when one silo of technology gets something wrong, the entire value chain suffers.
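Applying the same policies from development through production can be expressed as one rule set evaluated against every environment. The sketch assumes rules are simple predicates over a flattened settings map; `Rule`, `validate_pipeline`, `EXAMPLE_RULES`, and the setting names are illustrative, not a particular product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Settings = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    # Returns True when the settings comply with the rule.
    check: Callable[[Settings], bool]


def validate_pipeline(rules: List[Rule], environments: Dict[str, Settings]) -> Dict[str, List[str]]:
    """Apply the same rule set to every environment.

    Returns, per environment, the names of the rules it fails
    (an empty list means the environment is compliant).
    """
    return {
        env: [rule.name for rule in rules if not rule.check(settings)]
        for env, settings in environments.items()
    }


# Hypothetical example rules over hypothetical setting names.
EXAMPLE_RULES = [
    Rule("no debug logging", lambda s: s.get("log.level") != "DEBUG"),
    Rule("connection pool size defined", lambda s: "pool.max_connections" in s),
]
```

With these rules, a development environment set to `DEBUG` logging fails the first rule and a production environment without a pool setting fails the second, each reported under its own environment name. Collecting every failure, rather than stopping at the first, gives each team the full list to fix in one pass.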
Copyright © 2006 SYS-CON Media, Inc. — All Rights Reserved.
Raman Sud is the vice president of engineering for mValent, developer of mValent Integrity. Sud has 20 years of experience delivering mission-critical software for enterprises and telecommunication service providers, leveraging distributed development and building integrated teams in the US and India.