Emergency Care for Your Sick IT Systems, by Susan Cramm
2:00 PM Thursday September 18, 2008
You arrive early one morning only to discover that a mission-critical system, the one that fulfills inventory for your retail operations, is DOA - again.
And, as the system owner, it's your problem to solve.
It's a sobering reality that many companies are at major risk because of the fragile technologies that support their operations. System downtime or degradation exacts a huge financial toll: up to 3.6% of revenue, according to one 2004 study.
And no one is to blame. Every system requires modifications over time. Because change happens gradually, new functionality and technologies get bolted on, and eventually the system becomes more and more complex and unstable.
You've convened meetings with various overworked IT specialists to discuss how to improve the health of the system. Like physicians diagnosing a rare condition, they do a lot of hemming and hawing and milling around smartly. The specialists justifiably raise serious concerns about jumping in and trying to fix the system, since changes to one part of the system can have cascading, unanticipated impacts on others.
The conversations with the IT specialists swirl. There are prescriptions for design modifications, rewrites, service level objectives, application monitoring tools, and process improvements, but the ideas never settle down into a logical treatment plan.
And none of these prescriptions addresses the need to decrease outages to a tolerable level.
Avoid this scenario. You don't want to get lulled into buying a long-term wellness program without first stabilizing the existing system, just as you wouldn't start a running regimen right after learning of a heart condition.
You don't need meetings - you need a mandate and a team of experts focused on your case.
You need the organizational equivalent of an IT emergency room.
Here's what to do.
Appeal to the powers-that-be to organize a dedicated cross-functional team of IT and business specialists.
Assign business experts to work side by side with IT experts hand-selected from the various IT specialties, including architecture, development, infrastructure engineering, and the help desk.
Ensure that this team reports to a seasoned IT executive who, in turn, reports directly to you and the CIO.
Once this team is in place, make sure they focus on the following four imperatives:
1. Start evaluating the symptoms (a.k.a. outages), documented in incident reports available to everyone on the team. Ideally, the incidents should be funneled through the help desk for initial diagnosis and escalation. However, since many organizations don't have trained help desk personnel and disciplined incident management processes, ensure that these calls go to the team and that they document the issues and the band-aids they applied to get the system up and running.
2. Document the business process, applications, data, and infrastructure architectures. The team needs to have a common, big-picture view of what the system does and how it's built. This entails analyzing the business process, mapping the process to the underlying applications and data, and then mapping the applications and data to the underlying technologies. Without this perspective, it's impossible to diagnose and rectify issues. You'll quickly discover how remarkably (and scarily) little is known about systems that have been around for years, doing really important things.
3. Conduct a differential diagnosis. Analyze the incidents to identify problems and prioritize fixes based on business impact. Over time, the team will see patterns emerge (e.g., problems occur at month end, when volume reaches certain levels, or when data contains certain values), and root cause analysis will focus on identifying changes that will reduce the frequency and duration of the outages.
4. Implement the changes. Protect against introducing more problems and instability by testing the changes in an environment that mirrors the production system.
Continue steps 2 through 4 until the outages reach an "acceptable" level.
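To make steps 1 and 3 concrete, here is a minimal sketch of what an incident log and a first-pass triage might look like. Everything in it is an assumption for illustration: the `Incident` fields, the cost-per-minute estimate of business impact, the three-day definition of "month end," and the sample records are all hypothetical, not drawn from any real system or tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Incident:
    """One outage record. All field names are hypothetical."""
    opened: date
    component: str           # part of the system that failed
    downtime_minutes: int    # how long the outage lasted
    cost_per_minute: float   # assumed estimate of business impact while down
    workaround: str          # the band-aid applied to restore service

    @property
    def business_impact(self) -> float:
        return self.downtime_minutes * self.cost_per_minute


def prioritize(incidents):
    """Total business impact per component, highest first."""
    totals = Counter()
    for inc in incidents:
        totals[inc.component] += inc.business_impact
    return totals.most_common()


def days_to_month_end(d):
    """0 on the last day of the month, 1 on the day before, and so on."""
    first_of_next = date(d.year + (d.month == 12), d.month % 12 + 1, 1)
    return (first_of_next - d).days - 1


def month_end_share(incidents, last_days=3):
    """Fraction of incidents opened in the last few days of a month."""
    if not incidents:
        return 0.0
    hits = sum(1 for inc in incidents if days_to_month_end(inc.opened) < last_days)
    return hits / len(incidents)


# Made-up sample data, for illustration only.
incidents = [
    Incident(date(2008, 8, 29), "order import", 90, 200.0, "restarted batch job"),
    Incident(date(2008, 8, 31), "inventory db", 30, 500.0, "cleared lock table"),
    Incident(date(2008, 9, 10), "order import", 20, 200.0, "reran feed"),
    Incident(date(2008, 9, 30), "order import", 60, 200.0, "restarted batch job"),
]

for component, impact in prioritize(incidents):
    print(f"{component}: {impact:,.0f}")
print(f"share of incidents near month end: {month_end_share(incidents):.0%}")
```

Even a log this simple lets the team rank components by what the outages cost the business rather than by who complains loudest, and a high month-end share is the kind of pattern that points root cause analysis toward volume or batch-processing problems.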
Disband the dedicated team once an ongoing "wellness" program is defined that ensures regular monitoring of system performance so that issues can be escalated and quickly addressed.
Finally, start developing a business case to justify new systems to replace the problematic ones. Keep in mind that the goal is not to replace the existing system in kind, but to identify business process changes that will provide fundamental improvements to business, as well as systems, performance.
Susan Cramm is the founder and president of Valuedance and a recognized industry expert on information technology leadership and coaching. She is the former CFO and executive vice president at Chevy’s Mexican Restaurants. Prior to Chevy’s, Cramm worked with the Taco Bell Corporation and held the positions of CIO and vice president of the Information Technology Group and Senior Director for Financial and Strategic Planning.
Comments
Perhaps it would be advisable, beforehand or simultaneously, to have someone document the system since, as my experience teaches me, after a short period nobody keeps the documentation updated.
- Posted by Sergio
September 30, 2008 4:03 PM
Great model for a help desk, governance, and change control method. Many organizations have legacy or purchased systems where not even the data structures are known. Most have serious data quality problems. Many IT personnel ignore change control requirements.
Many organizations have outsourced IT or have waves of short-term consultants with no industry experience changing their systems. Even if they budget for documentation, many projects eat up documentation budgets with coding cost overruns they would have avoided had they spent some time documenting and analyzing. Business people are often daunted by the size and scope of the issues.
CIOs often have only a short time to make an impact. Solving information problems takes time and money. The good news is that, as you suggest, these problems can be addressed incrementally in a cost-effective manner before an expensive disaster occurs. Doing so also lowers the overall maintenance cost and mitigates risk.
- Posted by Lamont
October 5, 2008 11:20 PM