iPhone Failed - Disaster Recovery Practical Insight
A lot of Disaster Recovery procedures are considered failed simply because they took longer than originally planned and documented. And a lot of these procedures take longer not because of poor equipment or incompetence. On the contrary, they take longer because the responsible people focus primarily on the effort to fix the problem. Here is a practical example:
On Tuesday my iPhone failed. And since its warranty is long gone, I decided to fix it myself. I finally got it fixed on Wednesday night.
In my zeal to repair it, I forgot the first rule of business continuity - recover functionality within an acceptable time frame. And for an iPhone, just as for any other mobile phone, the main functionality is TELEPHONY!!! I was unavailable for most of Tuesday and during parts of business hours on Wednesday.
In the end, the problem was solved, and my iPhone is working again. But then all the missed calls came raining down, and that kicked me back into reality and gave me a real perspective on what I should have done: find a low-end replacement phone instead of meddling with low-level formats, firmware flashing and DFU modes. That way, I would have been contactable, and under much less pressure to quickly fix my iPhone.
In perspective, the same behavior can be seen in many organizations during IT disaster recovery. Disaster recovery is organized and coordinated by IT people - mostly very capable engineers. And yet, a large number of Disaster Recovery actions are delayed by these good engineers focusing primarily on fixing the engineering problem - not fixing the business problem.
In a Disaster Recovery situation, the recovery deadline is known as the Recovery Time Objective (RTO). That is the time interval, starting from the moment of disaster, within which operation must be recovered to limited but essential functionality.
A good DR manager - regardless of his position and education - does his work with a stopwatch. The time he can allow the engineers to try to fix the problem does not have a formal name, so let's call it Fixing Time. It is the difference between the RTO and the tested time required to activate the Disaster Recovery systems.
Once this Fixing Time passes, Disaster Recovery preparations must start. If the problem gets fixed before completion of DR system activation, all is well. If not, the RTO has still been met, because the DR system takes over in time. Oh, and the engineers can relax from the urgency pressure and work on fixing the original problem for as long as it takes.
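The stopwatch rule above can be written down as a minimal Python sketch. The function names and the 4-hour RTO with 1-hour activation time in the usage lines are hypothetical illustrations, not figures from any real DR plan.

```python
from datetime import timedelta


def fixing_time(rto: timedelta, activation_time: timedelta) -> timedelta:
    """Time the engineers may spend on the original fault before
    DR activation has to begin (RTO minus tested activation time)."""
    if activation_time > rto:
        # Activation alone already overruns the RTO, so no repair
        # window exists and the plan itself needs rework.
        raise ValueError("DR activation takes longer than the RTO")
    return rto - activation_time


def must_start_dr(elapsed: timedelta, rto: timedelta,
                  activation_time: timedelta) -> bool:
    """True once the time since the disaster has used up the Fixing Time."""
    return elapsed >= fixing_time(rto, activation_time)


# Hypothetical example: 4-hour RTO, DR activation tested at 1 hour.
example_rto = timedelta(hours=4)
example_activation = timedelta(hours=1)
early = must_start_dr(timedelta(hours=2), example_rto, example_activation)
late = must_start_dr(timedelta(hours=3), example_rto, example_activation)
```

With these assumed numbers the engineers get three hours; at the three-hour mark `must_start_dr` turns true and DR activation starts regardless of how close the fix seems.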
Back to my iPhone example - what was my timing? A phone RTO should be the recharge time - 2 hours. Getting a replacement phone is a walk to the store to buy the cheapest prepaid model, or borrowing a spare from a friend - 30 minutes. So I needed to keep my cool, and try to fix the problem for only 1.5 hours before looking for an alternative. After that, I could have spent a week on the iPhone - no pressure to fix it fast.
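For anyone who wants to check the arithmetic, here is the same calculation as a few lines of Python, using only the two figures from the paragraph above. The variable names are just for illustration.

```python
from datetime import timedelta

phone_rto = timedelta(hours=2)             # acceptable outage: one recharge
replacement_time = timedelta(minutes=30)   # buy or borrow a spare phone

# Fixing Time = RTO - time needed to get the replacement working
phone_fixing_time = phone_rto - replacement_time
print(phone_fixing_time)  # 1:30:00
```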
Talkback and comments are most welcome

2 comments:
Friend,
You can have as many DR systems for your mobile as you wish, but there is usually only one DR computer system.
I will think at least twice before its activation. I've been there, done that.
Yeah, it is scary, and DR is not activated lightly. But at the end of the day, for mobiles you can decide yourself, while for DR activation you need the Crisis Manager's decision - which takes into account a lot more than the personal difficulties of engineers.
If the DR system activation is a problematic one for your organization, then you need to rework it, not avoid using it.