Contingency Planning Conference 2010
If you are near New York City, you can check out the Contingency Planning & Management Conference (CPM 2010 East) on November 3-4.
According to the promoters, it is a 4-track advanced-level program taught by expert faculty in small, classroom settings. Plus, you can earn up to 35 Continuing Education Activity Points (CEAPs) just for attending.
To get a $100 discount off the full conference rate, visit http://bit.ly/CPM2010MIS and register with the promotion code NX1C79.
Shortinfosec is distributing this information without any commercial interest. Sadly, we won't be able to visit. But anyone who visits is welcome to publish a guest post on Shortinfosec about the conference.
Choosing a Disaster Recovery Center Location
When preparing a Disaster Recovery Center, one of the most important decisions is its location. Up until 9/11, a lot of companies held their DR centers in an adjacent building, and right after 9/11, everyone wanted to go as far from the primary data center as possible.
One of the common misconceptions of Disaster Recovery planning is that longer distance ensures better disaster protection. Of course, increasing the distance between data centers reduces the likelihood that the two centers are affected by the same disaster. But just putting distance between locations may not be sufficient protection. In reality, the best distance for a DR location is dictated by a multitude of factors:
- Minimal parameters dictated by regulators - certain businesses, especially telco and finance, must maintain regulatory compliance. It is not unusual for regulators to mandate a minimal distance between the primary and the Disaster Recovery location. You must comply with these parameters
- Corporate RTO parameters - the company has decided that the Disaster Recovery Center must be up and running within the time defined as RTO - Recovery Time Objective. This time will include the travel time to Disaster Recovery center and the system activation times. So it is always important to take this parameter into account when choosing a Disaster Recovery site
- Telecommunications services - larger distance between the primary and DR site means higher telecommunication costs and limits the choice of appropriate remote copy technology. For instance, synchronous replication is still very difficult to achieve past the 40km mark. Choose a location that is sufficiently distant but still manages to deliver the required bandwidth for the chosen replication/remote copy technology
- Geophysical conditions - In order to avoid a natural disaster, it is not always sufficient to move your Disaster Recovery center to a specific distance from the primary center. Most natural disasters deliver high impact in areas which support their spread by terrain configuration or other geophysical conditions. For instance, 150 km was considered a safe distance from hurricane impact. However, Hurricane Katrina lost strength only after traveling over 240 km inland, since there was no terrain feature to stop it. The best location is in a separate flood basin, off a seismic fault line (or at least on a different one) and with a large mountain between the primary and the DR site
- Means of Transportation - increased distance between primary and DR site may make it difficult for employees to travel to the recovery site. This is especially true in situations of crisis, when roads may be damaged or blocked, or public transport is stopped by strikes. Choose a site that has multiple travel options - railroad, motorway, even river boat
- Vicinity of Strategic objects - It is never smart to place your Disaster Recovery center in the vicinity of objects of strategic importance to the country. Such locations are prone to terrorist attacks and to attacks by opposing forces in a military conflict. Also, even in situations of natural disasters, strategic locations will have a strong military presence that may limit access to your Disaster Recovery center. Strategic objects include military bases, airports, refineries, oil depots, etc. Choose a safe distance from such locations
There is no such thing as an ideal Disaster Recovery location. The optimal location is the one that minimizes the risks at an acceptable cost and meets the required SLAs and authorities' regulations.
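Three of the factors above (regulatory minimum distance, RTO and the synchronous replication limit) are easy to check with numbers. The following is a minimal sketch of such a screening, not a complete site assessment: the class and function names, the example sites and the 240-minute RTO are hypothetical, the 40 km limit is taken from the text, and the fibre delay of roughly 5 microseconds per km is a common rule of thumb. Geophysical, transport and strategic-object factors are not modelled.

```python
from dataclasses import dataclass

# Rule of thumb: light in optical fibre needs roughly 5 microseconds per km, one way.
FIBRE_DELAY_MS_PER_KM = 0.005


@dataclass
class CandidateSite:
    name: str
    distance_km: float         # cable distance from the primary site
    travel_minutes: float      # staff travel time under crisis conditions
    activation_minutes: float  # tested time to activate the DR systems


def replication_round_trip_ms(distance_km: float) -> float:
    """Minimum latency that distance alone adds to each synchronous write."""
    return 2 * distance_km * FIBRE_DELAY_MS_PER_KM


def screen_site(site, rto_minutes, min_regulatory_km, max_sync_km=40):
    """Return the reasons a site fails screening (an empty list means it passes)."""
    problems = []
    if site.distance_km < min_regulatory_km:
        problems.append("closer than the regulatory minimum distance")
    if site.travel_minutes + site.activation_minutes > rto_minutes:
        problems.append("travel plus activation time exceeds the RTO")
    if site.distance_km > max_sync_km:
        problems.append("too far for synchronous replication")
    return problems


# Hypothetical candidates, screened against a 4-hour RTO and a 20 km regulatory minimum.
near = CandidateSite("Site A", distance_km=25, travel_minutes=45, activation_minutes=120)
far = CandidateSite("Site B", distance_km=300, travel_minutes=240, activation_minutes=120)
for site in (near, far):
    print(site.name, screen_site(site, rto_minutes=240, min_regulatory_km=20))
```

A site that fails only the replication check may still be acceptable if asynchronous replication fits the business requirements.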
Talkback and comments are most welcome
Related posts
Mitigating Risks of the IT Disaster Recovery Test
iPhone Failed - Disaster Recovery Practical Insight
Business Continuity Analysis - Communication During Power Failure
Business Continuity Plan for Brick & Mortar Businesses
Example Business Continuity Plan For Online Business
Labels: business continuity, information security, information strategy
Mitigating Risks of the IT Disaster Recovery Test
The IT Disaster Recovery Test, as part of Business Continuity testing, is becoming an annual event for most IT departments. It is mandated by a lot of regulators, nearly insisted upon by internal audit and of course a very healthy thing to do.
But performing the IT DRP test without proper risk management can put your organization at significant risk.
To put things into perspective, let's analyze the steps, risks and countermeasures of an IT Disaster Recovery test:
| DRP Test Step | Activity | Risks | Countermeasures |
|---|---|---|---|
| 1. Failure of primary systems | To simulate a disaster situation, the primary systems need to be made to fail on some level | | |
| 2. Activation of Disaster Recovery systems | Severing any relation between the DR and the primary systems and running the DR systems as temporary primary | | |
| 3. Reconfiguring the user environment | Intervening in the end-user environment in a way that will make them use the DR system | | |
| 4. Reverting to the primary systems | Resuming the primary systems at some level and reestablishing the relation between the DR and the primary systems | | |
With all these risks, is it more prudent to never perform an IT DRP test? - Absolutely NOT, and here is why:
- Performing the IT DRP test actually confirms that things are running, and if something breaks, you are much more prepared for the next time.
- Not performing the test will just make you think everything is great until the incident occurs. And the incident is just as certain as death and taxes.
Talkback and comments are most welcome
Related posts
iPhone Failed - Disaster Recovery Practical Insight
Business Continuity Analysis - Communication During Power Failure
Business Continuity Plan for Brick & Mortar Businesses
Example Business Continuity Plan For Online Business
Is the Server Running - optimal use of redundancy on a budget
When purchasing a server, most companies select a server class computer from a reputable manufacturer. These days, servers usually come loaded with redundant components that improve availability and resilience. And yet a lot of these servers fail at the first glitch simply because they are not configured properly. Here is a brief blueprint on how to make optimal use of the redundancy you have paid for.
First, let's analyze what is usually redundant in a server. If we take into account only the garden variety commercial servers and ignore the hugely expensive fault tolerant machines, here is what you usually get:
- Redundant Disk drives
- Redundant Power Supplies
- Redundant Network Adapters
To achieve a maximum from these elements, you should perform the following steps:
- Redundant Disk drives - organize them into a RAID configuration. RAID 1 (mirror) is the best in terms of redundancy and speed, but you lose exactly 50% of capacity. RAID 5 (parity) gives you the best trade-off between capacity loss and performance. When planning a RAID, look for a server that has a hardware RAID controller. Modern server operating systems can build a RAID themselves, but then the operating system has to dedicate resources and run specific software to maintain the RAID, burdening the main CPU with this task

- Redundant Power Supplies - connect each power supply of the server to a power line coming from a different circuit breaker. This will save you a lot of grief if the cleaning lady decides to connect her vacuum cleaner to an outlet on the same circuit breaker as the server and overloads it. If possible, also connect each power supply to a different Uninterruptible Power Supply. This way, every UPS will help your server ride out the blackout.

- Network adapters - First, organize the network adapters to work as a failover team. This is done with specific drivers delivered by the manufacturer, and the driver creates a virtual network adapter. The virtual network adapter is configured with the IP address of the server, and it binds to one of the physical network adapters. Should that adapter lose connectivity, the driver will bind the virtual network adapter to the other physical one, thus reestablishing connectivity. For an optimal solution, connect the physical network adapters to several switches which are interconnected via trunk links - thus creating one large meta-switch.
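The capacity trade-off between RAID 1 and RAID 5 mentioned above can be put into numbers. This is a minimal sketch with hypothetical function names, assuming a simple two-disk mirror for RAID 1, identical disks, and RAID 5 using the equivalent of one disk for parity.

```python
def usable_capacity_gb(level: int, disk_count: int, disk_size_gb: float) -> float:
    """Usable capacity of an array built from identical disks."""
    if level == 1:
        if disk_count != 2:
            raise ValueError("this sketch only models a two-disk mirror")
        return disk_size_gb  # the second disk holds a full copy
    if level == 5:
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disk_count - 1) * disk_size_gb  # one disk's worth goes to parity
    raise ValueError("only RAID 1 and RAID 5 are modelled here")


def capacity_loss_percent(level: int, disk_count: int, disk_size_gb: float) -> float:
    """Share of the raw capacity that is spent on redundancy."""
    raw = disk_count * disk_size_gb
    return 100 * (1 - usable_capacity_gb(level, disk_count, disk_size_gb) / raw)


print(capacity_loss_percent(1, 2, 500))  # two-disk mirror
print(capacity_loss_percent(5, 4, 500))  # four-disk RAID 5
```

With four disks, RAID 5 gives up a quarter of the raw capacity instead of half, and the share shrinks further as more disks are added.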

Talkback and comments are most welcome
iPhone Failed - Disaster Recovery Practical Insight
A lot of Disaster Recovery procedures are considered failed simply because they took longer than originally planned and documented. And a lot of these procedures take longer not because of poor equipment or incompetence. On the contrary, they take longer because the responsible people are focusing primarily on the effort to fix the problem. Here is a practical example:
On Tuesday my iPhone failed. And since its warranty is long gone, I decided to fix it myself. I finally got it fixed on Wednesday night.
In my zeal to repair it, I forgot the first rule of business continuity - recover functionality within an acceptable time frame. And for the iPhone, just as for any other mobile phone, the main functionality is TELEPHONY!!! I was unavailable for most of Tuesday and during parts of business hours on Wednesday.
In the end, the problem was solved, and my iPhone is working again. But then all missed calls came raining down, and that kicked me back into reality, and gave me a real perspective of what I needed to do: find a low end replacement phone instead of meddling with low-level format, firmware flashing and DFU modes. That way, I would have been contactable, and be under much less pressure to quickly fix my iPhone.
In perspective, the same behavior can be seen in many organizations during IT disaster recovery. Disaster recovery is organized and coordinated by IT people - mostly very capable engineers. And yet, a large number of Disaster Recovery actions are delayed by these good engineers focusing primarily on fixing the engineering problem - not fixing the business problem.
In a Disaster Recovery situation, the recovery timer is known as the Recovery Time Objective (RTO). That is the time interval, starting from the moment of disaster, within which operation must be recovered to limited but essential functionality.
A good DR manager - regardless of his position and education - does his work with a stopwatch. The time he can allow the engineers to try to fix the problem does not have a formal name, so let's call it Fixing Time. It is the difference between the RTO and the tested time required to activate the Disaster Recovery systems.
Once this Fixing Time passes, Disaster Recovery activation must start. If the problem gets fixed before the DR system activation completes, all is well. If not, the DR systems take over and the RTO has still been met. Oh, and the engineers can relax from the urgency pressure and work on fixing the original problem for as long as it takes.
Back to my iPhone example - what was my timing? A phone RTO should be the recharge time - 2 hours. Getting a replacement phone is a walk to the store to buy the cheapest prepaid model, or borrowing a spare from a friend - 30 minutes. So I needed to keep my cool, and try to fix the problem for only 1.5 hours before looking for an alternative. After that, I could have spent a week on the iPhone - no pressure to fix it fast.
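The Fixing Time arithmetic is simple enough to put into a few lines. This is a minimal sketch using the iPhone numbers from the post; the function name is hypothetical.

```python
from datetime import timedelta


def fixing_time(rto: timedelta, dr_activation: timedelta) -> timedelta:
    """Time the engineers may spend on a fix before DR activation must start."""
    if dr_activation > rto:
        raise ValueError("DR activation alone takes longer than the RTO")
    return rto - dr_activation


# iPhone example: RTO of 2 hours, 30 minutes to get a replacement phone.
print(fixing_time(timedelta(hours=2), timedelta(minutes=30)))  # 1:30:00
```

If the tested activation time ever grows beyond the RTO, the function raises an error, which is a useful signal in itself: either the RTO or the DR setup needs to be revisited.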
Related posts
3 Rules to Prevent Backup Headaches
Business Continuity Analysis - Communication During Power Failure
Example Business Continuity Plan for Brick&Mortar Business
Business Continuity Plan for Blogs
Example Business Continuity Plan For Online Business
Talkback and comments are most welcome
Labels: business continuity, disaster recovery, information security
Business Continuity Analysis - Communication During Power Failure
As the world gets ever more hungry for power, resources are depleting, the climate is changing and large storms are becoming more frequent. As a result, power outages and massive problems on the grid will become more common all over the world. While massive power outages will bring a lot of problems, companies will strive to continue some level of operation. To achieve it, they need to communicate - both internally and externally - and massive power failures call for a special analysis of the telco backup resources. Here is the analysis and recommendations:
What happens to the telco infrastructure during a massive power failure?
- Every advanced telco device not on UPS will stop functioning immediately, including: routers and modems, PBX, faxes, cordless phones, ISDN phones
- The advanced telco devices supported by UPS will fail within 90-180 minutes after the power failure, since the same UPS is also supporting PCs and other equipment
- The alarm systems which usually have their independent battery pack will stop operating after approximately 24 hours
- The GSM telephony base stations are mostly supported by UPS, with only the largest ones supported by generators. Therefore, most of them will fail within 100-200 minutes after the power failure.
- The only remaining telco resources after approximately 4 hours of blackout will be:
  - The advanced telco devices supported by a diesel generator
  - Public Switched Telephony Network (PSTN) lines - they are powered over the telephone line by the telco PBX, which in turn is powered by a generator
  - Islands of mobile telephony in the cells created by the large mobile telephony base stations
  - Satellite communication devices, like VSAT or IRIDIUM phones - these are a very temporary solution, since they are strongly dependent on battery capacity
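The failure times listed above can be turned into a rough lookup of what is still dependable at a given point in a blackout. This is a minimal sketch: the dictionary and function names are hypothetical, each range from the list is reduced to its earliest (worst-case) value, generator-backed resources are assumed to keep running as long as fuel is supplied, and satellite devices are left out because the text gives no battery figure for them.

```python
import math

# Earliest expected failure, in minutes after the blackout starts.
EARLIEST_FAILURE_MIN = {
    "devices without UPS": 0,
    "UPS-backed telco devices": 90,
    "GSM base stations on UPS": 100,
    "alarm systems (own battery)": 24 * 60,
    "generator-backed telco devices": math.inf,  # assumes fuel keeps coming
    "PSTN lines": math.inf,
    "large GSM base stations (generator)": math.inf,
}


def still_dependable(elapsed_minutes: float) -> list:
    """Resources that have not yet reached their earliest failure time."""
    return [
        name
        for name, fails_at in EARLIEST_FAILURE_MIN.items()
        if elapsed_minutes < fails_at
    ]


print(still_dependable(4 * 60))
```

At the four-hour mark only the generator-backed resources, the PSTN lines and the battery-backed alarm systems remain, which is why the recommendations focus on generators and plain PSTN phones.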
Although diesel generators are not expensive, companies avoid them for all except the largest company locations for the following reasons:
- installation brings a wealth of problems for companies, since they need approval from fire inspectors,
- the company must adhere to safety and pollution regulations to install the generator
- maintenance costs cannot be ignored, especially when the normal grid is
- the diesel generators can become unreliable on very hot or very cold days
- generators can become dysfunctional due to neglect or external influence, for instance, another company sealing off the exhaust pipe during remodeling
Recommended Measures
- Place diesel generators at all locations where it is possible - don't go overboard, just use a small device with an internal tank and 6-8 hours of autonomy. After 10 hours of operation, you can create a controlled shutdown for a refill.
- Have a dedicated "red phone" PSTN line (or several of them) at each location, attached to a simple phone device with no external power requirements. It can be used during normal operations, but it will become the primary means of communication during a longer blackout.
- Include the threat in your Business Continuity Plan (BCP) and define proper steps to be taken in case of occurrence
- Test the BCP with the power failure scenario.
Naturally, the measures are simple and well known, and naturally, few managers will accept the first two until the first power failure event.
But the Business Continuity Manager can do the following: Create a BCP test scenario in which it will be forbidden to communicate via any advanced telco devices, and present the results of the BCP to Management. The results will not be good, so be prepared to take the fire!
Related posts
Example Business Continuity Plan for Brick&Mortar Business
Business Continuity Plan for Blogs
Example Business Continuity Plan For Online Business
Talkback and comments are most welcome
Labels: business continuity, information security, information strategy
Business Continuity Plan for Brick & Mortar Businesses
Business Continuity Plan for Blogs covered Business Continuity activities for a very small online business, but the BCP is much more important for standard everyday businesses.
As a continuation of the series of Business Continuity Plan examples, we are happy to present a BCP for "Brick and Mortar" businesses. This example BCP is modeled after a mid-range accounting business, and it is easily adapted to any office based business.
The Incidents included in this BCP are
- Fire
- Flood
- Earthquake
- Employee Illness - Epidemic
- Strike blocking transport routes to site of business
You can download the Example Business Continuity Plan for Brick and Mortar business HERE
Related posts
Business Continuity Plan for Blogs
Example Business Continuity Plan For Online Business
Talkback and comments are most welcome
Labels: business continuity, information security, information strategy
Business Continuity Plan for Blogs
After the post on Example Business Continuity Plan For Online Business, there was a mail discussion with a reader about whether it's at all relevant to blogs. Here I would like to stress a fact: the blog hosting providers have BCP plans, but to recover THEIR services, not every blog. A lost blog may be collateral damage, since it is, after all, a free service.
Here is a Business Continuity Plan for Blogs - It is actually the BCP of Shortinfosec, which I am using
SHORTINFOSEC BUSINESS CONTINUITY PLAN BEGINS
Incidents
- Loss of broadband link communication
- Loss of Hosting (Blogspot down)
- Loss of Hosting (Blogspot lost content)
Loss of broadband link communication
Time to wait before using BCP plan - 24 hours
- Find an alternative means of communication
- Use dial-up for connectivity - Time to achieve - Immediately
- Use public hot spot at the Mall or Cafe - Time to achieve - 1 hour
- Use GPRS from the iPhone
- Publish the following message, post in the hotlink spot and as a first post:
We are experiencing difficulties in publication of new content. We will continue with publication within the next 24 hours. In the meantime, please review our Archive
Total time of minimal function recovery - 1 hour after BCP activation
Total time of full recovery - 48 hours after BCP activation
Resources
- Charged Laptop Battery
- Charged iPhone
- Modem within Laptop/PC
- WiFi adapter for Laptop
Loss of Hosting (Blogspot down)
Time to wait before using BCP plan - 6 hours
- Find alternative host and register - Time to achieve - 15 minutes
- Wordpress http://wordpress.com/signup/
- Typepad https://www.typepad.com/t/app/register
- Choose a default template and Browse to see that it works - Time to achieve - 15 minutes
- Login to feedburner and modify the feedburner path to new RSS feed - Time to achieve - 10 minutes
- Publish post with content below - Time to achieve - 10 minutes
Title: Temporarily Moved
We are experiencing difficulties in hosting of http://www.shortinfosec.net/. We are working to resume normal operation. In the meantime, this is our temporary home.
Please send your comments, questions and reactions to shortinfosec _at_ gmail dot com
- Set up the temp blog to accept the address http://www.shortinfosec.net/ - Time to achieve - 15 minutes
- Log on to DNS Hosting and redirect http://www.shortinfosec.net/ to new blog location - Time to achieve - 15 minutes
- If the Blogger problem persists more than 24 hours, post new content to new blog.
- Wait for Blogger recovery, and if required restore template and content so the original site is available.
- If Blogger is not recovered within 48 hours, post old content as archive on the new site (PDF or backdated posts)
Total time of minimal function recovery - 80 minutes after BCP activation
Total time of full recovery - 48-72 hours after BCP activation
Resources
- Charged Laptop Battery
- Functioning Internet access (refer to incident 1)
- URL and account name/password of DNS Hosting Service - written down on paper, in laptop bag, also saved in laptop
- Current Backup of Blogspot XML Template - Backup Weekly and send as attachment to two web-mail services
- Current Backup of custom Widgets - Backup Weekly and send as attachment to two web-mail services
- Current Backup of Template Images and Icons - Backup Monthly and send as attachment to two web-mail services
- Current Backup of Blogspot Posts - Subscribe two web-mail services to the Feedburner feed - Immediate Backup
- Current backup of Downloads section - Backup Monthly and send as attachment to two web-mail services
Loss of Hosting (Blogspot lost content)
Time to wait before using BCP plan - 1 hour
- Login to blogspot or re-register if account is lost - Time to achieve - 15 minutes
- Choose a default template and Browse to see that it works - Time to achieve - 15 minutes
- Login to feedburner and modify the feedburner path to new RSS feed (if changed) - Time to achieve - 10 minutes
- Publish post with content below - Time to achieve - 10 minutes
Title: Temporarily Moved
We are experiencing difficulties in hosting of http://www.shortinfosec.net/. We are working to resume normal operation. In the meantime, this is our temporary home.
Please send your comments, questions and reactions to shortinfosec _at_ gmail dot com
Total time of minimal function recovery - 50 minutes after BCP activation
Total time of full recovery - 24-48 hours after BCP activation
Resources
- Charged Laptop Battery
- Functioning internet access (refer to incident 1)
- URL and account name/password of DNS Hosting Service - written down on paper, in laptop bag, also saved in laptop
- Current Backup of Blogspot XML Template - Backup Weekly and send as attachment to two web-mail services
- Current Backup of custom Widgets - Backup Weekly and send as attachment to two web-mail services
- Current Backup of Template Images and Icons - Backup Monthly and send as attachment to two web-mail services
- Current Backup of Blogspot Posts - Subscribe two web-mail services to the Feedburner feed - Immediate Backup
- Current backup of Downloads section - Backup Monthly and send as attachment to two web-mail services
SHORTINFOSEC BUSINESS CONTINUITY PLAN ENDS
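The backup resources in the plan are only useful if they are kept current. Below is a minimal sketch of a check against the weekly and monthly schedules listed in the plan. The names and dates are hypothetical, "monthly" is assumed to mean at most 31 days, and the posts are left out because the plan backs them up immediately through the feed subscription.

```python
from datetime import date, timedelta

# Maximum allowed backup age, following the schedule in the plan.
MAX_AGE = {
    "Blogspot XML template": timedelta(days=7),
    "Custom widgets": timedelta(days=7),
    "Template images and icons": timedelta(days=31),  # assumption: monthly = 31 days
    "Downloads section": timedelta(days=31),
}


def overdue_backups(last_backup: dict, today: date) -> list:
    """Items whose latest backup is missing or older than the schedule allows."""
    overdue = []
    for item, max_age in MAX_AGE.items():
        last = last_backup.get(item)
        if last is None or today - last > max_age:
            overdue.append(item)
    return overdue


# Hypothetical backup log; no backup of the downloads section has been recorded.
log = {
    "Blogspot XML template": date(2010, 10, 28),
    "Custom widgets": date(2010, 10, 20),
    "Template images and icons": date(2010, 10, 5),
}
print(overdue_backups(log, today=date(2010, 11, 1)))
```

Running a check like this as part of the weekly routine keeps the Resources lists in the plan honest.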
Related Posts
Example Business Continuity Plan For Online Business
Talkback and comments are most welcome
Labels: business continuity, information security, information strategy
Example Business Continuity Plan For Online Business
Online based businesses are 100% dependent on IT services, but a lot of them don't even consider what will happen if the IT systems hosting their business/service fail.
Furthermore, a lot of online business owners simply rely on their hosting providers to recover their services - THIS IS WRONG - they will restore the information, but not necessarily functionality!
Here is an analysis and a summary plan for business continuity of an online business:
First, a couple of definitions:
- The goal of business continuity is to resume business operation in a reduced but controlled manner after a disaster which impacts operation - until full recovery is achieved
- The goal of disaster recovery is to resume IT operations after a disaster which impacts IT operation - until full recovery is achieved
Requirement analysis
For large companies, the initial step of planning business continuity is the Business Impact Analysis (BIA), during which the company identifies which processes are critical to the company's survival and need to be restarted immediately, and which can be restored later.
Small online portals/services typically have the following processes:
- Service Delivery - actual service running on web and database servers
- Service Development - design, programming, upgrading, bug fixing of the service
- Sales and Marketing - promotion, communication with affiliates
- Accounting and back office operations - self explanatory
Each process is then rated by how long the business can survive without it:
- 1 - Process must never stop, immediate restart is needed
- 2 - We can survive without this process for 1 day
- 3 - We can survive without this process for 5 days
- 4 - We can survive without this process for 15 days
For the example business, the ratings are:
- Service Delivery - 1
- Service Development - 3
- Sales and Marketing - 2
- Accounting and back office operations - 3
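The ratings above translate directly into a restart order and into deadlines. The following is a minimal sketch using the rating scale and the example ratings from this post; the function names are hypothetical, and processes with the same rating keep the order in which they were listed.

```python
# Rating -> number of days the business can survive without the process.
MAX_DOWNTIME_DAYS = {1: 0, 2: 1, 3: 5, 4: 15}

RATINGS = {
    "Service Delivery": 1,
    "Service Development": 3,
    "Sales and Marketing": 2,
    "Accounting and back office operations": 3,
}


def restart_order(ratings: dict) -> list:
    """Processes sorted from most to least critical (the sort is stable)."""
    return sorted(ratings, key=lambda process: ratings[process])


def past_tolerance(ratings: dict, days_down: float) -> list:
    """Processes that have been down longer than the business can tolerate."""
    return [
        process
        for process in restart_order(ratings)
        if days_down > MAX_DOWNTIME_DAYS[ratings[process]]
    ]


print(restart_order(RATINGS))
print(past_tolerance(RATINGS, days_down=2))
```

After two days of outage, Service Delivery and Sales and Marketing are past their tolerance, while the two processes rated 3 still have time.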
Example Business Continuity Plan
I. Incident type - Loss Of Application and Database Data due to hosting server errors
Steps to achieve continuity
- Post a temporary information and contact page on alternative free hosting - Time to achieve - 15 minutes
- Redirect DNS to temporary information page - Time to achieve - 10 minutes
- Investigate whether servers are available. If not available, consult the list of alternative hosting providers that can provide hosting for 1 to 3 months - Time to achieve - 1 hour
- Restore latest trusted backup of Database to operational DB server - Time to achieve - 1 hour
- Restore latest trusted backup of Web Application to operational Web server - Time to achieve - 30 minutes
- Perform functional test of updated infrastructure - Time to achieve - 1 hour
- Redirect DNS from the temporary information page to the restored service - Time to achieve - 10 minutes
Resources to achieve continuity
- Temporary page prepared and available for publishing
- Funds on credit card to purchase hosting for 1 month
- List of alternative hosting providers which can support the application with contact information
- Functional broadband link - alternative, direct access to hosting provider premises and vehicle for transport
- Database Administrator/Developer available for activities
- Web Application Administrator/Developer available for activities
- Trusted and Stable Backup of Database
- Trusted and Stable Backup of Web Application
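Adding up the step times shows when each milestone of the plan is reached. This is a minimal sketch that assumes the steps run strictly one after another; the shortened step labels and the function name are hypothetical, while the durations are the ones given in the plan.

```python
from itertools import accumulate

# (step, minutes) in the order given in the plan.
STEPS = [
    ("Post temporary information page", 15),
    ("Redirect DNS to temporary page", 10),
    ("Check servers / pick alternative host", 60),
    ("Restore database backup", 60),
    ("Restore web application backup", 30),
    ("Functional test", 60),
    ("Final DNS redirect", 10),
]


def timeline(steps):
    """Return (step, minutes elapsed since activation when the step completes)."""
    finished_at = accumulate(minutes for _, minutes in steps)
    return [(name, elapsed) for (name, _), elapsed in zip(steps, finished_at)]


for name, elapsed in timeline(STEPS):
    print(f"{elapsed:>4} min  {name}")
```

Under these assumptions the information page is reachable 25 minutes after activation, and the full service is back after 245 minutes, a little over four hours.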
This example plan is very limited (one process, one incident), but it shows the general structure of a continuity plan. For an online business, in which every second of downtime counts, such a plan may be the difference between a minor incident and loss of business.
Talkback and comments are most welcome
Labels: business continuity, information security, information strategy

