Regardless of the size and complexity of the infrastructure you are running, it is always important to have an established change management process for your resources. If you fail to establish or enforce such a process, errors will inevitably creep in and go largely unnoticed until something bad happens.
A small company I visited yesterday runs several high-profile paid-content web portals. While we were discussing some research I did for them, the CTO changed the subject: "I need a fast way to know when my staff make changes to my servers." I must admit that this question caught me by surprise, since I had no insight into their operations. After 15 minutes of conversation on the topic, I concluded that they had no process controlling which changes to the infrastructure are made, when, and under whose authority. This led to a situation where one developer deployed a new version of the software while another changed access permissions. The result: a hacker made copies of their content and published it on IRC channels, causing direct monetary loss.
This is an excellent example of why no company is too small to need a strict change management process.
Here are the golden rules of establishing a change management process:
- Make a clear distinction between systems
The production environment is the place where you run your business. Nothing except approved versions of approved software should be running there.
The test environment is the place where you check your next versions. In configuration and set-up it should be as close to production as possible.
The development environment is the place where you develop new products. Expect it to be riddled with all kinds of code, and some testing will be done on it, but don't confuse it with a test environment.
- Know exactly what is running in production - Establish a documented process for applying changes and versions to production. Check regularly that this process is actually being used in daily operations.
- Develop changes based on the current production environment - Unless you are implementing an entirely new solution, maintain a protocol by which all further development takes into account the settings and versions in production. This way, your next version will always start from a tested code base in which most of the bugs are already known or weeded out.
- Always run a full battery of test scenarios - When testing new versions, write test scenarios, and test the full software, not just the new features. It is very common for working functions to be broken by a seemingly unrelated change.
- Never institute several changes simultaneously - If something goes wrong, no one will know which change is the problem, and everyone will blame everyone else. When making several changes, make them one by one. Between changes, reserve as much time as possible to confirm that everything is working before making the next one.
- Always have a fallback plan when placing a new version in the production environment - The question is not WHETHER something will go wrong, but WHEN. The implementation team should have a written fallback plan when applying changes to production. This plan should make a named person responsible for watching production for at least the first 12 hours after the change, so that someone can react at the first sign of trouble.
- Assign responsibility to a person - Group responsibility doesn't work. Name one person Change Manager or Coordinator. That person is responsible for overseeing test results and changes on a daily basis and for coordinating multiple changes.
- Always insist on regular reporting - The CIO/CTO should receive a regular monthly one-page report on which changes were moved to production, the test conclusions, and the next planned versions. Although not immediately useful to the manager, the obligation to report regularly helps keep the process alive and in use by the organization.
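As for the CTO's original question, one rough way to notice unapproved changes is to record a checksum of every file in a deployed directory when a version is approved, and later compare the live files against that baseline. The Python sketch below illustrates the idea only; the function names `snapshot` and `diff_snapshots` and the sample file names are made up for this example, and a real setup would also need to cover permissions, installed packages, and configuration stored outside the file system.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_snapshots(baseline, current):
    """List files added, removed, or modified relative to the approved baseline."""
    common = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(n for n in common if baseline[n] != current[n]),
    }


if __name__ == "__main__":
    # Hypothetical demo: a temporary directory stands in for a production tree.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.conf").write_text("mode=prod\n")
        approved = snapshot(root)  # taken when the version is approved

        (root / "app.conf").write_text("mode=debug\n")  # an unapproved change
        print(diff_snapshots(approved, snapshot(root)))
```

Run on a schedule against the approved baseline, any non-empty result is a change that did not go through the process, and it is exactly the kind of item the Change Manager should be asked about.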
Talkback and comments are most welcome.