Friday, August 29, 2008

Essential Management Semantics - Responsible vs Accountable

4:28 PM Posted by Bozidar Spirovski


I've had a discussion at the office about who is responsible for a certain activity. As expected, the junior colleagues got into a debate over who is more and who is less responsible for it. The Information Technology Infrastructure Library (ITIL) defines two distinct roles:
  • Responsible and
  • Accountable

If you open Webster's dictionary (www.websters.com) and look up the adjective "responsible", you get the following description: answerable or accountable, as for something within one's power, control, or management.
If you do the same for "accountable", here is what you get: subject to the obligation to report, explain, or justify something; responsible; answerable.

It is common sense to assume that "accountable" and "responsible" are synonyms. But in both Management and IT their meanings differ slightly, and that makes all the difference:

Accountable is the PERSON (singular) who answers for the entire set of results in a performed activity or process.
Responsible are the PERSON or PERSONS (singular or plural) who answer for the quality of a subset of tasks performed within an activity or a process.

So, there can be many persons responsible for the proper performance of a process, but there should ALWAYS be only ONE person accountable for the entire process.
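
The rule above is simple enough to check mechanically whenever roles are written down for a process. Here is a minimal sketch in Python. The function name and the sample names are hypothetical, and the requirement of at least one responsible person is my own assumption, not something ITIL is quoted on here.

```python
def validate_roles(accountable, responsible):
    """Return a list of problems with the role assignments of one process.

    accountable: list of people named as accountable for the whole process
    responsible: list of people named as responsible for tasks within it
    """
    problems = []
    # The rule from the post: exactly ONE accountable person per process.
    if len(accountable) != 1:
        problems.append(
            f"expected exactly 1 accountable person, found {len(accountable)}"
        )
    # Assumption (not stated in the post): someone has to do the work.
    if not responsible:
        problems.append("expected at least 1 responsible person")
    return problems


# Hypothetical example: one accountable owner, two responsible engineers.
print(validate_roles(["Ana"], ["Ben", "Cy"]))
# Hypothetical example: shared accountability, nobody responsible.
print(validate_roles(["Ana", "Ben"], []))
```

An empty list means the assignment follows the rule; anything else is a list of what to fix.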

Bonus Question
Q: When something does not get done right, who gets blamed: the Accountable or the Responsible?
A: The Accountable has the task of identifying which Responsible is failing at the job and taking measures to fix the problem. In the long run, however, if the problem is not fixed and the entire process fails, the Accountable will be called to answer.


Monday, July 14, 2008

5 SLA Nonsense Examples - Always Read the Fine Print

11:32 AM Posted by Bozidar Spirovski
I've had the opportunity to review several poor Service Level Agreement (SLA) contracts, which include clauses shielding the provider as if it were an endangered species. These clauses are usually hidden under "general clauses" or dressed in fancy lingo so that they may go unnoticed.

Here are several examples of texts that a customer should watch out for in a Service Level Agreement:

1. The data protection trick

  • Sample clause: The provider will protect and not reveal any received or collected information about the buyer, unless it's required by legal authorities during a formal investigation or in case of protection of provider's interests.
  • Analysis: Although this particular clause may vary from country to country (legal system differences), there is NO LOGICAL ARGUMENT for anyone to reveal your information in order to protect their own interests.

2. The no responsibility trick

  • Sample clause: The customer will hold harmless and indemnify the provider from all errors, damage or data loss, loss of business, delays in processing or any other problems resulting from usage or inability to use this service. The provider is not responsible for any damage to hardware or systems during the installation or maintenance of the service.
  • Analysis: While this is a relatively standard clause, always have your legal team AND your technical team review and dissect it. In the example, the wording of the last sentence actually makes the provider not responsible for any screw-ups during installation, even if their technician connected a 110V line to a 300V outlet, or used a drill to tighten a screw of the serial port.

3. The automatic consent trick

  • Sample clause: The provider reserves the right to modify the conditions of service, and the modifications will be considered agreed to in case of service contract renewal.
  • Analysis: An SLA can be written to refer to certain general conditions related to the service. A provider can modify these general conditions without properly communicating the change to the customer. Since most contract renewals are automatic, this can suddenly put the customer in a very bad position even if the initial SLA contract was good. Always insist that all agreed changes to the service must be signed off in a dedicated document.

4. The service quality trick

  • Sample clause: Our service has a service quality of XX% (delay, latency, bandwidth)... In case of unforeseen circumstances, this quality may be reduced.
  • Analysis: Nobody signs and pays for an SLA to guarantee services in ideal circumstances. The term "unforeseen circumstances" is simply a get-out-of-jail-free card. For instance, even rain can be an unforeseen circumstance for a poorly protected wiring cabinet, but it's not something that the customer should have to worry about. If special circumstances need to be addressed, they need to be properly itemized, without room for different interpretations.

5. The business hours trick

  • Sample clause: All service activities are performed during business hours (8AM to 6PM). If the customer requires intervention outside of business hours, such intervention will be charged according to regular pricing policy.
  • Analysis: This clause may have a place in a standard service contract. When a custom SLA contract is signed, however, the conditions and prices are levels above the standard contract. So, if one is paying for a level defined in the SLA, the price MUST COVER ALL POSSIBLE SCENARIOS.

Related posts
9 Things to watch out for in an SLA
The SLA Lesson: software bug blues

Friday, April 11, 2008

The SLA Lesson: software bug blues

5:27 PM Posted by Bozidar Spirovski
I have been hugely busy in the past weeks with several projects, so the blogging got stuck... I will try to avoid this in the future. Now, back to my latest experience.

Part of every Information Security Management System is the incident management process. It is a process in which the company identifies a problem which is occurring or has occurred, and performs steps to contain it, minimize the impact, identify the root cause and take measures to prevent the incident from recurring.

The incident in question is the dreaded application blocking - a company of 1000 employees uses a custom-made, fully integrated CRM/ERP system, which exhibited complete or partial non-responsiveness lasting several minutes at a time over a period of nearly two hours. The situation was identified in several departments, while the rest of the company was functioning as usual.

As soon as the call came in, the incident response team was formed and the problem was analyzed. After 15 minutes, the cause was identified: Accounting had started a program which runs once a week and affects the billing information of most Key Customers. This program was started at its usual time, with the usual parameters. The problem was rectified by stopping the processing and postponing it until after business hours.

Upon further investigation of the incident, it was identified that the problem had occurred before, at regular intervals, but was never reported as an incident. The situation had been handled by the IT department, which reported the problem as a bug to the software company that created the software.

When I requested a status update from IT on this bug report, I received shocking information: the software company had closed the bug report with a status of DENIED.

So I called the release manager at the software company, and I got an even bigger shock: he explained that the software company decided to deny this bug report due to the overwhelming number of change requests and bug reports from our company. In his words, this bug was a mere nuisance since it blocked part of the software for about an hour once a week - just run it during lunch!

At this point, the incident was no longer just an incident; it had become a support contract issue, so I reported the situation to management and asked for their involvement.

This incident is a very good lesson in the different priorities and focus of the parties involved:
For a user of the system any problem can be a show stopper.
For the manufacturer of the system, the same problem can be played down to the importance of an itch. There can be many reasons for such a difference in opinion, but here are a few:
  1. There are insufficient human resources to address the issue
  2. There are profitable change requests or projects to address, so this issue is merely postponed, since the software company will not see a profit from committing its resources to correcting this problem.
  3. The problem is caused by a design flaw in the system, that is either very difficult or impossible to rectify in a reasonable time and within reasonable budget
The only way to increase the value of the users' incident to the manufacturer is by applying proper controls and penalties in the support contract. That is why the history and results of security incidents should also be used as a very valid input into the preparation and negotiation of the SLA.

Thursday, April 3, 2008

9 Things to watch out for in an SLA

9:24 AM Posted by Bozidar Spirovski
I wasn't planning to touch the issue of the Service Level Agreement (SLA) for some time, but it appears that the incident report (The SLA Lesson: software bug blues) has stirred attention that merits a post on the subject.


As I already mentioned, it is a very frequent occurrence that the SLA is just an afterthought when preparing a contract, and that the buyer is usually waiting for the supplier to produce the SLA. Of course, this leads to the situation in which the SLA actually protects the supplier, not the buyer. So here are the things one must do to achieve at least a reasonable, if not good, SLA:
  1. Remember that any SLA is open for negotiation, but only at initial purchase - although the supplier may take a very rigid position on the SLA (especially common in large companies), the SLA is part of the sales process. A rigid position should immediately raise red flags that the proposed "unchangeable" SLA is protecting the supplier, not the buyer. The best opportunity to negotiate it is during the initial RFP negotiations. Once the product/service is sold and goes into production use, the buyer has lost all negotiating power. So be very wary of agreeing to negotiate the SLA after delivery, at the end of warranty or under some similar wording.


  2. Define availability as you would expect it - availability is usually calculated as the percentage of time that the product under SLA is up and running. Usual numbers vary from 98% to 99.999% of the time. Now, let's examine the "time" factor in the formula. Upon first reading, a person will usually assume that 98% means 98% of any time measure, whether it be an hour, day, month, year or century. But let's observe the following table:
     In an SLA contract specifying a percentage of availability per time period, the total downtime is accumulated over the entire time period. Furthermore, if there is no time period specified with the availability percentage, the default time period is the period of validity of the contract - which is very often 1 year or more. So, if you sign a yearly contract with an SLA of 99%, it doesn't guarantee that you will have at most 10 minutes of downtime per week. It means that you won't have more than 3.65 days (87.6 hours) of downtime over the entire year, which means that you can have more than ten full 8-hour workdays WITHOUT ANY SERVICE in that year. If you take the same 99% but insist on applying it at a weekly level, you suddenly get much better odds - now you can't have more than 1.68 hours of downtime in any one week. So take a day of meetings in your company to define your maximum acceptable downtime per day, and use the above matrix to find the best option for you.


  3. Always keep in mind the distinction between reaction time and correction time - during the negotiation of an SLA it is usual to have very tense negotiations to achieve a good "response time". But this umbrella term is an excellent umbrella - for the supplier! Response time is defined as the time that passes between the formal logging of a problem and the moment a representative of the supplier logs a response (sends a reply by e-mail, makes a phone call or arrives on-site). So when defining the response times, ALWAYS define two or three different times: reaction time - which is equivalent to response time; workaround time - the time in which a temporary solution that alleviates the problem is expected; and correction time - the time in which a final solution is expected to be found.


  4. Make precise definitions of problem severity levels and tie them in with reaction and correction times - as in my previous post, the severity of the problem can be viewed differently by the buyer and the supplier. So, define a clear matrix of severity levels, and have a clause which states that if the two parties assess the severity level differently, the view of the buyer prevails. A sample of severity levels is presented in the table below:


  5. Define response time for all levels of severity - naturally, the buyer should expect faster reaction and correction for more severe problems. When defining the severity levels, include in each one at least the expected reaction time and workaround time.


  6. Define channels of communication and escalation - at first glance a very simple thing, but one that is very often a reason for not being able to raise a dispute under the SLA contract. For the problem to be considered properly reported, the supplier will expect a report from an authorized person to specific persons via e-mail, fax or phone. Any deviation from the agreed-upon process is an excellent excuse for not meeting SLA parameters on the grounds of "not being informed". So always have at least three persons authorized for problem reporting, and modify internal procedures so that these persons are the first to be informed of a problem. The same is true for the escalation of problems to higher levels, should the problem persist.


  7. Define the conditions under which the SLA criteria are applied to a problem - it is not uncommon in SLAs to see that the SLA criteria start to apply from the time the buyer reports the problem to the supplier. This is an element usually insisted upon by the supplier, since it offloads the burden of monitoring and reporting onto the buyer. By the time the problem is reported, it has already existed for anywhere from several minutes to half an hour. Moreover, there are products for which the supplier cannot perform the monitoring and cannot conclude that a problem is occurring. So even if this point cannot be changed in the contract, adjust internal procedures so that the authorized persons of the buyer IMMEDIATELY report the problem to the supplier. Internal metrics can even be applied to this process, to identify internal lags in communication.


  8. Define measurements and reporting - an SLA is useless if you can't properly measure and document the length of each problem. So the buyer should keep track of problems, with info on the severity, duration of the problem, reaction time and correction time, along with all relevant e-mails and messages exchanged. Tracking can be achieved with something as simple as an Excel sheet; all it requires is regular updating.


  9. Tie in penalties and contract back-out options - this is the actual big stick in the SLA. Breach of SLA parameters should be tied to serious penalties and the possibility of contract termination. When defining penalties, always strive to define them as a monetary value payable immediately upon breach of the SLA. Also, you should try to negotiate a penalty that grows exponentially with each further hour of SLA breach. Do not accept a penalty that is compensated with other goods or services from the same supplier, since the supplier will value such services at sales price in the refund, while their internal costs for such services are significantly lower, thus reducing the actual loss to the supplier from an SLA breach.
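
The availability arithmetic from the list above is easy to redo for your own numbers. The following is a rough sketch in Python, not a contract tool: the function name, the period names and the choice of a 30-day month and 365-day year are my own assumptions for illustration.

```python
# Hours in each measurement period. A month is taken as 30 days and a
# year as 365 days; both are simplifying assumptions for this sketch.
PERIOD_HOURS = {
    "day": 24,
    "week": 7 * 24,
    "month": 30 * 24,
    "year": 365 * 24,
}


def allowed_downtime_hours(availability_pct, period):
    """Maximum downtime, in hours, that an SLA tolerates within one period."""
    return PERIOD_HOURS[period] * (100 - availability_pct) / 100


# Print the tolerated downtime for a few common availability levels.
for pct in (98, 99, 99.9, 99.999):
    row = ", ".join(
        f"{period}: {allowed_downtime_hours(pct, period):.2f} h"
        for period in PERIOD_HOURS
    )
    print(f"{pct}% -> {row}")
```

For 99% this gives 87.6 hours per year but only 1.68 hours per week, which is exactly why the period written into the contract matters as much as the percentage.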

Related posts
The SLA Lesson: software bug blues