Maintaining quality in outsourcing telco services
More and more IT services are being outsourced. And since telco services are now easily integrated and transported over IP, outsourcing is becoming well established in telco as well.
The issue with telco services is that their quality is very difficult to define properly, because many of the parameters are difficult to track – sound quality, the response of an IVR to tone-dial menu selections, unexpected intermittent interruptions of voice communication, temporarily unavailable service.
And when part of the telco service is outsourced, it becomes even more difficult to manage the quality of such services.
Here are some elements that will affect the quality of outsourced telco services:
- Oversubscription of the outsourcing service – the service may be of variable quality, alternating between periods when it is poor and periods when it is great. This is usually a sign of oversubscription: when the outsourcing provider's systems are overloaded, the customer-facing service is of poor quality.
- Availability of the outsourcing servers – simple and straightforward: power outages, server outages and cooling outages all create failures that interrupt service. Even if there are secondary servers, the switchover will drop all active connections.
- Connectivity to the outsourcing service - most outsourcing services are far away, most often in Asia, so internet links will be the primary way of connecting to them. But the internet as a medium has a lot of possible issues, and failures of connectivity paths are not that rare.
When the outsourcing service is part of your call management, things get very interesting. Call management services that are easily outsourced include ringback tone, voice mail, autoanswer etc.
How to solve this issue of quality when outsourcing? There is no magic bullet, but here are some experiences and pointers:
- Of course, you will create the standard contract with availability, packet loss and jitter criteria (see related posts).
- You can also include call disconnects or failure to connect.
- It would be very good to tie this to the number of customer complaints, but the outsourcing service will be very reluctant to accept a quality of service condition tied to a subjective criterion that cannot be measured and confirmed by both parties independently.
- Create a complaint criterion for the outsourcing service - for example, if the telco customer detects issues so large that they need to send a complaint to their outsourcing service more than 4 times in a quarter, that would be a basis for a contract review. This clause is very wise to include, especially in the first year of using the outsourcing service, when you are still learning their weak points.
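The complaint criterion is easy to track on your own side. Here is a minimal sketch in Python, assuming you keep a simple list of the dates on which formal complaints were sent; the function names and the sample dates are made up for illustration, and only the limit of 4 per quarter comes from the example above.

```python
from collections import Counter
from datetime import date

COMPLAINT_LIMIT_PER_QUARTER = 4  # limit taken from the example clause above


def quarter_of(day: date) -> tuple[int, int]:
    """Return (year, quarter) for a calendar date."""
    return day.year, (day.month - 1) // 3 + 1


def quarters_needing_review(complaint_dates: list[date]) -> list[tuple[int, int]]:
    """List the quarters in which complaints exceeded the agreed limit."""
    per_quarter = Counter(quarter_of(d) for d in complaint_dates)
    return sorted(q for q, n in per_quarter.items() if n > COMPLAINT_LIMIT_PER_QUARTER)


# Hypothetical complaint log: five complaints in Q1, one in Q2.
complaints = [
    date(2024, 1, 9), date(2024, 1, 30), date(2024, 2, 14),
    date(2024, 3, 3), date(2024, 3, 21), date(2024, 5, 2),
]
print(quarters_needing_review(complaints))
```

Because both parties can count formal complaints independently, this criterion avoids the subjectivity problem of customer complaint numbers.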
Talkback and comments are most welcome
Related posts
Telco SLA - parameters and penalties
Is the Phone Working? - Alternative Telephony SLA
5 SLA Nonsense Examples - Always Read the Fine Print
Telco SLA - parameters and penalties
Communication links provided by Telco providers are critical to most businesses. And as any network admin will tell you, these links tend to have outages, ranging from small interruptions up to massive breakdowns that can last for days.
When such interruptions occur, businesses suffer, but unless the provider has serious contractual obligations, there is little effort on their side to improve service or correct issues.
That is why businesses need a good Service Level Agreement (SLA). Preparing the SLA is usually dreaded by most, since it is full of numbers and parameters for which the client must decide what is acceptable, and whose values may be difficult to measure.
SLA Parameters
A good SLA is not necessarily loaded with a lot of numbers. You need to work with 2-3 parameters which are important to you. Here are the most frequent SLA parameters, with their acceptable values:
- Availability - more than 99% for internet, more than 99.5% for corporate data links
- Packet Loss - less than 0.4% for internet, less than 0.2% for corporate data links
- Jitter - less than 15ms for internet, less than 5ms for corporate data links
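To show how few numbers are actually involved, here is a small Python sketch that checks a set of measurements against the limits listed above. The class and function names are invented for this example, and the sample measurements are hypothetical; how you obtain the measurements depends on your own monitoring tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaLimits:
    min_availability_pct: float  # availability must be above this
    max_packet_loss_pct: float   # packet loss must be below this
    max_jitter_ms: float         # jitter must be below this


# Limits taken from the list above.
INTERNET = SlaLimits(min_availability_pct=99.0, max_packet_loss_pct=0.4, max_jitter_ms=15.0)
CORPORATE = SlaLimits(min_availability_pct=99.5, max_packet_loss_pct=0.2, max_jitter_ms=5.0)


def breached_parameters(limits, availability_pct, packet_loss_pct, jitter_ms):
    """Return the names of the SLA parameters that the measurements violate."""
    breached = []
    if availability_pct <= limits.min_availability_pct:
        breached.append("availability")
    if packet_loss_pct >= limits.max_packet_loss_pct:
        breached.append("packet loss")
    if jitter_ms >= limits.max_jitter_ms:
        breached.append("jitter")
    return breached


# Hypothetical monthly measurements for a corporate data link.
print(breached_parameters(CORPORATE, availability_pct=99.7, packet_loss_pct=0.3, jitter_ms=4.0))
```

The same measurements pass the internet limits but fail the corporate packet loss limit, which is why the type of link should be stated explicitly in the SLA.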
And you need penalties which will hurt the provider. Penalties are the big stick in the SLA.
Here are the penalties that you want:
- small breach of SLA - 25% to 33% of monthly fee
- large breach of SLA - 50% to 100% of monthly fee

Be aware that no provider will create an SLA that will eat much of its profits. The committed provider can be identified by the type of Service Level Agreement (SLA) that it's prepared to sign without special negotiations.
Here are three different levels of SLAs - not so different in their metrics and parameters, but quite different in terms of penalties:
- Verizon is offering a very basic SLA, with compensation of the daily charge for each day of SLA breach - http://www.verizonbusiness.com/terms/latam/co/sla/
- BT takes a more serious approach - a penalty of a daily charge for each hour of SLA breach, but limited to a maximum of 10 days of charges - http://business.bt.com/assets/pdf/BTnet%20Service%20Level%20Agreement.pdf
- Sprint includes some really hard penalties in their SLA, including 100% of the monthly charge for some parameters - http://www.sprint.com/business/resources/mpls_vpn.pdf
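To get a feeling for how far apart these three penalty styles are, here is a rough Python sketch. It is only loosely modeled on the one-line descriptions above and does not reproduce the actual contract terms of any provider. The 30-day month, the rounding up to started days and hours, and the function names are all assumptions made for this example.

```python
import math

DAYS_IN_MONTH = 30  # assumption: daily charge = monthly fee / 30


def daily_charge(monthly_fee: float) -> float:
    return monthly_fee / DAYS_IN_MONTH


def credit_per_day(monthly_fee: float, breach_hours: float) -> float:
    """Basic style: one daily charge for each started day of breach."""
    return math.ceil(breach_hours / 24) * daily_charge(monthly_fee)


def credit_per_hour_capped(monthly_fee: float, breach_hours: float, cap_days: int = 10) -> float:
    """Stricter style: one daily charge per started hour, capped at cap_days charges."""
    return min(math.ceil(breach_hours), cap_days) * daily_charge(monthly_fee)


def credit_full_month(monthly_fee: float, breach_hours: float) -> float:
    """Hardest style: the whole monthly fee once the parameter is breached."""
    return monthly_fee if breach_hours > 0 else 0.0


# Hypothetical case: a link costing 3000 per month with a 6 hour breach.
fee, hours = 3000.0, 6
for model in (credit_per_day, credit_per_hour_capped, credit_full_month):
    print(model.__name__, model(fee, hours))
```

For the same short outage, the three styles differ by more than an order of magnitude, which is exactly the point of reading the penalty section before the parameter section.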
Related posts
9 Things to watch out for in an SLA
The SLA Lesson: software bug blues
5 SLA Nonsense Examples - Always Read the Fine Print
Is the Phone Working? - Alternative Telephony SLA
Telephony costs are one of the main targets of cost cutting in many large companies. In this effort, companies are turning to alternative voice providers, who offer much cheaper calls and more flexible services. But these new operators are using new technologies and are relatively new on the market, so the buyer should approach the alternative telephony service with care and apply a proper Service Level Agreement.
What are we used to?
In traditional telephony, voice reliability is taken for granted, and all equipment is designed to offer very high availability. Also, capacity is not an issue, since each incoming circuit to a switch is dedicated, and the switching capacity of the telco switch is calculated via well-known formulae (Erlang models) to provide switching of all initiated calls.
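For readers who have not met the Erlang models, here is a short Python sketch of one of them, the Erlang B formula, which gives the probability that a call is blocked for a given offered traffic and number of circuits. The post does not say which Erlang model a particular switch is dimensioned with, so take this purely as an illustration; the function names are my own.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Blocking probability for the given traffic and number of circuits.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0  # with zero circuits every call is blocked
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def circuits_needed(offered_erlangs: float, target_blocking: float) -> int:
    """Smallest number of circuits that keeps blocking at or below the target."""
    circuits = 0
    while erlang_b(offered_erlangs, circuits) > target_blocking:
        circuits += 1
    return circuits


# Hypothetical office generating 10 Erlangs of traffic in the busy hour,
# with at most 1% of calls allowed to be blocked.
print(circuits_needed(10.0, 0.01))
```

The same calculation is useful later when you ask an alternative provider to guarantee a volume of simultaneous calls.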
PSTN availability was measured at 99.99% in 1993 (a maximum of about 4 minutes of outage per month, or a total of 52 minutes of outage per year!) and that number is approaching 99.994%. Compared to this, classical IP data services are struggling to pass the "two point five nines" (99.5%), which is equivalent to 3.6 hours of outage per month or nearly 2 days per year.
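The outage figures follow directly from the availability percentage and the length of the period. A minimal Python sketch of the arithmetic, assuming a 30-day month and a 365-day year (the function name is my own):

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Outage, in minutes, that still satisfies the availability over the period."""
    return (100.0 - availability_pct) / 100.0 * period_hours * 60


MONTH_HOURS = 30 * 24   # assumption: 30-day month
YEAR_HOURS = 365 * 24   # assumption: 365-day year

for pct in (99.99, 99.5):
    per_month = allowed_downtime_minutes(pct, MONTH_HOURS)
    per_year = allowed_downtime_minutes(pct, YEAR_HOURS)
    print(f"{pct}%: {per_month:.1f} min per month, {per_year / 60:.1f} hours per year")
```

Running it reproduces the numbers quoted above: a few minutes per month for 99.99%, and 3.6 hours per month (almost 44 hours, or nearly 2 days, per year) for 99.5%.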
For all medium to large businesses (especially those operating a retail business) telephony is a "default" service, one that must ALWAYS work, one that is really taken for granted.
The potential challenges with an alternative voice provider
When a company decides to use the services of an alternative telephony provider, several issues may appear. The alternative telephony provider may bypass the ILEC (Incumbent Local Exchange Carrier) to minimize costs, and quite often it will arrive at your premises via a data link attached to the company's PBX. Once we walk into the realm of data transfer, things become very different:
- The data link is terminated on lower-reliability active equipment (usually a router or L3 switch) - to minimize costs, this device will not be high-end, and its hardware reliability will be around 98-99%
- The data link can be prone to faults on the physical level - alternative telephony operators are not too big on infrastructure protection and want fast deployment, so it can happen that the operator's cable is strung on power lines, placed in central heating ducts under the city, or in extreme cases even illegally buried in soft ground areas (parks, recreation tracks, green patches), where it is unmarked and easily falls victim to any other construction or renovation activity.
- Data links are by default based on best effort technologies - so IP data packet drops, retransmissions and delays can occur.
All this translates to a whole new ballgame in terms of controlling the services offered by your alternative voice service provider.
Establishing proper criteria for service quality
So in order to properly manage the alternative voice services, one must define what criteria should be measured:
1. Keep the good old data SLA - this is to control the overall data link quality, which is easiest to measure
2. Measure Established, Failed and Dropped calls - via the router infrastructure connecting you to the alternative telephony provider. This measurement will be enabled through vendor-specific router functions, most often through syslog event analysis.
3. Define the guaranteed volume of simultaneous calls that the provider will deliver - measure the delivered volume of calls by comparing the numbers of established, failed and dropped calls from point 2.
4. Define and apply penalties both on overall link quality (point 1), since it affects all calls, and on the volume of realised calls (points 2 and 3), since they relate to the actual ability to use the service as contracted.
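Once the call events are extracted from the router logs, turning them into SLA figures is simple. The Python sketch below assumes the events have already been reduced to three labels, ESTABLISHED, FAILED and DROPPED; those labels are hypothetical, since the real syslog messages are vendor specific. It also assumes that a dropped call is an established call that was cut off later.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CallStats:
    established: int = 0
    failed: int = 0
    dropped: int = 0  # assumption: dropped calls are a subset of established calls

    @property
    def attempts(self) -> int:
        return self.established + self.failed

    @property
    def setup_success_pct(self) -> float:
        """Share of call attempts that were connected."""
        return 100.0 * self.established / self.attempts if self.attempts else 0.0

    @property
    def drop_pct(self) -> float:
        """Share of connected calls that were cut off."""
        return 100.0 * self.dropped / self.established if self.established else 0.0


def tally(events: list[str]) -> CallStats:
    """Count the hypothetical event labels into a CallStats record."""
    counts = Counter(events)
    return CallStats(counts["ESTABLISHED"], counts["FAILED"], counts["DROPPED"])


# Hypothetical day of traffic: 95 connected calls, 5 failed attempts, 2 drops.
stats = tally(["ESTABLISHED"] * 95 + ["FAILED"] * 5 + ["DROPPED"] * 2)
print(f"{stats.attempts} attempts, {stats.setup_success_pct:.1f}% connected, {stats.drop_pct:.1f}% dropped")
```

Percentages like these are what the penalties on realised calls can be tied to, next to the link-level data SLA.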
Related posts
9 Things to watch out for in an SLA
5 SLA Nonsense Examples - Always Read the Fine Print
5 SLA Nonsense Examples - Always Read the Fine Print
I've had the opportunity to review several poor Service Level Agreement (SLA) contracts, which include clauses shielding the provider as if it were an endangered species. These clauses are usually masked under "general clauses" or fancy legal lingo so that they may go unnoticed.
Here are several examples of texts that a customer should watch out for in a Service Level Agreement:
1. The data protection trick
- Sample clause: The provider will protect and not reveal any received or collected information about the buyer, unless it's required by legal authorities during a formal investigation or in case of protection of provider's interests
- Analysis: Although this particular clause may vary from country to country (legal system differences), there is NO LOGICAL ARGUMENT for anyone to reveal your information for protection of their interest.
- Sample clause: The customer will hold harmless and indemnify the provider from all errors, damage or data loss, loss of business, delays in processing or any other problems resulting from usage or inability to use this service. The provider is not responsible for any damage to hardware or systems during the installation or maintenance of the service
- Analysis: While this is a relatively standard clause, always have your legal team AND your technical team review and dissect it. In the example, the wording of the last sentence actually makes the provider not responsible for any screw-ups during installation, even if their technician placed a 110V line in a 300V outlet, or used a drill to tighten a screw of the serial port.
- Sample clause: The provider reserves the right to modify the conditions of service, and the modifications will be considered agreed to in case of service contract renewal.
- Analysis: An SLA can be written to refer to certain general conditions related to the service. A provider can modify these formal conditions without proper communication to the customer. Since most contract renewals are automatic, this can suddenly put the customer in a very bad position even if the initial SLA contract was good. Always insist that all agreed changes to service must be signed off in a dedicated document.
- Sample Clause: Our service has a service quality of XX% (delay, latency, bandwidth)... In case of unforeseen circumstances, this quality may be reduced.
- Analysis: Nobody signs and pays for an SLA to guarantee services in ideal circumstances. The term "unforeseen circumstances" is simply a get-out-of-jail-free card. For instance, even rain can be an unforeseen circumstance for a poorly protected wiring cabinet, but it's not something that the customer should worry about. If special circumstances need to be addressed, they need to be properly itemized, without room for different interpretations.
- Sample Clause: All service activities are performed from 8AM to 6PM. If the customer requires intervention outside of business hours, such intervention will be charged according to the regular pricing policy.
- Analysis: This clause may have a place in a standard contract. When an SLA contract is signed, its levels are above the standard contract, and are appropriately priced. So, if one is paying for a level defined in the SLA, the price MUST COVER ALL POSSIBLE SCENARIOS.
Related posts
9 Things to watch out for in an SLA
The SLA Lesson: software bug blues
9 Things to watch out for in an SLA
I wasn't planning to touch the issue of the Service Level Agreement (SLA) for some time, but it appears that the incident report (Link to Blog Post) has stirred enough attention to merit a post on the subject.
As I already mentioned, it is very common for the SLA to be just an afterthought when preparing a contract, with the buyer waiting for the supplier to produce the SLA. Of course, this leads to a situation in which the SLA actually protects the supplier, not the buyer. So here are the things one must do to achieve at least a reasonable, if not good, SLA:
- Remember that any SLA is open for negotiation, but only in the initial purchase - although the supplier may take a very rigid position on the SLA (especially common in large companies), the SLA is part of the sales process. Standing by a rigid position should immediately raise red flags that the proposed "unchangeable" SLA is protecting the supplier, not the buyer. So the best opportunity to negotiate it is during the initial RFP negotiations. Once the product/service is sold and goes into production use, the buyer has lost all power of negotiation. So be very wary of agreeing that you will negotiate the SLA after delivery, at the end of warranty or some similar wording.
- Define Availability as you would expect it - availability is usually calculated as the percentage of time the product under SLA is up and running. Usual numbers vary from 98% to 99.999% of the time. Now, let's examine the "time" factor in the formula. Upon first reading, a person will usually assume that 98% means 98% of any measure of time, whether it be an hour, day, month, year or century. But the measurement period matters: 98% measured per day allows about 29 minutes of outage each day, while 98% measured per year allows a single outage of more than 7 days. Let's observe the following table:
- Always keep in mind the distinction between reaction time and correction time - during the negotiation of an SLA it is usual to have very tense negotiations to achieve a good "response time". But this umbrella term is an excellent umbrella - for the supplier! Response time is defined as the time between the formal logging of a problem and the moment a representative of the supplier logs a response (sends a reply by e-mail, makes a phone call or arrives on-site). So when defining response times, ALWAYS define two or three different times: reaction time, which is equivalent to response time; workaround time, the time in which a temporary solution that alleviates the problem is expected; and correction time, the time in which a final solution is expected.
- Make precise definitions of problem severity levels and tie them in with reaction and correction times - as in my previous post, the severity of the problem can be viewed differently by the buyer and the supplier. So, define a clear matrix of severity levels, and have a clause which states that if the two parties assess the severity level differently, the view of the buyer prevails. A sample of severity levels is presented in the table below:
- Define response time for all levels of severity - naturally, the buyer should expect faster reaction and correction for more severe problems. When defining the severity levels, in each one include at least the expected reaction time and workaround time.
- Define channels of communication and escalation - At first glance a very simple thing, but one that is very often a reason for not being able to dispute the SLA contract. For the problem to be considered properly reported, the supplier will expect a report from an authorized person to specific persons via email, fax number or phone. Any deviation from the agreed upon process is an excellent reason for not meeting SLA parameters on the grounds of "not being informed". So always have at least three authorized persons for problem reporting, and modify internal procedures so these persons are the first to be informed of a problem. The same is true for the escalation of problems to higher levels, should the problem persist.
- Define the conditions under which the SLA criteria are applied to a problem - It is not uncommon in SLA agreements to see that the SLA criteria start to apply from the time the buyer reports the problem to the supplier. This is an element usually insisted upon by the supplier, since it offloads the burden of monitoring and reporting onto the buyer. By the time the problem is reported, it has already existed for several minutes, up to half an hour. Even more so, there are products for which the supplier cannot perform the monitoring and cannot conclude that a problem is occurring. So although this point will not be applied in the contract, adjust internal procedures so that the authorized persons of the buyer IMMEDIATELY report the problem to the supplier. Internal metrics can even be applied to this process, to identify internal lags in communication.
- Define measurements and reporting - An SLA is useless if you can't measure and document the length of each problem properly. So the buyer should keep track of problems, with info on the severity, duration of the problem, reaction time and correction time, with all relevant e-mails and messages exchanged. Tracking can be achieved with something as simple as an Excel sheet; all it requires is regular updating.
- Tie in penalties and contract back-out options - this is the actual big stick in the SLA. Breach of SLA parameters should be tied to serious penalties and the possibility of contract termination. When defining penalties, always strive to define them in monetary value payable immediately upon breach of SLA. Also, you should try to negotiate a penalty that grows exponentially with each further hour of SLA breach. Do not accept a penalty that is compensated with other goods or services from the same supplier, since the supplier will value such services at sales price in the refund, while their internal costs for such services are significantly lower, thus reducing the actual loss of the supplier in an SLA breach.
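The tracking described above does not need much more than a list of timestamps per problem. Here is a minimal Python sketch that compares one logged problem against a severity matrix. The severity levels and their reaction, workaround and correction targets are hypothetical values chosen for this example, not recommendations, and the names are my own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical targets per severity level: (reaction, workaround, correction).
TARGETS = {
    1: (timedelta(minutes=30), timedelta(hours=4), timedelta(days=2)),
    2: (timedelta(hours=2), timedelta(hours=24), timedelta(days=7)),
}


@dataclass
class Incident:
    severity: int
    reported: datetime    # formal logging of the problem with the supplier
    reacted: datetime     # first logged response from the supplier
    workaround: datetime  # temporary solution in place
    corrected: datetime   # final solution in place


def missed_targets(incident: Incident) -> list[str]:
    """Return which of the agreed times were exceeded for this incident."""
    names = ("reaction", "workaround", "correction")
    moments = (incident.reacted, incident.workaround, incident.corrected)
    limits = TARGETS[incident.severity]
    return [name for name, moment, limit in zip(names, moments, limits)
            if moment - incident.reported > limit]


# Hypothetical severity 1 problem: fast reaction, slow workaround.
problem = Incident(
    severity=1,
    reported=datetime(2024, 3, 4, 9, 0),
    reacted=datetime(2024, 3, 4, 9, 20),
    workaround=datetime(2024, 3, 4, 14, 30),
    corrected=datetime(2024, 3, 5, 10, 0),
)
print(missed_targets(problem))
```

All times are counted from the moment of reporting, which is one more reason to report problems to the supplier immediately.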
Related posts
The SLA Lesson: software bug blues
The SLA Lesson: software bug blues
I have been hugely busy in the past weeks with several projects, so the blogging got stuck... I will try to avoid this in the future. Now back to my latest experience.
Part of every Information Security Management System is the incident management process. It is a process in which the company identifies a problem which is occurring or has occurred, and performs steps to contain it, minimize the impact, identify the root cause and take measures to prevent the incident from recurring.
The incident in question is the dreaded application blocking - a company of 1000 employees uses a custom-made, fully integrated CRM/ERP system, which exhibited complete or partial non-responsiveness lasting several minutes at a time over a period of nearly two hours. The situation was identified in several departments, while the rest of the company was functioning as usual.
As soon as the call came in, the incident response team was formed and the problem was analyzed. After 15 minutes, the problem was identified. Accounting had started a program which should run once a week and affects the billing information of most Key Customers. This program was started at its usual time, with the usual parameters. The problem was rectified by stopping the processing and postponing it until after business hours.
Upon further investigation of the incident it was identified that the problem had occurred before, at regular intervals, but was never reported as an incident. The situation had been handled by the IT department, who reported the problem as a bug to the software company which created the software.
When I requested a status update from IT on this bug report, I received a shocking piece of information: the software company had closed the bug report with a status of DENIED.
So I called the release manager at the software company, and I got an even bigger shock: he explained that the software company decided to deny this bug report due to overwhelming change requests and bug reports from our company. In his words, this bug was a mere nuisance since it blocked part of the software for about an hour once a week - just run it during lunch!
At this point, the incident was no longer just an incident, it became a support contract issue, so I reported the situation to management and recommended an intervention from their side.
This incident is a very good lesson in the different priorities and focus of the parties involved:
For a user of the system any problem can be a show stopper.
For the manufacturer of the system, the same problem can be played down to the importance of an itch. There can be many reasons for such a difference in opinion, but here are a few:
- There are insufficient human resources to address the issue
- There are profitable change requests or projects to address, so this element is merely postponed, since the software company will not see a profit from engaging their resources in correcting this problem.
- The problem is caused by a design flaw in the system, that is either very difficult or impossible to rectify in a reasonable time and within reasonable budget
The only way to increase the value of the users' incident to the manufacturer is through applying proper controls and penalties in the support contract. That is why the history and results of security incidents should also be used as a very valid input into the preparation and negotiation of the SLA.
Labels: Incident Management, information security, information strategy, SLA

