A Service Level Agreement (or SLA) is the minimum level of service agreed on between a company and its customer in their contract. The level of service could include uptime, performance guarantees, customer service requirements, data security, and incident response time requirements. The agreement also spells out what happens if the level of service isn’t met, whether that’s refunds, credits or other penalties.
Why use SLAs?
SLAs are mostly used in enterprise contracts where the service provided is business-critical. For companies like payment platforms or web hosting services, customers are taking a big risk in trusting a third party to maintain availability and performance. While providing historical, transparent data on past performance does help build trust in availability, SLAs are critical in ensuring customer confidence in your service.
SLAs provide a clear definition of expectations. It can be difficult for both parties to agree on what an “acceptable level” of service is, so being precise with contract SLAs helps prevent future misunderstandings.
What you need to include in an SLA
Acceptable Service Levels
It may seem obvious, but you need to state what service level you agree to provide. You’ll also include definitions of what “available” means, what an “outage” is, and what “maintenance” includes.
- Availability - usually stated as a series of nines (e.g. 99.99%), it’s the percentage of time your service is available over a given period. Check out Uptime.is to translate SLA availability percentages into seconds and minutes of allowed downtime.
- Performance - usually stated as a ping response time. How responsive is your service? If service degrades past a certain speed, customers may not be able to use your service properly.
- Customer Support - usually stated in response and resolution times. Customers want to know their questions will be answered quickly and effectively. A customer support SLA keeps customers from signing up for a service and then being hung out to dry when they have questions.
- Security - what lengths do you go to in protecting customer data? If a breach occurs, the SLA can help establish what protections needed to be in place (e.g. two-factor authentication or security clearance for engineers).
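To make the availability figure concrete, here is a minimal Python sketch of the arithmetic behind a series of nines. The function name `allowed_downtime` and the 30-day period are assumptions chosen for illustration, not part of any particular SLA.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `period` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # The share of the period during which the service may be unavailable.
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    # Illustrative targets over an assumed 30-day billing period.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime(target, timedelta(days=30))
        minutes = budget.total_seconds() / 60
        print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

Over a 30-day month, 99.9% works out to roughly 43 minutes of allowed downtime and 99.99% to a little over 4 minutes, which is why each extra nine is a much bigger commitment than it looks on paper.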
When something goes wrong, how quickly will your team acknowledge the issue and resolve it? Magento does a great job of breaking down response time by severity level of the issue. They also provide customers with the exact steps they will take to resolve problems.
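A severity-based breakdown of response times could be sketched as a simple lookup table. The severity names and time limits below are hypothetical placeholders, not figures from any real SLA, and `met_response_sla` is a helper invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResponseTarget:
    acknowledge_within: timedelta
    resolve_within: timedelta


# Hypothetical targets per severity level; real values come from your contract.
TARGETS = {
    "sev1": ResponseTarget(timedelta(minutes=15), timedelta(hours=4)),
    "sev2": ResponseTarget(timedelta(hours=1), timedelta(hours=24)),
    "sev3": ResponseTarget(timedelta(hours=8), timedelta(days=5)),
}


def met_response_sla(
    severity: str, reported: datetime, acknowledged: datetime, resolved: datetime
) -> bool:
    """Check whether an incident was acknowledged and resolved within its targets."""
    target = TARGETS[severity]
    # Both clocks start when the issue is reported.
    return (
        acknowledged - reported <= target.acknowledge_within
        and resolved - reported <= target.resolve_within
    )
```

Measuring both clocks from the time of the report is one possible design choice; some agreements start the resolution clock at acknowledgement instead, so it is worth stating which one you mean.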
What happens if an SLA isn’t met? The contract should also include any penalties or credits as a result of a missed SLA. This can be broken down by level of service or amount of downtime. PagerDuty’s penalty agreement below is an excellent comprehensive example.
If a penalty wasn’t included in the original SLA, the customer may be able to terminate the agreement penalty-free due to breach of contract.
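Credits broken down by amount of downtime are often expressed as tiers. As a rough sketch, assuming made-up tiers and a hypothetical `service_credit` helper (none of these numbers come from a real agreement), the calculation might look like this:

```python
# Hypothetical tiers: (minimum achieved availability in %, credit as % of the monthly fee).
# Ordered from best to worst availability.
CREDIT_TIERS = [
    (99.9, 0),   # SLA met, no credit owed
    (99.0, 10),
    (95.0, 25),
]
# Credit owed when availability falls below every tier (also an assumption).
FLOOR_CREDIT_PERCENT = 50


def service_credit_percent(achieved_availability: float) -> int:
    """Return the credit percentage owed for the achieved availability."""
    for minimum, credit in CREDIT_TIERS:
        if achieved_availability >= minimum:
            return credit
    return FLOOR_CREDIT_PERCENT


def service_credit(monthly_fee: float, achieved_availability: float) -> float:
    """Return the credit amount owed, in the same currency as the monthly fee."""
    return monthly_fee * service_credit_percent(achieved_availability) / 100
```

With these example tiers, a customer paying 1,000 per month who saw 99.5% availability would be owed a credit of 100.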
Exclusions and Exceptions
Make sure to include any exceptions to the SLA, otherwise your customer might come calling for compensation.
If you are using SLAs, you need to be able to measure your performance. If you can’t show your uptime, response time or performance reliably and quickly, how will customers be able to hold you to your guarantees?
Using a monitoring tool like New Relic or Pingdom will not only help prevent long outages by giving you a heads-up, it will also help with the clean-up afterwards. How long was the service inaccessible? How slow was the dashboard? Without reliable tracking tools, you’re left comparing your word against your customer’s. Contractually, that’s not a great position to be in!
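Once outage start and end times are being recorded, turning them into an achieved availability figure is straightforward. The sketch below assumes outages arrive as non-overlapping `(start, end)` pairs; it does not use the API of New Relic, Pingdom or any other monitoring tool, and the function name is made up for the example.

```python
from datetime import datetime


def achieved_availability(
    outages: list[tuple[datetime, datetime]],
    period_start: datetime,
    period_end: datetime,
) -> float:
    """Return availability (in %) over the period, given non-overlapping outages."""
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")

    downtime_seconds = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the reporting period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            downtime_seconds += (clipped_end - clipped_start).total_seconds()

    return 100 * (1 - downtime_seconds / period_seconds)
```

The result can be compared directly against the availability target in the contract to see whether the SLA was met for that period.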
After an Outage
Proactively contacting customers who’ve had SLAs breached will earn you goodwill in a tough situation. Often customers won’t even be as upset as you expect! (Check out our experience with outage communication here).
SLAs are a legal agreement, so you need to uphold them. Entering into an SLA is a potential risk, but perhaps a necessary one.