SLA
Category: Marketing
What is an SLA?
A Service Level Agreement (SLA) is a formal contract between a service provider and a client that defines the level of service the provider is expected to deliver. In web development, an SLA sets the standards for the performance, reliability, and support of web applications and websites.
An SLA serves as a guarantee of service quality and sets clear expectations for both sides.
Main components of an SLA:
- Metrics and indicators - Specific, measurable indicators for performance and reliability
- Timeframes - Defined deadlines for response and problem resolution
- Service level - Specific objectives for performance (Service Level Objectives, SLOs)
- Consequences of non-compliance - Compensation or fee reductions when the agreed levels are not met
- Reporting procedures - How and when reports on SLA performance are provided
Key metrics in a web SLA:
Performance
Response Time
- Objective: < 200ms for API requests
- Objective: < 2s for page loading
- Measurement: Percentiles (p95, p99)
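Percentile objectives such as p95 < 200ms can be checked directly against collected latency samples. The following is a minimal sketch, assuming the nearest-rank definition of a percentile (monitoring tools may interpolate differently); the function names and the sample data are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples provided")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_objective(samples_ms, pct, objective_ms):
    """True if the chosen percentile is strictly below the objective."""
    return percentile(samples_ms, pct) < objective_ms


# Hypothetical API latencies in milliseconds, including one slow outlier.
latencies_ms = [120, 95, 180, 150, 110, 130, 90, 450, 140, 160]

print(percentile(latencies_ms, 50))                    # median
print(percentile(latencies_ms, 95))                    # tail latency
print(meets_latency_objective(latencies_ms, 95, 200))  # checks p95 < 200ms
```

In this sample the median looks healthy while the single outlier pushes p95 over the objective, which is why SLAs tend to specify percentiles rather than averages.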
Throughput
- Objective: X requests/second
- Objective: Y concurrent users
- Measurement: Requests per second (RPS)
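Requests per second can be derived from request timestamps in an access log. The sketch below assumes timestamps are available as seconds (for example Unix time) and groups them into one-second buckets; the function name and sample values are hypothetical.

```python
import math
from collections import Counter


def throughput(timestamps):
    """Return (average RPS, peak RPS) using one-second buckets."""
    if not timestamps:
        raise ValueError("no timestamps provided")
    buckets = Counter(math.floor(t) for t in timestamps)
    # Number of whole-second buckets spanned, including empty ones.
    span_seconds = max(buckets) - min(buckets) + 1
    average_rps = len(timestamps) / span_seconds
    peak_rps = max(buckets.values())
    return average_rps, peak_rps


# Hypothetical request arrival times in seconds.
arrivals = [0.1, 0.5, 0.9, 1.2, 1.8, 3.4]
average_rps, peak_rps = throughput(arrivals)
print(average_rps, peak_rps)
```

Reporting both values is a design choice: an average can hide short bursts that a throughput objective is meant to cover.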
Reliability and availability
Uptime
- Standard: 99.9% (≈ 8.76 hours downtime/year)
- High: 99.99% (≈ 52.6 minutes downtime/year)
- Critical: 99.999% (≈ 5.26 minutes downtime/year)
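The downtime figures above follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal sketch, assuming a 365-day year (8,760 hours) and a hypothetical helper name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Downtime budget in hours for a given availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * period_hours


for level in (99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(level)
    print(f"{level}% -> {hours:.3f} hours ({hours * 60:.2f} minutes) per year")
```

This reproduces the yearly figures listed above (8.76 hours, about 52.6 minutes, and about 5.26 minutes). Passing `period_hours=720` gives the budget for a 30-day month, which is 43.2 minutes at 99.9%.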
Recovery time
- MTTR (Mean Time To Repair): Average time needed to restore the service after a failure
- MTBF (Mean Time Between Failures): Average time the service runs between one failure and the next
- RTO (Recovery Time Objective): Maximum allowable recovery time
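These three measures can be computed from an outage log. The sketch below assumes one common convention (MTTR as total downtime divided by the number of outages, MTBF as total uptime divided by the number of outages); the `Outage` type, function names, and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage, with start and end given as hours since the period began."""
    start_hour: float
    end_hour: float

    @property
    def duration(self) -> float:
        return self.end_hour - self.start_hour


def reliability_summary(outages, period_hours):
    """Compute MTTR, MTBF, and the resulting availability for a period."""
    if not outages:
        raise ValueError("at least one outage is needed to compute MTTR and MTBF")
    downtime = sum(o.duration for o in outages)
    uptime = period_hours - downtime
    mttr = downtime / len(outages)
    mtbf = uptime / len(outages)
    availability = mtbf / (mtbf + mttr)
    return {"mttr_hours": mttr, "mtbf_hours": mtbf, "availability": availability}


def meets_rto(outages, rto_hours):
    """True if every outage was resolved within the recovery time objective."""
    return all(o.duration <= rto_hours for o in outages)


# Hypothetical 30-day month (720 hours) with two outages.
outages = [Outage(100.0, 101.0), Outage(400.0, 400.5)]
summary = reliability_summary(outages, period_hours=720.0)
print(summary)
print(meets_rto(outages, rto_hours=1.0))
```

Note that MTTR is an average while RTO is a per-incident ceiling, so a service can have a good MTTR and still breach its RTO on a single long outage.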
Support and maintenance
Incident response time
- Critical incidents: 15-30 minutes
- High priority: 1-2 hours
- Medium priority: 4-8 hours
- Low priority: 24 hours
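Response targets like these are straightforward to check against ticket timestamps. A minimal sketch, assuming the upper bound of each range above is the contractual limit and that time is counted around the clock rather than in business hours; the dictionary and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Upper bounds of the ranges listed above (an assumption for this sketch).
RESPONSE_TARGETS = {
    "critical": timedelta(minutes=30),
    "high": timedelta(hours=2),
    "medium": timedelta(hours=8),
    "low": timedelta(hours=24),
}


def response_breached(priority, opened_at, first_response_at):
    """True if the first response arrived later than the target allows."""
    elapsed = first_response_at - opened_at
    return elapsed > RESPONSE_TARGETS[priority]


opened = datetime(2024, 1, 1, 9, 0)
responded = datetime(2024, 1, 1, 9, 45)  # 45 minutes after the ticket opened

print(response_breached("critical", opened, responded))
print(response_breached("high", opened, responded))
```

The same 45-minute response breaches the critical target but is well within the high-priority one, so the priority assigned to a ticket matters as much as the clock.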
Resolution time
- P1 (Critical): 2-4 hours
- P2 (High): 8-24 hours
- P3 (Medium): 48 hours
- P4 (Low): 5 working days
Quality of service
Errors and successful requests
- Objective for successful requests: > 99.9%
- Error rate: < 0.1%
- HTTP status codes: Monitoring of 4xx/5xx errors
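Success and error rates can be computed from logged HTTP status codes. The sketch below assumes that 2xx and 3xx responses count as successful and that only 5xx responses count against the error-rate objective, with 4xx tracked separately; an actual SLA should state which classes it counts. Function names and data are hypothetical.

```python
from collections import Counter


def status_summary(status_codes):
    """Share of responses per outcome, based on the status code class."""
    total = len(status_codes)
    if total == 0:
        raise ValueError("no status codes provided")
    classes = Counter(code // 100 for code in status_codes)
    return {
        "success_rate": (classes[2] + classes[3]) / total,
        "client_error_rate": classes[4] / total,
        "server_error_rate": classes[5] / total,
    }


def meets_error_objective(status_codes, max_error_rate=0.001):
    """True if the 5xx rate is strictly below the objective (0.1% by default)."""
    return status_summary(status_codes)["server_error_rate"] < max_error_rate


# Hypothetical sample: 10,000 responses with a few client and server errors.
codes = [200] * 9990 + [404] * 6 + [500] * 4
print(status_summary(codes))
print(meets_error_objective(codes))
```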
Security
- Time to apply patches: X days after publication
- Pen tests: X times annually
- Security audits: X times annually
Types of SLA in web development:
SLA for applications (Application SLA)
Focuses on the performance and functionality of a specific web application
- Page loading time
- API response times
- User transaction success rates
SLA for infrastructure (Infrastructure SLA)
Covers the main infrastructure and hosting services
- Server availability
- Network performance
- Storage availability
SLA for support (Support SLA)
Defines the level of technical support and responsiveness
- Ticket response time
- Problem resolution
- Support outside working hours
SLA for development (Development SLA)
Determines deadlines and quality for new feature development
- Delivery deadlines
- Code quality standards
- Testing requirements
Process of creating SLA:
1. Requirement analysis - Understanding business needs and technical requirements
2. Metric definition - Choosing relevant, measurable indicators
3. Setting realistic objectives - Based on current capabilities and resources
4. Documentation - Creating a clear and comprehensive document
5. Implementation of monitoring - Setting up systems for tracking metrics
6. Regular reviews - Quarterly or annual evaluations and updates
Tools for monitoring SLA:
Performance monitoring
- New Relic - Application performance monitoring
- Datadog - Cloud-scale monitoring
- Dynatrace - AI-powered observability
- Pingdom - Uptime and performance monitoring
Logs and errors
- Sentry - Error tracking and performance monitoring
- LogRocket - Session replay and product analytics
- ELK Stack - Elasticsearch, Logstash, Kibana
- Splunk - Platform for searching, monitoring, and analyzing
Uptime monitoring
- UptimeRobot - Website monitoring
- StatusCake - Uptime monitoring
- Site24x7 - All-in-one monitoring
- Amazon CloudWatch - AWS monitoring
Consequences of non-compliance with SLA:
Financial compensation
- Service credits - Reductions on future bills
- Partial refunds - Return of part of the paid amount
- Penalty fees - Penalties for non-compliance
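Service credits are usually tiered by measured availability. The sketch below uses the tiers from the sample uptime guarantee clause at the end of this entry (10% below 99.9%, 25% below 99%, 50% below 95%); the function names, the 30-day month, and the fee are assumptions for illustration.

```python
def monthly_availability_pct(downtime_minutes, days_in_month=30):
    """Availability percentage for a month with the given downtime."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must fit within the month")
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_pct(availability_pct):
    """Credit as a percentage of the monthly fee, checking the lowest tier first."""
    if availability_pct < 95:
        return 50
    if availability_pct < 99:
        return 25
    if availability_pct < 99.9:
        return 10
    return 0


# Hypothetical month: 90 minutes of downtime on a 500.00 monthly fee.
availability = monthly_availability_pct(downtime_minutes=90)
credit_pct = service_credit_pct(availability)
credit_amount = 500.00 * credit_pct / 100
print(f"{availability:.3f}% available -> {credit_pct}% credit ({credit_amount:.2f})")
```

Checking the lowest tier first matters because the conditions overlap: an availability of 94% is also below 99% and 99.9%, and only the largest applicable credit should be returned.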
Corrective actions
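- Action plans - Documented steps and deadlines for restoring the agreed service level
- Root cause analysis - Investigation of the underlying causes of the problems
- Process improvements - Changes to processes that prevent the problems from recurring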
Contractual consequences
- Contract termination - Right to terminate the contract
- Renegotiation - Reconsideration of the terms
- Legal actions - Court proceedings or arbitration in extreme cases
Good practices for SLA:
- Be realistic - Set achievable objectives
- Use clear language - Avoid technical jargon
- Define exceptions - Planned maintenance, force majeure
- Include all interested parties - Development, operations, business
- Regular reviews - Adapt to changing needs
- Automate reporting - Real-time monitoring and reporting
- Include escalation procedures - Clear steps for problems
Real examples of SLA clauses:
Uptime guarantee
"Service availability: 99.9% monthly
- If the availability is less than 99.9% - 10% credit from the monthly fee
- If the availability is less than 99% - 25% credit from the monthly fee
- If the availability is less than 95% - 50% credit from the monthly fee"
Support response
"Time for initial response:
- P1 (Critical) - 30 minutes
- P2 (High) - 2 hours
- P3 (Medium) - 4 hours
- P4 (Low) - 8 hours"
Performance objectives
"Application performance:
- API response time p95 < 200ms
- Page load time < 2 seconds
- Error rate < 0.1%
- Throughput > 1000 requests/second"