How to determine a server's bottleneck, quickly fix the bottleneck, improve server performance, and prevent regressions.
Overview
This guide shows you how to fix an overloaded server in 4 steps:
- Assess: Determine the server's bottleneck.
- Stabilize: Implement quick fixes to mitigate impact.
- Improve: Augment and optimize server capabilities.
- Monitor: Use automated tools to help prevent future issues.
Assess
When traffic overloads a server, one or more of the following can become a bottleneck: CPU, network, memory, or disk I/O. Identifying which of these is the bottleneck makes it possible to focus efforts on the most impactful mitigations.
- CPU: CPU usage that is consistently over 80% should be investigated and fixed. Server performance often degrades once CPU usage reaches ~80-90%, and becomes more pronounced as usuage gets closer to 100%. The CPU utilization of serving a single request is negligible, but doing this at the scale encountered during traffic spikes can sometimes overwhelm a server. Offloading serving to other infrastructure, reducing expensive operations, and limiting the quantity of requests will reduce CPU utilization.
- Network: During periods of high traffic, the network throughput required to fulfill user requests can exceed capacity. Some sites, depending on the hosting provider, may also hit caps regarding cumulative data transfer. Reducing the size and quantity of data transferred to and from the server will remove this bottleneck.
- Memory: When a system doesn't have enough memory, data has to be offloaded to disk for storage. Disk is considerably slower to access than memory, and this can slow down an entire application. If memory becomes completely exhausted, it can result in Out of Memory (OOM) errors. Adjusting memory allocation, fixing memory leaks, and upgrading memory can remove this bottleneck.
- Disk I/O: The rate at which data can be read or written from disk is constrained by the disk itself. If disk I/O is a bottleneck, increasing the amount of data cached in memory can alleviate this issue (at the cost of increased memory utilization). If this doesn't work, it may be necessary to upgrade your disks.
The techniques in this guide focus on addressing CPU and network bottlenecks. For most sites, CPU and network will be the most relevant bottlenecks during a traffic spike.
Running top on the affected server is a good starting place for investigating bottlenecks. If available, supplement this with historical data from your hosting provider or monitoring tooling.
Stabilize
An overloaded server can quickly lead to cascading failures elsewhere in the system. Thus, it's important to stabilize the server before attempting to make more significant changes.
Rate Limiting
Rate limiting protects infrastructure by limiting the number of incoming requests. This is increasingly important as server performance degrades: as response times increase, users tend to aggressively refresh the page - increasing the server load even further.
Fix
Although rejecting a request is relatively inexpensive, the best way to protect your server is to handle rate limiting somewhere upstream from it - for instance, via a load balancer, reverse proxy, or CDN.
Instructions:
Further reading:
HTTP Caching
Look for ways to more aggressively cache content. If a resource can be served from an HTTP cache (whether it's the browser cache or a CDN), then it doesn't need to be requested from the origin server, which reduces server load.
HTTP headers like Cache-Control, Expires, and ETag indicate how a resource should be cached by an HTTP cache. Auditing and fixing these headers will improve caching.
Although service workers can also be used for caching, they utilize a separate cache and are a supplement, rather than a replacement, for proper HTTP caching. For this reason, when handling an overloaded server, efforts should be focused on optimizing HTTP caching.
Diagnose
Run Lighthouse and look at the Serve static assets with an efficient cache policy audit to view a list of resources with a short to medium time to live (TTL). For each listed resource, consider if the TTL should be increased. As a rough guideline:
- Static resources should be cached with a long TTL (1 year).
- Dynamic resources should be cached with a short TTL (3 hours).
Fix
Set the Cache-Control header's max-age directive to the appropriate number of seconds.
Instructions:
Graceful Degradation
Graceful degradation is the strategy of temporarily reducing functionality in order to shed excess load from a system. This concept can be applied in many different ways: for example, serving a static text page instead of a full-featured application, disabling search or returning fewer search results, or disabling certain expensive or non-essential features. Emphasis should be placed on removing functionalities that can be safely and easily removed with minimal business impact.
Improve
Use a content delivery network (CDN)
Serving static assets can be offloaded from your server to a content delivery network (CDN), thereby reducing the load.
The primary function of a CDN is to deliver content to users quickly by providing a large network of servers that are located close to users. However, most CDNs also offer additional performance-related features like compression, load balancing, and media optimization.
Set up a CDN
CDNs benefit from scale, so operating your own CDN rarely makes sense. A basic CDN configuration is fairly quick to set up (~30 minutes) and consists of updating DNS records to point at the CDN.
Optimize CDN Usage
Diagnose
Identify resources that are not being served from a CDN (but should be) by running WebPageTest. On the results page, click on the square above 'Effective use of CDN' to see the list of resources that should be served from a CDN.