API Performance Testing
Why Performance Test Your API?
Developers want to build on an API that’s fast and reliable. Customers want to know that the API won’t bog down or crash during a traffic spike, exactly when they need it most.
Understanding how your API performs in different traffic situations will save you the stress and high cost of reacting to issues that would otherwise arise in production.
Performance testing makes sense for most types of APIs, including REST, GraphQL, many RPC-style web services like XML-RPC, JSON-RPC, SOAP… and an alphabet soup of other specifications.
Nearly all modern APIs share characteristics that make performance testing them relatively straightforward:
- Stateless protocols. Usually HTTP/S. When we say the protocols are stateless, we mean that each request to the API is self-contained and made independently.
- Request-Response. Most web APIs operate on a request-response model, meaning that the caller or consumer sends a request, and the API answers it.
- Built for programmatic access. API stands for Application Programming Interface, so by definition APIs are meant to be consumed programmatically. That also makes them an ideal fit for testing programmatically.
Overall, APIs are ideal candidates for automated testing. Their performance is relatively easy to measure and quantify.
API Performance is Multidimensional
There are two things about API performance that might seem obvious, but are worth pointing out.
Every API endpoint performs differently. Unless your API only has a single endpoint, callers will get different response times from one endpoint to the next. To understand your API’s performance, you’ll need to compare response times between different endpoints, and even for the same endpoint invoked with different payloads.
APIs perform worse under load than at baseline. Every API is powered by a finite amount of server and infrastructure resources. When many requests come in at once, they compete for those resources, and at some point the demand on the API will exceed its ability to process the requests. When this happens, the API’s performance degrades and everything bogs down, leading to a crash or at least a bad experience for callers.
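Comparing endpoints usually means grouping response-time samples by endpoint (or by endpoint plus payload) and summarizing each group. Here is a minimal Python sketch, assuming you already have (endpoint, milliseconds) pairs from some source; the endpoint labels are hypothetical, and percentiles use the simple nearest-rank method.

```python
import math
import statistics
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted, non-empty list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_by_endpoint(samples):
    """Group (endpoint, milliseconds) samples and summarize each endpoint separately."""
    by_endpoint = defaultdict(list)
    for endpoint, millis in samples:
        by_endpoint[endpoint].append(millis)

    summary = {}
    for endpoint, values in by_endpoint.items():
        values.sort()
        summary[endpoint] = {
            "count": len(values),
            "mean_ms": statistics.fmean(values),
            "p50_ms": percentile(values, 50),
            "p95_ms": percentile(values, 95),
        }
    return summary
```

Keying samples by a label such as "GET /search (common term)" versus "GET /search (rare term)" lets the same function compare one endpoint across payloads.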
Measuring Your API’s Performance Characteristics
Since API performance is multidimensional, we recommend looking at your API through a few different lenses to see the big picture.
Passive API Monitoring
Passive monitoring isn’t really testing per se, but it’s closely related to API performance, so we’ll mention it here.
It means passively observing actual API traffic and measuring the response times and other aspects of each request. Passive monitoring is backwards-looking: your monitoring data can show a clear picture of past and present usage, but it doesn’t help you anticipate how your API might handle increased or different usage in the future.
Passive monitoring can be low-touch (tail -f your server logs) or high-touch (custom instrumentation, observability stacks, or using APM tools like New Relic or Dynatrace).
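On the low-touch end, passive monitoring can be as simple as parsing the access log your server already writes. The sketch below assumes a hypothetical log format of "<timestamp> <METHOD> <path> <status> <N>ms"; real server logs differ, so the regular expression would need to match your own format.

```python
import re
from collections import Counter

# Hypothetical access-log format: "<timestamp> <METHOD> <path> <status> <N>ms"
LOG_LINE = re.compile(
    r"^\S+ (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms$"
)


def parse_log(lines):
    """Yield (method, path, status, millis) for every line matching LOG_LINE; skip the rest."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield (match["method"], match["path"], int(match["status"]), int(match["ms"]))


def error_rate(records):
    """Fraction of parsed requests that returned a 5xx status."""
    status_classes = Counter(status // 100 for _, _, status, _ in records)
    total = sum(status_classes.values())
    return status_classes[5] / total if total else 0.0
```

Because it only reads what already happened, this kind of analysis has the same backwards-looking limitation described above.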
Active API Monitoring
Active monitoring means testing your API by making actual requests on a schedule. Active monitoring is also called “synthetic monitoring” because you generate traffic just to test the API. Like passive monitoring, active monitoring is backwards-looking: your monitors will tell you how your API responded to synthetic requests at various times and alert you to errors or anomalies, but they won’t help you anticipate how the API will behave in the future under heavier load. Active monitoring can be as simple as a cron job making curl requests and logging the output, or you can use Loadster’s Site & API Monitoring capability to run monitoring scripts on a schedule and alert you to any problems.
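As a rough Python stand-in for “a cron job making curl requests,” the function below makes one synthetic request and reports the status, latency, and a healthy/unhealthy verdict. It is a minimal sketch using only the standard library; the 1000 ms threshold is an arbitrary assumption, and any URL you pass in is your own.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0, max_millis=1000.0):
    """Make one synthetic GET request and report status, latency, and health."""
    started = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the status is still useful.
        status = err.code
    except OSError as err:
        # Connection refused, DNS failure, timeout, etc.: no HTTP response at all.
        return {"url": url, "status": None, "millis": None, "healthy": False, "error": str(err)}

    millis = (time.perf_counter() - started) * 1000
    healthy = 200 <= status < 300 and millis <= max_millis
    return {"url": url, "status": status, "millis": millis, "healthy": healthy, "error": None}
```

A scheduler (cron, a systemd timer, or a monitoring service) would call this every minute or so and alert when "healthy" is false.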
API Load Testing
Load testing also means generating synthetic requests to test your API, but instead of sending a single request at a time, a load test hits your API with many requests at once. Your API’s performance under heavy load might be far worse than it was under minimal or baseline traffic! An API that runs fine normally might crash when there’s a traffic spike. Crashes during traffic spikes are especially bad because they happen at the worst possible time, when the most callers are expecting a response!
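The difference between monitoring and load testing is concurrency. The Python sketch below fires requests from many threads at once and collects (status, milliseconds) results. It is a toy, assuming a single machine and the standard library only; dedicated load testing tools distribute bots across machines and regions, which one thread pool cannot do. Only point it at APIs you are allowed to load test.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=10.0):
    """Send one GET and return (status or None, elapsed milliseconds)."""
    started = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    except OSError:
        status = None  # refused, reset, or timed out: no HTTP response
    return status, (time.perf_counter() - started) * 1000


def run_load(url, concurrency, requests_per_worker):
    """Hit `url` from `concurrency` threads at once; return every (status, millis) result."""

    def worker(_):
        return [timed_get(url) for _ in range(requests_per_worker)]

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        batches = pool.map(worker, range(concurrency))
        return [result for batch in batches for result in batch]
```

Running it once with a concurrency of 1 and again with a much higher concurrency, then comparing the response times, shows in miniature how performance changes as load increases.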
How Load Testing Helps You Prepare for Traffic Spikes
Load testing means testing a system with many concurrent requests, to see how the load from all these requests impacts the system. Load testing is most concerned with how API performance changes or degrades as the load increases.
The intensity of load and the duration of load can both impact your API’s performance. You may find that your API handles moderate traffic spikes just fine, but a larger traffic spike exhausts backend resources and crashes the system. Your API might recover gracefully from a short traffic spike, only to collapse if the heavy traffic is sustained.
Different types of load tests are meant to test your API with traffic spikes of various intensities and durations.
Baseline Load Testing
You might want to start with a load test simulating some reasonable baseline amount of traffic. A baseline load test should generate only a modest amount of load and doesn’t need to run very long.
Let’s say in real life your production API receives about 20 requests per second: a baseline load test would simulate this typical amount of load as realistically as possible, so you have a baseline “blue sky” test result to compare with more aggressive tests later.
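To turn a target like 20 requests per second into a number of bots, one common back-of-the-envelope tool is Little’s Law: concurrency N = throughput X × (response time R + think time Z). The sketch below assumes each bot loops through request, response, and an optional pause; the 250 ms response time and 1 second think time in the example are hypothetical figures, not measurements.

```python
import math


def bots_for_throughput(target_rps, avg_response_s, think_time_s=0.0):
    """Estimate concurrent bots via Little's Law: N = X * (R + Z), rounded up.

    Assumes each bot repeatedly sends a request, waits for the response (R),
    then pauses for think time (Z) before its next request.
    """
    if target_rps <= 0 or avg_response_s < 0 or think_time_s < 0:
        raise ValueError("target_rps must be positive and times non-negative")
    return math.ceil(target_rps * (avg_response_s + think_time_s))
```

For example, bots_for_throughput(20, 0.25, 1.0) gives 25 bots. If the API slows down under load, the same number of bots will produce fewer requests per second, which is worth keeping in mind when comparing runs.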
Spike Testing
A spike test simulates a traffic spike, or a short burst of heavy traffic, that is significantly higher than the amount of traffic your API normally experiences. The spike portion of a spike test is short but intense.
Traffic spikes could be brief (a minute or so) or longer (several hours). When you run a spike test, you’re trying to find out how much worse the API performs during the traffic spike than at baseline.
Ideally you’ll have some performance requirements in mind so you can tell if your API passed or failed the spike test. For example, you could say that during a traffic spike the API must maintain average response times less than 500ms, and that the 95th percentile response time not exceed 2.0 seconds.
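Requirements like these are easy to check automatically. The Python sketch below encodes the example thresholds from the text (average under 500 ms, 95th percentile not exceeding 2.0 seconds); it assumes response times are collected in seconds and uses a nearest-rank percentile, which is one of several percentile definitions tools use.

```python
import math
import statistics


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def meets_spike_requirements(response_times_s, max_avg_s=0.5, max_p95_s=2.0):
    """Return (passed, average, p95) for the example spike-test requirements."""
    values = sorted(response_times_s)
    if not values:
        raise ValueError("no samples")
    avg = statistics.fmean(values)
    p95 = nearest_rank(values, 95)
    return avg < max_avg_s and p95 <= max_p95_s, avg, p95
```

Checking the average and the 95th percentile together matters: a few very slow responses can hide behind a healthy average.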
Stress Testing
A stress test also means hitting the API with excessive traffic, with the goal of actually breaking it. Stress tests have maximum load intensity over a fairly short duration.
The primary goal with a stress test is to determine what happens when the API is pushed past its limit. Does it just get progressively slower for all callers? Does it rate limit them with an HTTP 429 Too Many Requests response (or a nonstandard code such as 420) and tell them to try again later? Does it gracefully prioritize certain critical operations over other less critical ones? Or does it block all incoming requests to the point of 100% failure? Worst of all, does it lose data or commit half-finished transactions that leave the system or your customers’ data in a permanently broken state?
Stress testing lets you find these failure modes before they cause disasters in production. You should seriously consider it, particularly for read-write APIs.
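When reviewing a stress test, it helps to bucket results by how each request failed, so you can tell “slower but working” apart from “rate limited” or “falling over.” A minimal Python sketch, assuming you have one HTTP status per request (or None when no response arrived); the bucket names are illustrative. Note that status codes alone cannot reveal lost data or half-finished transactions, which need separate integrity checks against your data store.

```python
from collections import Counter


def classify_outcomes(statuses):
    """Bucket stress-test results by failure mode.

    `statuses` holds one HTTP status code per request, or None when no
    response arrived at all (connection refused, reset, or timed out).
    """
    buckets = Counter()
    for status in statuses:
        if status is None:
            buckets["no_response"] += 1
        elif status == 429:
            buckets["rate_limited"] += 1
        elif 200 <= status < 400:
            buckets["ok"] += 1
        elif 500 <= status < 600:
            buckets["server_error"] += 1
        else:
            buckets["other"] += 1
    return dict(buckets)
```

A rising share of "rate_limited" as load increases suggests deliberate backpressure; a rising share of "server_error" or "no_response" suggests the API is breaking.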
Soak or Stability Testing
A soak test, or stability test, focuses less on the intensity of excess load on your API, and more on the duration of the load. Soak tests typically generate moderate traffic over a long duration.
Some APIs have gradual resource leaks that get worse over time. For example, your backend might leak memory by holding onto global references to objects after the requests have already been handled, or failed transactions might leave database connections open until they eventually run out. Sneaky problems like this might go unnoticed for a long time and be hard to reproduce, but a soak/stability test can detect them. The likelihood of such problems depends on your API’s architecture and technology stack, but soak testing is never a bad idea.
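One simple way to spot a slow leak in soak test data is to fit a straight line to a metric over time and look at its slope. The sketch below assumes you sample something like server memory (MB) or average response time at intervals during a steady soak test; a persistently positive slope is a hint worth investigating, not proof of a leak.

```python
def trend_per_hour(samples):
    """Least-squares slope of (elapsed_hours, value) samples: value change per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

For example, memory readings of 500, 520, 540, and 560 MB at hours 0 through 3 give a slope of 20 MB per hour under constant load, which would be suspicious for a system that should be in a steady state.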
Performance Tuning & Optimization
A repeatable load test can be very helpful when you’re tuning your API’s configuration and environment for better scalability. You’ll want to test and re-test the API with the same load after each experimental change. A repeatable performance test will tell you whether the change caused performance to get better, stay the same, or get worse. It’s usually best to change just one thing at a time, running the same load test before and after for comparison.
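The before/after comparison can be made explicit. This Python sketch compares the 95th percentile response time of two runs of the same load test and treats small differences as noise; the 5% tolerance is an arbitrary assumption, and in practice you would pick it based on how much your results vary between identical runs.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def compare_runs(before_ms, after_ms, tolerance=0.05):
    """Return ("better" | "same" | "worse", relative change in p95) for two runs."""
    old, new = p95(before_ms), p95(after_ms)
    change = (new - old) / old
    if change < -tolerance:
        verdict = "better"
    elif change > tolerance:
        verdict = "worse"
    else:
        verdict = "same"
    return verdict, change
```

Changing one thing at a time keeps the verdict meaningful: if two settings change at once, a "same" result could hide one improvement and one regression.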
Planning Your API Load Testing
A load test is a simulation of real world traffic patterns, but the outcome of the simulation is only as valid as the assumptions going into it. If your test doesn’t represent reality, neither will the results. Planning realistic tests is important.
Consider questions like…
- Which API endpoints and functionality are in scope for load testing? Start with the endpoints that are most critical for your business, most likely to have performance problems, and most often invoked. In an ideal world, the distribution of endpoints you load test will mirror the distribution of real-life requests coming in from your API consumers.
- What test payloads should be sent to the endpoints? Calling the same endpoint with a different payload might produce a very different result! For instance, invoking a search endpoint with a common term will return a bigger response than searching for an uncommon term, and it might be slower (because there’s more data to retrieve) or it might be faster (because the results were cached on the backend). Testing your API’s performance with identical payloads is a common pitfall you’ll want to avoid, so when building your load test, try to use varying payloads.
- How fast does the API need to perform, and under how much load? You and your team should agree on performance requirements that dictate how fast the API needs to respond, possibly with separate targets for critical endpoints that are expected to be faster or slower. Additionally, you should have scalability requirements for how many simultaneous requests or concurrent consumers the API needs to handle at peak, while still delivering acceptable performance.
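The first two questions can be answered together in a test script: choose endpoints in proportion to real traffic, and vary the payload on each call. Here is a minimal Python sketch; the endpoint names, weights, search terms, and SKU range are all hypothetical placeholders that you would derive from your own production logs.

```python
import random

# Hypothetical traffic mix, e.g. derived from production access logs.
ENDPOINT_WEIGHTS = {
    "GET /products": 60,
    "GET /search": 30,
    "POST /orders": 10,
}

# Hypothetical search terms mixing common and rare terms, so the test exercises
# both large and small result sets (and cached vs. uncached backend paths).
SEARCH_TERMS = ["shoes", "shirt", "blue", "waterproof hiking boots", "zx-9000"]


def next_request(rng):
    """Pick an endpoint in proportion to its real-world share, with a varied payload."""
    endpoint = rng.choices(list(ENDPOINT_WEIGHTS), weights=list(ENDPOINT_WEIGHTS.values()))[0]
    if endpoint == "GET /search":
        return endpoint, {"q": rng.choice(SEARCH_TERMS)}
    if endpoint == "POST /orders":
        return endpoint, {"sku": f"SKU-{rng.randint(1, 500)}", "quantity": rng.randint(1, 3)}
    return endpoint, {"page": rng.randint(1, 20)}
```

Passing in a seeded random.Random makes a run repeatable, which matters when you compare results before and after a change.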
Document and share these considerations with your team when planning your load tests. That way, everyone shares the same assumptions, and you can collectively agree whether your API meets the performance requirements.
The API Load Testing Process
Most of this guide is focused on ideas and methodology, but we’ll share some examples for Loadster. If you use a different tool, you can probably apply many of the same ideas.
Creating API Test Scripts
Loadster’s Protocol Bots are ideal for testing APIs because they automate requests at the HTTP layer (as opposed to Browser Bots which use headless browsers to test websites and web applications). Protocol Bots run protocol scripts.
Each step in a protocol script represents an HTTP request. The step includes the HTTP method and URL, and you can optionally add authentication, custom headers, and a POST or PUT request body or payload.
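To make the anatomy of a step concrete outside of any particular tool, here is a Python sketch that models a step (method, URL, optional basic authentication, custom headers, and a JSON body) and turns it into a standard-library request object. This is a hypothetical, simplified model for illustration, not Loadster’s script format.

```python
import base64
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One HTTP request in a protocol-style script (simplified, hypothetical model)."""

    method: str
    url: str
    headers: dict = field(default_factory=dict)
    basic_auth: Optional[tuple] = None  # (username, password)
    json_body: Optional[dict] = None

    def to_request(self):
        """Build a urllib Request; sending it is left to the caller."""
        headers = dict(self.headers)
        data = None
        if self.json_body is not None:
            data = json.dumps(self.json_body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        if self.basic_auth is not None:
            token = base64.b64encode(":".join(self.basic_auth).encode("utf-8")).decode("ascii")
            headers["Authorization"] = f"Basic {token}"
        return urllib.request.Request(self.url, data=data, headers=headers, method=self.method)
```

Calling urllib.request.urlopen on the result would actually send the request.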
A script can hit a single endpoint, or it can call multiple endpoints one after another. When invoking an API, it’s common for the consumer to parse the output of one response and then use it in the payload of a follow-up request, often chaining together many related requests in sequence. A test script can do this too, so that it mimics your real API consumers as closely as possible.
When the script runs, each step fires off a request to your API, gets back a response, and reports how long it took.
Your script can validate the response and capture or extract values from it to use later on.
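Validation, capture, and chaining look roughly like this in code. The sketch below is hypothetical throughout: the /orders endpoints, the "id" and "status" fields, and the expected status codes are assumptions, and `send` is an injected function returning (status, body_text) so that any HTTP client, or a stub, can be plugged in.

```python
import json


class ValidationError(Exception):
    """Raised when a response fails a validation rule."""


def validate_and_capture(status, body_text, expected_status, capture_key):
    """Check the status code, parse the JSON body, and capture one top-level field."""
    if status != expected_status:
        raise ValidationError(f"expected HTTP {expected_status}, got {status}")
    payload = json.loads(body_text)
    if capture_key not in payload:
        raise ValidationError(f"response has no {capture_key!r} field")
    return payload[capture_key]


def create_then_fetch_order(send, base_url):
    """Two chained steps: create an order, then fetch it using the captured id."""
    body = json.dumps({"sku": "A1", "quantity": 1})
    status, text = send("POST", f"{base_url}/orders", body)
    order_id = validate_and_capture(status, text, 201, "id")

    status, text = send("GET", f"{base_url}/orders/{order_id}", None)
    return validate_and_capture(status, text, 200, "status")
```

Failing fast with an exception makes a broken response show up as an error in the test results, instead of silently feeding a bad value into the next request.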
There are several ways to build scripts in Loadster. You can add steps manually, import them from your API’s OpenAPI/Swagger specification (if you have one), or record them in your browser with a browser extension (if your API is the backend of a web application).
Read up on Protocol Scripts in the Loadster manual to see how it’s done.
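To see why an OpenAPI/Swagger specification is a useful starting point, note that its "paths" object already lists every endpoint and HTTP method. The Python sketch below extracts (METHOD, URL) pairs from an OpenAPI document that has already been loaded into a dict. It only illustrates the idea and is not how Loadster’s importer works; path templates like {id} are left unresolved and still need real values and payloads.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def steps_from_openapi(spec, base_url):
    """List (METHOD, URL) pairs for every operation in an OpenAPI document (as a dict)."""
    steps = []
    for path, path_item in spec.get("paths", {}).items():
        for key in path_item:
            # Path items can also contain non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                steps.append((key.upper(), base_url.rstrip("/") + path))
    return steps
```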
For API load testing, you’ll need at least one working test script that hits one or more of your API endpoints.
Designing API Load Test Scenarios
In a load test, many bots run your scripts in parallel, simulating lots of API consumers hitting your API at once. You can load test with different levels of traffic simply by changing the number of bots.
The bots can originate from the same geographical region, or different regions. Distant geographical regions will probably introduce extra latency. If your API consumers are geographically distributed, spreading the bots across various regions is a good idea.
Loadster’s bots arrive in groups. The number of active bots in each group increases gradually, holds steady, and then decreases, according to a pre-set pattern.
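A ramp-up, hold, ramp-down pattern can be described as a simple function of time. The Python sketch below uses linear ramps for simplicity; Loadster’s actual ramp curves (such as an aggressive ramp-up or a bell-curve drop-off) may differ, and the durations and bot counts in the example are hypothetical.

```python
def bots_at(t, peak, ramp_up_s, hold_s, ramp_down_s):
    """Active bots at time t (seconds) for one group with linear ramp-up, hold, and ramp-down."""
    if t < 0:
        return 0
    if t < ramp_up_s:
        return round(peak * t / ramp_up_s)
    t -= ramp_up_s
    if t < hold_s:
        return peak
    t -= hold_s
    if t < ramp_down_s:
        return round(peak * (1 - t / ramp_down_s))
    return 0
```

Multiple groups simply add up. For example, a second group of 50 bots starting two minutes after the first would be bots_at(t, 100, 60, 300, 60) + bots_at(t - 120, 50, 60, 300, 60).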
Here’s an example of a load test with two groups of bots, ramping up with an aggressive pattern, maintaining steady load for a while, and then ramping down with a natural bell-curve drop off pattern.