Load testing & golden signals

How BugBrain's load testing maps results to the Four Golden Signals — latency, traffic, errors, and saturation — alongside SLA thresholds (p50/p95/p99 latency, error rate) and the degradation knee where performance falls off under load.

A load test answers a different question from a functional test: not "does it work?" but "does it still work under pressure, and where does it stop?" BugBrain fires one saved API request many times concurrently, then organizes the results around a model the industry already trusts, the Four Golden Signals, so the report is consistent and easy to reason about regardless of which endpoint you tested.

The Four Golden Signals

The Four Golden Signals are a standard way to describe the health of a service under load. BugBrain maps every load-test result onto them:

  • Latency — how long requests take to complete. Reported as percentiles so you see the typical case and the slow tail, not just an average.
  • Traffic — how much demand is hitting the endpoint, measured as throughput (requests per second).
  • Errors — the rate of requests that failed or returned an error response. A system can be fast and still be quietly failing.
  • Saturation — how full the system is getting: the signs that it's approaching its capacity, such as latency climbing and errors appearing as load rises.

Looking at all four together tells a complete story. Latency alone can look fine while errors spike; traffic alone says nothing about whether the service kept up. The four signals are the lens BugBrain's analysis uses to find the real bottleneck.
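To make the "read all four together" point concrete, here is a minimal sketch that summarizes two load steps. It is not BugBrain's implementation: the function names, the `(latency_ms, ok)` sample format, and the use of median-latency growth as a saturation proxy are all assumptions made for illustration.

```python
from statistics import median

def summarize_step(samples, duration_s):
    """Summarize one load step (hypothetical helper, not a BugBrain API).

    samples: list of (latency_ms, ok) tuples, one per request.
    duration_s: how long the step ran, in seconds.
    """
    total = len(samples)
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "latency_p50_ms": median(lat for lat, _ in samples),  # latency
        "traffic_rps": total / duration_s,                     # traffic
        "error_rate": errors / total,                          # errors
    }

def saturation_hint(baseline, current):
    """Crude saturation proxy (an assumption): growth in median latency
    relative to a light-load baseline step."""
    return current["latency_p50_ms"] / baseline["latency_p50_ms"]

# Light load: 100 requests in 10 s, all successful at 40 ms.
light = summarize_step([(40, True)] * 100, duration_s=10)

# Heavy load: 500 requests in 10 s, 50 of them failing.
heavy = summarize_step([(45, True)] * 450 + [(50, False)] * 50, duration_s=10)

hint = saturation_hint(light, heavy)
```

In this made-up data the median latency barely moves between the two steps (40 ms to 45 ms, a hint of 1.125) while the error rate jumps from 0 to 10%, which is the situation where reading latency alone would be misleading.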

Latency percentiles and SLA thresholds

Averages lie about performance, because a few very slow requests get washed out by many fast ones. BugBrain reports latency percentiles instead:

  • p50 — the median. Half of requests completed faster than this.
  • p95 — 95% of requests completed faster than this; the remaining 5% were slower.
  • p99 — 99% completed faster; this exposes the slow tail that hurts your least-lucky users.
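A short worked example shows why the percentiles tell a different story from the mean. The sketch below uses the nearest-rank definition of a percentile, which is only one of several common definitions; the method BugBrain uses is not stated here, and the `percentile` helper and sample data are hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# 100 made-up latencies in ms: mostly fast, with a small slow tail.
latencies_ms = [20] * 90 + [400] * 8 + [3000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 110.0
p50 = percentile(latencies_ms, 50)               # 20
p95 = percentile(latencies_ms, 95)               # 400
p99 = percentile(latencies_ms, 99)               # 3000
```

The mean of 110 ms describes no request that actually occurred: the typical request took 20 ms, while the slowest 2% took 3 seconds. The three percentiles together expose both facts.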

On top of the numbers, you can define SLA thresholds: targets the run must meet, such as "p95 latency under 300 ms" or "error rate under 1%." BugBrain evaluates the run against each threshold and reports a pass or fail, turning a wall of charts into an objective verdict you can gate on.
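Threshold evaluation of this kind can be sketched in a few lines. The metric names, the dictionary layout, and the `evaluate_sla` function below are assumptions for illustration, not BugBrain's configuration format or API; the limits are the two examples quoted above.

```python
def evaluate_sla(metrics, thresholds):
    """Return {metric_name: passed} for each threshold.

    A metric passes when its measured value is strictly below the limit,
    matching the "under 300 ms" / "under 1%" phrasing.
    """
    return {name: metrics[name] < limit for name, limit in thresholds.items()}

# Hypothetical measured values from one run.
metrics = {"p95_ms": 280.0, "error_rate": 0.02}

# Targets: p95 latency under 300 ms, error rate under 1%.
thresholds = {"p95_ms": 300.0, "error_rate": 0.01}

verdicts = evaluate_sla(metrics, thresholds)
run_passed = all(verdicts.values())
```

Here the latency threshold passes and the error-rate threshold fails, so the run as a whole fails. Keeping a verdict per threshold, rather than a single boolean, shows which target was missed.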

The degradation knee

As you push more traffic at an endpoint, performance doesn't degrade evenly — it tends to hold steady and then fall off a cliff. BugBrain identifies that degradation knee: the point where latency and errors start climbing sharply for each additional unit of load.

The knee is the practical answer to "how much can this take?" Below it, the endpoint absorbs load gracefully; above it, each extra request makes things disproportionately worse. Knowing where the knee sits tells you your real headroom and where to focus optimization before it bites in production.
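One simple way to locate a knee from stepped load results is a threshold heuristic: flag the first load level where p95 latency exceeds a multiple of its light-load baseline, or where the error rate crosses a limit. This is an illustrative heuristic under assumed parameters (`latency_factor`, `max_error_rate`) and an assumed input format, not the algorithm BugBrain uses.

```python
def find_knee(steps, latency_factor=2.0, max_error_rate=0.01):
    """Bracket the degradation knee.

    steps: list of (load, p95_ms, error_rate), sorted by increasing load.
    Returns (last_healthy_load, first_degraded_load), or None if no step
    is degraded. last_healthy_load is None if the first step is degraded.
    """
    if not steps:
        return None
    baseline_p95 = steps[0][1]  # p95 at the lightest load
    for i, (load, p95_ms, error_rate) in enumerate(steps):
        degraded = (p95_ms > latency_factor * baseline_p95
                    or error_rate > max_error_rate)
        if degraded:
            last_healthy = steps[i - 1][0] if i > 0 else None
            return last_healthy, load
    return None

# Made-up results: (concurrent users, p95 latency in ms, error rate).
steps = [
    (10, 50, 0.0),
    (20, 52, 0.0),
    (40, 55, 0.0),
    (80, 70, 0.001),
    (160, 240, 0.04),
]

knee = find_knee(steps)  # (80, 160)
```

With this data the endpoint holds steady up to 80 concurrent users and degrades sharply by 160, so the knee lies somewhere between the two. Running more steps inside that interval would narrow it down.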

Read the signals together, then find the knee

Start with the four signals to see what is degrading (slow? erroring? saturated?), then look at the knee to see when it starts. The two together turn raw load numbers into an action: scale up, optimize, or set a safe traffic ceiling.

Frequently asked questions

What are the Four Golden Signals?

Latency, traffic, errors, and saturation. They're a widely used way to summarize the health of a service under load, and BugBrain maps every load-test result onto them so the report reads the same way every time.

What do p50, p95, and p99 mean?

They're latency percentiles. p50 is the median (half of requests were faster), p95 means 95% of requests were faster than this value, and p99 means 99% were. The high percentiles reveal the slow tail that an average hides.

What is the degradation knee?

It's the point where performance falls off as load rises — where latency and errors start climbing sharply for each extra unit of traffic. It tells you roughly how much load your endpoint can take before it stops behaving well.

How do SLA thresholds work?

You set targets — for example p95 latency under 300 ms and error rate under 1%. BugBrain evaluates the run against them and reports pass or fail per threshold, so you get an objective verdict, not just charts.