Mozilla proxy server refusing Connection
If your application is important to people, then it’s worth spending a moment thinking about disaster scenarios. These are the good kind of disasters where your project becomes the apple of social media’s eye and you go from ten thousand users a day to a million. With a bit of preparation you can build a service that can persevere during traffic bursts that exceed capacity by orders of magnitude. If you forgo this preparation, then your service will become completely unusable at precisely the wrong time – when everyone is watching.
Your Server Under Load
To illustrate how an application with no consideration for bursts behaves, I built an application server with an HTTP API that consumes 5ms of processor time per request, spread over five asynchronous function calls. By design, a single instance of this server is capable of handling 200 requests per second (1000ms / 5ms per request).
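As a rough illustration of that setup, here is a minimal Python sketch of such a handler. It is not the test server used for the measurements in this post; the names `burn_cpu`, `handle_request`, and `max_requests_per_second` are hypothetical, and the only figures taken from the text are the 5ms of processor time and the five asynchronous steps.

```python
import asyncio
import time

CPU_MS_PER_REQUEST = 5   # from the text: 5ms of processor time per request
STEPS_PER_REQUEST = 5    # from the text: spread over five asynchronous calls


def max_requests_per_second(cpu_ms_per_request: float) -> float:
    """Upper bound on throughput for one single-threaded server instance."""
    return 1000.0 / cpu_ms_per_request


def burn_cpu(ms: float) -> None:
    """Busy-wait to simulate CPU-bound work (hypothetical stand-in for real work)."""
    deadline = time.perf_counter() + ms / 1000.0
    while time.perf_counter() < deadline:
        pass


async def handle_request() -> None:
    """Simulated handler: five steps of about 1ms each, yielding between steps."""
    for _ in range(STEPS_PER_REQUEST):
        burn_cpu(CPU_MS_PER_REQUEST / STEPS_PER_REQUEST)
        await asyncio.sleep(0)  # give other pending requests a turn


if __name__ == "__main__":
    print("capacity (req/s):", max_requests_per_second(CPU_MS_PER_REQUEST))
    asyncio.run(handle_request())
```

Because every request needs 5ms of a single processor, no amount of interleaving raises the ceiling above 200 requests per second; extra requests can only wait.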
This roughly approximates a typical request handler that perhaps does some logging, interacts with the database, renders a template, and streams out the result. What follows is a graph of server latency and TCP errors as we linearly increase connection attempts:
Analysis of the data from this run tells a clear story:
This server is not responsive: At 6x maximum capacity (1200 requests/second) the server is hobbled with 40 seconds of average request latency.
These failures suck: With over 80% TCP failures and high latency, users will see a confusing failure after up to a minute of waiting.
Next, I instrumented the same application with the code from the beginning of this post. This code causes the server to detect when load exceeds capacity and preemptively refuse requests. The following graph depicts the performance of this version of the server as we linearly increase connection attempts:
One thing not depicted on the graph is the volume of 503 (server too busy) responses returned during this run, which increases steadily in proportion to connection attempts. So what do we learn from this graph and the underlying data?
Preemptive limiting adds robustness: Under load that exceeds capacity by an order of magnitude, the application continues to behave reasonably.
Success and failure are fast: Average response time stays under 10 seconds for the most part.
These failures don’t suck: With preemptive limiting we effectively convert slow, clumsy failures (TCP timeouts) into fast, deliberate failures (immediate 503 responses).
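To make the idea of preemptive limiting concrete, here is a minimal Python sketch of one common way to detect overload: measure how late a periodic timer fires on the event loop and refuse work while that lag is high. This is an assumed technique for illustration, not necessarily the exact code this post refers to; `LagMonitor`, `handle`, the 50ms interval, the 70ms threshold, and the smoothing factor are all hypothetical choices.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


class LagMonitor:
    """Tracks how late a periodic timer fires; lateness suggests a busy event loop."""

    def __init__(self, interval_s: float = 0.05, max_lag_s: float = 0.07) -> None:
        self.interval_s = interval_s  # how often to sample (assumed value)
        self.max_lag_s = max_lag_s    # lag above this means "too busy" (assumed value)
        self.lag_s = 0.0              # smoothed lag estimate

    def record(self, observed_lag_s: float) -> None:
        # Weighted average so a single slow tick does not flip the state on its own.
        self.lag_s = (self.lag_s * 2 + max(0.0, observed_lag_s)) / 3

    def too_busy(self) -> bool:
        return self.lag_s > self.max_lag_s

    async def run(self) -> None:
        """Sample forever: sleep for the interval, then record how late we woke up."""
        while True:
            start = time.perf_counter()
            await asyncio.sleep(self.interval_s)
            self.record(time.perf_counter() - start - self.interval_s)


async def handle(monitor: LagMonitor,
                 work: Callable[[], Awaitable[str]]) -> Tuple[int, str]:
    """Refuse immediately with a 503 when overloaded, otherwise do the real work."""
    if monitor.too_busy():
        return 503, "Server is too busy, please try again shortly"
    return 200, await work()


async def main() -> None:
    monitor = LagMonitor()
    sampler = asyncio.create_task(monitor.run())

    async def work() -> str:
        return "ok"

    time.sleep(0.3)           # simulate a burst that blocks the event loop
    await asyncio.sleep(0.1)  # let the sampler observe the delay
    print(monitor.lag_s, await handle(monitor, work))
    sampler.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

The check itself costs almost nothing per request, which is what lets a refused request return quickly instead of joining an ever-growing queue.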
To be clear, building a server that returns HTTP 503 responses (“server is too busy”) requires that your interface render a reasonable message to the user. Typically this is a pretty simple task, and should be familiar as it’s done by many popular sites.
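On the client side, handling that response can be as small as the following Python sketch, which uses the standard library's `urllib`. The function name `fetch_or_explain`, the `opener` parameter (there only so the function can be exercised without a network), and the wording of the message are assumptions for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical wording; use whatever fits your interface.
BUSY_MESSAGE = "We're getting a lot of traffic right now. Please try again in a moment."


def fetch_or_explain(url: str, opener=urllib.request.urlopen) -> str:
    """Return the response body, or a friendly message if the server answers 503."""
    try:
        with opener(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 503:
            # A deliberate "too busy" refusal: tell the user something useful.
            return BUSY_MESSAGE
        raise  # other errors are not ours to explain here
```

Anything other than a 503 is re-raised, so genuine bugs and outages are not disguised as a busy server.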
How To Use It