In my earlier post about concurrency control I mentioned that, because response times grow exponentially with increasing concurrency, it is often better to reject the extra requests than to let them negatively affect your system. Rejecting these requests is an option, but it is not the most desirable one. Request queueing offers a much more powerful solution, as explained below.
Request Queueing Example
For simplicity's sake, let's assume the following baseline response time characteristics for a system with no concurrency controls or request queueing:
- at a concurrency of 500 or below, the average response time is 1 second.
- at a concurrency of 600, the average response time is 2 seconds for all requests.
- at a concurrency of 700, the average response time is 4 seconds for all requests.
The exponential degradation starts beyond a concurrency of 500. With the above numbers in mind, let's assume you have set up your concurrency controls optimally, such that 500 concurrent requests can get through and the next 1000 requests (requests 501-1500) are made to wait. That is, the maximum queue size is set to 1000.
Now let's say you get a sudden traffic spike of 1501 requests. Here is what will happen:
- The first 500 requests get through and are served in 1 second on average. The next 1000 requests (requests 501-1500) are queued. The 1501st request is rejected.
- As requests 1-500 finish, requests 501-1000 are sent forward. So requests 501-1000 are served in 2 seconds total (1 second of execution time and 1 second of queue time).
- As requests 501-1000 finish, requests 1001-1500 are forwarded on. So requests 1001-1500 are served in 3 seconds total (1 second of execution time and 2 seconds of queue time).
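The arithmetic in the walkthrough above can be sketched in a few lines of Python. This is just an illustration of the example's numbers (the `response_time` function and its constants are hypothetical, taken from the scenario described here):

```python
# Response time seen by the Nth request in a burst, given a concurrency
# limit of 500, a queue of up to 1000 waiting requests, and a 1-second
# service time at or below the limit. Numbers come from the example above.

CONCURRENCY_LIMIT = 500   # requests served simultaneously
MAX_QUEUE_SIZE = 1000     # requests allowed to wait their turn
SERVICE_TIME = 1.0        # seconds per request at or below the limit

def response_time(request_number):
    """Total time (queue wait + execution) for the Nth request in a burst.

    Returns None for requests rejected because the queue is full.
    """
    if request_number > CONCURRENCY_LIMIT + MAX_QUEUE_SIZE:
        return None  # e.g. the 1501st request in the example is rejected
    # Requests are served in FIFO batches of CONCURRENCY_LIMIT each.
    batch = (request_number - 1) // CONCURRENCY_LIMIT  # 0 for requests 1-500
    return (batch + 1) * SERVICE_TIME  # batches waited, plus one execution

print(response_time(1))     # 1.0 — first batch, no queueing
print(response_time(750))   # 2.0 — second batch: 1s wait + 1s execution
print(response_time(1200))  # 3.0 — third batch: 2s wait + 1s execution
print(response_time(1501))  # None — rejected, the queue is full
```

Note how the response time grows by one service time per batch, which is exactly the linear degradation discussed below.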
Characteristics of Systems With Request Queueing
- Request queueing allows your system to operate at optimal throughput. In the above example, optimal throughput was reached at a concurrency of 500. The concurrency at which optimal throughput is achieved is usually just below the point where exponential degradation starts. At all times, the system was operating at this optimal throughput.
- Your users experience only linear degradation rather than exponential degradation. As shown in the diagram, with no request queueing your users and your system would have experienced exponential degradation beyond 500 concurrent requests. With request queueing, requests 1-500 take 1 second, requests 501-1000 take 2 seconds, requests 1001-1500 take 3 seconds, and so on: the response times become linear.
- Your system experiences NO degradation. This is worth repeating: the system is always operating at optimal throughput. The only attribute that is dynamic is the queue size. The system remains in the green zone highlighted in the diagram.
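To make the mechanism concrete, here is a minimal sketch in Python of concurrency control combined with a bounded queue. The `RequestQueue` class is hypothetical and illustrative only; a production implementation would add queue timeouts, strict FIFO fairness, and metrics:

```python
import threading

class RequestQueue:
    """A minimal sketch of concurrency control plus a bounded request queue.

    Up to `concurrency` requests run at once, up to `max_queue` more wait
    their turn, and anything beyond that is rejected immediately.
    """

    def __init__(self, concurrency, max_queue):
        self._slots = threading.Semaphore(concurrency)  # execution slots
        self._lock = threading.Lock()                   # guards _waiting
        self._waiting = 0                               # queued requests
        self._max_queue = max_queue

    def submit(self, handler):
        # Fast path: take an execution slot immediately if one is free.
        if self._slots.acquire(blocking=False):
            try:
                return handler()
            finally:
                self._slots.release()
        # No free slot: wait in the queue if there is room, else reject.
        with self._lock:
            if self._waiting >= self._max_queue:
                return None  # rejected, like the 1501st request above
            self._waiting += 1
        self._slots.acquire()  # blocks until an execution slot frees up
        with self._lock:
            self._waiting -= 1
        try:
            return handler()
        finally:
            self._slots.release()

queue = RequestQueue(concurrency=500, max_queue=1000)
queue.submit(lambda: "handle one request here")
```

In a real service, `submit` would wrap each incoming request handler; the same idea is more commonly enforced one layer out, at the load balancer.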
If your system can exhibit the above three characteristics even during times of unusually high load, you are in good shape.
HAProxy is one solution that can do both concurrency control and request queueing, and it is one of the solutions worth considering for high availability.
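As a rough illustration of how this maps onto HAProxy, a backend server line can cap concurrency with `maxconn` and bound the queue with `maxqueue`, while `timeout queue` bounds how long a request may wait. The backend name, address, and timeout below are hypothetical; the numbers mirror the example above:

```
backend app
    timeout queue 30s    # give up on requests that wait too long
    # maxconn 500   -> at most 500 concurrent requests reach the server;
    # maxqueue 1000 -> at most 1000 requests wait in HAProxy's queue,
    #                  beyond which requests are redispatched or refused
    server app1 10.0.0.1:8080 maxconn 500 maxqueue 1000
```

Consult the HAProxy documentation for the exact rejection and redispatch behavior once the queue is full, as it depends on the rest of the configuration.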