How many times have we all seen performance tests pass with flying colors on the test systems, only for the same software to perform poorly once it is deployed in production?
A lot of time is then wasted verifying (and re-verifying) the validity of the test, checking for hardware and network differences, and comparing hundreds of parameters between the test systems and production in the hope of finding a meaningful difference.
By far the biggest reason I have seen for this discrepancy is not a faulty test but a stress test executed on a wildly different data set than the one in production. In many cases the production and test data sets differ by orders of magnitude in size and richness.
A quick example to make the point:
- Let's say that there are 1 million users in production, of which 1,000 are using the system at any one time.
- Let's say that the new system requirements (after some scalability refactoring) are to support 2,000 concurrent users.
- This is typically simulated by creating 2,000 users and then scripting appropriate actions for all 2,000 of them simultaneously. Typically all is great, everyone high-fives each other, and the release is scheduled.
Technically the test is correct: it simulates 2,000 concurrent users. However, the data that the actions are performed on is almost 3 orders of magnitude larger in production (1 million users versus 2,000).
It does not take much for an unoptimized SQL query or a full directory scan on a NAS to cause the slowdown in production. Work that is cheap against 2,000 rows or files can become very expensive against 1 million.
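To make the unoptimized-query case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `email` column, the index name, and the helper functions are all hypothetical and chosen only for illustration. The script runs the same lookup against 2,000 and 1 million rows, with and without an index, and prints SQLite's query plan next to the timing.

```python
import sqlite3
import time

# Hypothetical lookup: find a user by email.
LOOKUP = "SELECT id FROM users WHERE email = ?"


def build_users(n):
    """Create an in-memory database holding n synthetic users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        ((f"user{i}@example.com",) for i in range(n)),
    )
    conn.commit()
    return conn


def query_plan(conn):
    """Return SQLite's plan for the lookup (the detail text is the 4th column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("x",)).fetchall()
    return " | ".join(row[3] for row in rows)


def time_lookups(conn, n, repeats=20):
    """Time repeated lookups by email; without an index each one reads every row."""
    email = f"user{n - 1}@example.com"
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(LOOKUP, (email,)).fetchall()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Test-sized data set versus production-sized data set from the example.
    for n in (2_000, 1_000_000):
        conn = build_users(n)
        print(f"{n:>9} rows, no index: {time_lookups(conn, n):.4f}s  plan: {query_plan(conn)}")
        conn.execute("CREATE INDEX idx_users_email ON users (email)")
        print(f"{n:>9} rows, indexed : {time_lookups(conn, n):.4f}s  plan: {query_plan(conn)}")
        conn.close()
```

Exact timings depend on the machine, so none are quoted here. The thing to watch is the plan: without the index SQLite reports a table scan, and the cost of that scan grows with the number of rows, which is easy to miss when the test data set holds only 2,000 users.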
Ensure that the data set you run your stress tests on is representative of the data set in your production system. This is a great way to gain confidence in your data storage layer, and it also properly tests how your software interacts with that layer.
One simple way to run meaningful performance tests is to take a snapshot of your production data (minus any personal or private information, of course) and execute the stress test on that data set. I prefer to go one step further and make sure that the data set in the quality system is 2x or 3x the size of production.
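As a rough sketch of that idea, the Python snippet below scrubs private fields from a snapshot and then replicates the records to reach a 2x or 3x data set. Everything here is an assumption made for illustration: the record layout, the `PRIVATE_FIELDS` names, the salt, and the `scrub` and `scale` helpers. Note that a salted hash only pseudonymizes values; whether that is sufficient depends on your own privacy requirements.

```python
import hashlib

# Hypothetical names of the columns that hold personal data.
PRIVATE_FIELDS = ("name", "email")


def scrub(record, salt="test-env"):
    """Return a copy of the record with private fields replaced by salted hashes."""
    clean = dict(record)
    for field in PRIVATE_FIELDS:
        if field in clean:
            raw = (salt + str(clean[field])).encode("utf-8")
            clean[field] = f"{field}_{hashlib.sha256(raw).hexdigest()[:12]}"
    return clean


def scale(records, factor):
    """Replicate records `factor` times, keeping ids unique.

    Assumes every record has a positive integer "id".
    """
    offset = max(r["id"] for r in records)
    scaled = []
    for copy in range(factor):
        for r in records:
            clone = dict(r)
            clone["id"] = r["id"] + copy * offset
            if copy:
                # Keep the scrubbed values distinct across copies.
                for field in PRIVATE_FIELDS:
                    if field in clone:
                        clone[field] = f"{clone[field]}_copy{copy}"
            scaled.append(clone)
    return scaled


if __name__ == "__main__":
    snapshot = [
        {"id": 1, "name": "Ann Lee", "email": "ann@corp.example", "plan": "pro"},
        {"id": 2, "name": "Bo Chen", "email": "bo@corp.example", "plan": "free"},
    ]
    test_data = scale([scrub(r) for r in snapshot], factor=3)
    print(len(test_data), "records in the scaled test data set")
```

Plain replication keeps the value distribution of the original snapshot, which is usually what you want for a stress test, but it will not reproduce relationships between records (for example, foreign keys) unless those are remapped as well.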