As a performance guy, my world is full of benchmarks – both good and bad. I wanted to share my perspective on them, as they often seem to make people crazy.
Let's start with defining a benchmark.
A benchmark is a quantitative result of running a test in order to assess the relative performance of a specific configuration. In addition, a full disclosure of the configuration needs to be available. Together, a score and its disclosure become a baseline.
Why do we use them?
Once a score is assigned to a test, it becomes easy to generate comparisons. This is both useful and dangerous.
Sometimes these scores can be used for direct comparisons if the test is standardized and audited (ex: TPC or VMmark). This often allows you to compare against large pools of results. Scores can also be used to show progress as changes or optimizations are made to a baseline (ex: re-sizing a SQL buffer pool).
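To make the "progress against a baseline" idea concrete, here is a minimal sketch in Python. The function name and the transactions-per-second numbers are made up for illustration; they are not from any real test run.

```python
def relative_change(baseline: float, score: float) -> float:
    """Return the percent change of a score relative to its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline * 100.0


# Hypothetical scores: throughput before and after a tuning change
# on the SAME configuration and test.
baseline_tps = 1200.0
tuned_tps = 1380.0

print(f"{relative_change(baseline_tps, tuned_tps):+.1f}% vs. baseline")
```

The number only means something if both scores come from the same test and the same disclosed configuration, with only the one change in between.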
What are their challenges?
The biggest challenge I see is not understanding what a score represents and then making unfair comparisons. Unless the configuration and test are prescriptive and audited, the comparison is relative rather than direct. You can still use the score, but only in the context of its configuration.
What drives people crazy?
Sometimes people don’t agree with the configuration or the test. Lots of people argue about whether the results represent real life or not. But the reality is that each score documents a valid result for its configuration, whether one agrees with that configuration or not. Even when a benchmark is seen as drag racing, it is still a valid measurement. It’s up to you to use it appropriately.
What benchmark should I use?
That depends. The best benchmark is your own application and configuration. This is as real-life to you as it gets. Select an application (or business) KPI that you need to meet, then measure whether your specific application and configuration can achieve it.
Example: See if your application meets indicators like how fast your user screen refreshes, or whether an important report is generated in a timely fashion. Those are more useful benchmarks than loading a database instance with sqlio to see what happens.
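As a rough sketch of what measuring against a KPI can look like, the Python below times an operation several times and compares the median to a target. Everything here is an assumption for illustration: `meets_kpi`, `generate_report`, and the 2-second target are hypothetical stand-ins for your own application and your own KPI.

```python
import statistics
import time
from typing import Callable, Tuple


def meets_kpi(operation: Callable[[], object],
              target_seconds: float,
              runs: int = 5) -> Tuple[bool, float]:
    """Time `operation` over several runs; compare the median to the target."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    median = statistics.median(durations)
    return median <= target_seconds, median


def generate_report() -> int:
    # Placeholder workload; swap in the real report or screen refresh.
    return sum(i * i for i in range(100_000))


ok, median = meets_kpi(generate_report, target_seconds=2.0)
print(f"median {median:.3f}s, KPI {'met' if ok else 'missed'}")
```

I used the median of a few runs so that one unusually slow or fast run does not decide the outcome. Record the configuration alongside the result, or the number is not a benchmark by the definition above.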
As a VCDX I’m a strong proponent of holistic design. As a performance guy I’m a strong proponent of designing and executing a benchmark to meet your specific needs and then using it responsibly. Benchmarking is a very detailed science that requires time, patience, and careful observation and analysis.
Benchmarks are a powerful tool for baselines, measurement, troubleshooting, comparisons and yes – even marketing 😉