As a performance guy, my world is full of benchmarks – both good and bad.  I wanted to share my perspective on them, as they often seem to make people crazy.

Let's start by defining the term benchmark.

A benchmark is a quantitative result of running a test in order to assess the relative performance of a specific configuration.  In addition, a full disclosure of the configuration needs to be available.  A score together with its disclosure becomes a baseline.

Why do we use them?

Once a score is assigned to a test, it becomes easy to generate comparisons.  This is both useful and dangerous.

Sometimes these scores can be used for direct comparisons if the test is standardized and audited (ex: TPC or VMmark).  This often allows you to compare against large pools of results.  Scores can also be used to show progress as changes or optimizations are made to a baseline (ex: re-sizing a SQL buffer pool).
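Comparing a new score against a baseline is simple arithmetic, but it's worth being explicit about it.  Here is a minimal sketch, with hypothetical throughput numbers standing in for the before/after scores of a tuning change like re-sizing a SQL buffer pool:

```python
def relative_change(baseline: float, current: float) -> float:
    """Percent change of a benchmark score relative to a baseline."""
    return (current - baseline) / baseline * 100.0

# Hypothetical scores (transactions per second) before and after tuning.
baseline_tps = 1200.0
tuned_tps = 1500.0

print(f"{relative_change(baseline_tps, tuned_tps):+.1f}% vs baseline")  # prints "+25.0% vs baseline"
```

The key discipline is that the baseline and the new run share the same test and the same disclosed configuration, with only the change under study varied.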

What are their challenges?

The biggest challenge I see is people not understanding what a score represents and making unfair comparisons.  Unless the configuration and test are prescriptive and audited, the comparison is relative rather than direct.  So you can still use a score, but only in the context of its configuration.

What drives people crazy?

Sometimes people don’t agree with the configuration or test.  Lots of people argue about whether the results represent real life or not.  But the reality is, each score documents a valid result for its configuration, whether one agrees with that configuration or not.  Even a test dismissed as drag racing is still a valid measurement.  It’s up to you to use it appropriately.

What benchmark should I use?

That depends.  The best benchmark is your own application and configuration.  This is as real-life to you as it gets.  Selecting an application (or business) KPI that you need to meet, and then measuring whether your specific application and configuration can achieve it, is the best benchmark.

Example: See if your application meets indicators like how fast your user screen refreshes, or whether an important report is generated in a timely fashion.  Those are useful benchmarks, versus loading a database instance with sqlio just to see what happens.
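The KPI check above can be sketched in a few lines: time the operation you care about and compare the elapsed time against the target.  This is a minimal sketch; the workload function and the two-second KPI are hypothetical stand-ins for your own report or screen refresh:

```python
import time

def meets_kpi(operation, kpi_seconds: float):
    """Run an operation and check its elapsed wall-clock time against a KPI threshold."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    return elapsed <= kpi_seconds, elapsed

# Hypothetical workload standing in for "generate an important report".
def generate_report():
    time.sleep(0.05)  # simulate the report taking some time

ok, elapsed = meets_kpi(generate_report, kpi_seconds=2.0)
print(f"report generated in {elapsed:.3f}s; KPI met: {ok}")
```

In practice you would run this repeatedly under representative load and look at the distribution of elapsed times, not a single sample – but the principle is the same: the benchmark is your KPI, measured on your configuration.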

As a VCDX I’m a strong proponent of holistic design.  As a performance guy I’m a strong proponent of designing and executing a benchmark to meet your specific needs, and then using the results responsibly.  Benchmarking is a very detailed science that requires time, patience, and careful observation and analysis.

Benchmarks are a powerful tool for baselines, measurement, troubleshooting, comparisons and yes – even marketing 😉

About the Author

Mark Achtemichuk

Mark Achtemichuk currently works as a Staff Engineer within VMware’s R&D Operations and Central Services Performance team, focusing on education, benchmarking, collaterals and performance architectures.  He has also held various performance focused field, specialist and technical marketing positions within VMware over the last 7 years.  Mark is recognized as an industry expert and holds a VMware Certified Design Expert (VCDX#50) certification, one of less than 250 worldwide. He has worked on engagements with Fortune 50 companies, served as technical editor for many books and publications and is a sought after speaker at numerous industry events.  Mark is a blogger and has been recognized as a VMware vExpert from 2013 to 2016.  He is active on Twitter at @vmMarkA where he shares his knowledge of performance with the virtualization community. His experience and expertise from infrastructure to application helps customers ensure that performance is no longer a barrier, perceived or real, to virtualizing and operating an organization's software defined assets.