I spoke last week at ISC Cloud 2012 in Mannheim, Germany, about the performance of HPC applications in the cloud, citing results from several studies. I have summarized the talk below, and my PDF slide deck is available here. For full details of ISC Cloud 2012, I recommend the coverage at HPC in the Cloud.
I gave the talk to share specific data about HPC performance in virtual environments, but I began by describing the new capabilities that virtualizing an HPC environment can provide. I felt this was necessary because HPC discussions often treat virtualization only in a negative light: how much performance is it going to cost me? That's a legitimate question, but a proper cost-benefit analysis also requires understanding what additional capabilities might be gained in return.
Those additional capabilities include:
I focused on performance in the second half of the talk, making the point that single-process HPC applications across a range of vertical markets (Life Sciences, Digital Content Creation, Electronic Design Automation) generally show slowdowns of about 0-5% relative to bare-metal performance. For specific Life Sciences data, I referenced the paper Pragmatics of Virtual Machines for High-Performance Computing: A Quantitative Study of Basic Overheads by Cam MacDonell and Paul Lu of the University of Alberta. Here are their results for HMMer, an important biosequencing code:
And for GROMACS, a molecular dynamics code:
I then discussed distributed applications, starting with Hadoop, a workload we have seen run faster in some cases when virtualized. We previously published a technical white paper about this, which can be found here.
Turning to more challenging cases, I presented results from Intel experiments in 2009 that measured HPCC and STAR-CD performance using InfiniBand in passthrough (VM DirectPath I/O) mode. HPCC generally ran well, with two exceptions. We have explored and understand the first: MPIRandomAccess should be run with large pages to reduce TLB miss rates. The second, NaturallyOrderedRingBandwidth, has not yet been investigated. The HPCC results are shown below for two-, four-, and eight-node configurations.
STAR-CD, a computational fluid dynamics code, ran with an overhead of about 15% in the tested configuration. The results for an eight-node case are shown below. As mentioned in the slides, STAR-CD is moderately latency sensitive because of the number of small messages it exchanges. Applications that are less latency sensitive will see less overhead, and those that are more latency sensitive will see more.
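For readers who want to reproduce this kind of comparison, overhead figures like the ones quoted in this post are typically computed as the extra run time of the virtualized run relative to the bare-metal run. The snippet below is a minimal sketch of that arithmetic. The function name and the run times in the example are hypothetical placeholders, not measurements from any of the studies cited here.

```python
def relative_overhead(native_seconds: float, virtual_seconds: float) -> float:
    """Return virtualization overhead as a fraction of the bare-metal run time.

    0.15 means the virtualized run took 15% longer than bare metal;
    a negative value means the virtualized run was faster.
    """
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (virtual_seconds - native_seconds) / native_seconds


if __name__ == "__main__":
    # Hypothetical timings chosen only to illustrate the calculation.
    native = 100.0
    virtual = 115.0
    print(f"overhead: {relative_overhead(native, virtual):.1%}")
```

When comparing your own runs, it is worth averaging several runs of each configuration first, since run-to-run noise can be a sizable share of a 0-5% difference.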
Intel's tests were run on DDR InfiniBand using ESX4, both of which are dated technologies today. To partially address this, I also shared our results from QDR InfiniBand experiments in 2011 that showed we could achieve ping-pong latencies under two microseconds with passthrough mode using vSphere 5.1. These results were recently reported in a research note, available here. We are also working with Intel and Mellanox to run additional MPI tests with newer hardware and software and expect to report those results soon.
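For anyone unfamiliar with the term, a ping-pong test bounces a small message back and forth between two endpoints many times and reports half the average round-trip time as the one-way latency. The sketch below illustrates only that measurement method. It is an assumption-laden stand-in that uses a local socket pair and a thread in place of MPI or InfiniBand, so the numbers it prints are not comparable to the results in the research note. All function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(conn: socket.socket, iterations: int, size: int) -> None:
    """The "pong" side: send every received message straight back."""
    for _ in range(iterations):
        conn.sendall(recv_exact(conn, size))


def pingpong_latency_us(iterations: int = 1000, size: int = 8) -> float:
    """Return the estimated one-way latency in microseconds.

    Uses a local socket pair (an assumption for illustration), not a
    network interconnect.
    """
    ping, pong = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(pong, iterations, size))
    peer.start()
    payload = b"x" * size
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            ping.sendall(payload)      # ping
            recv_exact(ping, size)     # wait for the pong
        elapsed = time.perf_counter() - start
    finally:
        peer.join()
        ping.close()
        pong.close()
    # One iteration is a full round trip, so halve it for one-way latency.
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print(f"one-way latency: {pingpong_latency_us():.2f} us")
```

Real interconnect benchmarks follow the same pattern but run the two sides on separate nodes and usually discard a few warm-up iterations before timing.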
My message to ISC Cloud attendees was that today's cloud is not tomorrow's cloud: As virtualized performance continues to advance and as cloud providers see a business value in deploying high-bandwidth, low-latency interconnects, the number of applications that will run well in a cloud environment will continue to expand. In the meantime, many single-process applications run very well in virtualized environments, MPI overheads may be acceptable for some applications, and virtualization can offer new capabilities not available in traditional, bare-metal HPC environments.