Designing an HPC system for a virtual environment requires understanding how its architecture compares to that of a traditional bare-metal HPC environment, as well as defining the workload parameters required for a successful virtual HPC deployment.
In a traditional HPC environment, the entire solution is made up of multiple physical servers. Each physical server is dedicated to a specific role and is managed independently. Compute nodes are controlled by additional servers running cluster management software and are connected via high-speed interconnects.
Figure 1: Traditional Bare Metal HPC Cluster
When virtualizing a traditional HPC environment design, a significant benefit for someone familiar with HPC is the ability to maintain and recreate the same familiar environment simply by using VMware software to create resource clusters (groupings of servers) and virtual machines (software-based servers). This shortens the time to get up and running and allows existing procedures to be maintained. Many of the HPC components can be virtualized as shown below.
Those more familiar with virtualization will view virtualized HPC as a dedicated application environment, built using their existing virtualization skills. This is similar to how other applications are deployed, but it requires a deeper awareness of the components and workflows required for the HPC solution.
Figure 2: Virtualized HPC Cluster
This design is targeted for the following types of workloads:
Parallel distributed applications consist of multiple simultaneously running processes that need to communicate with each other, often with extremely high frequency, making their performance sensitive to interconnect latency and bandwidth. Parallel distributed applications are also called Message Passing Interface (MPI) applications because they are enabled by MPI parallel programming on distributed-memory systems. MPI is a standard for multiprocessor programming of HPC codes. Typical parallel distributed applications include weather forecasting, molecular modeling, and the design of jet engines, spaceships, airplanes and automobiles, where each program runs in parallel on a distributed-memory system and its processes communicate through MPI.
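To make the message-passing model concrete, here is a minimal sketch that mimics it with Python's standard library only. It is not MPI and calls no MPI library: threads stand in for processes (ranks), and a per-rank queue stands in for the interconnect. The function name `tree_reduce_sum` and the pairwise tree schedule are assumptions made for illustration; real MPI implementations choose their own algorithms and match messages by source and tag.

```python
import queue
import threading


def tree_reduce_sum(values):
    """Sum one value per simulated rank onto rank 0 with pairwise messages."""
    size = len(values)
    if size == 0:
        raise ValueError("need at least one rank")
    # One mailbox per rank; put() plays the role of a send, get() of a receive.
    inbox = [queue.Queue() for _ in range(size)]
    result = {}

    def rank_main(rank):
        total = values[rank]
        step = 1
        while step < size:
            if rank % (2 * step) == 0:
                # This rank stays in the tree and receives if a partner exists.
                if rank + step < size:
                    total += inbox[rank].get()  # blocks until a message arrives
            else:
                # This rank hands its partial sum to its partner and drops out.
                inbox[rank - step].put(total)
                return
            step *= 2
        result[rank] = total  # only rank 0 gets this far

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Every round involves a blocking exchange between ranks, which is why applications of this kind are sensitive to interconnect latency: with five ranks, `tree_reduce_sum([1, 2, 3, 4, 5])` needs three rounds of messages before rank 0 holds the total.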
Many scientific applications that run in a distributed way use an MPI library and take advantage of communication primitives such as point-to-point operations (e.g., MPI_Send and MPI_Recv) and collective operations (e.g., MPI_Bcast and MPI_Reduce). MPI libraries are designed to use the best available pathways between communicating endpoints, including shared memory, Ethernet, and, for high performance, RDMA-capable interconnects.
Data-intensive workloads process large amounts of data and require storage technologies that can keep up, typically a parallel file system with high-performance I/O. A parallel file system is designed to handle large datasets efficiently: it distributes file data across multiple hosts and provides concurrent access for applications that implement parallel I/O with MPI.
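As a rough illustration of how file data can be distributed across hosts, the sketch below maps byte offsets to hosts using simple round-robin striping. The function names, the 1 MiB stripe size and the four-host layout are assumptions chosen for the example; actual parallel file systems define their own layouts and tunables.

```python
MIB = 1024 * 1024


def locate_stripe(offset, stripe_size, num_hosts):
    """Return (host index, offset within that host's data) for a file offset."""
    if offset < 0 or stripe_size <= 0 or num_hosts <= 0:
        raise ValueError("offset must be >= 0; stripe_size and num_hosts > 0")
    stripe_index = offset // stripe_size       # which stripe of the file
    host = stripe_index % num_hosts            # stripes rotate over the hosts
    local_stripe = stripe_index // num_hosts   # stripes already on that host
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return host, local_offset


def bytes_per_host(file_size, stripe_size, num_hosts):
    """Count how many bytes of a file of file_size land on each host."""
    counts = [0] * num_hosts
    position = 0
    while position < file_size:
        host, _ = locate_stripe(position, stripe_size, num_hosts)
        chunk = min(stripe_size, file_size - position)
        counts[host] += chunk
        position += chunk
    return counts
```

Under these assumptions, a 10 MiB file striped in 1 MiB units over four hosts places 3 MiB on each of the first two hosts and 2 MiB on each of the other two, so several clients can read different stripes from different hosts at the same time.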
Throughput workloads require a large number of individual jobs to be run in order to complete a task, with each job running independently and no communication between the jobs. Typical throughput workloads include Monte Carlo simulations in financial risk analysis, digital movie rendering, electronic design automation and genomics analysis, where each program either runs for a long time or is executed hundreds, thousands or even millions of times with varying inputs.
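The shape of a throughput workload can be sketched as a set of independent jobs that differ only in their input. The example below is a toy under stated assumptions: `estimate_pi` and `run_jobs` are hypothetical names, the Monte Carlo estimate of pi stands in for a real simulation, and a local thread pool stands in for the batch scheduler that would normally dispatch jobs to compute nodes or virtual machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, samples=20_000):
    """One independent job: a Monte Carlo estimate of pi from its own seed."""
    rng = random.Random(seed)  # private generator, so jobs share no state
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_jobs(seeds, max_workers=4):
    """Run one job per seed; jobs never communicate with each other."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Results come back in input order, whatever order the jobs finish in.
        return list(pool.map(estimate_pi, seeds))
```

Because no job depends on another, the work scales by adding workers, and results are only combined once all jobs are done, for example by averaging the values returned from `run_jobs(range(8))`.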
Where traditional High-Performance Compute environments are built with physical servers and physical high speed devices, a virtualized High-Performance Compute environment utilizes VMware solutions and technologies.
- VMware vSphere (ESXi) is an enterprise-class, type 1 hypervisor. The underlying physical assets are still present, but much of the complexity is reduced by standardizing the physical resources, which the hypervisor encapsulates so that compute, storage and networking are presented as spanning all physical resources and subsystems within a cluster. This highly optimized layer forms the foundation of the virtual HPC infrastructure and allows HPC clusters, and the virtual machines on which HPC workloads are scheduled and executed, to be defined in software.
- VMware vCenter Server Appliance (VCSA) provides centralized management of all virtualized infrastructure and delivers a single management interface to interact with, manage and monitor the virtualization configuration, settings and services.
- VMware NSX provides software-defined networking (SDN), in which network functions such as switching, routing, firewalling and load balancing are attached to your virtual High-Performance Compute applications, creating optimized networking and security policies distributed throughout the environment.
- VMware vRealize Operations delivers tools to optimize the operation of the physical, virtual and application environment with intelligent alerting, policy-based automation and unified management.
- VMware vRealize Automation accelerates the deployment and management of applications and compute services, empowering IT to quickly standardize the deployment of automated virtual High-Performance Compute platforms that can be requested or delivered on demand as needed.
- VMware Horizon delivers the ability to connect and operate your virtual High-Performance Compute environment securely from a remote location without any data or communications leaving the virtualized datacenter.
These VMware technologies and solutions are designed to be modular, so that only the desired functions and features need to be used. Further information can be found at http://www.vmware.com
In part 2 we will look at the design and show sample virtual reference architectures for HPC workloads.