Hyperconverged Infrastructure vSAN

VMware Virtual SAN and Scale IO: Fundamentally Two Different Approaches to Software-Defined Storage

There’ s a lot of buzz and excitement around Software-Defined Storage (SDS) and hyper-converged storage solutions.  Particularly around VMware’s recently introduced product: VMware Virtual SAN.


VMware Virtual SAN is seeing tremendous traction in the market since its release. After only two full quarters of availability, we already have many hundreds of customers happily running a variety of applications on Virtual SAN, from VDI to test & development  to production applications and databases.  Virtual SAN customers love the product’s simplicity and integration with the VMware stack.


But along with increased awareness and traction, we are also seeing an increasing level of confusion in the market on the key differences between Virtual SAN and other SDS products.  Particularly, we have been receiving a great deal of questions from our customers and partners about differences between Virtual SAN and EMC ScaleIO. They are asking us about where Virtual SAN should be used, where ScaleIO should be used, and whether there’s any real difference between the two.


This type of confusion is unfortunate because VMware Virtual SAN and ScaleIO follow two fundamentally different approaches to SDS.

  • VMware Virtual SAN is designed specifically around tight integration with vSphere – with the objective of providing super-simple management and very high levels of performance for vSphere VMs.  Virtual SAN is always deployed in a hyper-converged configuration, where storage is converged with the vSphere compute nodes. Virtual SAN is targeted at the generalist IT professional, not just storage experts.
  • ScaleIO has a different design point – to provide highly scalable server-based storage for heterogeneous platforms, including multiple hypervisors and physical servers. ScaleIO has its own installation, configuration and management workflows which are typically driven by expert storage administrators.


The confusion between VMware Virtual SAN and ScaleIO is partially fueled by recent press articles, which claim full integration of ScaleIO into vSphere’s ESX kernel.  This claim is not accurate. There are no plans to port the core ScaleIO product in the ESX kernel or integrate it with the rest of the vSphere stack.


More specifically, ScaleIO consist of two  components: a) a block storage server that is the core of the ScaleIO product and which serves block storage to its clients through the ScaleIO protocol; b) a client which connects to the server and allows VMs and applications to access storage on ScaleIO clusters. This model is very similar to an iSCSI target server serving data to iSCSI initiators. EMC has written an ESX kernel driver that implements a ScaleIO client module. It ‘talks’ the ScaleIO protocol and accesses the ScaleIO server(s). It exposes storage to VMs running in vSphere in a way similar to iSCSI volumes. This ScaleIO driver has been written using  public kernel APIs that are available to any VMware partner who develops kernel drivers in ESX. The ScaleIO server is not being ported in or integrated with vSphere and the ESX kernel. The ScaleIO server runs on Linux servers, either on bare metal or as a virtual appliance.


This architectural model allows ScaleIO to be a great SDS solution for heterogeneous platforms.


In the case of bare metal deployments VM I/O goes through the in-kernel driver and onto the external ScaleIO cluster over an IP network as it is the case with other storage arrays. In the virtual appliance case, a VM I/O operation traverses the ESX storage stack through the virtual appliance.


In contrast, VSAN and all its components are natively integrated with vSphere. The key functional components of VSAN, including its “server” functionality, run in the ESX kernel. This fundamental difference in architecture allows VSAN to be optimized for vSphere VMs in an unparalleled way.  VSAN is also integrated directly with the ESX control plane, vCenter  and vSphere APIs to provide a simple and effective management experience.  Together, these integrations provide important benefits to vSphere customers:

  • Performance and Overhead: The full kernel integration gives VMware Virtual SAN higher levels of performance and efficiency because Virtual SAN can more efficiently utilize the available memory and CPU cycles. Hence, Virtual SAN’s memory footprint and CPU cycles consumed per operation are the lowest in the market. Furthermore, compute and storage operations are executing inside the same layer of software, minimizing communication latencies.  This efficiency translates to  more compelling performance and total-cost-of-ownership  for the end user[V3] . By contrast, no other hyper-converged solution has its “server” logic integrated in the vSphere kernel, limiting the gains and efficiencies that can be achieved by these solutions.
  • Management integration: Virtual SAN is designed to be managed through vCenter, by any administrator who is familiar with vSphere.  The setup, configuration and ongoing management of the product are simple and  fully integrated with vSphere management workflows.  As a result, there are no separate management consoles and solutions. The required storage properties of each VM and virtual disk are expressed in the form of policies.  Effectively, storage becomes a quality of every VM, not a separate function.
  • Programmatic APIs: The functionality of Virtual SAN’s control plane is exposed through new or extensions to existing vSphere APIs. These are stable APIs with s wide range of language bindings that VMware customers have been using for years to automate their operational processes.
  • vSphere Features: In addition, since Virtual SAN is embedded within the hypervisor, all vSphere features such as DRS, vMotion, SVMotion, High Availability, vSphere Replication, and others are seamlessly supported with VSAN.


VMware Virtual SAN’s architectural model allows it to be the best storage solution for hyper-converged vSphere environments and for vSphere VMs. It does not address non-vSphere storage needs today.


So what does it mean in terms of where each solution should be used? In practice things are never black or white as we would like them to be, but at a high level there are some key aspects that we can keep in mind when comparing:

  • Use VMware Virtual SAN if you value deep integration with vSphere, both on the data path and control plane. Virtual SAN is deployed in a hyper-converged model, where storage is converged with compute on the same x86 hosts and storage scales in alignment with vSphere clusters (up to 32 nodes per cluster today, soon to become 64). We believe the Virtual SAN approach delivers highly differentiated, unique advantages for customers of all sizes looking for an SDS solution for vSphere.
  • Use ScaleIO when delivering highly scalable shared storage from a single storage pool to different hypervisors or across multiple vSphere clusters. The primary use case for ScaleIO is serving storage for a heterogeneous environment (i.e when storage is served to a diverse set of hypervisor clients or between virtual and physical environments) or when the storage system needs to scale beyond the size of a vSphere cluster.


The picture below should help clarify what these two products are positioned for:

VSAN                ScaleIO


3 comments have been added so far

  1. Hi Dan,
    that blog and this one are consistent.

    In this blog Vijay stated that “EMC has written an ESX kernel driver that implements a ScaleIO client module” this client module is the ScaleIO Data Client. In Itzik’s blog he writes that “the big news is the direct kernel integration of the ‘SDC’ component into the ESXi kernel”. The SDC Itzik refers to is also the ScaleIO Data Client[1].

    The other major component of Scale IO, the ScaleIO Data Server (or SDS), is still deployed outside of ESXi. In Itzik’s blog you can see the diagram shows that the ScaleIO Data Server was deployed on the SVM (or ScaleIO Virtual Machine). This again is consistent with the split between client and server functionality that Vijay highlights in this blog.

    As you can see both blogs are saying the same thing, just using slightly different terminology and abbreviations. I hope this comment helps clarify things for readers.

    [1] See “EMC ScaleIO Basic Architecture (June 2015), page 5” for the ScaleIO terminology table – https://www.emc.com/collateral/white-papers/h14344-emc-scaleio-basic-architecture.pdf

Leave a Reply

Your email address will not be published. Required fields are marked *