vSAN

20+ Common vSAN Questions

VMware vSAN is the industry-leading software powering Hyper-Converged Infrastructure solutions. We continue to get questions from customers, and what follows is a list of the questions we’ve encountered most often, along with quick answers.

I hope you find this helpful!

1. How does vSAN use external storage arrays?

vSAN doesn’t use external storage arrays — it uses server-based disk drives and flash devices to create its own shared pool of storage. The environment is much simpler and more cost-effective as a result.

However, a given cluster can use both vSAN and external storage arrays at the same time using Storage Policy Based Management — and there are useful capabilities to place and move workloads intelligently.

2. What storage protocols does vSAN support?  E.g. iSCSI, NFS, etc.

As vSAN only communicates with vSphere virtual machines, there’s really no need for a standard storage protocol.  vSAN uses a proprietary protocol within the cluster that’s more efficient than the familiar choices.

If you want to expose vSAN storage outside of the vSphere cluster, there are some good third-party choices for that.

3. Can vSAN support any server, disk, flash, IO controller, etc.?

There’s a long and constantly growing list of supported components, but no, not *everything* you might encounter in the wild.  It’s vitally important to use only components, drivers and firmware listed on the Hardware Compatibility List (HCL) if you don’t want nasty surprises.

We strongly recommend that customers start with vSAN Ready Nodes: either purchase them as configured, or use them as the basis for your own design.

4. All-flash vSAN looks interesting — how do you do read caching?

All-flash clusters have two types of flash: very fast and durable write cache, and more capacious and cost-effective capacity flash. Here cache is 100% allocated for writes, as read performance from capacity flash is more than sufficient. Many more writes are held by the cache and written to the capacity layer only when needed, extending the life of the capacity flash tier.

In both hybrid and all-flash vSAN, the general sizing recommendation is the same.  Newer cache guidance will also factor in target performance and workload IO mix; see the blog post on the vSAN all-flash cache ratio to learn more.

5. What makes vSAN so fast?

A number of things, really.  First, vSAN is fully integrated into the vSphere kernel.  That means optimized IO paths and less resource usage.  Second, there’s no need for IOs to traverse the typical storage network and array controller to get work done — everything is integral to the server cluster.

And, finally, there’s some pretty smart storage software at work behind the scenes.

6. How does vSAN protect my data?

As with any enterprise storage product, that can be a long answer, but at a high level:

  • Administrators declare the level of protection they want for a VM’s data via policy, which is applied at provisioning time.  This is expressed in terms of “failures to tolerate” (FTT): 0, 1, 2, etc.  Most of the time, people choose FTT=1, which is the default, i.e. basic redundancy.  However, a single vSAN cluster can support different policies (and thus different FTT levels) in the same datastore, with no need to pre-configure storage.  Also, policies can be changed at any time, and vSAN will automatically reconfigure.
  • vSAN differentiates between failed components and removed components, known internally as DEGRADED and ABSENT respectively.  The idea is that an admin might be rebooting a server, changing a network, or moving things around, so the component is missing only temporarily and is expected to return shortly.  In that case there is no need to do a full reprotect.
  • If vSAN detects a hard device failure (e.g. a failed disk drive or controller), it declares the component DEGRADED and immediately begins reprotecting data using other resources in the cluster.  If instead vSAN detects that a good component has simply gone missing (i.e. ABSENT), it waits 60 minutes before deciding it’s gone for good and starting to reprotect.  This value is configurable.
  • Protection against network partitions is done through a quorum voting algorithm.  If the network partitions for some reason, more than 50% of the votes for a given storage object are needed to allow reads and writes to proceed.  This protects against split-brain syndrome.

There’s much more detail available, of course, but these are the core concepts.
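
The repair and quorum rules described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as this post describes it, not vSAN's actual implementation: the function names, the state strings and the constant are hypothetical.

```python
# Hypothetical names for illustration; these are not vSAN internals or APIs.
ABSENT_REPAIR_DELAY_MIN = 60  # default wait described above; configurable


def should_rebuild(state, minutes_in_state, delay_min=ABSENT_REPAIR_DELAY_MIN):
    """Decide whether reprotection should start for a component."""
    if state == "DEGRADED":
        # Hard failure: start reprotecting immediately.
        return True
    if state == "ABSENT":
        # The component may come back, so wait out the delay first.
        return minutes_in_state >= delay_min
    # Healthy component: nothing to do.
    return False


def has_quorum(votes_reachable, votes_total):
    """An object stays accessible only with strictly more than 50% of its votes."""
    return 2 * votes_reachable > votes_total
```

As a usage example, assume an object with three equally weighted votes. After a network partition, the side that can reach two of them satisfies `has_quorum(2, 3)` and keeps serving IO, while the side with one vote does not. An even split such as `has_quorum(2, 4)` also fails, which is the case that prevents split-brain.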

7.   I ran a single VM against a single VMDK, and didn’t find performance all that impressive.  What’s going on?

That’s not a surprise, really.  vSAN is designed to be a shared storage service for an entire cluster, and not individual VMs.  Many storage arrays will spread a given VMDK around many disks using striping or their file system, which gives great performance to a single VM, but loses effectiveness when there are many VMs all competing for the same capacity devices.

You can achieve a similar effect with vSAN by setting the striping policy to a higher number.  This will be effective for smaller numbers of VMs, but loses effectiveness if the entire cluster is busy.  However, we’d encourage you to test what you actually plan to use — which most often is multiple VMs on a cluster, each doing different things.

8.  My storage team is interested in managing vSAN — what tools do you have for them?

Well, vSAN was mainly designed to be managed by vSphere administrators.

That’s one of its unique strengths — no specialized storage skills required.

However, in some situations, there’s a preference to have the storage team continue to do this job.  Today, they would have to use the same tools as the vSphere administration team.  A few customers have decided to go this way, and it seems to be working for them.

9.   Why don’t you have deduplication and compression?

Well, that is no longer true. vSAN 6.2 introduced space efficiency features such as deduplication, compression and erasure coding.

10. What backup products do you support?

There’s a long list — basically anything that supports standard VADP as a backup interface.  Ask your favorite vendors.

11. What remote replication products do you support?

As vSAN doesn’t use external arrays, it doesn’t support array-based replication — all replication has to be done at the host and at an individual VM level.

In addition to VMware’s own vSphere Replication, there are also options from EMC (RecoverPoint for VMs) as well as Zerto.

12. Where are the “gotchas” for vSAN?

Generally speaking, most customer experiences have been very positive, but there are two areas worth calling out.

The first is the HCL, especially drivers and firmware.  A bad IO controller driver can really ruin your day.  We’ve published a list of what we’ve tested and we know works reliably, so stick to that — not only on day 1, but through the entire life of your cluster.

The second is sizing.  Some VMware admins try to design their own configs, and are rather new to storage.  They might not understand the performance implications of using, say, a single 4TB 7200 RPM NL-SAS drive vs. four 1TB 10K SAS drives — or might try to use super low-end components.
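
To see why the drive choice matters, here is a rough back-of-the-envelope sketch. The per-drive IOPS figures are rule-of-thumb assumptions chosen for illustration, not measured or vendor-published numbers, and the helper name is hypothetical.

```python
# Rule-of-thumb random IOPS per spindle. These figures are assumptions for
# illustration only; real numbers depend on the drive model and workload.
ASSUMED_IOPS = {"7200rpm": 80, "10k": 130}


def layout(drive_count, tb_per_drive, drive_class):
    """Return (raw capacity in TB, aggregate random IOPS) for a set of drives."""
    capacity_tb = drive_count * tb_per_drive
    total_iops = drive_count * ASSUMED_IOPS[drive_class]
    return capacity_tb, total_iops


single_nl_sas = layout(1, 4, "7200rpm")  # one 4TB 7200 RPM NL-SAS drive
four_sas = layout(4, 1, "10k")           # four 1TB 10K SAS drives
print(single_nl_sas, four_sas)
```

Both layouts offer the same 4TB of raw capacity, but under these assumed figures the four-drive layout has several times the aggregate random IO capability, simply because there are more spindles working in parallel.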

The vast majority of people are better off with pre-configured vSAN Ready Nodes for different workload profiles that you can either buy as a SKU, or use as a starting point for your own configuration.  We also have a thorough design and sizing guide if you’d like to understand the theory behind the recommendations.

13. You say that vSAN is ready for tier 1 enterprise workloads — why?

That claim is the result of multiple years of real-world customer experiences coupled with our own internal testing.  vSAN is clearly a rock-solid product that delivers more than enough performance, reliability and availability to do the job.  We have plenty of customers who are running very demanding production workloads on vSAN, and are very pleased with the results.

Additionally, we’ve got a few reference architectures for popular enterprise apps published today, and more are coming soon.

14. We’re looking at a VDI environment, and vSAN keeps coming up.  What’s the win?

That’s no surprise: Horizon View and vSAN were made to work with each other.

vSAN helps you avoid an external array.  All at once, you’ve got an environment that’s simpler, less expensive and easier to manage.  That’s a big win, right there.

When vSAN was being designed, VDI workloads were studied carefully and incorporated into the architecture from both a performance and manageability viewpoint.

Generally speaking, hybrid vSAN configs can approach the performance of entry-level all-flash arrays, and all-flash vSAN configs easily go toe-to-toe with higher-end all-flash arrays.

However, for many customers, manageability is the big win.  As the VDI admin creates user pools and assigns policies for performance, protection and persistence — all of that just flows downwards to vSAN via a shared policy mechanism.   That means that the admin can easily reconfigure their VDI environment without having to explicitly reconfigure an external storage array.  We hear that’s a pretty cool feature.

15.  I’m not into designing my own vSAN-based cluster — it looks complicated.  What are my options?

If you’d like something with an extremely simple, out-of-the-box experience, take a look at VxRail.  VxRail was developed by both VMware and Dell EMC and is the simplest and fastest deployment option.

If you’ve got some basic vSphere skills or want to do a bit of customization, take a look at the 175+ vSAN Ready Nodes that are designed for specific workload profiles, again available directly from our hardware partners.  There’s a bit more work (e.g. installing drivers, etc.) but nothing too difficult.

By the way, there is also a nifty sizing and TCO calculator available.

16.  My storage team has concerns about vSAN — what should I say?

Well, the goal of vSAN was to make storage essentially “disappear” from the perspective of the vSphere administrator: very simple, no special skills required. The industry has been using the external storage array model for over twenty years, and — by comparison — vSAN doesn’t look like an external storage array, so there’s that.

On the other hand, vSphere admins are pretty adamant about the need for change.  They point out it’s wasteful and inefficient to have to go to the storage team for each and every thing they need.  Why not let the vSphere admins do storage?

The debate ends up being around two things:

  • Are the benefits worth introducing a new technology?  Even a simple vSAN TCO analysis will open a lot of eyes — both CAPEX and OPEX.
  • Where does it best fit?  This means identifying the parts of the environment where it makes sense to continue with a traditional external storage model, and the parts where it makes sense to collapse storage into the hypervisor with vSAN.

The real win for the storage team is that they now have more time to go work on things that require their specialized expertise vs. day-to-day routine operations.

17.   We’re looking at vSAN as well as Nutanix.  What would you emphasize?

First — performance.  All of our head-to-head testing (as well as customer testimonials) shows that vSAN has a stunning performance advantage on identical hardware with most demanding datacenter workloads, and uses less memory and CPU as well — so better consolidation ratios.

Second, operational simplicity.  With vSAN, everything is managed through vCenter and a single interface designed to be used and supported as a whole.  The administrative workflows are far simpler and more obvious as a result, with no need to go back and forth as you work with two products and two vendors.  People who have used both generally agree with this observation.

Third — feature support.  For example, vSphere’s DRS is a popular feature that rebalances cluster workloads.  As Nutanix depends on data locality, that can create adverse performance effects as VMs are moved and their data attempts to follow them around.  Maintenance mode is another example — the evacuation and reprotection of data is automated with vSAN, but a manual set of steps with Nutanix.

Fourth — cost.  Everyone looks at different pricing, but — generally speaking — an environment with vSphere and vSAN will cost less (hardware and software) than an environment with vSphere and Nutanix.

Fifth — choice.  If you already have a preferred server vendor, or are looking for a tailored configuration, vSAN gives you a wide world of hardware choices.

18. How fast can vSAN go?  How big can it get?  Does it show linear scalability?

vSAN is software, so scale and performance are mostly a function of the hardware you bring: CPU, memory, network, flash, controllers, etc.  Every time the hardware gets faster, vSAN gets faster as a result, and there’s plenty of cool new hardware always coming to market.

As far as maximum size, the math is easy: up to 64 nodes in a cluster, and each server supports up to 35 capacity devices (five disk groups, seven capacity devices each).  A bit of quick math yields a max of 2240 capacity devices.  Using 4TB drives, that’s just shy of 9PB raw in a single cluster.  Probably more than you need.
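
The quick math above can be checked with a short Python sketch. The limits are the ones quoted in this post; the constant and function names are just illustrative, and capacity is converted using decimal units (1PB = 1000TB).

```python
# Limits as quoted in the text above.
MAX_NODES = 64
DISK_GROUPS_PER_NODE = 5
CAPACITY_DEVICES_PER_GROUP = 7


def max_raw_capacity(tb_per_device):
    """Return (device count, raw TB, raw PB) for a fully populated cluster."""
    devices = MAX_NODES * DISK_GROUPS_PER_NODE * CAPACITY_DEVICES_PER_GROUP
    raw_tb = devices * tb_per_device
    raw_pb = raw_tb / 1000  # decimal petabytes
    return devices, raw_tb, raw_pb


devices, raw_tb, raw_pb = max_raw_capacity(4)
print(devices, raw_tb, raw_pb)  # 2240 8960 8.96
```

Note that this is raw capacity. Usable capacity will be lower once the protection policy (for example FTT=1) stores additional copies of the data.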

We’ve published multiple tests that show linear scalability as you add more nodes.  Performance also scales as you put more devices into each server: disk groups, flash cache and capacity devices — scale up as well as scale out.

19.  Can the cluster network be a limitation to performance?

Not really.  Although we support 1Gb, 10Gb is highly recommended.  Our internal testing shows that you need to get into nosebleed multi-million IOPS territory before network overhead even starts to become a factor.  For those folks (and they are out there!), there’s 40Gb.  We also have a deep-dive network design guide if you’re doing a very large multi-rack cluster.

20.  What about blade servers?

Blade servers were designed for external storage, so they don’t have a lot of internal capacity for storage devices.  VMware has qualified a few SAS-connected external storage enclosures, with more coming.

However, some of our more adventurous customers are experimenting with MCS (memory channel storage) which is essentially flash storage right on the motherboard.  They are seeing great densities as well as great performance.

As this technology matures, blade server designs should get more interesting with hypervisor-converged storage solutions like vSAN.

21.  I want to share vSAN storage across multiple clusters — how do I do this?

Sorry, that’s not how it was designed to be used.  We found that most customers think in terms of designing individual clusters, and vSAN respects that design boundary.  If you want to have a large storage pool shared across multiple clusters, you’re back to a dedicated shared storage model — using storage specialists — and a lot of the operational benefit disappears as a result.

However, as mentioned above, there are third party products that can expose a vSAN cluster’s capacity to other entities via NFS or iSCSI.

22.  How do I protect against rack failures?

vSAN 6.0 introduced a new feature — fault domains — that ensures placement of protection components across separate racks.  It’s pretty easy to use, and it’s very effective.
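
The idea behind fault domains can be illustrated with a small Python sketch: each component of an object lands in a different rack, so losing one rack takes out at most one component. This is a simplified illustration with hypothetical names and host labels, not vSAN's actual placement algorithm.

```python
def place_components(hosts_by_rack, components_needed):
    """Pick one host from each of `components_needed` distinct racks.

    Returns a list of (rack, host) pairs, or raises ValueError if the
    cluster does not have enough racks (fault domains) with hosts in them.
    """
    racks = sorted(rack for rack, hosts in hosts_by_rack.items() if hosts)
    if len(racks) < components_needed:
        raise ValueError("not enough fault domains for this policy")
    # Take the first host in each chosen rack; a real system would also
    # weigh free capacity and load, which this sketch ignores.
    return [(rack, hosts_by_rack[rack][0]) for rack in racks[:components_needed]]


# Hypothetical cluster layout: rack name -> hosts in that rack.
cluster = {
    "rack-a": ["esx01", "esx02"],
    "rack-b": ["esx03"],
    "rack-c": ["esx04", "esx05"],
}
placement = place_components(cluster, 3)
print(placement)
```

The example assumes an object made up of three components (for instance two data copies plus a tiebreaker), which is why it asks for three distinct racks. With only two racks available, the function refuses the placement instead of silently putting two components in the same rack.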

23.  What is this Health Check thing I’m hearing about?

It’s a feature introduced in vSAN 6.0 and enhanced further in 6.2.  It makes it much easier for a vSphere administrator to quickly confirm that the vSAN environment is healthy (storage devices, networks, resources, driver and firmware versions, etc.) and, if a problem is found, to see what to do about it.

Every vSAN customer should be using it.

24.  Your roadmap seems to be moving pretty fast — how can I get a view into future releases?

Yes, the roadmap is moving pretty fast — and that’s the direct result of our great engineers!  Standard VMware process is to request this through your sales team, who can arrange a briefing from one of our product managers or similar under NDA. Make sure to also sign up for announced betas so you can get first-hand experience of upcoming releases.