There are certainly a number of blogs on the Web that talk about software-defined storage, and in particular Virtual SAN. But as someone who has worked at VMware for nine years, my goal is not to rehash the same information, but to provide insights from my experiences.
At VMware, much of my time was spent working for Global Support Services; however, over the last year-and-a-half, I have been working as a member of the Professional Services Engineering team.
As a part of this team, my focus is now on core virtualization elements, including vSphere, Virtual SAN, and Health Check Services. Most recently I was challenged with getting up to speed on Virtual SAN and developing an architecture design for it. At first this seemed pretty intimidating, since until then I had only heard the marketing details; however, Virtual SAN truly did live up to the hype about being "radically simple". I found that the more I worked with Virtual SAN, the less concerned I became with the underlying storage. Having used Virtual SAN and tested it in customer environments, I can honestly say my mind has changed, because of the power it gives an administrator.
To simplify the design process, I broke it into the following workflow, both for my own benefit and to help anyone who is unfamiliar with the design decisions required to implement Virtual SAN successfully.
Workflow for a Virtual SAN Design
When working on a Virtual SAN design, this workflow can be quite helpful. To simplify it further, I break it down into four key areas:
- Hardware selection – In every environment I have worked in, selecting the hardware has been a challenge. I would estimate that 75 percent of the problems I have seen in implementing Virtual SAN have resulted from hardware selection or configuration, such as unsupported devices or incorrect firmware/drivers. Note: VMware does not provide support for devices that are not on the Virtual SAN Compatibility List. When selecting hardware, be sure it is on the list!
- Software configuration – Configuration is simple; I have rarely seen questions about actually turning it on. You merely select a check box, and Virtual SAN configures itself (assuming, of course, that the underlying configuration is correct). If it is not, for example if the networking is not configured correctly or the disks have not been presented properly, the results can be mixed.
- Storage policy – The storage policy is a major early decision point. It is what gives Virtual SAN its power: the ability to define the performance and availability characteristics of each virtual machine.
- Monitoring/performance testing/failure testing – This final area covers how to monitor and test the configuration.
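To make the availability side of the storage policy more concrete, here is a minimal sketch of the sizing arithmetic for one common policy setting, the number of failures to tolerate (FTT), assuming RAID-1 mirroring. Under mirroring, tolerating n host failures takes n + 1 full copies of the data and at least 2n + 1 hosts, so that a majority of an object's components (replicas plus witnesses) survives. The function and class names are hypothetical, and the result ignores witness size, slack space, and any other overhead.

```python
from dataclasses import dataclass


@dataclass
class MirrorPolicyFootprint:
    """Hypothetical summary of what a mirroring policy implies for one virtual disk."""
    failures_to_tolerate: int
    replicas: int
    min_hosts: int
    raw_capacity_gb: float


def mirror_footprint(vmdk_size_gb: float, failures_to_tolerate: int) -> MirrorPolicyFootprint:
    """Sketch of RAID-1 mirroring math for a given failures-to-tolerate value.

    Assumes n failures tolerated -> n + 1 full replicas, and at least
    2n + 1 hosts. Witness components, slack space, and other overheads
    are deliberately ignored.
    """
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    replicas = failures_to_tolerate + 1
    return MirrorPolicyFootprint(
        failures_to_tolerate=failures_to_tolerate,
        replicas=replicas,
        min_hosts=2 * failures_to_tolerate + 1,
        raw_capacity_gb=replicas * vmdk_size_gb,
    )


if __name__ == "__main__":
    # Show how raising FTT changes host count and raw capacity for a 100 GB disk.
    for ftt in range(4):
        fp = mirror_footprint(100.0, ftt)
        print(f"FTT={ftt}: {fp.replicas} replicas, >= {fp.min_hosts} hosts, "
              f"{fp.raw_capacity_gb:.0f} GB raw for a 100 GB disk")
```

The point of the sketch is the trade-off: each extra failure tolerated adds a full copy of the data and two more hosts to the minimum cluster size, which is why the policy decision feeds straight back into hardware selection.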
All of these areas should be taken into account in any Virtual SAN design; otherwise the design is not really complete. I could talk through much of this for hours, but instead I thought it would be more useful to share my top "gotcha" moments, along with the lessons learned from the projects I have been involved with.
Inevitably, “gotcha” moments will happen when implementing Virtual SAN. Here are the top moments I have run into:
- Network configuration – No matter what the networking team says, always validate the configuration. The "Misconfiguration detected" error is by far the most common issue I have seen. Normally it means that either the port group has not been configured correctly for Virtual SAN or multicast has not been set up properly, and in my experience most of these issues stem from the multicast setup. On Cisco switches, unless an IGMP snooping querier has been configured or IGMP snooping has been explicitly disabled on the ports used for Virtual SAN, configuration will generally fail. In the default configuration the querier is simply not set up, so even if the network admin says everything is configured properly, it may not be. Double check it to avoid any pain.
- Network speed – Although 1GbE networking is supported, and I have seen it operate effectively in small environments, 10GbE networking is highly recommended for most configurations. I don't say this just because the documentation does. In my experience, the deciding factor is not everyday usage of Virtual SAN; where people run into problems is when something out of the ordinary happens, such as a failure or a period of heavy virtual machine creation. Replication traffic during these periods can be substantial and can cause severe performance degradation while it lasts. The only way to know is to test what happens during a failure or a peak provisioning cycle. This testing is critical, as it tells you what performance to expect. When in doubt, use 10GbE networking.
- Storage adapter choice – Although this seems simple, the controller's queue depth matters: it should be 256 or greater to ensure the best performance. This is less of an issue now than it was several months ago, because the VMware Virtual SAN Compatibility List should no longer include any cards with a queue depth under 256. Be sure to verify, though. For example, when one card was first released, its driver artificially limited the card's queue depth, and performance was dramatically impacted until an updated driver was released.
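The multicast rule in the network configuration item can be written down as a simple check: a port is expected to work if IGMP snooping is disabled, or if snooping is enabled and a snooping querier is present. The sketch below models the switch settings with a hypothetical data structure; it does not query a real switch, and the port names are made up. It flags the combination described above as the usual cause of failure: snooping left on with no querier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MulticastSettings:
    """Hypothetical record of the IGMP settings on a switch port used by Virtual SAN."""
    port: str
    igmp_snooping_enabled: bool  # Cisco switches typically enable snooping by default
    querier_configured: bool     # whether an IGMP snooping querier serves this port's VLAN


def multicast_ok(s: MulticastSettings) -> bool:
    """Expected to work if snooping is off, or if snooping is on and a querier exists."""
    return (not s.igmp_snooping_enabled) or s.querier_configured


def ports_needing_attention(settings: List[MulticastSettings]) -> List[str]:
    """Return ports that match the failure pattern: snooping on, no querier."""
    return [s.port for s in settings if not multicast_ok(s)]


if __name__ == "__main__":
    observed = [
        MulticastSettings("Gi1/0/1", igmp_snooping_enabled=True, querier_configured=True),
        MulticastSettings("Gi1/0/2", igmp_snooping_enabled=False, querier_configured=False),
        # Default-style configuration: snooping on, querier never set up.
        MulticastSettings("Gi1/0/3", igmp_snooping_enabled=True, querier_configured=False),
    ]
    print(ports_needing_attention(observed))  # ['Gi1/0/3']
```

In practice the inputs would come from the switch configuration itself, gathered with the network team, rather than from what anyone remembers configuring.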
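A back-of-the-envelope calculation shows why link speed matters most during failures rather than in everyday use. The sketch below computes a lower bound on the time needed to copy a given amount of data across a link, assuming (hypothetically) that replication traffic can use a fixed share of the line rate. Real resynchronization will take longer because of protocol overhead, competing virtual machine traffic, and disk throughput limits; the 500 GB figure and the 50 percent share are only example inputs.

```python
def min_transfer_hours(data_gb: float, link_gbps: float, usable_fraction: float = 1.0) -> float:
    """Lower bound on the hours needed to move data_gb gigabytes over a link_gbps link.

    Data is in gigabytes (GB) and link speed in gigabits per second (Gb/s),
    hence the factor of 8 (decimal units). usable_fraction is an assumed share
    of the link available to replication traffic.
    """
    if link_gbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link_gbps must be positive and usable_fraction in (0, 1]")
    seconds = (data_gb * 8) / (link_gbps * usable_fraction)
    return seconds / 3600


if __name__ == "__main__":
    data_to_resync_gb = 500  # example: data that must be rebuilt after a host failure
    for speed in (1, 10):
        hours = min_transfer_hours(data_to_resync_gb, speed, usable_fraction=0.5)
        print(f"{speed} Gb/s link, 50% usable: at least {hours:.2f} hours")
```

With these example inputs the 1 Gb/s link needs at least about 2.2 hours versus roughly 13 minutes at 10 Gb/s, and during that whole window the resync traffic competes with production I/O. That is the window the failure testing described above is meant to measure.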
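The queue depth guidance can be checked mechanically once you have each controller's reported queue depth, gathered from the compatibility list or from the host after the driver is installed (since, as the example above shows, a driver can cap the depth). The controller names and depths below are hypothetical, and the sketch treats 256 as the minimum acceptable depth.

```python
from typing import Dict, List

MIN_QUEUE_DEPTH = 256  # minimum from the storage adapter guidance; adjust to your own requirement


def controllers_below_threshold(depths: Dict[str, int], minimum: int = MIN_QUEUE_DEPTH) -> List[str]:
    """Return controllers whose reported queue depth falls short of the minimum, sorted by name."""
    return sorted(name for name, depth in depths.items() if depth < minimum)


if __name__ == "__main__":
    reported = {
        "host1:controller-a": 600,
        "host2:controller-a": 600,
        "host3:controller-b": 25,  # e.g. a driver that artificially caps the depth
    }
    print(controllers_below_threshold(reported))  # ['host3:controller-b']
```

Re-running the check after every driver update is the cheap way to catch the kind of driver-imposed limit described above.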
There are always lessons to be learned with new software, and ours cost anywhere from half a day to a full day of troubleshooting. Here's what we figured out:
- Always verify firmware/driver versions – This step always seems to be overlooked, and I mention it because of my onsite experiences with customers. One example that comes to mind: we were configuring Virtual SAN on three identical servers that had been purchased and shipped in the same order. Two of them worked fine; the third simply would not cooperate, no matter what we did. After several hours of investigation we found that not only would Virtual SAN not configure, but all drives attached to that host were read-only. The utility provided with the card itself showed that the card was one firmware revision behind. As soon as we upgraded the firmware, the host came online and everything worked brilliantly.
- Pass-through/RAID0 controller configuration – A pass-through controller is almost always recommended, because Virtual SAN can then own the drives and have full control of them. In many cases, however, only RAID0 mode is available. Proper RAID0 configuration is required to avoid problems and to maximize performance for Virtual SAN. First, ensure any controller caching is set to 100% read cache. Second, configure each drive as its own "array" rather than one large array of disks. As an example of incorrect configuration that causes unnecessary overhead, several times I have seen all disks configured as a single RAID volume on the controller. This shows up as a single disk to the operating system (ESXi in this case), which is not what Virtual SAN needs. To fix it, you have to go into the controller and reconfigure it, setting up each disk individually. You also have to ensure that any previously created partition table is removed, which in many cases means zeroing out the drive if there is no option to remove the header.
- Performance testing – The lesson here is that you can do an infinite amount of testing, so where do you start and stop? Wade Holmes from the Virtual SAN technical marketing team at VMware has an excellent blog series on this that I highly recommend for guidance. His methodology supports both basic and more in-depth testing of your Virtual SAN configuration.
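The firmware lesson suggests comparing versions across hosts instead of assuming that identical hardware ships identically. The sketch below takes a hypothetical inventory (host name mapped to a controller firmware and driver version pair, collected with whatever tools your environment provides) and reports any host that differs from the most common pair. That is how the odd server out in the story above would have surfaced. The version strings are made up, and agreement between hosts is not the same as correctness: the common pair still has to be checked against the Virtual SAN Compatibility List.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical inventory shape: host -> (controller firmware version, driver version)
Inventory = Dict[str, Tuple[str, str]]


def hosts_out_of_step(inventory: Inventory) -> List[str]:
    """Return hosts whose firmware/driver pair differs from the most common pair.

    If no pair is in a clear majority, the baseline is simply the first of the
    tied pairs, so review every host manually in that case.
    """
    if not inventory:
        return []
    baseline, _count = Counter(inventory.values()).most_common(1)[0]
    return sorted(host for host, versions in inventory.items() if versions != baseline)


if __name__ == "__main__":
    inventory = {
        "esx01": ("fw-2.0", "drv-1.5"),
        "esx02": ("fw-2.0", "drv-1.5"),
        "esx03": ("fw-1.9", "drv-1.5"),  # one firmware revision behind, like the host in the story
    }
    print(hosts_out_of_step(inventory))  # ['esx03']
```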
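The RAID0 guidance above (each drive in its own single-disk volume, controller cache set to 100% read) works well as a checklist. The sketch below expresses it against a hypothetical description of the controller's logical volumes; a real check would read these values from the controller vendor's utility, and the volume and disk names are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalVolume:
    """Hypothetical description of one RAID0 logical volume exposed by the controller."""
    name: str
    member_disks: List[str]
    read_cache_percent: int  # share of controller cache assigned to reads


def raid0_layout_problems(volumes: List[LogicalVolume]) -> List[str]:
    """List deviations from the one-disk-per-volume, 100%-read-cache layout."""
    problems = []
    for vol in volumes:
        if len(vol.member_disks) != 1:
            problems.append(f"{vol.name}: spans {len(vol.member_disks)} disks; expected exactly 1")
        if vol.read_cache_percent != 100:
            problems.append(f"{vol.name}: read cache {vol.read_cache_percent}%; expected 100%")
    return problems


if __name__ == "__main__":
    # The misconfiguration described above: every disk in one big volume.
    misconfigured = [LogicalVolume("vd0", ["d0", "d1", "d2", "d3"], 50)]
    for line in raid0_layout_problems(misconfigured):
        print(line)
```

An empty result means the layout matches the guidance. Any disks that were previously part of a multi-disk volume may still carry an old partition table that has to be cleared before Virtual SAN can claim them.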
I hope these pointers help in your evaluation and implementation of Virtual SAN. Before diving headfirst into anything, I always like to make sure I am informed about the subject matter, and Virtual SAN is no different. To be successful, you need genuine subject matter expertise for the design, whether in-house or through a professional services organization. Remember, VMware is happy to be your trusted advisor if you need assistance with Virtual SAN or any of our other products!
Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks of vSphere environments.