I was recently asked about the operational benefits and cost of ownership of using vSAN with VMware Cloud Foundation. Jeff Hunter posted an excellent blog covering why “VMware Cloud Foundation Starts with vSAN.”
Having worked with vSAN in production, I found that it simplified how we provisioned and managed storage. With a traditional array, no single step in standing up new storage was time-consuming, but the steps added up to quite a bit of work. Even worse, many configurations required different teams, different skill sets, or architectural reviews that often reached outside the core virtualization or infrastructure teams. Here is an example runbook for bringing up a new “pod” and deploying a new array.
Standing up a New Array
- Schedule with the vendor install team
- Configuring the array and its services
- Configure username and/or LDAP authentication
- Configure DNS
- Configure NTP
- Configure Syslog
- Configure SMTP gateway
- Configure SNMP community
- Install or configure array licensing
- RAID group or disk pool creation.
- Create LUNs
- Setting up iSCSI Targets, or FC port host group ports.
- Configuring Target Groups storage extensions (Support for VAAI/T10 flags)
- Adding IQN or WWNs (Consult with server team)
- Deploy or Configure Array management software.
- If part of a federated pool, assign federation IDs.
- Verify licensing for management software.
- Configuring Array monitoring software and phone home services.
- Configure Hosts
- Update FC HBA driver/firmware.
- Configure NPIV or other settings host side required.
- Install or verify custom Path Selection Policy.
- Verify VAAI detected from hosts.
- Review vendor-specific KBs and adjust APD timeouts or other vendor-specific recommendations.
- Create and MAP LUNs
- Configuring array side security (WWN masking, for iSCSI CHAP and IQN filtering, export files and Kerberos for NFS)
- Configuring fabric security (VLANs for iSCSI and NFS, hard and soft zoning for Fibre Channel)
- Deploy vCenter
- Configure SSO
- Configure security and networking for vCenter and hosts
- Configure hosts
- Configure VASA provider
- Configure certificates between array and VASA
- Configure HA for VASA (If not native to the array, may need to protect it with FT/HA)
- Install the SRA for SRM
- Configure Stretched Cluster
- Deploy and configure FCIP gateways
- Alternatively, request a dedicated lambda from the WAN team that manages the WDM devices.
- In some cases engage vendor consultants at significant cost to “Validate” design.
- Configure volumes that will be replicated.
While this isn’t everything, and scripting can help, it is a fair number of operations, and they will slow you down in different ways.
1. Some actions like LUN provisioning might require you to touch 2-3 different devices.
2. Conflicting projects and timelines – Some of these actions require different teams or individuals with different skills. If the team that works on the WDMs is on change freeze, or the fabric team is in the middle of a director replacement, this project may have to pause.
3. External Staff Dependencies – In many shops some of these skills may be provided by consultants, MSPs, or the vendor. As a result, aligning the Gantt chart to get this done may take even more time and effort.
4. Lack of redundancy – It’s not uncommon for some of these skills to never be properly cross-trained in an organization. Having to wait on Bob to come back from vacation because he is the only one who knows how to configure fabric zones is a real issue.
5. Automation is possible – While many of these steps can be automated (or removed with vVols!), it is worth noting that this often takes one individual or team actually understanding the end-to-end workflow. If the storage team lacks knowledge of VMware, or the VMware team lacks knowledge of storage, often only half of a workflow gets automated. This automation tooling may also not be portable from one array platform to another.
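To make the coordination cost in the list above concrete, here is a minimal Python sketch that models a handful of the runbook steps and counts team hand-offs. The step names come from the runbook; the `Step` class, the helper functions, and especially the team assignments are hypothetical and will differ in every organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    team: str  # hypothetical owning team, assumed for illustration only


# A few steps from the array runbook. The team assignments are
# illustrative assumptions, not a statement about any real organization.
ARRAY_RUNBOOK = [
    Step("Configure array services (DNS, NTP, syslog)", "storage"),
    Step("Create LUNs", "storage"),
    Step("Configure fabric zoning", "fabric"),
    Step("Update FC HBA driver/firmware", "virtualization"),
    Step("Map LUNs to hosts", "storage"),
    Step("Configure VASA provider", "virtualization"),
    Step("Request dedicated lambda", "wan"),
]


def count_handoffs(steps):
    """Count how often consecutive steps change owning team."""
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev.team != cur.team)


def teams_involved(steps):
    """Return the sorted list of distinct teams that must be scheduled."""
    return sorted({step.team for step in steps})


def blocked_steps(steps, frozen_teams):
    """Return the names of steps owned by a team that is in change freeze."""
    return [step.name for step in steps if step.team in frozen_teams]


if __name__ == "__main__":
    print("Hand-offs:", count_handoffs(ARRAY_RUNBOOK))
    print("Teams:", teams_involved(ARRAY_RUNBOOK))
    print("Blocked by WAN freeze:", blocked_steps(ARRAY_RUNBOOK, {"wan"}))
```

Even with only seven steps, this assumed ordering produces five hand-offs across four teams, and a single change freeze is enough to stall the work. Plugging in your own steps and owners is a quick way to see where a project is likely to wait.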
vSAN-powered VCF simplifies these operations quite a bit.
- Host-centered provisioning – Racking and stacking new VMware hosts is a fairly common skill in most organizations, which reduces the number of teams and skills involved.
- Common vSphere workflows – Removing the Fibre Channel fabric and array-specific skills reduces the need to align different organizations to accomplish provisioning.
- Push-button deployment – VMware Cloud Foundation offers tested, proven, wizard-driven bring-up capabilities that are supported end to end.
- Storage and Networking – VMware Cloud Foundation covers not only storage but also networking with NSX capabilities. A cluster that lacks networking orchestration may need to wait days or weeks on additional provisioning.