The Problem of Hidden Costs
Years ago I bought a (relatively) fancy car at a great price, and it got great gas mileage. I quickly found out, though, that the car suffered from a few problems.
- This car required expensive parts when things broke, and a lot broke early on, before I could get any real use out of it.
- This car had enough problems that it spent a lot of time in the shop, which consumed far too much of my time owning it.
- I was paying a lot for this car to sit idle in an airport parking lot.
I learned the lesson that cost != price. This past year I did the math and discovered that leaving my car idle in airport parking was more expensive than using Lyft/Uber to get to the airport. The cost overhead of that idle time could be 3x the cost of taking a different transportation approach. IT infrastructure costs work the same way: they are not always what they appear. We need to stop looking only at the initial purchase price, and instead factor in the cost of operating that infrastructure.
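The arithmetic is easy to sketch. The rates and trip length below are hypothetical assumptions chosen for illustration, not figures from any real airport:

```python
# Back-of-envelope math: cost of a car idling in airport parking vs. a
# rideshare round trip. All rates are hypothetical, for illustration only.
DAILY_PARKING_RATE = 25.0   # $/day for the car to sit idle
RIDESHARE_FARE = 35.0       # $ one-way fare to/from the airport
TRIP_LENGTH_DAYS = 9        # a typical longer trip

parking_cost = DAILY_PARKING_RATE * TRIP_LENGTH_DAYS   # cost of pure idle time
rideshare_cost = 2 * RIDESHARE_FARE                    # round trip; car stays home

print(f"Parking: ${parking_cost:.0f}  Rideshare: ${rideshare_cost:.0f}")
print(f"Idle-time overhead: {parking_cost / rideshare_cost:.1f}x")
```

With these assumed numbers the idle car costs roughly 3x the rideshare alternative; the exact multiplier obviously depends on local rates and trip length.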
HyperConverged Infrastructure (HCI) powered by VMware vSAN helps optimize costs in many ways. IT assets, much like cars, depreciate in value quickly. It’s time to quit paying for infrastructure that is stuck idle.
Idle Resources – A Parked Asset Still Has Cost
The speed of bringing an asset into and out of production is a cost that is often overlooked. “Migration Limbo” is the time an asset spends not being used while it depreciates its capital purchase value, or burns monthly lease payments sitting idle. If I spend $300K on a storage system with 3 years of support, and it spends 6 months being brought online and 6 months in transition on the way out, I’ve amplified my yearly cost of ownership by 50%. What can cause infrastructure to sit idle?
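That 50% figure falls straight out of the numbers in the example, and is worth working through once:

```python
# Effective cost of ownership when an asset spends part of its life idle.
# Figures come from the example above: $300K purchase, 36 months of support,
# 6 months of deployment limbo plus 6 months of decommission limbo.
purchase_price = 300_000
support_months = 36
idle_months = 6 + 6                                          # limbo on both ends

nominal_monthly_cost = purchase_price / support_months       # what you planned for
productive_months = support_months - idle_months             # 24 useful months
effective_monthly_cost = purchase_price / productive_months  # what you actually pay

amplification = effective_monthly_cost / nominal_monthly_cost - 1
print(f"Cost of ownership amplified by {amplification:.0%}")
```

Twelve idle months out of thirty-six means the same capital is spread over two-thirds of the expected life, hence the 50% amplification.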
Hardware Deployment Limbo
Too big to fail/move – If the replacement system needs to go where the old system was, this can cause challenges. How do we do a swing migration when we are out of rack space? With HCI this process is as simple as replacing one server at a time. Storage arrays have to be stood up simultaneously for the data to be migrated, and it’s often difficult or impossible to “cut them in half.”
Rack Space Jenga – Even if rack space exists, it may not be contiguous. Storage arrays are often limited by SAS cabling to the small areas they can operate in, and some systems require proprietary and thus dedicated racks. Once deployed, individual shelves connected by daisy chains cannot easily be moved, and certainly should not be. vSAN avoids this challenge, as nodes can be separated by great distances and physically moved one server at a time. vSAN can even take advantage of node location, using Fault Domains to make sure data is mirrored across the racks or rooms the nodes are scattered across.
Specialty installers/movers required – A line item at the bottom of many storage array quotes is professional services for the installation. Whether or not the price is significant, there are hidden costs to this. The first is time: it can take weeks to schedule these resources and acquire badge access to the datacenter, adding more people to the Gantt chart for bringing a system online. The cost also extends to migrations, where support may not be guaranteed or honored unless additional services fees (and scheduling delays) are incurred to move the systems. HCI uses regular rack servers, which lack these restrictive requirements and were designed to be field installable. Tooling like “rapid rails” has been designed for simple, toolless installation. At a conference I helped rack a vSAN cluster with the help of an otherwise non-trained “marketing guy” in a few minutes.
Cable Plant Complexity – Replacing a large central storage array can require a large quantity of fiber-optic cable plant to be run or replaced at one time. This can cascade into requirements for new patch panels and new cable runs to take advantage of faster speeds.
Software and Configuration Limbo
Compatibility List Alignment – Deploying a new storage array requires validating that the array microcode, storage fabric OS version, HBA driver and firmware, storage virtualization engines, hypervisor, and operating systems are all rated as cross-compatible. Aligning the different compatibility lists may require stair-step upgrades. Multiple arrays from different vendors talking to the same cluster make this harder to reconcile, and the conflicting compatibility lists may force a situation where it is impossible for everything to be in a supported configuration. Given the number of teams involved in these upgrades, making existing infrastructure “new array ready” can take weeks or months. HCI powered by vSAN avoids these challenges by having the storage software embedded in ESXi, which removes the need for cross-compatibility testing of vSAN and ESXi. Leveraging Ethernet for storage transport removes the need for fabric software compatibility, and lifecycle tooling to update controller drivers and firmware on the hosts removes the largest remaining compatibility bottleneck.
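The alignment problem is essentially set intersection: a supported deployment needs a version every component vendor agrees on, and independently maintained lists can easily have no overlap. A toy model (the version strings are hypothetical, not from any real compatibility guide):

```python
# Toy model of compatibility list alignment: each component publishes the
# hypervisor versions it supports; a fully supported deployment needs a
# version present in every list. Versions here are hypothetical.
array_microcode_supports = {"6.0", "6.5"}
fabric_os_supports       = {"6.5", "6.7"}
hba_firmware_supports    = {"6.7", "7.0"}

supported = array_microcode_supports & fabric_os_supports & hba_firmware_supports
print(supported or "No mutually supported configuration exists")
```

Here no single version appears in all three lists, so there is no configuration in which everything is simultaneously supported, which is exactly the stair-step upgrade trap described above.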
Self Service Updates – The software to upgrade storage array and storage fabric products is often hidden behind logins. In some cases it may only be available to certified specialists, whose scheduling may further delay a rollout. As most systems ship with older software, being able to self-service update to the desired version makes HCI powered by vSAN much faster to get from racked to ready. vSAN ReadyNodes with pre-installed ESXi make this process even easier.
Identifying what version to deploy – Proprietary storage systems often hide release notes and critical documents, or require NDAs to read them. Their release matrices commonly require an active support agreement to view. This can delay deployment of a system while anecdotal evidence of what changed in a release is collected manually to identify which version should be deployed. VMware HCI powered by vSAN has publicly available release notes, and VMware compatibility matrices are not hidden behind portals.
Connections and security – Fibre Channel arrays need Fibre Channel zones created and host masking configured. Even iSCSI needs IQN groups set up and masking configured. This potentially requires multiple teams to communicate and, in the case of automated deployments, new plugins or workflows to be created. vSAN powered HCI automates ESXi firewall rules between hosts, and simply needs its own isolated layer 2 or layer 3 Ethernet transport to function.
Expansion – While adding a tray to a storage array may require a product specialist, adding a host to a vSphere cluster is a skill most VMware administrators are familiar with. Host Profiles make sure a host is configured as expected (same DNS, NTP, security settings, syslog location, etc.). The vSAN Configuration Assistant helps remediate issues like setting up vMotion or other networking elements on new hosts, and the vSAN health check makes sure the new host is communicating with the other hosts in the cluster. Adding capacity is as simple as adding a new disk group, a straightforward process that takes a few clicks.
Removal of assets – Long storage migrations that require app or file level migrations can force you to renew maintenance on a storage array “for one more year” while you get off of the platform. Some vendors will “spike” this support renewal to make the cost unappealing versus a forklift migration to their new platform. This often results in a forced situation where the only cost-effective support renewal involves purchasing something else from the vendor, which leads to unpredictable operational costs. It also leads to “big decisions” on expansion: you must balance the sunk cost of expanding the existing storage system against its planned obsolescence. Adding capacity when a system will go end of life in the next 18 months requires less capital than buying a new system, but it is also a poor use of capital for drives you will soon have to throw away.
The speed of decommissioning an asset is just as important as bringing an asset online. In many cases, if an asset cannot be brought out of production in time, expensive support renewals further sap operational budget. Many traditional arrays cannot easily shrink storage pools or remove disks, RAID groups, or shelves, if at all. Migrations off a platform are also often an “all or nothing” process that requires moving all data out of the frame. Non-disruptive migrations often require using the same storage hardware vendor, or storage virtualization that adds extra complexity and cost.
Removing nodes from a vSAN cluster is a relatively easy process. Put a host into maintenance mode and select “full evacuation,” and all virtual machines and vSAN data will be drained from the host. Once it has entered maintenance mode, it can be removed from the cluster and pulled from the rack to return the server or send it off for recycling. A node can also easily be repurposed for a different cluster or shipped to another location. Migration of data between clusters is also possible, leveraging vMotion technology that existing staff are already trained on. With vSAN, disk groups and individual drives can also be evacuated for removal.
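Before kicking off a full evacuation, it is worth sanity-checking that the surviving hosts can absorb the drained data. Here is a minimal pre-check sketch; the host names, capacity figures, and the 30% slack-space reserve are all assumptions for illustration (slack space is a common vSAN sizing rule of thumb, not an enforced limit):

```python
# Pre-check before "full evacuation" maintenance mode: will the remaining
# hosts have room for the evacuated data, while keeping some slack space
# free? All numbers below are hypothetical.
def can_evacuate(host_used_gb, host_capacity_gb, evacuating_host, slack=0.30):
    remaining_cap = sum(c for h, c in host_capacity_gb.items() if h != evacuating_host)
    remaining_used = sum(u for h, u in host_used_gb.items() if h != evacuating_host)
    projected_used = remaining_used + host_used_gb[evacuating_host]
    # Leave `slack` fraction of the surviving capacity unused.
    return projected_used <= remaining_cap * (1 - slack)

used = {"esx01": 4000, "esx02": 3500, "esx03": 3800, "esx04": 3600}
cap = {host: 8000 for host in used}
print(can_evacuate(used, cap, "esx01"))
```

If the check fails, the safer options are evacuating one disk group at a time or adding capacity before pulling the node, rather than forcing a full evacuation into a nearly full cluster.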
How Do We Drive Beyond Price?
The hidden costs of traditional storage models come to light especially during operational processes like migrating data between platforms. Whether purchasing cars or infrastructure, make sure you look beyond the sales staff promising a low price, and make sure you understand the hidden fees that will be needed to keep things moving.
Can’t get enough of vSAN? Follow us on Twitter and Facebook!