
VMware Virtual SAN Operations: Replacing Disk Devices

In my previous Virtual SAN operations article, “VMware Virtual SAN Operations: Disk Group Management,” I covered the configuration and management of Virtual SAN disk groups, and in particular the recommended operating procedures for managing them.

In this article, I take a similar approach and cover the recommended operating procedures for replacing flash and magnetic disk devices. In Virtual SAN, drives are replaced for two reasons: failures and upgrades. Whatever the reason, whenever a disk device needs to be replaced, it is important to follow the correct decommissioning procedure.

Replacing a Failed Flash Device

The failure of a flash device renders an entire disk group inaccessible (i.e. in the “Degraded” state) to the cluster, along with its data and storage capacity. One important observation is that a single flash device failure doesn’t necessarily mean that the running virtual machines will incur outages. As long as the virtual machines are configured with a VM Storage Policy in which “Number of Failures to Tolerate” is greater than zero, the virtual machine objects and components remain accessible. If there is available storage capacity within the cluster, the data resynchronization operation is triggered in a matter of seconds. The time this operation takes depends on the amount of data that needs to be resynchronized.

When a flash device fails, you must decommission the device from Virtual SAN before physically removing it from the host. The decommission process discards disk group memberships, deletes partitions, and removes stale data from all disks. Follow either of the disk device decommission procedures defined below.

Flash Device Decommission Procedure from the vSphere Web Client

  1. Log on to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the cluster object
  3. Go to the Manage tab and select Disk Management under the Virtual SAN section
  4. Select the disk group with the failed flash device
  5. Select the failed flash device and click the delete button

Note: If the disk claim rule in Virtual SAN is set to “Automatic”, the disk delete option won’t be available in the UI. Change the disk claim rule to “Manual” to gain access to the disk delete option.

Flash Device Decommission Procedure from the CLI (ESXCLI) (Pass-through Mode)

  1. Log on to the host with the failed flash device via SSH
  2. Identify the device ID of the failed flash device
    • esxcli vsan storage list
  3. Delete the failed flash device from the disk group
    • esxcli vsan storage remove -s <device id>

Note: Deleting a failed flash device will result in the removal of the disk group and all of its members.

  4. Remove the failed flash device from the host
  5. Add a new flash device to the host and wait for the vSphere hypervisor to detect it, or perform a device rescan.

Note: These steps are applicable when the storage controllers are configured in pass-through mode and support the hardware hot-plug feature.
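
Because deleting a flash device removes its whole disk group, it can help to see which devices share that disk group before running the remove command. The following is a minimal sketch, not an official tool: the function names vsan_devices and removal_impact are hypothetical, and it assumes that `esxcli vsan storage list` prints “Device:”, “Is SSD:” and “VSAN Disk Group UUID:” lines, in that order, for each claimed device. Check the output of your own ESXi build before relying on it.

```shell
#!/bin/sh
# Sketch only. Assumes each device block in `esxcli vsan storage list`
# contains "Device:", "Is SSD:" and "VSAN Disk Group UUID:" lines in that order.

# vsan_devices: reads the list output on stdin and prints one line per
# claimed device: "<device id> <ssd|hdd> <disk group UUID>".
vsan_devices() {
    awk '
        { sub(/^[ \t]+/, "") }   # strip the indentation of each field line
        /^Device: /               { dev = $0; sub(/^Device: /, "", dev) }
        /^Is SSD: /               { kind = ($0 == "Is SSD: true") ? "ssd" : "hdd" }
        /^VSAN Disk Group UUID: / { grp = $0; sub(/^VSAN Disk Group UUID: /, "", grp)
                                    print dev, kind, grp }
    '
}

# removal_impact SSD_ID: reads vsan_devices output on stdin and prints every
# device in the same disk group as SSD_ID, i.e. everything that leaves the
# Virtual SAN datastore when that flash device is deleted.
# Exits non-zero if SSD_ID is not listed as a flash device.
removal_impact() {
    awk -v ssd="$1" '
        { kind_of[$1] = $2; grp_of[$1] = $3; order[NR] = $1 }
        END {
            if (!(ssd in kind_of) || kind_of[ssd] != "ssd") exit 1
            for (i = 1; i <= NR; i++)
                if (grp_of[order[i]] == grp_of[ssd]) print order[i]
        }
    '
}
```

On a host, a hypothetical invocation would be `esxcli vsan storage list | vsan_devices | removal_impact <device id>`. The sketch only reads and prints; the removal itself is still the `esxcli vsan storage remove -s <device id>` step above.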

Upgrading a Flash Device

Before upgrading the flash device, you should ensure there is enough storage capacity available within the cluster to accommodate all of the currently stored data in the disk group, because you will need to migrate data off that disk group.

To migrate the data before decommissioning the device, place the host in maintenance mode and choose the suitable data migration option for the environment. Once all the data is migrated from the disk group, follow the flash device decommission procedures before removing the drive from the host.
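
The capacity requirement above is simple arithmetic: the free space on the rest of the cluster has to cover the data currently held in the disk group. The sketch below illustrates that check. The function name can_evacuate and the optional headroom percentage are hypothetical additions for illustration, and the figures have to be gathered by hand (for example from the Web Client). The ESXCLI maintenance mode command in the comment is given as I understand it; confirm the flags with `esxcli system maintenanceMode set --help` on your build.

```shell
#!/bin/sh
# Sketch only: compare the data to be moved with the free space elsewhere.

# can_evacuate USED_GB FREE_ELSEWHERE_GB [HEADROOM_PCT]
# Succeeds (exit 0) when the free capacity on the remaining disk groups
# covers the used capacity of the disk group being emptied, plus an
# optional safety margin expressed as a percentage (default 0).
can_evacuate() {
    used=$1
    free=$2
    headroom=${3:-0}
    # Integer arithmetic in GB: used data plus the optional margin.
    required=$(( used + used * headroom / 100 ))
    [ "$free" -ge "$required" ]
}

# Hypothetical follow-up on the host once the check passes (assumed flags):
#   esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData
```

For example, `can_evacuate 800 1000` succeeds, while `can_evacuate 800 1000 30` fails because 800 GB plus a 30% margin is 1040 GB.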

Replacing a Failed Magnetic Disk Device

Each magnetic disk is accountable for the storage capacity it contributes to a disk group and to the overall Virtual SAN datastore. Like flash devices, magnetic disk devices can be replaced because of failures or upgrades. The impact of a magnetic disk failure is smaller than the impact of a flash device failure. The virtual machines remain online and operational for the same reasons described above in the flash device failure section. The resynchronization operation is significantly less intensive than the one that follows a flash device failure. However, the time it takes again depends on the amount of data to be resynchronized.

As with flash devices, decommission a failed magnetic device from Virtual SAN before removing it from the host. This action allows Virtual SAN to perform the required disk group and device maintenance operations, and allows the subsystem components to update the cluster capacity and configuration settings.

vSphere Web Client Procedure (Pass-through Mode)

  1. Log in to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster
  3. Go to the Manage tab and select Disk Management under the Virtual SAN section
  4. Select the disk group with the failed magnetic device
  5. Select the failed magnetic device and click the delete button

Note: It is possible to perform decommissioning operations from ESXCLI in batch mode if required. Using ESXCLI does introduce a level of complexity that should be avoided unless it is thoroughly understood. It is recommended to perform these types of operations from the vSphere Web Client until you are sufficiently familiar with them.

Magnetic Device Decommission Procedure from the CLI (ESXCLI) (Pass-through Mode)

  1. Log in to the host with the failed magnetic device via SSH
  2. Identify the device ID of the failed magnetic device
    • esxcli vsan storage list
  3. Delete the magnetic device from the disk group
    • esxcli vsan storage remove -d <device id>
  4. Add a new magnetic device to the host and wait for the vSphere hypervisor to detect it, or perform a device rescan.
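
When several magnetic disks have to be retired, one way to reduce typing mistakes is to generate the remove commands and review them before running anything. The sketch below does only that. The function name hdd_remove_cmds is hypothetical, and it assumes that `esxcli vsan storage list` prints a “Device:” line followed by an “Is SSD:” line for each claimed device. It prints commands and never executes them.

```shell
#!/bin/sh
# Sketch only: dry-run generator for magnetic disk removals.

# hdd_remove_cmds ID...: reads `esxcli vsan storage list` output on stdin.
# For each requested ID that Virtual SAN lists as a magnetic disk
# ("Is SSD: false"), prints the matching remove command on stdout.
# Requested IDs that are flash devices or are not claimed are reported
# on stderr and skipped.
hdd_remove_cmds() {
    awk -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        { sub(/^[ \t]+/, "") }   # strip the indentation of each field line
        /^Device: /       { dev = $0; sub(/^Device: /, "", dev) }
        /^Is SSD: false$/ && (dev in want) {
            print "esxcli vsan storage remove -d " dev
            handled[dev] = 1
        }
        END {
            for (id in want)
                if (!(id in handled))
                    print "skipped (not a claimed magnetic disk): " id > "/dev/stderr"
        }
    '
}
```

A hypothetical use on the host would be `esxcli vsan storage list | hdd_remove_cmds <device id> <device id>`, followed by a manual review of the printed lines before running each one.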

Upgrading a Magnetic Disk Device

Before upgrading any of the magnetic devices, ensure there is enough usable storage capacity available within the cluster to accommodate the data from the device that is being upgraded. The data migration can be initiated by placing the host in maintenance mode and choosing a suitable data migration option for the environment. Once all the data is offloaded from the disks, proceed with the magnetic disk device decommission procedures.

In this particular scenario, it is imperative to decommission the magnetic disk device before physically removing it from the host. If the disk is removed from the host without performing the decommissioning procedure, data that is cached from that disk will end up being permanently stored in the cache layer. This could reduce the available amount of cache and eventually impact the performance of the system.

Note: The disk device replacement procedures discussed in this article are entirely based on storage controllers configured in pass-through mode. If the storage controllers are configured in RAID0 mode, follow the manufacturer’s instructions for adding and removing disk devices.

– Enjoy

For future updates on Virtual SAN (VSAN), Virtual Volumes (VVols), and other Software-defined Storage technologies as well as vSphere + OpenStack be sure to follow me on Twitter: @PunchingClouds

Infographic – Walk Through of VMware Availability

Here’s a visualization we put together to help people understand the various offerings from VMware that can positively affect your levels of availability.

Hope you like it!


IOUG Survey Webcast Replay Site

The permanent home for the IOUG “Oracle on vSphere” survey webcast is listed below along with the actual survey itself. The “Oracle on vSphere” VMware press book link is also listed.

IOUG Survey


IOUG Survey Webcast


IOUG Survey blogs


The official VMware press book and the definitive authority on the subject of Oracle on vSphere:


vSphere Replication Appliance Failure Prevention and Recovery

vSphere Replication is an excellent host-based, per-VM replication solution that is included with vSphere Essentials Plus Kit and higher editions. That’s right – if you have vSphere Essentials Plus or higher, you have vSphere Replication. There are several use cases for vSphere Replication: Migrating VMs from old hardware to new hardware, migrating VMs between data centers, and disaster recovery – with or without vCenter Site Recovery Manager (SRM) – to name a few. When talking with customers, we tend to cover the features and benefits for starters and move on to how it works – and then what happens when issues such as hardware failure, administrative mistakes, etc. occur.

In this article, we will not discuss all of the details around how it works, but at a high level, changed data for each protected VM is replicated from vSphere hosts at the source location to one or more vSphere Replication virtual appliance(s) at the target location. The vSphere Replication appliance(s) then write this replicated data to vSphere storage at the target location. This often raises questions about what happens if these vSphere Replication appliances go offline or are lost. That is what we will cover in this article.


Virtual SAN Backup with VDP – New White Paper

Hot off the press: A new white paper that discusses backing up virtual machines running on VMware Virtual SAN (VSAN) using VMware vSphere Data Protection (VDP). These are the main topics that are covered:

  • VDP Architectural Overview
  • Virtual SAN Backup using VDP
  • Factors Affecting Backup Performance

The paper details test scenarios, how backup transport modes affect CPU and memory utilization of the VDP virtual appliance, and how the vSphere hosts’ management network is impacted when the Network Block Device over Secure Sockets Layer (NBDSSL) transport mode is used. The paper concludes with a summary of observations, recommendations for deploying the VDP virtual appliance to a Virtual SAN datastore, and some discussion of transport modes and running concurrent backups. A special thank you goes to Weiguo He for compiling this data and writing this paper!

Click here to view/download VMware Virtual SAN Backup Using VMware vSphere Data Protection


VMware Configuration Guide for Virtual SAN HCL Component Updates

The Virtual SAN Configuration Guide has been updated with new components. We recently certified 12 SSDs, updated 4 existing SSD certifications, and updated firmware information for 2 HDDs. Make sure to visit the VMware Configuration Guide for Virtual SAN for more details!

Here is a list of changes:

New SSDs
•  NEC S3700 400GB SATA 2.5 MLC RPQ
•  NEC N8150-712

Updated SSD Certifications
• Samsung SM1625 800GB SAS SSD1
• Cisco UCS-SD800G0KS2-EP
• EMC XtremSF1400 PCIEHHM-1400M
• EMC XtremSF700 PCIEHHM-700M

Updated Drive Writes per Day (DWPD) for Samsung and Cisco drives
A new firmware, B210.06.04, was certified for EMC PCI-E SSDs

HDD Firmware Information Updates
•  Fujitsu HD SAS 6G 1.2TB 10K HOT PL 2.5” EP
•  Hitachi 6Gbps,900GB,10000r/min,2.5in.


Automatic Backup of VMs Migrated by SRM

A question came across my desk a few days ago around being able to automatically back up VMs that have been migrated by vCenter Site Recovery Manager (SRM). After a bit of thought, it seems this is fairly simple to solve. With SRM, I migrate VMs to a resource pool at my recovery site. Why you might ask? This resource pool is configured with shares set to “High”. This has no impact during normal operations, but when I migrate or fail over important workloads, I want to be sure these workloads have priority if there is contention for CPU and memory. However, this also creates a secondary benefit when it comes to backing up migrated VMs…



VMworld US 2014 in San Francisco: “Applications on Oracle on vSphere” Panel Discussion

Mike Adams and Don Sullivan conducted a panel discussion at VMworld in San Francisco in August that highlighted a group of very successful VMware customers. The panelists answered questions about their varied application implementations running on Oracle on vSphere. Their application stories ranged from SAP to homegrown travel applications, and from Oracle Identity Management to a broad sampling of higher education applications. The commonality was that they were all spectacularly successful.

Dan Young, who represented Indiana University in receiving the “VMware Innovation Award,” rushed over from the Marriott Marquis to join the panel and highlighted the day with entertaining commentary on how he was sitting on the panel and receiving the award the same week that 128,000 Indiana University students were registering for classes. He explained to the crowd how the virtualized database services model and the stability the architecture provided were responsible for affording him this opportunity. Jon Tucker from First National Bank of Nebraska in Omaha talked glowingly of the success of the bank’s Oracle Identity Management system, and Kris Cook of SITA described how virtualization enhanced their homegrown travel applications environment. Finally, Mike Peters of Conagra got a big laugh when he described how their 20-vCPU virtualized SAP Supply Chain system running on an Oracle DB was responsible for delivering all the frozen foods that the audience might consume that evening.

Each of these customers brings a unique experience with a common story: “This is my application and it is very important and it runs better because it runs on vSphere.”

For the first time, VMware also conducted an Oracle customer panel at VMworld Europe in Barcelona, but no video was produced of that event. The event in Europe featured Eric Mealy, CIO of Societe Generale bank of Luxembourg, and Tobias Appelo of Cap Gemini, who discussed their implementation at Centrica Energy in the UK.

Below is the YouTube link for the video from the San Francisco event.

VMworld US (SF) 2014 – Applications on Oracle on vSphere Customer Success Stories (Conagra, SITA, IU, FBNO)


Operationalizing VMware Virtual SAN: Configuring vCenter Alarms

VMware Virtual SAN has received an amazing response from the virtualization community. Now, as more and more customers complete the acquisition and implementation processes, we are receiving more requests for operational guidance. Day 2 operations is perhaps my favorite topic to explore. Essentially, the questions asked can be summed up as “OK, I have done the research, proved the concept, and now have this great new product. Help me learn the recommended practices to monitor, manage, and troubleshoot the inevitable issues that pop up with any software.” This question is the driver behind our new blog series, “Operationalizing VMware Virtual SAN”.

In this series, our aim is to take your most frequently asked questions about Virtual SAN operations and provide detailed recommendations and guidance. In the first article in this series, we look to answer the question “How do I configure vCenter Alarms for Virtual SAN?”

(Many thanks to William Lam (@vGhetto), Christian Dickmann (@cdickmann), Rawlinson Rivera (@PunchingClouds), and Ken Werneburg (@vmKen) for their much appreciated interest and contribution to this series): [Joe Cook: @CloudAnimal]


Getting Started with VMware Virtual SAN – Join the Inaugural #VSANchat on 12/2!

By now, you’ve probably heard about VMware Virtual SAN – the industry leader in hyper-converged software defined storage for virtual environments. You’ve probably also heard how VMware Virtual SAN can lower TCO by up to 50%, or how it is the first policy-driven storage product designed for vSphere environments that simplifies and streamlines storage provisioning and management.

But in case you haven’t heard, we’d like to invite you to #VSANchat – our inaugural Twitter chat, where our experts and the larger Storage community will be discussing how to get started with Virtual SAN and things to consider. On December 2nd, 2014, at 11am PT, we invite you to discuss tips, best practices and insight on everything VMware Virtual SAN.
