A powerful new feature of vSphere 6.7 is vSphere Health. vSphere Health works to identify and resolve potential issues before they have an impact on a customer’s environment. Telemetry data is collected from the vSphere environment and then used to analyze pre-conditions within customer environments. Findings can relate to stability as well as to incorrect configurations in vSphere. vSphere Health has allowed the detection of more than 100,000 potential problems per day, of which approximately 1,000 are resolved daily.
Pete Koehler, Sr. Technical Marketing Manager in the Storage and Availability Business Unit (SABU), recently discussed vSAN Health and the framework around vSphere that allows for intelligent environments. vSphere & vSAN Health are very similar as they work in the same manner via data collection and analysis. Let’s dig in a bit more.
vSphere Health currently contains ~30 different health checks that can be run against a vSphere environment. Based on recent reports, here are a few of the most popular health checks, measured by how often a discovered issue led customers to use “Ask VMware” for remediation guidance:
- Memory leak in the bnx2x driver: https://kb.vmware.com/s/article/53353
- ESXi host that uses Intel Controller X710 for 10GbE SFP+ NIC fails with a PSOD: https://kb.vmware.com/s/article/2126909
- L1 Terminal Fault (L1TF – VMM) Speculative-Execution vulnerability in Intel processors: https://kb.vmware.com/s/article/55806
- ESXi 6.5 host fails with PSOD when IPv6 is disabled: https://kb.vmware.com/s/article/2150794
VMware Analytics Cloud (VAC)
VMware Analytics Cloud (VAC) is the platform that enables VMware products, both on-premises and SaaS, to send telemetry data to VMware. vSphere Health works in conjunction with the Customer Experience Improvement Program (CEIP) to send anonymous data to VAC for analysis, which in turn provides the assessment shown within the vSphere Client.
When VAC discovers issues after analyzing the vSphere data, it provides resolutions and recommendations to guide the customer through remediation. These health checks can be incredibly helpful in making a customer aware of issues and showing how to resolve them. As new vSphere Health alerts/alarms are introduced, they are asynchronously displayed in customer vSphere environments that are running vSphere 6.7 GA or higher. This model enables VMware to enhance issue detection in the datacenter without updating or upgrading the vSphere installation.
Telemetry data is collected and passed to VAC for analysis where data science tools and methodologies are used to provide analytical insights from the data collected. Once analyzed, insights and recommendations are sent to vSphere Health to be displayed for the customer to review and take action.
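To make the flow above more concrete, here is a minimal Python sketch of how analysis-side health checks could be modeled as rules evaluated against telemetry. This is not VMware's implementation: the `HealthCheck` class, the `analyze` function, and the telemetry field names (`host_id`, `esxi_version`, `ipv6_enabled`) are all hypothetical and exist only for illustration. Only the KB link is taken from the list earlier in this post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical telemetry record: one dict of facts per host.
Telemetry = Dict[str, object]


@dataclass
class HealthCheck:
    """A rule that lives on the analysis side, not inside the vSphere installation."""
    name: str
    kb_url: str                               # remediation article shown with a finding
    is_affected: Callable[[Telemetry], bool]  # True when a host matches the known issue


def analyze(hosts: List[Telemetry], checks: List[HealthCheck]) -> List[dict]:
    """Return one finding per check, listing the hosts that match it."""
    findings = []
    for check in checks:
        affected = [h["host_id"] for h in hosts if check.is_affected(h)]
        findings.append({
            "check": check.name,
            "status": "red" if affected else "green",
            "affected_hosts": affected,
            "ask_vmware": check.kb_url,
        })
    return findings


# Illustrative rule only; the field names and matching logic are assumptions.
ipv6_check = HealthCheck(
    name="ESXi 6.5 host fails with PSOD when IPv6 is disabled",
    kb_url="https://kb.vmware.com/s/article/2150794",
    is_affected=lambda h: h.get("esxi_version") == "6.5" and h.get("ipv6_enabled") is False,
)

hosts = [
    {"host_id": "host-a", "esxi_version": "6.5", "ipv6_enabled": False},
    {"host_id": "host-b", "esxi_version": "6.7", "ipv6_enabled": False},
]

for finding in analyze(hosts, [ipv6_check]):
    print(finding["check"], finding["status"], finding["affected_hosts"])
```

Because each check in this sketch is just data plus a predicate, adding a new check means appending to the list passed to `analyze`. That mirrors, in a simplified way, how new health checks can appear without upgrading vSphere itself.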
It is important to understand the data that is collected. Telemetry data received from customer environments is all anonymous and secured in VMware’s datacenters. By default, hostnames, virtual machine names, and network information are not exposed or saved within the dataset.
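As a rough illustration of what keeping identifying details out of a dataset can look like, the Python sketch below drops identifying fields from a record before it is stored. The field names and the `strip_identifiers` helper are hypothetical. This post does not describe the real telemetry schema or how VMware performs this step.

```python
# Hypothetical field names; the real telemetry schema is not described in this post.
IDENTIFYING_FIELDS = {"hostname", "vm_name", "ip_address", "mac_address"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


raw = {
    "hostname": "esx01.example.com",
    "ip_address": "192.0.2.10",
    "esxi_version": "6.7",
    "nic_driver": "bnx2x",
}
print(strip_identifiers(raw))
```

The sketch keeps only what a health check would need, such as versions and driver names, while leaving the original record untouched.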
Collected data is available only for internal VMware use and is strictly not shared with third parties. Data is accessible to VMware employees on a need-to-know basis only. All data flowing into VMware Analytics Cloud is governed by the VMware-wide CEIP program, which is pre-vetted by the Product Analytics Data Governance Board. The Product Analytics Governance Board is composed of members from Legal, Engineering, Product, Security, Sales and IT who define the program guidelines. This team regularly monitors the program to ensure compliance.
What is CEIP?
Since vSphere Health leverages CEIP, let’s start by explaining the CEIP program. “VMware’s Customer Experience Improvement Program (“CEIP”) provides information that helps VMware to improve our products and services, fix problems and advise you on how best to deploy and use our products. When you choose to participate in the Customer Experience Improvement Program (CEIP), VMware receives anonymous information to improve the quality, reliability, and functionality of VMware products and services.”
What data is collected via CEIP?
CEIP can be enabled in a few different ways: from the vSphere Client, during a vCenter Server migration or upgrade, or from the CLI. Here is an example of configuring CEIP via the vSphere Client.
Using vSphere Health
Now that we understand how vSphere Health is enabled and its prerequisites, such as CEIP, we can review how to use this feature of vSphere 6.7.
From the vSphere Client, begin by clicking on the vCenter Server in the Hosts & Clusters view. Next, click on Monitor and then on Health. This view shows vSphere Health from the vCenter Server perspective, including any clusters and hosts it is managing. Customers can quickly see successful health checks as well as any potential issues.
By clicking on a specific health check, we can get more information. In this example I have clicked on “Customer Experience Improvement Program (CEIP)” and can review the CEIP status, configure CEIP directly from here, or click Ask VMware, which jumps to a webpage or KB article that provides more information or steps to resolve the issue found.
This next example allows us to review a potential issue as well as its resolution from VMware.
We can see that vSphere Health has found the L1TF Intel processor vulnerability on all 4 of my ESXi hosts.
Next, when I click on Info, I can review the findings in more detail.
When I click on Ask VMware, I am taken to the VMware KB article that can help resolve the “L1 Terminal Fault” vulnerability.
In this last example, we review the vSphere Health check for vMotion when using a vSphere Distributed Switch. Since this health check is green, there are no issues, but we can still learn from the detailed info provided with the health check.
Clicking on the Info tab allows the customer to review how and why MTU settings are important for vMotion operations. If additional info is needed, simply click the Ask VMware link to learn more. I feel this is a useful way to understand how or why something is configured per VMware practices.
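To show why MTU settings lend themselves to an automated check, here is a small Python sketch of an MTU consistency test. It is not the logic vSphere Health uses, and the `mtu_issues` function and its inputs are hypothetical. It assumes two simple rules: a vMotion VMkernel adapter should not use a larger MTU than the switch it connects to, and the VMkernel MTU should be the same on every host.

```python
from typing import Dict, List


def mtu_issues(switch_mtu: int, vmk_mtu_by_host: Dict[str, int]) -> List[str]:
    """List human-readable MTU problems; an empty list means the check is green."""
    issues = []
    for host, mtu in sorted(vmk_mtu_by_host.items()):
        if mtu > switch_mtu:
            issues.append(f"{host}: vMotion VMkernel MTU {mtu} exceeds switch MTU {switch_mtu}")
    if len(set(vmk_mtu_by_host.values())) > 1:
        issues.append("vMotion VMkernel MTU differs between hosts")
    return issues


print(mtu_issues(9000, {"host-a": 9000, "host-b": 9000}))   # no issues
print(mtu_issues(1500, {"host-a": 9000, "host-b": 1500}))   # two issues
```

A real check would also need to consider the physical network between hosts, which this sketch ignores.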
As you have learned, vSphere Health can be very valuable for VMware customers: it allows the environment to gain self-intelligence and can dramatically reduce potential configuration mistakes and datacenter vulnerabilities. Since vSphere Health is an emerging feature of vSphere 6.7, keep in mind that it will evolve over time from what it is today. This evolution of proactive issue detection is just the beginning of extending automation between cloud and on-premises for datacenter remediation purposes.
To learn more about vSphere Health please refer to the following:
- Check vSphere Health in vSphere Client
- VMware products that participate in the CEIP program
- vSphere Documentation
- Dynamic vSphere Health Checks in vSphere 6.7+
Please do not hesitate to post questions or even suggestions for additional vSphere Health checks in the comments below!