Monthly Archives: May 2012

ESX Hosts Randomly Lose Paths in Virtualized Storage Environment

Another post today in the Nathan Small (Twitter handle: vSphereStorage) series covering deep-dive log analysis in ESX/ESXi environments. Here Nathan takes a look at a case where ESX 5 hosts randomly lose paths in an environment with HP EVA and EMC VMAX arrays behind an EMC VPLEX. The root cause of the path loss was the massive I/O load on the backend array, specifically the HP EVA. The resolution is to reduce the I/O load or spread it out over a longer period of time. Let's see how the story unfolded.

The ESX servers in front of the EMC VPLEX are experiencing path failures due to SCSI commands that time out. These SCSI commands range from 10-byte read and write operations to LUN heartbeat mechanisms (TEST_UNIT_READY). In this example, we can see that LUN 1, or “Vplex_HP1”, disappeared for less than a second. Here is an example of the command timeout that leads to path failures. These messages can be pulled from /var/log/vobd.log:

2012-05-22T07:02:04.680Z: [scsiCorrelator] 1680920122975us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T1:L1 changed state from on

2012-05-22T07:02:04.681Z: [scsiCorrelator] 1680923124828us: [esx.problem.storage.redundancy.degraded] Path redundancy to storage device naa.6000144000000010a00f545a97c5704a degraded. Path vmhba2:C0:T1:L1 is down. Affected datastores: "Vplex_HP1".

2012-05-22T07:02:04.775Z: [scsiCorrelator] 1680920217911us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T1:L1 changed state from dead

2012-05-22T07:02:04.776Z: [scsiCorrelator] 1680923219503us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.6000144000000010a00f545a97c5704a (Datastores: "Vplex_HP1") restored. Path vmhba2:C0:T1:L1 is active again.
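To check the "less than a second" claim, the dead and on events can be paired up per path and the timestamps subtracted. The following is a minimal sketch, assuming the vobd.log line layout shown in the entries above; the function name `path_outages` and the regular expression are illustrative and not part of any VMware tool.

```python
import re
from datetime import datetime

# Pulls the timestamp, new state, and path name out of a vobd.log path-state entry.
PATH_EVENT = re.compile(
    r"^(?P<ts>\S+Z): .*\[vob\.scsi\.scsipath\.pathstate\.(?P<state>dead|on)\] "
    r"scsiPath (?P<path>\S+)"
)

def path_outages(lines):
    """Yield (path, seconds_dead) for each dead -> on transition found."""
    went_dead = {}  # path name -> time it was marked dead
    for line in lines:
        m = PATH_EVENT.match(line)
        if not m:
            continue  # not a path-state entry
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        if m["state"] == "dead":
            went_dead[m["path"]] = ts
        elif m["path"] in went_dead:
            yield m["path"], (ts - went_dead.pop(m["path"])).total_seconds()

sample = [
    "2012-05-22T07:02:04.680Z: [scsiCorrelator] 1680920122975us: "
    "[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T1:L1 changed state from on",
    "2012-05-22T07:02:04.775Z: [scsiCorrelator] 1680920217911us: "
    "[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T1:L1 changed state from dead",
]

outages = list(path_outages(sample))
for path, seconds in outages:
    print(f"{path} was dead for {seconds * 1000:.0f} ms")
```

For the two path-state entries above, the path is dead for roughly 95 milliseconds.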

The above messages are similar to what is observed in vCenter. Here are the entries from the vmkernel log. Note that extended QLogic driver logging was enabled:

2012-05-22T07:02:04.680Z cpu12:4628)<6>scsi(8:0:1:1): TIMEOUT status detected 0x6-0x0

2012-05-22T07:02:04.680Z cpu7:2200)PowerPath: EmcpEsxLogEvent:1252: Info:emcp:MpxEsxTestAndUpdatePathState: Path "vmhba2:C0:T1:L1" is changing to dead from on.

Here we have a command timeout, and the command that timed out was a TEST_UNIT_READY, or SCSI command 0x0. Even though we cannot confirm the exact command that failed, we can infer this to be the case since the path is marked as dead immediately. If any other command had failed, we would see NMP errors, and the host would then send out a TEST_UNIT_READY to check that the path is still good to use. Since this wasn’t the case, we have to assume that this command was sent out as part of the Disk.PathEval code that runs every 300 seconds (default).

There is another ESX environment connected to the same backend HP array; however, these hosts do not go through the EMC VPLEX to reach the HP EVA LUNs. During this same time period, that other environment shows many command failures due to a SCSI device status of QUEUE FULL, which means the array cannot keep up with the I/O being issued to it:

2012-05-22T07:02:02.087Z cpu10:2058)ScsiDeviceIO: 2309: Cmd(0x4124403d5ac0) 0x2a, CmdSN 0x800e0011 from world 1829086 to dev "naa.6001438005ded7fb0000500009510000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-05-22T07:02:02.087Z cpu10:2058)ScsiDeviceIO: 2309: Cmd(0x4124414d9d40) 0x2a, CmdSN 0x800e007a from world 1829086 to dev "naa.6001438005ded7fb0000500009510000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

2012-05-22T07:02:04.755Z cpu12:2060)ScsiDeviceIO: 2309: Cmd(0x41244120a8c0) 0x2a, CmdSN 0x800e006e from world 1829086 to dev "naa.6001438005ded7fb00005000096c0000" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

The Queue Full condition is represented by a device status of D:0x28, which translates to TASK_SET_FULL. See KB article Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.0 (1030381) for more details, as well as other SCSI Device Statuses.
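As an illustration of how these failure lines can be decoded, here is a minimal sketch that extracts the opcode and the H:/D:/P: status fields and names the ones discussed in this post. It assumes the usual 0x hex notation in the log; the lookup tables cover only a handful of standard SCSI opcodes and status codes, and the names `FAILURE` and `decode` are hypothetical helpers, not a VMware API.

```python
import re

# Partial lookup tables: standard SCSI status codes and opcodes relevant here.
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}
OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches the command, device, and status fields of a ScsiDeviceIO failure line.
FAILURE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .* to dev "(?P<dev>[^"]+)" failed '
    r"H:(?P<host>0x[0-9a-f]+) D:(?P<device>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)"
)

def decode(line):
    """Return a dict describing the failure, or None if the line does not match."""
    m = FAILURE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    status = int(m["device"], 16)
    return {
        "device": m["dev"],
        "opcode": OPCODES.get(op, hex(op)),
        "device_status": DEVICE_STATUS.get(status, hex(status)),
        "queue_full": status == 0x28,
    }

line = (
    "2012-05-22T07:02:02.087Z cpu10:2058)ScsiDeviceIO: 2309: Cmd(0x4124403d5ac0) 0x2a, "
    'CmdSN 0x800e0011 from world 1829086 to dev "naa.6001438005ded7fb0000500009510000" '
    "failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0."
)
result = decode(line)
print(result)
```

Counting lines where `queue_full` is true per device gives a quick view of which LUNs are being pushed past what the array can queue.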

Even though this HP array cannot keep up with the load, no path failures are observed on these ESX hosts, mostly because the hosts see the QFULL condition, recognize it as transient, and therefore know the paths aren’t actually dead. The ESX hosts in front of the VPLEX never see this SCSI device status because the VPLEX does not return it to the hosts. Instead, the VPLEX retries the command for up to 16 seconds before giving up. The ESX hosts have a SCSI command timeout of 5 seconds, so they give up on the command and send an abort for it well before the VPLEX has stopped retrying. As a result, the VPLEX sees many command abort requests from the ESX hosts. The VPLEX keeps a counter of QFULL conditions received from the backend array, and reviewing this counter confirmed that the HP EVA was returning QFULL to the VPLEX.
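The mismatch between the two timers can be made concrete with a toy model. The sketch below uses only the two figures from the text (a 5-second host timeout and a 16-second VPLEX retry window); the `backend_busy_s` parameter and the function `host_view` are hypothetical, and real behavior depends on many factors not modeled here.

```python
HOST_TIMEOUT_S = 5   # ESX SCSI command timeout, per the text
VPLEX_RETRY_S = 16   # how long the VPLEX keeps retrying, per the text

def host_view(backend_busy_s):
    """Describe one command from the host's side.

    backend_busy_s is a hypothetical duration for which the backend array
    keeps answering QFULL to the VPLEX. Returns (outcome, host_done_s, vplex_done_s).
    """
    # The VPLEX stops retrying when the array recovers or its retry window ends.
    vplex_done = min(backend_busy_s, VPLEX_RETRY_S)
    if backend_busy_s < HOST_TIMEOUT_S:
        # The retry succeeds before the host loses patience.
        return ("completed", backend_busy_s, vplex_done)
    # The host times out and sends an abort while the VPLEX is still retrying.
    return ("aborted by host", HOST_TIMEOUT_S, vplex_done)

results = {busy: host_view(busy) for busy in (2, 8, 20)}
for busy, (outcome, host_done, vplex_done) in results.items():
    print(f"array busy {busy:>2}s: {outcome} at {host_done}s, VPLEX done at {vplex_done}s")
```

In this model, any congestion lasting 5 seconds or longer looks like a plain timeout to the host, which never learns that the underlying cause was a transient QFULL.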

The ESX hosts in front of the VPLEX are marking paths as dead because many commands are timing out, particularly TEST_UNIT_READY commands. If the QFULL condition were to be returned to the ESX hosts by the VPLEX, no paths would be marked as dead because the commands failed due to a condition that is meant to be transient. In the end, the amount of load on the HP EVA array needs to be spread out over a longer period of time or staggered more efficiently so that a QFULL condition does not occur.

There has been code introduced into ESX that will automatically reduce the queue depth of LUNs returning a QFULL status. For more information, you can read about it in KB article: Controlling LUN queue depth throttling in VMware ESX/ESXi (1008113).
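The general idea behind this kind of throttling can be sketched as a back-off loop. The following is a toy model only: halving the depth on QFULL and restoring it one step at a time after a run of good completions are assumptions made for illustration, and the names `throttle`, `max_depth`, and `good_needed` are hypothetical. The KB article describes the actual behavior and settings.

```python
def throttle(depth, events, max_depth=32, good_needed=4):
    """Toy adaptive queue-depth model (illustrative assumptions, not ESX code).

    depth:       starting LUN queue depth
    events:      sequence of "QFULL" or "GOOD" command completions
    max_depth:   ceiling the depth may recover to
    good_needed: consecutive good completions required before adding one slot
    Returns the queue depth after each event.
    """
    good = 0
    history = []
    for status in events:
        if status == "QFULL":
            depth = max(1, depth // 2)  # back off quickly under congestion
            good = 0
        else:
            good += 1
            if good >= good_needed and depth < max_depth:
                depth += 1              # recover slowly once the array keeps up
                good = 0
        history.append(depth)
    return history

history = throttle(32, ["QFULL", "QFULL"] + ["GOOD"] * 8)
print(history)
```

The asymmetry (fast reduction, slow recovery) is what lets a host ease off a struggling array without immediately re-flooding it.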

Bookmarks for My VMware KBs

It seems ages ago now that My VMware launched. The Knowledge Management team has produced dozens of articles covering various aspects of the portal and answering common questions we received from customers. We estimate that three man-months have gone into creating this content. The problem that follows is finding it all.

We decided to put together the grand-daddy of all My VMware KBs in the form of an index. We will be updating this index as time goes on if and when we add to the body of captured knowledge.

Click the following link and bookmark it if you are a frequent user of My VMware:

Index of My VMware articles (2020793)

My VMware Sneak Peek – New Customer Data

Here is the fourth in a series of videos aimed at providing an early look and insight into the new My VMware portal which launched a few weeks ago.

My VMware will transform your product license and support management experience by providing a new integrated, self-service, account-based interface focused on simplifying and streamlining your online activities with VMware.

Additional information is available at http://www.vmware.com/my_vmware/overview.html

You'll find a great deal of "How To" and Troubleshooting based articles in our Knowledge Base at http://kb.vmware.com by simply changing the product filter to "My VMware".

Knowledge Base Update

***UPDATE: The Knowledge Base is now available.

VMware is aware of an issue where the knowledge base becomes slow or unresponsive. We are actively troubleshooting the issue. We do not have an ETA at this time.

We want to communicate to our customers that the outage is unplanned, and being worked on by every resource necessary to get things back to normal. We sincerely apologize for the inconvenience and we will update this blog post as more news becomes available.

You can also watch our Twitter feed at: http://twitter.com/VMwareKB for updates.

Mandarin Articles introduced to the Knowledge Base

Today we would like to introduce our Mandarin-speaking customers to our first set of freshly translated Knowledge Base articles. These articles have been carefully chosen from our vast repository for their popularity and usefulness. Subscribe to this blog to receive notifications of more translations as they become available.

今天我们向中文客户介绍第一批翻译成中文的知识库文章,这些文章是从我们的知识库里根据点击率和实用性精心挑选的常用文章,订阅这个博客您将接收到关于更多翻译成中文的知识库文章发表的通知。

English title | English KB number | Mandarin title | Mandarin KB number
Overview of VMware tools | 340 | VMware Tools概述 | 2020928
Moving or copying virtual disks in a VMware environment | 900 | 在VMware虚拟环境下移动或拷贝虚拟磁盘 | 2020929
Cleaning up after incomplete uninstallation on a windows host | 1308 | Windows主机上未完全卸载后的清理 | 2020930
Adjusting ESX host time zone | 1436 | 调整ESX主机时区 | 2020931
VMotion CPU Compatibility Requirements for Intel Processors | 1991 | Intel处理器VMotion CPU的兼容性要求 | 2020948
Windows XP setup can not find any hard disk drives during installation-reviewed | 1000863 | Windows XP安装程序在安装过程中无法找到任何硬盘驱动器 | 2020953
Renaming a virtual machine disk(VMDK) via vSphere Management Assistant(vMA) or vSphere CLI(vCLI) | 1002491 | 使用vSphere Management Assistant(vMA)或vSphere CLI(vCLI)重新命名一虚拟机磁盘(VMDK) | 2020958
Recreating a missing virtual machine disk(VMDK) descriptor file | 1002511 | 重建丢失的虚拟机磁盘(VMDK)描述文件 | 2020962
Testing port connectivity with Telnet-Revised | 1003487 | 使用Telnet验证端口的连通性 | 2020963
Injecting SCSI controller device drivers into windows when it fails to boot after converting it with VMware converter | 1005208 | 使用VMware Converter转换后的Windows虚拟机无法启动手动加载SCSI控制器驱动程序 | 2020967
Time keeping best practices for Linux guests-reviewed | 1006427 | Linux系统时间同步最佳实践 | 2020975

Scheduled Maintenance – May 25, 2012

VMware will be performing a system upgrade to several VMware web applications on Friday, May 25th, 2012 from 7:00 PM until 8:30 PM Pacific Time. During this time, we request that you file your Support Requests via phone.

If you need to file a support request while the upgrade is in progress, our global toll-free numbers for support can be found at: http://www.vmware.com/support/phone_support.html

These system upgrades are part of our commitment to continued service improvements and will help VMware better serve your needs. We appreciate your patience during this maintenance period.

VMware View 5.1 Tech Preview – A deeper look at the View Composer API for Array Integration

To start off with the new week we have a new video which will demonstrate our View Composer API for Array Integration in VMware View 5.1 which is currently in a "Tech Preview" state. "Tech Preview" means that the feature is experimental and should only be used in a test and development environment. For more details read about VMware Experimental Feature Support.

Essentially, the feature lets you offload the creation of the linked clones that back your View desktops to the storage array. The main advantage of the View Composer API for Array Integration (VCAI) is improved performance and a shorter provisioning time for desktops based on linked clone pools, because the array provisions the linked clones natively rather than having the ESXi host do it.

This video features our friend Cormac Hogan who is a Senior Technical Marketing Architect for Storage here at VMware.  You can read more concerning this topic in: A closer look at the View Composer API for Array Integration [incl. Video].

You can follow Cormac at @VMwareStorage on Twitter and keep updated with all things related to VMware Storage by subscribing to the VMware Storage Blog.

How to open a command or shell prompt

Today we have a new video for you which discusses and demonstrates how to open a command or shell prompt on a Windows-based, Linux-based, or Mac OS-based system, as well as how to connect to a system using a Secure Shell (SSH) connection.

The process of opening a command or shell prompt is a required step in many of our KB articles. We realize this is very basic stuff for many of you, but many others have asked for this, and we're happy to oblige.

For additional details relating to this process check out VMware Knowledge Base article Opening a command or shell prompt (1003892).

For additional tutorial videos, be sure to subscribe to our KBTV YouTube channel and our KBTV Blog.

Announcing VMware’s first ever GSS Virtual Customer Support Day!

On June 7th, VMware will host its first online GSS Customer Support Day event. This event is a unique opportunity for you to get answers to your questions directly from our Global Support Services technical experts. You told us that you’d like to see an online version of the popular live events that we’ve hosted in various locations throughout the world. We listened, and now anyone, anywhere can tap in to the advanced technical expertise of our Global Support Services team in this dynamic, interactive forum. 

Participation is limited, so please register early.  The sessions will be recorded and posted in the Support Insider Blog for later viewing if you can’t attend at the scheduled time.

Thank you, and we look forward to seeing you on June 7th!

Please join us for our first ever GSS Virtual Customer Support Day. This event is the same style as we hold in various cities throughout the world but brought to you for the first time online.

On Thursday, June 7, 2012 please join us for a comprehensive and highly educational seminar. Expect to gain "best practices" and "tips and tricks" as provided by our Senior Technical Support Engineers. There will also be an invaluable Q&A session directly with our "in the trenches" support staff following each topic.

Agenda:

8:00 a.m.: Welcome and GSS Overview
8:30 a.m. – 9:30 a.m.: Networking Best Practices
9:30 a.m. – 10:30 a.m.: Performance Troubleshooting
10:30 a.m. – 11:30 a.m.: Storage Best Practices
11:30 a.m.: Wrap Up

Questions? Contact Beth Lawson.

The VMware Team

Register

June 7, 2012
8:00 a.m. – 12:00 p.m.
US – Pacific Daylight Time

Location

ONLINE ONLY

*This event will be recorded for those who cannot attend.

Random participants will be sent a raffle prize upon completion of the survey after the event.

My VMware Passwords

We have a couple of new KB articles just published to address some things a few customers have reported to us.

If you see the error Unable to Complete Your Request when resetting your password from within My VMware, head right on over to Troubleshooting the My VMware error: Unable to Complete Your Request (2018716).

Otherwise, if you're just having difficulty recovering a forgotten My VMware password, start with: Resetting a forgotten My VMware password fails (2020621).