
Monthly Archives: May 2012

Troubleshooting Storage Performance in vSphere – Part 1 – The Basics

I frequently present at the various VMware User Group (VMUG) meetings, VMworld and partner conferences. If you have ever attended one of my talks, you will know it is like trying to drink from a fire hose: it is hard to cover everything in just a 45-minute session. So I will take the time here to write a few blog posts that go over the concepts discussed in these talks in more detail (or at least more slowly). One of the most popular, yet very fast paced, talks I present is Troubleshooting Storage Performance in vSphere. I’ll slow things down a bit and discuss each topic here. This might just be a review for some of you, but hopefully as we get into more detail there will be some new nuggets of VMware-specific information that can help even the more advanced storage folks.

Today’s post covers just the basics: what is bad storage performance, and where do I measure it?

Continue reading

Host Profile: Syslog.global.logDir doesn’t match the specified criteria

Posted by Kyle Gleed, 30 May 2012

I recently received a call from a co-worker who was having problems getting his hosts to comply with a host profile he had created. He mentioned that when he first created the host profile everything was compliant, but after changing the Syslog.global.logDir attribute all the hosts became non-compliant. He noted that despite the host profile compliance failure, the hosts were logging to the shared datastore – so everything was working; it was just that the host profile compliance check was incorrectly failing on the Syslog.global.logDir attribute.

[Screenshot: host profile compliance error on the Syslog.global.logDir setting]

While troubleshooting, we found the problem was the syntax used to set the Syslog.global.logDir attribute. If you set the attribute by selecting the host and choosing “Configuration -> Advanced Settings -> Syslog -> global”, it shows the syntax as “[datastoreName]/logdir” (note the forward slash (/) before the directory name):

[Screenshot: Advanced Settings showing the Syslog.global.logDir value as [datastoreName]/logdir]

This was the syntax he initially used to set the Syslog.global.logDir attribute in the host profile editor.  However, what we found is that the host profile compliance checker doesn’t like the slash (/).  Through trial and error we discovered that in order for the host profile to properly recognize that the Syslog.global.logDir value is set correctly we had to replace the forward slash (/) with a space:

[Screenshot: host profile editor with the datastore name and directory separated by a space]

Once we replaced the slash (/) with a space and ran another check all the hosts fell into compliance. 

[Screenshot: compliance check passing for all hosts]

I verified this is a known issue, so it should be fixed in a future patch. Until then, when using host profiles to set the Syslog.global.logDir attribute, remember to replace the forward slash (/) with a space when specifying the directory name.

Also, remember that anytime you configure your hosts to log to a common datastore, you should set Syslog.global.logDirUnique to true so that each host logs to a separate sub-directory.

[Screenshot: Syslog.global.logDirUnique set to true]
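
If you would rather set these options on your hosts with a script instead of the vSphere Client, here is a minimal, hypothetical sketch using the Python SDK (pyVmomi). It assumes you already have a vim.HostSystem object called host; the datastore and directory names are placeholders, and the value uses the space-separated datastore path form discussed above:

    # Hypothetical pyVmomi sketch; assumes 'host' is a vim.HostSystem you have already
    # retrieved, and that 'SharedDatastore' and 'logs' are placeholder names.
    from pyVmomi import vim

    opt_mgr = host.configManager.advancedOption
    opt_mgr.UpdateOptions(changedValue=[
        # Space between the datastore name and the directory, not a slash, so the
        # value matches what the host profile compliance check expects.
        vim.option.OptionValue(key='Syslog.global.logDir',
                               value='[SharedDatastore] logs'),
        # One sub-directory per host when logging to a shared datastore.
        vim.option.OptionValue(key='Syslog.global.logDirUnique', value=True),
    ])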

For notification of future posts, follow me on Twitter: @VMwareESXi

 

HA / Distributed vSwitch problems after Storage vMotion: scripts available

By Duncan Epping, Principal Architect.

Most of you will have read about this issue by now: vSphere 5.0 HA cannot fail over a virtual machine that has been Storage vMotioned while connected to a distributed vSwitch. Both William Lam and Alan Renouf were kind enough to write scripts to detect and resolve this problem. They published them on their blogs a while back, and at the same time our team internally put the scripts through official testing.

I was just informed that both the PowerCLI and the Perl-based scripts have been officially approved and attached to the KB article that describes this problem. If you are running vSphere 5.0 (or above), I recommend downloading the scripts and checking whether any of your virtual machines are impacted.

http://kb.vmware.com/kb/2013639

I would also like to thank all the teams involved in getting the KB updated and the scripts developed and tested. Great work, if I do say so myself.

[EDIT, 01 June: as there are still problems with the KB article, I have attached it to this post; see the link below]

Download KB 2013639 – HA-FDM fails to restart a virtual machine

Oldies, but goodies – Site Recovery Manager Setup Videos

I recently stumbled across a bunch of recordings that I made for the VMware KBTV channel some time ago. If you are looking for a short overview on how to configure SRM with some different storage arrays and replication technologies, or how to use some SRM features like shared recovery site or the DR IP customizer tool, you might like to check these videos out.

In no particular order:

1. Configure an EMC Celerra for SRM [embedded video]

2. Configure an EMC Clariion for SRM [embedded video]

3. Configure a NetApp for SRM [embedded video]

4. Configure an HP/LeftHand for SRM [embedded video]

5. How to set up the Shared Recovery Site SRM feature [embedded video]

6. How to use the DR IP Customizer utility [embedded video]

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

Introduction to the vSphere API Part 3 – Inventory Hierarchy

By William Lam, Sr. Technical Marketing Engineer

In Part 2 of the series, we took a look at the vSphere Object Model and how objects such as a Virtual Machine, ESXi host, Datacenter, etc. are represented in the API and how to access their properties and capabilities. In this article, we will take a look at the vSphere Inventory Hierarchy and how to go about navigating and searching through the vSphere Objects.
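
As a quick taste before you continue reading, here is a minimal, hypothetical sketch using the Python SDK (pyVmomi); the vCenter address and credentials are placeholders. It uses a ContainerView to search the inventory from the root folder and list every virtual machine, without manually recursing through Datacenter and Folder objects:

    # Hypothetical pyVmomi sketch; connection details are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()   # lab use only: skips certificate verification
    si = SmartConnect(host='vcenter.example.com', user='administrator',
                      pwd='password', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        # A ContainerView is one way to search the inventory hierarchy for all
        # objects of a given type below a starting container.
        vm_view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in vm_view.view:
            print(vm.name, vm.runtime.powerState)
    finally:
        Disconnect(si)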

Continue reading

Mem.MinFreePct sliding scale function

One of the cool “under the hood” improvements vSphere 5 offers is the sliding scale function of Mem.MinFreePct.

Before diving into the sliding scale function, let’s take a look at Mem.MinFreePct itself. MinFreePct determines the amount of memory the VMkernel should keep free. This threshold is subdivided into several memory states, i.e. High, Soft, Hard and Low, which were introduced to prevent performance and correctness issues. The threshold for the Low state is required for correctness; in other words, it protects the VMkernel layer from PSODs resulting from memory starvation. The Soft and Hard thresholds are about virtual machine performance and memory starvation prevention. The VMkernel triggers more drastic memory reclamation techniques as it approaches the Low state. If the amount of free memory is just a bit less than the MinFreePct threshold, the VMkernel applies ballooning to reclaim memory. Ballooning introduces the least performance impact on the virtual machine because it works together with the guest operating system inside the virtual machine; however, there is some latency involved. Memory compression helps avoid hitting the Low state without impacting virtual machine performance, but if memory demand outpaces the VMkernel’s ability to reclaim, a more drastic measure is taken to avoid memory exhaustion: swapping. Swapping, however, degrades virtual machine performance, and for this reason it is only used when desperate moments call for drastic measures. For more information about reclamation techniques I recommend reading the “disable ballooning” article.

vSphere 4.1 allowed the user to change the default MinFreePct value of 6% to a different value, and introduced dynamic thresholds for the Soft, Hard and Low states to set appropriate levels and prevent virtual machine performance issues while protecting VMkernel correctness. By default, the vSphere 4.1 thresholds were set to the following values:

Free memory state Threshold Reclamation mechanism
High 6% None
Soft 64% of MinFreePct Balloon, compress
Hard 32% of MinFreePct Balloon, compress, swap
Low 16% of MinFreePct Swap

 

Using a default MinFreePct value of 6% can be inefficient in times where 256GB or 512GB systems are becoming more and more mainstream. A 6% threshold on a 512GB will result in 30GB idling most of the time. However not all customers use large systems and prefer to scale out than to scale up. In this scenario, a 6% MinFreePCT might be suitable. To have best of both worlds, ESXi 5 uses a sliding scale for determining its MinFreePct threshold.

Threshold Memory range
6% 0-4GB
4% 4-12GB
2% 12-28GB
1% Remaining memory

 

Let’s use an example to explore the savings of the sliding scale technique. On a server configured with 96GB RAM, the MinFreePct threshold will be set at 1597.44MB, as opposed to 5898.24MB if 6% was used across the full 96GB.

Free memory state Threshold Range Result
High 6% 0-4GB 245.76MB
  4% 4-12GB 327.68MB
  2% 12-28GB 327.68MB
  1% Remaining memory 696.32MB
Total High Threshold     1597.44MB

 

Due to the sliding scale, the MinFreePct threshold will be set at 1597.44MB, resulting in the following Soft, Hard and Low thresholds:

Free memory state Threshold Reclamation mechanism Threshold in MB
Soft 64% of MinFreePct Balloon 1022.36
Hard 32% of MinFreePct Balloon, compress 511.18
Low 16% of MinFreePct Balloon, compress, swap 255.59
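
To make the arithmetic above easy to reproduce, here is a small Python sketch of the sliding-scale calculation (pure math for illustration only, nothing that runs on a host); the bracket sizes and percentages come straight from the tables above:

    # Sliding-scale MinFreePct arithmetic from the tables above (illustration only).
    def min_free_mb(host_memory_gb):
        """Return the ESXi 5 sliding-scale MinFreePct threshold in MB."""
        brackets = [      # (bracket size in GB, percentage kept free)
            (4, 0.06),    # first 4GB at 6%
            (8, 0.04),    # 4-12GB at 4%
            (16, 0.02),   # 12-28GB at 2%
        ]
        remaining = host_memory_gb
        free_mb = 0.0
        for size_gb, pct in brackets:
            portion = min(remaining, size_gb)
            free_mb += portion * 1024 * pct
            remaining -= portion
        return free_mb + max(remaining, 0) * 1024 * 0.01   # remaining memory at 1%

    high = min_free_mb(96)                   # 1597.44MB on a 96GB host
    print('High: %.2fMB' % high)
    print('Soft: %.2fMB' % (high * 0.64))    # 1022.36MB
    print('Hard: %.2fMB' % (high * 0.32))    # 511.18MB
    print('Low : %.2fMB' % (high * 0.16))    # 255.59MB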

 

Although this optimization isn’t as sexy as Storage DRS or one of the other new features introduced by vSphere 5, it is one that helps you drive your environments to higher consolidation ratios.

vCLI + ESXCLI Authentication Options

By William Lam, Sr. Technical Marketing Engineer

Did you know that the vCLI (which includes ESXCLI) offers several different authentication options? This is not a very well known fact, so I thought I would share some of the different options, as this question comes up from time to time.

Continue reading

SDRS maintenance mode impossible because “The virtual machine is pinned to a host.”

By Frank Denneman, Sr. Technical Marketing Architect

When virtual machines that have disks set to independent (and therefore excluded from snapshots) are placed in a datastore cluster, attempting to put a datastore into Storage DRS maintenance mode results in the error "The virtual machine is pinned to a host."

By default, Storage DRS will not move virtual machines with independent disks. Independent disks may or may not be shared, and determining whether a disk is shared is a very expensive operation for the algorithm, as Storage DRS would need to inspect every virtual machine and its disks in the datastore cluster. To reduce the overhead Storage DRS generates on the virtual infrastructure, it simply does not recommend such moves.

However, after getting feedback from customers and the community forums about the use of non-shared independent disks in a datastore cluster, the engineering team released the vpxd.cfg option sdrs.disableSDRSonIndependentDisk. By default this option is not listed in vpxd.cfg and is treated as true. When it is specified and set to false, Storage DRS will move independent disks and the error “The virtual machine is pinned to a host” will no longer appear. Remember that this option automatically applies to all datastore clusters managed by that vCenter Server!

Please note that this option should only be used with non-shared independent disks! Moving shared independent disks is not supported.
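
For reference, the entry in vpxd.cfg would look something like the sketch below. This is an illustration only: it assumes the dotted option name maps onto nested XML elements inside the existing <config> element, as is the usual vpxd.cfg convention, and the vCenter Server service needs a restart before a vpxd.cfg change takes effect.

    <!-- Illustrative vpxd.cfg fragment (assumed nesting based on the dotted option
         name); merge into the existing <config> element rather than replacing it. -->
    <config>
      <sdrs>
        <disableSDRSonIndependentDisk>false</disableSDRSonIndependentDisk>
      </sdrs>
    </config>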

vSphere Metro Storage Cluster white paper released!

By Duncan Epping, Principal Architect.

***Download VSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES***

A brand new white paper has just been published. It was written by Lee Dilworth, Ken Werneburg, Frank Denneman, Stuart Hardman and me. It is a white paper on vSphere Metro Storage Cluster solutions and specifically looks at things from a VMware perspective. Enjoy!

  • VMware vSphere Metro Storage Cluster (VMware vMSC) is a new configuration within the VMware Hardware Compatibility List. This type of configuration is commonly referred to as a stretched storage cluster or metro storage cluster. It is implemented in environments where disaster/downtime avoidance is a key requirement. This case study was developed to provide additional insight and information regarding operation of a VMware vMSC infrastructure in conjunction with VMware vSphere. This paper will explain how vSphere handles specific failure scenarios and will discuss various design considerations and operational procedures.

    http://www.vmware.com/resources/techresources/10299

I am also aiming to have the Kindle/ePub versions up soon and will let you know when they are released!

NFS Block Sizes, Transfer Sizes & Locking

Posted by Cormac Hogan
Technical Marketing Architect (Storage)

I've had a few questions recently around the I/O characteristics of VMware's NFS implementation. I'm going to use this post to answer the common ones.

 

NFS Block Sizes

The first of these questions is usually around the block size used by NFS. The block size on an NFS datastore is based purely on the block size of the native filesystem on the NFS server or NAS array, so the size depends solely on the underlying storage architecture of the server or the array.

The block size has no dependency on the Guest Operating System block size (a common misconception), because the Guest OS's virtual disk (VMDK) is just a flat file created on the server/array. This file is subject to the block sizes enforced by the NFS server's or NAS array's filesystem.

One more interesting detail: when an fsstat is done on the NFS mount from the ESXi client, the ESXi NFS client always reports the default file block size as 4096. Here is an example of this, using the vmkfstools command to look at the file block size:

[Screenshot: vmkfstools output showing a 4K file block size]
 

Maximum Transfer Sizes

The NFS datastore's block size is different from the maximum read and write transfer sizes. The maximum read and write transfer sizes are the chunks in which the client communicates with the server. A typical NFS server could advertise 64KB as the maximum transfer size for reads and writes. In this case, a 1MB read would be broken down into 16 x 64KB reads. The point, however, is that this has nothing to do with the block size of the NFS datastore on the NFS server/NAS array.
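
To make the chunking concrete, here is a trivial Python sketch of the arithmetic (the 64KB value is only an example of what a server might advertise):

    # Illustration only: splitting a guest I/O into NFS transfers of the
    # advertised maximum transfer size.
    import math

    io_size_kb = 1024        # a 1MB read from the guest
    max_transfer_kb = 64     # example maximum transfer size advertised by the server
    transfers = math.ceil(io_size_kb / max_transfer_kb)
    print('%dKB read -> %d x %dKB transfers' % (io_size_kb, transfers, max_transfer_kb))
    # prints: 1024KB read -> 16 x 64KB transfers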

 

NFS (Version 3) Locking

Another common question I get is around NFS locking. In NFS v3, which is the version of NFS still used by vSphere, the client is responsible for all locking activities, such as lock liveness and enforcement. The client must 'heartbeat' the lock on a periodic basis to maintain it, and must verify the lock status before issuing each I/O to the file that the lock protects. The client holding the lock must periodically update the timestamp stored in the lock file to demonstrate lock liveness. If another client wishes to lock the file, it monitors that liveness by polling the timestamp. If the timestamp is not updated during a specific window of time (discussed below), the client holding the lock is presumed dead and the competing client may break the lock.

To ensure consistency, I/O is only issued to the file when the client is the lock holder and the lock lease has not expired. By default, there are 3 heartbeat attempts at 10-second intervals, and each heartbeat has a 5-second timeout. In the worst case, when the last heartbeat attempt times out, it will take 3 * 10 + 5 = 35 seconds before the lock is marked expired on the lock holder client. Before the lock is marked expired, I/O continues to be issued, even after failed heartbeat attempts.

Lock preemption on a competing client starts from the detection of a lock conflict. It then takes 3 polling attempts at 10-second intervals for the competing host to declare that the lock has expired and break it, and another 10 seconds to establish its own lock. Lock preemption therefore completes in 3 * 10 + 10 = 40 seconds before I/O starts to flow on the competing host.
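
The timings above fall straight out of the defaults; here is a small Python sketch of that arithmetic (illustration only, using the default values described above):

    # NFS v3 lock timing derived from the defaults described above (illustration only).
    heartbeat_attempts = 3      # heartbeat / polling attempts
    heartbeat_interval_s = 10   # seconds between attempts
    heartbeat_timeout_s = 5     # timeout of a single heartbeat attempt
    lock_establish_s = 10       # time for the competing host to establish its own lock

    # Worst case before the lock holder marks its own lock as expired.
    holder_expiry_s = heartbeat_attempts * heartbeat_interval_s + heartbeat_timeout_s
    # Time for a competing host to declare the lock expired, break it and take it over.
    preemption_s = heartbeat_attempts * heartbeat_interval_s + lock_establish_s

    print('Lock holder expiry: %d seconds' % holder_expiry_s)   # 35
    print('Lock preemption   : %d seconds' % preemption_s)      # 40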

 

Get notified of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage