Monitoring ESX hardware with PowerShell

Did you ever wonder where all the info under Health Status in VI Client comes from?

[Screenshot: Health Status sensors in VI Client]

It turns out that your ESX server has a server built into it that provides this information via CIM, a management standard owned and promoted by the DMTF standards body. A lot of the information you can find in this CIM interface is also available through the VI API, but quite a bit can only be found through CIM, most importantly the health status of the hardware hosting ESX. In this post we’ll go through some of the information that’s available, with samples of how to get at it.

When Microsoft released PowerShell v2 CTP3, they added some cmdlets that can access CIM. The implementation is very raw right now; for example, the cmdlets don’t even have help. Getting this stuff working is also a bit of a production: you need both PowerShell v2 CTP3 and WinRM 2.0 CTP3. This latest version of WinRM only supports Windows 2008, Windows Vista and Windows Mojave, so you XP users are out of luck. You’ll also need to enable basic authentication.
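Basic authentication is a setting on the WinRM client side. Here is a minimal sketch of one way to turn it on, assuming the WSMan: provider that ships with the WS-Man cmdlets is available in your build, the WinRM service is running, and you are at an elevated prompt. Check your own setup before relying on it.

```powershell
# Run from an elevated PowerShell prompt with the WinRM service running.
# Allows the local WinRM *client* to send Basic credentials.
Set-Item -Path WSMan:\localhost\Client\Auth\Basic -Value $true

# Confirm the setting took effect.
Get-Item -Path WSMan:\localhost\Client\Auth\Basic
```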

Once you’ve got it set up, I’ve written a pretty easy advanced function that lets you take advantage of it.

This function makes it pretty easy to access ESX hardware information, as you’ll see below. All you need to do is connect to your ESX server, then run the function with the name of a class you care about. You can find a complete list of classes in our documentation. Note that not all classes are available on all systems, and as far as I know the only way to find out for sure is to try it yourself.
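To give a rough idea of what such a function involves, here is a minimal sketch built on Get-WSManInstance. It is not the function from this post: the names Get-CimResourceUri and Get-EsxCimInstance are made up for illustration, it takes a host name and credential directly instead of using your existing connection, and it only handles standard DMTF classes (those starting with CIM_), whose resource URIs live under the DMTF WS-CIM schema namespace. Vendor-specific classes use other namespaces that this sketch does not cover.

```powershell
# Hypothetical helper: builds the WS-Man resource URI for a DMTF (CIM_*) class.
function Get-CimResourceUri {
    param([Parameter(Mandatory = $true)][string]$ClassName)

    if ($ClassName -notmatch '^CIM_') {
        throw "Only DMTF CIM_* classes are handled in this sketch: $ClassName"
    }
    "http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/$ClassName"
}

# Hypothetical wrapper: enumerates every instance of a class on one ESX host.
function Get-EsxCimInstance {
    param(
        [Parameter(Mandatory = $true)][string]$HostName,
        [Parameter(Mandatory = $true)][string]$ClassName,
        [Parameter(Mandatory = $true)][System.Management.Automation.PSCredential]$Credential,
        [switch]$IgnoreCertFailures
    )

    # ESX hosts often present a self-signed certificate, so skipping
    # the certificate checks is opt-in rather than the default.
    if ($IgnoreCertFailures) {
        $option = New-WSManSessionOption -SkipCACheck -SkipCNCheck -SkipRevocationCheck
    } else {
        $option = New-WSManSessionOption
    }

    Get-WSManInstance -Enumerate `
        -ConnectionURI "https://$HostName/wsman" `
        -ResourceURI (Get-CimResourceUri -ClassName $ClassName) `
        -Authentication Basic -Credential $Credential `
        -SessionOption $option
}

# Example call (esx01.example.com is a placeholder host name):
# Get-EsxCimInstance -HostName esx01.example.com -ClassName CIM_NumericSensor `
#     -Credential (Get-Credential) -IgnoreCertFailures
```

Keeping the URI construction in its own small function makes it easy to check without a live host, and gives one obvious place to add other class namespaces later.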

Here’s a sample script that puts this through its paces, and some screenshots of the results after I ran it against one of my ESX systems.

Let’s see it in action. First let’s check out all the numeric sensors that are on my system. The exact list of numeric sensors will vary from system to system, but common ones include fan speeds and temperature readings, as on my system.

[Screenshot: sample1, numeric sensor readings]

It’s a little strange that my fan speeds are far below their normal minimum values, yet that doesn’t seem to be causing problems: the ambient temperature is far below the maximum value. Note that for some reason all the values here are multiplied by 100, and that the temperatures are in Celsius (just like those snobby Europeans use). You can see all the possible values either by running the code or by referring to this specific class in the documentation.
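The factor of 100 is probably not arbitrary. In the DMTF schema, CIM_NumericSensor carries a UnitModifier property, a power-of-ten exponent that applies to the raw readings, and a value of -2 would produce exactly this effect. I am assuming that is what this host reports; check the UnitModifier column on your own system. A minimal sketch of the conversion, with a made-up helper name and a made-up reading:

```powershell
# Hypothetical helper: converts a raw CIM_NumericSensor reading to real units.
# real value = raw reading * 10^UnitModifier
function ConvertFrom-SensorReading {
    param(
        [Parameter(Mandatory = $true)][double]$RawReading,
        [Parameter(Mandatory = $true)][int]$UnitModifier
    )
    $RawReading * [math]::Pow(10, $UnitModifier)
}

# Made-up reading for illustration: a raw value of 2300 with UnitModifier -2,
# which works out to roughly 23 in the sensor's base units.
ConvertFrom-SensorReading -RawReading 2300 -UnitModifier -2
```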

Next, let’s look at some actual health status, first taking on CPU.

[Screenshot: sample2, CPU health state]

So what the heck does HealthState 5 mean? Well, I wasn’t sure at first, but one thing I am sure of is that the word uninscrutable is not in the DMTF’s collective vocabulary. To figure out what this mysterious 5 means, we again have to consult our trusty documentation. If we look at the HealthState entry we see rows for ValueMap and Values. All we have to do is find the position of 5 in ValueMap and read the text at the same position in Values. Doing that, we learn that HealthState 5 means “OK”.

This is true in general when you’re looking at health states: 5 always means that the thing is OK. A value of 0 means that the state is unknown. This is not necessarily a problem; for example, it may mean that the device in question is turned off. If you expect the thing to be on and useful, a value of 0 is a bad thing; otherwise it may not be worth worrying about. Any other value means you’ve got some sort of hardware problem that needs attention.
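To save a trip to the documentation every time, the ValueMap/Values pairing can be captured in a small lookup table. The labels below are the ones the DMTF schema lists for HealthState; the variable and function names are made up for this sketch.

```powershell
# HealthState ValueMap -> Values, as listed in the DMTF CIM schema.
$HealthStateNames = @{
    0  = 'Unknown'
    5  = 'OK'
    10 = 'Degraded/Warning'
    15 = 'Minor failure'
    20 = 'Major failure'
    25 = 'Critical failure'
    30 = 'Non-recoverable error'
}

# Hypothetical helper: turns a numeric HealthState into its text label.
function Get-HealthStateName {
    param([Parameter(Mandatory = $true)][int]$HealthState)

    if ($HealthStateNames.ContainsKey($HealthState)) {
        $HealthStateNames[$HealthState]
    } else {
        # Values outside the table are reserved or vendor-specific.
        "Unrecognized ($HealthState)"
    }
}

Get-HealthStateName -HealthState 5   # OK
```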

Now let’s look at my power supplies:

[Screenshot: sample3, power supply health state]

In this case I’ve got one power supply with health state 5 (OK) and one with state 0 (Unknown). Chances are my second power supply is not plugged in. (It’s in a datacenter far far away so I can’t check it.) In any event I’m not too worried about this Unknown value.

Let’s take a look at the health state of my storage. My ESX server has 3 disks, and as usual 5 is the best health state of all.

[Screenshot: sample4, disk health state]

This next one shows that CIM can also be used to see data that’s available through the VI API. In this case I get the total amount of storage on this ESX system, which happens to be the same as the amount of storage in my local datastore.

[Screenshot: sample5, total storage]

Finally, I take a look at some network stuff, in this case Ethernet ports. Ethernet ports don’t have a health status; that’s reserved for the cards. But I can see the link status using CIM. Here, vmnic0 and vmnic1 are plugged in while vmnic2 and vmnic3 are not.

[Screenshot: sample6, Ethernet port link status]

I hope you found this useful. There’s a lot of really great information waiting to be discovered through the CIM interface.