In our last post, we looked at some SMBIOS-based asset information. In this post, we'll start to explore what information is available from IPMI.
As with our previous posts, let's explore the relevant technology a little before we dive into code. IPMI, the Intelligent Platform Management Interface, is a specification that defines how you can monitor and manage the health and hardware assets of a system using what is called a Baseboard Management Controller, or BMC. The BMC is a secondary processor/controller that runs in the system and typically has its own power tap on the power supply, so it's live even when the main system is powered off (as long as the power supply is plugged in). Most modern server-class x86 systems come with a BMC that supports IPMI.
The BMC is hooked up to various sensors and components on the system and can monitor their state independently of the primary CPU and the running operating system. Some examples of the things BMCs may monitor are chassis intrusion sensors, temperature sensors, presence sensors for devices like hot-plug disk drives, and power supplies. Often the BMC is available on the network through a secondary IP address and can be used for remote management of the system, even if it is powered off (say, to power it on, a capability which vCenter's Distributed Power Management feature leverages).
We're not going to focus on remote IPMI in this post, but instead look at the types of information you can retrieve from the BMC via the CIM implementation within ESX. The IPMI specification gives system manufacturers a lot of flexibility in the type and number of sensors they implement. As a result, the data you see on one system may not match the data you see on another, particularly if those systems are from different manufacturers.
IPMI data maps to a number of different profiles. We'll take a look at two in this set of example code, namely DSP1009 (Sensors Profile) and DSP1015 (Power Supply Profile). In this example, we'll also touch on another aspect of CIM technology that can be a little tricky for beginners. The CIM schema often relies on enumerated types, or as CIM refers to them, Value Maps. When you look at the value of a property in an instance, it may be numeric, but that number maps to a well-defined set of values captured in the class definition in the MOF. Let's look at the HealthState property as a simple example of how this works. Here's an excerpt from the CIM_ManagedSystemElement MOF where this property is defined.
      [Description (
          "Indicates the current health of the element. This "
          "attribute expresses the health of this element but not "
          "necessarily that of its subcomponents. The possible "
          "values are 0 to 30, where 5 means the element is "
          "entirely healthy and 30 means the element is completely "
          "non-functional. The following continuum is defined: \n"
          "\"Non-recoverable Error\" (30) - The element has "
          "completely failed, and recovery is not possible. All "
          "functionality provided by this element has been lost. \n"
          "\"Critical Failure\" (25) - The element is "
          "non-functional and recovery might not be possible. \n"
          "\"Major Failure\" (20) - The element is failing. It is "
          "possible that some or all of the functionality of this "
          "component is degraded or not working. \n"
          "\"Minor Failure\" (15) - All functionality is available "
          "but some might be degraded. \n"
          "\"Degraded/Warning\" (10) - The element is in working "
          "order and all functionality is provided. However, the "
          "element is not working to the best of its abilities. For "
          "example, the element might not be operating at optimal "
          "performance or it might be reporting recoverable errors. \n"
          "\"OK\" (5) - The element is fully functional and is "
          "operating within normal operational parameters and "
          "without error. \n"
          "\"Unknown\" (0) - The implementation cannot report on "
          "HealthState at this time. \n"
          "DMTF has reserved the unused portion of the continuum "
          "for additional HealthStates in the future." ),
       ValueMap { "0", "5", "10", "15", "20", "25", "30", ".." },
       Values { "Unknown", "OK", "Degraded/Warning",
          "Minor failure", "Major failure", "Critical failure",
          "Non-recoverable error", "DMTF Reserved" }]
   uint16 HealthState;
The bracketed block that precedes the property holds its "Qualifiers," which are typed metadata in CIM. This particular property has three qualifiers: "Description," "ValueMap," and "Values." Let's focus on the latter two. This pair of qualifiers tells us that we can interpret a numeric value as the logical equivalent of the English string captured in the Values array. The two arrays are correlated by position, so "0" maps to "Unknown," "5" maps to "OK," and so on. Let's look at an example sensor using the same algorithm from the code we looked at last time. (Hint: to try this out, just change the classname to CIM_Sensor in the previous posting's code.)
OMC_NumericSensor
BaseUnits = 2
Caption = System Board 1 Ambient Temp
CreationClassName = OMC_NumericSensor
CurrentReading = 4100
CurrentState = Normal
DeviceID = 50.0.32.99
ElementName = System Board 1 Ambient Temp
EnabledDefault = 2
EnabledState = 2
EnabledThresholds = [1L, 5L]
HealthState = 5
Hysteresis = 0
IpmiSensorType = 1
IsLinear = False
MaxReadable = 12700
MinReadable = -12800
MonitoredDeviceId = 7.1
Name = Ambient Temp(50.0.32.99)
NominalReading = 4200
OperationalStatus = [2L]
PollingInterval = 15000000000
PossibleStates = [u'Lower Critical', u'Lower Fatal', u'Lower Non-Critical', u'Normal', u'Unknown', u'Upper Critical', u'Upper Fatal', u'Upper Non-Critical']
RateUnits = 0
RequestedState = 12
SensorType = 2
SettableThresholds = [1L, 5L]
SupportedThresholds = [1L, 5L]
SystemCreationClassName = OMC_UnitaryComputerSystem
SystemName = 602cfd62-cdef-3922-bf8d-6c4a0cf42058
TimeOfCurrentReading = 20100511100502.000000+000
TransitioningToState = 12
UnitModifier = -2
UpperThresholdFatal = 7000
UpperThresholdNonCritical = 5500
You'll notice that HealthState is reported as "5," which, if we look back at the MOF, represents the value "OK." Now let's look at a little snippet of code that can help output these properties in a more human-readable form.
# Dictionary to cache class metadata
classData = {}

def friendlyValue(client, instance, propertyName):
    # Start out with a default empty string, in case we don't have a mapping
    mapping = ''
    if instance.classname not in classData:
        # Fetch the class metadata if we don't already have it in the cache
        classData[instance.classname] = client.GetClass(instance.classname, IncludeQualifiers=True)
    myClass = classData[instance.classname]
    # Now scan through the qualifiers to look for ValueMap/Values sets
    qualifiers = myClass.properties[propertyName].qualifiers
    if 'ValueMap' in qualifiers and 'Values' in qualifiers:
        vals = qualifiers['Values'].value
        valmap = qualifiers['ValueMap'].value
        value = instance[propertyName]
        # Find the matching value and convert to the friendly string.
        # Walk every ValueMap entry, including the last one.
        for i in range(len(valmap)):
            if str(valmap[i]) == str(value):
                mapping = ' (' + vals[i] + ')'
                break
    return mapping
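One thing to be aware of: the lookup above compares each ValueMap entry as a literal string, so range entries such as the ".." at the end of the HealthState ValueMap never match anything. Here's a minimal sketch of a range-aware lookup. It assumes the range forms "N..", "..N", "N..M", and a bare ".." that claims whatever is left over; lookup_value is a hypothetical helper for illustration and isn't part of the attached sample.

```python
# The HealthState arrays, copied from the MOF excerpt earlier in this post.
HEALTH_VALUEMAP = ["0", "5", "10", "15", "20", "25", "30", ".."]
HEALTH_VALUES = ["Unknown", "OK", "Degraded/Warning",
                 "Minor failure", "Major failure", "Critical failure",
                 "Non-recoverable error", "DMTF Reserved"]


def lookup_value(valmap, vals, value):
    """Return the Values string for value, or None if nothing matches.

    Hypothetical helper: exact entries win, then bounded or half-open
    ranges, and finally a bare ".." entry picks up anything unclaimed.
    """
    pairs = list(zip(valmap, vals))
    # First pass: exact string matches
    for key, label in pairs:
        if ".." not in key and str(key) == str(value):
            return label
    # Second pass: range entries
    catch_all = None
    for key, label in pairs:
        if ".." not in key:
            continue
        low, high = key.split("..", 1)
        if not low and not high:
            # A bare ".." is remembered and used only as a last resort
            catch_all = label
            continue
        try:
            number = int(value)
            # Base 0 lets int() accept hex bounds such as "0x8000"
            low_ok = not low or int(low, 0) <= number
            high_ok = not high or number <= int(high, 0)
        except (TypeError, ValueError):
            continue
        if low_ok and high_ok:
            return label
    return catch_all


print(lookup_value(HEALTH_VALUEMAP, HEALTH_VALUES, 5))
print(lookup_value(HEALTH_VALUEMAP, HEALTH_VALUES, 12))
```

With the HealthState arrays, a value of 5 resolves to "OK" and an unlisted value such as 12 falls through to "DMTF Reserved."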
Now if we call that new routine from our printInstance routine when we display property values, we can display friendly values for these properties as well. Here's the same instance from above with the new logic in place.
OMC_NumericSensor
BaseUnits = 2 (Degrees C)
Caption = System Board 1 Ambient Temp
CreationClassName = OMC_NumericSensor
CurrentReading = 4200
CurrentState = Normal
DeviceID = 50.0.32.99
ElementName = System Board 1 Ambient Temp
EnabledDefault = 2 (Enabled)
EnabledState = 2 (Enabled)
EnabledThresholds = [1L, 5L]
HealthState = 5 (OK)
Hysteresis = 0
IpmiSensorType = 1
IsLinear = False
MaxReadable = 12700
MinReadable = -12800
MonitoredDeviceId = 7.1
Name = Ambient Temp(50.0.32.99)
NominalReading = 4200
OperationalStatus = [2L]
PollingInterval = 15000000000
PossibleStates = [u'Lower Critical', u'Lower Fatal', u'Lower Non-Critical', u'Normal', u'Unknown', u'Upper Critical', u'Upper Fatal', u'Upper Non-Critical']
RateUnits = 0 (None)
RequestedState = 12 (Not Applicable)
SensorType = 2 (Temperature)
SettableThresholds = [1L, 5L]
SupportedThresholds = [1L, 5L]
SystemCreationClassName = OMC_UnitaryComputerSystem
SystemName = 602cfd62-cdef-3922-bf8d-6c4a0cf42058
TimeOfCurrentReading = 20100511105752.000000+000
TransitioningToState = 12
UnitModifier = -2
UpperThresholdFatal = 7000
UpperThresholdNonCritical = 5500
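The readings and thresholds above are raw integers. Assuming the usual CIM_NumericSensor convention, where values are expressed in BaseUnits multiplied by 10 raised to the power of UnitModifier, a small sketch like the following turns them into degrees C. The scaled_reading helper is hypothetical and not part of the attached sample.

```python
def scaled_reading(raw, unit_modifier):
    """Scale a raw sensor value into base units: raw * 10**unit_modifier.

    Hypothetical helper. Negative modifiers are applied as a division so
    that common cases such as 4200 / 100 come out as exact floats.
    """
    if unit_modifier < 0:
        return raw / float(10 ** -unit_modifier)
    return raw * float(10 ** unit_modifier)


# Values taken from the OMC_NumericSensor instance above (UnitModifier = -2)
for name, raw in [("CurrentReading", 4200),
                  ("UpperThresholdNonCritical", 5500),
                  ("UpperThresholdFatal", 7000)]:
    print("%s = %.2f degrees C" % (name, scaled_reading(raw, -2)))
```

Under that assumption, the ambient sensor is reading 42.00 degrees C against a non-critical threshold of 55.00 and a fatal threshold of 70.00.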
Here's example output from a system with two power supplies where only one is plugged in.
OMC_PowerSupply
Availability = 3 (Running/Full Power)
Caption = Power Supply 1
CreationClassName = OMC_PowerSupply
Description = Power Supply 1
DeviceID = 10.1
ElementName = Power Supply 1
EnabledDefault = 2 (Enabled)
EnabledState = 2 (Enabled)
HealthState = 5 (OK)
Name = Power Supply 1
OperationalStatus = [2L]
Range1InputFrequencyHigh = 63
Range1InputFrequencyLow = 47
Range1InputVoltageHigh = 26400
Range1InputVoltageLow = 9000
Range2InputVoltageHigh = 0
Range2InputVoltageLow = 0
RequestedState = 12 (Not Applicable)
SystemCreationClassName = OMC_UnitaryComputerSystem
SystemName = 44454c4c-5a00-1039-8058-c8c04f444431
TotalOutputPower = 930000
TransitioningToState = 12
OMC_PowerSupply
Availability = 8 (Off Line)
Caption = Power Supply 2
CreationClassName = OMC_PowerSupply
Description = Power Supply 2
DeviceID = 10.2
ElementName = Power Supply 2
EnabledDefault = 2 (Enabled)
EnabledState = 3 (Disabled)
HealthState = 30 (Non-recoverable error)
Name = Power Supply 2
OperationalStatus = [6L, 16L]
Range1InputFrequencyHigh = 63
Range1InputFrequencyLow = 47
Range1InputVoltageHigh = 26400
Range1InputVoltageLow = 9000
Range2InputVoltageHigh = 0
Range2InputVoltageLow = 0
RequestedState = 12 (Not Applicable)
SystemCreationClassName = OMC_UnitaryComputerSystem
SystemName = 44454c4c-5a00-1039-8058-c8c04f444431
TotalOutputPower = 930000
TransitioningToState = 12
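Notice that array-valued properties such as OperationalStatus are still shown as bare numbers, because the string form of the whole list never equals a single ValueMap entry. A minimal sketch of decoding such a property element by element follows. The friendly_list helper is hypothetical, and the table is only an excerpt of the OperationalStatus values as we understand them from the CIM schema; in real code you would build it from the ValueMap/Values qualifiers instead.

```python
# Excerpt only: just the OperationalStatus codes that appear in the
# output above (assumed from the CIM schema, not read from a live class).
OPERATIONAL_STATUS = {
    2: "OK",
    6: "Error",
    16: "Supporting Entity in Error",
}


def friendly_list(values, table):
    """Decode each element of an array-valued property.

    Hypothetical helper: elements without a mapping are left as plain
    numbers rather than guessed at.
    """
    decoded = []
    for v in values:
        if v in table:
            decoded.append("%d (%s)" % (v, table[v]))
        else:
            decoded.append(str(v))
    return decoded


print(friendly_list([2], OPERATIONAL_STATUS))
print(friendly_list([6, 16], OPERATIONAL_STATUS))
```

Applied to the two power supplies above, the first decodes to OK, and the unplugged one to Error plus Supporting Entity in Error.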
Attached you'll find the complete sample, which displays sensors and power supplies.