The ESXi Dump Collector service is an extremely useful feature to have enabled. This is especially important in a stateless environment, where there may not be a local disk for storing core dumps generated during a host failure. By configuring ESXi hosts to send their core dumps to a remote vSphere Dump Collector, you can still collect core dumps, which will help VMware Support analyze and determine the root cause of the failure.
In addition, the vSphere Dump Collector allows you to centrally manage core dump collection in your vSphere environment, so that on the rare occasion a host generates a PSOD (Purple Screen of Death) you do not have to go out to the host and manually copy the core dump file. A potential challenge when configuring the ESXi Dump Collector service is validating that the configuration is correct and that everything will work if a host crashes.
In the past, there was no user-friendly way of validating ESXi Dump Collector configurations. You had to either trigger an NMI (Non-Maskable Interrupt) to force a PSOD or use an unsupported command in the ESXi Shell to forcefully crash a host. Both approaches required crashing a host to test the network core dump feature, which is not ideal in my opinion. This can be especially challenging if you need to validate hundreds if not thousands of ESXi hosts in your environment.
With the release of ESXi 5.1, we have added an enhancement to ESXCLI that lets you check and validate the dump collector configuration on your ESXi host. This new command is available under:
esxcli system coredump network check
The check operation is not a simple network ping test; it actually communicates with the configured vSphere Dump Collector to ensure that the host can successfully send core dumps over the network. Let’s take a quick look at how this new command works:
vi-admin@vMA51:~> esxcli --server vcenter51 --vihost pod23-esx-01a.pml.local --username root system coredump network set -v vmk0 -i 10.24.140.65 -o 6500
Enter password:
vi-admin@vMA51:~> esxcli --server vcenter51 --vihost pod23-esx-01a.pml.local --username root system coredump network set -e true
Enter password:
vi-admin@vMA51:~> esxcli --server vcenter51 --vihost pod23-esx-01a.pml.local --username root system coredump network get
Enter password:
Enabled: true
Host VNic: vmk0
Network Server IP: 10.24.140.65
Network Server Port: 6500
vi-admin@vMA51:~> esxcli --server vcenter51 --vihost pod23-esx-01a.pml.local --username root system coredump network check
Enter password:
Verified the configured netdump server is running
In the above example, the first two commands demonstrate a regular network core dump configuration, and after that I perform a “get” operation to display the current configuration on the ESXi host. The final command uses the new check operation, and we can see from the output that the dump collector service is properly configured and running. If there is an error in your configuration, you may see the following message:
Attempt to contact configured netdump server failed: Configured netdump server did not respond in a timely manner
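If you are validating many hosts, you will probably want to interpret this output in a script rather than by eye. The following is a minimal Python sketch, not an official tool: it assumes you have already captured the text printed by the get and check operations (for example from a wrapper around the commands above), and the function names parse_get_output and classify_check_output are hypothetical helpers of my own. The two message strings it matches are the ones shown in this post; other ESXi builds may word them differently.

```python
from typing import Dict

# Message strings as they appear in the example output in this post.
VERIFIED_MESSAGE = "Verified the configured netdump server is running"
FAILURE_PREFIX = "Attempt to contact configured netdump server failed"


def parse_get_output(text: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines printed by the get operation into a dict.

    Lines with nothing after the colon (such as the 'Enter password:'
    prompt) are skipped.
    """
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def classify_check_output(text: str) -> str:
    """Return 'verified', 'failed', or 'unknown' for captured check output."""
    if VERIFIED_MESSAGE in text:
        return "verified"
    if FAILURE_PREFIX in text:
        return "failed"
    # Anything else (empty output, login errors, ...) is left for a human.
    return "unknown"


if __name__ == "__main__":
    sample_get = (
        "Enabled: true\n"
        "Host VNic: vmk0\n"
        "Network Server IP: 10.24.140.65\n"
        "Network Server Port: 6500\n"
    )
    print(parse_get_output(sample_get))
    print(classify_check_output(VERIFIED_MESSAGE))
```

Returning "unknown" for unrecognized output is a deliberate choice here, so that a host is never reported as healthy just because the command printed something unexpected.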
So how do we verify that there was actual communication between the ESXi host and the vSphere Dump Collector, as mentioned earlier, and not just a pretty print message? For this, we will need to take a look at the vSphere Dump Collector logs. In this example I am using the VCSA (vCenter Server Appliance), which already has the vSphere Dump Collector pre-installed. The logs are stored in /var/log/vmware/netdumper/netdumper.log, and if you tail the log while performing the check operation, you should see a message similar to this:
2012-12-10T17:33:19.328Z| netdumper| I120: Posting back a status check reply to 10.24.137.77
We can see there is an acknowledgement from the vSphere Dump Collector that it can properly communicate with the ESXi host and that it is posting a reply back to the host. Hopefully with this new check command in ESXi 5.1, you no longer have to crash your host just to verify that the dump collector service on ESXi is properly configured and working.
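To confirm the collector side for more than a handful of hosts, you could scan the log for these status check replies instead of tailing it by hand. Here is a small Python sketch under stated assumptions: the regular expression is inferred from the single log line shown above, so other netdumper versions may format the line differently, and parse_status_reply and hosts_that_checked_in are hypothetical helper names, not part of any VMware tool.

```python
import re
from typing import Iterable, Optional, Set, Tuple

# Pattern inferred from the one example line in this post:
# "<timestamp>| netdumper| I120: Posting back a status check reply to <ip>"
STATUS_REPLY = re.compile(
    r"^(?P<timestamp>[^|\s]+)\|\s*netdumper\|.*"
    r"Posting back a status check reply to (?P<host>\S+)\s*$"
)


def parse_status_reply(line: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, host_ip) for a status check reply line, else None."""
    match = STATUS_REPLY.match(line)
    if match is None:
        return None
    return match.group("timestamp"), match.group("host")


def hosts_that_checked_in(lines: Iterable[str]) -> Set[str]:
    """Collect the IP addresses that received a status check reply."""
    hosts: Set[str] = set()
    for line in lines:
        parsed = parse_status_reply(line)
        if parsed is not None:
            hosts.add(parsed[1])
    return hosts


if __name__ == "__main__":
    # Log path as given in this post for the VCSA.
    with open("/var/log/vmware/netdumper/netdumper.log") as log:
        for ip in sorted(hosts_that_checked_in(log)):
            print(ip)
```

Comparing the resulting set of addresses against the VMkernel IPs you expect gives a quick list of hosts whose check never reached the collector.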
If you would like to get more details on how the ESXi Dump Collector service works, you can watch this video here:
For more details on configuring the ESXi network core dump, please refer to this VMware KB article which also provides links to the ESXCLI documentation.
Get notification of new blog postings and more by following lamw on Twitter: @lamw