Log data has a long history of being difficult to work with. Almost every system, component, and application can generate massive amounts of unstructured data just cryptic enough to dissuade the typical administrator from harnessing the value it holds. Much of the time, log data is viewed only when something isn’t working. When issues occur, deciphering logs for a particular application or component is typically left to technical support professionals, who are often the only ones who understand the nuances of the log data being generated.
Many perceive log data as nothing more than an insurance policy for when something goes wrong. This perception implies that if everything is functioning as expected, then there is little value to log data. This misconception overlooks one of the key benefits log data can provide for you and your environment. When used correctly, with the right tools, log data can provide context and understanding to changing conditions in the data center. Alarm mechanisms, while important, often indicate just a state or condition, and can be transient, remaining visible only while the alarm threshold is met. Log data tells a much more detailed story, and does so over time. Log analytics can complement other forms of information gathering, such as performance graphs and alarms. Capitalizing on the intelligence buried in log data is a superb opportunity to manage a data center in a smarter way.
vRealize Log Insight
VMware vRealize Log Insight is a log management and analytics solution that gives the data center administrator an easy way to see context, correlation, and meaning behind otherwise obfuscated log content. Log Insight aggregates log data from a variety of sources and creates a time series database of events that can be easily mined using a very simple query mechanism and interactive graphs. Log Insight comes with a number of dashboards, each a collection of presets and queries, to help the administrator easily interpret the data gathered. It’s fast, customizable, and extensible through the use of “content packs” found in the Log Insight Marketplace on VMware Solution Exchange. The content pack for vSphere unlocks data inside vCenter Server logs, and provides an extraordinary amount of intelligence to any environment.
The vRealize Log Insight content pack for vSAN
The capabilities of Log Insight are extended to vSAN through its own dedicated content pack for vSAN. Found in the Log Insight Marketplace (accessible via web browser, or directly in Log Insight), the Log Insight content pack for vSAN provides a set of dashboards ready for any vSAN environment. The images below illustrate how different dashboards can be used to learn more about the environment.
Figure 1. Viewing a general change in vSAN error events in the content pack for vSphere.
In Figure 1, we can easily see a significant change in error events that vSAN is reporting courtesy of a general vSAN dashboard included in the content pack for vSphere. In order to gain deeper insight into detailed vSAN log activities, the Log Insight content pack for vSAN should be used. Using both content packs provides for a holistic understanding of activities, as issues may extend beyond specific components.
The power of these dashboard views comes in part from being able to easily identify change in an environment. Each dashboard view can be used to drill into each metric interactively to determine the exact log messages it is reporting.
Figure 2. Viewing changes in vSAN object components for the same time period as the vSAN error events.
In Figure 2, using the content pack for vSAN, we see changes in vSAN object components, which were likely the result of the corresponding vSAN error events triggering resync actions. This is a great example of how multiple views can be used to gain better context on a series of events, which in turn can lead to the root cause of the activity.
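The idea of lining up two views over the same time period can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Log Insight works internally. The function names (`bucket_counts`, `shared_buckets`), the threshold, and every timestamp below are hypothetical and were invented for this example.

```python
from collections import Counter
from datetime import datetime


def bucket_counts(timestamps, minutes=10):
    """Count events per fixed-width time bucket (minutes should divide 60)."""
    counts = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of its bucket.
        floored = ts.replace(minute=ts.minute - ts.minute % minutes,
                             second=0, microsecond=0)
        counts[floored] += 1
    return counts


def shared_buckets(a, b, threshold=2):
    """Return the buckets in which both series reach the threshold."""
    return sorted(k for k in a if a[k] >= threshold and b.get(k, 0) >= threshold)


# Hypothetical timestamps, invented purely for illustration.
error_events = [
    datetime(2000, 1, 1, 9, 5),
    datetime(2000, 1, 1, 14, 31),
    datetime(2000, 1, 1, 14, 33),
    datetime(2000, 1, 1, 14, 38),
]
component_changes = [
    datetime(2000, 1, 1, 11, 20),
    datetime(2000, 1, 1, 14, 32),
    datetime(2000, 1, 1, 14, 35),
]

overlap = shared_buckets(bucket_counts(error_events),
                         bucket_counts(component_changes))
for bucket in overlap:
    print("Both series are elevated in the bucket starting", bucket)
```

With this sample data, the only bucket in which both series are elevated is the one starting at 14:30. The dashboards let an administrator make the same kind of comparison visually.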
Figure 3. Using Log Insight to send Resync and component activities as email alerts.
The reporting engine for Log Insight is not limited to just graphs within the application. Information can be sent to webhooks, vRealize Operations, or as seen in Figure 3, forwarded to an email address. This is a simple, but extremely effective way to gain understanding of what a vSAN environment is doing to maintain its resilience, health, and balance.
Let’s step through a simple, yet common scenario that will showcase the power of the Log Insight content pack for vSAN.
Scenario description and click-through demo
A click-through demonstration walking through a scenario is available on StorageHub, titled vRealize Log Insight Content Pack for vSAN – Diagnose vSAN Activity. The behavior demonstrated in the click-through demo was the result of physically disconnecting all network uplinks used by vSAN (active and standby) for a particular host (“ESX04”) in a four-node vSphere cluster for a period of 1 hour and 7 minutes. The goal was to demonstrate how a spike in backend vSAN I/O activity several hours prior could be explained, even though there were no active alarms, and all VMs seemed to be operating normally.
Using the Log Insight content pack for vSAN in this scenario, we learned the following:
- Host ESX04 lost connectivity at 1:30pm to the vSphere cluster running vSAN.
- vSAN initiated the automatic resync/rebuild of components precisely 1 hour after the host connectivity failure. This event occurred to satisfy protection policies, and completed in about 14 minutes.
- We observed how the resync/rebuild times aligned to the same time period where the vSAN performance graphs reported increased backend IOPS, throughput, and congestion.
- We observed that when ESX04 came back online 1 hour and 7 minutes after the initial host connectivity failure, it was discovered by vSAN, and all stale component objects were cleaned up.
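The timeline implied by these observations can be worked out with simple date arithmetic. The sketch below uses only the figures stated above. The variable names are illustrative, the calendar date is an arbitrary placeholder (the scenario gives only a time of day), and the resync end time is approximate because the duration was reported as "about 14 minutes."

```python
from datetime import datetime, timedelta

# Figures taken from the scenario; the date itself is an arbitrary placeholder.
host_lost = datetime(2000, 1, 1, 13, 30)         # ESX04 loses connectivity at 1:30pm
repair_delay = timedelta(hours=1)                # resync began 1 hour after the failure
resync_duration = timedelta(minutes=14)          # "about 14 minutes" (approximate)
outage_duration = timedelta(hours=1, minutes=7)  # uplinks disconnected for 1h 7m

resync_start = host_lost + repair_delay
resync_end = resync_start + resync_duration
host_back = host_lost + outage_duration

# Order the events chronologically and print them in 24-hour time.
timeline = sorted([
    (host_lost, "ESX04 loses connectivity"),
    (resync_start, "vSAN begins resync/rebuild"),
    (host_back, "ESX04 reconnects"),
    (resync_end, "resync/rebuild completes (approximate)"),
])
for when, what in timeline:
    print(when.strftime("%H:%M"), what)
```

Worked through, the resync begins at 14:30, ESX04 reconnects at 14:37, and the resync completes at roughly 14:44. By these figures the host returned about 7 minutes after the rebuild had already started. A sequence like this is easy to miss without log data laid out over time.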
All corrective actions taken by vSAN occurred automatically, and without any downtime of VMs in the environment. The scenario described represents two challenges all too common in most data centers: user error, and general communication breakdowns within or across infrastructure teams. In this example, whether a network port was physically disconnected, or a VLAN change was inadvertently made to the wrong interface port on a switch, the result would have been the same as what occurred in this scenario. Log Insight helped expose activities that might have otherwise gone unnoticed.
Since vSAN could accommodate the lack of connectivity to one of the hosts in the cluster, typical alarms may or may not accurately represent the series of events that occurred. Even though no VMs experienced downtime, and their protection policies were reestablished, having this new visibility would allow the virtualization team to engage with the network team to work on how to avoid these situations in the future.
Notice that when stepping through the click-through demonstration, intimate knowledge of the actual log events being viewed was unnecessary. The content pack for vSAN already categorized presets for the administrator to easily discern how different types of activity are logged. Log Insight is easily customizable, but in this scenario, we simply used the default dashboards to gain a better understanding of what occurred. This is a clear and simple example of why Log Insight should be running in every vSAN environment.
Log Insight paired with the content pack for vSAN is an easy way to gain a level of visibility and operational intelligence not only into vSAN, but into your entire environment. The log data already exists in your data center. Why not use it to your advantage?