vRealize Network Insight not only visualizes all communication within your network and provides a wealth of information for troubleshooting, it can also help you catch outlier behavior in virtual and physical servers using its advanced analytics capabilities. If a VM breaks the pattern of its like-minded VMs, or starts showing network behavior that is outside its own previous patterns, Network Insight can generate an alert for it. This post dives into the wonderful world of outlier detection, one of the analytics capabilities of Network Insight.
Analytics
First, there are 3 kinds of analytical capabilities inside Network Insight. Here’s a brief overview:
- Outliers: based on groupings that you create, it detects whether a specific VM shows different behavior than the rest of the group.
- Thresholds: alerting on static or dynamic thresholds for bandwidth rate and packet drops.
- Top Talkers: visualization of network behavior within a certain scope.
I will be talking about all analytical capabilities in the future but will limit this post to the outlier detection.
Outliers?
Let’s say you have a group of workloads that have the same function. For example, web servers that are serving the same application and are behind a load balancer for incoming user requests should have the same traffic patterns. If one breaks that pattern, either the load balancer is acting up or has been misconfigured, or the server itself is misbehaving.
As we grow into a more distributed, microservice model, more and more workloads only have 1 job, which makes it possible to predict their behavior. This doesn’t exclude traditional environments though, as those will have servers with the same functionality as well; think of DNS, Active Directory, SQL Clusters, etc.
Outlier detection can be based on network traffic rates (and in which direction) and on the traffic type (east-west or north-south), and it can be limited to specific ports.
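To make the idea of "one member breaking the group pattern" concrete, here is a minimal sketch. It is not the algorithm Network Insight uses (that isn't described in this post); it swaps in a common statistical technique, the modified z-score based on the median absolute deviation. The function name, the server names, the traffic numbers and the threshold of 3.5 are all assumptions for illustration.

```python
from statistics import median


def find_outliers(rates, threshold=3.5):
    """Return the group members whose rate deviates strongly from the group median.

    rates: dict mapping a (hypothetical) server name to a traffic rate.
    threshold: cut-off for the modified z-score; 3.5 is a common rule of thumb.
    """
    values = list(rates.values())
    med = median(values)
    # Median absolute deviation: a spread measure that one extreme value cannot skew
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread among the majority: anything that differs from the median stands out
        return [name for name, v in rates.items() if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [
        name
        for name, v in rates.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Made-up port 53 traffic rates for a group of DNS servers
rates = {"dns-01": 950.0, "dns-02": 110.0, "dns-03": 120.0, "dns-04": 105.0, "dns-05": 115.0}
print(find_outliers(rates))  # ['dns-01']
```

The median is used instead of the mean so that the outlier itself doesn't drag the "normal" baseline towards it.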
What do I get?
Apart from getting alerts (email or SNMP Trap) on outliers, you’ll be able to view the behavior of all group members in a single graph. Using this graph, you can quickly see the outlier behavior and which server was affected, then drill into that server to see what’s happening.
In the above example, we have a group of DNS servers and it’s pretty clear that the server cmbu-sc2dc-01 is serving more traffic over port 53 than its group members.
These outlier groups can contain one or more network ports that will be monitored. You can also choose the traffic direction: incoming, outgoing or both. You would pick a single direction when you’re working with an application that mostly receives traffic (data processors) or one that mostly sends traffic (DNS, web servers, etc.).
My advice would be to use both directions, so you catch anything weird happening. Regular web servers that start to show many outgoing HTTP requests while the incoming requests remain the same are usually not a good sign.
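The web server example can be sketched as a simple check that looks at both directions at once. This is purely illustrative and not how Network Insight evaluates direction; the function name, the dictionary keys and the default growth and tolerance values are assumptions I picked for the example.

```python
def outgoing_spike(baseline, current, growth=2.0, tolerance=0.2):
    """Return True when outgoing traffic grew sharply while incoming stayed flat.

    baseline, current: dicts with "incoming" and "outgoing" request counts.
    growth: how many times the outgoing count must have grown to be suspicious.
    tolerance: relative change in incoming traffic that still counts as "the same".
    """
    if baseline["incoming"] <= 0 or baseline["outgoing"] <= 0:
        raise ValueError("baseline counts must be positive")
    incoming_change = abs(current["incoming"] - baseline["incoming"]) / baseline["incoming"]
    outgoing_ratio = current["outgoing"] / baseline["outgoing"]
    return incoming_change <= tolerance and outgoing_ratio >= growth


# Made-up numbers: incoming requests barely move, outgoing requests grow eightfold
baseline = {"incoming": 1000, "outgoing": 50}
current = {"incoming": 1050, "outgoing": 400}
print(outgoing_spike(baseline, current))  # True
```

Monitoring only the incoming direction would miss this case entirely, which is why watching both directions is the safer default.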
Configuration
To create outlier groups, go to the Analytics menu and open up the Outliers page.
On the following page, you will get a list of all configured outlier groups. The list gives a handy overview of the number of detected outliers, any related events, the scope of the outlier group (how many VMs or physical IPs) and the time when the last outlier was detected.
On the top right of the outlier list, there’s an ADD button to create a new outlier group. Click that and fill out all the details related to the outlier detection. Starting from the top, these are the options:
- Give the group a name.
- Define the Scope. This can be an application tier or an NSX Security tag.
- In case of an application tier, first select the application itself and then select the tier.
- Select a Metric. You can select total traffic (MB, GB, etc.), the number of network packets or sessions, or the traffic rate (per second).
- Choose the traffic direction: incoming, outgoing or both.
- Choose whether to include only north-south (internet) traffic, only east-west traffic (inside the data center) or both.
- Select the destination ports. You can monitor all used ports, but there is currently a limit of 20 ports. If more than 20 ports are in use, the monitored ports need to be entered manually (as in the example below).
- Choose a sensitivity for the detection algorithm: low, medium or high.
- Once everything is set, Network Insight will generate a preview of the results with a line chart of the traffic for the included flows.
If the preview looks good, click the Submit button and the outlier group will enter into the monitoring system.
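If you want to keep track of your outlier group definitions outside the UI (in documentation or a change request, for example), the options above map nicely onto a small data structure. The sketch below is not the Network Insight API; the class, field names and allowed values are my own naming for the options in the form, and it assumes the 20 port limit applies to the manually entered port list.

```python
from dataclasses import dataclass, field

MAX_PORTS = 20  # the current port limit mentioned above
METRICS = {"total_traffic", "packets", "sessions", "traffic_rate"}
DIRECTIONS = {"incoming", "outgoing", "both"}
TRAFFIC_TYPES = {"north-south", "east-west", "both"}
SENSITIVITIES = {"low", "medium", "high"}


@dataclass
class OutlierGroup:
    """Hypothetical record of one outlier group, mirroring the form fields."""

    name: str
    scope: str  # an application tier or an NSX Security tag
    metric: str = "traffic_rate"
    direction: str = "both"
    traffic_type: str = "both"
    ports: list = field(default_factory=list)
    sensitivity: str = "medium"

    def __post_init__(self):
        # Validate each field against the choices offered in the form
        if not self.name:
            raise ValueError("the group needs a name")
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction}")
        if self.traffic_type not in TRAFFIC_TYPES:
            raise ValueError(f"unknown traffic type: {self.traffic_type}")
        if self.sensitivity not in SENSITIVITIES:
            raise ValueError(f"unknown sensitivity: {self.sensitivity}")
        if len(self.ports) > MAX_PORTS:
            raise ValueError(f"at most {MAX_PORTS} ports can be monitored")
        if any(not 1 <= port <= 65535 for port in self.ports):
            raise ValueError("ports must be between 1 and 65535")


# Example: a DNS group watching port 53 in both directions
dns_group = OutlierGroup(name="DNS servers", scope="DNS tier", ports=[53])
print(dns_group.sensitivity)  # medium
```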
Events
Whenever an outlier is detected, an event will be generated. For every current outlier, that event will remain open. When an outlier stops acting out, the event will close but remain visible as a historical record.
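The open and close behavior can be pictured as a tiny state machine. This is only a sketch of the lifecycle described above, not Network Insight's internal event model; the class, function and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OutlierEvent:
    member: str
    opened_at: datetime
    closed_at: Optional[datetime] = None

    @property
    def is_open(self):
        return self.closed_at is None


def update_events(events, current_outliers, now):
    """Open events for new outliers and close events for members that recovered."""
    open_members = {event.member for event in events if event.is_open}
    for event in events:
        # A member that is no longer an outlier gets its event closed, not deleted
        if event.is_open and event.member not in current_outliers:
            event.closed_at = now
    for member in sorted(current_outliers):
        # Only open a new event if this member has no open event yet
        if member not in open_members:
            events.append(OutlierEvent(member, now))
    return events


events = []
update_events(events, {"dns-01"}, datetime(2024, 1, 1, 10, 0))  # event opens
update_events(events, set(), datetime(2024, 1, 1, 11, 0))       # event closes, stays in the list
print(len(events), events[0].is_open)  # 1 False
```

Closed events stay in the list, which is what gives you the history of when a group member misbehaved and for how long.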
Conclusion
Outlier analytics builds on the wealth of data that Network Insight contains and is a unique way to keep track of behavior within like-minded groups. If a VM or physical IP address starts to behave differently from the group, generate alerts based on the outlier events and deal with the misbehavior appropriately.
Stay tuned for more on the analytics use-cases of Network Insight!