Introduction
When it comes to system administration and operations, we all know it can be tricky to get visibility into your environments at the granularity and verbosity level that you want. No one wants a call at 3am to say a NIC flapped, unless, for example, it’s happening across 10 hosts within a short period of time.
It is also true that no two companies monitor and operate their environments the same way, so finding solutions that work with your particular setup can be tricky. vRealize Log Insight (which comes with 25 free licenses with all vCenter editions) allows you to set up webhooks that send arbitrary messages to any web service based on alerting conditions you define within the product. This is extremely powerful and essentially allows you to route alerts to any service you can think of.
In this post, we’ll take a look at how you can set up vRLI to send alerts to Slack (or any other service of choice) based on particular vSAN states and log messages. This can, of course, be extended to other services like PagerDuty, ServiceNow, or indeed any of the plugins listed here.
What you need
This is pretty simple to set up, but we’re going to outline the requirements here – we assume you have the following already set up and operating within your environment:
- vSAN
- vRealize Log Insight
- Slack
- A Linux box with Docker installed
In addition to the above, we are going to deploy a very simple container that acts as the aggregation layer for all these services. It takes the message fired by vRLI and translates it for whatever service you want the message to go to, without you having to do any coding.
The how
The container
You’ll need a Linux box to run the container on; it can equally run on vSphere Integrated Containers, a Kubernetes cluster, PKS, etc. In this particular example, I am simply running the container on an Ubuntu VM for simplicity.
SSH into the Linux box and pull down the latest version of the vmware/webhook-shims container:
    docker pull vmware/webhook-shims
With the container image pulled down locally, let’s deploy it and set it to always restart if it fails:
    docker run -d --restart=always -p 5001:5001 vmware/webhook-shims
The above command does the following: it runs the container as a daemon process (in the background), maps port 5001 of the container to port 5001 of the host machine, and restarts the container on all failures or host restarts. Note that --restart=always has to come before the image name; anything placed after the image name is passed to the container as arguments rather than to Docker.
Ensure the container is running by issuing the below command:
    myles@docker01:~$ docker ps
    CONTAINER ID   IMAGE                  COMMAND                  CREATED        STATUS          PORTS                    NAMES
    5f5dab820dc6   vmware/webhook-shims   "/root/webhook-shims…"   4 months ago   Up 14 minutes   0.0.0.0:5001->5001/tcp   practical_mcnulty
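Because flag placement matters with docker run, it can be worth confirming that Docker actually recorded the restart policy. The snippet below is a minimal sketch using docker inspect; the function name restart_policy is made up for this post, and the container name is simply the one from the docker ps output above.

```shell
# Print the restart policy Docker has recorded for a container.
# Usage: restart_policy <container-name-or-id>
# (restart_policy is a hypothetical helper name, not part of Docker.)
restart_policy() {
    docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Example, using the container name from the docker ps output:
#   restart_policy practical_mcnulty
```

If the policy was applied you should see always; a value of no (or an empty value on some Docker versions) suggests the flag never reached Docker.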
At this point, if we visit the host’s IP on port 5001 we will see a webpage like the below that contains the setup instructions for each plugin – this confirms the container is working as expected:
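If you are working over SSH and don't have a browser handy, the same check can be sketched with curl. This assumes curl is installed on the machine you run it from; shim_is_up is a hypothetical helper name, and the port defaults to the 5001 mapping used earlier.

```shell
# Return success if something answers HTTP on the shim's port.
# $1 = host IP or name (required), $2 = port (defaults to 5001).
# (shim_is_up is a hypothetical helper name for this sketch.)
shim_is_up() {
    # --silent/--fail: no progress output, non-zero exit on HTTP errors.
    # --max-time 5: give up after five seconds instead of hanging.
    curl --silent --fail --max-time 5 --output /dev/null \
        "http://$1:${2:-5001}/"
}

# Example:
#   shim_is_up 192.168.0.100 && echo "shim is answering"
```

This only tells you that something answers on that port, so it is a quick sanity check rather than proof that every plugin is configured.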
Slack
With the container set up, you will need to create an “Incoming Webhook” app for your Slack team – instructions for doing so can be found here (follow steps 2 and 3). At the end of step 3 you should be furnished with a URL in the below format:
    https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Take note of the three sections in the URL (the one beginning with T, the one beginning with B, and the random last string), as we will need them later on.
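Before wiring anything into vRLI, it can help to confirm the Slack side works on its own. Slack's incoming webhooks accept a JSON body with a text field, so a rough sketch of a test message might look like the following. Both function names are invented for this post, and the escaping only covers backslashes and double quotes.

```shell
# Build a minimal Slack incoming-webhook payload: {"text":"..."}.
# Escapes backslashes and double quotes so the result stays valid JSON;
# newlines and other control characters are not handled in this sketch.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text":"%s"}' "$escaped"
}

# Post a test message straight to Slack, bypassing the shim entirely.
# $1 = full Slack webhook URL, $2 = message text.
slack_test_message() {
    curl --silent --fail -X POST \
        -H 'Content-type: application/json' \
        --data "$(slack_payload "$2")" \
        "$1"
}

# Example (placeholder URL in the format shown above):
#   slack_test_message \
#     https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX \
#     "Test message before hooking up vRLI"
```

If the message shows up in your channel, any later problems are likely to be on the shim or vRLI side rather than with Slack.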
vRealize Log Insight
Within vRLI’s “Interactive Analytics” tab, click the Alerts icon and then “Create new alert”. Give your alert a name – I called this one “Slack Webhook” as we can route these alerts to multiple services and have discrete alerts set up for each.
Check the “webhook” checkbox and fill in the URL with the following format:
    http://[your container IP]:5001/endpoint/slack/Txxxxxxx/Bxxxxxxxx/xxxxxxxxxxx
Substitute in the IP address of your container, as well as the sections of the Slack URL from above.
As an example, let’s say my container host’s IP is 192.168.0.100 and my Slack URL is https://hooks.slack.com/services/T123456789/B87654321/ABCKSFSHDFKNGSDIGDFG, then my webhook URL in vRLI would be the below:
    http://192.168.0.100:5001/endpoint/slack/T123456789/B87654321/ABCKSFSHDFKNGSDIGDFG
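The substitution above is easy to get wrong by hand, so here is a small sketch that derives the vRLI webhook URL from the container host and the Slack URL. The function name vrli_slack_webhook is hypothetical, and it assumes the Slack URL starts with the https://hooks.slack.com/services/ prefix shown earlier.

```shell
# Turn a Slack incoming-webhook URL into the matching shim URL for vRLI.
# $1 = container host IP or name
# $2 = Slack webhook URL
# $3 = shim port (defaults to 5001)
# (vrli_slack_webhook is a hypothetical helper name for this sketch.)
vrli_slack_webhook() {
    prefix='https://hooks.slack.com/services/'
    case "$2" in
        "$prefix"*) ;;
        *) echo "not a Slack incoming-webhook URL: $2" >&2; return 1 ;;
    esac
    # Strip the fixed prefix, leaving the "T.../B.../token" sections.
    tokens="${2#"$prefix"}"
    printf 'http://%s:%s/endpoint/slack/%s\n' "$1" "${3:-5001}" "$tokens"
}

# Example, using the values from the text:
#   vrli_slack_webhook 192.168.0.100 \
#     https://hooks.slack.com/services/T123456789/B87654321/ABCKSFSHDFKNGSDIGDFG
```

The result can be pasted straight into the webhook URL field of the alert.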
As the last item, set whatever alerting frequency you would like.
From here, click the Alerts icon again and then “Manage Alerts”, find the “Slack Webhook” alert we just created and click the “edit” button. At the bottom of the dialogue box that pops up you will see an “Edit Query” button – this is where you define the conditions under which the alert will be raised.
The query shown below is based on one of the built-in vRLI queries and looks for component state changes where a component changes from active to any other state (like degraded, stale, absent, etc):
With your alert defined – click “Save” on the bottom right-hand side of the window. Now that the alert is defined and has a query associated with it, all that needs to happen is for the query to match some results from the syslog it ingests – if you used the above example, put a host into maintenance mode and you will receive your alerts straight into the Slack instance you defined. Here’s a clip from our test lab:
And that’s it, an end-to-end solution to get your vRLI alerts into Slack, but as mentioned at the start – there are many services supported by the webhook-shims container and we encourage you to try these out for other things like automatic ticketing in ServiceNow, paging with PagerDuty, kicking off Jenkins jobs or even vRO workflows.