In part one of this blog series, we configured the Fluentd agent to pull Cassandra database logs into vRealize Log Insight. If we stopped there, we would only get a partial picture of our Cassandra environment if anything were to go wrong. Let’s take it to the next level and monitor the health and metrics of our Cassandra environment with the full power of vRealize Operations!
Configuring event forwarding to vRealize Operations:
In the first part of this blog, we extracted some important fields from the Cassandra events coming from Fluentd and created some simple widgets. Let’s use that same method to send ‘database down’ events to vRealize Operations for monitoring and alerting.
First, let’s find a database down event using the ‘cass_db_status’ field we created, and let’s look for that event across all of our Cassandra source nodes. Once we find our shutdown events, click the ‘Create Alert from Query’ button to the right of the search bar.
Clicking that button brings up the screen to create a new alert. Let’s fill in the details, and under ‘Notify’, make sure you send the alert to vRealize Operations Manager. If you don’t have vROps integrated with Log Insight, the vRealize Operations documentation on the VMware documentation site explains how to do so. We want to be alerted on any match of the event, so choose that option and then save the alert.
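To make the ‘any match’ behaviour concrete, here is a minimal Python sketch of the logic the alert applies: every event whose ‘cass_db_status’ field indicates a shutdown counts as a hit, grouped by source node. This is only an illustration, not how Log Insight is implemented. The status value "shutdown", the "source" key, and the node names are all assumptions; use whatever values your extracted field actually produces.

```python
from collections import Counter

# Hypothetical value of the 'cass_db_status' extracted field. The real value
# depends on how the field was defined in part one.
DOWN_STATUS = "shutdown"


def nodes_to_alert(events, down_status=DOWN_STATUS):
    """Return a per-node count of events whose cass_db_status matches."""
    hits = Counter(
        event["source"]
        for event in events
        if event.get("cass_db_status") == down_status
    )
    return dict(hits)


# Made-up sample events; node names and field values are illustrative only.
sample_events = [
    {"source": "cass-node-1", "cass_db_status": "shutdown"},
    {"source": "cass-node-2", "cass_db_status": "startup"},
    {"source": "cass-node-1", "cass_db_status": "shutdown"},
    {"source": "cass-node-3"},  # event without the extracted field
]

alerts = nodes_to_alert(sample_events)
print(alerts)  # {'cass-node-1': 2}
```

With ‘any match’ selected, a single hit on any node is enough to raise the alert, so a non-empty result here corresponds to the alert firing.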
Now when this alert fires off in vRealize Log Insight, it will be sent as an alert in vRealize Operations, which we can then take action on.
It’s that simple to send a vRealize Log Insight alert to vRealize Operations. We could stop here, but then we’d be leaving out one of vRealize Operations’ biggest features: Application Monitoring. Let’s continue and see how we can configure metrics collection from Cassandra using vRealize Operations’ application monitoring capabilities.
Configuring the vRealize Operations Telegraf Agent:
The application monitoring agent for vRealize Operations lets us pull lots of great metrics and stats from dozens of applications. We can use these metrics to troubleshoot when a service goes down due to resource contention or a hardware/software failure anywhere down the application stack.
Let’s start by pushing a Telegraf agent to each Cassandra Linux VM via vRealize Operations. The actual steps are covered in the vRealize Operations documentation linked above and are beyond the scope of this write-up.
We also need to configure the Jolokia Java plugin for Cassandra, to collect metrics from the database to send to vRealize Operations.
I’ve taken these instructions from the Cassandra Integration for Wavefront, as Wavefront uses the same agent as vRealize Operations:
Cassandra Integration – Telegraf
Once an agent is connected and collecting on each node, and your curl against the Jolokia plugin comes back successful, we can configure the agents for Cassandra.
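If you would rather script the curl check across nodes, the sketch below does roughly the same thing in Python. It assumes the Jolokia JVM agent is listening on localhost port 8778 (its usual default) and queries its version endpoint; adjust the URL to match whatever you configured when following the linked instructions. Jolokia answers with a JSON document that carries a "status" field, and 200 means the request succeeded.

```python
import json
import urllib.request

# Assumption: Jolokia JVM agent on its usual default port, queried from the
# Cassandra node itself. Change this to match your own setup.
JOLOKIA_URL = "http://localhost:8778/jolokia/version"


def jolokia_ok(body: str) -> bool:
    """Return True if a Jolokia JSON response reports status 200."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == 200


def check_node(url: str = JOLOKIA_URL, timeout: float = 5.0) -> bool:
    """Fetch the Jolokia endpoint and report whether it answered successfully."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return jolokia_ok(response.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors.
        return False


if __name__ == "__main__":
    print("Jolokia reachable:", check_node())
```

Run it on each Cassandra node; a result of False means the plugin is not answering on that URL and is worth fixing before moving on.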
Configuring vRealize Operations Application Monitoring:
Back in vRealize Operations, under ‘Application Monitoring’, you should see discovered (but not configured) Cassandra services once the agents are deployed. Click on ‘discovered’ to start configuring the agents.
On the next screen, we see our VMs and the agent status. Click on the ‘Manage Services’ button in the top menu and choose ‘cassandra’.
In this configuration screen, we need to provide a display name, the Jolokia URL and port on the Cassandra VM, and the Cassandra install path in the VM. Note that the Jolokia URL is a localhost URL relative to the Telegraf agent, not relative to vRealize Operations.
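Because the Jolokia URL is easy to get wrong, here is a small Python sketch that checks a URL is agent-local before you paste it into the form. The setting names, display name, port, and install path below are placeholders I made up for illustration; they are not the actual field names used by vRealize Operations.

```python
from urllib.parse import urlparse

# Hosts that resolve to the VM the Telegraf agent itself runs on.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_agent_local(jolokia_url: str) -> bool:
    """Return True if the URL is http(s), has an explicit port, and targets loopback."""
    parsed = urlparse(jolokia_url)
    try:
        port = parsed.port
    except ValueError:
        # Raised by urlparse when the port is not a valid number.
        return False
    return (
        parsed.scheme in ("http", "https")
        and parsed.hostname in LOOPBACK_HOSTS
        and port is not None
    )


# Hypothetical values for one Cassandra VM; every entry is a placeholder.
service_settings = {
    "display_name": "cassandra-node-1",
    "jolokia_url": "http://localhost:8778/jolokia",
    "install_path": "/usr/share/cassandra",
}

print(is_agent_local(service_settings["jolokia_url"]))  # True
```

A URL pointing at the vRealize Operations appliance or at another host would fail this check, which is exactly the mistake the note about the agent-relative URL is meant to prevent.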
Once we save, the agent should start collecting metrics from the Jolokia plugin.
Finally, we can click on ‘Show Detail’ for a specific agent, and see all the active alerts, metrics, and events for the entire stack, from the host hardware, to the datastore, VM, operating system and even the Cassandra database application.
Here is a list of the metrics that Telegraf collects for Cassandra. There is a document like this for every application’s metrics that we collect with application monitoring.
Cassandra Database Metrics – vRealize Operations
Now you can create vRealize Operations alerts based on these application metrics, and correlate alerts coming from Log Insight with metrics in vRealize Operations, to get a single-pane-of-glass view of your application environment.
Now we have full end-to-end visibility into our Cassandra application environment, and a one-stop area to troubleshoot and pinpoint where the issues are without spending hours upon hours digging through logs and trying to correlate events with metrics. Of course, this write-up can be a guide for monitoring any supported application with vRealize Log Insight and vRealize Operations, as the concepts are the same. Good luck and thanks for reading!
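To illustrate what correlating an alert with metrics means in practice, here is a minimal Python sketch that pulls out the metric samples recorded within a few minutes of an alert. vRealize Operations does this for you in the UI; the timestamps, values, and five-minute window below are made-up assumptions used only to show the idea.

```python
from datetime import datetime, timedelta


def metrics_near_alert(alert_time, samples, window=timedelta(minutes=5)):
    """Return the (timestamp, value) samples within `window` of the alert."""
    return [
        (timestamp, value)
        for timestamp, value in samples
        if abs(timestamp - alert_time) <= window
    ]


# Hypothetical 'database down' alert time and metric samples for one node.
alert_time = datetime(2020, 1, 1, 12, 0)
samples = [
    (datetime(2020, 1, 1, 11, 50), 41.0),
    (datetime(2020, 1, 1, 11, 57), 88.5),
    (datetime(2020, 1, 1, 12, 2), 97.0),
    (datetime(2020, 1, 1, 12, 10), 40.0),
]

nearby = metrics_near_alert(alert_time, samples)
print([value for _, value in nearby])  # [88.5, 97.0]
```

In this made-up data, the two samples closest to the alert are the elevated ones, which is the kind of pattern you would look for when lining up a Log Insight alert against application metrics.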