Riak KV is a distributed key-value NoSQL database that provides a highly reliable way to store massive amounts of unstructured data. Because it underpins big data applications, it is important to monitor the performance of your Riak KV instances so that problems don't spill over into those applications.
Wavefront, a SaaS-based metrics monitoring and analytics platform, combined with the open-source Telegraf agent, can provide real-time insight into database behavior without impacting its performance (not an easy task). Using the Wavefront Query Language, you can choose the metrics that matter to you, whether you are a developer, a database administrator, or a DevOps engineer.
With Wavefront's free, prepackaged integrations, you can easily ingest metrics from a wide range of data sources. In this post, I will show how to use Wavefront to monitor key Riak KV metrics. Once your metrics are in Wavefront, you can correlate them with data from any other source, including custom application metrics or other cloud infrastructure.
Before we get started, you need the following prerequisites:
- A Wavefront account. A free trial is available.
- A configured Wavefront proxy. For details, refer to the Wavefront documentation.
- The Telegraf agent installed on the virtual machines you want to monitor.
- A Riak database that is up and running.
Now let's walk through the configuration steps.
Step 1 – Validate That Your Riak Server is Operational
Run the following command on a Riak node to query the properties of the default bucket type:
curl -v http://127.0.0.1:8098/types/default/props
A successful response returns HTTP 200 OK, and the verbose (-v) option adds details such as the request and response headers. The response body contains the bucket properties for the default bucket type.
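If you want to script this check (for example, in a provisioning pipeline), the following Python sketch sends the same request as the curl command using only the standard library. It is a minimal sketch, assuming Riak's HTTP listener is on the default port 8098 of the local host; the function name `riak_is_up` is hypothetical.

```python
"""Sketch: the Step 1 health check in Python (standard library only)."""
import http.client
import urllib.request

# Assumption: Riak's HTTP listener uses the default port 8098 on this host.
DEFAULT_RIAK_URL = "http://127.0.0.1:8098"


def riak_is_up(base_url: str = DEFAULT_RIAK_URL, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/types/default/props answers HTTP 200."""
    url = base_url.rstrip("/") + "/types/default/props"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx status) and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Riak is up" if riak_is_up() else "Riak is not reachable")
```

Returning a boolean instead of raising keeps the function easy to use in a shell loop or a readiness probe; run it with the Riak node's address if you are checking from another machine.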
Step 2 – Enable the Riak Input Plugin in the Telegraf Agent
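First, create a configuration file in the /etc/telegraf/telegraf.d directory and enter the following snippet, where <Host-IP> is the IP address set by the listener.http.internal property in your Riak configuration (if you changed the HTTP port from the default 8098, adjust it as well):
[[inputs.riak]]
  servers = ["http://<Host-IP>:8098"]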
Next, configure global tags in the telegraf.conf file to group Riak nodes into clusters, as in the snippet below. Without this configuration, the Wavefront Riak dashboard will not work.
[global_tags]
  # Set an environment tag such as prod, dev, perf, or test
  env = "prod"
Restart the Telegraf agent using the command below:
sudo service telegraf restart
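The two configuration snippets in this step can also be generated and sanity-checked with a short script. This is a minimal sketch, not part of the Wavefront integration: the file name `riak.conf` and all helper names are hypothetical, and `has_env_tag` is a line-based heuristic rather than a full TOML parser. It also illustrates why the table header, the comment, and each key must sit on separate lines: if `[global_tags]`, the comment, and `env = "prod"` are collapsed onto one line, the env key ends up inside the comment and is never set.

```python
"""Sketch: generate the Step 2 Telegraf snippets and sanity-check the env tag."""
import re
from pathlib import Path

# Hypothetical file name: Telegraf loads *.conf files from telegraf.d.
RIAK_CONF_NAME = "riak.conf"


def render_riak_input(host_ip: str, port: int = 8098) -> str:
    """Return an [[inputs.riak]] table pointing at Riak's HTTP listener."""
    return f'[[inputs.riak]]\n  servers = ["http://{host_ip}:{port}"]\n'


def render_global_tags(env: str) -> str:
    """Return a [global_tags] table that sets the env tag."""
    return f'[global_tags]\n  env = "{env}"\n'


def has_env_tag(telegraf_conf: str) -> bool:
    """Heuristic: is a non-empty env key set inside [global_tags]?

    Line-based check, not a TOML parser; it assumes no '#' inside values.
    """
    in_global_tags = False
    for raw in telegraf_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("["):
            # Entering a new table: track whether it is [global_tags].
            in_global_tags = line == "[global_tags]"
        elif in_global_tags and re.fullmatch(r'env\s*=\s*"[^"]+"', line):
            return True
    return False


def write_riak_input(conf_dir: Path, host_ip: str) -> Path:
    """Write the Riak input table into conf_dir and return the file path."""
    path = conf_dir / RIAK_CONF_NAME
    path.write_text(render_riak_input(host_ip))
    return path


if __name__ == "__main__":
    print(render_riak_input("<Host-IP>"))
    print(render_global_tags("prod"))
    # The collapsed one-line form hides env inside the comment:
    print(has_env_tag('[global_tags] #Setting environment tags env = "prod"'))  # False
```

To confirm that the plugin can reach Riak, running `telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --input-filter riak --test` should print one round of Riak metrics to the console without sending them to the proxy.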
Step 3 – Visualize Your Riak Metrics in Wavefront
In the Wavefront interface, open Integrations and click the Riak KV Store tile. The Metrics tab lists all of the Riak metrics that are available.
With Wavefront and Telegraf combined, you can now visualize many key metrics from your Riak KV store. Some of those metrics are described below:
- Throughput Metrics (from the Riak documentation: "Graphing the throughput stats relevant to your use case is often helpful for capacity planning and usage trend analysis. In addition, it helps you establish an expected baseline – that way, you can investigate unexpected spikes or dips in the throughput. The following stats are recorded for operations that happened during the last minute.")
  - node_gets – Reads coordinated by this node
  - node_puts – Writes coordinated by this node
  - vnode_index_reads – Number of local replicas participating in secondary index reads
- Latency Metrics (from the Riak documentation: "As with the throughput metrics, keeping an eye on average (and max) latency times will help detect usage patterns, and provide advanced warnings for potential problems.")
  - node_get_fsm_time_mean – Time between reception of a client read request and the subsequent response to the client
  - node_put_fsm_time_mean – Time between reception of a client write request and the subsequent response to the client
- Erlang Resource Usage Metrics (from the Riak documentation: "These are system metrics from the perspective of the Erlang VM, measuring resources allocated and used by Erlang.")
  - memory_processes – Total amount of memory allocated for Erlang processes (in bytes)
- General Riak Load/Health Metrics (from the Riak documentation: "These various stats give a picture of the general level of activity or load on the Riak node at any given moment.")
  - pbc_connects – Number of new protocol buffer connections established during the last minute
  - read_repairs – Number of read repair operations this node has coordinated in the last minute (determine a baseline and watch for abnormal spikes)
  - pbc_active – Number of currently active protocol buffer connections
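To spot-check these metrics outside Wavefront, for example while establishing the read_repairs baseline mentioned above, you can read them directly from Riak's HTTP /stats endpoint, which returns node statistics as JSON. The Python sketch below assumes the default listener at http://127.0.0.1:8098; the helper names and the three-standard-deviation spike rule are illustrative assumptions, not something Riak or Wavefront prescribes.

```python
"""Sketch: read Riak's /stats endpoint and pick out the metrics discussed above."""
import json
import statistics
import urllib.request

# Metrics highlighted in this post, grouped the same way as the list above.
KEY_METRICS = {
    "throughput": ["node_gets", "node_puts", "vnode_index_reads"],
    "latency": ["node_get_fsm_time_mean", "node_put_fsm_time_mean"],
    "erlang": ["memory_processes"],
    "load_health": ["pbc_connects", "read_repairs", "pbc_active"],
}


def fetch_stats(base_url: str = "http://127.0.0.1:8098", timeout: float = 5.0) -> dict:
    """GET <base_url>/stats and decode the JSON body into a dict."""
    url = base_url.rstrip("/") + "/stats"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_key_metrics(stats: dict) -> dict:
    """Return {group: {metric: value}}, skipping metrics absent from stats."""
    return {
        group: {name: stats[name] for name in names if name in stats}
        for group, names in KEY_METRICS.items()
    }


def is_spike(history: list, value: float, k: float = 3.0) -> bool:
    """Flag value if it exceeds the baseline mean by more than k sample std devs."""
    if len(history) < 2:
        return False  # not enough samples to establish a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return value > mean + k * stdev


if __name__ == "__main__":
    try:
        groups = extract_key_metrics(fetch_stats())
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print(f"Could not read Riak stats: {err}")
    else:
        for group, values in groups.items():
            print(group, values)
```

For example, with a read_repairs history of [2, 3, 2, 4, 3] per minute, a reading of 30 is flagged while 4 is not. In production, a Wavefront alert on the same metric is the more natural place for this rule; the script is only a quick way to see raw values.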
I hope you found this post useful. To learn more about metrics-driven analytics and monitoring, check out the Wavefront documentation. I look forward to your feedback; feel free to send your comments and suggestions to me on Twitter @lucgovmw.
About the Author:
Luciano (he likes to be called Lucky because it's nice to hear your name pronounced correctly) joined VMware in 2012 as a Senior Consultant based in beautiful Rio de Janeiro, Brazil. Since then, he has worked with many customers across diverse industries in almost every state of Brazil. Lucky was recognized as an MVP in his first year at VMware and received the Latin America Delivery Award twice (three awards in a row across his first three years as a Senior Consultant). He holds multiple industry certifications, including VMware VCP-DCV, VCP-NV, VCP-CLOUD, Double VCP, Cisco DCUD and DCUI, and Riverbed RCSA-W. He is an enthusiastic supporter of systems management and uses every chance he gets to show customers how to leverage their vROps license instead of spending more time (and money) on other tools. Lucky's 17+ years of technical background help him understand customers' business needs and find the right technical solutions to address them. Connect with him on LinkedIn (https://www.linkedin.com/in/lucgomes/) and follow Luciano on Twitter (@lucgovmw).