(Editor’s note: Try Wavefront for free).
It’s always a challenge to optimize the performance of an application to be highly available and without customer-impacting latencies. For critical application services like databases, if there is a degradation in its performance, then all other services using that database will have their performance degraded or impacted. So, comprehensive visibility across the entire stack becomes critical. In many cases, however, monitoring tools providing needed visibility may consume the very same processing resources used by the monitored application itself. And the question becomes, how to monitor a critical application such as a database consistently and reliably without the negative performance hit or the need for disruptive fixes such as rebooting your critical servers?
Luckily, we have an answer for you. The Wavefront platform combined with open-source Telegraf agents can provide real-time database performance visibility without impacting database performance. Using the Wavefront Query Language, you can choose metrics that matter to you and easily customize Wavefront dashboards and alerts. In this blog, I will share how you can use Wavefront to monitor key MS SQL Server performance metrics. Before we get started, you need the following pre-requisites:
- Sign-up for a Wavefront account. Please refer to the Wavefront free trial here.
- A Wavefront proxy must be configured. For more details, please refer to Wavefront documentation.
- Install the Telegraf agent on Windows hosts you would like to monitor.
Now let me go through configuration steps.
Step 1 – Enable MS SQL Server Plugin in the Telegraf Agent
First, open the file telegraf.conf. Please do not use Notepad, Notepad++ is better. Add these lines at the end of the file:
- In the Server= put the Database Server IP and port used in your environment.
- User ID= put the user that has admin rights in your Database (SA user by example). See how to configure it here
- If you are running the Telegraf agent with a user that has already permission in your Database, you don’t need to change anything. See the comment below:
# Read metrics from Microsoft SQL Server
# Specify instances to monitor with a list of connection strings.
# All connection parameters are optional.
# By default, the host is localhost, listening on default port (TCP/1433)
# for Windows, the user is the currently running AD user (SSO).
# See https://github.com/denisenkom/go-mssqldb for detailed connection parameters.
servers = [
“Server=192.168.1.30;Port=1433;User Id=telegraf;Password=T$l$gr@f69*;app name=telegraf;log=1;”,
“Server=192.168.1.30;Port=2222;User Id=telegraf;Password=T$l$gr@f69*;app name=telegraf;log=1;”
- To have better visibility, change the line below, to see your custom hostname inside Wavefront UI:
hostname = “ControlCenter-LG”
After this change, restart the Telegraf agent :
Step 2 – Checking the MS SQL Server Plugin Metrics in Wavefront
In Step 2, I’ll go through the configuration instructions needed to start flowing your MS SQL Server metrics in Wavefront. To do that, first, go to your Wavefront UI and click on Browse –> Metrics.
Type the name of your Custom Hostname and see the metrics that come from MS SQL Server. Alternatively, you can type the specific metric name that you are looking for.
Now that you have all MS SQL Server’s metrics available in Wavefront, it will be very easy to create dashboards to help you visualize the most important performance metrics. The dashboard below visualizes key MS SQL Server performance metrics. However, if you’re not familiar with important database metrics you should be collecting, see a summary of top metrics to collect (courtesy of this Database Journal article).
|SQLServer: Buffer Manager: Buffer cache hit ratio||The buffer cache hit ratio counter represents how often SQL Server is able to find data pages in its buffer cache when a query needs a data page.|
|SQLServer: Buffer Manager: Page life expectancy||The page life expectancy counter measures how long pages stay in the buffer cache in seconds.|
|SQLServer: SQL Statistics: Batch Requests/Sec||Batch Requests/Sec measures the number of batches SQL Server is receiving per second. This counter is a good indicator of how much activity is being processed by your SQL Server box.|
|SQLServer: SQL Statistics: SQL Compilations/Sec
SQLServer: SQL Statistics: SQL Re-Compilations/Sec
|The SQL Compilations/Sec measure the number of times SQL Server compiles an execution plan per second.
When the execution plan is invalidated due to some significant event, SQL Server will re-compile it. Ideally you want to keep this counter less than 10% of the number of Compilations/Sec.
|SQLServer: General Statistics: User Connections||The user connections counter identifies the number of different users that are connected to SQL Server at the time the sample was taken.|
|SQLServer: Locks: Lock Waits / Sec: _Total||In order for SQL Server to manage concurrent users on the system, SQL Server needs to lock resources from time to time.|
|SQLServer: Access Methods: Page Splits / Sec||This counter measures the number of times SQL Server had to split a page when updating or inserting data per second. system will perform. Ideally this counter should be less than 20% of the batch requests per second.|
|SQLServer: General Statistic: Processes Block||The processes blocked counter identifies the number of blocked processes.|
|SQLServer: Buffer Manager: Checkpoint Pages / Sec||The checkpoint pages per second counter measures the number of pages written to disk by a checkpoint operation.|
To import this dashboard into your Wavefront instance, use the command below with this JSON file. Unzip after downloading the file.
curl -v https://<wavefront_instance>.wavefront.com/api/v2/dashboard -d @<dashboard_file>.json \
-H “Content-Type: application/json” -H ‘Authorization: Bearer <wavefront_api_token>’
I hope you found this blog useful. To learn more about the power of metrics-driven analytics and monitoring check out https://docs.wavefront.com/ . I am looking forward to receiving your feedback. Feel free to send your comments and suggestions to me on Twitter @lucgovmw.
About the Author:
Luciano (He likes to be called Lucky because it’s nice to hear your name correctly pronounced) joined VMware in 2012 as a Senior Consultant based in beautiful Rio de Janeiro, Brazil. Since then, he’s worked with many customers across diverse industries in almost all states of Brazil. Lucky was recognized as an MVP in his first year at VMware, and for the Latin America Delivery Award twice (3 years as a Senior Consultant and 3 Awards in a row). He holds multiple industry certifications including VMware VCP-DCV, VCP-NV, VCP-CLOUD, Double VCP, Cisco DCUD, DCUI and Riverbed RCSA-W. He’s a very enthusiastic supporter of Systems Management and uses every chance he gets to discuss with customers how to leverage their vROps License and avoid spending more time (and money) with other tools. Lucky’s 17+ year technical background helps him to understand customer’s business needs and to find the right technical solution to address those requirements. Connect with him on LinkedIn (https://www.linkedin.com/in/lucgomes/) and follow Luciano on Twitter (@lucgovmw).