Join the APM Beta
VMware is looking for customers interested in the beta program for the next release of vFabric Application Performance Manager (APM).
vFabric Application Performance Manager (APM) provides comprehensive application monitoring capabilities. APM installs with two components: vFabric AppInsight for transaction- and code-level monitoring (byte code instrumentation courtesy of Spring Insight) and vFabric Hyperic for middleware and OS monitoring. Together, these products not only provide an early warning system for developing performance problems but also provide the critical perspectives needed for Root Cause Analysis (RCA). In this post, we discuss how using Hyperic data inside AppInsight can help speed up RCA.
Analyzing Performance Problems
In production, triaging performance issues requires correlating multiple events from multiple sources. In many cases, a performance problem or error visible at the application or user-transaction level is caused by an underlying middleware or resource problem. vFabric APM lets the user view and correlate metrics from various sources (middleware, network, and code level) in order to pinpoint the root cause and reduce Mean Time To Resolution (MTTR).
- Overall, application performance is OK: the health of all application objects is green.
- Hit rate is about 204 hits per minute.
- Note the “Key Metrics” widget at the bottom of the picture, which shows the three default metrics: error percentage, average latency, and hit rate.
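To make the three default Key Metrics concrete, here is a minimal Python sketch of how they can be derived from a window of transactions. It is not how AppInsight computes them; the `Transaction` record and the `key_metrics` function are hypothetical, and the sketch assumes every record falls inside a window of `window_s` seconds.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transaction:
    """Hypothetical transaction record (not an AppInsight type)."""
    latency_ms: float  # end-to-end response time in milliseconds
    is_error: bool     # True if the request ended in an error


def key_metrics(txns: Sequence[Transaction], window_s: float) -> dict:
    """Compute error percent, average latency and hit rate for one window.

    Assumes all transactions in `txns` arrived within `window_s` seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    hits = len(txns)
    if hits == 0:
        return {"error_percent": 0.0, "avg_latency_ms": 0.0, "hits_per_minute": 0.0}
    errors = sum(1 for t in txns if t.is_error)
    return {
        "error_percent": 100.0 * errors / hits,
        "avg_latency_ms": sum(t.latency_ms for t in txns) / hits,
        # Normalize the raw hit count to a per-minute rate.
        "hits_per_minute": hits * 60.0 / window_s,
    }


# Illustrative input only: four requests in a 60-second window, one failing.
sample = [Transaction(120.0, False), Transaction(80.0, False),
          Transaction(400.0, True), Transaction(100.0, False)]
print(key_metrics(sample, window_s=60.0))
# {'error_percent': 25.0, 'avg_latency_ms': 175.0, 'hits_per_minute': 4.0}
```

Normalizing to hits per minute keeps the hit rate comparable across windows of different lengths, which matters when the same metric is plotted over time.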
Simulated Performance Issue
We will now simulate an erroneous configuration change. (We won’t reveal what it was just yet; let’s see if we can track it down in a few steps.)
- Hit rate declines rapidly: the average hit rate drops from 204 to 141 hits per minute.
- The Usage KPI turns red, and this status propagates to the Application KPI, which turns yellow.
- A change notification appears in the Key Metrics over time graph, a clear indication that a configuration change related to JVM memory has been made. This change event is captured by Hyperic and presented seamlessly in AppInsight.
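The drop from 204 to 141 hits per minute is roughly a 31% decline. The sketch below shows one simple way such a decline could turn a usage indicator red: compare the current hit rate with a baseline. The 10% (yellow) and 25% (red) thresholds and the function names are hypothetical; the post does not say how AppInsight sets its KPI thresholds or how it propagates status to the Application KPI.

```python
from enum import Enum


class Health(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def percent_drop(baseline: float, current: float) -> float:
    """Percentage decline of `current` relative to `baseline` (negative if it rose)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - current) / baseline


def usage_health(baseline: float, current: float,
                 warn_pct: float = 10.0, crit_pct: float = 25.0) -> Health:
    """Classify a hit rate against its baseline.

    The default thresholds are hypothetical, not AppInsight's values.
    """
    drop = percent_drop(baseline, current)
    if drop >= crit_pct:
        return Health.RED
    if drop >= warn_pct:
        return Health.YELLOW
    return Health.GREEN


print(round(percent_drop(204, 141), 1))  # 30.9
print(usage_health(204, 141))            # Health.RED
```

Comparing against a baseline rather than a fixed number lets the same rule work for applications with very different normal traffic levels.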
Root Cause Analysis
With the spotlight now on JVM memory, let’s add relevant metrics to the Key Metrics over time graph to help with the RCA process.
Press the Customize button in the Key Metrics graph and, in the Middleware section, select the two heap memory metrics (Heap Memory Free and Heap Memory Used). These metrics are then overlaid on the Key Metrics over time graph, letting us compare heap memory before and after the performance issue began.
Looking at the Key Metrics over time chart, we see Free Heap Memory plunge from 120.5 MB to 0.2 MB, and as a result the hit rate drops from 224 all the way to 0. This tells us that the heap has somehow been constrained to the point where transactions can no longer complete. So what happened to the heap size?
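Overlaying the two metrics is essentially a check on ordering: did free heap collapse before (or at the same time as) the hit rate? Here is a minimal sketch of that check, assuming both metrics are available as hypothetical `(minute, value)` samples and that the analyst chooses the floors that mark a "collapse". It illustrates the reasoning only; it is not an AppInsight or Hyperic API.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample format: (minute, value) pairs sorted by minute.
Series = Sequence[Tuple[int, float]]


def first_below(series: Series, threshold: float) -> Optional[int]:
    """Return the first minute at which the metric falls below `threshold`."""
    for minute, value in series:
        if value < threshold:
            return minute
    return None


def heap_precedes_hit_drop(heap_free_mb: Series, hits_per_min: Series,
                           heap_floor_mb: float, hit_floor: float) -> Optional[bool]:
    """True if free heap crossed its floor no later than the hit rate did.

    Returns None when at least one metric never crossed its floor.
    """
    heap_t = first_below(heap_free_mb, heap_floor_mb)
    hits_t = first_below(hits_per_min, hit_floor)
    if heap_t is None or hits_t is None:
        return None
    return heap_t <= hits_t


# Illustrative samples shaped like the chart described above (not real data).
heap = [(0, 120.5), (1, 120.5), (2, 0.2), (3, 0.2)]
hits = [(0, 224.0), (1, 224.0), (2, 150.0), (3, 0.0)]
print(heap_precedes_hit_drop(heap, hits, heap_floor_mb=10.0, hit_floor=100.0))  # True
```

Ordering alone does not prove causation, but a resource metric that collapses first is a strong hint about where to look next, which is exactly what the overlay provides visually.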
[Screenshots: Before Configuration Change | After Configuration Change]
The culprit must be the JVM heap size configuration. A quick look at the setenv.sh file reveals that the value was mistakenly set to 24M instead of 1024M.
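A bad heap value like this can also be caught before deployment. The sketch below assumes the heap limit is set with the standard JVM `-Xmx` option on a line of setenv.sh (for example in `CATALINA_OPTS`); the post does not show the actual line, so the sample lines, the `check_heap` helper, and the 512 MB floor are all hypothetical.

```python
import re
from typing import Optional

# Multipliers for the JVM size suffixes k/K, m/M, g/G (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Matches a JVM maximum-heap option such as -Xmx1024M or -Xmx2g.
_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)\b")


def max_heap_bytes(line: str) -> Optional[int]:
    """Extract the -Xmx value from a shell line and convert it to bytes."""
    match = _XMX.search(line)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def check_heap(line: str, min_mb: int = 512) -> str:
    """Flag a max heap below `min_mb` (a hypothetical sanity floor)."""
    heap = max_heap_bytes(line)
    if heap is None:
        return "no -Xmx setting found"
    heap_mb = heap // (1024 ** 2)
    if heap_mb < min_mb:
        return f"suspicious: max heap is only {heap_mb} MB"
    return f"ok: max heap is {heap_mb} MB"


# Hypothetical setenv.sh lines mirroring the 24M typo and the intended 1024M.
bad = 'CATALINA_OPTS="$CATALINA_OPTS -Xmx24M"'
good = 'CATALINA_OPTS="$CATALINA_OPTS -Xmx1024M"'
print(check_heap(bad))   # suspicious: max heap is only 24 MB
print(check_heap(good))  # ok: max heap is 1024 MB
```

A check like this complements monitoring rather than replacing it: it catches the obvious typo early, while Hyperic's change events still cover changes made outside any review step.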
Setting the JVM heap size back to 1024M fixes the issue. Once free heap memory is back to normal, our KonaKart application turns green again!
Configuration mistakes are a common root cause of severe performance issues. Without an indication of where to start the RCA, analyzing a performance issue can take a long time. In this example, Hyperic middleware metrics helped us find the culprit quickly by:
- Providing a fast and clear indication of a performance issue
- Presenting informative change alerts that focused the analysis in the right direction
- Overlaying relevant Hyperic metrics on AppInsight data to pinpoint the root cause