By Caleb Stephenson – Caleb works in the VMware OneCloud group providing internal private cloud services to VMware. He is currently working on systems management problems facing cloud infrastructure.
Anyone who has used VMware vCloud Director knows that it can be difficult to manage once you start scaling out, especially when something goes wrong. As your environment grows, mysterious issues tend to pop up, and without good visibility it is really hard to tell exactly what went wrong and, more importantly, why. The VMware OneCloud team has run into this many times in the recent past and would now like to share a tool that should make your lives a little easier in this area.
The newly released Log Insight Content Pack for vCloud Director is available as a free download on VMware Solution Exchange and is built to take advantage of the new features in Log Insight 2.0! Originally created for internal use, this content pack is a critical piece of how we manage and monitor our production cloud instances. With it you can better visualize what your vCloud Director environment is doing, including the trends of all your vApp operations and filtered error trending, on top of the log aggregation that Log Insight already provides. As an additional benefit, the OneCloud team has been experimenting with which alerts matter in a vCloud Director environment and feels it has a good set that you can now use to bring a quicker, more accurately targeted response to issues as soon as they surface.
Good operational infrastructure is normally all about balanced trends: if something is broken, then a trend has been broken as well. Being able to visualize those operational trends is critical. Without that visualization, how could you tell that a single cell was unresponsive because a user kicked off 150 jobs against it to modify vApps? Or that a load balancer was stuck and all operations were being serviced by a single cell? Or that all cells but one were successfully deploying vApps, and that one unresponsive cell was responsible for all your failures?
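To make the "broken trend" idea concrete, here is a minimal Python sketch of the kind of per-cell comparison those questions imply. It is not part of the content pack and does not use any Log Insight API; the event format, the cell names, the function names, and the 80% threshold are all assumptions made up for illustration.

```python
from collections import Counter


def cell_shares(events, status=None):
    """Return each cell's share of the events, optionally filtered by status.

    `events` is a hypothetical list of dicts with "cell" and "status" keys.
    """
    counts = Counter(
        e["cell"] for e in events if status is None or e["status"] == status
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


def skewed_cells(events, status=None, threshold=0.8):
    """List cells that account for at least `threshold` of the matching events."""
    shares = cell_shares(events, status)
    return sorted(cell for cell, share in shares.items() if share >= threshold)


# Hypothetical sample data: three cells, with failures concentrated on cell-3.
events = (
    [{"cell": "cell-1", "status": "success"}] * 40
    + [{"cell": "cell-2", "status": "success"}] * 38
    + [{"cell": "cell-3", "status": "failure"}] * 9
    + [{"cell": "cell-1", "status": "failure"}] * 1
)

# Which cells are responsible for most of the failures?
print(skewed_cells(events, status="failure"))
# Is any single cell servicing most of the operations overall?
print(skewed_cells(events))
```

In this sample, the first check singles out the one cell carrying nearly all the failures, while the second comes back empty because overall load is spread across cells. A dashboard shows the same imbalance at a glance, without anyone having to write the query first.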
Here are a couple of screenshots that show what you can expect from some of the dashboards this new content pack provides: