Recently, we had the pleasure of talking with Manu Mathew, Senior Manager, Performance Engineering at Secureworks. Manu shared his team's journey of supporting rapid service growth, some of the monitoring challenges they faced along the way, and how they overcame them. Here's a recap of Manu's fascinating story.
About Secureworks
Secureworks is a world leader in cybersecurity solutions, combining machine learning with human intelligence to detect faster, respond smarter, and predict and prevent more threats altogether. Their information security services, used by well over 4,000 business customers worldwide, include detection, prevention, and remediation of cyber threats, and process more than 300 billion security events daily. Secureworks also recently launched Red Cloak, a Threat Detection and Response SaaS application built on advanced analytics, automation, and threat intelligence.
The Early Days
Engineering and operations at Secureworks run a very diverse environment, with a large share of Java workloads on Linux deployed across a multicloud that includes AWS. Their application services rely on MySQL, Cassandra, Kafka, and more.
When Manu was hired about five years ago as the first official performance engineer, each new product release was a stressful event. Compounding the challenge was the growing volume of data to analyze whenever an incident emerged. That growth was driven by the company's rapid expansion and by the sheer volume of security events to process, with new threats arriving every day from new threat actors anywhere in the world. Secureworks currently processes approximately 300 billion events per day from thousands of customers globally.
In the early days, evaluating the real-time health of the production systems after a deployment wasn't easy. Over time, the production engineering team made it easier:
- First, they built a team that established monitoring as a practice, initially by developing in-house monitoring tools.
- Next, they implemented various open source monitoring tools, which worked well for a while but eventually created new challenges as their environment grew and its complexity increased.
- Finally, they adopted Wavefront, a happy ending!
Along the way, they learned a lot about what makes metrics useful, and that insight proved valuable throughout their journey.
Challenges with Open Source and Traditional APM Tools
Monitoring at Secureworks had some growing to do. The performance engineering team tried a mix of open source tools: one for alerting and another for understanding the historical trends leading up to an alert. Switching between these tools was time-consuming and added the challenge of keeping event timestamps synchronized across disparate tools. Another challenge was scaling the monitoring architecture to handle the fast-growing volume of metrics.
Before selecting Wavefront, the team evaluated a range of options, including open source tools and traditional APM tools. But they did not like these tools, because all of them required a large investment of limited developer resources to make them work together as a unified, integrated platform. Some team members started implementing Grafana as a unifying overlay to compensate for the UI limitations of the other tools, but that fragmented the organization's approach to monitoring.
Engineers also looked closely at the combination of Graphite and Grafana to see whether it would suit them. Some engineers came to like it, but the same scaling and maintenance problems soon arose again: how to simplify managing all the monitoring islands and the associated pieces they had to integrate and maintain.
Eventually, they decided to go with Wavefront, a fully managed, SaaS-based monitoring and analytics service. They could keep using all their existing metrics pipelines and simply ensure the metrics reached Wavefront, which then takes care of everything else. Once they started using Wavefront, their scalability and maintenance problems immediately disappeared.
Starting with Wavefront Was Easy
The Secureworks environment is a multicloud environment, mostly private cloud with some public cloud. They had already implemented an infrastructure for gathering metrics and forwarding them to a centralized location, so it wasn't difficult to get that data flowing into Wavefront. They just had to add a tap and send the data to the Wavefront proxies. Adding tags to their metrics was new and exciting for them, because they saw how tagging helped them search their data. It is a compelling Wavefront capability that helps them see trends across diverse data sources.
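To make the "tap" and the tagging idea more concrete, here is a minimal sketch of what forwarding one tagged point to a Wavefront proxy can look like. It assumes the Wavefront data format (`<metricName> <metricValue> [<timestamp>] source=<source> [pointTags]`) and a proxy listening on its default metrics port, 2878. The host name, metric name, and tag values are hypothetical illustrations, not details of the Secureworks pipeline.

```python
import socket
import time


def _quote(text):
    # Wrap a value in double quotes, escaping any embedded double quotes.
    return '"' + str(text).replace('"', '\\"') + '"'


def format_wavefront_point(metric, value, source, tags=None, timestamp=None):
    """Render one point as a line in the Wavefront data format:
    <metricName> <metricValue> <timestamp> source=<source> [tagKey="tagValue" ...]
    """
    if timestamp is None:
        # Epoch seconds; the proxy can also assign a time if it is omitted.
        timestamp = int(time.time())
    fields = [metric, str(value), str(int(timestamp)), "source=" + _quote(source)]
    # Sort tags so the output is deterministic; tags are what make the data searchable.
    for key in sorted(tags or {}):
        fields.append(f"{key}={_quote(tags[key])}")
    return " ".join(fields)


def send_points(lines, host="localhost", port=2878):
    # Assumption: a Wavefront proxy accepts newline-delimited points on TCP port 2878.
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical example point; only formatted here, not sent.
    print(format_wavefront_point("kafka.consumer.lag", 42, "broker-01",
                                 {"env": "prod", "dc": "east"}))
    # send_points([...], host="wavefront-proxy.example.internal")  # hypothetical host
```

Keeping formatting separate from sending mirrors the idea of tapping an existing pipeline: whatever already collects the metrics only needs to emit lines in this shape and point them at a proxy.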
One migration issue to overcome was that some engineers had already built dashboards in Grafana. To make the move to Wavefront go smoothly, team leaders had to make sure engineers could create and use very similar dashboards in Wavefront. So Wavefront customer success provided both a tool and some code the team could tweak to help with the conversion. Secureworks engineers were then able to transfer most of their Grafana dashboards into Wavefront in a straightforward way. This worked out well, making for an almost seamless transition.
“Getting metrics flowing into Wavefront is the easy part. Taking time to understand which metrics are important to track in your environment is the harder part but worth it. We’re surfing the Wavefront!” – Manu Mathew, Secureworks
Wavefront Delivers the First Pane of Glass for Secureworks
Wavefront was chosen to be the first pane of glass for Secureworks, unifying all performance metrics data in one location.
Today, practically all Secureworks engineers use Wavefront dashboards, charts, and alerts to monitor their environment. The next step is to make use of Wavefront's distributed tracing, an emerging capability in the Secureworks monitoring ecosystem. In particular, they are looking forward to the service map display in the Wavefront Tracing browser. The resulting application visibility will help them with debugging and also advance their journey toward reducing the overall number of monitoring tools.
Instead of spending time maintaining additional open source monitoring tools, Secureworks engineers can now focus on strategic work. They can build new service features that delight Secureworks customers, and they're free to innovate instead of maintaining the monitoring.
Surfing the Wavefront: Clear Benefits for Secureworks
With Wavefront as its new enterprise observability platform, Secureworks gained a unified view across all teams. Wavefront is their first pane of glass, helping Secureworks engineers reduce MTTR dramatically and eliminate the triage delays caused by context switching across tools. The ability to quickly compose and share dashboards has become key to diagnosing production issues quickly. Moreover, Wavefront's technical support has been much appreciated all along the way, including a responsive Wavefront Slack channel. Finally, beyond MTTR reduction, Wavefront has had a positive impact on the Secureworks business by freeing engineering resources to focus on delighting customers and building critical business features.
The post Secureworks Surfing the Wavefront with Observability appeared first on Wavefront by VMware.