By Nikolay Nikolaev and Venil Noronha
Open source networking is an exciting space to work in, but it’s also a little confusing. For example, what’s the difference between a service mesh and the Network Service Mesh? How do they relate to each other and how might they intersect in the future? If you’re unsure but would like to know, this pair of posts is for you.
In part one, we’ll take a quick trip through the history of networking before examining open source service meshes through the lens of Istio and Envoy, arguably the most popular and robust open source service mesh technologies in use today. In part two, we’ll explore how the Network Service Mesh project differs from a typical service mesh and share our thoughts about how the two kinds of mesh will likely intersect in the future.
Modern networking – how we got to service mesh
Almost as soon as we made the first computers, we wanted them to talk to each other, so we connected them with wires. And when we wanted to connect more than two at a time, we created physical switches that would do that.
When we began virtualizing computers a few decades ago, we replicated that model, linking virtual machines with virtual wires in networks enabled by virtual switches and managed by software-defined network controllers.
In the last few years, though, it became clear we had more to do.
While the lowest network layers (L2/L3) still switched packets of data, the new virtual networks were designed to route API traffic more intelligently at the higher layers (L4 through L7). They exposed their capabilities through REST or gRPC to components like databases or key-value stores, but much of their work remained highly inefficient. We needed greater and more intelligent control over communication between all these services. That meant being able to manipulate the routing of API calls, which is where the service mesh came in.
The promise of a service mesh
A service mesh aims to overcome a basic problem with application service design. In the conventional model, application services have both core functionality and various kinds of management functionality that let them discover other services and communicate with them. This has to be built into every single service, which is clearly suboptimal. It adds overhead to each service and prevents application developers from focusing on the business logic of the apps they are trying to build. It also forces us to rely on external service registries to manage these services.
A service mesh addresses both the functionality problem and the registry problem. Very simply, it injects a proxy alongside your workloads, and that proxy lets you discover the other services in your cluster. Because the mesh acts as a service registry itself, you no longer need an external one.
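To make the idea concrete, here is a minimal sketch of a proxy resolving a service name through a registry so that the application never handles addresses itself. It is a toy model, not how Istio or Envoy implement discovery: `ServiceRegistry`, `SidecarProxy`, the service name, and the addresses are all hypothetical names chosen for illustration.

```python
import itertools


class ServiceRegistry:
    """Toy in-memory registry (hypothetical; stands in for what a mesh tracks)."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of "host:port" strings
        self._cursors = {}    # service name -> round-robin iterator

    def register(self, service, endpoint):
        self._endpoints.setdefault(service, []).append(endpoint)
        # Rebuild the iterator so the newly added endpoint is included.
        self._cursors[service] = itertools.cycle(list(self._endpoints[service]))

    def resolve(self, service):
        if service not in self._endpoints:
            raise LookupError(f"unknown service: {service}")
        return next(self._cursors[service])


class SidecarProxy:
    """Sits next to a workload and turns a service name into a concrete URL."""

    def __init__(self, registry):
        self._registry = registry

    def url_for(self, service, path):
        endpoint = self._registry.resolve(service)
        return f"http://{endpoint}{path}"


registry = ServiceRegistry()
registry.register("payments", "10.0.0.1:8080")
registry.register("payments", "10.0.0.2:8080")
proxy = SidecarProxy(registry)
```

The application only ever asks for "payments"; which instance answers is the proxy's decision, made here with simple round-robin.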
In addition, a service mesh gives you observability. In any kind of production system, you have to be able to monitor your basic metrics. How is your memory being allocated? How many errors occurred over the last n minutes? Knowing these facts allows you to take specific actions, like scaling up or down, or fixing your code. But if your monitoring system is part of your application, that’s extra overhead you have to maintain. With a service mesh, that overhead is considerably reduced: the proxy automatically generates the statistics you need, and you can collect and view them with tools like Prometheus and Grafana. You can also instrument your services to offer both distributed tracing and trace identification, so you can see how a request flows through the set of services in your system.
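The "errors over the last n minutes" question is the kind of statistic a proxy can keep on the application's behalf. The sketch below assumes a hypothetical `ProxyStats` class that is told the status code of each response it forwards; it is not Envoy's actual statistics API, only an illustration of a sliding-window counter.

```python
from collections import deque


class ProxyStats:
    """Hypothetical per-proxy counters; the application itself is untouched."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.total_requests = 0
        self._error_times = deque()  # timestamps of 5xx responses, oldest first

    def record(self, status_code, now):
        """Called by the proxy for every response it forwards."""
        self.total_requests += 1
        if status_code >= 500:
            self._error_times.append(now)

    def errors_in_window(self, now):
        """Number of 5xx responses seen in the last `window_seconds`."""
        cutoff = now - self.window_seconds
        # Drop timestamps that have aged out of the window.
        while self._error_times and self._error_times[0] <= cutoff:
            self._error_times.popleft()
        return len(self._error_times)


stats = ProxyStats(window_seconds=300)  # a five-minute window
stats.record(200, now=0)
stats.record(503, now=10)
stats.record(500, now=200)
```

Timestamps are passed in explicitly here to keep the example deterministic; a real proxy would read a clock.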
Service meshes also have built-in features that improve resilience. They let you do circuit breaking, retries, and rate limiting, for example. Again, because these features are built into the proxy, your applications don’t need to deal with them anymore.
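As an illustration of one of these features, here is a rough sketch of a circuit breaker. It assumes a simple policy (open after a fixed number of consecutive failures, allow one trial call after a cool-down); the class, its parameters, and the injected `clock` function are hypothetical and do not reflect how any particular mesh configures circuit breaking.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker refuses to send a request upstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold, reset_after, clock):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self._clock = clock                         # callable returning the current time
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; request not sent")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Because the breaker lives in the proxy layer, a failing upstream is cut off without the calling service containing any of this logic.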
That’s also true for configuration because these proxies expose APIs that can be configured at runtime. So, you can introduce dynamic configuration and won’t need to restart your proxies or services in order for a new application or new configuration to take effect. That further increases service uptime.
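A small sketch of what runtime configuration means in practice: the proxy object below stays alive while its settings are swapped underneath it. The `ReconfigurableProxy` class and the per-service timeout settings are assumptions made for the example, not a real proxy API.

```python
import threading


class ReconfigurableProxy:
    """Hypothetical proxy whose settings can be replaced while it keeps running."""

    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = dict(config)
        self.version = 1  # bumped on every applied configuration

    def apply_config(self, new_config):
        """Swap in a new configuration; no restart of the proxy or the service."""
        with self._lock:
            self._config = dict(new_config)
            self.version += 1

    def timeout_for(self, service):
        """Per-service timeout in seconds, falling back to the default entry."""
        with self._lock:
            return self._config.get(service, self._config.get("default"))


proxy = ReconfigurableProxy({"default": 5.0})
before = proxy.timeout_for("orders")
proxy.apply_config({"default": 5.0, "orders": 1.5})
after = proxy.timeout_for("orders")
```

Requests handled after `apply_config` see the new timeout immediately, which is the property that keeps services up during configuration changes.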
Lastly, a service mesh enhances security. Traditionally, a security team would have to make sure every single application owner was following best practices on network security, which again pulls developers away from where they can add the most value. A service mesh ensures that every proxy communicates securely without each developer needing to write a secure transport layer in their service.
What do we mean by “service mesh” today? The example of Envoy/Istio
As they came to be developed, service meshes did this application-centric work between layer 4 and layer 7 in the network – not surprisingly, since those levels are typically where virtualized application networking has taken place. They grew especially popular with enterprise applications running in the cloud.
Envoy is a proxy and data plane component – so all the packets that flow through a system first go through Envoy before reaching an actual application. You can run Envoy in your system as a straightforward service proxy or as an ingress or egress. When it’s the latter, it’s essentially a gateway that redirects traffic to different services. When it’s a service proxy, it can generate statistics about the data flowing in and out of any particular service.
In addition, Envoy integrates with observability backends such as Prometheus and Jaeger. The open source Istio control plane brings these together and also lets you push new configuration to Envoy through its API to manage the data flowing from service to service.
Envoy is just one of the components that Istio can run, but it’s a central one. Istio uses Envoy for service-to-service communication as well as for managing the traffic that enters and exits the system. Other tools that integrate with Istio (such as Grafana and Prometheus) let you track the statistics of all the services you are running, showing you how they are communicating with each other. And Jaeger lets you trace a request as it flows through the system.
A few use cases
We’ll finish part one with a few service mesh use cases. One is from a major open source player, where a customer had been splitting its service into ever smaller pieces with the goal of deploying a microservices architecture. But as they neared completion, they found they had many small components that were trying to communicate with each other in a way that wasn’t observable. They couldn’t tell what was calling what, or much of anything else, which made it really hard to support the service in production.
Their solution was to deploy Istio as a way to trace and monitor the communication between these small microservices, allowing them to operate in production with greater visibility into issues as they arose.
Another example comes from a major bank that had a network of point-of-sale terminals running multiple firmware versions. Each version required a different treatment of the same service. Here, Istio offered a way to redirect traffic based on the type of point-of-sale terminal that was sending the request. Every terminal could now request the same service in the same format because Istio was able to recognize the terminal and route the request appropriately.
While that’s a very specific case, it’s common for developers to want to expose one service to the outside world, but have it served by different backends depending on the specific properties of the request.
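The routing idea in these examples can be sketched as a list of match rules checked against request properties. This is only an illustration of the pattern: the header name `x-firmware-version`, the backend names, and the `pick_backend` function are invented for the example and are not taken from the bank's setup or from Istio's configuration format.

```python
def pick_backend(headers, rules, default_backend):
    """Return the backend of the first rule whose match entries all equal the request headers."""
    for rule in rules:
        if all(headers.get(name) == value for name, value in rule["match"].items()):
            return rule["backend"]
    return default_backend


# Hypothetical rules: one public service, different backends per firmware version.
RULES = [
    {"match": {"x-firmware-version": "1"}, "backend": "payments-v1"},
    {"match": {"x-firmware-version": "2"}, "backend": "payments-v2"},
]

old_terminal = pick_backend({"x-firmware-version": "1"}, RULES, "payments-latest")
new_terminal = pick_backend({"x-firmware-version": "2"}, RULES, "payments-latest")
unknown_terminal = pick_backend({}, RULES, "payments-latest")
```

Every terminal sends the same request to the same service name; only the rule table, held outside the application, decides which backend serves it.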
The promise of Network Service Mesh
While service mesh solutions like Envoy/Istio have done wonders for advancing virtualized networking, they haven’t solved everything. There are still workloads, for example, for which a service mesh can’t offer much help.
Next time, we’ll look at an open source effort that Nikolay has been involved in – Network Service Mesh – that aims to address some of these use cases and help push virtualized networking in a new direction.