
Five Key Decisions to Make Before Running Kubernetes in Production

Every day, organizations make decisions about technology. From which application features to prioritize to which bugs to fix first, or even which tool is best for the job at hand, the sum of these decisions can make or break an organization’s strategy. Teams adopting Kubernetes are no different: understanding the trade-offs behind each decision can be the difference between a successful Kubernetes deployment and a failed one.

Multiple Clusters or One Cluster

The first decision to make happens before a single line of infrastructure management code is written: What kind of Kubernetes architecture do I need? Many opt for one large cluster and make the Kubernetes namespace the boundary between their applications and services. Others go in the opposite direction and take on the operational burden of managing multiple Kubernetes clusters, segmented by hardware requirements, application lifecycle, teams or divisions, or other organization-specific needs.
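
As a minimal sketch of the namespace-as-boundary approach (the team name here is hypothetical), each team or application simply gets its own namespace in the shared cluster:

    # Hypothetical example: one namespace per team in a single shared cluster.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
      labels:
        team: team-a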

As with all technical decisions, there is no single right answer. The better approach is to weigh the trade-offs of each architecture and go in with your eyes open. For those new to Kubernetes, it’s often a great idea to start with one cluster and a select few applications, earning some internal wins before migrating more workloads. Running lower-impact production applications on that one cluster is a good way to become familiar with cloud native applications and cluster management.

If you are further along in your Kubernetes journey, you can choose to run more than one cluster. These clusters may be segmented by services, teams, the underlying infrastructure platform, or other factors, such as whether services need persistent state or where an application is in its lifecycle. Multiple clusters make you more resilient in the case of an outage (assuming the same application runs in more than one cluster) but add operational complexity around cluster management. Separating applications at the cluster level also provides more flexibility with regard to hardware topology: one cluster might serve persistent-state applications with very fast storage, while another cluster for stateless applications might have no shared storage at all.

Container Image Management

Kubernetes, in its simplest definition, orchestrates applications made up of container images. How those images are built, maintained, secured, and deployed will have a huge impact on whether teams are successful with Kubernetes.

Teams will need a container registry, whether it runs on premises, like Harbor, or is offered as a service by a provider, like Google’s Container Registry. Images should be small so they can be built and deployed quickly, and image tags should reflect released versions (in production) or the SHA-1 hash of the git commit they were built from (for development). Images should be scanned for security vulnerabilities both at build time and on a regular interval to make sure they remain safe to deploy.
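
As a sketch of that tagging convention (the registry, image name, and versions are all hypothetical), a production manifest pins a released version rather than a floating tag such as latest:

    # Hypothetical production deployment pinning a released image version.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: orders
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: orders
      template:
        metadata:
          labels:
            app: orders
        spec:
          containers:
          - name: orders
            # Released version tag for production; a development deployment
            # would instead reference the git commit SHA, e.g.
            # registry.example.com/orders:3f9c2d1 (names are illustrative).
            image: registry.example.com/orders:1.4.2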

Resource Allocation

Once decisions have been made on how many clusters to deploy and how images will be built and distributed, you need to decide how tightly to pack worker machines. The Kubernetes scheduler uses the requests and limits set in manifests when it makes scheduling decisions. If these are not set, the scheduler has little to go on and can do silly things like putting everything on a single worker node. Although you want developers to set proper resource requests on their applications, setting default requests and limits at the namespace level will at least provide some guardrails that help the Kubernetes scheduler make better decisions. Setting quotas is also important so one team can’t take over a shared cluster.
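
To make those guardrails concrete, here is a minimal sketch (the namespace name and resource values are illustrative, not recommendations): a LimitRange supplies default requests and limits to containers that don’t declare their own, and a ResourceQuota caps what the namespace as a whole can claim.

    # Default requests/limits for containers in the team-a namespace
    # that do not set their own. Values are illustrative.
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: defaults
      namespace: team-a
    spec:
      limits:
      - type: Container
        defaultRequest:
          cpu: 100m
          memory: 128Mi
        default:
          cpu: 500m
          memory: 512Mi
    ---
    # Cap the total resources the namespace can claim so one team
    # cannot take over a shared cluster.
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi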

More advanced Kubernetes administrators will want to set up Pod Priority to make sure high-priority applications are not affected by low-priority ones.
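
A minimal PriorityClass sketch (the name and value are illustrative); pods opt in by setting priorityClassName in their spec:

    # Higher value = higher scheduling priority. Values are illustrative.
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: business-critical
    value: 1000000
    globalDefault: false
    description: "For applications that must not be starved by lower-priority work."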

Role-Based Access Control

How will your users access Kubernetes? Will developers get direct API access using the kubectl command-line tool? Will a deployment system access Kubernetes through service accounts? The documentation on Kubernetes role-based access control is quite lengthy, so let’s start with what not to do: the worst thing you can do is pass around a kubeconfig file with cluster-admin permissions to every system, team, or developer.

There are some real-world ways to start and evolve with role-based access control. Kubernetes comes with four ClusterRoles that you can use to build your access strategy:

  • cluster-admin: administrative access across the entire cluster
  • admin: administrative access within a namespace (except for resource quotas)
  • edit: read-write access to most objects in a namespace
  • view: read-only access to objects in a namespace

Most companies start out using these built-in ClusterRoles and linking them to groups of users in their organization. The operations teams that manage clusters get cluster-admin permissions. Developers get admin or edit in their namespaces. Team leads or project managers get the view permission to see what is happening. Open source tools like Dex can help link traditional authentication systems like Active Directory to Kubernetes RBAC. Start with these, and then get more granular as needed.
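
As a sketch of that pattern (the group and namespace names are hypothetical), binding a developer group to the built-in edit ClusterRole within a single namespace looks like this:

    # Grant the hypothetical "team-a-developers" group the built-in
    # edit ClusterRole, scoped to the team-a namespace only.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-edit
      namespace: team-a
    subjects:
    - kind: Group
      name: team-a-developers
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit
      apiGroup: rbac.authorization.k8s.io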

Observability: Logging and Monitoring

A lot of companies adopting Kubernetes already have a logging system in place for their existing systems, such as VMware vRealize Log Insight. In such cases, the question becomes how to aggregate application and cluster logs and ship them to that external logging system. Fluentd, a CNCF project, can aggregate logs from both applications and Kubernetes itself and ship them to vRealize Log Insight.
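
A heavily trimmed sketch of the usual deployment pattern: Fluentd runs as a DaemonSet so one collector pod lands on each node and reads that node’s container logs. The image tag here is an assumption, and the output configuration that would actually point at vRealize Log Insight is deliberately left out:

    # Minimal sketch: one Fluentd pod per node, mounting the node's logs.
    # The output plugin configuration (e.g. forwarding to vRealize Log
    # Insight) would be supplied separately, typically via a ConfigMap.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd
      namespace: logging
    spec:
      selector:
        matchLabels:
          app: fluentd
      template:
        metadata:
          labels:
            app: fluentd
        spec:
          containers:
          - name: fluentd
            image: fluent/fluentd:v1.16-1  # image tag is an assumption
            volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
          volumes:
          - name: varlog
            hostPath:
              path: /var/log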

Some organizations instead have developers run their own logging stack. Fluentd is great for aggregating and shipping logs no matter where they end up, but several other log shippers also have Kubernetes support, such as Elastic’s Beats, and Elastic maintains Helm charts for deploying Elasticsearch and Kibana on Kubernetes. If you do choose to run your log storage on Kubernetes, consider a multi-cluster approach: when an outage occurs, you don’t want to be without your logs.

On the monitoring front, Prometheus is the leading open source monitoring tool for Kubernetes. It is a CNCF project and can be deployed via an operator that ships with built-in dashboards and alerting rules. Once you start running multiple clusters, it can be easier to consume monitoring as a separate service, using Wavefront by VMware to monitor both applications and Kubernetes itself.
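
As an illustrative sketch of those alerting rules (the alert, threshold, and label selector are examples, not recommendations, and the metric comes from kube-state-metrics), the Prometheus Operator accepts rules as PrometheusRule resources:

    # Illustrative alert: fire when a pod restarts repeatedly.
    # Assumes the Prometheus Operator's PrometheusRule CRD is installed.
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: pod-restarts
      namespace: monitoring
      labels:
        release: prometheus  # assumed to match the operator's rule selector
    spec:
      groups:
      - name: pod-health
        rules:
        - alert: PodRestartingFrequently
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"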

This is not an exhaustive list of decisions to make, but it covers several that are high on the list for organizations adopting Kubernetes. Here at VMware, the cloud native architecture team helps organizations determine the trade-offs of each choice and decide how best to succeed with Kubernetes. What decisions are you facing, and how can we help you?