Monthly Archives: July 2020

DCM IS A SIMPLER AND FASTER WAY FOR DEVELOPERS TO BUILD CLUSTER MANAGERS FOR DATA CENTER SYSTEMS

A VMware Research team has created a tool that enables programmers to specify cluster management logic in a high-level declarative language, and synthesize the code to compute policy-compliant configurations automatically and efficiently. Their new tool, called DCM (Declarative Cluster Management), allows data center developers to use SQL to easily add, remove and modify constraints and policies, the essence of a cluster manager.

“The idea with DCM is to write policies in a declarative style where we use SQL to write what you would like the system to do,” said the DCM team leader, Researcher Lalith Suresh. “The details of how it gets done, the algorithms that you would otherwise have to write by hand, are automated. Behind the scenes there’s a compiler that will take care of a lot of the heavy lifting.”

Modern cluster management systems like Kubernetes, DRS, OpenStack and OpenShift are responsible for configuring a complex distributed system and allocating resources efficiently. Whether juggling containers, virtual machines, micro-services, virtual network appliances, or serverless functions, these systems must enforce numerous cluster management policies.

Currently, developers implement such systems by designing custom, application-specific heuristics. This approach is proving unsustainable: ad-hoc heuristics both perform poorly and introduce overwhelming complexity, making it challenging to add important new features. These heuristics must be continuously adapted to work for arbitrary combinations of policies, which makes such systems hard to evolve over time.

With DCM, the developer maintains the application state in a relational database and specifies constraints as database queries in SQL. The DCM compiler then generates code that efficiently finds configurations satisfying all of these constraints.
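To make the idea of "state in a relational database, constraints as SQL queries" concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not DCM's API or its constraint syntax: the schema (nodes, pods), the view name capacity_violations, and the helper functions are all hypothetical and chosen only for illustration. The sketch only checks a capacity policy against the current state; it does not synthesize any code.

```python
import sqlite3


def build_state():
    """Create a toy cluster state in an in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (
            name TEXT PRIMARY KEY,
            cpu_capacity INTEGER NOT NULL
        );
        CREATE TABLE pods (
            name TEXT PRIMARY KEY,
            cpu_request INTEGER NOT NULL,
            node TEXT REFERENCES nodes(name)  -- NULL means not yet placed
        );
        -- Capacity policy written as a query: every row returned is a node
        -- whose placed pods request more CPU than the node offers.
        CREATE VIEW capacity_violations AS
        SELECT n.name AS node,
               SUM(p.cpu_request) AS requested,
               n.cpu_capacity AS capacity
        FROM nodes n
        JOIN pods p ON p.node = n.name
        GROUP BY n.name, n.cpu_capacity
        HAVING SUM(p.cpu_request) > n.cpu_capacity;
        """
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [("n1", 4), ("n2", 2)])
    conn.executemany(
        "INSERT INTO pods VALUES (?, ?, ?)",
        [("a", 3, "n1"), ("b", 2, "n1"), ("c", 1, "n2")],
    )
    return conn


def capacity_violations(conn):
    """Return (node, requested, capacity) rows that break the capacity policy."""
    query = "SELECT node, requested, capacity FROM capacity_violations ORDER BY node"
    return conn.execute(query).fetchall()
```

Adding or removing a policy in this style means adding or dropping a query rather than editing scheduling code, which is the property the DCM team emphasizes.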

“DCM makes it very easy to get started with, for example, writing your own cluster schedulers,” said Suresh. “A lot of the complexity with building such systems is hidden by using DCM. Where these things typically take several years to stabilize, we’re hoping to cut that time significantly.”

“There is no other tool that has this capability today. The level at which we’ve lowered the barrier to building schedulers declaratively, I don’t think there’s any tool out there that gets anywhere close,” he said.

The DCM compiler uses structural information extracted from the SQL specifications. The tool generates code that efficiently translates the state from the database into an optimization model of the problem. At runtime, when a system configuration decision is to be made, the generated code extracts the current state of the system from the database, builds the optimization model from it, solves that model using an off-the-shelf solver, and generates a new configuration that satisfies all the specified constraints.
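The runtime steps described above (read state from the database, build a model, solve it, write back a new configuration) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code DCM generates: the schema and function names are hypothetical, the only policy is CPU capacity, and an exhaustive search over placements stands in for the off-the-shelf solver. It also looks only for a feasible placement and does not optimize any objective.

```python
import itertools
import sqlite3


def demo_db():
    """Toy state (hypothetical schema): pod 'a' is placed, 'b' and 'c' are pending."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (name TEXT PRIMARY KEY, cpu_capacity INTEGER NOT NULL);
        CREATE TABLE pods (name TEXT PRIMARY KEY, cpu_request INTEGER NOT NULL, node TEXT);
        """
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [("n1", 4), ("n2", 2)])
    conn.executemany(
        "INSERT INTO pods VALUES (?, ?, ?)",
        [("a", 3, "n1"), ("b", 2, None), ("c", 1, None)],
    )
    return conn


def load_state(conn):
    """Step 1: extract the current state of the system from the database."""
    nodes = dict(conn.execute("SELECT name, cpu_capacity FROM nodes"))
    pending = dict(conn.execute("SELECT name, cpu_request FROM pods WHERE node IS NULL"))
    used = dict(
        conn.execute(
            "SELECT node, SUM(cpu_request) FROM pods "
            "WHERE node IS NOT NULL GROUP BY node"
        )
    )
    return nodes, pending, used


def solve(nodes, pending, used):
    """Step 2: brute-force stand-in for a solver.

    Tries every pod-to-node assignment and returns the first one that keeps
    each node within its CPU capacity, or None if no assignment is feasible.
    """
    pod_names = sorted(pending)
    node_names = sorted(nodes)
    for choice in itertools.product(node_names, repeat=len(pod_names)):
        load = dict(used)
        for pod, node in zip(pod_names, choice):
            load[node] = load.get(node, 0) + pending[pod]
        if all(load.get(n, 0) <= nodes[n] for n in node_names):
            return dict(zip(pod_names, choice))
    return None


def schedule(conn):
    """Step 3: write the new configuration back to the database."""
    nodes, pending, used = load_state(conn)
    assignment = solve(nodes, pending, used)
    if assignment is not None:
        conn.executemany(
            "UPDATE pods SET node = ? WHERE name = ?",
            [(node, pod) for pod, node in assignment.items()],
        )
    return assignment
```

Calling schedule(demo_db()) places the two pending pods without exceeding either node's capacity. Exhaustive search grows exponentially with the number of pending pods, which is one reason a real system hands the model to a dedicated solver.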

USE CASE: KUBERNETES

In a paper published in 2019, the VMware Research team built a Kubernetes scheduler to show how DCM automates cluster management. The scheduler operates as a drop-in replacement for the default Kubernetes scheduler, supporting all its capabilities and adding new ones.

“We found that it was significantly easier to build our scheduler using DCM than how the Kubernetes scheduler is built today, which has more than 10,000 lines of code,” said Suresh. “With DCM, you basically have about a thousand lines of Java code, and a couple hundred lines of SQL, which we think is a significant benefit.

“Today, there’s really no way to easily build such policy-based cluster managers without having to code all of it from scratch. With DCM, your policies are easily specified using SQL,” he said. “The tool handles how to take all the policies and find optimal decisions for you. So there’s a lot less work involved in using it.”

Lalith Suresh drove the DCM project after many years of thinking about how to simplify developing cluster managers. The VMware Research team working with him includes Senior Researcher Nina Narodytska, Senior Researcher Leonid Ryzhyk, Post-doctoral Researcher Sangeetha Abdu Jyothi, Research Intern João Loff and Research Intern Faria Kalim.

VMware Research released DCM as open source code in summer 2019. It is available for use on the VMware GitHub repository at

https://github.com/vmware/declarative-cluster-management/

For more details on DCM, download the paper “Synthesizing Cluster Management Code for Distributed Systems” at

https://dl.acm.org/doi/10.1145/3317550.3321444