
The next generation of SQL is about simplicity and global scale

SQL has been around since the 1970s and, despite any number of attempts to displace it, SQL shows no signs of going away anytime soon. However, the world's most popular query language (as well as the relational database model, in general) continues to evolve with users' needs, sometimes inspired by new types of databases that aim to address SQL's shortcomings. Today, a new class of SQL databases—developed by startups such as Yugabyte, as well as every major cloud provider—is overcoming historic limitations in terms of speed and scale, and is even pushing the CAP Theorem to its breaking point. 

In this episode of Cloud Native in 15 Minutes, Peter Mattis—co-founder and CTO of Cockroach Labs—explains CockroachDB, his company's attempt at building the next big database, and the driving forces behind an entire marketplace of next-generation, distributed SQL offerings. Among other things, Mattis focuses on the value of operational and developer simplicity, even when scaling across data centers; the relationship between next-generation databases and next-generation platforms, such as Kubernetes; and the rationale for adopting an open source database. He also discusses the parade of technological advancements that make geographically distributed transactional databases possible.

Here is a transcript of the discussion with Mattis, which has been edited for readability.


DERRICK HARRIS (PIVOTAL): Can you explain in about 30 seconds what Cockroach Labs and CockroachDB are, and the idea of distributed SQL, in general?

PETER MATTIS (COCKROACH LABS): The idea behind distributed SQL came out because we noticed there's a gap in the landscape. There are these NoSQL databases, which were … promising easy horizontal scalability, and yet they tossed out so many easy-to-use functionalities that developers like, like indexes and transactions. And then the other side of the spectrum, there are traditional SQL databases that provide indexes and transactions and all sorts of other goodness, but they didn't have that horizontal scalability. So we wanted to build a system that had the best of both worlds—easy horizontal scalability, with all the SQL goodness that makes developer lives nice.

 

What's happening today, business-wise and technology-wise, that makes these new variants on SQL databases so valuable?

I would look at it not so much as valuable, as possible. Why are these all coming out near the same period of time? It's like why did multiple people independently come up with the idea of the atomic bomb? Groups come up with these breakthroughs.

Well, there's a technological change that occurred over the past 40 years since databases have been in existence, and that is: Computers have gotten faster. We all know about Moore's Law, but networks have even outpaced the improvements in processors, and now I can read data from a remote machine—especially one that's nearby in a data center—faster than I could read it from disk. That's pretty incredible. Not a lot of people realize it outside the industry; inside the industry, it's well known.

And then the other big change is that we actually have a much better understanding of how to do distributed transactions and distributed consensus. All this stuff has come out in the past 20 years about Paxos and Raft, which are distributed consensus protocols. And then there are distributed transaction protocols that exist on top of that.

So I think there's a realization by a whole bunch of different people that they can put these things together and make a distributed SQL system. And what we're seeing is various different ways to piece together these components into a working system.
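
As a rough illustration of the consensus idea mentioned above, protocols in the Paxos and Raft family treat a write as committed once a majority of replicas have acknowledged it. The sketch below shows only that majority arithmetic. The function names are hypothetical, and this is a simplification rather than CockroachDB's actual implementation.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replicas // 2 + 1


def is_committed(acks: int, replicas: int) -> bool:
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= majority(replicas)


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be down while a majority is still reachable."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "replicas: quorum", majority(n), "tolerates", tolerated_failures(n), "failures")
```

Under this rule a three-replica group keeps accepting writes with one replica down, and a five-replica group with two down.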

 

Are there specific types of applications that benefit from that? Why would I be interested in this type of a system?

Any of the applications that were being built on top of NoSQL systems—systems inside Google would have been BigTable, and, outside, say systems like Cassandra and HBase—you can build those on top of something like CockroachDB, as well. And the workloads those work well for are ones where you're doing a mixture of reads and writes. CockroachDB provides better transaction semantics for those workloads. So some things that might have been awkward to do in Cassandra are a little bit more natural to do in CockroachDB, or significantly more natural to do in CockroachDB.

Then one of the things that distributed SQL, in particular CockroachDB's distributed SQL, unlocks is the ability to do multi-region clusters and have geo-distributed replication. With the emergence of applications that have global concerns (in the sense of regulatory compliance concerns, like GDPR, or just users that are spread throughout the world), being able to have a database that scales and can adapt to where those users are is an important bit of new functionality that we provide.

 

Is CockroachDB, or anything similar on the market, a replacement for an existing Postgres or MySQL plus NoSQL database setup, or are they really targeting different workloads?

Oh, they're absolutely a replacement. If you were to use Postgres or MySQL directly, these are functional, battle-tested databases. And, yet, setting up the replication is a headache, dealing with the failover is a headache, and dealing with the scalability …

Once you get to a certain size limit for a Postgres or MySQL instance, you actually end up having to do sharding, and that sharding is this huge burden on application developers. It requires them to put logic in their application to worry about how [they’re] spreading [their] data across multiple Postgres or MySQL instances. So any of those workloads that you were putting on Postgres or MySQL before, you can put on CockroachDB and we just handle that sharding for you. We handle all the replication just out of the box. The availability story is just much, much smoother.
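
To make the sharding burden described above concrete, here is a minimal sketch of the kind of routing logic an application might carry when it spreads data across several Postgres or MySQL instances by hand. The connection strings, the `shard_for` and `count_moved` helpers, and the choice of hash-modulo routing are all assumptions for illustration; they are not taken from the interview or from any particular product.

```python
import hashlib
from typing import Iterable, Sequence

# Hypothetical connection strings, for illustration only.
SHARDS = (
    "postgres://db-shard-0.example.internal/app",
    "postgres://db-shard-1.example.internal/app",
    "postgres://db-shard-2.example.internal/app",
)


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Route a key to one shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def count_moved(keys: Iterable[str], old: Sequence[str], new: Sequence[str]) -> int:
    """Count keys whose owning shard changes when the shard list changes."""
    return sum(1 for k in keys if shard_for(k, old) != shard_for(k, new))


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    grown = SHARDS + ("postgres://db-shard-3.example.internal/app",)
    print("keys that change shard after adding one instance:",
          count_moved(keys, SHARDS, grown))
```

With simple modulo routing like this, adding a fourth instance changes the owner of most keys, so the application also has to plan and run a data migration. That is the sort of work a database with built-in sharding takes over.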

 

Do users typically come at a certain size or scale? I would imagine that if I'm a two-person shop, I might not need that level of replication or distribution? What does that usual user profile look like?

We see a mix. We see people who have a relatively small amount of data and just want to make sure that the high-availability story is much, much cleaner. And we also see people and shops that are talking about very huge workloads.

One of the things that the industry as a whole is moving toward is providing the database as a service. If you’re a two-person shop, you don't want to be running Postgres or MySQL yourself; you're probably wanting to run [Amazon] RDS or whatever the equivalent is on Google. And we are also offering Cockroach as a managed service. That's currently in beta and we expect to launch it into wider availability this fall.

 

If I decide to adopt a distributed SQL database, what do I need to do differently in terms of managing it and in terms of writing to it? I assume it's not a one-to-one mapping in terms of skills, but maybe you’re going to tell me differently.

There is a lot of overlap, and that's one of the reasons that we actually adopted SQL as the interface language for CockroachDB: SQL is the lingua franca for the data-manipulation industry. There are just hundreds of thousands, millions, of developers who know SQL, who know the intricacies of SQL, and most of those just map perfectly onto CockroachDB.

The thing you do have to be aware of when you're getting into multi-region deployments is that you have to be concerned about the inter-region latencies. We don't violate the laws of physics there; if your data centers are on different sides of the continent, there's going to be latency between them. But when you know about those latencies, you can take them into account in your schema design. It's essentially equivalent to the realization that, "Oh, I need to know about the concept of an index inside a database and what impact that's going to have on performance."

There's a similar bit of concern with regard to the geo-distribution of your data. You need to take it into consideration, but it's something people can wrap their heads around.
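
The "laws of physics" point can be put in rough numbers. The sketch below estimates a lower bound on round-trip time between two data centers from distance alone. It assumes light in optical fiber travels at about two-thirds of its vacuum speed and uses a hypothetical 4,000 km path. Real latencies are higher because of routing, switching, and protocol overhead.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumed: signal speed in fiber relative to vacuum


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring all other delays."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # 4,000 km is an assumed cross-continent path length, not a measured route.
    print(f"{min_round_trip_ms(4000):.1f} ms")
```

For the assumed 4,000 km path this works out to roughly 40 ms per round trip, so a transaction that needs several cross-region round trips pays that cost each time. That is why it helps to decide in the schema where data should live.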

 

There are so many cloud providers out there offering other sorts of these distributed databases. Why not just use Amazon Aurora or Microsoft Cosmos or Google Spanner? What's the benefit of actually deploying software or deploying a specific database?

There are multiple answers to this. One is we're offering capabilities there that you don't get from Spanner or from Aurora yet, in our geo-distribution capabilities.

The other big answer that affects a ton of customers is that there's that vendor lock-in. If you're going with Aurora or you're going with Spanner, you're locked into those vendors and that can be a significant hardship … These big cloud providers, they're providing data centers all over the place, but sometimes they have a gap in their coverage and that can affect your users. But I think it solves that kind of cloud lock-in that scares a lot of people.

And this is becoming more and more of a concern. Amazon is just the huge elephant in this room; it's gobbling up mindshare. Are they benevolent, or not? It may be a little bit too early to tell. But even if they are benevolent, they can just accidentally turn and stumble over your company and crush it. I think a lot of people are starting to become concerned about that.

 

In the world of next-gen databases, there are these big cloud providers and then there are a lot of startups. Lock-in aside, why would I trust a startup if I'm a CIO or someone with decision-making power and budget? Why would I trust the startup for something mission-critical instead of one of these big cloud providers?

It's an excellent question. One of the reasons we provide the source code and went open source (now, we have a slightly modified [offering] where our license isn't pure open source, but it still provides a lot of the same benefits) is that the concern is: If we just suddenly fold up shop, what happens? And this at least gives some modicum of … people still have access to the source and a community can form around it to support it. That's our failsafe.

The other thing is that PostgreSQL and MySQL, they're heavily invested in by these big cloud providers, and yet they're still open source themselves … It's not like these are the flagship database products that people are putting their data on, they're just developed in the same way that Cockroach is. We put significant effort into testing for correctness and stability and performance for our system. It's essentially like all those same development practices that they are doing, we do as well.

 

Speaking of the cloud, how do some of these new SQL databases relate to other cloud-native technologies and practices? You might look at microservices or Kubernetes or Kafka, serverless and functions. Is there an inherent connection or integration with some of these other new practices?

Yeah, certainly. Kubernetes is the one. There's this huge wave occurring in the cloud, which is everybody's moving to Kubernetes and putting their microservices on top of Kubernetes. And for the most part, that works well. Kubernetes provides a great substrate for running stateless applications.

But then we have the stateful applications, and these are the ones storing data in databases, and then you have to wonder, “Where's the database itself running?” Cockroach actually runs really nicely on top of Kubernetes, and that is our number one distribution mechanism. The number one mechanism by which people run CockroachDB is on top of Kubernetes, and then they'll have Cockroach running on Kubernetes, their application running on Kubernetes—you have all the goodness from the Kubernetes scheduling.

Because this new wave of distributed SQL databases has replication baked into the product, it handles the elastic scale-out that Kubernetes offers and it handles the failover that Kubernetes offers. You don't get the same behavior from something like trying to run MySQL or Postgres on top of Kubernetes.

 

Do you see a common, or at least a relatively common, kind of data architecture as you're looking at customers and what they’re running? The idea of Kubernetes as a substrate for running this stuff is intriguing, so I'm curious if there's a standard architecture of Cockroach or SQL plus other components.

I can't say we see so much above the stack that's common, there seems to be a little bit more froth there, but certainly Kubernetes as the substrate. This has actually just completely changed over the past four years. When we were getting started with Cockroach Labs, I think Kubernetes was announced just right around that time or just slightly afterward, and it was like, "Oh, let's see how this goes."

And now it's just every conversation and every customer is asking about this. And this goes from small startups all the way up to the big Fortune 500 companies that are using us and experimenting with us; they all have some kind of Kubernetes mandate, it feels like. So that's really the common thread you see going on there.

As far as other parts of the stack—people using us from JavaScript, from Java, from Go, from Python—it's quite a bit more diffuse there.

 

It seems like most new databases—including CockroachDB and some of its alternatives—are open source. Why is that? And how should enterprises think about using the free versus the paid versions of those products?

It's kind of two questions there. So the first question is, “What was the motivation behind using open source for these products?” And part of that is there's been some problematic history behind closed source databases provided by startups. FoundationDB was the one that occurred just right around the time, four years ago, that Cockroach Labs was getting started, where they were closed source, not-quite NewSQL. They weren't quite as advanced as CockroachDB in some ways; they were more of a key-value-store-type NoSQL. But they were closed source and they looked like they had a really nice architecture and everything, and then they got bought by Apple and got shut down.

We were paying attention at the time and were like, "Ah, you know customers are going to have a hard time adopting us if the rug can get pulled out at any time." So that was one of our motivations for going open source, is to provide this guarantee that no matter what else happens to us—we get bought, whatever, we can go under—you know that this is going to be there and you have that failsafe there.

Another aspect is just the marketing, the go-to-market of using open source. There's kind of a heavyweight go-to-market asking people, "Hey, try my closed source database software." And you usually have a kind of heavy sales effort involved there, where it's like you want to get access to the binary for trial purposes, a 30-day trial or what-not. What open source does is allows you to go in the back door; developers can just download your software, start trying it, kicking the tires and what-not.

They can even peek under the hood to see if the claims you're making on your website are valid. Someone says, "Oh this is absolutely reliable," and then you actually peek under the hood and you see what their coding practices are and you see what their testing practices are, and your eyes can open wide and be like, "Oh, no way would I trust that."

And what we do is all in the open. People can follow along. I'm not sure if they do, but that's part of the rationale behind it. It keeps us honest to have that openness baked into what we do.

 

How should enterprises, or any users, think about using the free version versus the paid version of an open source product?

One of the main reasons that people use our paid version is we actually put features into that enterprise version that you want to have if you're running in production. And we try to draw the line between our free version and paid version based on things that a small startup might not necessarily need when they're just getting started, but you almost certainly want when you're getting to any sort of scale. The most obvious example of that is backup and restore is on our paid version, and any big enterprise that's using a database product is probably going to want backup and restore.

But above and beyond that, there's usually support contracts that come along with our enterprise version, and that's also kind of a critical need for any enterprise. If you're putting—and we want people to put their crown jewels of their data into CockroachDB—it's almost a fiduciary duty to have a support contract at that point with the company providing that software.

 

Finally, Peter, what does your ideal future look like with regard to SQL and database technology, in general? In terms of capabilities, but also in terms of how you would like to see organizations and users architecting and building new applications to take advantage of what's coming down the pike.

The thing I'm excited about is this evolving landscape of applications that can work in a global manner, and having geo-distributed applications that can take advantage of a geo-distributed database. And there's going to be enhancements to the SQL abstractions to make this easier to do. I'd say we're still in the early days of this. CockroachDB provides a lot of mechanisms for doing geo-distributed applications, and yet those mechanisms are a little bit rough around the edges. So the area I'm excited to see innovation in is improving those SQL abstractions, improving how that functionality works, so it's just crystal clear and easy for developers to use.

And then also seeing this evolution of a global app tier come into play. It's so easy to get a VM running in a data center, but actually getting an application running is quite a bit harder, then trying to get a geo-distributed application running is even harder still. And all that stuff is going to be addressed over time; we're seeing progress in these areas.

Before, it was starting a VM. Now, it's like usually you get a Kubernetes cluster running, but getting a multi-region Kubernetes cluster is still hard. I think these things are all going to be addressed in the coming years and we're going to get to this point where if I want to spin up a global application, I'll be able to do that in minutes and deploy it in minutes. That's still kind of a far-off pie in the sky, but I think we'll get there.

Subscribe here

Cloud Native in 15 Minutes publishes bi-weekly, and you can find it on most of your favorite apps and platforms, including: