Open Source State Machine Replication
Project Concord was introduced to the world just over two years ago as the basis of a decentralized trust infrastructure being created at VMware. Incubated inside VMware Research (VRG) and led by Guy Golan Gueta, the effort devised and prototyped SBFT, a scalable Byzantine fault-tolerant state machine replication (BFT-SMR) system. To make SBFT available to the world, Guy rewrote much of the code and open sourced it in concord-bft.
VMware Blockchain Begins
At this point, the project was at a crossroads. The consensus engine and some of the associated code were open sourced in concord-bft, and there was a public paper, but not much else to help anyone use the code. VMware needed to decide whether to leave this as an interesting research project, or to invest in building an engineering team around the project and enable its use by a larger community. Fortunately for us, VMware chose the latter, and we now have over 20 full-time engineers working on concord-bft and related code bases. The goal of this blog post is to introduce the reader to what has changed in the last two years, what we have built as open source projects, and what we plan to do in the future to open source more of our code and allow greater community contributions.
Test System Buildout
By late 2018, a small engineering team was being assembled to harden concord-bft and make it production ready. Although Guy and a small team of researchers had already tested concord-bft very thoroughly, there was no automated testing infrastructure that the community could use to validate correctness or to test their changes to the system. Seeing a solid testing framework as a means to accelerate development with confidence, teach engineers about the system, and document the high-level capabilities, we started building out a testing system in March 2019. By January 2020, we had named the testing framework Apollo and begun using it more widely, with a large number of team members adding to both the tests and the framework.
Customization Repo
concord-bft was originally intended to contain only the core BFT code that implements consensus, arguably the most complex part of the code base, thereby allowing community members to build their own products on top of a high-quality Byzantine consensus engine. The theory was that a user would implement a few key interfaces and would then have a custom BFT-SMR application built on top of a world-class consensus engine. While this logical separation remains a major goal of the project, it turns out that even to test the consensus code in an open source manner, we needed to implement at least some of the interfaces in the concord-bft repository. We also noticed that the repo had never been that pure to begin with: concord-bft already contained a C++ client, a state transfer mechanism, and a threshold signature library. To ease development and testing, we started moving more “example” implementations of interfaces from our closed source code base into concord-bft. Soon, we had open sourced our low-level storage implementation as well as our key-value blockchain (kvbc) implementation, which allows a blockchain abstraction to be built upon the consensus and storage layers. Looking forward, we expect all of our core code to be opened, and likely moved into the concord-bft repo. This will allow users to mix and match the components necessary to create their own custom applications, and will provide thorough examples of how to use the various interfaces when our higher-level abstractions do not suit their requirements.
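To make the idea of "implementing a few key interfaces" concrete, here is a minimal sketch in Python. It is not the concord-bft API: `CommandHandler`, `KeyValueBlockchain`, and the request format are hypothetical names invented for illustration, and the real kvbc is written in C++ and is considerably more involved. The sketch only shows the shape of the layering, where an application receives requests in an agreed order and records writes as hash-linked blocks of key-value updates.

```python
import hashlib
import json
from abc import ABC, abstractmethod


class CommandHandler(ABC):
    """Hypothetical application interface: a consensus layer would call
    execute() with each request once its position in the order is agreed."""

    @abstractmethod
    def execute(self, request: dict) -> dict:
        ...


class KeyValueBlockchain(CommandHandler):
    """Toy key-value blockchain: every write request becomes a block that
    stores its updates and the hash of the previous block."""

    def __init__(self):
        self.blocks = []   # each block: {"parent": str, "updates": dict}
        self.latest = {}   # key -> most recently written value

    @staticmethod
    def _digest(block: dict) -> str:
        # sort_keys gives a deterministic encoding, so equal blocks hash equally.
        encoded = json.dumps(block, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

    def execute(self, request: dict) -> dict:
        if request["op"] == "read":
            return {"value": self.latest.get(request["key"])}
        # Any other op is treated as a write in this sketch.
        parent = self._digest(self.blocks[-1]) if self.blocks else ""
        updates = dict(request["updates"])  # copy so callers cannot alias the block
        self.blocks.append({"parent": parent, "updates": updates})
        self.latest.update(updates)
        return {"block_id": len(self.blocks)}

    def verify_chain(self) -> bool:
        """Check that each block records the hash of its predecessor."""
        return all(
            block["parent"] == self._digest(prev)
            for prev, block in zip(self.blocks, self.blocks[1:])
        )
```

Because every replica executes the same requests in the same order, each one would build an identical chain; changing an earlier block afterwards makes `verify_chain()` fail for the block that follows it.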
Transparent Design and Development
As the project progressed and the team grew, we began to have more conversations around necessary features, as well as mechanisms to make the code safer. While there is much more to do, we started replacing many raw pointers with managed pointers, and abstracting large chunks of code into their own components. We also began discussing the architecture of the code base as a whole. While most of the documentation surrounding the rationale of these decisions, and the decisions themselves, remains closed source, we plan to move nearly all of it into the concord-bft repo in the future. We know we have a significant lack of open documentation, and to gain community acceptance we must not just open our existing design documents, but embrace the idea of designing in public and open the process to members outside of VMware. This is something we are aware of and seek to improve on in the coming year.
Available Componentry
Since it’s not clear exactly what we’ve done in the past two years, besides what has already been mentioned, we’d like to give the community a broad overview of the incredible amount of work that has gone into the project in that time. Some things are feature complete, some are works in progress, and others are in early stages. The team is proud of each of the individual components and believes some may be useful outside any blockchain efforts.
- Switched all paths from using threshold signatures to multisignatures.
- Implemented a new C++ BFT client with configurable quorums and support for replica-specific information, so that it can receive different data from different replicas.
- Added a C++ copy-on-write Merkle tree implementation, similar to the one provided by Diem.
- Added TLS support for replicas and C++ clients.
- Added TLS support to the Python client we use for testing.
- Added support for secure key rotation.
- Added support for pre-execution, which increases execution concurrency by optimistically executing transactions at F+1 replicas before consensus and then performing conflict detection after commit.
- Introduced new batching policies to increase throughput during consensus.
- Enhanced the view change protocol to limit unnecessary view changes.
- Added a canonical serialization format (CMF) suitable for cryptographically signed data.
- Started to justify our decisions publicly.
- Created a diagnostics subsystem for retrieving internal application state and performance data at runtime.
- Added support for read-only replicas, backed by an object store, that participate in state transfer but not in consensus.
- Added key generation support.
- Utilized great testing and collaboration to fix a view change bug involving a restart.
- Added many tooling-related enhancements.
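The pre-execution item in the list above is easier to picture with a small example. The following Python sketch is an illustration under stated assumptions, not the concord-bft implementation: all names (`Store`, `PreExecResult`, `pre_execute`, `agreed_result`, `commit`) are hypothetical, conflict detection is assumed to compare per-key versions, and accepting a result once F+1 replicas report it identically is our reading of "executing at F+1 replicas".

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PreExecResult:
    """Outcome of running a transaction before consensus (hypothetical shape)."""
    read_versions: tuple  # ((key, version_seen), ...), sorted by key
    writes: tuple         # ((key, new_value), ...), sorted by key


@dataclass
class Store:
    """Versioned state: key -> (value, number of committed writes to that key)."""
    data: dict = field(default_factory=dict)

    def get(self, key):
        return self.data.get(key, (None, 0))


def pre_execute(store, txn):
    """Run txn against the current state without mutating it, recording reads."""
    reads, writes = {}, {}

    def read(key):
        value, version = store.get(key)
        reads[key] = version
        return writes.get(key, value)  # a transaction sees its own writes

    def write(key, value):
        writes[key] = value

    txn(read, write)
    return PreExecResult(tuple(sorted(reads.items())), tuple(sorted(writes.items())))


def agreed_result(results, f):
    """Return a result reported identically by at least f + 1 replicas, else None."""
    if not results:
        return None
    result, count = Counter(results).most_common(1)[0]
    return result if count >= f + 1 else None


def commit(store, result):
    """After ordering: apply writes only if every key read is still unchanged."""
    for key, seen in result.read_versions:
        if store.get(key)[1] != seen:
            return False  # conflict: the key changed since pre-execution
    for key, value in result.writes:
        store.data[key] = (value, store.get(key)[1] + 1)
    return True


# Two increments pre-executed against the same state: the first commits,
# the second is detected as a conflict and would need to be executed again.
store = Store({"x": (10, 1)})
increment = lambda read, write: write("x", read("x") + 1)
first = pre_execute(store, increment)
second = pre_execute(store, increment)
first_committed = commit(store, first)
second_committed = commit(store, second)
```

The concurrency gain comes from running `pre_execute` for many transactions in parallel before they are ordered; only the cheap version check in `commit` has to happen sequentially.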
We hope to evolve our open documentation and build a more inviting community project in the coming year. We invite you to join our community, to both benefit from and contribute to the very best practices for the future of blockchain technology.