To dispatch or not to dispatch? It is an old question we keep asking ourselves, and it is still relevant in the new world of Cloud Automation Services (CAS). And the answer will always be the same: yes, we need to dispatch. We need a frame around the extensibility we build, one that protects it from change, provides decoupling, and so on. We also need a more coherent and holistic approach to the cloud automation services when multiple people work on the same project.
In this blog post, we will explore the different paths one might take when building extensibility for CAS and why dispatching is the best option.
Let’s have a look at what we can do in terms of extensibility within CAS, in particular Cloud Assembly (CA). CA is a cloud-based service for creating and deploying machines, applications, and services to a cloud infrastructure. The same is true for vRealize Automation (vRA). So, if we are currently using vRA, we might find the extensibility of CA familiar.
First, we create a subscription where we specify the event type, for example provisioning of a virtual machine. Then, we add some filters to receive only the events we are interested in.
Finally, we configure the event handler, which can be either a vRealize Orchestrator (vRO) workflow or an action that runs in the cloud.
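For illustration, the filter mentioned above is just a boolean condition over the event payload. The exact syntax and field names depend on the event schema in your environment, so treat this as a hypothetical example rather than something to copy:

```
event.data.customProperties.osType == 'windows'
```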
Even though extending CA looks very similar to vRA extensibility, it doesn’t work the same way, because vRA and vRO both run on premise, while CA runs in the cloud. So, in order for CA to collect resources and provision machines, we have to deploy a data collector appliance on premise. The same mechanism applies when building extensibility: we use a workflow from a vRO instance that runs in our datacenter (see the example screenshot, the workflow DumpProperties).
The Challenge
Extending the provisioning process with a single workflow for a single blueprint seems quite straightforward. But let’s look at a real-world example. A typical enterprise may have 10 integration points and at least 4 different OS flavors. Make no mistake, the integrations can easily grow to a hundred if the enterprise is in the financial sector. And just to spice it up a bit more, they would like to provision to at least 2 clouds and on premise. That is 4 OS flavors times 3 provisioning targets times 10 integration points, or roughly 120 combinations to wire up. So, how do we handle the complexity of all these combinations of OS flavors, provisioning targets, and integration points?
Option 1: Subscription per Workflow, per Workload Type
We use the CA out-of-the-box functionality, and we get a decoupled but very hard to follow architecture with hundreds of subscriptions that differ only by their filters, such as “I would like this workflow to run only on Windows machines, on prem, with priority of 16”.
And here be dragons: how do we manage this huge number of subscriptions?
Each workflow receives input parameters in the form of a payload. The payload contains data such as the virtual machine ID, request ID, project ID, deployment ID, owner, etc. These IDs might be used as is in some cases, but most of the time the workflow will resolve the VM object based on the ID and then might need to retrieve some other data, for example the name of the project. So, if we do this in each workflow, we will quickly end up with a lot of duplicated code.
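To show how quickly that boilerplate accumulates, here is a minimal sketch of the resolution step every workflow would repeat. It is TypeScript-style pseudocode, and the payload shape and helper names such as findVmById and findProjectById are hypothetical:

```typescript
// Hypothetical payload shape; real event payloads differ per event type.
interface EventPayload {
  resourceIds: string[];
  projectId: string;
  deploymentId: string;
}

// Stand-ins for whatever lookup mechanism the environment provides.
declare function findVmById(id: string): { name: string; powerState: string };
declare function findProjectById(id: string): { name: string };

// Every handler that only needs "which VM, which project" still repeats
// this resolution step before it can do any real work.
function resolveContext(payload: EventPayload) {
  const vm = findVmById(payload.resourceIds[0]);
  const project = findProjectById(payload.projectId);
  return { vm, project, deploymentId: payload.deploymentId };
}
```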
Option 2: Subscription per Event Type with a Master Workflow
We reduce the number of subscriptions that we have to manage by having one subscription per event type and one master workflow (I’ve seen this a lot). This way, we have fewer subscriptions to worry about and a central entry point, which makes it easier to troubleshoot and reduces duplication. The master workflow handles the different types of integrations required for the different types of workloads.
It sounds like a good option but what’s the catch?
If we do it directly in the workflow, with boxes and decision elements, we will end up with a huge workflow. And every time we want to update something, we will have to change a big chunk of logic. So, one must definitely take into account the possible drawbacks:
- Increased chance of breaking other things, since they are tightly coupled and in the same “file”.
- Increased chance of conflicts, because everyone will work on the same huge central piece of logic. One can even overwrite someone else’s work by accident.
- Everyone has to understand the whole thing in order to make changes in the automation.
Code will be duplicated only among the master workflows … hopefully. However, all integrations will have access to everything in the scope of the huge workflow, which makes it very easy to break data that others need and almost impossible to find a way around the hundreds of global vRO attributes.
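To make the problem tangible, here is a rough sketch of how a master workflow’s decision logic tends to look once the combinations pile up. It is written as TypeScript-style pseudocode rather than an actual workflow, and the event topic, OS, and target names are only examples:

```typescript
// Sketch only: the branching a "master workflow" encodes with decision
// elements, written out as code to show how it grows with every new case.
function masterHandler(eventType: string, os: string, target: string): void {
  if (eventType === "compute.provision.post") {
    if (os === "windows" && target === "on-prem") {
      // join domain, register in CMDB, call on-prem IPAM, ...
    } else if (os === "windows" && target === "aws") {
      // different agent install, different IPAM integration, ...
    } else if (os === "rhel" && target === "on-prem") {
      // satellite registration, ...
    }
    // ...dozens more combinations, all living in the same central place
  }
}
```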
Option 3: Subscription per Event Type with a Dispatch Workflow
We use the same approach as in option 2 but instead of using one master workflow, we use a generic dispatching mechanism. It will give us:
- An easy way to map which operation to run at which state for which workload.
- An enhanced payload, possibly lazily initialized (see the sketch after this list), which reduces the amount of code we need and allows us to focus only on the business use case within the actual extensions. This increases the complexity in this central place, but it dramatically reduces the duplication and complexity everywhere else.
- Decoupling between the integration points and the extensibility mechanism provided by the system, which allows different people:
  - To focus on their task without worrying that they might break someone else’s work.
  - To achieve a coherent and holistic approach to their extensibility development while working on a project independently.
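As a rough sketch of the “enhanced, lazily initialized payload” mentioned above, the idea is to resolve objects such as the VM or the project only when an extension actually asks for them. The class and lookup names below are made up for illustration:

```typescript
// Hypothetical lookups; a real implementation would call the platform APIs.
declare function findVmById(id: string): { name: string };
declare function findProjectById(id: string): { name: string };

// Wraps the raw event payload and resolves expensive objects only on first
// use, so individual extensions never repeat the lookup boilerplate.
class ExtensibilityContext {
  private vmCache?: { name: string };
  private projectCache?: { name: string };

  constructor(private readonly payload: { resourceIds: string[]; projectId: string }) {}

  get vm() {
    return (this.vmCache ??= findVmById(this.payload.resourceIds[0]));
  }

  get project() {
    return (this.projectCache ??= findProjectById(this.payload.projectId));
  }
}
```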
Now the question is not whether to dispatch or not, but how to design the dispatching in a way that makes it easier to specify what to run, when, and for what. We used to do it with custom properties where everything is controlled in vRA, but now we have the opportunity to rethink our approach and find a better way. I am personally a fan of simple callback-based definitions in a code-first fashion:
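Here is a minimal sketch of what such callback-based definitions could look like. The on/dispatch functions, the criteria fields, and the event names are illustrative, not an actual API:

```typescript
// Illustrative only: none of these names come from an actual product API.
interface Context {
  os: string;      // e.g. "windows", "rhel"
  target: string;  // e.g. "on-prem", "aws"
}

interface Criteria {
  event: string;   // e.g. "compute.provision.post"
  os?: string;
  target?: string;
}

type Handler = (ctx: Context) => void;

const registry: Array<{ when: Criteria; run: Handler }> = [];

// "Code-first" mapping: the criteria and the callback sit next to each other.
function on(when: Criteria, run: Handler): void {
  registry.push({ when, run });
}

// At event time, the dispatcher runs every handler whose criteria match.
function dispatch(event: string, ctx: Context): void {
  for (const { when, run } of registry) {
    const matches =
      when.event === event &&
      (!when.os || when.os === ctx.os) &&
      (!when.target || when.target === ctx.target);
    if (matches) {
      run(ctx);
    }
  }
}

// A few definitions stacked in one file say, at a glance, what runs when.
on({ event: "compute.provision.post", os: "windows", target: "on-prem" }, (ctx) => {
  // join domain, register in CMDB, ...
});

on({ event: "compute.provision.post", os: "rhel" }, (ctx) => {
  // register the distro subscription, ...
});
```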
It gives a simple way to map the criteria to the actual execution code, right there in plain source. Stacking a few of those in the same file, based on the context they run in, makes it really easy to manage the extensions in one central place. So, when it comes to what will run and when, one can quickly figure that out by looking at a few lines of code.
We are currently experimenting with some other things as well, so stay tuned for more on the topic.