BOSH is a Cloud Foundry Project that automates the deployment and ongoing operations of distributed systems. While it can be used for almost anything, it’s primarily used by Cloud Foundry for both the Application Runtime (CFAR) and Container Runtime (CFCR).
Longtime Cloud Foundry operators are likely familiar with BOSH. It powers the platforms that support workloads for many Fortune 500 customers. But if you’ve just recently deployed Pivotal Container Service (PKS) for your Kubernetes environment, you might want to know a bit more about the release toolchain that’s powering your deployment.
BOSH fully automates the deployment of your PKS clusters. It also restarts failed components and recovers or replaces failed VMs. BOSH even handles seamless cluster upgrades for you!
Since I hadn’t worked with BOSH in a while, I figured I’d document my recent experiences with the tech. This isn’t a comprehensive BOSH guide. It’s more of a “cheat sheet” for a few helpful key commands and concepts you’ll likely need if you run PKS.
Getting Started with BOSH
BOSH is a very powerful (and therefore sometimes complex) system for automatically deploying and managing platforms like PAS and PKS. You don’t need a PhD in BOSH to get a lot of value out of it. Let’s first set the table with some high-level concepts:
- Ops Manager – This is the first thing you deploy when you install PKS. It is where you configure your underlying infrastructure integration and where we’re going to interact with BOSH.
- Director – The Director is the core orchestrating component in BOSH. It controls VM creation and deployment, as well as other software and service lifecycle events. Once you configure the infrastructure tile (in my case GCP) in Ops Manager, the Director is deployed on its own VM.
- BOSH CLI – The command line tool used to manage BOSH.
- Task Logs – Exactly as it sounds. Any work the Director does (even just listing resources) is a task, and each one is numbered and logged.
- Deployments – A collection of resources defined in a manifest. In the case of PKS, each cluster is a deployment.
- Instances – The VMs that BOSH has created and is managing.
Setting up the BOSH CLI
We’re going to use the CLI to interact with BOSH. You can download it and set it up on your own machine, but we’re going to use the copy installed on the Ops Manager.
If we ssh into the Ops Manager VM and run the bosh env command:
$ bosh env
Expected non-empty Director URL
Exit code 1
$
You can see that the CLI doesn’t have the configuration it needs to work. The easiest place to get that configuration is the Ops Manager GUI.
After clicking on the BOSH Director tile and selecting the Credentials tab, you’ll see an item appropriately named “BOSH Commandline Credentials.” Clicking on it shows some JSON that looks like this:
{"credential":"BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh "}
The value of the “credential” key is what we need to run the BOSH CLI. If we copy and paste it into a shell prompt and add the env subcommand:
~$ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh env
Using environment '10.0.0.10' as client 'ops_manager'

Name      p-bosh
UUID      xxxxxxxxxx
Version   268.2.1 (00000000)
CPI       google_cpi
Features  compiled_package_cache: disabled
          config_server: enabled
          local_dns: enabled
          power_dns: disabled
          snapshots: disabled
User      ops_manager

Succeeded
~$
Well, we can see that worked, but who wants to type that whole string every time? There are a number of ways to set those BOSH environment variables, but to me the easiest is to alias the whole string:
alias bosh="BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh"
If you want that to persist, add the alias command as the last line of your ~/.bashrc. Now if we just run `bosh env`, we can see we’re all set.
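If you’d rather export the variables than alias the whole string, the credential string can be split into export lines. This is only a minimal sketch, assuming a bash-like shell and values without spaces or wildcard characters; the function name credential_to_exports and the file name ~/.bosh_env mentioned afterwards are made up for illustration, and the secret is the same placeholder used above.

```shell
# Hypothetical helper: print one `export` line for each BOSH_* assignment
# found in the Ops Manager "credential" string (the trailing "bosh" is skipped).
credential_to_exports() {
  # $1 is left unquoted on purpose so the shell splits it on whitespace.
  for pair in $1; do
    case "$pair" in
      BOSH_*=*) printf 'export %s\n' "$pair" ;;
    esac
  done
}

# Placeholder values copied from the example above.
credential='BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=xxxxxxxxxxxxxx BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.10 bosh '
credential_to_exports "$credential"
```

Redirecting that output to a file such as ~/.bosh_env and sourcing it from ~/.bashrc should have the same effect as the alias, with the bonus that scripts (which don’t expand aliases by default) pick up the settings too.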
Basic CLI Commands
Now that our CLI is working, let’s cover some important commands you’ll want to know.
bosh vms
This command will list all the deployments and their associated instances that the director knows about and is managing.
~$ bosh vms
Using environment '10.0.0.10' as client 'ops_manager'

Task 667
Task 666
Task 667 done
Task 666 done

Deployment 'pivotal-container-service-xxxxx'

Instance                            Process State  AZ             IPs        VM CID     VM Type  Active
pivotal-container-service/xxxxxxxx  running        us-central1-a  10.0.0.11  vm-xxxxxx  large    true

1 vms

Deployment 'service-instance_5a677c82-5be2-42f0-ac73-010e53953a21'

Instance                                     Process State  AZ             IPs         VM CID                                   VM Type      Active
master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75  running        us-central1-b  10.0.11.10  vm-6f5cae68-a151-4310-4cb9-43c989b79701  medium.disk  true
worker/90ad1b1f-bb69-4af5-8780-a930639fcf49  running        us-central1-b  10.0.11.12  vm-dc87366d-659e-4047-4f3f-f2eef5a968aa  medium.disk  true
worker/ee0176c5-26b6-4923-b5e8-02fc67a1e7da  running        us-central1-a  10.0.11.11  vm-ecf472bc-12a1-46eb-6b2c-0f50163433c2  medium.disk  true
worker/f9dd3292-1960-478a-9cf5-071b59c4d843  running        us-central1-c  10.0.11.13  vm-f7073cf3-b271-4292-50de-8eb8c1c5daf0  medium.disk  true

4 vms

Succeeded
~$
Here’s the output from my PKS environment. Some things to look at:
- At the top, you can see the task numbers. As I mentioned earlier, every operation is a task, has a number, and is logged. It’s useful to know the number of a given task for troubleshooting, as you’ll see later.
- There are two deployments: the first is PKS itself, which has a single API VM (this is what you interact with when you run PKS CLI commands), and the second is a single PKS cluster with one master and three workers. (I’ve x’d out the UUIDs on the PKS instance since I’m keeping it, but the PKS cluster is already gone, so I won’t bother hiding those UUIDs.)
- For the VMs, the name on the left side (Instance) is what BOSH calls it, and the name on the right side (VM CID) is what the cloud provider (in this case GCP) calls it. You’ll need the former to interact with VMs using BOSH commands and the latter if you need to do something on the cloud provider side (adjust security groups, etc.).
- The ‘Process State’ tells you if the BOSH agent is up and running and communicating with the Director. In this case, running indicates it is.
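When you need to jump from a BOSH instance name to the cloud provider’s name for the same VM, the table can be filtered with awk. This is a rough sketch, assuming the plain table layout shown above with one IP per VM, so that VM CID is the fifth whitespace-separated column; vm_cid_for is a made-up helper name, not part of the BOSH CLI.

```shell
# Hypothetical helper: read `bosh vms` table output on stdin and print the
# VM CID (5th whitespace-separated column) for the given instance name.
vm_cid_for() {
  awk -v inst="$1" '$1 == inst { print $5 }'
}

# Demo with one row copied from the output above.
printf '%s\n' \
  'master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75 running us-central1-b 10.0.11.10 vm-6f5cae68-a151-4310-4cb9-43c989b79701 medium.disk true' \
  | vm_cid_for master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75
# prints vm-6f5cae68-a151-4310-4cb9-43c989b79701
```

Against a live Director, the same function could be fed from bosh vms -d <deployment> instead of the sample row.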
bosh tasks
This will show you a list of currently running tasks and details about them. If you want to see recently completed tasks you can add the `-r` option.
bosh task [taskid]
This will show you a running log of whatever task is in progress, similar to a docker logs -f command. When the task finishes (even if another one starts immediately after), the command will exit. If there is no task running, unsurprisingly it will respond with ‘No task found.’
If you add a task ID number (remember earlier when I said knowing the task ID would be useful?), it will show you the log of that completed task.
In either situation, you can also add the --debug flag for the full debug details.
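The three variations of the task command can be wrapped in one small function. This is just a sketch, assuming the bosh environment variables (or alias) from earlier are already in place; task_log is a made-up name, and the task number 667 comes from the bosh vms output above.

```shell
# Hypothetical wrapper around `bosh task`.
#   task_log            -> follow the currently running task
#   task_log 667        -> show the log of task 667
#   task_log 667 debug  -> show task 667 with full debug details
task_log() {
  if [ -z "$1" ]; then
    bosh task
  elif [ "$2" = "debug" ]; then
    bosh task "$1" --debug
  else
    bosh task "$1"
  fi
}
```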
bosh ssh
If I want to ssh into one of the VMs for whatever reason, I’ll use the bosh ssh command. This logs in with a temporary ssh key and a home dir that gets wiped out. The syntax is:
bosh ssh -d <deployment> <instance>
Example:
~$ bosh ssh -d service-instance_5a677c82-5be2-42f0-ac73-010e53953a21 master/5efdd3d8-ccdd-4523-8dde-65cd4eaf8c75
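If you only need the output of one command rather than an interactive session, bosh ssh also accepts a -c (command) option. The wrapper below is a small sketch under that assumption; run_on_instance is a made-up name, and the command you pass is whatever you would have typed after logging in.

```shell
# Hypothetical helper: run a single command on an instance via `bosh ssh -c`
# instead of opening an interactive session.
run_on_instance() {
  deployment="$1"
  instance="$2"
  shift 2
  # Everything after the first two arguments is joined into one command string.
  bosh ssh -d "$deployment" "$instance" -c "$*"
}

# Usage (needs a working bosh CLI):
#   run_on_instance <deployment> <instance> uptime
```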
bosh delete-deployment
Let’s say you misconfigured your cloud credential permissions, which led to a failed PKS deployment that PKS itself is unable to delete (of course, that would never happen, as you follow all the documentation perfectly). You can clean it up using bosh delete-deployment like so:
bosh delete-deployment -d <deployment>
This command tries to clean up the instances gracefully, so it will stop if it runs into an error. You can add --force at the end to have it ignore those errors and proceed.
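The “graceful first, force only if needed” approach can be written down as a function. This is a sketch rather than a recommended procedure; delete_deployment is a made-up name, and it assumes bosh returns a non-zero exit code when the graceful delete fails.

```shell
# Hypothetical helper: try a graceful delete first and only fall back to
# --force if that fails. bosh still asks for confirmation on each attempt.
delete_deployment() {
  if bosh delete-deployment -d "$1"; then
    return 0
  fi
  echo "Graceful delete of $1 failed; retrying with --force" >&2
  bosh delete-deployment -d "$1" --force
}
```

Keeping the forced delete as a second step means you at least see the original error before BOSH starts ignoring them.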
bosh stop --hard
This command shuts everything down in the deployment: with --hard, BOSH deletes the VMs but keeps their persistent disks. It’s very useful if you’re running on a public cloud and want to save money when you’re not using the deployment. The syntax is simple:
bosh stop -d <deployment> --hard
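For the save-money-overnight case, it helps to have the matching command for bringing the deployment back. The pair below is a sketch that assumes bosh start is what you use to restore a deployment stopped this way; park_deployment and unpark_deployment are made-up names.

```shell
# Hypothetical helpers for pausing a deployment when it is not in use.
park_deployment() {
  # Stop all instances in the deployment and remove their VMs.
  bosh stop -d "$1" --hard
}

unpark_deployment() {
  # Bring the deployment back when you need it again.
  bosh start -d "$1"
}
```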
bosh recreate
This command allows you to force recreation of a whole deployment or individual instances.
bosh recreate -d <deployment>
Adding --fix will recover an instance with an unresponsive agent instead of erroring out.
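To recreate only the instances that need it, you can combine this with the bosh vms output. The sketch below prints the recreate commands for review instead of running them; it assumes the plain table layout shown earlier, fix_commands is a made-up name, and the two sample rows (including the “unresponsive agent” state text) are invented for the demo.

```shell
# Hypothetical helper: read `bosh vms` table output on stdin and print a
# `bosh recreate ... --fix` command for every instance that is not "running".
# Rows are recognised by the "/" in the instance name (first column).
fix_commands() {
  awk -v dep="$1" '$1 ~ /\// && $2 != "running" {
    printf "bosh recreate -d %s %s --fix\n", dep, $1
  }'
}

# Demo with two made-up rows: one healthy, one with an unresponsive agent.
printf '%s\n' \
  'worker/aaaa running us-central1-a 10.0.11.11 vm-1111 medium.disk true' \
  'worker/bbbb unresponsive agent us-central1-b 10.0.11.12 vm-2222 medium.disk true' \
  | fix_commands my-deployment
# prints: bosh recreate -d my-deployment worker/bbbb --fix
```

Printing the commands first gives you a chance to check the list before recreating anything.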
These are just a few basic commands to get you started. The full BOSH CLI reference is available here. Hopefully, this will act as a good quick reference when you’re working with your PKS deployments. For more details on the BOSH architecture, there are some good diagrams available in the documentation.
If you’d like to learn more about BOSH and how it can help automate other operational tasks in your environment, check out this great whitepaper or watch this detailed talk about BOSH by Pivotal’s CTO of Cloud, Colin Humphreys.