
Multi-tenancy in vRA 8.1 – Planning, gotchas, and best practices

Multi-tenant architecture is increasingly becoming the preferred option for businesses, but the set-up process is often cumbersome. That’s not the case with the newly-released vRealize Automation 8.1. VMware engineer Nikolay Nikolov demonstrates how easy it is to enable multi-tenancy using vRA 8.1.

vRA 8.1 is out! It’s got a lot of new features and some of them even make me want to jump for joy. PowerShell support for ABX is one of them. PowerShell support is rather self-explanatory, but I think there’s a feature that deserves more attention, so I’ve indulged in writing about it.

My guess is you’ve already read Karl Fultz’s excellent introduction to multi-tenancy.

Multi-tenancy in vRA 8.1 is very different from what it was in vRA 7.x. There is no embedded vIDM anymore, so the multi-tenancy approach had to change in almost every aspect. Things have become rather complex from a planning perspective, but if you’ve prepared well, the configuration process is quite simple. That’s because everything is done via the Lifecycle Manager with easy-to-use (really!) wizards.

Let’s start with the concept. In 8.1, multi-tenancy is based on domain names, not URLs as it was before. Here’s an example of a user trying to log in to a tenant called, for lack of ingenuity, “tenant1” (see the diagram at the beginning of this post): the user browses to the tenant-specific vRA FQDN, tenant1.vra.rainpole.local, and is sent to the matching vIDM tenant FQDN, tenant1.rainpole.local, to authenticate.

How do we configure these FQDNs? Before I describe the configuration steps, I’ll review the starting point for a highly-available environment. The DNS records you need for a clustered environment are:

Clustered vIDM
•       1 DNS A record for each appliance: vidm-01.rainpole.local (192.168.100.12), vidm-02.rainpole.local (192.168.100.13), vidm-03.rainpole.local (192.168.100.14)
•       1 DNS A record for the load balancer: vidm-lb.rainpole.local (192.168.100.11)

 

Clustered vRA
•       1 DNS A record for each appliance: vra-01.rainpole.local (192.168.100.16), vra-02.rainpole.local (192.168.100.17), vra-03.rainpole.local (192.168.100.18)
•       1 DNS A record for the load balancer: vra.rainpole.local (192.168.100.15)
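If you want to sanity-check these records before moving on, a quick resolver loop is enough. Here is a minimal sketch in Python; the hostnames and addresses are just the example values above, so substitute your own:

    # verify_dns.py -- check that each A record resolves to the expected address
    import socket

    # Example values from this post; replace with your own FQDNs and IPs.
    expected = {
        "vidm-01.rainpole.local": "192.168.100.12",
        "vidm-02.rainpole.local": "192.168.100.13",
        "vidm-03.rainpole.local": "192.168.100.14",
        "vidm-lb.rainpole.local": "192.168.100.11",
        "vra-01.rainpole.local": "192.168.100.16",
        "vra-02.rainpole.local": "192.168.100.17",
        "vra-03.rainpole.local": "192.168.100.18",
        "vra.rainpole.local": "192.168.100.15",
    }

    for fqdn, ip in expected.items():
        try:
            resolved = socket.gethostbyname(fqdn)
        except socket.gaierror:
            print(f"MISSING   {fqdn} does not resolve")
            continue
        status = "OK" if resolved == ip else f"MISMATCH (got {resolved})"
        print(f"{status}   {fqdn} -> {ip}")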

Once you have a clustered environment, you can enable multi-tenancy. There’s a very friendly wizard to do it, and it tells you exactly what requirements you need to meet for successful execution. Basically, you need another A record for the so-called master tenant for vIDM, and it needs to point to the load balancing virtual server IP address:

•       1 DNS A record for the master tenant, different from the load balancer FQDN, pointing to the load balancer IP address: vidm-master.rainpole.local -> 192.168.100.11

That’s all. LCM will reconfigure vIDM with the new FQDN, and also reconfigure vRA to use “vidm-master” as an authentication endpoint. You no longer need the old “vidm-lb” record. The new master tenant is your default tenant now. If you have already made any configurations to vRA on the default tenant, they will be kept.
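Before kicking off the wizard, you can manually confirm the two requirements above: the master tenant FQDN is different from the existing load balancer FQDN, and it resolves to the load balancer IP address. A tiny sketch with the example names from this post (the wizard’s own pre-checks remain the authority):

    # precheck_master_tenant.py -- master tenant FQDN must differ from the vIDM
    # LB FQDN but resolve to the same load balancer address (example values).
    import socket

    MASTER = "vidm-master.rainpole.local"
    VIDM_LB = "vidm-lb.rainpole.local"
    LB_IP = "192.168.100.11"

    assert MASTER != VIDM_LB, "master tenant FQDN must not reuse the vIDM LB FQDN"
    assert socket.gethostbyname(MASTER) == LB_IP, f"{MASTER} must resolve to {LB_IP}"
    print("master tenant DNS pre-check looks good")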

Once you’ve enabled multi-tenancy, you can begin creating additional tenants. For our “tenant1” we need:

vIDM
•       1 DNS A record for each tenant pointing to the load balancer IP address (or the appliance IP address if using a simple install): tenant1.rainpole.local -> 192.168.100.11

 

vRA
•       1 DNS CNAME record for each tenant pointing to the load balancer A record (or the appliance A record if using a simple install). The CNAME prepends the tenant name to the vRA load balancer or appliance FQDN, as if it were a subdomain: tenant1.vra.rainpole.local -> vra.rainpole.local

 

Yes, it has to be a CNAME. The reason is how vRA runs its microservices: one major requirement for their healthy status is that there are no A records in DNS for vRA other than the one pointing to the load balancing virtual server IP address. In other words, “vra.rainpole.local” should be the only A record, and nothing else.

If we follow the same logic, then for “tenant2” we need a DNS A record for tenant2.rainpole.local pointing to the load balancer of vIDM, and another DNS CNAME record for tenant2.vra.rainpole.local pointing to the vRA load balancer endpoint.
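If you’d like to verify the record types before creating a tenant, something like the following does the job. It’s a minimal sketch using the dnspython package (pip install dnspython), which is not part of vRA or LCM, just a convenient resolver library; the names are the example values from this post:

    # check_tenant_records.py -- verify tenant DNS records use the right types
    import dns.resolver

    VIDM_LB_IP = "192.168.100.11"
    VRA_LB_FQDN = "vra.rainpole.local."   # note the trailing dot on CNAME targets

    def check_tenant(tenant):
        # vIDM: the tenant FQDN must be an A record pointing at the vIDM load balancer.
        vidm_name = f"{tenant}.rainpole.local"
        a_records = [r.address for r in dns.resolver.resolve(vidm_name, "A")]
        assert a_records == [VIDM_LB_IP], f"{vidm_name}: expected {VIDM_LB_IP}, got {a_records}"

        # vRA: the tenant FQDN must be a CNAME to the vRA load balancer A record.
        vra_name = f"{tenant}.vra.rainpole.local"
        targets = [r.target.to_text() for r in dns.resolver.resolve(vra_name, "CNAME")]
        assert targets == [VRA_LB_FQDN], f"{vra_name}: expected CNAME to {VRA_LB_FQDN}, got {targets}"

        print(f"{tenant}: DNS records look good")

    for tenant in ("tenant1", "tenant2"):
        check_tenant(tenant)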

I bet the diagram at the beginning of this post is starting to make sense now.

I’m sure you’re already thinking of creating a separate sub-domain to host vRA tenants. This is definitely possible; it’s even recommended.

The really painful thing about all this is that you have to think about certificates. vIDM and vRA each use a single certificate. This means that for every new tenant, including the master tenant, you must re-generate and apply a SAN certificate to each system so that it covers all tenant FQDNs.

This also means there is an incurred downtime for your vRA (ouch!).

Following our example, if you want to create “tenant1”, you need:

vIDM
A SAN-based certificate which includes:

–       vIDM appliance FQDNs

–       Load balancer FQDN (Master tenant)

–       Tenant1 FQDN

vRA
A SAN-based certificate which includes:

–       vRA appliance FQDNs

–       Load balancer FQDN (Master tenant)

–       Tenant1 FQDN
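Whether you let LCM generate the CSR or build it yourself for an external CA, the important part is simply that the SAN list carries every FQDN above. Purely as an illustration, here is a minimal sketch using the Python cryptography package for the vRA certificate; the FQDNs are the example values from this post and the output file names are arbitrary:

    # make_vra_csr.py -- build a CSR whose SAN list covers every tenant FQDN
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    # Every name vRA is reached by: load balancer, appliances, plus one entry per tenant.
    san_names = [
        "vra.rainpole.local",            # load balancer (master tenant)
        "vra-01.rainpole.local",
        "vra-02.rainpole.local",
        "vra-03.rainpole.local",
        "tenant1.vra.rainpole.local",    # add another entry here for every new tenant
    ]

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, san_names[0])]))
        .add_extension(x509.SubjectAlternativeName([x509.DNSName(n) for n in san_names]),
                       critical=False)
        .sign(key, hashes.SHA256())
    )

    with open("vra.key", "wb") as f:
        f.write(key.private_bytes(serialization.Encoding.PEM,
                                  serialization.PrivateFormat.TraditionalOpenSSL,
                                  serialization.NoEncryption()))
    with open("vra.csr", "wb") as f:
        f.write(csr.public_bytes(serialization.Encoding.PEM))

The vIDM certificate follows the same pattern, with the vIDM appliance, master tenant, and tenant FQDNs in the SAN list.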

 

Currently, there is no option to add a different certificate for each tenant.
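Because that single certificate has to carry every tenant name, it is worth checking what the endpoint actually presents after you apply it. A minimal sketch using the standard ssl module plus the cryptography package, again with the example hostnames:

    # check_san.py -- list the SAN entries served by the vRA endpoint and compare
    import ssl
    from cryptography import x509

    required = [
        "vra.rainpole.local",
        "tenant1.vra.rainpole.local",
        "tenant2.vra.rainpole.local",
    ]

    # Fetch whatever certificate the load balancer (or appliance) presents on 443.
    pem = ssl.get_server_certificate(("vra.rainpole.local", 443))
    cert = x509.load_pem_x509_certificate(pem.encode())
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName).value
    dns_names = set(san.get_values_for_type(x509.DNSName))

    for name in required:
        print(("OK       " if name in dns_names else "MISSING  ") + name)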

And think twice before using wildcard certificates and DNS records.

How about vRO? Well, the embedded vRO is not tenanted (creating a vRA tenant does not create a matching vRO tenant), so you have two choices:

  1. Work with the embedded vRO and add it to each tenant. Users from every tenant will be able to see and edit the same content.
  2. Add an external vRO for each tenant.

Finally, here is a list of best practices from me:

  • Always install the environment with multi-tenancy in mind. That is, always be prepared with a master tenant before installation. This way, if you need to create more tenants, all you have to do is create the DNS records and re-generate and apply the certificates.
  • You should first enable multi-tenancy, introduce a master tenant, and then scale out vIDM. The master tenant can then easily become your LB FQDN. If it’s the other way around, you’ll have to create an additional DNS record for the LB, which will never be used afterwards.
  • Always watch the LCM logs when configuring multi-tenancy.
  • Do not ignore pre-checks. Any pre-check failure is a sign that the process itself will fail.
  • If you’re using wildcard certificates or DNS zones (or both), always make sure to cover all security and operational issues that come with them (see the next point).
  • Never use wildcard certificates, nor wildcard DNS records, if vRA will be facing external (Internet) access unless you are aware of the security issues. Cloudflare does not work with wildcard DNS records.
  • Try to find the best approach towards DNS zones and records at the very start. Switching to a new approach later is possible, but might lead to problems because of the manual work involved.

 

Oh, and really finally, there’s troubleshooting:

  • My tenant enablement request failed: Make sure your master tenant FQDN is different from the vIDM cluster FQDN.
  • My directories aren’t shown in the New Tenant Creation wizard: Make sure to run an inventory sync for vIDM.
  • My request failed and the error message does not show much information: Check the /var/log/vrlcm/vmware_vrlcm.log file for more details (a small log-scanning sketch follows this list).
  • My vRA cluster failed right after creating the DNS records: Make sure the tenant DNS records for vRA are CNAME records and not A records. vRA requires only one A record pointing to the cluster IP address. Likewise, only one PTR record should exist.
  • Tenant creation is successful, but logging into the vRA tenant fails: Check if vRA is registered as a client in the vIDM tenant. If not, delete the tenant in LCM and re-create it. If the problem persists, try re-registering vRA into vIDM from the environment operations in LCM.
  • Master tenant creation (enable multi-tenancy task) fails: Check if there is an A record for the master tenant in the DNS used by LCM and vIDM. Check that the master tenant name is not the same as the original vIDM load balancing FQDN (or virtual appliance FQDN in a single-node environment). If it is, vIDM tries to create a new tenant with the same name and fails. Restore snapshots of vIDM and LCM and retry, or just delete the request from LCM via the API. If the reason is somewhere else, check the /var/log/vrlcm/vmware_vrlcm.log file in LCM for more details.
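When a request fails with a vague message, scanning the LCM engine log for error-level lines is usually the quickest route to the real cause. A small sketch, assuming the log lines carry an "ERROR" level marker (adjust the match if your log format differs):

    # scan_lcm_log.py -- print error-level lines from the LCM engine log
    # Run on the LCM appliance, or point it at a copied log file.
    LOG = "/var/log/vrlcm/vmware_vrlcm.log"

    with open(LOG, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if "ERROR" in line:
                print(f"{lineno}: {line.rstrip()}")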