In this article I would like to provide some suggestions on the following topics:
- How to enable pre-defined compliance alerts in vRealize Operations Manager?
- How to create and modify the rules?
- How to create a detailed compliance report?
But first things first, why should we care about it?
IT compliance has always been an interesting topic and a critical concern for IT organizations. No one wants to see their name in the headlines about outages, security breaches or stolen customer data. Don’t get me wrong: I am not saying it can’t happen to you if you are compliant, but at least you can prove you did your best to avoid it. If you lose your server room key in a karaoke bar, no hardening guide will help.
There are at least three different kinds of compliance rules: regulatory compliance (for example PCI DSS), vendor best practices compliance (for example vSphere Hardening Guide) and internally defined compliance rules.
vRealize Operations 6 (6.0 and 6.0.1) provides alerts and symptoms which allow you to check your ESXi host and VM settings for compliance with the vSphere Hardening Guide. This is a classic implementation of a vendor best practices compliance check.
So after the installation, you would only need a few clicks to bring life and color to your compliance badge. In most cases, the badge would be red, because most real-life environments are not “hardened”. You can see the details (symptoms which are “fired”) and start filing change requests.
Self-awareness is the first step, and sometimes we just need to lower expectations (or better: analyze the rules and identify the ones which are important and applicable) in order to be compliant and get this badge green. Most of my customers do not intend to be completely compliant with the vSphere Hardening Guide rules.
What they want is basic compliance with their own rules: internal configuration compliance. They want to ensure things are configured the way they are supposed to be configured. They want to define operational compliance, which may be a compromise but is suited to their particular environment.
Having predefined alerts and symptoms for the vSphere Hardening Guide is a great plus: each symptom can be used as a blueprint for your own definitions. If, for example, you want to have SSH enabled instead of disabled on your hosts in a test environment, you can create a rule that checks exactly that.
How to Enable Pre-defined Compliance Alerts
If you look at the alerts in vRealize Operations 6 and enter “hardening” as a search string, you will see two predefined alerts:
- One for the ESXi hosts
- One for VMs.
Select one of the alerts and click the edit symbol. Underneath “Symptoms”, all the contained symptoms are displayed, each symptom representing one particular rule. At the time of writing, in version 6.0.1, you should see 20 symptoms for hosts and 49 for virtual machines. If you see fewer, you did not check the “overwrite out of the box content” option during the update (which is not a big deal, as you can re-apply the update with the force and overwrite options at any time).
In order to enable those alerts, you have two options. First, you could edit your current policy, search for alert definitions and enable those two alerts (containing “hardening”) in your active policy by selecting the “local enabled” option in the dropdown list. This process is described in the documentation section “Customize a Policy to Enable the vSphere Hardening Guide Alerts”. I would consider an alternative method: instead of enabling the alerts in your default policy, you could create a new dynamic or static group of the relevant hosts and virtual machines and assign the prebuilt “vSphere 5.5 Hardening Guide” policy to this group. This process is described in the “Apply Policy to Object Groups in the Policy Workspace” section. With the second method, you can change the scope of compliance by changing the group definition, and you avoid alerts for hosts and virtual machines that are not meant to be checked.
The following screenshot shows the predefined policy being assigned to the “HardeningGP” group:
As soon as you have done that, the compliance badge will show up underneath Environment -> vSphere -> Host System or VM Analysis -> Compliance. By clicking on the standard, you will see details with all the violated rules listed.
How to Change Compliance Rules or Add Your Own
Before editing pre-defined alerts, it is a best practice to clone them and save them under a new name. Once you have your own alerts, you can start removing or adding symptoms and negating symptom conditions. You can clone an existing alert by clicking on the “clone” icon above and giving it a new, meaningful name, for example: “_ABC compliance policy for production hosts”
Now you should discuss each of the predefined rules with your colleagues, determine whether it is applicable, and then keep it, change it or remove it. You can also go one step further and define your own symptoms. Select the fourth step in the alert definition and select “Property” in the dropdown list (as most compliance checks are based on properties, not metrics or events):
For example, you could create a new symptom which is triggered if the configured remote log host does not contain “loginsight” (if you are not yet using Log Insight, you should just create the rule and deploy Log Insight today; you will not regret it). Please always think about adding some kind of keyword to all your symptom names. This will be required when creating a detailed report.
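The logic of such a property symptom can be sketched in a few lines of Python. This is only an illustration of the rule, not how vRealize Operations evaluates symptoms internally: the `PropertySymptom` class, the operator names and the property key `config|syslog|remote_host` are assumptions made up for this example.

```python
from dataclasses import dataclass


# Hypothetical in-memory model of a property symptom; not a vRealize Operations API.
@dataclass(frozen=True)
class PropertySymptom:
    name: str          # keep the shared keyword (here "_ABC") in every symptom name
    property_key: str  # assumed property key, chosen for illustration only
    operator: str      # "contains", "not_contains", "equals" or "not_equals"
    value: str

    def is_triggered(self, properties: dict) -> bool:
        """Return True when the object's property matches the symptom condition."""
        actual = str(properties.get(self.property_key, ""))
        if self.operator == "contains":
            return self.value in actual
        if self.operator == "not_contains":
            return self.value not in actual
        if self.operator == "equals":
            return actual == self.value
        if self.operator == "not_equals":
            return actual != self.value
        raise ValueError(f"unknown operator: {self.operator}")


log_host_rule = PropertySymptom(
    name="_ABC Remote log host is not Log Insight",
    property_key="config|syslog|remote_host",  # assumed key name
    operator="not_contains",
    value="loginsight",
)

compliant_host = {"config|syslog|remote_host": "udp://loginsight.example.com:514"}
drifted_host = {"config|syslog|remote_host": "udp://oldsyslog.example.com:514"}

print(log_host_rule.is_triggered(compliant_host))  # False
print(log_host_rule.is_triggered(drifted_host))    # True
```

Negating a predefined rule, such as requiring SSH to be enabled in a test environment, corresponds to swapping the operator in this model.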
Once you are done with cloning and creating your own symptoms, you can add those to custom compliance alerts and display those in your compliance badge detail view:
How to Create a Detailed Compliance Report
A summary report for all ESXi hosts or virtual machines can be created with a few clicks. You just need to create a new view, select list, select either host system or virtual machine as the subject and Badge->Compliance as data. The result will show the same numbers you see on the compliance badge. 100 means fully compliant, 0 not compliant.
This is possibly good enough for a management summary, but we want to see which rules in particular are violated. In this case (for today), you would need to rely on a symptom naming convention. All symptoms included in your compliance alerts should include (or start with) the same keyword. For example, each compliance symptom could start with _ABC. This means you should edit each one and add a keyword. This might sound like a big effort but it is not when you consider renaming the symptoms during the analysis for their applicability.
Your property symptoms should look like this:
Now we can move forward and define a view under Content -> Views -> “+”. It could be called “compliance details”. The presentation is a list, and the subject is not a host or virtual machine: the whole trick is that the subject is the symptom. The data we want to display is a symptom metric called “Triggered On Object”. This one is sufficient, as it also includes the name of the symptom (in our case the name of our compliance rule) and the object name.
Our view includes all the symptoms triggered on all the objects, so we need to add some filters. This is where the naming convention is helpful. Let’s add a filter to include our keyword (symptom name contains…) and also a filter for an object type (host system or virtual machine). If you want to have all the alerts in one view, check Compliance under “5. Visibility” in order to make the view available on the Compliance subtab of the Analysis view.
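The effect of the two filters can be illustrated with a small Python sketch. The rows below are invented sample data shaped like the “Triggered On Object” output, and `compliance_rows` is a hypothetical helper; the view itself is configured in the UI, not in code.

```python
# Hypothetical rows, shaped like the "Triggered On Object" data shown in the view.
triggered = [
    {"symptom": "_ABC SSH service is running", "object": "esx01", "object_type": "Host System"},
    {"symptom": "_ABC Remote log host is not Log Insight", "object": "esx02", "object_type": "Host System"},
    {"symptom": "CPU contention is high", "object": "esx01", "object_type": "Host System"},
    {"symptom": "_ABC Copy and paste is enabled", "object": "vm-web01", "object_type": "Virtual Machine"},
]


def compliance_rows(rows, keyword, object_type):
    """Keep rows whose symptom name contains the keyword and whose object type matches."""
    return [
        row for row in rows
        if keyword in row["symptom"] and row["object_type"] == object_type
    ]


host_violations = compliance_rows(triggered, "_ABC", "Host System")
for row in host_violations:
    print(f'{row["object"]}: {row["symptom"]}')
```

Without the keyword in the symptom names, the unrelated “CPU contention is high” row could not be separated from the compliance rules, which is why the naming convention matters.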
Once you have created a view, you can use it in a report and create detailed reports about compliance rules being violated on particular hosts or VMs. In my report, I also added “Criticality level”:
In order to enable the vSphere Hardening Guide compliance alerts, you just need to assign the policy to a group.
In order to customize compliance, always clone and rename alerts and symptoms. Don’t forget to enable the alerts in your policy. Always think about a proper naming convention and add a keyword to each symptom in order to be able to create filters.
If you want to define your own compliance rules, explore the alerts and look at the available properties for clusters, hosts and virtual machines. You may consider regrouping alerts and creating separate ones for cluster and networking compliance too. You may create alerts for critical rules only and use other symptoms just in views and reports. You may also create different compliance rules for different security zones or groups.
Going further, you may consider creating an automatic remediation rule based on compliance alerts. Today, this can be accomplished by sending SNMP traps to VMware vRealize Orchestrator, interpreting them there and running the appropriate remediation workflows. A simple example would be an alert for “SSH shell is enabled on host”: vRealize Orchestrator filters for this alert and triggers a workflow to disable SSH. The “how-to” is a good topic for a separate blog post.
The prebuilt compliance alerts may not cover all use cases today, but they are enhanced with every update. If you are missing something, you can add it yourself. For example, compliance alerts are not limited to property-based symptoms. You can create symptoms based on “message events” (i.e. alerts based on log messages forwarded by Log Insight), or you can run scripts that check something and create new properties via the REST API based on your custom checks. The possibilities are unlimited.
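The filtering step can be pictured as a simple lookup from alert name to remediation workflow. The sketch below is plain Python for illustration only: real Orchestrator workflows are built in Orchestrator itself, and the mapping, the workflow name “Disable SSH on host” and the `select_workflow` helper are all hypothetical.

```python
# Hypothetical mapping from alert name to remediation workflow name.
# Neither the names nor this dispatch logic come from vRealize Orchestrator.
REMEDIATIONS = {
    "SSH shell is enabled on host": "Disable SSH on host",
}


def select_workflow(alert_name, object_name):
    """Return (workflow, target) for a known compliance alert, else None."""
    workflow = REMEDIATIONS.get(alert_name)
    if workflow is None:
        return None  # unknown alerts are left for manual handling
    return (workflow, object_name)


print(select_workflow("SSH shell is enabled on host", "esx01"))
print(select_workflow("Host memory is overcommitted", "esx01"))
```

Keeping the mapping explicit means that only alerts you have deliberately listed are ever remediated automatically.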
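As a rough sketch of the script-based approach, the following Python builds the kind of JSON body a script might send when pushing a custom property. Nothing is sent here, and no endpoint is shown. The field names (`property-content`, `statKey`, `timestamps`, `values`) are my assumption about the property format and should be verified against the REST API documentation for your version; the property key `custom|compliance|ntp_in_sync` is invented for the example.

```python
import json
import time


def build_property_payload(stat_key, value, timestamp_ms=None):
    """Build a JSON body carrying one custom property value.

    The field names used below are an assumption about the REST API's
    property format; check them against the API documentation before use.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    return {
        "property-content": [
            {"statKey": stat_key, "timestamps": [timestamp_ms], "values": [str(value)]}
        ]
    }


# Hypothetical custom check result, e.g. the outcome of a script run elsewhere.
payload = build_property_payload("custom|compliance|ntp_in_sync", "false", timestamp_ms=0)
print(json.dumps(payload, indent=2))
```

Once such a property exists on an object, it can be checked by a property symptom in the same way as the built-in ones.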
P.S. Thanks to the IT Crowd @ it.nrw and my colleague Matthias Diekert for sharing their expertise