(This post is a collaboration between Carlos Phoenix, Senior Compliance and Cyber Risk Solutions Strategist, and Bob Plankers, Technical Marketing Architect, and is the first in a series of articles discussing the relationship between compliance, security, and complexity.)
As we work to add security to our systems, we often use different security protocols. A protocol is simply defined as an official procedure, and things like network security protocols define the process for encrypting & decrypting data as it travels on networks. A common network security protocol is Transport Layer Security, or TLS. The United States’ National Security Agency (NSA) recently published a new protocol for Transport Layer Security Inspection (TLSI or TLS Inspection), which is meant to help reduce cyber risk. However, despite the NSA’s intentions, the complexity this protocol adds, along with the requirements to maintain the inspection infrastructure, means it may not be a good fit for every organization.
Security methods such as TLS Inspection are used at the highest echelons of secure environments, but these types of security measures do not equate to good security for organizations that do not have the same budgets, staffing, and redundancies to support them. In most organizations, cyber risk is a threat to be managed. Just like risk in the physical world, reducing cyber risk to zero is cost prohibitive, time intensive, detrimental to efficiency, and often impossible. TLS Inspection may be a great protocol for the NSA, but is it right for you? What about other security controls like it, such as air gaps?
Air Gap: intentional isolation and disconnection of computing environments from larger networks, like the Internet. This may include separate cabling to keep network transportation methods isolated, or a lack of cables to connect servers/laptops beyond the immediate physical site.
Given the advanced security features present in products like VMware vSphere & VMware Cloud Foundation it isn’t a surprise that VMware has many security-minded customers. We talk to many customers considering these and other techniques to improve security. These conversations lead us to the belief that, prior to implementing these big and potentially intrusive techniques and tools, there needs to be a conversation focused on how security design, security risk, and overall organizational cyber risk mitigation work together to form an organization’s cybersecurity ecosystem.
Just Because You Can…
As we discussed the pros & cons of TLS Inspection, the use of air gaps also came to mind. Security architectures that use air gaps seem to be gaining popularity. While we are very interested in supporting those, we are also very interested in providing exemplary design architectures & ecosystems that reduce cyber risk without air gaps, using other security controls and modern methods of protection. We often worry that, for all but a few customers, adding military-grade protections such as TLS Inspection and air gaps leaves them in a worse state.
This concern arises from the difficulty in operating an environment with complex security controls. A complex environment requires more management and investment of time, and the increased staffing needs are rarely accounted for. The concepts of TLS Inspection and air gaps may appear simple, but in practice they become very complicated to implement and maintain. Security concepts that feel intuitive are often more challenging in practice than they appear initially. “Simple” concepts like these often mean that discussion quickly progresses to “how to do it” and skips the crucial “should we do it?” phase.
We always recommend evaluating security concepts with an emphasis on their origin and overall effect on risk mitigation. Many security concepts arise from military applications, but if an organization isn’t a military one, and does not protect classified data, is it the right fit? Even more importantly, in mitigating one risk you may create new risks. In balancing risk and threats, the full cost of ownership for the security control needs to be considered.
TLS Inspection started this conversation, both for us at VMware and in this post. However, for the rest of our discussion here we will use air gaps as our example. They represent a control that, on the surface, appears very simple, but instead adds a lot of complexity and may add to your cyber risk instead of lowering it.
Regulatory Misconceptions about Air Gaps
We often speak to customers that have decided on a security control, such as an air gap, based on a compliance standard or security regulation they read. “Our Infosec team told us that NIST 800-53 requires us to air gap,” or “Our bank auditor told us that FFIEC needs an air gap for all cloud environments” are very common statements. Yet, when we read the standards, we rarely reach the same conclusion. Compliance frameworks that list an air gap as an example are providing a list of choices, usually like an a la carte menu. In this form it’s easy to see how someone could be convinced that an air gap is an easy solution. It is not. Regulations don’t tell you about all the other difficulties you’ll have.
For example:
- Air gaps make patching difficult. Compliance frameworks require patching in a timely fashion, as patching is the only true way to remove a vulnerability from an environment. Most products, including vSphere, strive to make staying up to date easy. vCenter Server allows customers to simply check and download updates, just the same as you might do with Microsoft Windows or a Linux distribution. However, if you cannot reach the Internet you will need to download and copy the patches manually. What is the effort needed to do that? If patching must be performed manually then how much more time will be consumed? Think beyond vSphere to all the other components in the environment, too.
- Air gaps make monitoring difficult. Many compliance frameworks require a level of monitoring to understand what has changed and to understand if alerts are being resolved in a timely manner. There are tremendously powerful monitoring and alerting tools available to vSphere Admins. Tools like VMware vRealize Operations Manager make it easy to stay on top of break/fix, performance, and security issues inside a virtual infrastructure. Similarly, VMware vRealize Log Insight’s ability to filter and alert on log information is a very powerful security tool. Once noted by a monitoring tool, third-party services like PagerDuty make it very easy to send those alerts to an on-call vSphere Admin. How does that work when you can’t reach those services? Are you more secure or less secure if you can’t get timely alarms? Do solutions to these problems create costs that prevent other security measures?
- Air gaps make disaster recovery and business continuity difficult. Many compliance frameworks require the ability to fail over to a secondary site in a timely manner. Air gaps eliminate many options for that second site, such as the cloud or commodity co-location facilities. Assuming you don’t want to manually transport backup media between sites, how much redundancy is possible in an isolated site? The usual alternative, a dedicated private connection between sites, is expensive, complicated, and requires specialized networking hardware and staff training. Often a “dedicated” connection is co-mingled with telco traffic. Is that more secure or less secure than an IPsec or other VPN connection between two sites? Does the expense here mean that other security options in the environment are no longer an option?
- Air gaps limit the amount of reuse inside an organization. For example, you won’t be able to participate in a corporate Active Directory (AD) implementation, so you will need your own. That, in turn, will require duplicate organizational processes to manage and monitor that AD implementation. Where will you get your NTP sources for time synchronization? Accurate and consistent time is a requirement for IT infrastructures. DNS is another requirement too. Will you need your own certificate authority? How would you use a multifactor authentication solution like Duo, Ping, or Okta? Is it riskier or less risky to do this all on your own versus using your established corporate systems and processes? Do those corporate processes help with the fundamental security concept of separation of duties?
- Air gaps create a staffing drag because automation and centralized administration won’t work. An admin will have to create a separate process to patch, monitor, and access the air-gapped environment. This creates an additional design that needs to be supported. How do you train a new admin to adopt an ad-hoc process without injecting a process that breaks the air gap? How much additional staff have you budgeted to support the air-gapped environment? Are all your admins trained and provided logical access to reach the air-gapped environment?
These are just a few examples, but there are many more, and many of them relate directly to staff time. Staff time is often the single biggest item in an IT budget. Consuming a large amount of staff time by requiring someone to download and copy updates into an air-gapped environment may not be the most effective choice. As such, it is important not to take the bait offered by an intuitive security control like an air gap if you don’t need to. What might look easier makes nearly every other regulatory requirement more difficult.
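To make the manual patching effort more concrete, here is a minimal Python sketch of one step that carrying updates across an air gap tends to add: checking a downloaded update bundle against a published checksum before it is copied to transfer media. The function names and the choice of SHA-256 are assumptions for illustration only; use whichever digest your vendor actually publishes, and treat this as one step of a longer process rather than a complete workflow.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large bundles do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path: Path, published_hex: str) -> bool:
    """Return True if the file's digest equals the published checksum.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

Someone still has to download the bundle, look up the published checksum, run the check, move the media, and repeat the process for every other product in the environment. That is staff time the air gap adds.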
What Do We Do?
Whether it is an air gap, TLS Inspection, or some other security control, what can we do as vSphere Admins to make the case against complexity and for simplicity? Here are some ideas and suggestions a vSphere Admin can pursue:
- Ask the Infosec team or the auditor to point to the specific regulatory or compliance requirements. Compliance frameworks are updated periodically so read the source document for yourself, directly from the agency’s website, to avoid interpretations, errors, and old information. Closely partner with your Infosec team to determine the right control framework based on an accurate threat assessment.
- Ask about & research compensating controls. Compliance frameworks tell you what you need to achieve but rarely tell you exactly how to get it done. That means that if you can’t meet a requirement directly, such as with a specific setting, you can add additional controls to compensate for that problem. Rarely is there only one solution to the problem.
- Remember that information security is made up of three distinct components: confidentiality, integrity, and availability. Confidentiality is keeping your data to yourself and out of the hands of bad actors, and is usually what people think of as “security.” Integrity means that your data is authentic and intact. Availability means your data and systems are there when you need them. These three concepts are called the “CIA Triad” and form the core of information security & risk mitigation. Sacrificing one of the three for the sake of another might not be a good idea. Similarly, relating your thoughts to an auditor in terms of these three concepts is often an effective way to help them understand what you’re trying to say.
- Understand that isolation isn’t an all-or-nothing thing. There are levels to it. Find out what levels of isolation are acceptable. Can you use VLAN tagging to isolate network traffic or do you need to dedicate specific NICs to specific types of traffic? Can you use VLANs with unrouted IP ranges for certain types of traffic? Do you need to be completely disconnected from the Internet, or can you use a firewall that limits egress traffic but still allows the automatic patch downloads you specify? Would application whitelisting satisfy a threat profile, instead of using air gaps?
- Use resources from VMware that are designed to help compliance implementation efforts. VMware publishes compliance kits that map controls and compliance requirements (Available today: NIST 800-53 and PCI DSS 3.2.1). VMware also publishes Product Applicability Guide (PAG) whitepapers that include the point of view of an independent, external auditor (Available today: NIST 800-53, PCI DSS 3.2.1, and NIST 800-171). VMware, in partnership with the NIST National Cybersecurity Center of Excellence (NCCoE), published a Special Publication 1800 series covering the architecture, use cases, and detailed guidance on securing a software-defined data center (SDDC). These are all amazing resources that help guide implementations as well as educate auditors and infosec staff on product capabilities, in language that is familiar to them.
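As an illustration of the "levels of isolation" idea in the list above, the following Python sketch models the policy behind a firewall that limits egress traffic but still permits the patch downloads you specify. It is not a firewall and does not reflect any VMware product API. The host names, the HTTPS-only rule, and the function name are hypothetical. In practice this decision would be enforced by your firewall or proxy, and the sketch only shows how small the approved surface can be.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the update hosts you actually approve.
ALLOWED_EGRESS_HOSTS = frozenset({
    "updates.example.com",
    "patches.example.org",
})
ALLOWED_PORTS = frozenset({443})


def egress_allowed(url: str) -> bool:
    """Return True only for HTTPS requests to an approved host and port."""
    try:
        parts = urlsplit(url)
        # An omitted port means the HTTPS default, 443.
        port = parts.port if parts.port is not None else 443
    except ValueError:
        # Malformed URL or non-numeric port: deny by default.
        return False
    if parts.scheme != "https" or parts.hostname is None:
        return False
    return parts.hostname.lower() in ALLOWED_EGRESS_HOSTS and port in ALLOWED_PORTS
```

Everything not explicitly listed is denied, so the environment stays largely isolated while automated patching keeps working.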
Remember that security and compliance are related but are two different things. Being secure, first and foremost, leads to good compliance, too. However, being compliant doesn’t mean you are secure, or that you are securing and managing infrastructure in the most effective manner.
There is no magic solution to compliance and security. The best solutions are ones where you and your team, your auditors, and your management come together to openly discuss the options, keeping in mind the broader picture of business risk, security vs. compliance, staffing levels, and both opportunity costs and real costs and their tradeoffs. The best solutions also tend to be the simpler ones because they’re easier to manage, easier to upgrade, and easier to operate in a disaster recovery situation.
(Thank you to Rick McElroy for contributions to this post)