
Monthly Archives: August 2015

vCloud Director for Service Providers (VCD-SP) and RabbitMQ Security

Let us start with what RabbitMQ is and how it fits into vCloud Director for Service Providers (VCD-SP).

RabbitMQ provides robust messaging for applications, in particular vCloud Director for Service Providers (VCD-SP).  Messaging describes the sending and receiving of data (in the form of messages) between systems. Messages are exchanged between programs or applications, similar to the way people communicate by email, but with selectable guarantees on delivery, speed, security and the absence of spam.

A messaging infrastructure (a.k.a. message-oriented middleware or enterprise service bus) makes it easier for developers to create complex applications by decoupling individual program components. Rather than communicating directly, components exchange data through the messaging infrastructure.  The components need know nothing about each other’s status, availability or implementation, which allows them to be distributed over heterogeneous platforms and turned off and on as required.

In a vCloud Director for Service Providers deployment, VCD-SP uses the open standard AMQP protocol to publish messages associated with Blocking Tasks or Notifications. AMQP is the wire protocol natively understood by RabbitMQ and many similar messaging systems. It defines the wire format of messages and specifies the operational details of how messages are published and consumed. VCD-SP also uses AMQP to communicate with extension services (http://goo.gl/xZ9gkL). vCloud Director for Service Providers API extensions are implemented as services that consume API requests from a RabbitMQ queue. The API request (an HTTP request) is serialized and published as an AMQP message. The API implementation consumes the message, performs the business logic and then replies with an AMQP message. In order to publish and consume messages, you need to configure your RabbitMQ exchange and queues.


A RabbitMQ server, or broker, runs within the vCloud Director for Service Providers network environment; for example, it can be deployed into the vSphere installation underlying VCD-SP as a virtual appliance, or vApp. Clients (in this case the VCD-SP cells themselves, as well as other applications interested in notifications) connect to the RabbitMQ broker. Such clients then publish messages to, or consume messages from, the broker. The RabbitMQ broker is written in the Erlang programming language and runs on the Erlang virtual machine. Notes on Erlang-related security and operational issues are presented later in this vCAT-SP blog.

 

The Base Operating System Hosting the RabbitMQ Broker

Securing the RabbitMQ broker in a vCloud Director for Service Providers environment begins with securing the base operating system of the computer (bare metal or virtualized) on which Rabbit runs.  Rabbit runs on many platforms, including Windows and multiple versions of Linux.  As of this writing, commercial versions of RabbitMQ are sold by VMware as part of the vFabric suite and supported on Windows and RPM-based Linux distributions in the Fedora/RHEL family, as well as in a tar.gz-packaged Generic Linux edition. See http://docs.gopivotal.com/rabbitmq/index.html for purchasing details.

It is generally recommended in a vCloud Director Service Provider (VCD-SP) deployment that a Linux distribution of RabbitMQ be used.  VMware expects to eventually provide a pre-packaged vApp with a Linux installation, the necessary Erlang runtime, and a RabbitMQ broker, although this form factor is not yet officially released. The VMware RabbitMQ virtual appliance undergoes, as part of its build process, a security hardening regime common to VMware-produced virtual appliances.

If a customer is deploying RabbitMQ on a Linux of their own choosing, whether running on a bare-metal OS or as part of a virtual appliance they have created themselves, VMware’s security team recommends adopting the NSA and DISA STIG operating system security guidelines listed under Sources/References at the end of this post for securing the base operating system.

The hardening discipline applied to the VMware-produced RabbitMQ virtual appliance is based on the DISA STIG recommendations.

 

General networking concerns

Allowing the AMQP traffic between vCloud Director for Service Providers cells and other interested applications to leave the private networks meant for cloud management can expose a VCD-SP provider to security threats. Messages are published on an AMQP broker like RabbitMQ whenever something in vCloud Director for Service Providers changes, and may therefore include sensitive information. AMQP ports should thus be blocked at the network firewall protecting the DMZ to which vCloud cells are connected. Code that consumes AMQP messages from the broker must also be connected to the same DMZ.  Any such piece of code should be controlled, or at least audited to the point of trustworthiness, by the service provider.

It is also worth mentioning that AMQP is not exposed to any Cloud tenants and is only used by the Service Provider.

The Erlang runtime

What is Erlang?

Erlang is a programming language developed and used by Ericsson in its high-end telephony and data routing products.  The language and its associated virtual machine support several features leveraged by RabbitMQ, including:

  • support for highly concurrent applications like RabbitMQ
  • built-in support for distributed computing, thus enabling easier clustering of RabbitMQ systems
  • built-in process monitoring and control, for ensuring that a RabbitMQ broker’s subsystems remain running and healthy
  • Mnesia: a performant distributed database
  • high-performance execution.

That RabbitMQ is written in Erlang matters relatively little to a system administrator responsible for deploying, configuring and securing the broker, with only a few small exceptions:

  • Erlang distribution has certain open port constraints.
  • Erlang distribution requires a special “cookie” file to be shared between hosts participating in distributed Erlang communication; this cookie must be kept private.
  • Some RabbitMQ configuration files are represented with Erlang syntax, of which one must be mindful when placing delimiters (like ‘[‘, ‘{‘, and ‘)’) and certain punctuation marks (notably the comma and the period).

 

Running Erlang securely for RabbitMQ

When clustered, RabbitMQ is a distributed Erlang system, consisting of multiple Erlang virtual machines communicating with one another.  Each such running virtual machine is called a node.  In such a configuration, the administrator must be aware of two basic Erlang ideas: the Erlang port mapper daemon, and the Erlang node magic cookie.

 

epmd:  The Erlang port mapper daemon

The Erlang port mapper daemon is automatically started on every host where an Erlang node (such as a RabbitMQ broker) is started.  The appearance of a process called ‘epmd’ is not to be viewed with alarm. The Erlang virtual machine itself is called ‘beam’ or ‘beam.smp’, and at least one of these will be seen on a machine running the RabbitMQ server. The Erlang port mapper daemon listens by default on TCP port 4369, so the host system’s firewall should leave this port open.

 

Node magic cookies

Each Erlang node (as defined above) has its own magic cookie, which is an Erlang atom contained in a text file.  When an Erlang node tries to connect to another node (this could be a pair of RabbitMQ brokers connecting in a clustered RabbitMQ implementation, or the rabbitmqctl utility connecting to a broker to perform some administrative function upon it) the magic cookie values are compared.  If the values of the cookies do not match, the connected node rejects the connection.

A node magic cookie on a system should be readable only by those users under whose id Erlang processes that need to communicate with one another are expected to run.  The Unix permissions of cookie files should typically be 400 (read-only by user).

For most versions of RabbitMQ, cookie creation and installation is handled automatically during installation.  For an RPM-based Linux distribution of RabbitMQ such as that for RHEL/Fedora the cookie will be created and deposited in /var/lib/rabbitmq, called ‘.erlang.cookie’ and given permissions 400 as described above.
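The ownership and permission requirements above can be verified with a small check. The following is a minimal sketch, not part of any RabbitMQ tooling: the function name check_cookie is hypothetical, and it assumes GNU stat (the -c option), as found on RHEL/Fedora.

```shell
#!/bin/sh
# check_cookie FILE [OWNER]   (hypothetical helper, for illustration only)
# Succeeds only if FILE exists, has mode 400 and is owned by OWNER
# (default owner: rabbitmq). Assumes GNU stat.
check_cookie() {
    cookie=$1
    owner=${2:-rabbitmq}
    if [ ! -f "$cookie" ]; then
        echo "missing: $cookie"
        return 2
    fi
    mode=$(stat -c '%a' "$cookie")   # octal permission bits, e.g. 400
    user=$(stat -c '%U' "$cookie")   # owning user name
    if [ "$mode" != "400" ] || [ "$user" != "$owner" ]; then
        echo "insecure: $cookie has mode $mode, owner $user"
        return 1
    fi
    echo "ok: $cookie"
}
```

On an RPM-based installation it would be called as check_cookie /var/lib/rabbitmq/.erlang.cookie, typically as root so that the file can be inspected.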

Rabbit server concepts

Rabbit security:  the OS-facing side

OS user accounts

RPM-based Linux

In an RPM-based Linux distribution such as the vFabric release of RabbitMQ or the RabbitMQ virtual appliance, the Rabbit server runs as a daemon, started by default at OS boot time.  On such a platform the server is set up to run as system user ‘rabbitmq’.  The Mnesia database and log files must be owned by this user.  More will be said about these files in subsequent sections.

To change whether the server starts at system boot time use:

$ chkconfig rabbitmq-server on

or:

$ chkconfig rabbitmq-server off

An administrator can start or stop the server with:

$ /sbin/service rabbitmq-server stop|start|restart

 

Network ports

Unless configured otherwise, the RabbitMQ broker will listen on the default AMQP port of 5672.  If the management plugin is installed to provide browser-based and HTTP API-based management services, it will listen on port 55672 (RabbitMQ 3.0 and later use port 15672 instead).

Any firewall configuration must take these two ports into account.

Strictly speaking, only port 5672 needs to be open for VCD-SP to work. Open the management port only if you want to expose the management interface to the outside world.

Also, as noted above, the Erlang port mapper daemon port, TCP 4369, must be open.
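One way to reconcile open ports with the earlier advice to keep AMQP traffic on the management network is to accept these ports only from that network. The sketch below only prints candidate iptables rules and does not apply them. The function name print_amqp_rules and the example subnet 192.0.2.0/24 are assumptions for illustration; substitute your own management network and review the rules before use.

```shell
#!/bin/sh
# print_amqp_rules MGMT_NET [PORT...]   (hypothetical helper)
# Prints, without applying, iptables rules that accept the given TCP ports
# from MGMT_NET only and drop them from every other source.
print_amqp_rules() {
    net=$1
    shift
    # Default to the ports named in the text: AMQP (5672) and epmd (4369).
    [ "$#" -gt 0 ] || set -- 5672 4369
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -s $net --dport $port -j ACCEPT"
        echo "iptables -A INPUT -p tcp --dport $port -j DROP"
    done
}

# Example with a placeholder management subnet.
print_amqp_rules 192.0.2.0/24
```

Passing the management plugin port as an extra argument adds the matching pair of rules for it.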

 

Rabbit security: The broker-facing side

When considering the security of the RabbitMQ broker itself, it is helpful to separate two concerns: the face Rabbit shows to the outside world, that is, how communication with clients can optionally be authenticated and secured against eavesdropping; and the ways in which RabbitMQ’s internal structures (exchanges, queues, and the bindings between them that determine message routing) are governed.

For the former consideration, a RabbitMQ broker can be configured to communicate with clients using the SSL protocol.  This can provide channel security for client-broker communications and optionally the verification of the identities of communicating parties.

 

TLSv1.2 and RabbitMQ in vCloud Director for Service Providers (VCD-SP)

In the context of vCloud Director for Service Providers (VCD-SP), the administrator can configure VCD-SP to use secure communication based on TLSv1.2 when sending messages to the AMQP broker. VCD-SP can also be configured to verify the certificate presented by the broker in order to authenticate its identity. To enable secured communication, log in to VCD-SP as a system administrator. In the ‘Administration’ section of the user interface, open the ‘Blocking Tasks’ page and select the ‘Settings’ tab. In the ‘AMQP Broker Settings’ section there is a checkbox labelled ‘Use SSL’.  Turn this option on. You can now choose either to accept all certificates (turn the ‘Accept All Certificates’ option on) or to verify presented certificates.

To configure verification of the broker’s certificate, either create a Java KeyStore in JCEKS format that contains the trusted certificate(s) used to sign the broker’s certificate, or upload the certificate directly if it is in PEM format.  Under the same ‘AMQP Broker Settings’ section, use the ‘Browse’ button for either a single SSL Certificate or an SSL Key Store. If you upload a keystore, you must also provide the SSL Key Store Password. If neither a keystore nor a certificate is uploaded, the default JRE truststore is used.

 

Securing RabbitMQ AMQP communication with SSL

Full documentation on setting up the RabbitMQ broker’s built-in SSL support can be found at: http://www.rabbitmq.com/ssl.html

The documentation at this site covers:

  • the creation of a certificate authority using OpenSSL and the generation of signed certificates for both the RabbitMQ server and its clients.
  • enabling SSL support in RabbitMQ by editing the broker’s config file (for its location on a specific Rabbit platform see http://www.rabbitmq.com/configure.html#configuration-file)

 

Broker virtual hosts and RabbitMQ users

A RabbitMQ server internally defines a set of AMQP users (with passwords), which are stored in its Mnesia database.  NOTE: A freshly installed RabbitMQ broker starts life with a user account called ‘guest’ and endowed with the password ‘guest’.  We recommend that this password be changed, or this account deleted, when RabbitMQ is first set up.

A RabbitMQ broker’s resources are logically partitioned into multiple “virtual hosts.”  Each virtual host provides a separate namespace for resources such as exchanges and queues.  When clients connect to a broker, they specify the virtual host with which they plan to interact at connection time.  A first level of access control is enforced at this point, with the server checking whether the user has sufficient permissions to access the virtual host.  If not, the connection is rejected.

RabbitMQ offers configure, read, and write permissions on its resources.  Configure operations create or destroy resources, or modify their behavior.  Write operations inject messages into a resource, and read operations retrieve messages from a resource.

It is important to note that VCD-SP requires all of these permissions to be granted to its AMQP user.

Details on RabbitMQ virtual hosts, users, access control and permissions can be found here:

http://www.rabbitmq.com/admin-guide.html

The setting of permissions using the ‘rabbitmqctl’ utility is described in:

http://www.rabbitmq.com/man/rabbitmqctl.1.man.html#Access%20control

One should stick to a policy of least privilege in the granting of permissions on broker resources.
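The steps described above (a dedicated virtual host, a dedicated user with configure, write and read permissions, and removal of the default guest account) can be sketched with rabbitmqctl. The function name provision_vcd_user, the RABBITMQCTL override, and the example virtual host and user names are assumptions for illustration; only the rabbitmqctl subcommands themselves come from RabbitMQ.

```shell
#!/bin/sh
# provision_vcd_user VHOST USER PASSWORD   (hypothetical helper)
# Creates a virtual host and a user, grants the user configure, write and
# read permissions on every resource in that virtual host only, and removes
# the default 'guest' account.
# Set RABBITMQCTL="echo rabbitmqctl" for a dry run that only prints commands.
provision_vcd_user() {
    ctl=${RABBITMQCTL:-rabbitmqctl}
    vhost=$1
    user=$2
    password=$3
    # set_permissions takes the configure, write and read patterns in that order.
    $ctl add_vhost "$vhost" &&
    $ctl add_user "$user" "$password" &&
    $ctl set_permissions -p "$vhost" "$user" '.*' '.*' '.*' &&
    $ctl delete_user guest
}
```

For example, RABBITMQCTL="echo rabbitmqctl" provision_vcd_user /vcd vcd-amqp 'example-password' prints the four commands without touching a broker. Confining the user to a single virtual host keeps the grant as narrow as VCD-SP allows, since it needs all three permissions there. Note that a password passed on the command line is visible in the process list while the command runs.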

 

The rabbitmqctl utility

The rabbitmqctl utility (analogous to apachectl or tomcatctl) is one of the primary points of contact for administering RabbitMQ.  On Linux systems a man page for rabbitmqctl is typically available, specifying its many options.  The contents of this page can also be found online at:

http://www.rabbitmq.com/man/rabbitmqctl.1.man.html

 

The Rabbit broker:  Where things are and how they should be protected

The following are true for a RabbitMQ server installed on an RPM-based Linux distribution such as RHEL/Fedora.  Permissions are given for top level directories where named.  Data files within them may have more liberal permissions set, particularly group/other authorized to read/write.

 

Erlang cookie

Ownership:    rabbitmq/rabbitmq

Permissions:  400

Location: /var/lib/rabbitmq/.erlang.cookie

 

RabbitMQ logs

Ownership:    rabbitmq/rabbitmq

Permissions:  755

Location: /var/log/rabbitmq/

|-- rabbit@localhost-sasl.log
|-- rabbit@localhost.log
|-- startup_err
`-- startup_log

 

Mnesia database location, plugins and message stores

Ownership:    rabbitmq/rabbitmq

Location: /var/lib/rabbitmq/mnesia

|-- rabbit@localhost
|   |-- msg_store_persistent
|   `-- msg_store_transient
`-- rabbit@localhost-plugins-expand

 

Configuration files location and permissions

RabbitMQ’s main configuration file, as well as the environment variables that influence its behavior, are documented here: http://www.rabbitmq.com/configure.html

Note that the contents of the rabbitmq.config file are an Erlang term, and it is thus important to be mindful of delimiters and line ending symbols, so as not to produce a syntactically invalid file that will prevent RabbitMQ from starting up.

 

Privileges required to run broker process and rabbitmqctl

Ownership:    root/root

Permissions:  755

Location: /usr/sbin/rabbitmqctl

The rabbitmqctl utility must be run as root, and must keep the ownership and permissions shown above.

The broker can be started, stopped, restarted or status checked by an administrator running:

$ /sbin/service rabbitmq-server stop|start|restart|status

 

Sources/References

VMware vFabric Cloud Application Platform (with purchase links for commercial RabbitMQ):

http://info.vmware.com/content/12834_index?src=PaidSearch_Google_amer-us_ENG_vFabric_vFab_Brand_Search&gclid=CLuOp7e84asCFTAaQgodJzlEQw

NSA operating systems security guidelines: http://www.nsa.gov/ia/guidance/security_configuration_guides/operating_systems.shtml

US DoD Information Assurance Support Environment Security Technical Implementation Guides for operating systems: http://iase.disa.mil/stigs/os/index.html#

RabbitMQ broker configuration: http://www.rabbitmq.com/configure.html

RabbitMQ administration guide: http://www.rabbitmq.com/admin-guide.html

RabbitMQ broker/client SSL configuration guide: http://www.rabbitmq.com/ssl.html

RabbitMQ configuration file reference: http://www.rabbitmq.com/configure.html#configuration-file

Configuring access control with rabbitmqctl: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html#Access%20control

Rabbitmqctl man page: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html

 

Authored by Michael Haines – Global Cloud Practice

Special thanks to Radoslav Gerganov and Jerry Kuch for their help and support.

VMware vCloud Director Virtual Machine Metric Database

This article is a preview of a section from the Hybrid Cloud Powered Automation and Orchestration document that is part of the VMware vCloud® Architecture Toolkit – Service Providers (vCAT-SP) document set. The document focuses on architectural design considerations for obtaining the VMware vCloud Powered service badge, which guarantees a true hybrid cloud experience for VMware vSphere® customers. The service provider requires validation from VMware that their public cloud fulfills the following hybridity requirements:

  • Cloud is built on vSphere and VMware vCloud Director®
  • vCloud user API is exposed to cloud tenants
  • Cloud supports Open Virtualization Format (OVF) for bidirectional workload movement

This particular section focuses on a new feature of vCloud Director—virtual machine performance and resource consumption metric collection, which requires deployment of an additional scalable database to persist and make available a large amount of data to cloud consumers.

Virtual Machine Metric Database

As of version 5.6, vCloud Director collects virtual machine performance metrics and provides historical data for up to two weeks.

Table 1. Virtual Machine Performance and Resource Consumption Metrics


Retrieval of both current and historical metrics is available through the vCloud API. The current metrics are retrieved directly from the VMware vCenter Server™ database with the Performance Manager API. The historical metrics are collected every 5 minutes (with 20-second granularity) by a StatsFeeder process running on the cells and are pushed to persistent storage: a Cassandra NoSQL database cluster with the KairosDB database schema and API. The following figure depicts the recommended VM metric database design. Multiple Cassandra nodes are deployed in the same network. KairosDB runs on each node and provides an API endpoint that vCloud cells use to store and retrieve data. For high availability and load balancing, all KairosDB instances sit behind a single virtual IP address, which is configured by the cell management tool as the VM metric endpoint.

Figure 1. Virtual Machine Metric Database Design


Design Considerations

  • Currently only KairosDB 0.9.1 and Cassandra 1.2.x/2.0.x are supported.
  • Minimum cluster size is three nodes (the cluster size must be equal to or larger than the replication factor). Scale out rather than up, because Cassandra performance scales linearly with the number of nodes.
  • Estimate I/O requirements based on the expected number of VMs, and correctly size the Cassandra cluster and its storage.

n … expected number of VMs
m … number of metrics per VM (currently 8)
t … retention (days)
r … replication factor

Write I/O per second = n × m × r / 10
Storage = n × m × t × r × 114 kB

For 30,000 VMs, the I/O estimate is 72,000 write IOPS and 3288 GB of storage (worst-case scenario if data retention is 6 weeks and replication factor is 3).
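The two formulas can be evaluated with shell integer arithmetic. This is a minimal sketch with hypothetical function names; it assumes that 1 GB equals 1024 × 1024 kB, which is the conversion that reproduces the 3288 GB figure quoted above.

```shell
#!/bin/sh
# Sizing helpers for the formulas above (hypothetical names).
# n = expected VMs, m = metrics per VM, t = retention in days, r = replication factor

# write_iops N M R: write I/O per second = n * m * r / 10
write_iops() {
    echo $(( $1 * $2 * $3 / 10 ))
}

# storage_gb N M T R: storage = n * m * t * r * 114 kB, rounded to the nearest GB
# (assumes 1 GB = 1024 * 1024 kB)
storage_gb() {
    kb=$(( $1 * $2 * $3 * $4 * 114 ))
    echo $(( (kb + 524288) / 1048576 ))
}

# The worked example from the text: 30,000 VMs, 8 metrics,
# 6 weeks (42 days) of retention, replication factor 3.
write_iops 30000 8 3      # prints 72000
storage_gb 30000 8 42 3   # prints 3288
```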

  • Enable Leveled Compaction Strategy (LCS) on the Cassandra cluster to improve read performance.
  • Install JNA (Java Native Access) version 3.2.7 or later on each node because it can improve Cassandra memory usage (no JVM swapping).
  • For heavy read utilization (many tenants collecting performance statistics) and availability, VMware recommends increasing the replication factor to 3.
  • Recommended size of 1 Cassandra node: 8 vCPUs (more CPU improves write performance), 16 GB RAM (more memory improves read performance), and 2 TB storage (each backed by separate LUNs/disks with high IOPS performance).
  • KairosDB does not enforce a data retention policy, so old metric data must be regularly cleared with a script. The following example deletes one month’s worth of data:

#!/bin/bash
# Delete one calendar month of KairosDB metric data, one day at a time.

if [ "$#" -ne 4 ]; then
    echo "Usage: $0 host port month year"
    exit 1
fi

# Refuse to delete a month that started less than 6 weeks (42 days) ago.
DAYS=$(( ( $(date -ud 'now' +'%s') - $(date -ud "${4}-${3}-01 00:00:00" +'%s') ) / 60 / 60 / 24 ))
if [ "$DAYS" -lt 42 ]; then
    echo "Date to delete is not older than 6 weeks"
    exit 1
fi

# Extract the cpu, mem, disk, net and sys metric names from the JSON reply.
METRICS=( $(curl -s -k "http://$1:$2/api/v1/metricnames" -X GET | sed -e 's/[{}]//g' | awk '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}' | tr -d '[":]' | sed 's/results//g' | grep -w "cpu\|mem\|disk\|net\|sys") )
echo "${METRICS[@]}"

for var in "${METRICS[@]}"
do
  for day in $(seq 1 31)
  do
    # Skip day numbers that do not exist in the given month.
    date -d "$3/$day/$4" > /dev/null 2>&1 || continue
    # Start and end of the day as absolute timestamps in milliseconds.
    STARTDAY=$(( $(date -d "$3/$day/$4" +%s) * 1000 ))
    ENDDAY=$(( $(date -d "$3/$day/$4 +1 day" +%s) * 1000 ))
    echo "Deleting $var for $3/$day/$4"
    echo '
    {
       "metrics": [
       {
         "tags": {},
         "name": "'${var}'"
       }
       ],
       "cache_time": 0,
       "start_absolute": "'${STARTDAY}'",
       "end_absolute": "'${ENDDAY}'"
    }' > /tmp/metricsquery
    curl "http://$1:$2/api/v1/datapoints/delete" -X POST -d @/tmp/metricsquery
  done
done

rm -f /tmp/metricsquery > /dev/null 2>&1

Note: The space gains will not be seen until data compaction occurs and the delete marker column (tombstone) expires. This is 10 days by default, but you can change it through the gc_grace_seconds property, which Cassandra sets per table (column family) rather than in cassandra.yaml.

  • KairosDB v0.9.1 uses the QUORUM consistency level for both reads and writes. Quorum is calculated as (replication factor / 2) + 1, rounded down to a whole number, and for both reads and writes a quorum of replica nodes must be available. Data is assigned to nodes through a hash algorithm and every replica is of equal importance. The following table provides guidance on replication factor and cluster size configurations.

Table 2. Cassandra Configuration Guidance
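To see how the replication factor affects availability under QUORUM, the quorum size can be computed with Cassandra's definition, floor(replication factor / 2) + 1. The sketch below uses hypothetical function names and prints, for a few replication factors, the quorum size and the number of replicas that can be unavailable while reads and writes still succeed. It is an illustration only and does not reproduce the table.

```shell
#!/bin/sh
# quorum RF: replicas that must respond at QUORUM, floor(RF / 2) + 1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures RF: replicas that may be down while QUORUM still succeeds
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for rf in 1 2 3 5; do
    echo "RF=$rf quorum=$(quorum "$rf") tolerated replica failures=$(tolerated_failures "$rf")"
done
```

With a replication factor of 3, the quorum is 2, so one replica of any given row can be offline without interrupting metric reads or writes. A replication factor of 2 tolerates no failures, because the quorum is also 2.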