
System Provisioning in Cloud Computing: From Theory to Tooling (Part I)

By Steve Jin, VMware engineer


This entry was reposted from DoubleCloud, a blog for architects
and developers on virtualization and cloud computing.

Cloud computing is an evolutionary technology because it doesn’t change
the computing stack at all. It simply divides the stack between the
service providers and the users. In that sense, it is less disruptive than
virtualization, which introduced a new hypervisor layer into the computing
stack and, with virtual machines, fundamentally changed how people think about
computing.

But if you look closely at the latest IaaS clouds, they do leverage
virtualization as a way to effectively and efficiently deploy systems. Inside
one virtual machine, the computing stacks remain the same as before: from OS to
middleware to application.

Keep in mind that the application is the end, while the OS and middleware
are the means. Customers care more about applications than about the underlying
infrastructure. As long as the infrastructure can support the applications,
any infrastructure is technically acceptable. The question then shifts to
economics: the most cost-effective infrastructure wins. That’s why Linux has a
larger share in the cloud than in traditional IT shops.

To reach the end, you have to go through the means. In an IaaS cloud, you
have to install the underlying OS and middleware before you can run your
application. In a PaaS cloud, you can skip that step and focus on
application provisioning.

OS Provisioning

Remember, the software stack inside a virtual machine doesn’t change. It
needs the OS, middleware and the application installed and configured before
the application can work. Because the software stack is layered, installation
has to proceed from the bottom up: you install the OS first, then the
middleware, and lastly the applications.
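
The bottom-up ordering can be sketched as a tiny dependency graph. This is only an illustration: the layer names and the `install_order` helper are assumptions made for this example and do not come from any of the tools discussed here. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each layer maps to the set of layers that must already be installed.
# The layer names are illustrative only.
STACK_DEPENDENCIES = {
    "os": set(),
    "middleware": {"os"},
    "application": {"middleware"},
}


def install_order(dependencies):
    """Return the layers in an order that installs every prerequisite first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for layer in install_order(STACK_DEPENDENCIES):
        print(f"install {layer}")
```

For a simple three-layer stack the order is obvious, but the same idea extends to stacks where several middleware components depend on each other.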

Although it happens in a cloud, it’s not a new problem. There are
already some tools available for you to solve the problem in traditional
computing environments:

* Kickstart:
“This tool allows you to automate most of the Red Hat Linux installation
including language selection, network configuration, keyboard selection, boot
loader installation, disk partitioning, mouse selection, X Window System
configuration, etc. A system administrator needs to create a single file
containing the answers to all the questions that would normally be asked during
a typical Red Hat installation.”

* Jumpstart:
“It’s used to automate the installation of Solaris OS.”

* Cobbler:
“Cobbler is a Linux installation server that allows for rapid
setup of network installation environments. It glues together and automates
many associated Linux tasks so you do not have to hop between lots of various
commands and applications when rolling out new systems, and, in some cases,
changing existing ones. With a simple series of commands, network installs can
be configured for PXE, reinstallations, media-based net-installs, and
virtualized installs (supporting Xen, qemu, KVM, and some variants of VMware).
Cobbler uses a helper program called ‘koan’ (which interacts with Cobbler) for
reinstallation and virtualization support.”

* openQRM:
“openQRM is a very comprehensive and flexible Open Source
Infrastructure Management Solution. Its fully pluggable architecture focuses on
automatic, rapid- and appliance-based deployment, monitoring,
high-availability, cloud computing and especially on supporting and conforming
multiple virtualization technologies. openQRM is a single-management console
for the complete IT-Infrastructure and provides a well defined API which can be
used to integrate third-party tools as additional plugins. This provides
companies with a highly scalable system that supports small companies as well
as global businesses who have large server base, multi-os & high-availability
requirements.”

* xCAT: “xCAT is DataCenter Control. It allows you to: Provision Operating
Systems on physical or virtual machines: Centos5.X, SLES[10-11], RHEL5.X,
Fedora[9-11], AIX, Windows Server 2008, Cloning or scripted installation
methods. Remotely Manage Systems: Integrated
Lights-out management, remote console, and distributed shell support. Quickly
set up and control Management node services: DNS, HTTP, DHCP, TFTP. xCAT offers
complete and ideal management for HPC clusters, RenderFarms, Grids, WebFarms,
Online Gaming Infrastructure, Clouds, Datacenters, and whatever tomorrow’s
buzzwords may be. It is agile, extendable, and based on years of system
administration best practices and experience.”
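
To make the “single file containing the answers” idea concrete, here is a minimal sketch in Python that renders a small Kickstart-style answer file from a dictionary. The function name `render_kickstart`, the dictionary keys, and the sample values (including the placeholder password hash) are assumptions made for this illustration. The directives shown are only a small subset, and the accepted set varies between releases, so treat this as a sketch and check the documentation for your distribution.

```python
def render_kickstart(answers):
    """Render a minimal Kickstart-style answer file from a dict of answers."""
    lines = [
        f"lang {answers['lang']}",
        f"keyboard {answers['keyboard']}",
        f"timezone {answers['timezone']}",
        f"network --bootproto={answers['bootproto']}",
        f"rootpw --iscrypted {answers['root_password_hash']}",
        "bootloader --location=mbr",
        "clearpart --all --initlabel",
        "autopart",
        "%packages",
    ]
    # One package or package group per line, then close the section.
    lines.extend(answers["packages"])
    lines.append("%end")
    return "\n".join(lines) + "\n"


# Sample answers; every value here is a placeholder for illustration.
EXAMPLE_ANSWERS = {
    "lang": "en_US.UTF-8",
    "keyboard": "us",
    "timezone": "America/New_York",
    "bootproto": "dhcp",
    "root_password_hash": "$6$placeholder$notarealhash",
    "packages": ["@core"],
}

if __name__ == "__main__":
    print(render_kickstart(EXAMPLE_ANSWERS), end="")
```

Generating the file from data rather than editing it by hand makes it easier to produce one answer file per machine profile.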


After the OS is ready, you can move on to middleware installation. The
available tools let you describe the target configuration of the software, and
then they take care of the rest. It’s much like a policy-driven approach.

* Puppet:
“Puppet’s declarative language describes your system configuration, allowing you
to easily reproduce any configuration on any number of additional systems.
Additionally, Puppet can help establish and enforce approved system
configurations automatically correcting systems that drift from their baseline.
Puppet provides an audit trail of all your systems, which can easily be kept in
version control for compliance purposes. Organizations are increasingly taking
advantage of Puppet’s support of a variety of operating systems. Whether you
are supporting Linux (Red Hat, CentOS, Fedora, Debian, Ubuntu, SuSE), or Unix
OS’es (Solaris, BSD, OS X), Puppet can fulfill your requirements. Although
Puppet evolved primarily to support Unix-like OS’es, Windows support is planned
in the near future.”

* Chef:
“Chef is a systems integration framework, built to bring the
benefits of configuration management to your entire infrastructure. With Chef,
you can: Manage your servers by writing code, not by running commands.
Integrate tightly with your applications, databases, LDAP directories, and
more. Easily configure applications that require knowledge about your entire
infrastructure (“What systems are running my application?” “What
is the current primary database server?”)”

* SmartFrog:
“SmartFrog is a
powerful and flexible Java-based software framework for configuring, deploying
and managing distributed software systems. SmartFrog helps you to encapsulate
and manage systems so they are easy to configure and reconfigure, and so that
they can be automatically installed, started and shut down. It provides
orchestration capabilities so that subsystems can be started (and stopped) in
the right order. It also helps you to detect and recover from failures.”

* CFEngine:
“Cfengine ensures that you have the proper packages installed, that
configuration files are correct and consistent, that file protections are
correct, and that processes are running (or not) in accordance with policy.
Cfengine closes security holes, hardens your systems, and makes sure that
critical daemons stay running. It monitors performance and reacts to what it
monitors. You tell Cfengine what promises you want it to keep, and the agent
does the work. Cfengine runs natively on all common platforms, including Linux, Unix,
Macintosh and Windows. It also has support for virtualization platforms.
Cfengine is supported by a community of expert and novice users, and a
commercial enterprise of qualified Mission Specialists. Cfengine can play a
major role in solving almost any system administration issue, with hands-free
automation (see our Solutions guide and standard library resources Policy
Starter), and we are constantly working to make automation simpler, without …”

* Bcfg2:
“Bcfg2 helps system administrators produce a consistent, reproducible, and
verifiable description of their environment, and offers visualization and
reporting tools to aid in day-to-day administrative tasks. It is the fifth
generation of configuration management tools developed in the Mathematics and
Computer Science Division of Argonne National Laboratory. It is based on an
operational model in which the specification can be used to validate and
optionally change the state of clients, but in a feature unique to bcfg2 the
client’s response to the specification can also be used to assess the
completeness of the specification. Using this feature, bcfg2 provides an
objective measure of how good a job an administrator has done in specifying the
configuration of client systems. Bcfg2 is therefore built to help
administrators construct an accurate, comprehensive specification.”
Application Provisioning

With the right system configuration in place, it’s time to install the
applications. So why not use the same tools we used for the OS and middleware? Do
we need yet another set of tools?

It depends. You can use the middleware tools to install some applications;
after all, middleware looks like just another application to the OS. What
matters is whether your application is stable and whether you need to
customize it per node. Tools like Puppet work well for stable applications
that are deployed the same way across all nodes. If your application is
still a work in progress and you need flexibility to tweak it, you need more
specialized application provisioning tools.

The big technical difference between application and middleware provisioning
tools is that application tools push the application to the nodes and remotely
change anything as needed. The process is procedural.

Middleware provisioning tools normally run agents on the nodes that
pull the software based on the prescribed configuration files. The process is
declarative.
Beyond the “push” and “pull” difference, application provisioning
tools can also manage the lifecycles of applications (sometimes called
services) distributed across different nodes with a single command or line of
code. Because they are built on a remote command dispatching framework,
application provisioning tools can do almost anything; the only real limit is
your imagination. So if you develop applications yourself, you most likely need
application provisioning tools.
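
The push model can be sketched as a small command dispatcher in Python. This is a hedged illustration rather than the implementation of any tool named in this post. The function names, the host names, and the `sudo service ... restart` command are assumptions, and `ssh_runner` presumes that the system `ssh` client is installed and that key-based login to each node already works.

```python
import subprocess


def ssh_runner(node, command):
    """Run a command on a node through the system ssh client.
    Returns (exit_code, standard_output)."""
    completed = subprocess.run(
        ["ssh", node, command], capture_output=True, text=True
    )
    return completed.returncode, completed.stdout


def dispatch(nodes, command, runner=ssh_runner):
    """Push one command to every node in turn and collect the results."""
    return {node: runner(node, command) for node in nodes}


def restart_service(nodes, service, runner=ssh_runner):
    """Restart a service on all nodes; return the nodes where it failed."""
    results = dispatch(nodes, f"sudo service {service} restart", runner)
    return [node for node, (code, _output) in results.items() if code != 0]


if __name__ == "__main__":
    # Hypothetical host names; replace them with nodes you can reach over SSH.
    failed = restart_service(["web1.example.com", "web2.example.com"], "myapp")
    print("failed nodes:", failed)
```

Passing the runner as a parameter keeps the dispatch logic separate from the transport, so the same lifecycle command can be exercised with a fake runner or pointed at a different remote execution mechanism.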

Let’s see what tools are available:

* Capistrano: “Capistrano is an open source tool for running scripts on multiple
servers; its main use is deploying web applications. It automates the process
of making a new version of an application available on one or more web servers,
including supporting tasks such as changing databases. Capistrano is written in
the Ruby language and is distributed using the RubyGems distribution channel.
It is an outgrowth of the Ruby on Rails web application framework, but has also
been used to deploy web applications written using other frameworks, including
ones written in PHP. Capistrano is implemented primarily for use on the bash
command line. Users of the Ruby on Rails framework may choose from many Capistrano
recipes; e.g. to deploy current changes to the web application or roll back to
the previous deployment state.”

* ControlTier: “ControlTier is a community driven, cross-platform
software system used to coordinate application service management activities
across multiple nodes and application tiers…The Command Dispatcher is a core
function of the ControlTier software that provides the mechanism to send commands
over the network seamlessly to the correct Nodes. This facility is used
whenever you run a command or script, via the command-line (ctl or ctl-exec) or
via Jobcenter.”

* Fabric: “Fabric is a Python
library and command-line tool for streamlining the use of SSH for application
deployment or systems administration tasks. It provides a basic suite of
operations for executing local or remote shell commands (normally or via sudo)
and uploading/downloading files, as well as auxiliary functionality such as
prompting the running user for input, or aborting execution. Typical use
involves creating a Python module containing one or more functions, then
executing them via the fab command-line tool.”

* Func: “Func
allows for running commands on remote systems in a secure way, like SSH, but
offers several improvements. Func allows you to manage an arbitrary group of
machines all at once. Func automatically distributes certificates to all
“secondary” machines. There’s almost nothing to configure. Func comes
with a command line for sending remote commands and gathering data. There are
lots of modules already provided for common tasks. Anyone can write their own
modules using the simple Python module API. Everything that can be done with
the command line can be done with the Python client API. The hack potential is
unlimited. You’ll never have to use “expect” or other ugly hacks to
automate your workflow. It’s really simple under the covers. Func works over
XMLRPC and SSL. Since func uses certmaster, any program can use func
certificates, latch on to them, and take advantage of secure
primary-to-secondary communication. There are no databases or crazy stuff to
install and configure. Again, certificate distribution is automatic.”

VMware engineer Steve Jin is the author of VMware VI & vSphere SDK
(Prentice Hall), founder of the open source VI Java API, and the chief blogger
at DoubleCloud.


