
Category Archives: ESXi

Virtualizing Big Data at VMware IT – Starting Out at Small Scale

The Hadoop-based system running on vSphere that is described here was architected by Rajit Saha (who provided the material for this blog) and a team from VMware’s IT department.

This article describes the technical infrastructure for a VMware internal IT project that was built and deployed in 2015 for analyzing VMware’s own business data. Details of the business applications used in the system are not within the scope of this article. The virtualized Hadoop environment and modern analytics project was implemented entirely on the vSphere 6 platform.

The key lesson that we learned from this implementation is that you can start at a small scale with virtualizing big data/Hadoop and then scale the system up over time. You don’t need to wait for a large amount of hardware to become available to get started.

Continue reading

vCPU to pCPU Ratios – Are they still relevant?

One question I’m commonly asked (aka weekly if not daily) is what are the perfect pCPU to vCPU ratios that I should plan for, and operate to, for maximum performance.  I wanted to document my perspective for easy future reference.

The answer?

There is no common ratio and in fact, this line of thinking will cause you operational pain. Let me tell you why.

Continue reading

Virtual Volumes for Database Backup and Recovery

In the first part of this series we provided a high level view of the benefits of using Virtual Volumes enabled storage for database operations. In this post, we will examine in more detail how Virtual Volumes can improve the backup and recovery capabilities for business critical databases, specifically Oracle.

The backups for Oracle can be Database consistent or Crash consistent. In this part we will look at Database consistent backup and recovery.

The Setup:

The solution requires VVol enabled storage. We leveraged SANBlaze VirtuaLun as the backend storage for the backup and recovery exercise, using the VirtuaLun 7.3 emulator from SANBlaze. This emulator is VVol enabled and is one of the first VVol certified storage solutions available. Continue reading

Updating to VMware Tools 10 – Must Read

In September we announced that VMware Tools 10.0.0 Released and that VMware is now shipping VMware tools outside of the vSphere releases. Since then, we have received a lot of feedback from the community, customers, and internal folks alike. I would like to let everyone know that we have listened and we continue on our path to make VMware Tools lifecycle (and ESXi lifecycle for that matter) easier and less painful than how it may appear today.


Continue reading

Architecting Virtual SAP HANA Using VMware Virtual Volumes And Hitachi Storage

VMWorld Recap: SAP HANA and VMware Virtual Volumes

This is a follow up to my earlier VMWorld blog, “Virtualizing SAP HANA Databases Greater than 1TB on vSphere 5.5”, where I discussed SAP Multi-Temperature Data Management strategies and techniques which can significantly reduce the size and cost associated with SAP HANA’s in-memory footprint. This blog will focus on Software-Defined Storage and the need for VMware Virtual Volumes when deploying Mission Critical Applications/Databases like SAP HANA, as discussed in my VMWorld session.

Multi-Temperature Data Management Is By Definition Software-Defined Storage

For SAP and VMware customers who plan on leveraging multi-temperature strategies, classifying data as hot, warm, or cold according to how frequently it is accessed is the essence of Software-Defined Storage. This can also be equated to EMC’s Information Lifecycle Management, which examines the value of data to the business over time. To bring the concept of the Software-Defined Data Center, and more precisely Software-Defined Storage, to reality, see Table 1. This table depicts the various storage options for SAP HANA so customers can create an architecture that aligns with the demands of the business and its applications.

Table 1: Multi-Temperature Storage Options with SAP HANA


Planning Your Journey To Software-Defined Storage

As we get into the various storage options for SAP HANA, VMware has made it very easy to create and deploy software defined storage in the form of Virtual Volumes. However, I want to stress that actually defining how the storage should be abstracted is a collaborative task. At a minimum you must involve the storage team, VI admins, application owners, and DBAs in order to create an optimized virtual architecture; this should not be a siloed task.

In my previous post I discussed the storage requirements for SAP HANA In-Memory, Dynamic Tiering, Near-Line Storage, and the Archiving Components; one last option I did not cover in Table 1 is Data Aging, which is specific to SAP Business Suite. Under normal operations SAP HANA does not preload data into memory. Data is loaded upon first access, so the first time you access data it’s always read from disk.

With Data Aging you can essentially mark data so it’s never loaded into memory and will always reside on disk. This is not available on all modules for Business Suite, so please check with SAP for availability and roadmap with respect to Data Aging.

Essentially this is another SAP HANA feature which enables customers to reduce and manage their memory footprint more efficiently and effectively. The use of Data Aging can change the design requirements of your Software-Defined Storage. If Data Aging becomes more prevalent in your SAP landscape, VMware Virtual Volumes can be used to address the changing storage requirements of the application by seamlessly migrating data between different classes of software-defined storage or VMDKs.

VMware Virtual Volumes Transform Storage By Aligning With SAP HANA’s Requirements

Now let’s get into Virtual Volumes and the problems they solve. With Virtual Volumes the fundamental model is centered around provisioning storage based on the application needs rather than the underlying infrastructure. When deploying SAP HANA using the Tailored Data Center Integration model, the storage KPIs can be quite complex, so how do customers translate latency and throughput for reads, writes, and updates at various block sizes to the storage layer?

Plus, how does a customer address the storage requirements for SAP HANA’s entire data life cycle, whether planning on using Dynamic Tiering with or without Near-Line Storage, and what are the storage requirements of the archiving strategy? Also, some of the storage requirements do tie back to the compute layer. As an example, if you plan on using Row Level Versioning with Dynamic Tiering, there is a compute to memory relationship for storage that comes into play when sizing.

Addressing and achieving these design goals using an infrastructure centric model can be quite difficult because you are tied to physical LUNs and trust me, with mission critical databases, you will always have database administrators fighting over LUNs with the lowest numbers because of the concerns around radial density. This leads to tremendous waste when provisioning storage using an infrastructure centric model.

VMware Virtual Volumes significantly reduces the storage design complexity by using an application centric model, because you are not dealing with storage at the LUN level. Instead, vSphere admins use policies to express the application requirements to the storage array, and the storage array then maps storage containers to the application requirements.

What are VMware Virtual Volumes?

At a high level I’ll go over the architecture and components of Virtual Volumes. This blog is not intended to be a deep dive into Virtual Volumes; instead my goal is to convey that mission critical use cases for VVOLs and software-defined storage are real. For an excellent white paper on Virtual Volumes, see “VMware vSphere Virtual Volumes Getting Started Guide”.

As shown in Figure 1, Virtual Volumes are a new type of virtual machine object which is created and stored natively on the storage array. The Vendor Provider, also known as the VASA Provider, implements the vSphere Storage APIs for Storage Awareness (VASA); it provides the storage awareness services and mediates out-of-band communication between vCenter Server and ESXi hosts on one side and the storage system on the other side.

The storage containers are pools of raw storage that a storage system can provide to virtual volumes and, unlike LUNs and NFS, they do not require pre-configured volumes on the storage side. Also, with virtual volumes you still have the functionality you would expect when using native VMDKs.

A virtual datastore represents a storage container in a vCenter Server instance, so it’s a 1:1 mapping to the storage system’s storage container. The ESXi hosts have no direct access to the virtual volumes on the storage side, so they use a logical I/O proxy called a protocol endpoint. As you would expect, VVOLs are compatible with industry standard protocols: iSCSI, NFS, FC, and FCoE.

The Published Storage Capabilities will vary by storage vendor depending on which capabilities have been exposed and implemented. In this blog we will be looking at the capabilities exposed by Hitachi Data Systems, such as latency, throughput, RAID level, drive type/speed, IOPS, and snapshot frequency, to mention a few.

Figure 1: vSphere Virtual Volumes Architecture and Components


VMware HDS: Creating Storage Containers, Virtual Volumes, and Profiles for Virtual SAP HANA

Virtual Volumes are an industry-wide initiative, and essentially a who’s who of the storage industry is participating in it. This next section, however, is representative of the work done with Hitachi Data Systems.

Again, the guidance here is collaboration when architecting software-defined storage for SAP HANA landscapes, and for that matter any mission critical application or database. The beauty of software defined storage is that once it is created and architected correctly, you can provision your virtual machines in an automated and consistent manner.

So in the spirit of collaboration, I got together with Hitachi’s SAP alliance team, their storage team, and database architects and we came up with these profiles, policies, and containers to use when deploying SAP HANA landscapes.

We had several goals when designing this architecture. The first was to use virtual volumes to address the entire data life cycle of SAP HANA: the in-memory component, Dynamic Tiering, Near-Line Storage, and archiving, or any supported combination of the above when creating a SAP HANA landscape. Secondly, we wanted to enable rapid provisioning of SAP HANA landscapes, so we created profiles, policies, and containers which could be used to deploy SAP HANA databases whose in-memory component could range from 512GB to 1TB in size.

I’ll review some of the capabilities HDS exposed which were used for this architecture:

  • Interestingly enough, we were able to meet the SAP HANA in-memory KPIs using Hitachi Tier 2 storage, which consisted of 10K SAS drives, for both log and data files as well as for the operating system and the SAP HANA shared file system. This also simplified the design. We then used high density SAS drives for the backup areas.
  • We enabled automatic storage-managed snapshots for HANA data, log, and the OS, and set the snapshot frequency based on the classifications of Critical, Important, or Best Effort.
  • Snapshots for the data and log were classified as Critical and the OS as Important; we didn’t snapshot the backup area at all.
  • We also tagged this storage as certified, capturing the model and serial number, since the SAP HANA in-memory component requires certified storage. We wanted to make sure that when creating HANA VMs you’re always pulling from certified storage containers.
  • The Dynamic Tiering and NLS storage had similar requirements, so both could be provisioned from the same containers. Since these are disk based columnar databases, we selected Tier 1 SSD storage for the data files based on the random read/write patterns.
  • We stuck with SAS drives for the log files, since sequential workloads don’t benefit much from SSDs. Again, because of the disk based access, we selected Tier 2 to satisfy the IOPS and latency requirements.
  • Finally, for the archiving containers we used the lowest cost and highest density storage, pretty much just a file system.
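
The choices in the list above can be summarized as a small lookup table. This is only an illustrative sketch: the `Profile` type, the keys, and the `profile_for` helper are made-up names, not Hitachi Command Suite or vSphere identifiers, and a field is left as `None` wherever the text does not state a value.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    media: str
    tier: Optional[str] = None      # None where the text does not name a tier
    snapshot: Optional[str] = None  # None where the text does not state a class


# (HANA component, volume type) -> storage choice described in the list above.
PROFILES = {
    ("in-memory", "data"):   Profile("10K SAS", "Tier 2", "Critical"),
    ("in-memory", "log"):    Profile("10K SAS", "Tier 2", "Critical"),
    ("in-memory", "os"):     Profile("10K SAS", "Tier 2", "Important"),
    ("in-memory", "shared"): Profile("10K SAS", "Tier 2"),
    ("in-memory", "backup"): Profile("high density SAS", snapshot="not snapshotted"),
    ("dt-nls", "data"):      Profile("SSD", "Tier 1"),
    ("dt-nls", "log"):       Profile("SAS", "Tier 2"),
    ("archive", "files"):    Profile("lowest cost, highest density"),
}


def profile_for(component: str, volume: str) -> Profile:
    """Look up the storage choice for one volume of one HANA component."""
    return PROFILES[(component, volume)]


print(profile_for("dt-nls", "data"))
print(profile_for("in-memory", "os"))
```

Writing the mapping down in one place like this is the kind of artifact the storage team, VI admins, application owners, and DBAs can review together before any policy is created on the array.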

There’s just too much information to cover in this effort with HDS, but for those of you interested, VMware and Hitachi will be publishing a Co-Logo White Paper which will be a much deeper dive into how we architected these landscapes so customers can do this almost out of the box.

Deploying VMware Software-Defined Storage With vSphere and Hitachi Command Suite

Example: SAP HANA Dynamic Tiering and Near-Line Storage Tiers. These next couple of screen captures will show how simple virtual volumes are to deploy once architected correctly

Figure 2: Storage Container Creation: SAP HANA DT and NLS Tier


Figure 3: Create Virtual Machine Storage Policies SAP HANA DT/NLS Data/Log File


Figure 4: Create New SAP HANA DT VM Using VVOLS Policies With Hitachi Storage


Addressing Mission Critical Use Cases with VMware Software-Defined Storage

SAP HANA and Multi-Temperature Data Management is the poster child for mission critical software-defined storage use cases. VMware Virtual Volumes solves the complexities and simplifies storage provisioning by using an application centric model rather than an infrastructure centric model.

The SAP HANA in-memory component is not yet certified for production use on vSphere 6.0; however, Virtual Volumes can be used for SAP HANA Dynamic Tiering, Near-Line Storage, and Archiving. So my advice to our customers is to start architecting now. Get together with your storage admins, VI admins, application owners, and database administrators to create containers, policies, and profiles correctly, so when vSphere 6.0 is certified you are ready to “Run SAP HANA Simple”.



SIOC: I/O Distribution with Reservations & Limits – Part 2

Part 1 of this series explains the new reservation capabilities of the ESXi storage scheduler in vSphere 6.0 called mClock.  That article explains how to calculate the number of entitled IOPS during times of contention.  This article will expand on that topic with a couple new scenarios.  The previous article assumed that all the VMs were evenly consuming the storage resources at the same time.  In the real-world though, some VMs will be consuming resources while others will be idle.  This should help explain how the IOPS are distributed when there are idle VMs in the environment.

Scenario 3
In this scenario the third VM is idle, while the other 3 VMs are consuming storage IOPS.  For the sake of this example, it will be assumed that VM3 will be consuming only 10 IOPS.

8000 IOPS

Unlike memory reservations, the storage scheduler will allow the unused resources to be consumed by other VMs.

The first step is to determine what percentage of the resources each host will receive. In this example there are a total of 5000 shares across all hosts. To find the percentage each host will receive, divide the shares assigned to that host by the total. In this example, Host 1 has 3500/5000 (70%) of the shares, and Host 2 has 1500/5000 (30%) of the shares. This results in the following entitled IOPS for each host.

Host1: 70% * 8000 IOPS = 5600 IOPS
Host2: 30% * 8000 IOPS = 2400 IOPS

Once the I/O distribution for the hosts is calculated, the VMs will have their entitled resources calculated using the share distribution within the host.

VM1: (1000/3500) * 5600 = 1600 IOPS
VM2: (2500/3500) * 5600 = 4000 IOPS
VM3: (500/1500) * 2400 = 800 IOPS (Only using 10 IOPS)
VM4: (1000/1500) * 2400 = 1600 IOPS
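
The two-step share calculation above can be sketched in Python. This only restates the arithmetic in this article and is not the mClock scheduler itself; the function name `entitled_iops` and the dictionary layout are illustrative assumptions, while the share values and the 8000 IOPS total come from the example.

```python
def entitled_iops(total_iops, host_vm_shares):
    """Split total_iops across hosts by share, then across each host's VMs by share."""
    all_shares = sum(sum(vms.values()) for vms in host_vm_shares.values())
    result = {}
    for host, vms in host_vm_shares.items():
        host_shares = sum(vms.values())
        # Host slice, e.g. 3500/5000 of 8000 IOPS for Host1.
        host_iops = total_iops * host_shares / all_shares
        for vm, shares in vms.items():
            # VM slice of its host's IOPS, e.g. 1000/3500 of 5600 for VM1.
            result[vm] = host_iops * shares / host_shares
    return result


shares = {
    "Host1": {"VM1": 1000, "VM2": 2500},
    "Host2": {"VM3": 500, "VM4": 1000},
}
entitlements = entitled_iops(8000, shares)
for vm, iops in entitlements.items():
    print(f"{vm}: {iops:.0f} IOPS")
```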

Since VM3 is only using 10 IOPS, the 790 unused IOPS would be distributed to the remaining VMs on the host.  In this case, VM4 would be entitled to 2390 IOPS.  However, VM4 has a limit of 2000 IOPS, which means that there will be 390 IOPS that can still be distributed.  Those 390 IOPS will then be distributed across the VMs on Host1.

In the end, this is how the IOPS allocation would be distributed:

VM1: 1600 + ((1000/3500) * 390) = 1711 IOPS
VM2: 4000 + ((2500/3500) * 390) = 4279 IOPS
VM3: 10 IOPS
VM4: 2000 IOPS (Due to limit)

Scenario 4
Now let’s take the same environment, but calculate the effective IOPS if VM1 was the idle VM. Again, for the sake of this example, the idle VM will be consuming 10 IOPS.

8000 IOPS

The first thing to do is calculate the percentage of the resources each host will receive. In this example there are a total of 5000 shares across all hosts. Since the environment has not changed, the entitled IOPS per host is unchanged from the previous example.

Host1: 70% * 8000 IOPS = 5600 IOPS
Host2: 30% * 8000 IOPS = 2400 IOPS

Once the I/O distribution for the hosts is calculated, the VMs will have their entitled resources calculated using the share distribution within the host.

VM1: (1000/3500) * 5600 = 1600 IOPS (Only using 10 IOPS)
VM2: (2500/3500) * 5600 = 4000 IOPS
VM3: (500/1500) * 2400 = 800 IOPS
VM4: (1000/1500) * 2400 = 1600 IOPS

Since VM1 is only using 10 IOPS, the 1590 unused IOPS would be distributed to the remaining VMs on the host.  In this case, VM2 would be entitled to 5590 IOPS.  However, VM2 has a limit of 5000 IOPS, which means that there will be 590 IOPS that can still be distributed.  Those 590 IOPS will then be distributed across the VMs on Host2.

In the end, this is how the IOPS allocation would be distributed:

VM1: 10 IOPS
VM2: 5000 IOPS (Due to limit)
VM3: 800 + ((500/1500) * 590) = 997 IOPS
VM4: 1600 + ((1000/1500) * 590) = 1993 IOPS
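
Scenarios 3 and 4 follow the same redistribution pattern, which can be sketched as below. The sketch only mirrors the arithmetic in this article (spare IOPS go first to the other VMs on the idle VM's host, and anything cut off by a limit spills over to the VMs on the other host); it is not the mClock implementation. The helper names are invented for illustration, and VM1 and VM3 are assumed to have no limit because the article does not state one.

```python
SHARES = {"VM1": 1000, "VM2": 2500, "VM3": 500, "VM4": 1000}
HOSTS = {"Host1": ["VM1", "VM2"], "Host2": ["VM3", "VM4"]}
ENTITLED = {"VM1": 1600.0, "VM2": 4000.0, "VM3": 800.0, "VM4": 1600.0}
LIMITS = {"VM2": 5000.0, "VM4": 2000.0}  # limits given in the text; others assumed unlimited


def hand_out(pool, vms, alloc):
    """Give `pool` IOPS to `vms` by share, capped at each VM's limit; return what is left."""
    total_shares = sum(SHARES[vm] for vm in vms)
    leftover = 0.0
    for vm in vms:
        wanted = alloc[vm] + pool * SHARES[vm] / total_shares
        alloc[vm] = min(wanted, LIMITS.get(vm, float("inf")))
        leftover += wanted - alloc[vm]
    return leftover


def allocate(idle_vm, idle_usage=10.0):
    """Return the final per-VM IOPS when `idle_vm` only consumes `idle_usage` IOPS."""
    alloc = dict(ENTITLED)
    spare = alloc[idle_vm] - idle_usage
    alloc[idle_vm] = idle_usage
    idle_host = next(h for h, vms in HOSTS.items() if idle_vm in vms)
    same_host = [vm for vm in HOSTS[idle_host] if vm != idle_vm]
    other_hosts = [vm for h, vms in HOSTS.items() if h != idle_host for vm in vms]
    spare = hand_out(spare, same_host, alloc)  # first the idle VM's own host
    hand_out(spare, other_hosts, alloc)        # then whatever a limit cut off
    return {vm: round(iops) for vm, iops in alloc.items()}


scenario3 = allocate("VM3")
scenario4 = allocate("VM1")
print(scenario3)
print(scenario4)
```

In both runs the rounded allocations add up to the full 8000 IOPS, which is the point of the exercise: nothing the idle VM leaves on the table goes unused.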

Hopefully this helps explain how entitled IOPS are calculated and distributed using the mClock storage scheduler in vSphere 6.0.  The important thing to take away is that unused IOPS are not held and wasted; they are distributed across the environment automatically, providing the most efficient use of your resources.

VMware Tools Lifecycle: Why Tools Can Drive You Crazy (and How to Avoid it!)

There has been a lot of buzz around vSphere Lifecycle since VMworld. My last few blog posts on VMware Tools have had a tremendous amount of traffic, so I decided to continue with the theme and give you all what it appears you want more of. So in this post, LET’S TALK TOOLS!

Continue reading

Big Data on vSphere with HBase

This article describes a set of performance tests that were conducted on HBase, a popular data management tool that is frequently used with Hadoop, running on VMware vSphere 6 and provisioned by the vSphere Big Data Extensions tool. The work described here was done by Xinhui Li, who is a staff engineer in the Big Data team in VMware’s R&D Labs in Beijing. Xinhui’s biography and background details are given at the end of the article.

What is HBase?

HBase is an Apache project that is designed to handle very large amounts of data on the Hadoop platform. HBase is often described as providing the functionality of a NoSQL database running on top of Hadoop. It combines the scalability of Hadoop, through its use of the Hadoop Distributed File System (HDFS) to store the data, with real-time data access to the data. HBase can handle billions of rows of data and very large numbers of columns. Along with Hadoop, HBase runs on clusters of commodity hardware that form a distributed system. The HBase architecture is made up of RegionServers that run on the worker nodes while the HBase Master Server controls them.

Continue reading

Virtualizing SAP HANA Databases Greater than 1TB on vSphere 5.5

VMWorld 2015 Session Recap

I’m almost fully recovered from VMWorld, which was probably one of the busiest and most enjoyable VMWorlds I’ve had in my 6 plus years at VMware because of the interaction with attendees, customers, and partners.  I’ll be doing a series of Post-VMWorld Blogs focused on my SAP HANA Software-Defined Data Centers sessions, but my first blog will cover the misconceptions associated with sizing SAP HANA databases on vSphere. There are many good reasons to upgrade to vSphere 6.0; going beyond the 1TB monster virtual machine limit in vSphere 5.5 when deploying SAP HANA databases is not necessarily one of them.

SAP HANA is no longer just an in-memory database, it is now a data management platform.  It is NOT confined by the size of available memory since the SAP HANA warm data can be stored on disk in a columnar format and accessed transparently by applications.

What this means is the 1TB monster virtual machine maximum in vSphere 5.5 is an artificial barrier. SAP HANA multi-terabyte size databases can be easily virtualized with vSphere 5.5 using Dynamic Tiering, Near-Line Storage, and other memory management techniques SAP has introduced to the SAP HANA Platform to optimize and reduce HANA’s in-memory footprint.

SAP HANA Dynamic Tiering (DT)

SAP HANA Dynamic Tiering was introduced last year in Support Pack Stack (SPS) 09 for use with BW. Dynamic Tiering allows customers to seamlessly manage their SAP HANA disk based “Warm Data” on an Extended Storage Host, essentially placing data which does not need to be in-memory on disk. The guidance SAP gives when using the SAP HANA Dynamic Tiering option for SPS 09 is up to 20% of in-memory data can reside on the Extended Storage (ES) Host, for SPS 10 up to 40% can reside on the ES Host, and in the future up to 70% of the SAP HANA data can reside on the ES Host. So in the future the majority of SAP HANA data which was once in-memory can reside on-disk.
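
To see what these ceilings could mean for sizing, here is a rough back-of-the-envelope sketch. It assumes each percentage applies to the total data volume, uses a hypothetical 1.5 TB data set, and ignores working memory and any other sizing overhead; the names `MAX_WARM_FRACTION` and `split_hot_warm` are illustrative only and this is not SAP sizing guidance.

```python
# Warm-data ceilings quoted in the text, keyed by release.
MAX_WARM_FRACTION = {"SPS 09": 0.20, "SPS 10": 0.40, "future": 0.70}


def split_hot_warm(total_tb, release):
    """Return (in-memory TB, Extended Storage TB) if the full warm allowance is used."""
    warm = total_tb * MAX_WARM_FRACTION[release]
    return total_tb - warm, warm


for release in MAX_WARM_FRACTION:
    hot, warm = split_hot_warm(1.5, release)
    print(f"{release}: {hot:.2f} TB in memory, {warm:.2f} TB on the ES host")
```

Under these assumptions, the in-memory portion of the hypothetical 1.5 TB data set drops below 1 TB once the 40% allowance is used in full.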

Near-Line Storage (NLS)

In addition to the reduction of the SAP HANA in-memory footprint DT affords customers, Near-Line Storage should be considered as well. With NLS, data is moved outside of the SAP HANA database proper to disk and classified as “Cold” because it is infrequently accessed; it can only be accessed read only. SAP provides examples showing NLS can reduce the HANA database in-memory requirements by several Terabytes (link below).

It is also important to note that both the DT Extended Storage Host and NLS solutions do not require certified servers or storage, so not only has SAP given customers the ability to run SAP HANA in a reduced memory footprint, customers can run on standard x86 hardware as well.

There is a white paper authored by Priti Mishra, Staff Engineer, Performance Engineering VMware, which is an excellent read for anyone considering DT or NLS options. “Distributed Query Processing in SAP IQ on VMware vSphere and Virtual SAN”

Importance of the VMware Software Defined Data Center

To their credit SAP has taken a leadership role with HANA’s in-memory columnar database computing capabilities and as HANA has evolved the sizing and hardware requirements have evolved as well. Rapid change and evolving requirements are givens in technology; the VMware Software Defined Data Center provides a flexible and agile architecture to effectively react to change by recasting compute, network, and storage resources, in a centrally managed manner.

As a concrete example of the flexibility VMware’s Platform provides, Figure 1 illustrates the evolution of SAP HANA from SPS 07 to SPS 09. Customers who initially deployed SAP HANA on SPS 07 (all in-memory) but would like to take advantage of SAP HANA’s multi-temperature data management techniques can use virtualization to reclaim and recast memory, storage, and network resources in their virtual HANA landscape to reflect the latest architectural advances and memory management techniques in SPS 10.

Figure 1. SAP HANA Platform: Evolving Hardware Requirements


Since SAP HANA can now run in a reduced memory footprint, customers who licensed HANA to be all in-memory can use virtualization to reclaim memory and deploy additional virtual databases and make HANA pervasive in their landscapes.

As a general rule, in any rapidly changing environment the VMware Software-Defined Data Center provides an agile platform which can accommodate change and also protect against capital hardware investments that may not be necessary in the future (certified vs. standard x86 hardware). For that matter, the cloud is a good option for deploying any rapidly changing application/database, in places like VMware vCloud Air, Virtustream, or Secure-24, just to mention a few.

Virtual SAP HANA Back on track

After speaking with session attendees, customers, and partners, at VMworld about SAP HANA’s Multi-temperature management capabilities, I was happy to hear they will not be delaying their virtual HANA deployments due to the vSphere 6.0 roadmap certification timeline. As I said earlier, the 1TB monster virtual machine maximum in vSphere 5.5 is an artificial barrier. It really is a worthwhile exercise to take a closer look at the temperature of your data, age of your data, and your access requirements in order to take full advantage of all the tools and features SAP provides their customers.

I was also encouraged to hear from many session attendees that my presentation at VMWorld brought the SDDC from concept closer to reality by demonstrating actual mission critical database/application use cases. My future post VMWorld blogs will focus on how I deconstructed the SAP HANA Networks Requirements document and transformed that into a virtual network design using VMware NSX from my desktop. I’ll also cover Software Defined Storage, essentially translating SAP’s Multi-Temperature Storage Options into VMware Virtual Volumes and Storage Containers.

“SAP HANA SPS10- SAP HANA Dynamic Tiering”; (SAP Product Management)


“Distributed Query Processing in SAP IQ on VMware vSphere and Virtual SAN”; Priti Mishra, Performance Engineering VMware


Blog: Bob Goldsand; “SAP HANA Dynamic Tiering and the VMware Software Defined Data Center”





Open-VM-Tools (OVT): The Future of VMware Tools for Linux


For those of you who attended my VMworld sessions with Salil Suri, we dropped a hint that there are things happening with Open-VM-Tools (OVT). We at VMware know that vSphere lifecycle is a difficult task to take on and that updating VMware Tools across hundreds or thousands of virtual machines is an ever-increasing burden. There have been some initiatives inside of VMware to help mitigate the amount of work needed to orchestrate this task, which I think you will all find very interesting and exciting. Continue reading