Home > Blogs > VMware VROOM! Blog > Tag Archives: Container

Tag Archives: Container

Running Transactional Workloads Using Docker Containers on vSphere 6.0

In a series of blogs, we showed that not only can Docker containers seamlessly run inside vSphere 6.0 VMs, but both micro-benchmarks and popular workloads in such configurations perform as well as, and in some cases better than, in native Docker configurations.

See the following blog posts for our past findings:

In this blog, we study a transactional database workload and present the results on how Docker containers perform in a VM when we scale out the number of database instances. To do this experiment, we use DVD Store 2.1, which is an OLTP benchmark that supports and stresses many different back-end databases including Microsoft SQL Server, Oracle Database, MySQL, and PostgreSQL. This benchmark is open source and the latest version 2.1 is available here. It is a 3-tier application with a Web server, an application server, and a backend database server. The benchmark simulates a DVD store, where customers log in, browse, and order DVD products. The tool is designed to utilize a number of advanced database features including transactions, stored procedures, triggers, and referential integrity. The main transactions are (1) add new customers, (2) log in customers, (3) browse DVDs, (4) enter purchase orders, and (5) re-order stock. The client driver is written in C# and is usually run from Windows; however, the client can be run on Linux using the Mono framework. The primary performance metric of the benchmark is orders per minute (OPM).

In our experiments, we used a PostgreSQL database with an Apache Web server, and the application logic was implemented in PHP. In all tests we ran 16 instances of DVD Store, where each instance comprises all 3-tiers. We found that, due to better scheduling, running Docker in a VM in a scale-out scenario can provide better throughput than running a Docker container in a native system.

Next, we present the configurations, benchmarks, detailed setup, and the performance results.

Deployment Scenarios

We compare four different scenarios, as illustrated below:

–   Native: Linux OS running directly on hardware (Ubuntu 14.04.1)

–   vSphere VM: vSphere 6.0 with VMFS5, in 8 VMs, each with the same guest OS as native

–   Native-Docker: Docker 1.5 running on a native OS (Ubuntu 14.04.1)

–   VM-Docker: Docker 1.5 running in each of 8 VMs on a vSphere  host

In each configuration, all of the power management features were disabled in the BIOS.

Hardware/Software/Workload Configuration

Figure 1 shows the hardware setup diagram for the server host below. We used Ubuntu 14.04.1 with Docker 1.5 for all our experiments.  While running Docker configuration, we use bridged networking and host volumes for storing the database.

dvdstore-benchmark-system-setup

Figure 1. Hardware/software configuration

Performance Results

We ran 16 instances of DVD Store, where each instance was running an Apache web server, PHP application logic, and a PostgreSQL database. In the Docker cases, we ran one instance of DVD Store per Docker container. In the non-Docker cases, we used the Virtual Hosts functionality of Apache to run many instances of a Web server listening on different ports. We also used the PostgreSQL command line to create different instances of the database server listening on different ports. In the VM-based experiments, we partitioned the host hardware between 8 VMs, where each VM ran 2 DVD Store instances. The 8 VMs exactly committed the CPUs, and under-committed the memory.

The four configurations for our experiments are listed below.

Configurations:

  • Native-16S: 16 instances of DVD Store running natively (16 separate instances of Apache 2 using virtual hosts and 16 separate instances of PostgreSQL database)
  • Native-Docker-16S: 16 Docker containers running on a native machine with each running one instance of DVD Store.
  • VM-8VMs-16S: Eight 4-vCPU VMs each running 2 DVD Store instances
  • VM-Docker-8VMs-16S: Eight 4-vCPU VMs each running 2 Docker containers,  where each Docker container is running one instance of DVD Store

We ran the DVD Store benchmark for the 4 configurations using 16 client drivers, where each driver process was running 4 threads. The results for these 4 configurations are shown in the figure below.

dvdstore-perf-docker-vsphere

Figure 2. DVD Store performance for different configurations

In the chart above, the y-axis shows the aggregate DVD Store performance metric orders per minute (OPM) for all 16 instances. We have normalized the order per minute results with respect to the native configuration where we saw about 126k orders per minute. From the chart, we see that, the VM configurations achieve higher throughput compared to the corresponding native configurations. As in the case of prior blogs, this is due to better NUMA-aware scheduling in vSphere.  We can also see that running Docker containers either natively or in VMs adds little overhead (2-4%).

To find out why the native configurations were not doing better, we pinned half of the Docker containers to one NUMA node and half to the other. The DVD Store aggregate OPM improved as a result and as expected, we were seeing slightly better than the VM configuration.  However, manually pinning processes to cores or sockets is usually not a recommended practice because it is error-prone and can, in general, lead to unexpected or suboptimal results.

Summary

In this blog, we showed that running a PostgreSQL transactional database in a Docker container in a vSphere VM adds very little performance cost compared to running directly in the VM. We also find that running Docker containers in a set of 8 VMs achieves slightly better throughput than running the same Docker containers natively with an out-of-the-box configuration. This is a further proof that VMs and Docker containers are truly “better together.”

 

 

Scaling Web 2.0 Applications using Docker containers on vSphere 6.0

by Qasim Ali

In a previous VROOM post, we showed that running Redis inside Docker containers on vSphere adds little to no overhead and observed sizeable performance improvements when scaling out the application when compared to running containers directly on the native hardware. This post analyzes scaling Web 2.0 applications using Docker containers on vSphere and compares the performance of running Docker containers on native and vSphere. This study shows that Docker containers add negligible overhead when run on vSphere, and also that the performance using virtual machines is very close to native and, in certain cases, slightly better due to better vSphere scheduling and isolation.

Web 2.0 applications are an integral part of Enterprise and small business IT offerings. We use the CloudStone benchmark, which simulates a typical Web 2.0 technology use in the workplace for our study [1] [2]. It includes a Web 2.0 social-events application (Olio) and a client implemented using the Faban workload generator [3]. It is an open source benchmark that simulates activities related to social events. The benchmark consists of three main components: a Web server, a database backend, and a client to emulate real world accesses to the Web server. The overall architecture of CloudStone is depicted in Figure 1.

Figure 1: CloudStone architecture

The benchmark reports latency for various user actions. These metrics were compared against a fixed threshold. Studies indicate that users are less likely to visit a Web site if the response time is greater than 250 milliseconds [4]. This number can be used as an upper bound for latency for frequent operations (Home-Page, TagSearch, EventDetail, and Login). For the less frequent operations (AddEvent, AddPerson, and PersonDetail), a less restrictive threshold of 500 milliseconds can be used. Table 1 shows the exact mix/frequency of various operations.

Operation Number of Operations Mix
HomePage 141908 26.14%
Login 55473 10.22%
TagSearch 181126 33.37%
EventDetail 134144 24.71%
PersonDetail 14393 2.65%
AddPerson 4662 0.86%
AddEvent 11087 2.04%

Table 1: CloudStone operations frequency for 1500 users

Benchmark Components and Experimental Set up

The test system was installed with a CloudStone implementation of a MySQL database, NGINX Web server with PHP scripts, and a Tomcat application server provided by the Faban harness. The default configuration was used for the workload generator. All components of the application ran on a single host, and the client ran in a separate virtual machine on a separate host. Both hosts were connected using a direct link between a pair of 10Gbps NICs. One client-server pair provided a single, independent CloudStone instance. Scaling was achieved by running additional instances of CloudStone.

Deployment Scenarios

We used the following three deployment scenarios for this study:

  • Native-Docker: One or more CloudStone instances were run inside Docker containers (2 containers per CloudStone instance: one for the Web server and another for the database backend) running on the native OS.
  • VM: CloudStone instances were run inside one or more virtual machines running on vSphere 6.0; the guest OS is the same as the native scenario.
  • VM-Docker: CloudStone instances were run inside Docker containers that were running inside one or more virtual machines.

Hardware/Software/Workload Configuration

The following are the details about the hardware and software used in the various experiments discussed in the next section:

Server Host:

  • Dell PowerEdge R820
  • CPU: 4 x Intel® Xeon® CPU E5-4650 @ 2.30GHz (32 cores, 64 hyper-threads)
  • Memory: 512GB
  • Hardware configuration: Hyper-Threading (HT) ON, Turbo-boost ON, Power policy: Static High (that is, no power management)
  • Network: 10Gbps
  • Storage: 7 x 250GB 15K RPM 4Gb SAS  Disks

Client Host:

  • Dell PowerEdge R710
  • CPU: 2 x Intel® Xeon® CPU X5680 @ 3.33GHz (12 cores, 24 hyper-threads)
  • Memory: 144GB
  • Hardware configuration: HT ON, Turbo-boost ON, Power policy: Static High (that is, no power management)
  • Network: 10Gbps
  • Client VM: 2-vCPU 4GB vRAM

Host OS:

  • Ubuntu 14.04.1
  • Kernel 3.13

Docker Configuration:

  • Docker 1.2
  • Ubuntu 14.04.1 base image
  • Host volumes for database and images
  • Configured with host networking to avoid Docker NAT overhead
  • Device mapper as the storage backend driver

ESXi:

  • VMware vSphere 6.0 (pre-release build)

VM Configurations:

  • Single VM: An 8-vCPU 4GB VM ( Web server and database running in a single VM)
  • Two VMs: One 6-vCPU 2GB Web server VM and one 2-vCPU 2GB database VM (CloudStone instance running in two VMs)
  • Scale-out: Eight 8-vCPU 4GB VMs

Workload Configurations:

  • The NGINX Web server was configured with 4 worker processes and 4096 connections per worker.
  • PHP was configured with a maximum number of 16 child processes.
  • The Web server and the database were preconfigured with 1500 users per CloudStone instance.
  • A runtime of 30 minutes with a 5 minute ramp-up and ramp-down periods (less than 1% run-to-run  variation) was used.

Results

First, we ran a single instance of CloudStone in the various configurations mentioned above. This was meant to determine the raw overhead of Docker containers running on vSphere vs. the native configuration, eliminating scheduling differences. Second, we picked the configuration that performed best in a single instance and scaled it out to run multiple instances.

Figure 2 shows the mean latency of the most frequent operations and Figure 3 shows the mean latency of less frequent operations.

Figure 2: Results of single instance CloudStone experiments for frequently used operations

 

Figure 3: Results of single instance CloudStone experiments for less frequently used
operations

We configured the benchmark to use a single VM and deployed the Web server and database applications in it (configuration labelled VM-1VM in Figure 2 and 3). We then ran the same workload in Docker containers in a single VM (VMDocker-1VM). The latencies are slightly higher than native, which is expected due to some virtualization overhead. However, running Docker containers on a VM showed no additional overhead. In fact, it seems to be slightly better. We believe this might be due to the device mapper using twice as much page cache as the VM (device mapper uses a loopback device to mount the file system and, hence, data ends up being cached twice in the buffer cache). We also tried AuFS as a storage backend for our container images, but that seemed to add some CPU and latency overhead, and, for this reason, we switched to device mapper. We then configured the VM to use the vSphere Latency Sensitivity feature [5] (VM-1VM-lat and VMDocker-1VM-lat labels). As expected, this configuration reduced the latencies even further because each vCPU got exclusive access to a core and this reduced scheduling overhead.  However, this feature cannot be used when the VM (or VMs) has more vCPUs than the number of cores available on the system (that is, the physical CPUs are over-committed) because each vCPU needs exclusive access to a core.

Next, we configured the workload to use two VMs, one for the Web server and the other for the database application. This configuration ended up giving slightly higher latencies because the network packets have to traverse the virtualization layer from one VM to the other, while in the prior experiments they were confined within the same VM.

Finally, we scaled out the CloudStone workload with 12,000 users by using eight 8-vCPU VMs with 1500 users per instance. The VM configurations were the same as the VM-1VM and VMDocker-1VM cases above. The average system CPU core utilization was around 70-75%, which is the typical average CPU utilization for latency sensitive workloads because it allows for headroom to absorb traffic bursts. Figure 4 reports mean latencies of all operations (latencies were averaged across all eight instances of CloudStone for each operation), while Figure 5 reports the 90th percentile latencies (the benchmark reports these latencies in 20 millisecond granularity as evident from Figure 5.)

Scale-out-meanlatency

Figure 4: Scale-out experiments using eight instances of CloudStone (mean latency)

Scale-out-90pctLatency

Figure 5: Scale-out experiments using eight instances of CloudStone (90th percentile latency)

The latencies shown in Figure 4 and 5 are well below the 250 millisecond threshold. We observed that the latencies on vSphere are very close to native or, in certain cases, slightly better than native (for example, Login, AddPerson and AddEvent operations). The latencies were better than native due to better vSphere scheduling and isolation, resulting in better cache/memory locality. We verified this by pinning container instances on specific sockets and making the native scheduler behavior similar to vSphere. After doing that, we observed that latencies in the native case got better and they were similar or slightly better than vSphere.

Note: Introducing artificial affinity between processes and cores is not a recommended practice because it is error-prone and can, in general, lead to unexpected or suboptimal results.

Conclusion

VMs and Docker containers are truly “better together.” The CloudStone scale-out system, using out-of-the-box VM and VM-Docker configurations, clearly achieves very close to, or slightly better than, native performance.

References

[1] W. Sobel, S. Subramanyam, A. Sucharitakul, J. Nguyen, H. Wong, A. Klepchukov, S. Patil, O. Fox and D. Patterson, “CloudStone: Multi-Platform, Multi-Languarge Benchmark and Measurement Tools for Web 2.8,” 2008.
[2] N. Grozev, “Automated CloudStone Setup in Ubuntu VMs / Advanced Automated CloudStone Setup in Ubuntu VMs [Part 2],” 2 June 2014. https://nikolaygrozev.wordpress.com/tag/cloudstone/.
[3] java.net, “Faban Harness and Benchmark Framework,” 11 May 2014. http://java.net/projects/faban/.
[4] S. Lohr, “For Impatient Web Users, an Eye Blink Is Just Too Long to Wait,” 29 February 2012. http://www.nytimes.com/2012/03/01/technology/impatient-web-usersflee-slow-loading-sites.html.
[5] J. Heo, “Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5,” 18 September 2013.   http://blogs.vmware.com/performance/2013/09/deploying-extremely-latency-sensitive-applications-in-vmware-vsphere-5-5.html.