Measuring Cloud Scalability Using the Weathervane Benchmark

Cloud-based deployments continue to be a hot topic in many of today’s corporations. Often the discussion revolves around workload portability, ease of migration, and service pricing differences. To bring performance into the discussion, we turned to VMware’s new benchmark, Weathervane. As a follow-on to Harold Rosenberg’s introductory Weathervane post, this article showcases some of the flexibility and scalability of our new large-scale benchmark. Previously, Harold presented initial scalability data from three local vSphere 6 hosts. Here we extend that work by running Weathervane in a non-VMware cloud environment and scaling out the number of app servers.

Weathervane is a new web-application benchmark architected to simulate modern-day web applications. It consists of a benchmark application and a workload driver. Combined, they simulate the behavior of everyday users attending a real-time auction. For more details on Weathervane, I encourage you to review the introductory post.

Environment Configuration:
Cloud Environment: Amazon Web Services (AWS), US West.
Instance Types: m3.xlarge, m3.large, c3.large.
Instance Notes: Database instances used an additional 300 GB io1 (Provisioned IOPS SSD) data disk.
Instance Operating System: CentOS 6.5 x64.
Application: Weathervane Internal Build 084.

Testing Methodology:
All instances ran within the same cloud environment to reduce network-induced latency. We started with a base configuration of eight instances, then scaled out the number of workload drivers and application servers to see how the cloud environment scaled as the application workload increased. We used Weathervane’s FindMax functionality, which runs a series of tests to determine the maximum number of users the configuration can sustain while still meeting QoS requirements. Early experimentation let us size the other services, beyond the workload drivers and application servers, so that they were unlikely to become bottlenecks. Below is a block diagram of the configurations used for the scaled-out Weathervane deployment.
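To make the FindMax idea described above concrete, here is a minimal Python sketch of one way to search for the highest user count that still meets QoS. It is not Weathervane's actual implementation, since the search strategy FindMax uses is not covered in this post. The function name `find_max_users`, its parameters, and the `simulated_run` stand-in are all hypothetical, and the sketch assumes that once a load level fails QoS, every higher load level fails too.

```python
from typing import Callable


def find_max_users(passes_qos: Callable[[int], bool],
                   start: int = 1000,
                   resolution: int = 100,
                   ceiling: int = 1_000_000) -> int:
    """Return the largest tested user count that met QoS (0 if none did)."""
    low, high = 0, start  # low: last passing load, high: next load to try
    # Phase 1: double the load until a run fails or the ceiling is exceeded.
    while high <= ceiling and passes_qos(high):
        low, high = high, high * 2
    if high > ceiling:
        return low  # never saw a failure within the allowed range
    # Phase 2: bisect between the last pass and the first failure.
    while high - low > resolution:
        mid = (low + high) // 2
        if passes_qos(mid):
            low = mid
        else:
            high = mid
    return low


# Hypothetical stand-in for a real benchmark run: a pretend deployment that
# meets QoS up to 7,300 users. A real predicate would launch a run and
# inspect the pass/fail result reported by the harness.
def simulated_run(users: int) -> bool:
    return users <= 7300


if __name__ == "__main__":
    print(find_max_users(simulated_run))
```

The doubling phase keeps the number of expensive runs small when the capacity is unknown, and the `resolution` parameter trades precision of the answer against the number of additional runs.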

Results:
For our analysis of Weathervane cloud scaling, we ran multiple iterations at each load level and report the average. We automated the process to ensure consistency. Our results show both the number of users sustained and the HTTP requests per second, as reported by the benchmark harness.

As you can see in the above graph, scaling the number of application servers in our cloud environment yielded nearly linear scaling up to five application servers. The scaling of the number of users and the scaling of the HTTP requests per second sustained differed by less than 1%. Due to time constraints we were unable to test beyond five application servers, but we expect that scaling would have continued well beyond the load levels presented.
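For readers who want to run the same kind of analysis on their own results, the sketch below shows one plausible way to average iterations, normalize to the smallest configuration, and compare the two scaling curves. The helper names are hypothetical, the interpretation of the "delta" as a relative gap between normalized curves is an assumption, and the sample numbers are placeholders for illustration only, not the measurements from this article.

```python
from statistics import mean


def scaling_factors(runs: dict[int, list[float]]) -> dict[int, float]:
    """Average the iterations per server count, then normalize to the smallest count."""
    averages = {n: mean(values) for n, values in runs.items()}
    baseline = averages[min(averages)]
    return {n: averages[n] / baseline for n in sorted(averages)}


def scaling_efficiency(factors: dict[int, float]) -> dict[int, float]:
    """Measured speedup divided by ideal linear speedup; 1.0 is perfectly linear."""
    base_n = min(factors)
    return {n: f / (n / base_n) for n, f in factors.items()}


def max_delta_percent(a: dict[int, float], b: dict[int, float]) -> float:
    """Largest relative gap, in percent, between two scaling curves."""
    return max(abs(a[n] - b[n]) / a[n] * 100 for n in a)


# Placeholder numbers for illustration only; NOT the article's measurements.
# Keys are application server counts, values are per-iteration results.
users = {1: [1000, 1010, 990], 2: [1950, 1960, 1940], 3: [2900, 2880, 2920]}
requests_per_sec = {1: [200, 202, 198], 2: [391, 392, 390], 3: [578, 580, 582]}

if __name__ == "__main__":
    user_scaling = scaling_factors(users)
    request_scaling = scaling_factors(requests_per_sec)
    print("user scaling:", user_scaling)
    print("efficiency vs. linear:", scaling_efficiency(user_scaling))
    print("max delta (%):", max_delta_percent(user_scaling, request_scaling))
```

An efficiency close to 1.0 at every server count corresponds to the nearly linear scaling described above, and a small maximum delta indicates that users and HTTP requests per second scale together.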

Although this is just a small sample of what Weathervane and cloud environments can scale to, this brief article highlights the scaling of both the benchmark and the cloud environment. Though Weathervane hasn’t been released publicly yet, it’s easy to see how this type of controlled, scalable benchmark will assist in performance evaluations of a diverse set of environments. Look for more Weathervane-based cloud performance analysis in the future.