You cannot afford to have business-critical applications in your datacenter go down just to upgrade them. With that in mind, let’s look at which events might provide a good opportunity to virtualize applications in your datacenter. Below are some questions to ask when considering virtualization. If you answer “yes” to any of them, it might be time to virtualize that app.
The following best practices for virtualizing Java can provide useful guidance for virtual CPU, virtual memory, networking and storage setup.
Whether custom-built or supplied by a third-party vendor, Java applications virtualize relatively easily. Our customers often see improvements in performance and scalability when moving to a virtualized platform. Java application clusters typically contain many instances that require increased management effort on physical infrastructure, so many of our customers look for consolidation opportunities while improving performance and scalability. The prime reason for virtualizing Java applications these days is to reduce the JVM instance sprawl that many administrators dread, consolidating through virtualization to keep the scale manageable and feasible. Many of our customers have virtualized IBM WebSphere, Oracle WebLogic, JBoss, and Tomcat. In the last three years, virtualization of Java applications has become mainstream, as seen across many of our customer accounts.
The results of the tests discussed in this paper show that enterprise-level Java applications can provide excellent performance when deployed on VMware vSphere 4.1. The application used in these tests was Olio, a multi-tier enterprise application that implements a complete social networking Web site. Olio was deployed on SpringSource tc Server, running both natively and virtualized on vSphere 4.1. Figure 5 shows the peak throughput for a single instance of Olio running on tc Server, both natively and in a VM, with two and four CPUs.
Customer Case in Point
“With our OrderExpress project we upgraded our WebSphere Commerce, Portal, WCM, Service Layer, DB2 Database; migrated from AIX to Linux; virtualized on VMware; moved the application into a three-tier DMZ; increased our transactions by over 150 percent; and added significant new capabilities that greatly improved the customer experience. Changing such a wide range of technology components at once was a huge challenge. However using VMware vSphere and additional architectural changes we were successful in improving performance by over 300 percent; lowered costs in the millions; improved security, availability, and scalability; and how we plan to continue evolving this application to maintain greater than 30 percent yearly growth.” – Jeff Battisti, Senior Enterprise Architect at Cardinal Health
This post is part of the 7-part series Seven Top Benefits of Virtualizing Business Critical Applications.
Ensuring availability of your applications is difficult. Each application component must be made highly available, and operations teams often struggle with a proliferation of different clustering and availability options. The Web tier is fairly simple to protect using network load balancing, and the application tier can be clustered, but databases are typically the most difficult tier to protect. Databases can be protected using Microsoft Clustering, database mirroring, or high-end options such as Oracle RAC.
VMware provides a range of capabilities that can extend availability to 100 percent of applications including databases, without the complexity or cost of clustering. These capabilities are:
- vMotion – Move running virtual machines from one physical server to another with no impact to end users. vMotion keeps your IT environment up and running, giving you unprecedented flexibility and availability to meet the increasing demands of your business and end users.
- High Availability – Provides automated application restart in the event of host failure or OS failure within the virtual machine. It is automatically available for any application running on vSphere. VMware HA is simple and does not require OS- or app-level clustering. It is also very cost effective because it doesn’t rely on dedicated standby servers, and in many cases allows the use of lower-cost OS and application licenses.
- App-Aware High Availability – Monitors the application and restarts it if it goes down. App-Aware HA initiates failover only when the application does not come back up on its own; the underlying technology relies on VMware HA to initiate the failover automatically. App-Aware HA is an API that allows users to plug in one of the two currently available third-party App-Aware products, from Symantec or Neverfail.
- Fault Tolerance – Protects any application against host failure with continuous availability, without data loss or downtime. VMware FT creates virtual machine “pairs” that run in lock step—essentially mirroring the execution state of a virtual machine. To the external world they appear as one instance (one IP address, one application)—but they are fully redundant instances.
The siloed example of availability methods shown in Figure 11 requires expensive licenses, dedicated standby infrastructure, and highly skilled staff to configure and manage. The alternative to this expensive approach is a standardized approach using vSphere technology, though some companies choose to implement both app-specific and VMware solutions running in tandem.
To prepare for availability issues affecting an entire datacenter, VMware vCenter™ Site Recovery Manager (SRM) enables datacenter teams to build, manage, and execute reliable disaster recovery plans for all applications, including business-critical apps. By taking full advantage of the encapsulation and isolation of virtual machines, SRM enables simplified automation of disaster recovery. SRM helps meet recovery time objectives, reduces costs traditionally associated with business continuance plans, and achieves low-risk and predictable results for recovery of a virtual environment.
The figure below lists some of the top business and technical reasons to virtualize business-critical applications.
Note: Consolidation rates are averages based on “VMware Customer Readiness Reviews.” Licensing savings are cited in the Licensing section of the whitepaper below.
vSphere delivers the performance required to run business-critical applications in large-scale environments. vSphere 5 provides 16 times the performance of VMware Infrastructure 3 (source: Figure 14 in the BCA whitepaper) while keeping virtualization overhead to a limited 2 to 5 percent. The fact is that the virtualization overhead or “tax” is often greatly exaggerated; many application owners are managing applications that the server and virtualization teams have already virtualized, and the application owners don’t even know it.
Performance is a major factor in business-critical applications. Virtual machines perform the same as their physical equivalents, as witnessed in production by the app owners. The following set of graphs illustrates this performance across several applications.
Virtualized Oracle databases perform the same as native databases from the application owner’s perspective (source: Virtualizing Performance-Critical Database Applications in VMware vSphere).
In the figure below, Confio, a third-party company unaffiliated with VMware, compared virtual and physical servers in a side-by-side test and found that performance appeared the same to the DBA (Source: A Comparison of Oracle Performance on Physical and VMware Servers, 2012. Written by Confio, www.confio.com).
In the figure below, virtualized SQL databases perform the same as native databases from the application owner’s perspective (Source: Performance and Scalability of Microsoft SQL on vSphere).
In the figure below, virtualized SAP performs the same as its native equivalent from the application owner’s perspective (Source: Virtualized SAP Performance with VMware vSphere 4).
In the figure below, virtualized Java performs the same as its native equivalent from the application owner’s perspective (Source: Performance of Enterprise Java Applications on VMware vSphere 4.1 and SpringSource tc Server).
In the figure below, virtualized Hadoop performs the same as its native equivalent from the application owner’s perspective (Source: “A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere® 5”, 2012).
Audience: This whitepaper provides solution and technical product information and is intended for architects, engineers, administrators, DBAs, application owners, and business staff.
Purpose of this whitepaper: This whitepaper documents the challenges of virtualizing business-critical applications and provides evidence that these challenges can be overcome, so that these applications can be virtualized with confidence.
Executive Summary: Starting with vSphere 4, and more recently using vSphere 5, customers are virtualizing business-critical applications at an accelerated pace. 75 percent of VMware customers report that they virtualize at least one business-critical application in their production environment. Application infrastructure administrators and CIOs see that the value of virtualization extends far beyond basic consolidation. Applications run better virtualized, with faster time to market and improved Quality of Service (QoS).
Several customers have asked what vFabric SQLFire can do for their applications and how they can modify the architecture of a custom Java application accordingly. These customers typically run custom Java applications against an RDBMS that has reached the limits of scalability and response time under the current architecture. They want to make the change only if the modifications are not too invasive.
To answer these questions fairly, we simulated a customer scenario with no specific assumptions about specialized tuning. First, we took Spring Travel with its RDBMS as-is and ran a load test against it. Then we converted the Spring Travel schema to run against vFabric SQLFire, also without tuning, and plotted the results side by side.
We also wanted to demonstrate how quickly we could make this change without any code intrusion or changes, so we simulated how a developer might download the Spring Travel application, run the DdlUtils conversion utility to generate the SQLFire schema and the data load file, and then quickly test to see the performance improvements. We felt this would answer the customers’ questions without bias.
This post summarizes the results. The details of the conversion of the RDBMS schema and data can be found in the vFabric SQLFire POC Jumpstart Service Kit or the vFabric SQLFire Accelerator Service Kit. (Contact your local VMware account team for details on these service offerings.) The conversion process took one day, and the total process—downloading the Spring Travel application, installing vFabric SQLFire, running the schema and data conversion utility, and running the load test—took three days. We then iterated the results for another week of verification.
NOTE: You can download Spring Travel from: http://www.springsource.org/download. Navigate to the download link under Spring Web Flow 2.3.0. For vFabric SQLFire, see: http://www.vmware.com/products/application-platform/vfabric-sqlfire/overview.html
Figure 1. Spring Travel on Traditional Disk-Based RDBMS versus vFabric SQLFire
The results are presented in Table 1. The legend for the columns is as follows:
- Threads: The number of concurrent Spring Travel application threads executed during the two load tests.
- SQLF R/T (ms): The Spring Travel application response time in milliseconds when running against vFabric SQLFire.
- SQLF CPU %: The percentage of CPU utilization at peak for the SQLFire VMs.
- RDBMS R/T (ms): The Spring Travel application response time in milliseconds when running against the traditional disk-based RDBMS.
- RDBMS CPU %: The percentage of CPU utilization at peak on the RDBMS VM.
The tests covered a range of 18 to 7200 concurrent threads. “Failed” indicates that Spring Travel running against the traditional disk-based RDBMS failed to respond; since it was essentially frozen, we collected no data from it.
Table 1. Spring Travel Results with RDBMS versus SQLFire
Spring Travel Response Time versus Concurrent Threads Test Results
Figure 2 shows response time along the vertical axis and concurrent number of threads along the horizontal axis. The red line represents Spring Travel running against a traditional disk-based RDBMS, and the blue line represents Spring Travel running against SQLFire. The data shows that as the number of concurrent threads increases along the horizontal axis, the Spring Travel response time increases in a linear fashion when running against the disk-based RDBMS but remains constant, as indicated by a fairly low and flat blue line, when running against SQLFire.
Figure 2 – Spring Travel Response Time versus Concurrent Threads
Scalability Test Results
This test demonstrates the extent of scalability of both configurations. When Spring Travel ran against the RDBMS, after reaching 1850 concurrent threads and approaching a response time of 172 milliseconds, the system stopped responding, indicating that it had reached its scalability limit. This is indicated by the red line in Figure 3. On the other hand, Spring Travel running against SQLFire continued to function up to the limit of 7200 concurrent threads and a response time of 984 milliseconds, as indicated by the blue line.
NOTE: At approximately 3600 concurrent threads, SQLFire started to overflow to disk, and the response time increased. In a normal situation, you can use appropriate sizing of available RAM to contain this kind of overflow.
Figure 3. Spring Travel Response Time versus Concurrent Threads – Scalability Test
CPU versus Concurrent Threads Test Results
Figure 4 shows the CPU %, measured for the duration of the test, of the RDBMS VM in red and the SQLFire VMs in blue. Using the RDBMS, Spring Travel peaked at approximately 80% CPU and 1850 concurrent threads. At this point, it completely failed to respond. The SQLFire-based Spring Travel configuration, on the other hand, continued to 98% CPU utilization at 7200 concurrent threads and was still responsive, at approximately 984 milliseconds of Spring Travel application response time. The red and blue lines crossed at approximately 1000 concurrent threads, indicating that Spring Travel with SQLFire handled much higher loads with a steadier increase in CPU utilization.
Figure 4 – Spring Travel Application CPU versus Concurrent Threads
Summary of Findings
This simulation shows that:
- Using the DdlUtils utility to convert the schema and data of the RDBMS associated with the Spring Travel application was relatively straightforward.
- The installation of vFabric SQLFire was also straightforward.
- Spring Travel pointing to vFabric SQLFire scaled approximately 4x when compared to Spring Travel pointing to an RDBMS.
- Response times were 5x to 30x faster with vFabric SQLFire. Further, the response times on SQLFire remained more stable and constant under increased load.
- The configuration of Spring Travel with an RDBMS has a response time that increases linearly with increased load.
- The break point for Spring Travel against the RDBMS was at 80% CPU utilization and about 1850 concurrent threads, after which Spring Travel stopped responding. The SQLFire version of Spring Travel continued ahead to 98% CPU utilization and achieved 7200 concurrent threads.
- NOTE: The test assumed the same total compute resource across the two configurations, making it a true apples-to-apples comparison. The RDBMS VM had eight vCPUs and 4GB of RAM, while the SQLFire VMs had two vCPUs and 2GB of RAM each. In both configurations, the VM memory reservation was set.
Thank you for reading! Looking forward to seeing you at VMworld 2012, I will blog about my VMworld sessions shortly.
Looking at a Sizing Example
Figure 4 shows the most commonly encountered JVM and virtual machine size. This may be a fairly busy JVM with 100 to 250 concurrent threads (the actual thread count varies with the nature of the workload), 4GB of heap, approximately 4.5GB for the JVM process, and 0.5GB for the guest OS. This results in a total recommended memory reservation for the virtual machine of 5GB, with two vCPUs and one JVM process.
Figure 4. Most Commonly Encountered Configuration
Figure 5 takes a closer look at the sizing example, showing the memory layout within the Java process and the various segment sizes.
Figure 5. 5GB RAM Virtual Machine with One JVM Process and Two CPUs
The general sizing equation can be summarized as follows:
Let’s assume that, through load testing, a JVM max heap (-Xmx) of 4096m has been determined as necessary. Proceed to size as follows:
- Set -Xmx4096m and set -Xms4096m.
- Set -XX:MaxPermSize=256m. This is a common value; it depends on the memory footprint of the class-level information within your Java application code base.
- The NumberOfConcurrentThreads*(-Xss) segment depends mostly on the number of concurrent threads the JVM will process and on the -Xss value you have chosen. A common range for -Xss is 128k to 192k. If, for example, NumberOfConcurrentThreads is 100, then 100*192k => 19200k (assuming you set -Xss to 192k).
Note: The stack size (-Xss) is application dependent; if the stack is not sized correctly you will get a StackOverflowError. The default value is sometimes quite large, but you can reduce it to help save on memory consumption.
- Assume the OS has a requirement of about 500m to run, as per the OS specification.
- Total JVM memory (Java process memory) = 4096m (-Xmx) + 256m (-XX:MaxPermSize) + 100*192k (NumberOfConcurrentThreads*-Xss) + “other mem”.
- Therefore, JVM memory is approximately 4096m + 256m + 19.2m + “other mem” = 4371m + “other mem”.
- Typically “other mem” is not significant. However, it can be quite large if the application uses lots of NIO buffers and socket buffers. Otherwise, assuming about 5 percent of the total JVM process memory (4 to 5 percent of 4371m, or roughly 217m) should be enough, although proper load testing should be used to verify this.
- This implies that the JVM process memory is approximately 4371m + 217m = 4588m.
- To determine the virtual machine memory, assume you are using Linux with only this single Java process and no other significant process running. The total configured memory for the virtual machine translates to: VM memory = 4588m + 500m = 5088m.
- Finally, you should set the virtual machine memory as the memory reservation. You can choose to set the memory reservation as 5088m. However, over time you should monitor the active memory used by the virtual machine that houses this JVM process and adjust the memory reservation to that active memory value, which could be less than 5088m.
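The arithmetic above can be sketched as a small calculator. This is a minimal sketch, not a sizing tool: all constants are the example values from this post (4096m heap, 256m perm, 100 threads at 192k stack, a roughly 5 percent "other mem" allowance of 217m, 500m guest OS), and they should be validated by load testing for your own workload.

```java
/**
 * Minimal sketch of the JVM/VM memory sizing arithmetic described above.
 * All constants are the example values from this post, not recommendations.
 */
public class JvmSizingExample {
    public static void main(String[] args) {
        double heapMb = 4096;        // -Xmx = -Xms, determined via load testing
        double permMb = 256;         // -XX:MaxPermSize
        int concurrentThreads = 100; // expected concurrent threads
        double stackKb = 192;        // -Xss per thread
        double otherMemMb = 217;     // ~5% allowance for NIO/socket buffers, etc.
        double guestOsMb = 500;      // guest OS requirement

        double stackMb = concurrentThreads * stackKb / 1000.0; // 100 * 192k => 19.2m
        double jvmMb = heapMb + permMb + stackMb;              // ~4371m
        double processMb = jvmMb + otherMemMb;                 // ~4588m JVM process memory
        double vmReservationMb = processMb + guestOsMb;        // ~5088m VM memory reservation

        // prints "JVM process ~= 4588m, VM reservation ~= 5088m"
        System.out.printf("JVM process ~= %.0fm, VM reservation ~= %.0fm%n",
                processMb, vmReservationMb);
    }
}
```

As the post notes, the VM memory reservation derived this way is a starting point; monitor active memory over time and adjust downward if appropriate.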
Stay tuned. In Part 3 we will conclude with an overview of the recently released book, Enterprise Java Applications Architecture on VMware.
This week we will publish three blog posts on the topic of sizing virtual machines for Java workloads. In these posts we will discuss various sizing considerations, best practices, sizing limits, and the most common configuration used by our customers.
A sizing exercise for Java workloads in a virtualized environment is similar to one in a physical environment. The only difference is that a virtualized environment provides more flexibility, such as the ability to easily change the compute resource configuration. For more detailed information, we encourage you to review the Enterprise Java Applications on VMware – Best Practices Guide (http://www.vmware.com/resources/techresources/1087).
Sizing Virtual Machines for JVM workloads – Part 1
Before delving into various sizing considerations we’ll provide some background information about the practical sizing limits of JVMs.
Background: JVM Practical Sizing Limits
Figure 1 illustrates the theoretical and practical sizing limits of Java workloads. These are critical limits that you need to be aware of when sizing JVM workloads.
Figure 1. Theoretical and Practical Sizing Limits of JVMs
- The first limit is theoretical: the JVM can address 16 exabytes, but no practical system can provide this amount of memory.
- The second limit is the amount of memory a guest OS can support. In most practical cases, this is several terabytes and depends on the operating system used.
- The third limit is the ESXi 5 1TB RAM per virtual machine limit, which is ample for any workload that we have encountered.
- The fourth limit (really, the first practical limit) is the amount of RAM that is cost-effective on typical ESXi hosts.
- The fifth limit is the total amount of RAM across the server and how it is divided into NUMA nodes, where each processor socket has one NUMA node’s worth of NUMA-local memory. The NUMA-local memory can be calculated as the total amount of RAM within the server divided by the number of processor sockets. For optimal performance, you should always size a virtual machine within the NUMA node memory boundaries. ESXi has many NUMA optimizations that come into play, but even so, it is best to stay NUMA local.
For example, if the ESX host has 256GB of RAM across two processor sockets, it has 2 NUMA nodes with 128GB (256GB/2) of RAM across each NUMA node. This implies that when you are sizing a virtual machine, it should not exceed the 128GB limit in order for it to be NUMA local.
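The NUMA arithmetic above can be sketched as follows. The 256GB host with two sockets comes from the example in the text; the candidate VM sizes in the usage are hypothetical, chosen only to show one VM that fits within a NUMA node and one that does not.

```java
/**
 * Minimal sketch of the NUMA-local sizing rule described above.
 * The 256GB / 2-socket host is the example from the text; the VM sizes
 * checked in main() are hypothetical.
 */
public class NumaLocalCheck {
    /** RAM per NUMA node: total host RAM divided by processor sockets. */
    static int numaNodeGb(int hostRamGb, int sockets) {
        return hostRamGb / sockets;
    }

    /** A VM stays NUMA-local if its memory fits within one NUMA node. */
    static boolean fitsNumaLocal(int vmRamGb, int hostRamGb, int sockets) {
        return vmRamGb <= numaNodeGb(hostRamGb, sockets);
    }

    public static void main(String[] args) {
        int nodeGb = numaNodeGb(256, 2); // 256GB / 2 sockets = 128GB per node
        System.out.println("NUMA node size: " + nodeGb + "GB");
        System.out.println(" 96GB VM NUMA-local: " + fitsNumaLocal(96, 256, 2));  // true
        System.out.println("160GB VM NUMA-local: " + fitsNumaLocal(160, 256, 2)); // false
    }
}
```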
The limits outlined above can help drive your design and sizing decision as to how practical and feasible it is to size large JVMs. However, there are other considerations that come with sizing very large JVMs such as GC tuning complexity and the knowledge needed to maintain large JVMs. In fact, most commonly sized JVMs within the VMware customer base are around 4GB of RAM for a typical enterprise Web application. On the other hand, larger JVMs exist, and we have customers that run large scale monitoring systems and large distributed data platforms on JVMs ranging from 4GB to 128GB.
With large JVMs comes the need to better understand GC tuning. Although GC tuning on physical hardware is no different from tuning on virtual, VMware has helped many customers with their GC tuning activities, because VMware has uniquely integrated vFabric Java and vSphere expertise into one practice that has helped our customers run many Java workloads optimally on vSphere. When faced with the question of whether to vertically scale the size of the JVM and virtual machine, first consider a horizontal scale-out approach; VMware has consistently found that our customers get better scalability with horizontal scale-out.
Furthermore, when sizing, it is helpful to categorize the size of the JVMs and virtual machines based on the Java workload types, as shown in Figure 2.
Figure 2. Common JVM Sizes and Workload Categories
We usually find that customers vertically scale a JVM because of the perceived simplicity of deployment and a desire to leave existing JVM processes intact. Be aware of the workload-related consequences of this choice.
- For example, a customer initially deploys one JVM process and as demand increases for more applications to be deployed, instead of horizontally scaling out by creating a second JVM and virtual machine, a vertical scale up approach is taken. As a consequence, the existing JVM is forced to vertically scale and carry many different types of workloads with varied requirements.
- Keep in mind that some workloads, such as a job scheduler, need high throughput, while a public-facing Web application demands fast response time. Stacking these types of applications within one JVM complicates GC cycle tuning: tuning GC for higher throughput usually comes at the cost of decreased response time, and vice versa.
- You can achieve both higher throughput and better response time with GC tuning, but it unnecessarily extends the GC tuning activity. When faced with this deployment choice it is always best to split out different types of Java workloads into their own JVMs. One approach is to run the job scheduler type of workload in its own JVM and virtual machine, and the Web-based Java application on its own JVM and virtual machine.
In Figure 3, JVM-1 is deployed on a virtual machine that has mixed application workload types, which complicates GC tuning and scalability when attempting to scale up this application mix in JVM-2. A better approach is to split the Web application into JVM-3 and the job scheduler application into JVM-4 (that is, scaled out horizontally with the flexibility to vertically scale if needed). If you compare the vertical scalability of JVM-3 and JVM-4 versus vertical scalability of JVM-2 you will find JVM-3 and JVM-4 always scale better and are easier to tune.
Figure 3. Splitting Workload Types to Improve Scalability
In Part 2 we will look at an actual sizing example with some practical numbers that you can directly apply.