Unix to Virtualized Linux (U2VL) is a critical step toward the SDDC: it aims to migrate applications and data from physical Unix servers to Linux virtual machines running on x86 virtualized infrastructure. These applications are typically business-critical, so customers normally take a cautious approach, running a carefully planned and executed Proof of Concept (POC) to validate performance, availability, and scalability, among many other areas.
My colleagues in China (a big shout-out to Tony Wang and his team!) recently ran one such POC with a large local bank, and naturally they chose the Virtual SAN hyper-converged architecture for all of the compute and storage needs. The test results were so illustrative of many of Virtual SAN's benefits that I'd like to share this POC and some of the results here, although I'm not allowed to mention the customer's name for reasons you can probably understand.
One of the slides we showcased during the VMware Virtual SAN 6.1 Launch that got a lot of attention was the following slide:
Many eyebrows in the audience went up, wondering how we came to the conclusion that VSAN delivers a six-nines availability level (less than 32 seconds of downtime a year). While Virtual SAN uses software-based RAID, which differs in implementation from traditional storage solutions, the end result is the same: your data objects are mirrored (RAID-1) for increased reliability and availability. Moreover, with VSAN your data is mirrored across hosts in the cluster, not just across storage devices, as is the case with typical hardware RAID controllers.
VSAN users can set their goals for data availability by means of a policy that may be specified per VM, or even per VMDK if desired. The relevant policy is called 'Failures to Tolerate' (FTT) and refers to the number of concurrent host and/or disk failures a storage object can tolerate. For FTT=n, "n+1" copies of the object are created and "2n+1" hosts are required (to ensure availability even under split-brain situations).
For the end user, it is important to quantify the levels of availability achieved with different values of the FTT policy. With only one copy (FTT=0), the availability of the data equals the availability of the hardware the data resides on. Typically, that is in the range of two-nines (99%) availability, i.e., 3.65 days of downtime per year. For higher values of FTT, however, more copies of the data are created across hosts, which exponentially reduces the probability of data unavailability. With FTT=1 (2 replicas), data availability goes up to at least four nines (99.99%, or about 53 minutes of downtime per year), and with FTT=2 (3 replicas) it goes up to six nines (99.9999%, or about 32 seconds of downtime per year). Put simply, for FTT=n, more than n hosts and/or devices have to fail concurrently for your data to become unavailable. Many people challenged us to show them how the math actually works to arrive at these conclusions. So let's get to it.
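The arithmetic above can be sketched in a few lines. This is a simplified model, not VMware code: it assumes failures are independent and that each copy of the data lives on hardware with two-nines (99%) availability; the function names are mine, chosen for illustration.

```python
# Sketch of the FTT availability math under two assumptions:
# (1) host/device failures are independent, and
# (2) each copy of the data sits on hardware with 99% availability.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def data_availability(copy_availability: float, ftt: int) -> float:
    """With FTT=n, VSAN keeps n+1 replicas; the data is unavailable
    only if all n+1 copies are down at the same time."""
    copy_unavailability = 1.0 - copy_availability
    return 1.0 - copy_unavailability ** (ftt + 1)

def hosts_required(ftt: int) -> int:
    """2n+1 hosts are needed to keep quorum under split-brain."""
    return 2 * ftt + 1

for ftt in (0, 1, 2):
    a = data_availability(0.99, ftt)
    downtime_s = (1.0 - a) * SECONDS_PER_YEAR
    print(f"FTT={ftt}: {hosts_required(ftt)} hosts, "
          f"availability {a:.6f}, downtime {downtime_s:,.0f} s/year")
```

Running this reproduces the numbers above: FTT=0 gives 99% (3.65 days of downtime a year), FTT=1 gives 99.99% (under an hour), and FTT=2 gives 99.9999% (roughly 32 seconds), because each extra replica multiplies the unavailability by another factor of 0.01.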
In keeping with the theme of moving the Software-Defined Data Center from concept to reality, I discussed in my previous blogs why VMware vSphere is the perfect platform for deploying cutting-edge technologies like SAP HANA. This is because vSphere enables our customers to react with agility to rapidly changing hardware and software requirements, recasting memory, CPU, I/O, or network resources where needed in your landscape through software, in a centrally managed manner. I also discussed how VMware Virtual Volumes can be leveraged to simplify SAP's multi-temperature data management strategy, where data is classified by frequency of access as hot, warm, or cold depending on usage. This is the essence of Software-Defined Storage.
Mission Critical Architectures: Completing The Picture with VMware NSX
In this blog I want to discuss how VMware NSX can be leveraged in your SAP HANA landscapes. Figure 1 is an excerpt from the SAP HANA Network Requirements Guide, and it goes to the heart of why networks should be virtualized. The components of an SAP HANA system communicate via different network channels. SAP rightfully recommends a well-defined network topology that controls and limits access to only the required channels, so that appropriate security measures can be applied as necessary.
Figure 1. SAP HANA Network Zones
In the Client Zone, access is granted to different clients, such as the SQL clients on SAP application servers. There are also browser applications using HTTP/S to access the SAP HANA server, as well as other data sources (such as BI) that need a network communication channel to the SAP HANA database.
This article takes eight common misconceptions about virtualizing Hadoop and explains why each is an error. The short explanations given should clear up the confusion around these important topics.
Myth #1: Virtualization may add significant performance overhead to a Hadoop cluster.
This is a common question from users in the early stages of considering virtualizing their Hadoop clusters. Engineers at VMware (and some of its customers) have run several iterations of performance testing of Hadoop on vSphere over multiple years, on various hardware configurations. These tests have consistently shown that virtualized Hadoop performance is comparable to, and in some cases better than, that of a native equivalent.
Throughout this blog post I'll highlight some of the enhancements brought to the vSphere Web Client in 5.5 Update 3. This is especially important as we see customers continue to rely on the legacy vSphere Client (also referred to as the legacy C# client). Our goal is to make the Web Client everyone's primary management tool for vCenter Server and vSphere, and continuing to improve performance has been an essential requirement in getting there.
The Hadoop-based system running on vSphere described here was architected by Rajit Saha (who provided the material for this blog) and a team from VMware's IT department.
This article describes the technical infrastructure for a VMware internal IT project that was built and deployed in 2015 for analyzing VMware's own business data. Details of the business applications used in the system are not within the scope of this article. The virtualized Hadoop environment and modern analytics project was implemented entirely on the vSphere 6 platform.
The key lesson that we learned from this implementation is that you can start at a small scale with virtualizing big data/Hadoop and then scale the system up over time. You don’t need to wait for a large amount of hardware to become available to get started.
One question I'm commonly asked (weekly, if not daily) is: what is the perfect pCPU-to-vCPU ratio to plan for, and operate to, for maximum performance? I wanted to document my perspective for easy future reference.
There is no common ratio, and in fact this line of thinking will cause you operational pain. Let me tell you why.
In September we announced that VMware Tools 10.0.0 was released and that VMware is now shipping VMware Tools outside of the vSphere releases. Since then, we have received a lot of feedback from the community, customers, and internal folks alike. I would like to let everyone know that we have listened, and we continue on our path to make the VMware Tools lifecycle (and the ESXi lifecycle, for that matter) easier and less painful than it is today.
VMware is sponsoring Oracle OpenWorld in San Francisco at the Moscone Center starting Oct 26th. The following collateral will be of interest to anyone attending OOW, or any other 2015-16 Oracle conference, who is interested in VMware: