vSphere 6.0 introduces a powerful new feature as part of vSphere HA called VM Component Protection (VMCP). VMCP protects virtual machines from storage-related events, specifically Permanent Device Loss (PDL) and All Paths Down (APD) incidents.
The new vSphere Big Data Extensions Version 2.2 shipped on June 5, 2015!
Here is a quick summary of the new features that appear in the 2.2 release. This is an exciting and much-awaited release. As always, refer to the technical documents and the release notes to get more detail on these subjects.
• Support for the Latest Hadoop Distributions. BDE 2.2 supports the latest versions from the major Hadoop distribution vendors, including Bigtop 0.8, Cloudera CDH 5.4, Hortonworks HDP 2.2, MapR 4.1, and Pivotal PHD 3.0.
• Better Fully Qualified Domain Name (FQDN) Management. We found that some users had difficulty with generating FQDNs within their network for newly cloned virtual machines. BDE can now generate and propagate meaningful host names in FQDN form for your new virtual machines that host the Hadoop nodes. The new FQDNs will be registered to a DNS server if you are using a Dynamic DNS server.
• Shrink clusters. You can now easily reduce (as well as expand) the number of worker virtual machines that belong to a running Hadoop cluster. The virtual machines targeted for shrinking are quiesced, withdrawn from the Hadoop cluster, and then deleted so that all of the resources they used are released.
• Active Directory/Lightweight Directory Access Protocol (AD/LDAP) integration. You can use an AD/LDAP server to manage the accounts generated by BDE within the Hadoop nodes. You can specify the accounts to be Hadoop user accounts and/or service accounts in an AD/LDAP server.
• vSphere 6.0 Instant Clone. At the user's request, BDE uses Instant Clone technology to spin up new Hadoop VMs. This feature reduces both the time needed to spin up Hadoop VMs and their runtime footprint. Instant Clone is optional; you can still use the older "full clone" method if you prefer. We recommend that you start by using this new type of cloning for your test and development workloads.
• Centralized logging. You can configure BDE to direct logging information to an external syslog server, such as Log Insight.
• Quiesce the BDE management server. You can quiesce the BDE management server with a command so that you can safely back up its data for your clusters.
• Automatic GUI installation. The BDE GUI is automatically registered with vCenter after BDE is deployed.
• Support for the Latest Partner Hadoop Management Tools. BDE 2.2 supports Cloudera Manager 5.3 and Ambari 1.7. You have more flexibility to deploy Hadoop clusters, including a compute-only cluster, an HBase-only cluster, a data-compute separated cluster, etc., even when using a partner Hadoop management tool.
• Support for the Latest Isilon Version. BDE provides a fully automated process to deploy and manage compute-only clusters on OneFS 7.2.
• Big Data Extensions Upgrade. You can upgrade Big Data Extensions 2.1 to the current version, Big Data Extensions 2.2, and preserve all the data for the Hadoop clusters that were created using Big Data Extensions 2.1. All of your existing clusters can be managed by Big Data Extensions once the upgrade completes.
• Localization. BDE is localized into six languages: DE, FR, ZH_CN, ZH_TW, KO, and JA.
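To make the FQDN item in the list above more concrete, here is a minimal Python sketch of generating host names in FQDN form for a group of newly cloned nodes. The naming pattern (cluster name, node group, index, domain) and the function names are assumptions made up for illustration; they are not BDE's actual naming scheme or API.

```python
import re

# A DNS host label may contain letters, digits, and hyphens, must not start
# or end with a hyphen, and is at most 63 characters long.
_LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def make_fqdn(cluster: str, node_group: str, index: int, domain: str) -> str:
    """Build a name such as 'demo-worker-0.example.com' (hypothetical pattern)."""
    label = f"{cluster}-{node_group}-{index}".lower()
    if not _LABEL_RE.match(label):
        raise ValueError(f"invalid host label: {label!r}")
    fqdn = f"{label}.{domain.lower().strip('.')}"
    if len(fqdn) > 253:
        raise ValueError("FQDN exceeds 253 characters")
    return fqdn


def fqdns_for_group(cluster: str, node_group: str, count: int, domain: str) -> list:
    """Return one FQDN per node in the group, numbered from 0."""
    return [make_fqdn(cluster, node_group, i, domain) for i in range(count)]
```

For example, `fqdns_for_group("demo", "worker", 3, "example.com")` yields `demo-worker-0.example.com` through `demo-worker-2.example.com`. Registering those names with a Dynamic DNS server is a separate step that this sketch does not cover.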
Not yet on vSphere 6? Join us for a webcast to learn why you should be. Starting June 2nd, 2015 and recurring every other Tuesday at 9AM, join the vSphere product experts to learn what’s new and exciting about vSphere 6! A different topic will be covered each session and time will be allocated at the end of each webcast for Q&A.
Please always check the latest schedule each week as topics may change and sessions may be added or removed.
Since the introduction of Distributed Resource Scheduler (DRS) almost 10 years ago, it has become the most trusted way to ensure virtual machines are running at their peak performance. Over 80% of customers that have introduced DRS use it in fully automated mode, which allows for automatic placement and rebalance operations that simplify capacity planning and reduce administrative overhead.
There seems to be quite a bit of inaccurate information floating around recently about vSphere DRS. The most common thing I hear is that “DRS is focused on balancing hosts in the cluster, and is not focused on workload performance.” Actually, nothing could be further from the truth, and hopefully this will help explain how DRS is working to keep your VMs performing optimally.
vCenter Server has become a mission-critical part of most virtual infrastructures. It can be a single point of failure if it is not designed for availability. vCenter Server 6 introduces many changes to vCenter Server and its components, so its architecture has to be designed with careful consideration.
There are multiple solutions for high availability. Many of these options can be combined to provide different levels of availability. vSphere HA, FT, vCenter Watchdog services, and in-guest clustering solutions can be combined depending on customer requirements for availability.
The Platform Services Controller (PSC) serves many VMware solutions in addition to vCenter Server, such as vROps and View. The PSC deployment modes also have to be carefully evaluated based on unique customer requirements and architected appropriately.
The VMware vCenter Server 6.0 Availability Guide is a great resource for architecting an HA solution for vCenter Server. I hope you find it useful!
Project Lightwave Now Available
Today, we are happy to announce that Project Lightwave, an identity and access management project for cloud-native apps, has been released as a free, open source project and is now available via GitHub and JFrog Bintray. Project Lightwave was originally introduced last month (read the news release).
What is Project Lightwave?
Project Lightwave is made up of the following key identity infrastructure elements:
- Lightwave Directory Service - a standards-based, multi-tenant, multi-master, highly scalable LDAP v3 directory service that enables an enterprise’s infrastructure to be used by the most demanding applications as well as by multiple teams.
- Lightwave Certificate Authority - directory integrated certificate authority helps to simplify certificate-based operations and key management across the infrastructure.
- Lightwave Certificate Store - endpoint certificate store to store certificate credentials.
- Lightwave Authentication Services - cloud authentication services with support for Kerberos, OAuth 2.0/OpenID Connect, SAML, and WS-Trust, enabling interoperability with other standards-based technologies in the data center.
When paired with Project Photon, VMware’s lightweight Linux operating system for cloud-native apps, Project Lightwave helps to assure that only authorized objects can run in the infrastructure.
I believe most individuals know proper DNS configuration is essential to a smoothly operating VMware environment, or pretty much any environment, for that matter. However, there are a few cases where certain components must be deployed to an environment that does not have DNS servers. I had a question about this specific to vSphere Replication, so I decided to do some testing. My test environment consists of vCenter Server 6.0 running on Windows Server 2012 R2 in a virtual machine, a couple of local vSphere 6.0 hosts, and another vCenter Server 6.0 environment about 800 miles away from the local environment. I deployed a vSphere Replication 6.0 virtual appliance to the environment and removed the DNS server entries. It did not take long to see warnings and error messages in the UI.
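When testing an environment like this, it can help to confirm from a management workstation whether a given name resolves at all. The following is a minimal sketch using only Python's standard `socket` module; it is not part of vSphere Replication, and the host name in the usage note is hypothetical. The resolver functions are passed in as parameters so the check can also be exercised without a live DNS server.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name); a value is None when that lookup fails."""
    try:
        # Forward lookup: name -> IPv4 address.
        ip = forward(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return None, None
    try:
        # Reverse lookup: address -> primary host name.
        reverse_name = reverse(ip)[0]
    except OSError:
        reverse_name = None
    return ip, reverse_name
```

Calling `check_dns("vr01.example.local")` with the default resolvers returns `(None, None)` when the forward lookup fails, which is what you would expect to see once the DNS server entries have been removed and no hosts-file entry exists for that name.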
Two new white papers are now available on the work done at Adobe on virtualizing Hadoop. The VMware-authored paper, Adobe Deploys Hadoop as a Service on VMware vSphere, focuses on the business background and justifications for virtualizing the workload. It also describes how the central Technical Operations function implemented Hadoop-as-a-Service to satisfy the needs of the business units and data analysis groups that require Hadoop as a platform. This paper also gives details about the use of the vSphere Big Data Extensions tool, which was used heavily in the project, as well as the connection to vRealize Automation that forms the basis for the cloud offering at Adobe.
The second, complementary white paper on the same architecture, Virtualizing Hadoop in Large-Scale Infrastructures, was written by the EMC consulting team that supported the project. The EMC paper focuses on the technical reference architecture for the Proof-of-Concept (POC) conducted in late 2014, the results of that POC, the performance tuning work, and the physical topology that was deployed using Isilon storage. The two papers were written in concert by the two organizations and should be read together for a full picture of the Hadoop virtualization project. This system is now live at Adobe Digital Marketing, hosted on their Virtual Private Cloud, and it is being used by different groups within the big data community there. Together, the papers also provide an outline reference architecture for use in other installations. Watch this space; more technical case studies are in the works.
Speaking of technical reference material for Hadoop on vSphere, here is the current list of technical papers and websites that are available for people who want to learn more about this subject:
Big Data/Hadoop on VMware vSphere - Reference Materials
- Virtualizing Hadoop - a Deployment Guide
- Deploying Virtualized Cloudera CDH on vSphere using Isilon Storage - Technical Guide from EMC/Isilon or find the latest version at https://community.emc.com/docs/DOC-26892
- Deploying Virtualized Hortonworks HDP on vSphere using Isilon Storage - Technical Guide from EMC/Isilon or as above https://community.emc.com/docs/DOC-26892
- Cloudera Reference Architecture - Isilon version
- Cloudera Reference Architecture – Direct Attached Storage version
- Big Data with Cisco UCS and EMC Isilon: Building a 60 Node Hadoop Cluster (using Cloudera)
- Scaling the Deployment of Multiple Hadoop Workloads on a Virtualized Infrastructure (Intel, Dell and VMware)
Customer Case Studies
- Adobe Deploys Hadoop-as-a-Service on VMware vSphere
- Virtualizing Hadoop in Large-Scale Infrastructures – technical white paper by EMC
There are some very useful best practices in the first two technical papers.
- Virtualized Hadoop Performance with VMware vSphere® 6 on High-Performance Servers
- Virtualized Hadoop Performance with VMware vSphere 5.1
- A Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5
- Transaction Processing Performance Council – TPCx-HS Benchmark Results (Cloudera on VMware performance, submitted by Dell)
- ESG Lab Review: VCE Vblock Systems with EMC Isilon for Enterprise Hadoop
vSphere Big Data Extensions (BDE)
- VMware BDE Documentation site
- VMware vSphere Big Data Extensions - Administrator's and User's Guide and Command Line Interface User's Guide
- Blog articles on BDE Version 2.1 - See the embedded Blogs from the Hadoop distro vendors also.
- VMware Big Data Extensions (BDE) Community Discussion
- Apache Hadoop Storage Provisioning Using VMware vSphere Big Data Extensions
- Hadoop Virtualization Extensions (HVE)
- Demos of Big Data Extensions
Other vSphere Features and Big Data
When designing vCenter Site Recovery Manager environments the question of how to organize Protection Groups (PG) frequently comes up. In this post we'll review what a protection group is, where it fits in the context of SRM and the factors to keep in mind when deciding how to organize them.