Monthly Archives: June 2011

Tips on Migrating to ESXi

Over the past couple of months I've been on the road presenting on the topic of ESXi migration at several regional VMware User Group (VMUG) conferences. When presenting on ESXi migration, one question that always gets asked is "what are the gotchas, and what can I do to avoid them?". Based on this I was planning to put together a "Top 10" list of things to watch for when migrating to ESXi, when I came across a great blog article by Rick Vanover at Virtualization Review titled Top 5 Tips for Migrating to ESXi. It's a must read for anyone getting ready to migrate to ESXi. Enjoy the article, and be sure to check back for more tips on making the transition to ESXi.

PowerCLI and SRM – what you need to know!

Hello all,

I did say some time ago that I would help get you started with PowerCLI in the SRM world. It has taken me a while to get here, but today is the day. I had a lot of issues making it work – my own issues, and tough ones – but the wonderful Alan Renouf of this blog and this one was a big help!

I suggest you review the SRM and Scripting blog I did previously for background info.

There are a number of things to do before you can execute a PowerCLI script during a failover.

PowerShell / PowerCLI

You need to have PowerCLI running on the SRM server. Use the instructions here to do that, and be sure to test and confirm that everything works before going any further.

Since the PowerCLI scripts will need to execute in a security context that works for them, I suggest that you consider starting the SRM service as the user you want to execute the scripts – this is how I do it. If you do, make the change and confirm that SRM is running afterwards. You may need to do this on both SRM servers if you do protection in both directions. An alternative may be to store the necessary credentials in the PS script.
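If you go the stored-credentials route, PowerCLI's credential store can keep the password out of the script body itself. Here is a minimal sketch – the vCenter name and service account are placeholders, not from my lab:

    # One-time setup, run as the user the SRM service runs as (names are placeholders)
    New-VICredentialStoreItem -Host "vc01.example.com" -User "svc-srm" -Password "S3cret!"

    # In the failover script, Connect-VIServer then picks up the stored credentials
    Connect-VIServer -Server "vc01.example.com"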

Script Repository

You need a place to keep your scripts. I use G:\Scripts on my SRM server. I have found it works best when the scripts are local.

Testing

You should log on to the SRM console as the user the service runs as and make sure your scripts work as you expect them to at the command line!

Command line for SRM

The command line you will add to the Command step is simple:

c:\windows\system32\windowspowershell\v1.0\powershell.exe -file g:\scripts\listofvms.ps1
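For reference, here is a minimal sketch of what a script like listofvms.ps1 might contain – the vCenter name and output path are placeholders:

    # listofvms.ps1 - minimal sketch; server name and output path are placeholders
    Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue
    Connect-VIServer -Server "vc01.example.com"   # uses stored credentials (see above)
    Get-VM | Select-Object Name, PowerState | Out-File "g:\scripts\vmlist.txt"
    Disconnect-VIServer -Confirm:$false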

If you need help with setting up or testing the execution of scripts, you can check out the blog article I wrote about it, called SRM and Scripting, which is here.

Remember to put the script in the right place.  Often that is in the Post Actions for the VM it may impact.  You can learn more about script placement here.

Testing your Script as part of SRM failover

Once your script is done and stored in the right place, and you have the right command line in a call out, you are ready to do your test failover. Once it is done, make sure there are no errors in the History Report, and see if your script has done what it was supposed to do.

Summary

You have just installed PowerCLI, configured SRM to execute a PowerCLI script, and tested it. This is just the start. You now have the ability to have very powerful scripts executed exactly when you need them in the failover workflows of SRM. There are great examples of scripts you can use in the VMware vSphere PowerCLI Reference book.

If you have questions or comments let me know in the comments section.  I appreciate all comments.  Below are some links that may be useful for you.

PS: I apologize for the lack of screenshots. My labs have all been upgraded to a new version of SRM (wait until you see my blog postings on July 12th!). I will add pictures in the future. In the meantime, if you have problems please leave me a comment.

Michael

PowerCLI help – http://www.simple-talk.com/sysadmin/virtualization/10-steps-to-kick-start-your-vmware-automation-with-powercli/

Can a script stop a recovery plan – /uptime/2010/08/can-a-script-or-message-call-out-stop-a-recovery-plan-and-a-little-bit-more.html

Getting Started with scripting and SRM – /uptime/2010/09/vmware-vcenter-site-recovery-manager-and-scripting-.html

PowerCLI community (samples and help) – http://communities.vmware.com/community/vmtn/developer

Failover of persistent desktops using SRM and View 4.6 – http://www.virtu-al.net/2011/06/07/failover-of-persistent-desktops-using-srm-and-view-4-6/


Going to Virtualize Oracle?

Below is an upcoming webcast on virtualizing Oracle from the customer perspective. The intent is to share lessons learned from two customers that have implemented Oracle on VMware and use EMC storage.

Click here for more information

N1k Load balancing options

By Hugo Strydom, Managing Consultant, Professional Services

I have been asked a few times what the best option is for load balancing with the N1k switch. In this post I want to give an overview of some of the load balancing options available in the N1k solution. We will look at the different configuration options and when to use each. From a VMware ESXi host licensing perspective, you do need Enterprise Plus on the ESXi hosts on which you want to use the N1k vDS.

The N1k supports two types of upstream connectivity, namely:

  • Standard uplinks
  • PortChannel

With standard uplinks, the uplinks are not members of any PortChannel configuration and provide no load balancing or failover capabilities.

With PortChannel, there are two types of PortChannel mechanisms to use:

  • Standard PortChannel
  • Special PortChannel

Standard PortChannel supports LACP and will act like an EtherChannel. The upstream switches must also be configured with LACP. This type of PortChannel can be configured across multiple physical switches that are part of a cluster. Cisco switches that support clustering include the Nexus 7000 and 5000.

Special PortChannel is used when the upstream physical switches do not support or use clustering. There are two types of configuration options that can be used:

  • MAC Pinning
  • vPC Host Mode

With MAC Pinning, all the uplinks defined on the N1k VEM are treated as standalone links, and different MACs are pinned to the uplinks based on a load balancing algorithm. This ensures that there is no MAC flapping on the upstream switches. An advantage of MAC pinning is that no special configuration is needed on the upstream switches.

vPC Host Mode allows you to configure a PortChannel on the N1k even if the upstream physical switches do not support PortChannel. Thus if a host has 4 pNICs, with 2 connected to pSwitch A and 2 connected to pSwitch B, two subgroup PortChannels will be created, one across each pair of pNICs attached to a pSwitch. Uplinks that connect to the same pSwitch are automatically paired into the same subgroup, using CDP information from the upstream switch to create the subgroup PortChannel pairs.


More on Scripting with SRM – View desktops!

Hello all,

I wanted to draw your attention to a very interesting blog a co-worker has written about using PowerCLI and SRM to fail desktops over in a crisis. It is a creative use of both SRM and PowerCLI. Currently SRM cannot protect View desktops, but Alan has shown how it can be done with a clever design and use of PowerCLI.

Check it out here – http://www.virtu-al.net/2011/06/07/failover-of-persistent-desktops-using-srm-and-view-4-6/

I have in the past done several blog posts that should help you out with scripting and SRM, and hopefully you will see more in the future.

Getting started with SRM and scripting – /uptime/2010/09/vmware-vcenter-site-recovery-manager-and-scripting-.html

Can a script stop a recovery plan – /uptime/2010/08/can-a-script-or-message-call-out-stop-a-recovery-plan-and-a-little-bit-more.html

I think that PowerCLI is one of the best tools for scripting with SRM and you can get some help with that at http://www.simple-talk.com/sysadmin/virtualization/10-steps-to-kick-start-your-vmware-automation-with-powercli/ .

Have a great day!

Michael

Considerations for DMZ, iSCSI and Private vDS on same ESXi/Cluster

Before we start on this topic, a disclaimer: this is not the only configuration that can be used for vDS switches connecting to DMZ, iSCSI, and public networks. It is merely one configuration that can be considered, and it has its merits, which is why I am mentioning it here. The scenario we are looking at is a client using iSCSI for storage (software initiator) that has a DMZ environment and also needs a private network, all on the same ESXi and cluster infrastructure, in a way that can be used across multiple clusters. Also refer to this blog on vDS and vSwitch security.

Configuration of ESXi hardware and vCenter:

  • For this configuration we will use 3 vDSs (a PowerCLI sketch of creating them follows this list).
  • 6 pNICs will be used (2 per vDS).
  • Permissions will be assigned at the vDS level and at the port group level.
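Something like the following could create the three switches – the switch names and datacenter are placeholders, and New-VDSwitch assumes a PowerCLI release that includes the distributed switch cmdlets:

    # Sketch only - names are placeholders
    $dc = Get-Datacenter -Name "DC01"
    foreach ($name in "DMZ-vDS", "iSCSI-vDS", "Public-vDS") {
        New-VDSwitch -Name $name -Location $dc -NumUplinkPorts 2
    }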

DMZ vDS configuration and settings

Permissions for creating and deleting a vDS can only be applied at the Datacenter level. Ensure the correct people have the correct rights to create/delete/modify vDS settings. The following roles can be created and assigned to administrators of the network configuration of the DMZ vDS (a PowerCLI sketch of creating and assigning the Port Group Admin role follows below):

Port Group Admin: dvPort group / Create, Delete, Modify, Policy operation, Scope operation

 > Assign at the Port Group level

vDS Admin: vNetwork Distributed Switch / Create, Delete, Host operation, Modify, Move, NIOC, Policy operation, Port configuration, Port settings operation, VSPAN operation

 > Assign at the Datacenter level

Care should be taken not to allow users to change their VM network settings. Thus, once a VM has been provisioned into a DMZ network, users should not have permissions to change the vNIC port groups of the VM. Consider removing the following permissions:

 > Network / Assign network, Configure, Move network, Remove
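As promised above, a hedged PowerCLI sketch of creating and assigning the Port Group Admin role – the privilege IDs are my recollection of the vSphere identifiers (verify them with Get-VIPrivilege), and the role, port group, and group names are placeholders:

    # Sketch: build the role from the dvPort group privileges
    $privs = Get-VIPrivilege -Id "DVPortgroup.Create", "DVPortgroup.Delete",
        "DVPortgroup.Modify", "DVPortgroup.PolicyOp", "DVPortgroup.ScopeOp"
    New-VIRole -Name "DMZ Port Group Admin" -Privilege $privs

    # Assign it at the port group level (entity and principal are placeholders)
    New-VIPermission -Entity (Get-VDPortgroup -Name "DMZ-PG01") `
        -Principal "DOMAIN\DMZ-NetAdmins" -Role "DMZ Port Group Admin"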

For the DMZ vDS port group security policies, ensure that Promiscuous Mode, MAC Address Changes, and Forged Transmits are all set to "Reject". Teaming and failover can be set to use LBT, with all pNICs active. Also consider enabling NIOC.
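The security policy can be scripted as well. A sketch, assuming the newer PowerCLI distributed switch cmdlets are available and using a placeholder port group name:

    # Sketch: reject all three security policy options on a DMZ port group
    Get-VDPortgroup -Name "DMZ-PG01" | Get-VDSecurityPolicy |
        Set-VDSecurityPolicy -AllowPromiscuous $false -MacChanges $false -ForgedTransmits $false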

iSCSI vDS configuration and settings

One of the main reasons to have a separate vDS for the iSCSI network is to be able to set jumbo frames (MTU 9000). This is done at the vDS level, not at the port group level. The following role can be created and assigned to administrators of the iSCSI configuration of the iSCSI vDS:

Port Group Admin: dvPort group / Create, Delete, Modify, Policy operation, Scope operation

 > Assign at the Port Group level of the iSCSI vDS

It is recommended to use a separate Layer 2 switch for the iSCSI traffic. Ensure that no Layer 3 routing takes place on any of the iSCSI VLANs. Depending on the iSCSI hardware and configuration, you may have to create multiple VMkernel ports to provide multiple paths to the storage unit. The VMkernel traffic can be load balanced across the pNICs using LBT, so consider enabling LBT on the iSCSI port groups.
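A sketch of the MTU and VMkernel setup in PowerCLI – the host, switch, and port group names and the addressing are placeholders, and the distributed switch cmdlets are assumed to be available:

    # Sketch: jumbo frames are set on the vDS itself, then a VMkernel port is added per path
    Get-VDSwitch -Name "iSCSI-vDS" | Set-VDSwitch -Mtu 9000
    New-VMHostNetworkAdapter -VMHost "esx01.example.com" -VirtualSwitch (Get-VDSwitch -Name "iSCSI-vDS") `
        -PortGroup "iSCSI-Path-A" -IP "192.168.50.11" -SubnetMask "255.255.255.0" -Mtu 9000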

Public vDS configuration and settings

Since the vMotion and VMkernel (management traffic) port groups will be located on this vDS, consider setting a "no access" permission for users that do not need access to these port groups. This will prevent accidental or intentional placement of VMs into these management port groups.

To enhance security, set the vDS port group security policies: ensure that Promiscuous Mode, MAC Address Changes, and Forged Transmits are all set to "Reject". There are only 2 pNICs attached to this vDS, so for optimal pNIC load balancing with traffic prioritization, enable Network I/O Control on this vDS and enable LBT on each port group.
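A sketch of the "no access" assignment mentioned above – the principal and port group names are placeholders:

    # Sketch: explicitly deny a group access to the management port groups
    New-VIPermission -Entity (Get-VDPortgroup -Name "MGMT-Network") `
        -Principal "DOMAIN\VM-Operators" -Role (Get-VIRole -Name "NoAccess") -Propagate:$true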

Conclusion

The above are general guidelines that can be used to set up a vDS environment connected to DMZ, iSCSI, and public networks. In addition, host profiles can be used to create a consistent ESXi network configuration across all ESXi hosts in a cluster and to do compliance checking.

VDS Security

By Shudong Zhou, Sr. Staff Engineer, ESX Networking

From time to time, I get queries about VDS architecture, particularly about how vswitches are implemented in the kernel. Do different vswitches share the same code? What’s the object layout? A few email exchanges later, it was clear that the questioners really wanted an assessment of the security risk of VDS, and they thought they could get it from a raw description of the VDS implementation architecture!

You can get an idea of VDS security from experts whose job is to assess security risk. One reference point is that VI3 earned Common Criteria EAL4+ certification. The vswitch data plane was thoroughly tested and the code was reviewed as part of that process. Granted, VDS didn’t exist in VI3, but the VDS data plane follows the same architecture as in ESX 3. More recently, my colleagues and I worked with CESG, a UK government organization, to evaluate the security of VDS in the upcoming release of vSphere. I received a number of comments, and the most serious one was about adding parentheses around a macro in the source code.

Another way to look at the security risk of VDS is via statistics. Assume there are 20 million VDS ports (you can get more accurate numbers from Gartner reports) running for a period of a year. Since we have never issued a security patch in the VDS area, as far as I know, your chance of running into a security issue with VDS is less than 1 in 20 million port-years. Suppose you run 1000 VMs for a year; the security risk due to VDS would be less than 1 in 20,000. I know that ports are not independent, etc., but this is just a rough estimate. In contrast, human errors are far more probable. I consider myself a decent developer. Over the last 4+ years, I made 775 code check-ins, of which 11 were backouts. That error rate is about 1 in 70, and it is a lower bound, since not all errors resulted in a backout. I’m not a security expert, but I know that a system is only as secure as its weakest link. I hope I have convinced you that VDS is not the weakest link.

Some time back, a customer wanted to connect separate physical networks to a cluster of ESX hosts. They had a choice of using N1K or VDS. With VDS, you create one VDS instance per physical network, so different networks are managed separately. With N1K, you have to connect all networks to a single N1K instance. There was a debate about whether the VDS approach is more secure. Someone from Cisco posted a blog on why there is no difference, after a long, winding lecture on software architecture. The author totally missed the point: the weakest link is human error, and the VDS approach leaves less room for human error.

VDS vs. Cisco N1K

By Shudong Zhou, Sr. Staff Engineer, ESX Networking

I often get questions about the difference between the VMware Distributed Switch (VDS) and the Cisco Nexus 1000v (N1K). At a high level, VDS presents an integrated model where both network and VM deployment are managed from a single place, while N1K caters to organizations with a separate networking group trained on the Cisco CLI. What I really want to talk about is the implementation architecture.

VDS

The VDS data plane is unique in that it is not a learning switch. Because all VMs are in software, the hypervisor knows the unicast and multicast addresses programmed into each virtual NIC, and VDS makes use of this authoritative information to do packet forwarding. The beauty is in the simplicity: when a packet enters the data plane, if the destination MAC matches a local port, it is forwarded there; otherwise, it is sent out one of the uplinks. Since all forwarding decisions are local, the scalability is effectively unlimited. VDS can span as many hosts as vCenter can manage, and can span long distances, assuming you don’t run into limitations in other parts of the virtualization stack.

The simple scheme has a few consequences. Because VDS does not learn, it cannot automatically pick the right uplink to send packets out of. In the spirit of keeping it simple, we require that all uplinks connected to a VDS be connected to the same physical network (surprisingly, this isn’t documented anywhere). This way, it doesn’t matter which uplink packets are sent out of. If you have separate physical networks connected via different network adapters, you need to create one VDS for each physical network (different VLANs going through the same adapter don’t count as separate physical networks).

Another consequence is that VDS can handle duplicate vNIC MAC addresses. The data plane never complains about duplicate MAC addresses; it just forwards packets to all matching ports. In fact, the implementations of VMware Lab Manager and vCloud Director Networking Infrastructure take advantage of this.
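To make the forwarding rule concrete, here is a toy PowerShell sketch of the decision described above – the port table, the Send-ToPort function, and the uplink selection are all invented placeholders, not VMware code:

    # Toy illustration only. The port table is authoritative: vNIC MACs are
    # registered when VMs power on, so no learning is needed.
    function Forward-Packet($destMac, $portTable, $uplinks) {
        if ($portTable.ContainsKey($destMac)) {
            # Duplicate MACs are fine: deliver to every matching local port
            foreach ($port in $portTable[$destMac]) { Send-ToPort $port }
        } else {
            # No local match: any uplink will do, since all uplinks must
            # reach the same physical network
            Send-ToPort ($uplinks | Get-Random)
        }
    }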

VDS as a product is far more complex than what I described above. You can find some details of the VDS implementation in a paper I wrote in the Dec. 2010 issue of OS Review. Unfortunately, the site asks you to buy the paper. I thought about posting a copy myself, but I’m not really sure about the copyright and legal implications.

N1K

N1K is a hybrid of two implementations, one in software and one in hardware. When we started on VDS four years ago, Cisco formed a new group to implement a software switch in ESX; the project code name was Swordfish. Since the migration of access ports into the hypervisor was inevitable, Cisco might as well claim a piece of the territory. Later on, Nuova approached us about the VN-Tag technology. The idea is more radical: all VM traffic is sent out to the physical switch with a tag identifying the vNic port, and the physical switch does all the VM-to-VM packet forwarding. Effectively, the technology moves access ports back into the physical switch. To make it happen, Nuova needed our help to get the traffic out of the hypervisor. When Nuova was merged into Cisco, the two teams were merged into the same Cisco BU and the two switching schemes were merged into a single product: the Cisco Nexus 1000v. When N1K is installed on UCS systems with the Palo adapter, the hardware switching module takes effect; otherwise, the software switching module is activated.

Swordfish

I have not seen the Cisco switching code, so I can’t offer more insight than what’s publicly available. Swordfish is a learning switch. There is a controller, which can be installed as a VM or purchased in a hardware box (Nexus 1010). The controller must be up for the data plane to function, so you should deploy dual controllers to avoid a single point of failure. Swordfish provides a rich set of features commonly available in Cisco hardware switches, richer than VDS.

Having a central controller provides some deployment flexibility. For example, you can enable PVLAN within N1K without any physical switch support. The PVLAN feature in VDS, in contrast, requires the same PVLAN map to be configured in the physical switches. On the other hand, the central controller can be a liability when it comes to scalability. The current N1K limit is 64 hosts. Spanning N1K over long distance could be a challenge as well.

VN-Tag

VN-Tag is a great technology. When coupled with passthrough, it takes the virtual switch completely out of the picture. However, hardware VN-Tag will cost more per VM, since a virtual port costs only a small amount of physical memory. Furthermore, passthrough requires guest VM memory to be locked, killing memory overcommit and the consolidation ratio. So I think VN-Tag is a niche technology at best. It might make sense to run VN-Tag alongside Swordfish, where only I/O-intensive workloads are put on VN-Tag.

Moving forward, the question is which implementation provides a better foundation for multi-tenancy and scalability in the cloud environment. Only time will tell.

Virtual Networking Observations

By Shudong Zhou, Sr. Staff Engineer, ESX Networking

This is the first of several posts about VMware virtual networking infrastructure. The goal is to help readers gain a deeper technical understanding and appreciation of the VMware offering. This is my personal view and does not represent VMware’s official position.

I joined VMware in Oct. 2006, around the time ESX 3.5 was about to be released. While VMware was growing rapidly at the time, Hyper-V was looming on the horizon. On the networking front, 10G NICs were ready to enter the market, and ESX was facing resistance from enterprise networking administrators. I joined the effort to tackle the issue of network management.

Virtual Switch

VMware introduced a virtual switch in ESX 2 to provide efficient traffic forwarding between VMs inside an ESX host and to provide redundant connections to the physical network. The virtual switch was not popular among network admins, for several reasons:

  1. It’s a foreign concept introduced by a company not known for networking.
  2. The design requires that physical switch ports be configured in trunked (VLAN) mode. This was scary to many (and is still scary to some admins today).
  3. All the safety valves applied to access ports are no longer available. Suppose you set up a list of allowed MACs: you would need to enter the MAC address of every VM that could potentially run on the host. This is practically impossible, particularly when VMs can migrate between hosts dynamically.

Distributed Switch and Cisco N1K

We decided to do two things to address these challenges. One was to bring access switch features into the virtual switch; the other was to make a Cisco-branded virtual switch available on ESX. In designing the new beast, we made the following choices:

  • Since a VM can vMotion from one ESX host to another, access port features must move with the VM. This means ports should exist independently of ESX hosts. The obvious choice was to give each port its own identity (dvPort ID) and label the new switch “distributed”.
  • We chose a centralized approach to implementation. The distributed switch is created and managed from vCenter. Information is pushed to ESX hosts on demand as VMs are deployed and moved around. Once deployed, the data plane (packet forwarding) works independently of vCenter.
  • There would be two implementations of the distributed switch, one provided by VMware and one by Cisco. The VMware implementation was called Distributed Virtual Switch (DVS) during development, was renamed vNetwork Distributed Switch (vDS) when released in vSphere 4.0, and its current name is VMware Distributed Switch (VDS).

Our effort clearly paid off, as VDS and Cisco N1K are now widely accepted by the industry. The concept of a “distributed” switch was corroborated by Citrix with the release of the Citrix Distributed Virtual Switch in late 2010. In the meantime, my colleagues at VMware did a tremendous job achieving 10G line rate with TSO, jumbo frames, and other offload technologies. And this was done without requiring passthrough or TOE! VMware survived Hyper-V and other competition.

Moving forward, we are looking to bring other network services currently in the physical network into the hypervisor. We are also looking to take advantage of 40G and 100G. If you like what we are doing and would like to join the effort, see job opportunities here.

 

vDS config location and HA

By Hugo Strydom, Managing Consultant, Professional Services

Have you ever wondered what those .dvsData folders are on some of your datastores? Well, in this article I will give some insight into what they are and what they are used for.

First, some information about the vDS itself. Each vDS has a UUID in the format “6a de 0e 50 80 32 76 68-e0 72 fd 00 c3 23 52 92” (as an example). When you look inside a datastore you will sometimes see a .dvsData folder (I will explain when shortly). If you look in there, you will see one or more of these numbers. Each number corresponds to a vDS that is used by the ESXi host.

So, some rules around these files:

  • The .dvsData folder is only created if a VM on that datastore is attached to a vDS. Thus, if a datastore holds only VMs attached to a standard vSwitch, there will be no .dvsData folder. Also, only the datastore that holds the .vmx config file will have the .dvsData folder.
  • Inside the .dvsData folder there is a folder named for the UUID of each vDS.
  • Inside the UUID folder are small files whose names are numbers. Each number corresponds to the ethernetX.dvs.portId entry inside the .vmx file of a VM. In the example below, ethernet0.dvs.portId = 10:

    ethernet0.dvs.switchId = "6a de 0e 50 80 32 76 68-e0 72 fd 00 c3 23 52 92"

    ethernet0.dvs.portId = "10"

    ethernet0.dvs.portgroupId = "dvportgroup-208"

    ethernet0.dvs.connectionId = "177280995"

Now why is this file (“10”) important? It is needed by HA to restart the VM on another host. Certain information (port state, MTU, runtime packet stats) must be transferred to the new host when starting up the VM, and this file holds that information. In a vMotion of a VM, this information is transferred as part of the copy to the other host.
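Related: if you want to see which of your VMs have vNICs attached to a vDS (and therefore which datastores will carry a .dvsData folder), a small PowerCLI sketch like the following can help – it simply inspects each vNIC's backing type:

    # Sketch: list VMs whose vNICs connect to a distributed switch port
    Get-VM | Where-Object {
        $_.ExtensionData.Config.Hardware.Device | Where-Object {
            $_.Backing -is [VMware.Vim.VirtualEthernetCardDistributedVirtualPortBackingInfo]
        }
    } | Select-Object Name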

There are some config files that are also important:

  • /etc/vmware/dvsdata.db
    • Updated by hostd every 5 minutes
    • Contains data for persistent dvPorts (vmkernel)
  • The smaller files (“10” in our case)
    • Updated with dvPort information every 5 minutes by hostd
    • Contains data for the VM’s vNICs that are attached to the vDS