When I first heard about VMware Mirage in 2012, when VMware acquired Wanova and integrated Mirage into our portfolio, it was seen mainly as a backup solution for desktops or as a tool for migrating from Windows XP to Windows 7. With an extension, it could also easily migrate a physical desktop to a virtual one. So most of the time, when we had to design a Mirage solution, the question of DRP or HA was met with, “Why back up a backup solution?” Mirage was not seen as a strategic tool.
This has changed, and VMware Mirage is now fully integrated as an End-User Computing (EUC) tool to manage user desktops across different use cases. Of course, we still have backup and migration use cases, but more and more customers also use it to ensure desktops conform to IT rules and policies, and they therefore need a reliable Mirage infrastructure. In this post we’ll describe how to design a reliable infrastructure, or at least give the key points for different scenarios.
Let’s first have a look at the different components of a Mirage infrastructure:
Figure 1 – Basic VMware Mirage Components
- Microsoft SQL Database—The MS SQL database contains all the configurations and settings of the Mirage infrastructure. This component is critical; if the Microsoft SQL database fails, then all Mirage transactions and services (the Mirage Management Server service and the Mirage Server service) stop.
- SMB Shared Volumes—These can be any combination of NAS or Windows Server shares. Desktop files, apps, base layers, and USMT files are all stored on these volumes (except small files and metadata).
- Mirage Management Server—This is used to manage the Mirage infrastructure, and it also hosts a MongoDB server instance in Mirage 5.4 and later. If it fails, administration is not possible until a new one is installed, and desktops cannot be recovered because the small files stored in MongoDB are no longer available.
- Mirage Server—This is the server that Mirage clients connect to. Often, several Mirage servers are installed and placed behind load-balancers to provide redundancy and scalability.
- Web Management—A standard Web server service can be used to manage Mirage using a Web interface instead of the Mirage Management Console. The installation is quite simple and does not require extra configuration, but note that data is not stored on the Web management server.
- File Portal—Similar to Web management above, it is a standard Web server service used by end users to retrieve their files using a Web interface, and again, data is not stored on the file portal server.
- Mirage Gateway—This is used by end users to connect to the Mirage infrastructure from an external network.
Now, let’s take a look at the different components of VMware Mirage and see which components can be easily configured for a reliable and redundant solution:
- Mirage Management Server—This is straightforward, and actually mandatory: because of MongoDB, we need to install at least one additional management server, and the MongoDB instances synchronize automatically. The last step is to put a VIP on a load-balancer in front of the management servers, so that traffic is routed to any available one. The maximum number of Mirage management servers is seven due to MongoDB restrictions. Keep in mind that more than two members can reduce performance, because each write to the database must wait for acknowledgement from all members. The recommended number of management servers is two.
- Mirage Server—By default we install at least two Mirage servers: one Mirage server per 1,000 to 1,500 centralized virtual desktops (CVDs), depending on the hardware configuration, plus one for redundancy. Load-balancers route client traffic to any available Mirage server.
- Web Management and File Portal—Since these are just Web applications installed over Microsoft IIS servers, we can deploy them on two or more Web servers and use load-balancers in order to provide the required redundancy.
- Mirage Gateway—This is an appliance, and the approach is the same as for the previous components: we just deploy an additional appliance and configure load-balancers in front of them. As with the Mirage server, the number of connections per Mirage gateway is limited, so plan for no more than 3,000 endpoints per appliance, and add one appliance for resiliency.
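The sizing rules in the list above reduce to simple arithmetic. The following is a minimal sketch, not an official sizing tool: the function name `size_mirage` and its parameters are hypothetical, and the ratios are only the figures quoted in this post (1,000 to 1,500 CVDs per Mirage server, 3,000 endpoints per gateway, two management servers). It also assumes that gateways are only needed when some endpoints connect from an external network.

```python
import math

# Figures quoted in this post; adjust to your own hardware and guidance.
ENDPOINTS_PER_GATEWAY = 3000
RECOMMENDED_MANAGEMENT_SERVERS = 2


def size_mirage(cvds, external_endpoints, cvds_per_server=1000):
    """Return a rough component count for a redundant Mirage deployment.

    cvds_per_server is an assumption between 1,000 and 1,500,
    depending on the hardware configuration.
    """
    if cvds < 0 or external_endpoints < 0 or cvds_per_server <= 0:
        raise ValueError("counts must be non-negative and ratio positive")

    # One server per block of CVDs, plus one for redundancy, never fewer than two.
    mirage_servers = max(2, math.ceil(cvds / cvds_per_server) + 1)

    # Assumption: gateways only serve endpoints on an external network.
    if external_endpoints == 0:
        gateways = 0
    else:
        gateways = math.ceil(external_endpoints / ENDPOINTS_PER_GATEWAY) + 1

    return {
        "mirage_servers": mirage_servers,
        "mirage_gateways": gateways,
        "management_servers": RECOMMENDED_MANAGEMENT_SERVERS,
    }


if __name__ == "__main__":
    print(size_mirage(cvds=5000, external_endpoints=4000))
```

For example, 5,000 CVDs at 1,000 CVDs per server give five servers plus one for redundancy, and 4,000 external endpoints give two gateways plus one for resiliency.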
Note: Most components can be placed behind a load-balancer. To get the best performance and prevent issues such as frequent disconnections, it is recommended to set the load-balancer to support the following:
- Two TCP connections per endpoint, and up to 40,000 TCP connections for each Mirage cluster
- Change MSS in FastL4 protocol (F5) from 1460 to 900
- Increase timeout from five minutes to six hours
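As an illustration, these recommendations can be turned into a small pre-deployment check. This is only a sketch: the setting names (`max_tcp_connections`, `mss_bytes`, `idle_timeout_seconds`) are hypothetical and do not correspond to any particular load-balancer's API, and treating 40,000 connections as a per-cluster ceiling is my reading of the figure above.

```python
# Recommended values taken from the list above.
TCP_CONNECTIONS_PER_ENDPOINT = 2
MAX_TCP_CONNECTIONS_PER_CLUSTER = 40_000
RECOMMENDED_MSS_BYTES = 900
RECOMMENDED_IDLE_TIMEOUT_SECONDS = 6 * 60 * 60  # six hours


def check_lb_settings(settings, endpoints):
    """Return a list of human-readable problems (empty if none found).

    `settings` is a plain dict with hypothetical keys; map them to
    whatever your load-balancer actually exposes.
    """
    problems = []
    needed = endpoints * TCP_CONNECTIONS_PER_ENDPOINT

    if needed > MAX_TCP_CONNECTIONS_PER_CLUSTER:
        problems.append(
            f"{needed} connections needed, above the "
            f"{MAX_TCP_CONNECTIONS_PER_CLUSTER} quoted per Mirage cluster"
        )
    if settings.get("max_tcp_connections", 0) < needed:
        problems.append(f"load-balancer must allow at least {needed} TCP connections")
    if settings.get("mss_bytes") != RECOMMENDED_MSS_BYTES:
        problems.append(f"MSS should be {RECOMMENDED_MSS_BYTES} bytes")
    if settings.get("idle_timeout_seconds", 0) < RECOMMENDED_IDLE_TIMEOUT_SECONDS:
        problems.append("idle timeout should be at least six hours")

    return problems
```

Called with the starting values mentioned above (MSS 1460, five-minute timeout), the check reports the MSS and the timeout as problems; with the recommended values it returns an empty list.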
Basically, all Mirage components can be easily deployed in a redundant way, but they rely on two key external components that work jointly: the Microsoft SQL database and the SMB shared volumes. This means we have to pay special attention to which scenario is chosen:
- Simple backup
- Database continuity
- Or full disaster recovery
The level of effort required is not the same and depends on the RPO/RTO required.
So let’s have a look at the different scenarios available:
- Backup and Restore—This solution consists of backing up and restoring both the Microsoft SQL database and the storage volumes if a major issue occurs on either component. It seems relatively simple to implement and looks inexpensive as well. It can be implemented if the expected RPO/RTO is not demanding: you have a few hours to restore the service, and there is no need to restore data that was centralized only recently, because data lost from the last couple of hours is recovered automatically and quickly. Remember, even if you lose your Mirage storage, all data is still available on the end users’ desktops; it will just take time to centralize it again. However, this is not an appropriate scenario for large infrastructures with thousands of CVDs, as it can take months to re-centralize all the desktops. If you want to use this solution, make sure that the Microsoft SQL database and the SMB volumes are backed up at the same time. Basically, this means stopping the Mirage services, backing up the database using SQL Manager, taking a snapshot of the storage volumes, and stopping MongoDB to back up its files. In case of failure, stop Mirage (if it has not already stopped by itself), restore the last database backup, and revert to the latest snapshot on the storage side. Keep in mind you must follow this sequence: first stop all Mirage services, and then the MongoDB services.
- Protect Microsoft SQL Database—Some customers are more focused on keeping the database intact, which implies using Microsoft SQL clustering. However, VMware Mirage does not use ODBC connections, so it is not aware that it has to move to a different Microsoft SQL instance if the main one has failed. The solution is to use Microsoft SQL AlwaysOn technology, which combines Microsoft SQL clustering and Microsoft failover clustering. It provides synchronization between “non-shared” volumes among nodes, and also a virtual IP and virtual network name that move to the remaining node in case of disaster or during a maintenance period.
- Full Disaster Recovery/Multisite Scenario—This last scenario concerns customers who require full disaster recovery between two data centers with demanding RPO/RTO requirements. All components are duplicated at each data center, with load-balancers to route traffic to a Mirage management server, Mirage server, or Web management/File portal IIS server. This implies using the second scenario to provide Microsoft SQL high availability, and also performing synchronous replication between two storage nodes. Be aware that synchronous replication can significantly affect storage controller performance. While this is the most expensive of the scenarios since it requires extra licenses, it is the fastest way to recover from a disaster. An intermediate scenario could be to have two Mirage management servers (one per data center), and to shut down the Mirage services and replicate the SQL database and storage volumes during the weekend.
Figure 2 – Example Multi-Site Mirage Infrastructure
For scenarios two and three, the installation and configuration of Microsoft SQL AlwaysOn in a VMware Mirage infrastructure is explained further in the white paper.
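For the first scenario, what matters most is the order of operations, so it can help to encode that order once and let a script enforce it. The sketch below is only an illustration: the step names and the `run_backup` helper are hypothetical, each action is a placeholder you would supply for your own environment, and restarting the services afterwards is deliberately left out because this post does not specify that order.

```python
# Ordered steps for a consistent backup, as read from the first scenario:
# Mirage services are stopped first, then the MongoDB services.
BACKUP_STEPS = [
    "stop Mirage services",
    "stop MongoDB services",
    "back up the Microsoft SQL database",
    "snapshot the SMB storage volumes",
    "back up the MongoDB files",
]


def run_backup(actions, log=print):
    """Run every backup step in order and return the completed steps.

    `actions` maps each step name to a zero-argument callable that you
    provide (for example a wrapper around your own tooling).
    """
    # Check everything up front so services are never stopped
    # for a backup that cannot be completed.
    missing = [step for step in BACKUP_STEPS if step not in actions]
    if missing:
        raise KeyError(f"no action defined for: {missing}")

    completed = []
    for step in BACKUP_STEPS:
        log(f"running: {step}")
        actions[step]()
        completed.append(step)
    return completed
```

Validating the action map before running anything is a design choice: a missing step is reported while the services are still up, rather than halfway through the sequence.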
Eric Monjoin joined VMware France in 2009 as PSO Senior Consultant after spending 15 years at IBM as a Certified IT Specialist. Passionate for new challenges and technology, Eric has been a key leader in the VMware EUC practice in France. Recently, Eric has moved to the VMware Professional Services Engineering organization as Technical Solutions Architect. Eric is certified VCP6-DT, VCAP-DTA and VCAP-DTD and was awarded vExpert for the 4th consecutive year.