By Girish Manmadkar
I recently worked with an enterprise customer to resolve end-user reports of performance issues related to Microsoft SharePoint 2010 and FAST Search deployed on vSphere 5.1. The end users were reporting problems with initial page response and with file upload and download. The customer requested architecture guidance, including a performance health check across the entire infrastructure stack. The result of this engagement is the following architectural guidance, designed to help customers with similar deployments achieve maximum performance for Microsoft FAST Search on the VMware platform.
Specifics
The customer deployed the SharePoint FAST Search Farm with the following key components:
Software Resources
- VMware vSphere 5.1 Update 2
- Windows Server 2008 R2
- SharePoint 2010
- Microsoft SQL Server 2008 protected with MSCS in a 3-node cluster
Hardware (Virtual) Resources
| Role | RAM (GB) | Local Disk (GB) | #CPU | NIC | Total VMs | Total #CPU | Total Mem (GB) |
|---|---|---|---|---|---|---|---|
| SQL | 32 | C: 80, E: 100 | 4 | 2 | 3 | 12 | 96 |
| Web Front End | 8 | C: 80, E: 50 | 2 | 2 | 5 | 10 | 40 |
| Application | 16 | C: 80, E: 50 | 4 | 2 | 4 | 16 | 64 |
| Services | 16 | C: 80, E: 50 | 4 | 2 | 2 | 8 | 32 |
| FAST | 16 | C: 80, E: 50 | 4 | 2 | 1 | 4 | 16 |
| Query | 16 | C: 80, E: 50 | 4 | 2 | 5 | 20 | 80 |
Allocated Total Memory = 328 GB
Allocated Total vCPU = 70
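The two totals follow directly from the per-role figures in the table (per-VM value multiplied by the number of VMs, summed over roles). A minimal Python sketch of that arithmetic is shown here as a sanity check; the tuple layout and the function name `farm_totals` are illustrative choices, not part of the customer's tooling.

```python
# Per-role sizing taken from the hardware table:
# (role, RAM in GB per VM, vCPUs per VM, number of VMs)
ROLES = [
    ("SQL", 32, 4, 3),
    ("Web Front End", 8, 2, 5),
    ("Application", 16, 4, 4),
    ("Services", 16, 4, 2),
    ("FAST", 16, 4, 1),
    ("Query", 16, 4, 5),
]


def farm_totals(roles):
    """Return (total vCPUs, total memory in GB) allocated across all roles."""
    total_vcpu = sum(vcpus * count for _, _, vcpus, count in roles)
    total_mem_gb = sum(ram_gb * count for _, ram_gb, _, count in roles)
    return total_vcpu, total_mem_gb


total_vcpu, total_mem_gb = farm_totals(ROLES)
print(f"Allocated Total vCPU = {total_vcpu}")
print(f"Allocated Total Memory = {total_mem_gb} GB")
```

Comparing these allocated totals with the physical cores and RAM available in the cluster is the starting point for judging whether the farm is overcommitted.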
Discovery
During discussions and whiteboard sessions with the customer, we identified the following issues with the deployment:
- Storage
- The virtual machines running query and index services were sharing the same LUNs and datastores.
- Thin provisioning was in use at both the vSphere layer and the EMC storage array layer.
- The RDMs used for the SQL Server MSCS environment were configured with incorrect (MRU/Fixed) multi-pathing options.
- The SQL Server virtual machines had neither “Lock Pages in Memory” enabled nor memory reservations configured.
- Various SQL Server databases were deployed on shared SQL instances for the entire FAST Search environment.
- The networking configurations were set incorrectly for certain SCSI adapters.
- Guest operating system, vMotion, and backup traffic was not properly separated.
- There were no anti-affinity rules in place for the application servers within the vSphere farm.
- The CPU subscriptions across the overall farm seemed unbalanced.
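The last two findings (missing anti-affinity rules and unbalanced CPU subscription) can be checked from a simple inventory export. The following is a minimal Python sketch under stated assumptions: the VM names, host names, placements, and physical core counts are all hypothetical and invented for illustration, since the engagement's host inventory is not given here.

```python
from collections import defaultdict

# Hypothetical inventory: (vm_name, role, host, vcpus).
# These placements are invented for illustration only.
VMS = [
    ("sql-1", "SQL", "esx01", 4),
    ("app-1", "Application", "esx01", 4),
    ("app-2", "Application", "esx01", 4),
    ("query-1", "Query", "esx01", 4),
    ("wfe-1", "Web Front End", "esx02", 2),
    ("app-3", "Application", "esx02", 4),
    ("svc-1", "Services", "esx02", 4),
]

# Hypothetical physical core counts per host.
HOST_CORES = {"esx01": 12, "esx02": 16}


def vcpu_ratio_by_host(vms, host_cores):
    """Return {host: allocated vCPUs / physical cores}."""
    allocated = defaultdict(int)
    for _name, _role, host, vcpus in vms:
        allocated[host] += vcpus
    return {host: allocated[host] / cores for host, cores in host_cores.items()}


def colocated_roles(vms, role):
    """Return {host: [vm names]} for hosts running more than one VM of a role."""
    by_host = defaultdict(list)
    for name, vm_role, host, _vcpus in vms:
        if vm_role == role:
            by_host[host].append(name)
    return {host: names for host, names in by_host.items() if len(names) > 1}


for host, ratio in sorted(vcpu_ratio_by_host(VMS, HOST_CORES).items()):
    status = "overcommitted" if ratio > 1.0 else "ok"
    print(f"{host}: vCPU-to-core ratio {ratio:.2f} ({status})")

print("Application VMs sharing a host:", colocated_roles(VMS, "Application"))
```

A ratio above 1.0 on one host alongside a much lower ratio on another is the kind of imbalance described above, and any host listed by `colocated_roles` is a candidate for an anti-affinity rule.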
Approach/Recommendations
Throughout a series of discussions we learned more about the architecture and identified the following steps to improve performance:
- Reconfigure multi-pathing to Round Robin, per EMC’s recommendations for vSphere 5.1. (This change showed immediate performance improvement.)
- Enable memory reservations with “Lock Pages in Memory” for SQL workloads.
- For a write-intensive application like FAST Search, use four (4) vSCSI controllers to separate the volumes for the operating system, binaries, data, log, and TEMPDB disks, and format them with the Windows full format option to avoid an additional write penalty.
- Avoid CPU overcommitment in the production environment entirely.
- Adopt best practices on vSphere to separate various networking traffic, including dedicated backup, which in this case was previously sharing VM traffic.
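As an illustration of the multi-pathing change, the Python sketch below builds the `esxcli storage nmp device set` command for each device whose path selection policy is not Round Robin (`VMW_PSP_RR`). The device identifiers and their current policies are hypothetical placeholders, and the script only prints commands rather than running them. Treat it as a sketch: check the array vendor's and VMware's guidance for your vSphere version, particularly for disks shared by a cluster, before changing any policy.

```python
ROUND_ROBIN = "VMW_PSP_RR"


def round_robin_commands(device_policies):
    """Return esxcli commands for devices whose path policy is not Round Robin.

    device_policies maps a device identifier to its current path selection
    policy, for example "VMW_PSP_MRU" or "VMW_PSP_FIXED".
    """
    commands = []
    for device, policy in sorted(device_policies.items()):
        if policy != ROUND_ROBIN:
            commands.append(
                f"esxcli storage nmp device set --device {device} --psp {ROUND_ROBIN}"
            )
    return commands


# Hypothetical device identifiers and current policies, for illustration only.
DEVICES = {
    "naa.hypothetical0001": "VMW_PSP_MRU",
    "naa.hypothetical0002": "VMW_PSP_FIXED",
    "naa.hypothetical0003": "VMW_PSP_RR",
}

for command in round_robin_commands(DEVICES):
    print(command)
```

Generating the commands for review, instead of applying them directly, keeps the change auditable and lets the storage team confirm each device before the policy is switched.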
Conclusion
For any business-critical application to run with optimum performance, you must put performance ahead of consolidation and avoid overcommitment of CPU and memory. Once you implement these principles in the production environment, performance issues like these for business-critical applications on vSphere should be alleviated.
Girish Manmadkar is a veteran VMware SAP Virtualization Architect with extensive knowledge and hands-on experience with various SAP and VMware products, including various databases. He focuses on SAP migrations, architecture designs, and implementation, including disaster recovery.