
Scaling vCenter Server Connections for Improved Resiliency

In the latest version of VMware vSphere, we introduced a significant enhancement that improves the resilience and availability of core services such as vCenter Server. In this article, I want to put these changes in the spotlight and explain how they improve the overall performance and availability of the VMware vCenter Server Appliance. The details show how these updates help vCenter handle more connections efficiently and avoid performance bottlenecks, which improves the user experience.

Background

In vCenter, Envoy acts as a front proxy that handles incoming requests on ports 443 (HTTPS) and 80 (HTTP). It is the first service a request reaches when it is sent to vCenter (VC). Envoy includes built-in connection limits that protect VC from overload and attacks. However, when these limits are reached, vCenter may appear unavailable. Let’s break down the changes that address this issue in VMware vCenter Server Appliance 8.0 U3, which is also available in VCF 5.2.

How it worked before vSphere 8.0 U3 (VCF 5.2)

  1. Connection Limits
    Envoy previously enforced four types of connection limits:
    • 2048 external HTTP connections
    • 2048 local HTTP connections
    • 2048 external HTTPS connections
    • 2048 local HTTPS connections
  2. Segregated Limits
    In total, Envoy could handle roughly 8000 connections (4 × 2048 = 8192), but each limit applied only to its own connection type (local or external, HTTP or HTTPS). For example, if all 2048 external HTTPS connections were in use, new external HTTPS requests would fail even though HTTP connection slots were still available.
  3. Uninformative Error Messages
    When the connection limit was reached, users saw what looked like a generic SSL error, with no clear indication of the actual cause. The only way to diagnose the problem was to SSH into vCenter and search the Envoy logs for the message “remote https connections exceeding the max allowed.”
  4. No Early Warning
    There was no system in place to warn users when the connection limit was nearing capacity. This made it difficult to detect and prevent connection issues before they occurred.
  5. Long Idle Timeout
    Envoy had an idle timeout of 8 hours. This meant that idle connections could remain open for extended periods, blocking new connections and consuming resources unnecessarily.
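The following minimal Python sketch makes the effect of segregated limits concrete. It models the pre-8.0 U3 behavior and assumes that each of the four connection types had its own independent counter capped at 2048. `SegregatedLimits` and its methods are hypothetical names for illustration only. They are not Envoy or vCenter code.

```python
from dataclasses import dataclass, field

# Per-type cap described above (an illustrative model, not Envoy internals).
PER_TYPE_LIMIT = 2048


@dataclass
class SegregatedLimits:
    """Toy model of the pre-8.0 U3 behavior: four independent counters."""
    limit: int = PER_TYPE_LIMIT
    active: dict = field(default_factory=lambda: {
        ("external", "http"): 0,
        ("local", "http"): 0,
        ("external", "https"): 0,
        ("local", "https"): 0,
    })

    def try_accept(self, origin: str, scheme: str) -> bool:
        """Accept a connection only if its own counter has room."""
        key = (origin, scheme)
        if self.active[key] >= self.limit:
            # This type is full: reject even if the other counters have room.
            return False
        self.active[key] += 1
        return True

    def total_capacity(self) -> int:
        """Sum of all four caps (4 x 2048)."""
        return self.limit * len(self.active)


pools = SegregatedLimits()
for _ in range(PER_TYPE_LIMIT):
    pools.try_accept("external", "https")
print(pools.try_accept("external", "https"))  # HTTPS counter is full
print(pools.try_accept("external", "http"))   # HTTP counter still has room
```

If you fill the external HTTPS counter, the next HTTPS connection is refused even though the external HTTP counter is nearly empty. This is the failure mode described in item 2.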

How it works in vSphere 8.0 U3 (VCF 5.2)

Shared Connection Pool
A major improvement in VMware vCenter Server Appliance 8.0 U3 (VCF 5.2) is a shared connection pool for HTTP and HTTPS connections. Unused HTTP connection slots can now be allocated to HTTPS requests, which adds flexibility and prevents unnecessary connection failures. The enhancements introduced in VMware vCenter Server Appliance 8.0 U3 (VCF 5.2) are listed below:

  1. Shared HTTP/HTTPS Pool
    HTTP and HTTPS connections now share the same pool. You can make more HTTPS requests when fewer HTTP connections are in use, as long as the overall connection count stays below 8000.
  2. Progressive Idle Timeout Reduction
    When the connection count reaches 50% of the limit, Envoy starts reducing the idle timeout. The timeout starts at 8 hours (28,800 seconds) and decreases progressively as the connection count approaches the limit, down to as little as 2 seconds. This prevents idle connections from accumulating and frees resources for active requests.
  3. Connection Draining at 80% Capacity
    Once the connection count reaches 80% of the limit, Envoy begins draining connections. For HTTP/2, it sends a GOAWAY frame so that existing connections can close gracefully. For HTTP/1, it sets a drain timer to close less recently used idle connections.
  4. Request Denial at 99% Capacity
    When the connection count reaches 99% of the limit, Envoy stops accepting new requests and returns HTTP 503 (overloaded) responses. Each response carries the x-envoy-local-overloaded header, so users can identify the overload without SSHing into the appliance or checking logs.
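The thresholds above can be modeled with a small Python sketch. It assumes a single shared limit of 8000 connections. The article only says that the idle timeout shrinks "progressively", so the sketch also assumes a linear decrease from 28,800 seconds at 50% load down to 2 seconds at 99% load. `SharedPool`, `idle_timeout`, and `is_envoy_overloaded` are hypothetical helpers. The header value `"true"` is a placeholder, so a client should rely only on the presence of x-envoy-local-overloaded.

```python
from dataclasses import dataclass

# Assumed values; the real vCenter/Envoy configuration may differ.
SHARED_LIMIT = 8000          # one limit for HTTP + HTTPS, local + external
MAX_IDLE_TIMEOUT_S = 28_800  # 8 hours
MIN_IDLE_TIMEOUT_S = 2
SCALE_START = 0.50   # idle timeout starts shrinking
DRAIN_START = 0.80   # GOAWAY (HTTP/2) / drain timer (HTTP/1)
REJECT_START = 0.99  # new requests get 503 + overload header
OVERLOAD_HEADER = "x-envoy-local-overloaded"


def idle_timeout(load: float) -> float:
    """Idle timeout in seconds for a load fraction between 0.0 and 1.0.

    Assumption: linear interpolation between SCALE_START and REJECT_START.
    """
    if load < SCALE_START:
        return MAX_IDLE_TIMEOUT_S
    if load >= REJECT_START:
        return MIN_IDLE_TIMEOUT_S
    fraction = (load - SCALE_START) / (REJECT_START - SCALE_START)
    return MAX_IDLE_TIMEOUT_S - fraction * (MAX_IDLE_TIMEOUT_S - MIN_IDLE_TIMEOUT_S)


@dataclass
class SharedPool:
    """Toy model of the 8.0 U3 behavior: one counter for every connection type."""
    limit: int = SHARED_LIMIT
    active: int = 0

    @property
    def load(self) -> float:
        return self.active / self.limit

    def current_idle_timeout(self) -> float:
        return idle_timeout(self.load)

    def should_drain(self) -> bool:
        # At 80% or more, Envoy starts draining existing connections.
        return self.load >= DRAIN_START

    def handle_new_request(self) -> tuple[int, dict[str, str]]:
        """Admit a request, or shed it locally once load reaches 99%.

        Simplification: each admitted request opens one connection.
        """
        if self.load >= REJECT_START:
            # Placeholder header value; clients should check presence only.
            return 503, {OVERLOAD_HEADER: "true"}
        self.active += 1
        return 200, {}


def is_envoy_overloaded(status: int, headers: dict[str, str]) -> bool:
    """Client-side check: a 503 carrying the overload header means Envoy shed load."""
    names = {name.lower() for name in headers}
    return status == 503 and OVERLOAD_HEADER in names
```

Every connection type draws from the same counter, so thousands of HTTPS requests can be admitted while the total stays below the 99% threshold. A client that receives a 503 carrying the overload header can back off and retry instead of treating the failure as an SSL or network error.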

Conclusion

The changes introduced in VMware vCenter Server Appliance 8.0 U3 (VCF 5.2) significantly improve how Envoy manages connections to vCenter. The shared connection pool, the dynamically reduced idle timeout, early connection draining, and explicit overload responses together make vCenter more resilient and flexible. Users now have a better chance of maintaining service availability and avoiding disruptions caused by hitting connection limits.