Running virtual environments rather than physical ones has become the norm, and adopting newer technologies such as hyper-convergence and containerization has become the next step in that evolution.
Organizations that virtualized gained in cost (electricity, cooling, hardware), in flexibility and in their overall ability to keep services running. The days of running one application per physical server are largely gone, and rightly so.
VMware was one of the pioneers in this field. With its vSphere suite, its strategic partnership with Dell and its Tanzu portfolio, it keeps its seat at the table among the major enablers of virtualization and containerization in the enterprise world. Building a private or hybrid cloud is always a lot of work, and with VMware the path there always passes through setting up the de facto management system, vCenter Server.
Over the years, vCenter has served us well, evolving from a client/server application into a web-based one. As high availability has become more important, however, it has become necessary to keep this management system available even during major outages. This matters all the more because so many other systems now depend on vCenter (NSX, vCloud, and third-party products from HPE, Dell, Veeam and others).
This article covers how to keep vCenter available even if you lose your primary infrastructure completely. The feature that makes this possible is vCHA, short for vCenter High Availability.
vCenter High Availability (vCHA)
vCenter High Availability (vCHA) lets vCenter exist in two locations simultaneously in an active-passive topology. This significantly reduces downtime when the primary vCenter fails unexpectedly.
If a hardware or software failure occurs, vCHA automatically fails over from the active vCenter to the passive one, so your central management and the services that depend on it remain available. Consider the diagram below:
Figure 1 – vCenter High Availability Architecture
To provide high availability for vCenter, you need a witness node in addition to the active and passive nodes, which typically run in two different data centers or clusters. The witness node protects against split-brain scenarios and arbitrates the failover.
The active node is the appliance that faces end users and serves all client requests. It communicates constantly with the passive node, which is the target of continuous replication of the vCenter database. It also communicates with the witness node to report whether it can still deliver services. All of this traffic travels over a dedicated vCenter HA network.
The passive node starts as a clone of the active vCenter and then continuously syncs with it over the vCenter HA network. If a failure occurs, the passive node automatically takes over the active role.
The witness node is a lightweight clone of the active node that provides quorum for the cluster.
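To make the witness's role more concrete, here is a minimal Python sketch of quorum-based failover under simplifying assumptions of our own: each node has one vote, a node may serve only if it can reach at least one peer (two of three votes), and the active node keeps its role as long as it retains quorum. The names `has_quorum` and `serving_node` are hypothetical illustrations, not VMware APIs, and the real vCHA state machine is more involved than this.

```python
from itertools import combinations

# Hypothetical, simplified model of vCHA quorum; this is NOT a VMware API.
NODES = ("active", "passive", "witness")
ALL_LINKS = frozenset(frozenset(pair) for pair in combinations(NODES, 2))


def has_quorum(node: str, links) -> bool:
    """A node has quorum when it can reach at least one peer (2 of 3 votes)."""
    peers = {p for p in NODES if p != node and frozenset((node, p)) in links}
    return 1 + len(peers) >= 2


def serving_node(alive: set, links=ALL_LINKS):
    """Return the node that should serve vCenter, or None if no node may.

    alive: names of nodes that are powered on and healthy.
    links: node pairs that can talk over the vCenter HA network.
    """
    # A link only counts if both of its endpoints are alive.
    live_links = {link for link in links if link <= alive}
    # The active node keeps its role as long as it retains quorum.
    if "active" in alive and has_quorum("active", live_links):
        return "active"
    # Otherwise the passive node takes over, but only if it has quorum.
    if "passive" in alive and has_quorum("passive", live_links):
        return "passive"
    # No quorum anywhere: nobody serves, which prevents split brain.
    return None


if __name__ == "__main__":
    print(serving_node(set(NODES)))                                       # -> active
    print(serving_node({"passive", "witness"}))                           # -> passive
    print(serving_node(set(NODES), {frozenset(("passive", "witness"))}))  # -> passive
    print(serving_node({"active"}))                                       # -> None
```

In this model an isolated active node loses its majority while the passive node and the witness still hold two votes, so only one node ever serves clients, which is the split-brain case the witness exists to prevent.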
vCHA comes in handy in the following scenarios:
- General hardware failure affecting the primary vCenter
- Complete or partial network failure or isolation of the primary vCenter
- Storage or datastore loss that takes down your primary vCenter
- Partial or complete vCenter Server application or service failure
- General operating system failure
As vCHA is relatively new, there are several requirements to take into account when planning high availability for vCenter:
- ESXi 6.0 or later with vCenter 6.5 or later
- All three nodes must run on different clusters (or at least different hosts), with anti-affinity rules that keep them separated at all times
- Separate datastores, so that a single datastore failure cannot take down the active and passive nodes at the same time
- A vCenter deployment size of "small" or larger, since "tiny" is unsupported
- A vCenter HA network on a separate subnet from the general management network
- Latency between the nodes of 10 ms or less
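The requirements above lend themselves to a simple pre-flight check before deployment. The following Python sketch validates a hypothetical deployment plan against them. The `VchaPlan` and `NodePlacement` types, their fields and the example host, datastore and subnet values are all invented for illustration, and the thresholds simply mirror the list above rather than any official VMware tool.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class NodePlacement:
    role: str        # "active", "passive" or "witness"
    host: str        # ESXi host the node VM runs on (hypothetical name)
    datastore: str   # datastore holding the node VM (hypothetical name)


@dataclass
class VchaPlan:
    esxi_version: tuple[int, int]
    vcenter_version: tuple[int, int]
    deployment_size: str          # e.g. "tiny", "small", "medium"
    nodes: list[NodePlacement]
    mgmt_subnet: str              # CIDR of the general management network
    ha_subnet: str                # CIDR of the dedicated vCenter HA network
    max_latency_ms: float         # worst measured latency between any two nodes


def check_plan(plan: VchaPlan) -> list[str]:
    """Return a list of requirement violations (empty if the plan passes)."""
    problems = []
    if plan.esxi_version < (6, 0):
        problems.append("ESXi must be 6.0 or later")
    if plan.vcenter_version < (6, 5):
        problems.append("vCenter must be 6.5 or later")
    if plan.deployment_size.lower() == "tiny":
        problems.append('deployment size must be "small" or larger')

    # Anti-affinity: no two nodes may share a host.
    hosts = [n.host for n in plan.nodes]
    if len(set(hosts)) != len(hosts):
        problems.append("each node must run on a separate host")

    # A single datastore failure must not take out both active and passive.
    ds = {n.role: n.datastore for n in plan.nodes}
    if ds.get("active") is not None and ds.get("active") == ds.get("passive"):
        problems.append("active and passive nodes must use different datastores")

    # The vCenter HA network must not overlap the management subnet.
    mgmt = ipaddress.ip_network(plan.mgmt_subnet)
    ha = ipaddress.ip_network(plan.ha_subnet)
    if mgmt.overlaps(ha):
        problems.append("vCenter HA network must be separate from the management subnet")

    if plan.max_latency_ms > 10:
        problems.append("latency between nodes must be 10 ms or less")
    return problems


# Example plan with made-up values that satisfies every check.
plan = VchaPlan(
    esxi_version=(6, 7),
    vcenter_version=(6, 7),
    deployment_size="small",
    nodes=[
        NodePlacement("active", "esx01", "ds01"),
        NodePlacement("passive", "esx02", "ds02"),
        NodePlacement("witness", "esx03", "ds03"),
    ],
    mgmt_subnet="10.0.10.0/24",
    ha_subnet="192.168.50.0/24",
    max_latency_ms=2.5,
)
print(check_plan(plan))  # -> []
```

Returning a list of problems instead of stopping at the first failure lets you fix every gap in a single planning pass.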
It is important to note that vCenter HA does not require you to purchase an additional vCenter license; your current (single) vCenter license is sufficient.
Patching and Upgrades
At the time of writing, upgrading a vCHA environment requires you to remove the passive and witness nodes, upgrade the primary (active) vCenter and then redeploy vCHA.
Refer to the relevant VMware KB article for more information on the procedure.
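Because the order of these steps matters, a runbook can be checked before the maintenance window. The Python sketch below flags a proposed sequence that upgrades the active vCenter before the passive and witness nodes are removed, or that redeploys vCHA too early. The step labels are hypothetical strings, not VMware commands, so adapt them to your own runbook format.

```python
from __future__ import annotations

# Hypothetical runbook step labels; these are NOT VMware commands or API calls.
REMOVE_PASSIVE = "remove passive node"
REMOVE_WITNESS = "remove witness node"
UPGRADE_ACTIVE = "upgrade active vCenter"
REDEPLOY_VCHA = "redeploy vCenter HA"
REQUIRED_STEPS = (REMOVE_PASSIVE, REMOVE_WITNESS, UPGRADE_ACTIVE, REDEPLOY_VCHA)


def check_upgrade_order(steps: list[str]) -> list[str]:
    """Return ordering problems in a proposed vCHA upgrade runbook (empty if OK)."""
    missing = [s for s in REQUIRED_STEPS if s not in steps]
    if missing:
        return [f"missing step: {s}" for s in missing]

    # Position of the first occurrence of each required step.
    pos = {s: steps.index(s) for s in REQUIRED_STEPS}
    problems = []
    # The passive and witness nodes must be gone before the active node is upgraded.
    for step in (REMOVE_PASSIVE, REMOVE_WITNESS):
        if pos[step] > pos[UPGRADE_ACTIVE]:
            problems.append(f"'{step}' must come before '{UPGRADE_ACTIVE}'")
    # vCHA is redeployed only once the active node runs the new version.
    if pos[REDEPLOY_VCHA] < pos[UPGRADE_ACTIVE]:
        problems.append(f"'{REDEPLOY_VCHA}' must come after '{UPGRADE_ACTIVE}'")
    return problems


if __name__ == "__main__":
    good = [REMOVE_PASSIVE, REMOVE_WITNESS, UPGRADE_ACTIVE, REDEPLOY_VCHA]
    bad = [UPGRADE_ACTIVE, REMOVE_PASSIVE, REMOVE_WITNESS, REDEPLOY_VCHA]
    print(check_upgrade_order(good))  # -> []
    print(check_upgrade_order(bad))   # two ordering problems
```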
In a production environment, you must always plan for maintaining availability when unforeseen failures hit. Our dependence on vCenter has grown considerably, so being able to keep the single pane of glass it provides during failures and major maintenance is a real advantage.
vCHA is still a relatively new technology and less flexible than we would like, but we believe it is a good start. VMware will keep developing it, and it should eventually become as smooth as we expect it to be.