VMware Cloud Community
--Norton--
Enthusiast

Load Balanced Cells. Are they really though??

I have implemented load balanced cells on a couple of cloud solutions now, but I can't help thinking that "load balanced" is the wrong term to use.

Redundant, yes, but load balanced?

Whilst it is true that the network load balancer can balance the network requests across the 2 or more cells, from what I've read only one of those cells can actually utilise vCenter, and the other cells merely offload their requests via it at intervals?

I saw this in one of my environments: requests to create and delete machines were being made via the VCD interface, but there was a 5 minute (300 second) wait between each set of 2 tasks in vCenter.

What was happening was that the requests were going via the second cell server, the one that was NOT listed as the proxy cell server within VCD.

The requests would then be released to vCenter 2 at a time, every 300 seconds.

I finally managed to find some documentation and went through the logs on the 2 cell servers, and discovered that the cell listed as proxy is the only one that actually talks to vCenter. The others release their requests via it after a time delay.

The only way we could prevent this from happening was to reboot the cell listed as proxy, which then failed the role over, and the responses would pick up again.

Can anyone confirm this?

2 Replies
IamTHEvilONE
Immortal

Let's take a look at two points of view: clients to vCloud, and vCloud to vCenter.

Clients to vCloud: The inbound volume is actually load balanced as per your definition on the Load Balancer (F5, NetScaler, etc).  If you only have 1 vCenter, then you are load balancing inbound HTTPS web requests and Console Requests.  Each cell can service a client for the website and will handle the console requests on its own.

vCloud to vCenter: A cell is nominated as a vCenter proxy to handle the communication to a specific vCenter.  This is a funnel approach.  If you had two cells, active/active, it would be extremely hard to negotiate which tasks are complete and listen for results.  If the cell running the proxy fails, the proxy will move to another cell.

If you have 1 vCenter and 4 cells, then there really isn't a "load of vCenters" to balance.  All other cells handling HTTPS requests will put anything destined for vCenter into the DB, and the one running the proxy will then pull from the DB and ship it to vCenter.
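The funnel described above can be pictured with a toy model. This is only a sketch of the pattern as described in this post, not vCloud Director's actual code: the `SharedDB` and `Cell` classes, their methods, and the cell names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SharedDB:
    """Hypothetical stand-in for the database that all cells share."""
    pending: deque = field(default_factory=deque)


@dataclass
class Cell:
    name: str
    db: SharedDB
    is_proxy: bool = False

    def handle_https_request(self, operation: str) -> None:
        # Any cell can accept a client request; it only records the work.
        self.db.pending.append((self.name, operation))

    def drain_to_vcenter(self) -> list:
        # Only the cell holding the proxy role ships work to vCenter.
        if not self.is_proxy:
            return []
        sent = []
        while self.db.pending:
            accepted_by, operation = self.db.pending.popleft()
            sent.append(f"{operation} (accepted by {accepted_by}, sent by {self.name})")
        return sent


db = SharedDB()
cell1 = Cell("cell1", db, is_proxy=True)
cell2 = Cell("cell2", db)

# The load balancer may hand client requests to either cell.
cell2.handle_https_request("create VM")
cell1.handle_https_request("delete VM")

ignored = cell2.drain_to_vcenter()   # non-proxy cell sends nothing
shipped = cell1.drain_to_vcenter()   # proxy cell sends everything queued
for line in shipped:
    print(line)
```

In this model the inbound side is balanced across both cells, while everything bound for vCenter leaves through the single proxy cell.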

If you have 4 vCenters and 5 cells, then you have approximately 0.8 vCenters per cell (4 cells have 1 vCenter each).  The cell that isn't running a proxy is ready in case of another cell's failure.
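As a rough illustration of the 4 vCenters / 5 cells case, here is a small sketch. The function names and the cell and vCenter names are made up, and choosing the least-loaded surviving cell on failover is purely an assumption for the example; nothing in this thread says how vCloud Director actually picks the new proxy.

```python
def assign_proxies(vcenters, cells):
    """Give each vCenter one proxy cell, spreading the roles across cells."""
    return {vc: cells[i % len(cells)] for i, vc in enumerate(vcenters)}


def fail_over(assignment, failed_cell, cells):
    """Move any proxy role held by failed_cell to the least-loaded survivor."""
    survivors = [c for c in cells if c != failed_cell]
    result = dict(assignment)
    for vc, cell in assignment.items():
        if cell == failed_cell:
            # Count how many proxy roles each surviving cell currently holds.
            load = {c: 0 for c in survivors}
            for holder in result.values():
                if holder in load:
                    load[holder] += 1
            result[vc] = min(survivors, key=lambda c: load[c])
    return result


vcenters = ["vc1", "vc2", "vc3", "vc4"]
cells = ["cell1", "cell2", "cell3", "cell4", "cell5"]

assignment = assign_proxies(vcenters, cells)
ratio = len(vcenters) / len(cells)          # 4 / 5 = 0.8 vCenters per cell

after_failure = fail_over(assignment, "cell2", cells)
print(assignment)
print(after_failure)
```

Here cell5 starts out with no proxy role and takes over vc2 when cell2 fails.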

Now for what you are experiencing.  I personally haven't seen that sort of lag when going through a cell which is not holding the particular proxy you care about.  However, I'm not at my desk right now to really dig into this.

shepherdz
VMware Employee

--Norton-- wrote:

Whilst it is true that the network load balancer can balance the network requests across the 2 or more cells, from what I've read only one of those cells can actually utilise vCenter, and the other cells merely offload their requests via it at intervals?

Sort of. For each vCenter server, there's (in normal operation) 1 cell designated as a vCenter proxy (listed in the "vCenter Proxy" column on the vCenters page of the Manage & Monitor section of the UI). That cell is responsible for listening to updates from the vCenter server. Other cells should be able to perform other communication with the vCenter.

The best consolidated documentation I know of is actually buried in a post (on another topic) on Tom Fojta's blog: Graceful Shutdown of vCloud Director Cell

--Norton-- wrote:

This was seen on one of my environments when requests were being made to create new machines and delete them via the VCD interface but a 5 min (300 second) wait time was seen between each set of 2 tasks in vCenter.

That's... odd. Is this in an otherwise idle system? What version of vCloud Director are you using? It's probably worth filing a support request to get someone to take a look at your environment.

--Norton-- wrote:

What was happening was that the requests were going via the second cell server. The cell server that was NOT listed as the proxy cell server within VCD.

That's fine. The cell which handles the REST API (or UI) request to initiate an operation isn't necessarily the cell that performs the operation (a sort-of load balancing of operations; even if you trigger numerous operations from one cell, actual execution will be balanced across the server group).

Zach Shepherd, Member of Technical Staff, VMware vCloud Director Engineering