Using a vSphere 6.5 setup.
12 blades with ESXi 6.5 and shared storage on a SAN.
Controlled through VCSA 126.96.36.19900.
I have a number of VMs for which I have a rigid structure imposed on me.
There are 34 VMs. These are split into groups of VMs which have to be together.
There are 9 groups; let's call them G1 through G9.
These groups are represented as VM-VM affinity rules.
That bit all works fine.
The next requirement is that no group can co-exist with another group on a blade, i.e. each group must be on its own blade.
To do this, one VM from each group affinity rule is added to a single VM-VM anti-affinity rule. These are referred to as the group-anchor VMs. The theory is that if each group has a VM in it that has anti-affinity to at least one VM in every other group, this will force the groups to be moved to separate blades.
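To make the anchor idea concrete, here is a minimal Python sketch of the logic only. It does not talk to vCenter or reproduce how DRS evaluates rules. The VM names, blade names and the choice of the first VM in each group as the anchor are all hypothetical, made up for illustration.

```python
from itertools import combinations

# Hypothetical groups: the first VM in each list is treated as the anchor.
groups = {
    "G1": ["g1-vm1", "g1-vm2"],
    "G2": ["g2-vm1", "g2-vm2"],
    "G3": ["g3-vm1"],
}

def build_rules(groups):
    """Return (affinity rules, anchor set) for the scheme described above."""
    affinity = {name: set(vms) for name, vms in groups.items()}
    anchors = {vms[0] for vms in groups.values()}  # one VM per group
    return affinity, anchors

def violations(placement, affinity, anchors):
    """List every rule that a VM-to-blade placement breaks."""
    problems = []
    # VM-VM affinity: all members of a group must share one blade.
    for name, vms in affinity.items():
        hosts = {placement[vm] for vm in vms}
        if len(hosts) > 1:
            problems.append(f"affinity {name} split across {sorted(hosts)}")
    # VM-VM anti-affinity: no two anchors may share a blade.
    for a, b in combinations(sorted(anchors), 2):
        if placement[a] == placement[b]:
            problems.append(f"anchors {a} and {b} share {placement[a]}")
    return problems

affinity, anchors = build_rules(groups)

# Each group on its own blade: both rule types are satisfied.
good = {"g1-vm1": "blade1", "g1-vm2": "blade1",
        "g2-vm1": "blade2", "g2-vm2": "blade2",
        "g3-vm1": "blade3"}

# G1 and G2 packed onto one blade: the anchor rule catches it.
bad = dict(good, **{"g2-vm1": "blade1", "g2-vm2": "blade1"})

print(violations(good, affinity, anchors))
print(violations(bad, affinity, anchors))
```

In this model the two rule types together are satisfiable only if there are at least as many blades as groups, which holds here (9 groups, 12 blades).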
I have been able to get it to work, but on one condition:
I have to remove the anchor VM of the first group in the DRS rules list from the anti-affinity rule, otherwise the anti-affinity rule cannot be enabled.
I had thought it was an issue with the VM initially chosen as the anchor, so I removed that VM entirely from the cluster and moved on to the next one in the group. As soon as this was added to the anti-affinity rule, the rule became disabled again. When I removed the VM, I could enable the rule.
When the rule is disabled by having the anchor VM from all groups present, none of the rule member VMs show any conflict errors.
To see if this was an issue with not being able to have an anchor VM from all groups in general, I tried adding the G1 anchor to the anti-affinity rule and removing the G2 anchor VM instead. This didn't allow the rule to be enabled, so it seems to be an issue related only to the first affinity rule in the list.
Is this expected behaviour or an oddity that needs to be reported as a possible bug?
It would appear to be OK now, although I'm not altogether happy not knowing why it didn't work yesterday.
I am told that the networks team were doing some config changes and that for a while HA agents were complaining about not being in contact with other hosts in the cluster.
Another engineer informed me today that he had completed some work to homogenize the cluster and had removed some discrepancies between host profiles. Grrr!!
So today it started working. Sigh!