VMware Cloud Community
HamR
Enthusiast

Preferential VM-Host DRS Rules

I have a stretched cluster on vSphere 5 U1 with three preferential VM-Host DRS Rules.

1. Keep SiteA VMs on SiteA Hosts

2. Keep SiteB VMs on SiteB Hosts

3. Keep the vCenter and vCenterDB servers on Host1 at SiteA and Host1 at SiteB

The storage for the vCenter and DB servers is at SiteA.

When Host1 is available at SiteA, that's where the servers migrate to. When Host1 is not available, they go wherever they like. Invoking "Run DRS" finds no recommendations.

Any thoughts?

HamR

6 Replies
frankdenneman
Expert

> When Host1 is not available, they go wherever they like. Invoking "Run DRS" finds no recommendations.

Can you explain this any further?

Blogging: frankdenneman.nl Twitter: @frankdenneman Co-author: vSphere 4.1 HA and DRS technical Deepdive, vSphere 5x Clustering Deepdive series
HamR
Enthusiast

Hi Frank,

I was hoping you'd chime in here.

I have the following DRS host groups defined:

- Site A Hosts (contains all hosts at Site A)

- Site B Hosts (contains all hosts at Site B)

- First Hosts (contains Host1 at both sites)

I have the following VM groups defined:

- Site A VMs (contains all VMs at Site A)

- Site B VMs (contains all VMs at Site B)

- Mgmt VMs (contains the vCenter and DB Servers)

I have soft rules binding Site A VMs with Site A hosts and likewise for Site B. I also have a soft rule binding the Mgmt VMs to the First Hosts group.

This means that there are conflicting rules, as the Mgmt VMs can't reside at both sites at the same time. The VMs gravitate to Host1 at SiteA. When this host fails or is placed in maintenance mode (MMode), the Mgmt VMs do not necessarily stay at SiteA or move to Host1 at SiteB. Manually invoking DRS produces no recommendation to correct the rule violation. When Host1 at SiteA exits MMode or is recovered, both VMs move back to it, as expected.
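
To make the overlap concrete, here is a minimal Python sketch of the groups and soft rules described in this post. It is only a toy model, not how DRS evaluates rules. The host and VM names are hypothetical, two hosts per site is an assumption, and it assumes the two Mgmt VMs are also members of the Site A VMs group.

```python
# Toy model of the DRS groups and soft rules described in this post.
# Host and VM names are hypothetical; two hosts per site is an assumption.
host_groups = {
    "Site A Hosts": {"a-host1", "a-host2"},
    "Site B Hosts": {"b-host1", "b-host2"},
    "First Hosts": {"a-host1", "b-host1"},
}
vm_groups = {
    # Assumption: the mgmt VMs are also members of the Site A VMs group.
    "Site A VMs": {"vcenter", "vcenter-db"},
    "Site B VMs": set(),
    "Mgmt VMs": {"vcenter", "vcenter-db"},
}
# Soft ("should run on") rules as (VM group, host group) pairs.
soft_rules = [
    ("Site A VMs", "Site A Hosts"),
    ("Site B VMs", "Site B Hosts"),
    ("Mgmt VMs", "First Hosts"),
]


def rules_for(vm):
    """Return the soft rules whose VM group contains this VM."""
    return [(vg, hg) for vg, hg in soft_rules if vm in vm_groups[vg]]


def hosts_satisfying_all(vm, available):
    """Return the available hosts that satisfy every soft rule for the VM."""
    hosts = set(available)
    for _, host_group in rules_for(vm):
        hosts &= host_groups[host_group]
    return hosts


all_hosts = set().union(*host_groups.values())
print(hosts_satisfying_all("vcenter", all_hosts))                # {'a-host1'}
print(hosts_satisfying_all("vcenter", all_hosts - {"a-host1"}))  # set()
```

With Host1 at SiteA available, it is the only host that satisfies both rules. Once it is gone, no host satisfies both, so at least one soft rule has to be violated wherever the VM lands.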

What am I missing?

Cheers,

HamR

frankdenneman
Expert

You stated that the storage for the mgmt VMs is located in Site A, but does the first host in Site B (grouped in the First Hosts DRS group) also have a connection and read/write access to the storage in Site A?

frankdenneman
Expert

I recreated your environment in my lab and this is my conclusion: http://frankdenneman.nl/uncategorized/overlapping-drs-vm-host-affinity-rule-in-a-vsphere-stretched-c...

I'm really interested to see why your virtual machines deviate from the configured path, e.g. get moved to hosts outside Site A that are not Host1 at Site B.

My previous question was whether the storage is available to the hosts in Site B. What about the VM networks? Please keep in mind that DRS always applies soft affinity rules, but drops them if the destination hosts in the cluster are 100% utilized or cannot meet compatibility requirements such as storage and/or VM networks.
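
As an illustration of how a soft preference can be dropped, here is a small Python sketch. It is a toy model under stated assumptions and not the actual DRS placement algorithm: the `Host` and `VM` classes, the `pick_host` function, and all names and utilization figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    datastores: frozenset
    networks: frozenset
    utilization: float  # 0.0 (idle) to 1.0 (fully utilized); illustrative only


@dataclass
class VM:
    name: str
    datastores: frozenset
    networks: frozenset


def is_compatible(vm, host):
    """True if the host sees the VM's storage and networks and has headroom."""
    return (
        vm.datastores <= host.datastores
        and vm.networks <= host.networks
        and host.utilization < 1.0
    )


def pick_host(vm, hosts, preferred):
    """Prefer hosts named in `preferred`; drop the preference if none qualify."""
    candidates = [h for h in hosts if is_compatible(vm, h)]
    favoured = [h for h in candidates if h.name in preferred]
    pool = favoured or candidates
    if not pool:
        return None
    # Tie-break on name so the result is deterministic in this toy model.
    return min(pool, key=lambda h: (h.utilization, h.name)).name


shared = frozenset({"ds-stretched"})
vlan = frozenset({"vlan-stretched"})
vcenter = VM("vcenter", shared, vlan)
hosts = [
    Host("b-host1", shared, frozenset(), 0.2),  # missing the VM network
    Host("b-host2", shared, vlan, 0.4),
    Host("a-host2", shared, vlan, 0.3),
]
print(pick_host(vcenter, hosts, preferred={"b-host1"}))  # a-host2
```

In this model the preferred host is skipped only because it fails the network check. Once it can see the network, the preference is honoured again, which is why storage and network visibility are worth confirming first.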

HamR
Enthusiast

Hi Frank,

Thanks again for looking into this. Your lab test seems to represent my config, as described. But your results obviously differ from my own.

To answer your questions:

- Shared storage is consistent between all hosts, in that all hosts can read/write to all datastores. There are no site-specific datastores.

- Each site does have some site-specific networks; however, the two mgmt VMs only have one NIC, attached to a stretched VLAN and visible to all hosts

All hosts belong to the same vDS and have access to the same port groups. These were the only two VMs configured and running when the tests were conducted (new vSphere environment).

The SiteA hosts were built and aligned to a host profile, which was then cloned and adjusted for NTP/DNS type variations for SiteB and assigned/applied to the SiteB hosts.

VMs can be manually vMotioned to any host, which confirms that the VMs pass the host validation tests.

In your tests, if you failed a host and an HA event transpired where the VMs recovered outside the DRS boundaries, did the VMs right themselves on the next DRS sweep, or could you generate a DRS recommendation to do so?

From memory, my rules were created using the same chronology described in your first test, so the Mgmt-VMs rule is newer. That said, the VM placement behaviour for Mmode and HA (+ Run DRS action) failed to replicate the results.

On the first run Mmode test, both VMs went to different hosts at SiteB, neither went to Host1. On the second Mmode test, one VM stayed at SiteA and the other went to SiteB (not Host1). All hosts are the same physical spec.

Any thoughts?

Cheers,

HamR

frankdenneman
Expert

> In your tests, if you failed a host and an HA event transpired where the VMs recovered outside the DRS boundaries, did the VMs right themselves on the next DRS sweep, or could you generate a DRS recommendation to do so?

DRS sweeps them back the moment DRS load balancing starts. The first priority of DRS is to fix rule violations; because the VMs violate the rules, DRS moves them directly to the correct host. Typically DRS runs every 5 minutes, but it is also invoked when the cluster configuration changes. As the host was back within 5 minutes, DRS was triggered after the host was up and running.

> From memory, my rules were created using the same chronology described in your first test, so the Mgmt-VMs rule is newer. That said, the VM placement behaviour for Mmode and HA (+ Run DRS action) failed to replicate the results.

For HA I can understand the behavior, but not for maintenance mode. Maintenance mode must abide by the rules unless there is a problem or 100% utilization on the destination host (the first host in Site B). If it cannot satisfy the first rule (mgmt), then it should satisfy the site rule (all VMs on hosts in Site A). If it's a test environment, could you create only these two rules? Remove your other rules, then create the site rule (Site A) first and the mgmt rule second.

Place the first host in Site A into maintenance mode and the VM should move to the first host in Site B. Place this host into maintenance mode as well and the VM should migrate to the other host in Site A (because it's listed in the Site A affinity rule).
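
The expected sequence above can be sketched as a small Python model. This only encodes the outcome described in this post (satisfy the mgmt rule first, then the site rule). It makes no claim about how DRS ranks rules internally, and the host names and the two-hosts-per-site layout are hypothetical.

```python
# Toy model of the expected maintenance-mode sequence; names are hypothetical.
SITE_A_HOSTS = {"a-host1", "a-host2"}
FIRST_HOSTS = {"a-host1", "b-host1"}
ALL_HOSTS = ["a-host1", "a-host2", "b-host1", "b-host2"]

# Rules in the priority order described in the post: mgmt first, site second.
RULES_BY_PRIORITY = [FIRST_HOSTS, SITE_A_HOSTS]


def expected_host(in_maintenance):
    """Return the host a mgmt VM is expected to land on in this toy model."""
    available = [h for h in ALL_HOSTS if h not in in_maintenance]
    if not available:
        return None

    def score(host):
        # Compared lexicographically, so the first rule dominates the second.
        return tuple(host in group for group in RULES_BY_PRIORITY)

    # sorted() makes ties deterministic; max() keeps the first best match.
    return max(sorted(available), key=score)


print(expected_host(set()))                   # a-host1
print(expected_host({"a-host1"}))             # b-host1
print(expected_host({"a-host1", "b-host1"}))  # a-host2
```

Comparing a real maintenance-mode test against this expected sequence shows at which step the observed placement starts to deviate.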

> On the first run Mmode test, both VMs went to different hosts at SiteB, neither went to Host1. On the second Mmode test, one VM stayed at SiteA and the other went to SiteB (not Host1). All hosts are the same physical spec.

It looks like DRS is avoiding placing the VMs on Host1. To understand the behavior of DRS, the log files have to be reviewed. I would file a support bug. I've been going through the events and logs in vCenter, and they do not provide enough information to deduce DRS behavior and why it chose specific hosts for placement. DRMdumps do, but they are located on the vCenter server in a compressed state. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102180...

If you are using the VC virtual appliance, they are located in /var/log/vmware/vpx/drmdump/cluster-id

However, the support team can analyze the data in more depth.
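
When collecting dumps for a support case, a short Python helper can list the most recent files in the drmdump directory. This is a minimal sketch: the function name `newest_dumps` is hypothetical, it assumes nothing about the dump file names, and it simply sorts whatever files are in the directory by modification time.

```python
from pathlib import Path


def newest_dumps(dump_dir, limit=5):
    """Return up to `limit` files in `dump_dir`, most recently modified first."""
    files = [p for p in Path(dump_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

On the appliance this would be called with the cluster's own directory under /var/log/vmware/vpx/drmdump/, choosing the files whose timestamps bracket the maintenance mode test.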

