VMware Cloud Community
atom_acres
Enthusiast

Strange DRS issue

We are using vCenter v5.1 U2.

We have 18 hosts in our dev cluster and run a mixed environment of Windows and Linux guests. We recently started testing some DRS rules that would keep all Windows guests on 5 hosts. We created a host group containing 5 of the hosts, and then a VM group containing all of the Windows guests.

Next we created a DRS rule stating that all of the Windows guests should run on the "Windows Hosts" group. Within 5 minutes all of the guests had migrated to the correct hosts except one.
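To make the rule setup above concrete, here is a toy Python model. The names (`VmHostRule`, `vm_host_violations`) are hypothetical. This is not the vSphere API and not how DRS evaluates rules internally. It simply lists which VMs currently break a VM-Host "should" rule for a given placement:

```python
from dataclasses import dataclass


# Toy model only: these names are hypothetical and are not the vSphere API.
@dataclass
class VmHostRule:
    vm_group: set[str]        # VMs the rule applies to (e.g. all Windows guests)
    host_group: set[str]      # hosts those VMs should run on (e.g. "Windows Hosts")
    mandatory: bool = False   # False models a "should" rule, True a "must" rule


def vm_host_violations(rule: VmHostRule, placement: dict[str, str]) -> list[str]:
    """Return the VMs in the rule's VM group that currently run outside its host group."""
    return sorted(
        vm for vm in rule.vm_group
        if vm in placement and placement[vm] not in rule.host_group
    )


windows_hosts = {f"H{i}" for i in range(1, 6)}  # 5 of the 18 hosts
rule = VmHostRule(vm_group={"A", "C", "VM3"}, host_group=windows_hosts)
placement = {"A": "H7", "B": "H7", "C": "H2", "VM3": "H1"}
print(vm_host_violations(rule, placement))  # ['A']
```

B is ignored here because it is not in the VM group. In this toy view, only guest A is out of compliance, which is the state the thread describes.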

This particular guest (A) has a Linux counterpart (B), and another rule states that these two machines must run on the same host. For whatever reason, every 5 minutes (indefinitely) they would both migrate to a new host, but it was never one of the 5 hosts designated for Windows guests. I finally migrated guest A manually to one of the proper hosts, B eventually followed, and that was the end of that.

If I migrate A back out of the group, B follows and both start jumping around again, never making it back onto a proper host. If both A and B are on a Windows host (we don't care if Linux guests are on the Windows hosts) and I migrate the Linux guest (B) off, it eventually moves back and rejoins the Windows guest (A). This tells me it is honoring the VM-to-host rule. But for whatever reason, once it is off of one of the proper hosts, it can never find its way back.

We have two other guests (C and D), also one Windows and one Linux guest with a VM-VM affinity rule. They behave the EXACT SAME WAY as A and B!

What is going on? Is this a bug? Can someone point me to some supporting documentation or an article that outlines a similar finding? I am having a hard time researching this one.

Thanks!

FWIW: we are in no way constrained for resources on any of the hosts in question. All storage and VLANs in use are presented and available to ALL hosts. The guests run fine when manually migrated to the proper hosts; they just can't find their way there via DRS.

12 Replies
vThinkBeyondVM
VMware Employee

Let me restate your DRS rule configuration and the issue in my own words.

Consider your hosts to be H1-H20.

1. You have one host group, HOSTGRP1: H1, H2, H3, H4, H5.

2. You have created a Windows VM group, VMGROUP1: A, C, VM3, VM4, VM5 (all Windows).

3. You have 2 Linux VMs: B and D.

4. I am leaving all other hosts and VMs aside.

5. You created a VM-Host DRS rule: VMGROUP1 should run on HOSTGRP1 (a "should" rule, not a "must" rule).

6. You have 2 VM-VM affinity rules: i) A & B (A is a Windows guest and B is Linux); ii) C & D (C is windows & D is windows).

i) Due to the VM-Host "should" rule, VM3, VM4 and VM5 migrated to HOSTGRP1 as expected.

ii) Every five minutes (the default DRS invocation interval), A and B migrate together to one of the hosts H6-H20, but never to a host in HOSTGRP1.

iii) When you manually migrate A to HOSTGRP1 (say to H1), B follows and is migrated to H1.

iv) When you manually migrate A to one of the hosts H6-H20, B is migrated to the host where A is. After that, every five minutes they again start migrating among H6-H20, but never onto HOSTGRP1. BUT when you migrate B (Linux) by itself out of HOSTGRP1 (while A stays there), B comes back to HOSTGRP1 as expected.

v) You wrote: "But for whatever reason, once it is off of one of the proper hosts it can never find its way back". I am assuming you are talking about moving VM A out of HOSTGRP1 here. Please confirm.

vi) You observe the same for C & D.

vii) Now you are wondering why this is happening and whether it is a bug.

Please confirm my understanding of your configuration; it is important to understand the configuration before making any troubleshooting comments.

Meanwhile, I will think this scenario over and come back with my comments.


----------------------------------------------------------------
Thanks & Regards
Vikas, VCP70, MCTS on AD, SCJP6.0, VCF, vSphere with Tanzu specialist.
https://vThinkBeyondVM.com/about
-----------------------------------------------------------------
Disclaimer: Any views or opinions expressed here are strictly my own. I am solely responsible for all content published here. Content published here is not read, reviewed or approved in advance by VMware and does not necessarily represent or reflect the views or opinions of VMware.

atom_acres
Enthusiast

You are correct - your understanding is spot on!

(with the exception of a typo in 6: D is Linux, not Windows, but you did state this correctly under 3)

atom_acres
Enthusiast

As another test, I took 2 different VMs (both Windows OS) and created a VM-VM affinity rule. We will call them E and F. E is in the VMGROUP1 and F is not. They behaved exactly the same as the examples I posted above.

I migrated E to one of H6-H20, F followed E, and they began to jump around H6-H20. I removed the VM-VM affinity rule and E proceeded to migrate successfully to one of H1-H5.

It appears to be some sort of bug or issue when a VM has both a VM-VM affinity rule and a VM-Host rule in which the other VM(s) in the VM-VM rule do not have the VM-host rule as well.
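To find every place in a cluster where this pattern exists, a toy audit like the following can help. It uses hypothetical helper names and plain Python sets rather than a vSphere API, so you would have to export the rule and group membership yourself. It flags affinity rules whose members are only partly covered by a VM-Host rule:

```python
# Toy audit with hypothetical names; this is not a vSphere API call.
def partially_covered_pairs(
    affinity_rules: dict[str, set[str]],
    vm_host_groups: list[set[str]],
) -> list[tuple[str, list[str]]]:
    """Flag VM-VM affinity rules where some, but not all, members are in a VM group
    bound by a VM-Host rule. Returns (rule name, uncovered members) pairs."""
    covered = set().union(*vm_host_groups)  # every VM named by any VM-Host rule
    flagged = []
    for name, members in affinity_rules.items():
        inside = members & covered
        if inside and inside != members:  # mixed coverage: the pattern described above
            flagged.append((name, sorted(members - covered)))
    return sorted(flagged)


affinity_rules = {"A-B": {"A", "B"}, "C-D": {"C", "D"}, "E-F": {"E", "F"}, "X-Y": {"X", "Y"}}
vm_host_groups = [{"A", "C", "E", "X", "Y", "VM3"}]
print(partially_covered_pairs(affinity_rules, vm_host_groups))
# [('A-B', ['B']), ('C-D', ['D']), ('E-F', ['F'])]
```

The pair X-Y is fully covered and therefore not flagged. Only the mixed pairs, like A/B, C/D and E/F in this thread, are reported.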

Try it for yourself...

vThinkBeyondVM
VMware Employee

Thanks for your update. Since your setup is already in place, can you please try to reproduce the same use cases on a smaller scale (2-4 hosts) and update this post? I will give it a try from my side as well.


atom_acres
Enthusiast

I took the same two VMs (E & F) from my last test and removed them from any groups and rules for a fresh slate. At this point, they both resided on H3.

I then created a VM-VM affinity rule for them, created a HOSTGRP2 containing only H19 and H20, and created a VM-Host rule for E to stay on HOSTGRP2 (H19 and H20). Immediately, both E and F migrated to H19.

Next, I manually migrated F off of H19-H20 and it came back within 5 minutes via DRS.

Finally, I tried to migrate E off of H19-H20 and was told it would violate a rule. At this point I realized I had accidentally made my VM-Host rule MUST instead of SHOULD. I changed it to SHOULD, then migrated E off of H19-H20, and lo and behold: E & F start jumping around H1-H18 but never migrate to H19-H20.
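The MUST/SHOULD difference that came up in this test can be sketched as a toy check. `check_manual_migration` is a hypothetical function, not vCenter's actual validation logic. A mandatory rule rejects the manual migration, mirroring the "would violate a rule" behavior described above, while a preferential rule lets it through and only records the violation:

```python
# Toy model with hypothetical names; not how vCenter actually validates a migration.
def check_manual_migration(vm, target_host, rules):
    """rules: list of (vm_group, host_group, mandatory) tuples.
    Returns (allowed, soft_violations)."""
    soft_violations = []
    for vm_group, host_group, mandatory in rules:
        if vm in vm_group and target_host not in host_group:
            if mandatory:
                # A "must" rule blocks the move outright.
                return False, [f"{vm} -> {target_host} violates a must rule"]
            # A "should" rule only records the violation; the move goes ahead.
            soft_violations.append(f"{vm} -> {target_host} violates a should rule")
    return True, soft_violations


hostgrp2 = {"H19", "H20"}
print(check_manual_migration("E", "H5", [({"E"}, hostgrp2, True)]))
# (False, ['E -> H5 violates a must rule'])
print(check_manual_migration("E", "H5", [({"E"}, hostgrp2, False)]))
# (True, ['E -> H5 violates a should rule'])
```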

atom_acres
Enthusiast

bump...

Can anyone else recreate this? Do I need to log a case with VMware?

vThinkBeyondVM
VMware Employee

Hi Atom_acres,

I tried on a 5-host DRS cluster but could not reproduce this issue. I need more dedicated time to scale up the environment and deploy the exact testbed that you have.

Is any other rule causing this issue (apart from the 2 rules you created)? Did you enable HA as well? If yes, what are the main configuration details?

Nonetheless, please go ahead and log a support request with VMware. This will allow dedicated time to root-cause this issue.



atom_acres
Enthusiast

I tested with two VMs that were not in any DRS groups prior to adding them to the test groups and still had the issue...

HA is enabled. Which settings would you like to know? How does HA play a role in the DRS migrations every 5 minutes?

vThinkBeyondVM
VMware Employee

Yes, 5 minutes is the default DRS invocation interval; it has nothing to do with HA. However, HA does interoperate with DRS rules to some extent.

Please do log a support request so this issue can be tracked properly.


atom_acres
Enthusiast

As a further test, I manually migrated VM A to H1 (in HOSTGRP1) and VM B followed shortly after. I then put H1 into maintenance mode, and all VMs eventually migrated off of H1 to other hosts. The VMs covered by the VM-Host rule successfully migrated to another host in HOSTGRP1. VM A and B, however, did not! They again migrated to a different host (H6-H20) and continued to migrate around every 5 minutes.

HOWEVER, after everything migrated off of H1, I took it back out of maintenance mode and VM A and B immediately migrated back to it (and only those VMs).

I am working on getting a case open - it may take me some time. If I get any resolution I will be sure to update this post.

atom_acres
Enthusiast

Just a follow up to this....

I opened a case with VMware and they set up a lab with my exact build # and configuration, but they could not duplicate my results. For now, we came up with the workaround of adding the second machine to the large group of VMs (in my scenario, that means adding the Linux VM to the 'windows servers' DRS group). I hope this issue will go away when we eventually move off of this build #.
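The workaround, adding each VM-VM affinity partner to the VM group bound by the VM-Host rule, amounts to expanding the group with the direct partners of its members. Here is a toy sketch with a hypothetical helper and plain sets. The real change is made on the DRS VM group in vCenter:

```python
# Hypothetical helper for illustration; in vSphere you would edit the DRS VM group itself.
def add_affinity_partners(vm_group: set[str], affinity_rules: list[set[str]]) -> set[str]:
    """Return a copy of vm_group extended with every VM that shares a VM-VM
    affinity rule with a current member (direct partners only, one level deep)."""
    expanded = set(vm_group)
    for members in affinity_rules:
        if members & vm_group:  # rule touches an original member of the group
            expanded |= members
    return expanded


windows_servers = {"A", "C", "VM3"}
rules = [{"A", "B"}, {"C", "D"}, {"X", "Y"}]
print(sorted(add_affinity_partners(windows_servers, rules)))
# ['A', 'B', 'C', 'D', 'VM3']
```

Only one level of partners is added, matching the workaround as described. Chains of affinity rules would need the expansion repeated until the group stops growing.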

vThinkBeyondVM
VMware Employee

Thanks a lot for the update.

