Hi
I have a cluster that consists of 6 hosts where manual VMotion works fine, but when I put a host into maintenance mode, the VMs are not VMotioned to the other hosts in the cluster. This problem started after we had some major network problems; every host has been rebooted since the network problem was solved. I've run vmkping and looked through various logs without finding any clues. Any idea how to fix this (so that I can update to Update 2 without putting too much manual work into it)?
Anders
My support guy has forwarded it to engineering, but currently no published solution exists. He agreed, as did other VMware engineers, that the capacity calculation seems to be incorrect in my case (this doesn't mean that VMware acknowledges it as a bug).
Same here: the problem started only with the 3.5 U2 update. About those calculations, though: I have 3-node clusters with an allowed failover of 1 node, and VC reports the capacity as 2 nodes before maintenance. I then took a look at the logs on VC: right before maintenance mode, it reports twice as many available VM slots as are required. So how can anyone say this is not a bug?
P.S. Update 2 definitely has some fixes and new features that we have all longed for. But in the past week I have also seen some bugs that, had I known about them beforehand, would definitely have stopped me from upgrading.
I got inspired by what you wrote and checked my reservations. I had some important servers with 3 GB of memory reserved each (they are using 100% of it). If I remove that reservation, I suddenly have a failover capacity of 2 hosts. There is definitely something wrong with HA's calculation.
This is really annoying.
Another option for a workaround is to select all the VMs on the host, right-click and select Migrate. Instead of choosing a specific host, choose your cluster; DRS will figure out which hosts to put them on and the VMotions will start. The only issue is if you are using resource pools in your cluster. Then it gets messy trying to match each VM to its pool during a VMotion.
An update on my experience with this issue: I have not upgraded to ESX 3.5 U2, but I have upgraded VirtualCenter 2.5 to U2, and now I'm having the problem. Apparently, the issue is in the VirtualCenter code, not the ESX code.
The problem is without question in VirtualCenter, where the calculation of available resources is unfavourable if you're using reservations and might result in zero failover capacity. I guess all we can do is wait for a solution from VMware and avoid reservations in the meantime. Those who experience the problem should open a service request with VMware so that they know the scope of the problem.
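To illustrate why a single large reservation can drag the reported failover capacity down to zero, here is a rough Python sketch of a slot-based capacity calculation. This is not VMware's actual code: it assumes the slot size is simply the largest CPU and memory reservation among the powered-on VMs, with made-up defaults of 256 MHz / 256 MB when nothing is reserved, and it ignores memory overhead. The host sizes, VM counts and all names (Host, VM, failover_capacity) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed default slot size when no VM has a reservation (illustrative values).
DEFAULT_CPU_SLOT_MHZ = 256
DEFAULT_MEM_SLOT_MB = 256


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0


def slot_size(vms: List[VM]) -> Tuple[int, int]:
    """Slot size = largest reservation of any VM (or the assumed default)."""
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in vms])
    mem = max([DEFAULT_MEM_SLOT_MB] + [vm.mem_reservation_mb for vm in vms])
    return cpu, mem


def slots_per_host(host: Host, cpu_slot: int, mem_slot: int) -> int:
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def failover_capacity(hosts: List[Host], vms: List[VM]) -> int:
    """Number of host failures tolerated, failing the largest hosts first."""
    cpu_slot, mem_slot = slot_size(vms)
    slots = sorted(slots_per_host(h, cpu_slot, mem_slot) for h in hosts)
    needed = len(vms)  # every powered-on VM occupies one slot
    capacity = 0
    for failed in range(1, len(slots)):
        # slots[:-failed] drops the `failed` largest hosts (worst case).
        if sum(slots[:-failed]) >= needed:
            capacity = failed
        else:
            break
    return capacity


# Hypothetical 3-node cluster running 12 VMs.
hosts = [Host(f"esx{i}", cpu_mhz=10000, mem_mb=16384) for i in range(1, 4)]
no_reservations = [VM(f"vm{i}") for i in range(12)]
one_reservation = [VM("db1", mem_reservation_mb=3072)] + [VM(f"vm{i}") for i in range(11)]

print(failover_capacity(hosts, no_reservations))  # 2
print(failover_capacity(hosts, one_reservation))  # 0
```

In this simplified model, one VM with a 3 GB reservation inflates the slot size for every VM, so each host drops from 39 slots to 5 and the cluster can no longer cover its 12 VMs after losing a single host. That would be consistent with the capacity jumping back up once the reservation is removed, but whether VirtualCenter computes it exactly this way is an assumption on my part.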
The annoying thing is that the calculation problem has been very well-known and has existed since 3.0 and gone unresolved. Since it was easy to work around for HA, I didn't worry about it. Now, some bright mind at VMware has made the erroneous calculation a dependency for VMotion, which is bad enough, and in particular a dependency for maintenance mode, which just seems stupid. If I’m putting a host into maintenance mode, I expect guests to be moved unconditionally, barring some real technical problem. This kind of cluelessness/carelessness seems unlike VMware, but it does seem like EMC, so I hope it doesn’t forecast the future.
Done ranting.
An interesting thing: after upgrading to U2 and installing the express patch, I decided to reinstall U2 from scratch as a clean install. On the lab installation, the same one that exhibited the behaviour in the subject line, I can now VMotion with no problem and there are no issues with the HA calculations. I haven't changed the number of running VMs or the reservations. The difference between the express patch and the clean install is in the build numbers - express: 3.5.0.110181, clean: 3.5.0.110268. Go figure...
Is that the build # for ESX Server or for VC? Either way, your build numbers are significantly different from mine.
Those are the ESX builds. After the U2 upgrade to VC, no further patches were applied; the build there is 2.5.0.104215.
I'm getting this in the latest build of vCenter 2.5 (Build 174768). Was there ever a fix for this?
Thanks - David