Has anyone been having a problem with Maintenance Mode getting stuck at 2%? This is a random issue that doesn't occur every time; it was discovered while testing a new ESX rollout. Occasionally, when a host is put into maintenance mode, some or all of the virtual machines are not migrated off the host and the Maintenance Mode task stays at 2%. This only happens when HA is enabled. The process does not error out or produce an error message.
ESX 3.5 Update 1 with VCMS 2.5 Update 1
2 hosts (16 CPUs, 32 GB RAM)
4 virtual machines (4 vCPUs, 3.5 GB RAM)
During a call with VMware support (with a live example of the problem), we changed the setting in HA to allow virtual machines to be started if this constraint would be violated. Within 10 seconds the remainder of the Virtual Machines began to migrate and the Maintenance Mode task completed properly.
After this change, numerous attempts were made to recreate the issue with the new HA setting, and it has not come back.
Has anyone else seen this, or reported it to VMware Support?
What I saw is that vmnic0, vmnic1, and vmnic4 all share IRQ 16. Maybe mark vmnic4 as unused and plug your cable into vmnic5. Either that, or for the time being, remove vmnic4 from your config and see if everything works well.
To rule out HA rules as the issue, go to the cluster and then edit the
cluster settings. Go to the "VMware HA" section and in the "Admission Control"
area, select the "Allow virtual machines to power on even if they
violate constraints" option. This ignores resource pre-allocations.
If it doesn't work with this option, then a resource unique to the server must be in use (e.g. an isolated storage volume, host CD-ROM, parallel port, serial port, SCSI device, processor affinity, etc.). BE SURE the second host really can handle the load before you do this.
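The admission-control behavior described above boils down to a capacity check: with strict admission control, HA refuses operations that would leave the cluster unable to absorb a host failure. A minimal Python sketch of that idea (a simplified CPU-only model, not VMware's actual slot algorithm; all numbers are made up):

```python
# Simplified sketch of HA admission control (NOT VMware's exact algorithm).
# HA reserves enough capacity to restart VMs from a failed host; if the
# remaining hosts cannot hold every VM, strict admission control blocks
# operations such as evacuating a host for maintenance mode.

def can_tolerate_host_failure(hosts_ghz, vm_demands_ghz, failed_host_idx):
    """Return True if the remaining hosts can absorb every VM's demand
    after the host at failed_host_idx goes away (CPU-only model)."""
    remaining = sum(h for i, h in enumerate(hosts_ghz) if i != failed_host_idx)
    return sum(vm_demands_ghz) <= remaining

# Two identical hosts, four lightly loaded VMs (hypothetical numbers):
hosts = [16.0, 16.0]           # total GHz per host
vms = [3.0, 3.0, 3.0, 3.0]     # per-VM demand in GHz

# Taking host 0 down for maintenance: strict admission control asks
# whether host 1 alone can carry all four VMs.
print(can_tolerate_host_failure(hosts, vms, 0))  # -> True (12 <= 16)
```

Selecting "Allow virtual machines to power on even if they violate constraints" effectively skips this check, which is why the stuck migrations started within seconds of changing it.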
As another test, manually migrate each VM from the host to the other
host. If they refuse to move, you've at least pinpointed which VMs
are causing the problems. If all move individually, you've pretty much eliminated
the network as an issue. If they fail, look at the log/events for that VM
and tear into its settings one by one. I don't think any of the HA settings matter for manual migrations, but DRS (if in auto mode) may try to relocate VMs back to the other host(s).
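The per-VM test above can be scripted in spirit like this. A rough Python sketch; the VM names and the migrate callback are hypothetical stand-ins for whatever tooling you drive the migrations with, not a real VMware API:

```python
# Illustrative sketch of the test above: try to move each VM off the host
# one at a time and record which ones refuse, so you know exactly which
# VMs (and which errors) to dig into. `migrate` is a hypothetical helper.

def find_stuck_vms(vms, migrate):
    """Attempt to migrate each VM; return (name, reason) for each failure."""
    stuck = []
    for vm in vms:
        try:
            migrate(vm)
        except RuntimeError as err:
            stuck.append((vm, str(err)))
    return stuck

# Example with a fake migrate that rejects a VM using the host CD-ROM:
def fake_migrate(vm):
    if vm == "vm-with-cdrom":
        raise RuntimeError("device attached to host CD-ROM")

print(find_stuck_vms(["web01", "vm-with-cdrom", "db01"], fake_migrate))
# -> [('vm-with-cdrom', 'device attached to host CD-ROM')]
```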
Looks like this is still a problem for a lot of people - I'm seeing the same symptoms on 2- and 3-node clusters.
VMs migrate fine manually, but MM never initiates the migrations and stays hung at 2%.
Maintenance mode gets stuck at 2% when there are still active VMs running on that ESX host.
If you have HA configured, it should automatically VMotion the VMs off to another ESX host with enough resources. If you're also using DRS, it will do some load balancing and work out any affinity rules as well.
What can stop VMs from vmotioning automatically to another host is:
- CD/Floppy still attached to VM - If so remove it
- VM attached to an internal only network or virtual switch not available on another ESX server - Check spellings of vSwitches and which network VM is using.
- VMtools installing on VM - Complete install
- VM is stored on a datastore local to that ESX host - It needs to be on a centralized datastore (SAN, etc.) that is available to another or all ESX hosts.
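To illustrate, the checklist above can be expressed as a simple function: given a description of a VM's configuration, list the reasons VMotion would refuse to move it. A Python sketch; all field, datastore, and network names here are hypothetical, not a VMware API:

```python
# Sketch of the VMotion-blocker checklist above. The dict keys and the
# shared datastore/network names are made up for illustration.

SHARED_DATASTORES = {"san-lun1", "san-lun2"}       # visible to all hosts
SHARED_NETWORKS = {"VM Network", "Production"}     # vSwitches on all hosts

def vmotion_blockers(vm):
    """Return the checklist items that would stop this VM from moving."""
    reasons = []
    if vm.get("cd_or_floppy_connected"):
        reasons.append("CD/floppy still attached")
    if vm.get("network") not in SHARED_NETWORKS:
        reasons.append("network/vSwitch not available on other hosts")
    if vm.get("vmtools_installing"):
        reasons.append("VMware Tools install in progress")
    if vm.get("datastore") not in SHARED_DATASTORES:
        reasons.append("VM on local datastore, not shared storage")
    return reasons

vm = {"cd_or_floppy_connected": True,
      "network": "internal-only",
      "vmtools_installing": False,
      "datastore": "san-lun1"}
print(vmotion_blockers(vm))
# -> ['CD/floppy still attached', 'network/vSwitch not available on other hosts']
```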
I have also found that in a cluster of two hosts (even on VC 2.5 U3 and ESX 3.5 U3) it does not automatically migrate the VMs, and this needs to be done manually. It used to work in U1 but was changed in U2 to stricter calculations for HA.
If entering maintenance mode is getting stuck, it means that VC cannot migrate the VMs away to another ESX server. What you describe seems to be not a bug but a feature.
Not true. When you START maintenance mode, it immediately ATTEMPTS to put the machine into that mode, which is the 2%. If it gets hung, something is wrong, because the agent on the machine doesn't respond, so you have to manually move the VMs off and reboot.
But getting stuck doesn't mean it's stuck moving the VMs; it hasn't even started them. All you get is a task in the queue to put the ESX host into maintenance, and then you see no activity at all (not even an attempt at moving a VM). That's the problem.
If it gets stuck at around 30-50%, THEN you might have a problem with VMs not moving, but 2% is just initiating the process on the host.
Double-check to make sure you don't have any DRS anti-affinity rules in place.
Also, how many VMs do you have created between the two hosts?
This happened on a few of our hosts also; the agent is hung on the ESX host. Even if you log in to the host directly and click 'enter maintenance mode', there is still no response. There are only two fixes that I know of: either upgrade/patch the ESX host, or reboot the host and attempt maintenance mode again. Even service mgmt-vmware restart will not fix the issue.
Since upgrading to 3.5 U2 we haven't seen this, so maybe VMware fixed this issue in the newest patches.
> Maintenance mode gets stuck at 2% when there are still active VMs running on that ESX host.
Not really; it depends. If you watch the tasks when you click Maintenance Mode, nothing happens. The host doesn't respond to the request. It's apparently been fixed in later upgrades/patches since this posting; this happened to me many times, but it hasn't happened recently. At 2% it's notifying the ESX host, and there is ZERO activity at that point, so it's hung. Even if there are active VMs when you initiate maintenance mode, the task list should still show some progress, but it doesn't. So I don't think it has anything to do with HA or the VMs; the host isn't able to respond to the request for whatever reason. Even after manually moving the VMs and turning off HA, with nothing running on the ESX host, I had times where it just wouldn't go into maintenance mode; I had to reboot the host, and then everything was fine. But since then I have upgraded to 3.5 U2/U3 and haven't seen this problem.
I can manually migrate the VMs, so that rules out VMotion compatibility. All ESX servers have the exact same config and the same hardware. After a manual move, the server goes into maintenance mode fine.
I have disabled HA and tried again but it still stays at 2%.
I've got a support case open for it and will update, but as seen in a previous post, VMware Support didn't resolve it either.
Count me in (my customer, that is). Manual VMotion works flawlessly. MM freezes at 2% and doesn't even attempt a VMotion. It's a 3-host cluster with 4 times more memory and CPU than the environment requires. We are upgrading from ESX 3.5 Update 1 to ESX 3.5 Update 3.
This is completely stupid. MM is a major selling point of VI3 Enterprise. I guess shame on the customer for not staying on the latest and greatest, but then again, this problem would probably still have been there this spring anyway. Interested to see what the issue was. Good luck!
Hi, I'm new to VMware, but I experienced the same issue today. When I checked back, I discovered that DRS had been configured as partially automated and recommendations had been generated. It appeared that the DRS recommendations hadn't been applied, and this was preventing the blade from going into maintenance mode. Once all the recommendations had been applied and I put DRS back to automatic, I was able to put blades into maintenance mode.
Just my small contribution - Hope it helps
Here's my setup and fix for this problem. The issue makes sense to me. I have a two-node ESX cluster with HA and DRS enabled. HA is set to allow constraint violations. If DRS is set to manual, 'Entering Maintenance Mode' gets stuck at 2% because it is waiting for you to manually move your VMs, since DRS is set to manual. If you set DRS to fully automated and most conservative, entering MM moves your VMs for you. This seems logical to me.
This is a known bug in vCenter 2.5 Update 2, as explained at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100715.... Try upgrading vCenter to the latest version.