VMware Cloud Community
thorwitt
Contributor

Maintenance mode stuck at 2%

We have two ESX 3.5.0 hosts (Build 110268) and a VirtualCenter Server 2.5 (Build 104215, running as a VM) connected to an FC SAN. When I try to enter maintenance mode, the status gets stuck at 2% and nothing happens. The running VMs are not automatically migrated to the other ESX server (manual migration of the VMs works fine). HA and DRS are enabled. Where is the problem?

17 Replies
java_cat33
Virtuoso

Is your DRS cluster not fully automated?

JonRoderick
Hot Shot

If your DRS cluster is set to automated and you can carry out manual VMotions without any problem, the issue is almost certainly the well-known one that lots of people are having with HA in VC 2.5/ESX 3.5. The HA algorithm changed in the latest releases, and there seems to be a bug that stops VMotion from kicking in and migrating the VMs onto other hosts when you enter maintenance mode.

I've heard a bug fix has been requested but have no further information.

Cheers.

thorwitt
Contributor

DRS is set to fully automated. One of our customers has the same issue after upgrading from 3.0.3 to 3.5 U2 (since U2, HA looks OK but does not work). Another customer who upgraded from 3.5 to 3.5 U2 (with the fixed new ISOs) has no problem; there it works. Crazy! So do we have to wait for a patch?

IRIX201110141
Champion

The whole process always stops at 2% if there is a rule violation somewhere, or just a dumb showstopper like a "ghost network card" in a VM or similar. Check the tasks/events together with the DRS recommendations page. If a single VM can't migrate, the whole task is blocked. In that case just migrate your VMs by hand and the maintenance task will move forward.
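The point that one unmovable VM holds up the whole task can be pictured with a small Python model. It is a sketch under simplifying assumptions, not VirtualCenter's logic: each VM carries a hypothetical list of `blockers` (the reasons it cannot be migrated), and the maintenance-mode task is assumed to finish only when no VM on the host is blocked.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    # Hypothetical reasons this VM cannot be migrated
    # (e.g. a connected CD-ROM or a stale network card).
    blockers: list = field(default_factory=list)


def evacuation_status(vms):
    """Split the VMs on a host into migratable and blocked ones.

    In this simplified model the maintenance-mode task can only
    complete when the blocked dict is empty.
    """
    movable = [vm.name for vm in vms if not vm.blockers]
    blocked = {vm.name: vm.blockers for vm in vms if vm.blockers}
    return movable, blocked


vms = [
    VM("web01"),
    VM("db01", ["CD-ROM connected to host device"]),
    VM("app01"),
]
movable, blocked = evacuation_status(vms)
print("can migrate:", movable)
print("blocking the task:", blocked)
```

Clearing the blocker on `db01`, or migrating it by hand, leaves `blocked` empty, which in this model is the condition for the task to proceed.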

regards

Joerg

JonRoderick
Hot Shot

Fair point, but in this case it sounds like a fairly good punt that it's the HA bug at work.

I don't agree, though, that you can just migrate the VMs by hand - automatic migration during a MM task is a key value-add as far as I'm concerned. Not having it is tantamount to removing one of the key benefits of ESX/VC.

Sure, you can use whizzy PowerShell scripts to take care of the migrations and the subsequent reorganisation (multi-selecting VMs and then migrating them puts them in the top-level resource pool), but it's still a fudge.
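The record-and-restore pattern such scripts rely on can be sketched in Python against an in-memory inventory. Everything here is hypothetical: `migrate` only stands in for a real VMotion call, the root pool name `Resources` is an assumption, and the round-robin target choice is a deliberately naive placement policy, not anything DRS does.

```python
from itertools import cycle

TOP_LEVEL_POOL = "Resources"  # assumed name of the cluster's root pool


def migrate(inventory, name, target):
    """Stand-in for a real migration call. Like a multi-select migration,
    it leaves the VM in the top-level resource pool."""
    inventory[name]["host"] = target
    inventory[name]["pool"] = TOP_LEVEL_POOL


def evacuate(inventory, host, targets):
    """Move every VM off `host`, round-robin over `targets`, then put
    each VM back into the resource pool it was in before the move."""
    if not targets:
        raise ValueError("no target hosts available")
    next_target = cycle(targets)
    # Record pool membership before anything is moved.
    plan = [(name, next(next_target), vm["pool"])
            for name, vm in sorted(inventory.items())
            if vm["host"] == host]
    for name, target, pool in plan:
        migrate(inventory, name, target)
        inventory[name]["pool"] = pool  # restore the original pool
    return plan


inventory = {
    "web01": {"host": "esx1", "pool": "Production"},
    "test01": {"host": "esx1", "pool": "Test"},
    "db01": {"host": "esx2", "pool": "Production"},
}
print(evacuate(inventory, "esx1", ["esx2", "esx3"]))
```

Recording the pools before the first move is the important design choice: once a VM has landed in the top-level pool, its original placement is no longer known.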

Cheers

Jon

Texiwill
Leadership

Hello,

To work past this, do the vMotions by hand. Also, check your isolation responses. It could be that the VM is set to power off and that is the issue, as HA may not be able to power off the VM (for example, due to the lack of a script).


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

Rubeck
Virtuoso

I think your problem is your ESX build number, which appears to be ESX 3.5 U2. The release notes for this version clearly state:

Virtual Machine Migrations Are Not Recommended When the ESX Server Host Is Entering the Maintenance or Standby Mode

No virtual machine migrations will be recommended (or performed, in fully automated mode) off of a host entering maintenance or standby mode, if the VMware HA failover level would be violated after the host enters the requested mode. This restriction applies whether strict HA admission control is enabled or not.

Taken from http://www.vmware.com/support/vi3/doc/vi3_esx35u2_vc25u2_rel_notes.html
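Why two-node clusters are hit hardest follows from the rule quoted above. Here is a simplified Python model of the check; it is an illustration under assumptions, not VMware's admission-control code: every host is reduced to a number of identical slots, each powered-on VM uses one slot, and the failover capacity is how many of the largest hosts can fail while the rest still hold every VM.

```python
def failover_capacity(host_slots, slots_needed):
    """Number of host failures the cluster can absorb (simplified model).

    host_slots:   slots provided by each host.
    slots_needed: slots used by all powered-on VMs (one slot per VM).
    Worst case is assumed: the largest remaining host fails each time.
    """
    remaining = sorted(host_slots)
    failures = 0
    while remaining and sum(remaining[:-1]) >= slots_needed:
        remaining.pop()  # lose the largest host
        failures += 1
    return failures


def can_enter_maintenance(host_slots, host, slots_needed, failover_level):
    """Refuse to evacuate `host` if, once it is gone, the remaining hosts
    could not tolerate `failover_level` further host failures."""
    others = [slots for name, slots in host_slots.items() if name != host]
    if sum(others) < slots_needed:
        return False  # the VMs would not even fit on the remaining hosts
    return failover_capacity(others, slots_needed) >= failover_level


two_nodes = {"esx1": 20, "esx2": 20}
three_nodes = {"esx1": 20, "esx2": 20, "esx3": 20}
print(can_enter_maintenance(two_nodes, "esx1", 10, failover_level=1))    # False
print(can_enter_maintenance(three_nodes, "esx1", 10, failover_level=1))  # True
print(can_enter_maintenance(two_nodes, "esx1", 10, failover_level=0))    # True
```

In this model a two-node cluster with a failover level of 1 can never evacuate a host, because the single remaining host has no spare capacity at all; a failover level of 0 (roughly the situation with HA disabled) lets the evacuation through.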

/Rubeck

astrolab
Contributor

Have you checked that the VMs don't have CD-ROMs connected? I use VMCDconnected; you can find it here:

Or you can use the following PowerShell command, which disconnects every connected CD/DVD drive on all VMs:

# Find every CD/DVD drive that is currently connected and disconnect it
Get-VM | Get-CDDrive |
     Where-Object { $_.ConnectionState.Connected } |
     Set-CDDrive -Connected:$false -Confirm:$false

thorwitt
Contributor

Sorry, I don't really get it. Manual VMotions of all VMs work without any warnings. So I think entering maintenance mode should automatically migrate all VMs to the other ESX host (this worked before U2!). If maintenance mode no longer works automatically for any two-node ESX configuration (per the release notes), nobody can use VMware Update Manager without a lot of manual intervention. This is not very nice. I think this is a bug, not a feature!

ilatimer
Hot Shot

Just to let everyone know: this is a feature and not a bug (in VMware's eyes). If you only have two ESX hosts in an HA/DRS cluster, the VMs will not automatically be vMotioned off the ESX host that you put into maintenance mode. This is a change from previous builds. You can verify this by disabling HA on the HA/DRS cluster and then putting the ESX host into maintenance mode (which will then automatically move the VMs off the host). This is stated in VMware's release notes and was also confirmed by their support staff (at least in my case).

JonRoderick
Hot Shot

I see this on 2-node and 3-node clusters. I'm expecting to see it on my 6- and 9-node clusters next week too.

It's not a feature.

Jon

wallbreaker
Contributor

Hi,

We have 10 ESX hosts and the same problem.

The problem lies in the "Current Failover Capacity": in the latest build, this computed value is wrong.

If this value is 0, maintenance mode gets stuck.

We resolved the problem by removing the resource allocation settings from all VMs. We kept only our 4 resource pools.
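Why removing per-VM allocations can change the "Current Failover Capacity" is easier to see with a simplified Python model of slot-based HA sizing. This is not VirtualCenter's code, and it rests on assumptions: a slot is as large as the biggest CPU reservation and the biggest memory reservation (plus overhead) of any powered-on VM, with assumed minimums of 256 MHz and 256 MB; each VM takes one slot; the host and VM figures are invented.

```python
def slot_size(vms, min_cpu_mhz=256, min_mem_mb=256):
    """Slot = largest CPU reservation and largest memory reservation plus
    overhead among powered-on VMs, never below the assumed minimums."""
    cpu = max([vm["cpu_res_mhz"] for vm in vms] + [min_cpu_mhz])
    mem = max([vm["mem_res_mb"] + vm["overhead_mb"] for vm in vms] + [min_mem_mb])
    return cpu, mem


def slots_per_host(host, slot):
    """How many whole slots fit on a host; the scarcer resource decides."""
    cpu, mem = slot
    return min(host["cpu_mhz"] // cpu, host["mem_mb"] // mem)


def failover_capacity(host_slots, slots_needed):
    """Host failures tolerated, assuming the largest hosts fail first."""
    remaining = sorted(host_slots)
    failures = 0
    while remaining and sum(remaining[:-1]) >= slots_needed:
        remaining.pop()
        failures += 1
    return failures


def cluster_capacity(hosts, vms):
    slot = slot_size(vms)
    return failover_capacity([slots_per_host(h, slot) for h in hosts], len(vms))


hosts = [{"cpu_mhz": 8000, "mem_mb": 16384} for _ in range(10)]
vms = [{"cpu_res_mhz": 0, "mem_res_mb": 0, "overhead_mb": 100} for _ in range(40)]
print(cluster_capacity(hosts, vms))  # 8

vms[0]["cpu_res_mhz"] = 4000  # a single VM with a large CPU reservation
print(cluster_capacity(hosts, vms))  # 0
```

In this model one large reservation inflates the slot for every VM, so the same hardware offers far fewer slots and the computed capacity drops to 0; clearing the per-VM reservation shrinks the slot again.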

RParker
Immortal

> So I think entering maintenance mode should automatically migrate all VMs to the other ESX host (this worked before U2!).

Precisely, and it's not supposed to be the norm. We had this happen on one ESX server: for whatever reason, the agent on the ESX server would not VMotion ALL the VMs first, so I had to do it manually. That is a workaround for a problem, and the issue is that something is amiss with the agent, either between VC and the ESX server or on the ESX server itself.

I agree this is a bug; however, it only happened ONCE! Every other maintenance mode has been perfect, so it was an isolated incident. This is also exactly why we DON'T use ESX 3i: I can PuTTY/SSH into the ESX server directly, look at the processes and VMs (esxtop), and see what could be causing a problem. It turned out that one VM was not visible from the VIC but WAS running according to esxtop, so I had to shut down that VM first and then reboot the host (it never did go into maintenance mode). After the reboot I tested again, and the next maintenance mode worked fine.
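The check described above boils down to a set difference between the VMs a host is actually running and the VMs VirtualCenter shows for it. A trivial Python sketch; gathering the two name lists is left out, and the VM names are made up.

```python
def ghost_vms(running_on_host, known_to_vc):
    """Names of VMs the host is running that VirtualCenter does not list."""
    return sorted(set(running_on_host) - set(known_to_vc))


# Both lists are hypothetical examples; in practice they would come from
# esxtop on the service console and from the VI Client inventory.
running = ["web01", "db01", "old-test"]
inventory = ["web01", "db01"]
print(ghost_vms(running, inventory))  # ['old-test']
```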

This also shows the difference between UPGRADING servers and doing a NEW install. This particular server was an upgrade, so now more than ever I exclusively use NEW installs: I configure the machine (which only takes a few minutes) and add it to the cluster.

thorwitt
Contributor

If I look at my events after "Enter Maintenance Mode", there are a lot of "Unable to automatically migrate" messages. But why? Manual VMotion worked without any warnings or problems.

tomasz_dowalews
Contributor

Hi,

I had the same issue on my 5-host ESX cluster. The solution was to remove all CPU limits and reservations from every virtual machine belonging to this cluster (Resource Allocation tab in the Infrastructure Client).

The DRS automation level is set to Fully automated.

thorwitt
Contributor

This is the reply from VMware support. So every two-node ESX U2 cluster should have this issue :-(

With ESX 3.5 U2, Virtual Machine Migrations Are Not Recommended When the ESX Server Host Is Entering the Maintenance or Standby Mode.

No virtual machine migrations will be recommended (or performed, in fully automated mode) off of a host entering maintenance or standby mode, if the VMware HA failover level would be violated after the host enters the requested mode. This restriction applies whether strict HA admission control is enabled or not.

(Ref.: VMware Infrastructure 3 Release Notes, section Known Issues: http://www.vmware.com/support/vi3/doc/vi3_esx35u2_vc25u2_rel_notes.html#knownhaissues)

More information about what is new in ESX 3.5 U2 can be found in the VMware Infrastructure 3 Release Notes at the following link: http://www.vmware.com/support/vi3/doc/vi3_esx35u2_vc25u2_rel_notes.html

pdrace
Hot Shot

I'm seeing the same thing with ESX 3.5 update 1.

Our VirtualCenter server is 2.5 Update 2, however. Now I wonder whether HA will work on my clusters in the event of a failure.
