VMware Cloud Community
Aviso
Contributor

VMotion errors after rebuild on 3.0.2

As far as I can tell, there is a bug in 3.0.2 that started with the 11/15/07 patch set. Whenever I remove an ESX server from VirtualCenter, rebuild it with the same name, and rejoin it to VirtualCenter, VMotioning a running VM to the rebuilt server fails with a timeout. /var/log/vmware/vpx/vpxa.log on that server indicates that "the object doesn't exist and has never existed".

To resolve this issue, I have to migrate a VM that is powered off or suspended (technically a relocation, not a migration). Once that VM has been moved to the rebuilt ESX server, I power it on and migrate it back to the server it was originally on. After that, everything seems to work as normal.

I have no problems if I use 3.0.2 Update 1, but with any patches from 11/15/07 through 03/06/08 I get the error. I'm not sure about 3.5.

Anyone else seen this? Does anyone know if it's resolved in 3.5? VMware, any fix for this coming out soon?

8 Replies
Spad
Enthusiast

I've just rebuilt a 3.5 server and have exactly the same issue.

Only it's worse, because I can't find any way to get it working again.

dkfbp
Expert

A long shot: we had an issue with VMotion / cold migration when installing an ESX 3.5 host with the hostname in UPPERCASE. Did you do that?

Best regards Frank Brix Pedersen blog: http://www.vfrank.org
alhamad
Enthusiast

Did you try disconnecting the server from VC instead of removing it, then rebuilding it and connecting it back?

Spad
Enthusiast

It appears that disabling DRS will allow you to VMotion between affected machines, but enabling it again will break VMotion, so it's hardly a long-term solution.

Spad
Enthusiast

I've "fixed" the issue by removing all my hosts from the cluster, removing the cluster, creating a new one and moving all the hosts back into it.

However, VMotion does seem much slower now than it used to be.

Aviso
Contributor

Thanks for the suggestions.

Names are all lowercase after that HA bug they had in VC 2.0.2.

I'll try disabling DRS, but I'm pretty sure I tried that before without any effect.

I'll try disconnecting the server too, but I feel more comfortable removing it since it will be rebuilt before it gets reconnected.

As far as destroying and recreating the cluster, I don't think I want to go through that every time. What I described above seems to work every time. It's a pain, but probably less effort.

Based on how quickly people responded, it looks like a lot of other people have been having this problem. I'm very sorry to hear it still exists in 3.5. Hopefully VMware will fix it quickly.

Aviso
Contributor

Problem solved. It seems the issue is that the ESX servers that weren't rebuilt cache the MAC address of the VMkernel port of the server that was rebuilt. The solution is to do a vmkping from the server you just rebuilt to the other servers in the cluster. At some point I'll write a script that queries VirtualCenter to determine the addresses, but for now I just added a section to my network config script that enumerates all the IPs on the subnet and vmkpings them.
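
For anyone who wants to reproduce the "enumerate the subnet and vmkping everything" step, here is a minimal sketch of the idea. It is not the script from my network config; the function names, the subnet, and the host address are all hypothetical. It also assumes a Python 3 interpreter is available wherever vmkping runs, which is not the case on a stock ESX 3.x service console, so treat it as an illustration of the logic rather than a drop-in script.

```python
import ipaddress
import subprocess


def vmkping_targets(subnet, own_ip):
    """Return every usable host address on `subnet` except `own_ip`."""
    network = ipaddress.ip_network(subnet, strict=False)
    own = ipaddress.ip_address(own_ip)
    return [str(host) for host in network.hosts() if host != own]


def build_commands(subnet, own_ip):
    """Build one `vmkping <ip>` argument list per target address."""
    return [["vmkping", ip] for ip in vmkping_targets(subnet, own_ip)]


def refresh_peers(subnet, own_ip, run=subprocess.call):
    """Run vmkping against each target; map each IP to its exit code.

    `run` is injectable so the loop can be exercised without vmkping
    being installed. The default calls the real command.
    """
    results = {}
    for cmd in build_commands(subnet, own_ip):
        results[cmd[-1]] = run(cmd)
    return results


# Example call on the rebuilt host (hypothetical VMkernel subnet and IP):
#   refresh_peers("192.168.10.0/28", "192.168.10.5")
```

Pinging the whole subnet is crude, and on a large subnet it will spend a long time waiting on addresses that do not answer. Restricting the target list to the known VMkernel addresses of the other cluster members would be the obvious refinement.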

TomHowarth
Leadership

The original poster has found a solution to his own issue, therefore the thread is marked as assumed answered.

Tom Howarth

VMTN User Communities Moderator

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410