VMware Cloud Community
Oddjob62
Contributor

During VMotion - "A general system error occurred: Failed to open the swap file"

Hi, I have just installed ESX 3.5 with VC 2.5. I have two hosts with shared NFS storage (the boss won't spring for a SAN). I am currently using evaluation licenses (I'm an ESX n00b).

Initially everything seemed to install fine: I added the hosts to the VC server and installed a VM on one of the hosts.

Then I thought I'd try the fun stuff... VMotion. I selected the VM to migrate, selected the destination, got the message "Validation succeeded" and started the migration. The migration gets to about 90% and then cancels with the error message "A general system error occurred: Failed to open the swap file".

The swap file is located in the same directory as the rest of the VM's files (the recommended setting).

The destination host has access to the VM files (if I manually move the VM to the other node I can start it up with no problems).

Anybody have any ideas? Hopefully I'm just missing something obvious.

Thanks

0 Kudos
17 Replies
Texiwill
Leadership

Hello,

Check to see whether you are getting a lot of SCSI reservation conflicts during your migration; they can be the cause of this error. Opening a file requires a reservation request.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education, as well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

0 Kudos
Oddjob62
Contributor

Would that be the case with an NFS back end, though? Surely, as it's essentially a file share, there's no chance of a SCSI reservation conflict. In case I'm understanding things wrong, how would I check this, please?

EDIT: vmkwarning log entries during a migration attempt...

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu2:1089)WARNING: Migrate: 1242: 1201625672031250: Failed: Failed to resume VM (0xbad0043) @0x961d3e

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu4:1088)WARNING: MigrateNet: 309: 1201625672031250: 5-0x9020f98:Sent only 4088 of 4096 bytes of message data: Broken pipe

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu4:1088)WARNING: Migrate: 6770: 1201625672031250: Couldn't send data for 23838: Broken pipe

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu4:1088)WARNING: Migrate: 6921: 1201625672031250: Failed to send final set of pages: Broken pipe (0xbad0052)

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu4:1088)WARNING: MigrateNet: 299: 1201625672031250: 11-0x9020f98:Sent only 0 of 68 bytes: Broken pipe

Jan 29 17:09:54 vmkernel: 0:02:04:46.022 cpu4:1088)WARNING: Migrate: 6947: 1201625672031250: Couldn't send all pages sent msg: Broken pipe

0 Kudos
Oddjob62
Contributor

OK, I seem to have found the cause of the issue.

In the vmware.log file:

Jan 29 17:09:54.458: vmx| Unable to initialize swap file /vmfs/volumes/a04af050-eaf0ec4b/Windows 2003 Enterprise Server R/Windows 2003 Enterprise Server R-8d3446ea.vswp: Not found

It seems that Server1 is seeing the NFS share as a04af050-eaf0ec4b, while Server2 sees it as 7c65c3b5-ac3b9ecc.

XXX@XXXXXXX volumes--# ls

478f6a6f-82df724c-9a7e-001e4f175355 7c65c3b5-ac3b9ecc bhf-esx01:storage1 NFS_VMStore

I'm getting to the limit of my Linux knowledge here, but surely they should both be pointing to NFS_VMStore instead of 7c65... and a04a..., because that contains the same content and has the same name on both servers.

Now the question is... why, and how do I fix it?
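For anyone who hits this later: a quick way to confirm the mismatch is to resolve the friendly-name symlink under /vmfs/volumes on each host and see which UUID directory it points at. A minimal Python sketch (the host names and temp directories below are stand-ins for the two ESX hosts; the UUIDs are the ones from this thread):

```python
import os
import tempfile

def datastore_id(mount_point: str) -> str:
    """Resolve a /vmfs/volumes friendly-name symlink and return the
    UUID directory it actually points at."""
    return os.path.basename(os.path.realpath(mount_point))

# Throwaway directories standing in for each host's /vmfs/volumes view.
root = tempfile.mkdtemp()
for host, uuid in (("host1", "7c65c3b5-ac3b9ecc"), ("host2", "a04af050-eaf0ec4b")):
    os.makedirs(os.path.join(root, host, uuid))
    os.symlink(uuid, os.path.join(root, host, "NFS_VMStore"))

ids = {h: datastore_id(os.path.join(root, h, "NFS_VMStore"))
       for h in ("host1", "host2")}
print(ids)
if len(set(ids.values())) > 1:
    print("mismatch: the hosts mounted the datastore under different UUIDs")
```

If the two UUIDs differ, VMotion will fail exactly as described, because the swap file path recorded by the source host doesn't exist from the destination host's point of view.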

0 Kudos
dmorgan
Hot Shot

We had a similar issue, although with FC, and it had to do with the way each ESX server saw the SAN. We have two SAN filer heads and a fibre switch, so there are multiple paths to each LUN. Each ESX server saw the correct LUN on the SAN, but each saw it under a different path; once that was resolved and each ESX server saw the LUN by the same name, the problem went away. I would check whether each ESX server sees the datastore under the same name.

If you found this or any other post helpful please consider using the Helpful/Correct buttons to award points
0 Kudos
ROMCH
Contributor

I have the same issue. Do you mean different paths, for example ESX1 has vmhba1:0:4 & vmhba2:0:4 and ESX2 has vmhba2:0:4 & vmhba3:0:4?

VCP4 & VCP3 & CCNA
0 Kudos
Oddjob62
Contributor

The Storage settings on both servers are identical, showing the Storage Identification as NFS_VMStore.

However, the swap file insists on being created using the "locally significant" name instead of the shared name. I even tried to change it manually in the vmx file, but it gets changed back.

I have set the swapfile policy to "Default" and to "Always store with the virtual machine", but this makes no difference.

0 Kudos
dmorgan
Hot Shot

Yes, that would be a problem. Each ESX server needs to see the LUN on the SAN by the same name. If you have two filer heads, both attached to a switch, and both filer heads can control each other's trays of disks, you will have 2, 4, or maybe even more paths to the same LUN. You must specify the exact same name for the SCSI target on each ESX server in order for VMotion to be able to migrate from one ESX server to another.

If you found this or any other post helpful please consider using the Helpful/Correct buttons to award points
0 Kudos
dmorgan
Hot Shot

As an example, we have two ESX servers. Each has a mapping to two LUNs on the SAN(s), via Fibre Channel HBAs. The path for the two LUNs on each ESX server is vmhba0:0:10 and vmhba0:0:20. If I check both ESX servers, they both have SCSI targets with those same two names. Both ESX servers belong to the same cluster and see those two LUNs by the same name. Thus, VMotion can change which ESX server is currently controlling a VM, and migrate it between the two, since they both see the LUNs by the same name.
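The check described above boils down to a set comparison: every host in the cluster should report the identical set of target paths for the shared storage. A small sketch (the host names and vmhba paths are made up for illustration; in practice you'd read them off each host's Storage Adapters screen):

```python
# Each host's view of its storage paths, as read from the
# Storage Adapters screen (illustrative values only).
host_paths = {
    "esx1": {"vmhba0:0:10", "vmhba0:0:20"},
    "esx2": {"vmhba0:0:10", "vmhba0:0:20"},
}

# Paths every host agrees on.
common = set.intersection(*host_paths.values())

for host, paths in host_paths.items():
    extra = paths - common
    if extra:
        print(f"{host} sees paths not shared by all hosts: {sorted(extra)}")

if all(paths == common for paths in host_paths.values()):
    print("all hosts see identical target paths - VMotion prerequisite met")
```

Any path that shows up for one host but not the others is the kind of asymmetry dmorgan describes, and a candidate cause for the swap-file error.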

If you found this or any other post helpful please consider using the Helpful/Correct buttons to award points
0 Kudos
Oddjob62
Contributor

OK, so here's where we are at the moment.

When I boot the VM (say on server1), it creates a swap file using the following parameter:

"sched.swap.derivedName = /vmfs/volumes/7c65c3b5-ac3b9ecc/New Virtual Machine/New Virtual Machine-7f522bca.vswp"

If I try to VMotion to server2, this error comes up:

"Unable to initialize swap file /vmfs/volumes/a04af050-eaf0ec4b/New Virtual Machine/New Virtual Machine-b9736b96.vswp: Not found"

I can insert the line

sched.swap.dir = "/vmfs/volumes/NFS_VMStore/New Virtual Machine"

...to get the swap file created under the "user friendly" link for the NFS share, but each time the server is restarted it still adds those extra random characters at the end, and, even more annoyingly, the destination server seems to try to make its own different one as well and then complains that it can't find it.

Maybe if there were a way to force the swap file to be named the same thing each time, that would fix it... is this possible?

The solution seems so close.
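Note that the two messages above already contain the diagnosis: the volume component of the swap path differs between the hosts, while the random suffix on the .vswp file is harmless. A quick sketch that pulls the volume ID out of each path to make the comparison explicit (the two paths are the ones quoted in this post):

```python
import re

def swap_volume_id(vswp_path: str) -> str:
    """Extract the /vmfs/volumes/<id>/ component from a swap-file path."""
    m = re.match(r"/vmfs/volumes/([^/]+)/", vswp_path)
    if not m:
        raise ValueError(f"unexpected swap path: {vswp_path}")
    return m.group(1)

src = "/vmfs/volumes/7c65c3b5-ac3b9ecc/New Virtual Machine/New Virtual Machine-7f522bca.vswp"
dst = "/vmfs/volumes/a04af050-eaf0ec4b/New Virtual Machine/New Virtual Machine-b9736b96.vswp"

print(swap_volume_id(src), "vs", swap_volume_id(dst))
if swap_volume_id(src) != swap_volume_id(dst):
    print("the hosts disagree about which volume holds the swap file")
```

So forcing a fixed .vswp name wouldn't help; the fix has to make both hosts resolve the datastore to the same volume ID.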

0 Kudos
dmorgan
Hot Shot

I checked the swap files on my VMs. One machine, for example, has a swap file named ProphetServer-New-6e17770c.vswp. I don't think you want to change the name of the vswp file manually; I believe the system tacks on the -6e17770c portion itself. I VMotioned this machine manually to another ESX host, and the vswp file name doesn't change.

So, back to my original assumption: I think you may have multiple paths to the same LUN. If each individual ESX server sees a common LUN under a different name, then VMotion will not work. For instance, we have two filer heads and four trays of disks. Each filer head is the primary controller for two trays of disks, and the secondary controller for the other two. So there are several routes to any one LUN on the SAN. If I were to map a LUN that existed on Filer 1's disks from ESX server 1, going through Filer 1, and map the same LUN on ESX server 2 through filer head 2, then VMotion would not work.

Check your storage adapter settings on each ESX server from VirtualCenter, and see what the SCSI target paths are for each. Do they show the identical path, or differing paths?

If you found this or any other post helpful please consider using the Helpful/Correct buttons to award points
Oddjob62
Contributor

Both adaptors show EXACTLY the same. I'm just stumped. I guess even though VMware says that 3.5 supports VMotion with NFS, it doesn't seem to. I have managed to modify my setup to use iSCSI, and this works straight away. Ah well... if nothing else I've been thrown in at the deep end and learned a lot. Thanks for the assistance.

0 Kudos
Oddjob62
Contributor

I've given up on using VMotion with an NFS back end. I might revisit it when I get more free time.

0 Kudos
fletch00
Enthusiast

I have an (escalated) case open on this issue. We tried removing the swap file reference from the vmx and the like, but the swap filename mismatches still occur...

We have another cluster where this does not occur...

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
0 Kudos
fletch00
Enthusiast

The engineer called back and we fixed this by dropping the NFS datastore from one ESX host and re-adding it.

I had specified the NFS server with a FQDN on one server and not the other!

So making them both FQDNs fixed it - VMotion is succeeding.

BTW: the VMware recommendation is to use the IP address of the NFS server.
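This explains the mismatched UUIDs earlier in the thread: the datastore's volume ID is derived from the mount spec, so even though "filer by IP" and "filer by FQDN" are the same share, the two hosts compute different IDs. A purely illustrative sketch of that effect (the md5 below is NOT VMware's actual derivation, and the server names are made up; it only shows how any textual difference in the server field changes the resulting ID):

```python
import hashlib

def mount_fingerprint(server: str, share: str) -> str:
    """Illustrative stand-in for deriving a volume ID from an NFS
    mount spec: any difference in the server string (IP vs FQDN)
    yields a different ID, even for the same share."""
    return hashlib.md5(f"{server}:{share}".encode()).hexdigest()[:17]

# Same share, specified two different ways:
print(mount_fingerprint("192.168.1.10", "/vol/vmstore"))
print(mount_fingerprint("filer.example.com", "/vol/vmstore"))
```

The two fingerprints differ, which is why dropping and re-adding the datastore with an identical server spec on both hosts resolves the swap-file error.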

Also, I reiterated that VMware's error messages are sorely lacking when trying to solve a simple case like this!

VCP5 VSP5 VTSP5 vExpert http://vmadmin.info
0 Kudos
Oddjob62
Contributor

Hmmmm... I was using the IP address on both hosts... all settings were identical. Maybe it HAS to be FQDNs to work properly? Dunno.

Glad you got yours working. Mine is now working with iSCSI instead of NFS, so I don't want to risk breaking it for the time being.

Agree with your last point. Maybe the migration "wizard" should include this as one of its checks before declaring validation successful.

0 Kudos
bvdkolk
Contributor

I experienced the same error message after updating some of our ESX servers. I did several things: restarting the VirtualCenter service, detaching/attaching the storage LUNs, and restarting the ESX host (in that order). One host also had 'old' virtual hardware which needed an upgrade.

After this, everything worked like it should.

0 Kudos
Raist
Contributor

Hello, I ran across this thread while googling this same problem. As an FYI, I found that this error occurs if you use PowerShell to update virtual machine hardware while the VM is running, before the VM is shut down for the new settings to take effect.

0 Kudos