VMware Cloud Community
FT5508
Contributor

Help with DRS VMotion Error Message

Hi,

I wonder if anyone in the community can help me figure out this error. A few days ago, when DRS tried to VMotion a VM, the migration failed and the VM shut down. Obviously that's not good. Since then, I have manually VMotioned the same VM a few times and it has been fine. DRS has also been able to migrate other VMs without problems. Thank you in advance!

The error message from vCenter was a very generic "A general system error occurred:"

The vmkwarning file on the originating host has these entries:

Aug 10 21:47:49 source_host vmkernel: 35:03:34:06.057 cpu5:1913)WARNING: Migrate: 1243: 1249963637632293: Failed: Failed to resume VM (0xbad0043) @0x988d3e
Aug 10 21:47:49 source_host vmkernel: 35:03:34:06.063 cpu1:1912)WARNING: MigrateNet: 309: 1249963637632293: 5-0x802b408:Sent only 3404 of 4096 bytes of message data: Broken pipe
Aug 10 21:47:49 source_host vmkernel: 35:03:34:06.063 cpu1:1912)WARNING: Migrate: 6820: 1249963637632293: Couldn't send data for 2949051: Broken pipe
Aug 10 21:47:49 source_host vmkernel: 35:03:34:06.063 cpu1:1912)WARNING: Migrate: 6971: 1249963637632293: Failed to send final set of pages: Broken pipe (0xbad0052)

The vmkwarning file on the destination host has these entries:

Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.377 cpu0:1747)WARNING: Migrate: 7803: 1249963637632293: Migration pagein timeout expired, cleaning up
Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.394 cpu2:1747)WARNING: World: vm 1743: 6870: vmm0:ohpca184:vmk: vcpu-0:Failed to tranfer all changed pages from source within 100 seconds. Failing migration.
Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.394 cpu2:1747)WARNING: Migrate: 1243: 1249963637632293: Failed: Timeout (0xbad0020) @0x98b7d9
Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.394 cpu2:1747)WARNING: MigrateNet: 323: 1249963637632293: 9-0x80211b8:Received only 0 of 68 bytes: Timeout
Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.395 cpu2:1743)WARNING: World: vm 1743: 6938: Simultaneous panic! :vmk: vcpu-0:Failed to complete remote page-in.
Aug 10 21:48:43 destination_host vmkernel: 31:02:55:30.370 cpu0:1742)WARNING: World: vm 1742: 2398: VMMWorld group leader = 1743, members = 4

Any help in understanding what these errors mean, or any assistance in pointing me in the right direction, is much appreciated.

Thank you!

1 Reply
marcelo_soares
Champion

This looks like a network error on the VMkernel interface (the "Broken pipe" messages point in that direction). Do you have a dedicated gigabit network for VMotion? Can you check the traffic at the time of the failure to confirm this?
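To pin down the failure window before looking at traffic, it can help to line up the entries from both hosts that share the same migration ID (1249963637632293 in the logs above). The following is only a rough sketch in Python: the regular expressions are assumptions based on the log lines quoted in this thread, the function name `group_by_migration` is made up for illustration, and other ESX builds may format vmkwarning differently.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the vmkwarning lines quoted in this thread:
# "<Mon DD HH:MM:SS> <host> vmkernel: <uptime> cpuN:WORLD)WARNING: <Module>: <line>: <rest>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+vmkernel:\s+\S+\s+cpu\d+:\d+\)WARNING:\s+"
    r"(?P<module>\w+):\s+\d+:\s+(?P<rest>.*)$"
)
# Migrate/MigrateNet messages start with a long numeric migration ID.
MIGRATION_ID_RE = re.compile(r"^(?P<mig_id>\d{10,}):\s+(?P<msg>.*)$")


def group_by_migration(lines):
    """Group warning lines by migration ID; lines without an ID are skipped."""
    groups = defaultdict(list)
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if not line_match:
            continue  # e.g. the "World: vm ..." lines, which use another layout
        id_match = MIGRATION_ID_RE.match(line_match.group("rest"))
        if not id_match:
            continue
        groups[id_match.group("mig_id")].append(
            (
                line_match.group("stamp"),
                line_match.group("host"),
                line_match.group("module"),
                id_match.group("msg"),
            )
        )
    return dict(groups)


# Two lines copied from the logs above, one per host.
sample = [
    "Aug 10 21:47:49 source_host vmkernel: 35:03:34:06.057 cpu5:1913)WARNING: "
    "Migrate: 1243: 1249963637632293: Failed: Failed to resume VM (0xbad0043) @0x988d3e",
    "Aug 10 21:47:49 destination_host vmkernel: 31:02:54:36.394 cpu2:1747)WARNING: "
    "Migrate: 1243: 1249963637632293: Failed: Timeout (0xbad0020) @0x98b7d9",
]

groups = group_by_migration(sample)
for mig_id, entries in groups.items():
    print(mig_id)
    for stamp, host, module, msg in entries:
        print(f"  {stamp} {host} {module}: {msg}")
```

Feeding it the full vmkwarning files from both hosts (for example, the lines read from each file and concatenated) would give one list per migration, which makes it easier to see which side reported the timeout or broken pipe first.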

Hope this helps,

Marcelo Soares

VMware Certified Professional 310

Technical Support Engineer

Linux Server Senior Administrator
