VMware Cloud Community
admin
Immortal

VM gets rebooted when vMotioned to this host

Hi Folks,

I am running into a bit of trouble here. We have a 10-host cluster and all hosts seem to be working fine except for one. When a VM is migrated to this host from a different host, the VM gets rebooted as soon as the migration completes. It happens whether the migration is done by DRS or manually, and it happens every time, but the same VM migrated to a different host does not reboot and works fine.

I tried creating a blank VM with no guest OS and migrated it to the host, and it did not reboot. It is confusing.

How come all the other VMs reboot and this one did not? I ran a hardware test on the memory and it came back clean, so it does not look like a hardware issue. I see the following in the logs:

vmware.log

Aug 20 21:11:13.371: vcpu-0| Guest: toolbox: Version: build-341836

Aug 20 21:11:13.982: mks| MKSHostOps_HideCursor before defineCursor!

Aug 20 21:11:14.241: vcpu-0| MONITOR PANIC: vcpu-0:VMM64 fault 14: src=MONITOR rip=0xfffffffffc29166c regs=0xfffffffffc008c00

Aug 20 21:11:14.241: vcpu-0| Core dump with build build-348481

Aug 20 21:11:14.242: vcpu-0| Writing monitor corefile "/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmware64-core0.gz"

Aug 20 21:11:14.242: vcpu-0| CoreDump: dumping core with superuser privileges

Aug 20 21:11:14.245: vcpu-0| VMK Stack for vcpu 0 is at 0x417f86f61000

Aug 20 21:11:14.245: vcpu-0| Saving busmem frames

Aug 20 21:11:14.246: vcpu-0| Saving anonymous memory

Aug 20 21:11:14.247: vcpu-0| Beginning monitor coredump

Aug 20 21:11:14.973: vcpu-0| End monitor coredump

Aug 20 21:11:14.974: vcpu-0| Beginning extended monitor coredump

Aug 20 21:11:20.690: vcpu-0| Msg_Post: Error

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic] *** VMware ESX internal monitor error ***

Aug 20 21:11:20.690: vcpu-0| vcpu-0:VMM64 fault 14: src=MONITOR rip=0xfffffffffc29166c regs=0xfffffffffc008c00

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic.report] You can report this problem by selecting menu item Help > VMware on the Web > Request Support, or by going to "http://vmware.com/info?id=8&logFile=vmware%2elog&coreLocation=%2fvmfs%2fvolumes%2f4fed003a%2db7db6e6...". Provide the log file (vmware.log) and the core file(s) (/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmware64-core[0-1].gz, /vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmx-zdump.000).

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic.serverdebug] If the problem is repeatable, set 'Use Debug Monitor' to 'Yes' in the 'Misc' section of the Configure Virtual Machine Web page. Then reproduce the incident and file it according to the instructions.

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic.vmSupport.vmx86] To collect data to submit to VMware support, run "vm-support".

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic.entitlement] We will respond on the basis of your support entitlement.

Aug 20 21:11:20.690: vcpu-0| [msg.log.monpanic.finish] We appreciate your feedback,

Aug 20 21:11:20.690: vcpu-0| -- the VMware ESX team.

Aug 20 21:11:20.690: vcpu-0| ----------------------------------------

Aug 20 21:11:20.690: vcpu-0| VTHREAD watched thread 0 "vmx" died
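
To check whether other VMs that have landed on this host hit the same panic, I have been grepping the VM logs on the datastores for the signature above; a rough check from the host console (assuming the usual /vmfs/volumes/<datastore>/<vmname>/ layout):

# List VM logs that contain the monitor panic signature
grep -l "MONITOR PANIC" /vmfs/volumes/*/*/vmware*.log

# Show the fault details for each hit
grep -H "VMM64 fault" /vmfs/volumes/*/*/vmware*.log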

Hostd log:

[2013-08-20 21:11:14.918 2E290B90 verbose 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Actual VM overhead: 103841792 bytes

[2013-08-20 21:11:14.918 2E290B90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM

[2013-08-20 21:11:20.009 2E290B90 info 'Statssvc'] HostCtl Exception during network stats collection for vm 16 : Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

[2013-08-20 21:11:21.314 FFBECB90 info 'Libs'] VmdbPipeStreamsOvlError Couldn't read: OVL_STATUS_EOF

[2013-08-20 21:11:21.314 FFBECB90 warning 'Libs'] VMHS: Connection to VM broken: cfg: /vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx; error: Pipe: Read failed; state: 3

[2013-08-20 21:11:21.314 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Unmounting the vm.

[2013-08-20 21:11:21.314 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] VMDB unmount initiated.

[2013-08-20 21:11:21.315 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Unmounting VM complete.

[2013-08-20 21:11:21.315 FFBECB90 info 'Libs'] SOCKET 1 (61)

[2013-08-20 21:11:21.315 FFBECB90 info 'Libs'] recv detected client closed connection

[2013-08-20 21:11:21.315 FFBECB90 info 'Libs'] Detected automation socket close for VM (/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx)

[2013-08-20 21:11:21.317 2E2D1B90 verbose 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Running status of app monitoring changed to : gray

[2013-08-20 21:11:21.318 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Mount state values have changed.

[2013-08-20 21:11:21.318 2E2D1B90 verbose 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Ignored toolsManifestInfo update of size (0)

[2013-08-20 21:11:21.318 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Mount state values have changed.

[2013-08-20 21:11:21.318 2E2D1B90 verbose 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] MKS ready for connections: false

[2013-08-20 21:11:21.318 2E2D1B90 verbose 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Tools are auto-upgrade capable

[2013-08-20 21:11:21.320 FFBECB90 info 'vm:/vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmtst-3.vmx'] Reloading config state.

[2013-08-20 21:11:21.357 FFBECB90 info 'Libs'] VMHS: Transitioned vmx/execState/val to poweredOff

Does this seem like a hardware issue?
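
In the meantime I am putting together the data the panic message asks for; roughly, from the host console (the datastore path is just the one from the log above):

# Generate a support bundle on the affected host
vm-support

# Keep a copy of the monitor core files named in the panic message
ls -lh /vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmware64-core*.gz \
       /vmfs/volumes/4fed003a-b7db6e6e-0dc9-10052500002c/vmtst-3/vmx-zdump.000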

Any help on this situation would be much appreciated.

Thanks,

Avinash

8 Replies
weinstein5
Immortal

Is it only this one VM or are all VMs rebooting? If it is just the single VM, then the issue is with the VM and I would look at the logs within the VM's operating system.
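
If it does turn out to be a single VM, it is also worth checking inside the guest whether the OS saw a clean shutdown or simply came back from a hard reset; for example, on a Linux guest:

# Show recent reboot/shutdown records from wtmp
last -x -n 10 reboot shutdown

On a Windows guest, look in the System event log for Event ID 6008 (unexpected shutdown).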

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
admin
Immortal

Thanks for the input; let me check the network activity during the vMotion.
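
The plan is to run esxtop in batch mode on the destination host while a test migration is running and watch the vMotion vmknic for drops; something along these lines (sample interval and count are just what I picked):

# Capture esxtop in batch mode during a test vMotion (2-second samples, ~2 minutes)
esxtop -b -d 2 -n 60 > /tmp/esxtop-vmotion.csv

# Interactively, pressing 'n' in esxtop switches to the network view (%DRPTX / %DRPRX columns)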

admin
Immortal

Can you please attach the entire vmware.log file?

admin
Immortal

Update: it definitely has something to do with the network.

Here are the tests I have done and the results:

1. Created another VM with its swap file in the same location as the original VM and initiated a vMotion. Ran esxtop: we see 100% packet drops and the VM still rebooted.

2. Removed the VMDK from the test VM and migrated it: the VM did not reboot. Removed an existing VMDK from a production test VM and migrated it, and it did not reboot either. It seems the VM gets rebooted right after the stun of the VM at the end of the migration.

We are checking the network to see what is causing the problem and why the saturation is occurring.
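
To compare the vMotion networking on this host against the healthy hosts, I am also dumping the vmknic, physical NIC and vSwitch configuration on each of them; a quick sketch from the console (classic esxcfg commands, adjust if needed for your build):

# VMkernel interfaces (vMotion vmknic, IP, MTU)
esxcfg-vmknic -l

# Physical NIC link state, speed and duplex
esxcfg-nics -l

# vSwitch / portgroup layout and uplinks
esxcfg-vswitch -l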

Thanks,



f10
Expert

I think it is the VM Monitoring option that reboots the VM when the hostd agent finds the VMware Tools service unresponsive.

Regards, Arun Pandey VCP 3,4,5 | VCAP-DCA | NCDA | HPUX-CSA | http://highoncloud.blogspot.in/ If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
admin
Immortal

The test VM does not have a guest OS or VMware Tools installed, but it stays powered on after migration. The VM only reboots if it has a VMDK attached.

Thanks,

Avinash

admin
Immortal

Hi folks,

This is getting more and more interesting. I was looking at UCS Manager and noticed intermittent inconsistencies with one of the CPUs on that blade. Just for fun, I bumped one of the test VMs up to 8 vCPUs and it went out of control: it started power cycling itself, turning on and off repeatedly. I finally got it onto another host, so it is OK now.
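
Given the CPU inconsistencies showing up in UCS Manager, I am also checking whether the host logged any machine-check errors around the crash times; something like this from the host console (the vmkernel log name and location vary by ESX version):

# Look for machine-check / MCE entries around the panic timestamps
grep -iE "MCE|machine check" /var/log/vmkernel*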


Any inputs on this would be much appreciated.


admin
Immortal

Hi guys, it was a hardware issue; the CPU was the problem.
