VMware Cloud Community
dlhartley
Contributor

VMs randomly power off

I've been having a strange issue for a few weeks now where Virtual Machines across three ESX 4 hosts randomly power off. Nothing comes up under 'Events' for the VM other than "Virtual machine on (hostname) is powered off".

It has happened to one or two machines on each host over the last couple of weeks and I can find no errors relating to why they have magically decided to power off - has anyone had similar issues?

All of the VMs in question except one are on an iSCSI SAN which has had no issues until now, so I'm at a loss!

Thanks

13 Replies
mcowger
Immortal

Check the vmware.log for each one right after it happens (maybe post it up if possible) so we can review it.

--Matt

VCP, vExpert, Unix Geek

VCDX #52 blog.cowger.us
dlhartley
Contributor

As the machine had been powered off since this occurred, here are the last 50 lines from the VM's vmware.log. The timestamp on the log entries matches when the console reports the machine was powered off, so hope this helps!

David

Nov 15 04:26:47.581: vcpu-1| SymBacktrace[3] 0x3e961f68 eip 0x8f0eb54 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.582: vcpu-1| SymBacktrace[4] 0x3e961f88 eip 0x8b14544 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.583: vcpu-1| SymBacktrace[5] 0x3e961fd8 eip 0x8f0b0cf in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.583: vcpu-1| SymBacktrace[6] 0x3e962008 eip 0x8b143fb in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.584: vcpu-1| SymBacktrace[7] 0x3e962038 eip 0x8b1489b in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.585: vcpu-1| SymBacktrace[8] 0x3e962078 eip 0x8b23001 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.585: vcpu-1| SymBacktrace[9] 0x3e9620a8 eip 0x8b2045b in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.586: vcpu-1| SymBacktrace[10] 0x3e9620f8 eip 0x8b1f728 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.586: vcpu-1| SymBacktrace[11] 0x3e962148 eip 0x8b22bad in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.587: vcpu-1| SymBacktrace[12] 0x3e962178 eip 0x8b8359a in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.588: vcpu-1| SymBacktrace[13] 0x3e9a3218 eip 0x8cd04e4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.588: vcpu-1| Unrecoverable memory allocation failure at bora/lib/user/msg.c:221

Nov 15 04:26:47.589: vcpu-1| Panic loop

Nov 15 04:26:47.590: vcpu-1| Backtrace:

Nov 15 04:26:47.590: vcpu-1| Backtrace[0] 0x3e961138 eip 0x8f1027d

Nov 15 04:26:47.591: vcpu-1| Backtrace[1] 0x3e961578 eip 0x8b12b34

Nov 15 04:26:47.591: vcpu-1| Backtrace[2] 0x3e961648 eip 0x8f0a2fa

Nov 15 04:26:47.592: vcpu-1| Backtrace[3] 0x3e961668 eip 0x8f0a8a4

Nov 15 04:26:47.593: vcpu-1| Backtrace[4] 0x3e961aa8 eip 0x8f11899

Nov 15 04:26:47.593: vcpu-1| Backtrace[5] 0x3e961ee8 eip 0x8b12ac3

Nov 15 04:26:47.594: vcpu-1| Backtrace[6] 0x3e961f18 eip 0x8f0daa8

Nov 15 04:26:47.594: vcpu-1| Backtrace[7] 0x3e961f68 eip 0x8f0eb54

Nov 15 04:26:47.595: vcpu-1| Backtrace[8] 0x3e961f88 eip 0x8b14544

Nov 15 04:26:47.596: vcpu-1| Backtrace[9] 0x3e961fd8 eip 0x8f0b0cf

Nov 15 04:26:47.596: vcpu-1| Backtrace[10] 0x3e962008 eip 0x8b143fb

Nov 15 04:26:47.597: vcpu-1| Backtrace[11] 0x3e962038 eip 0x8b1489b

Nov 15 04:26:47.597: vcpu-1| Backtrace[12] 0x3e962078 eip 0x8b23001

Nov 15 04:26:47.598: vcpu-1| Backtrace[13] 0x3e9620a8 eip 0x8b2045b

Nov 15 04:26:47.599: vcpu-1| Backtrace[14] 0x3e9620f8 eip 0x8b1f728

Nov 15 04:26:47.599: vcpu-1| Backtrace[15] 0x3e962148 eip 0x8b22bad

Nov 15 04:26:47.600: vcpu-1| Backtrace[16] 0x3e962178 eip 0x8b8359a

Nov 15 04:26:47.600: vcpu-1| Backtrace[17] 0x3e9a3218 eip 0x8cd04e4

Nov 15 04:26:47.601: vcpu-1| SymBacktrace[0] 0x3e961138 eip 0x8f1027d in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.602: vcpu-1| SymBacktrace[1] 0x3e961578 eip 0x8b12b34 in function Panic in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.602: vcpu-1| SymBacktrace[2] 0x3e961648 eip 0x8f0a2fa in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.603: vcpu-1| SymBacktrace[3] 0x3e961668 eip 0x8f0a8a4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.604: vcpu-1| SymBacktrace[4] 0x3e961aa8 eip 0x8f11899 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.604: vcpu-1| SymBacktrace[5] 0x3e961ee8 eip 0x8b12ac3 in function Panic in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.605: vcpu-1| SymBacktrace[6] 0x3e961f18 eip 0x8f0daa8 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.605: vcpu-1| SymBacktrace[7] 0x3e961f68 eip 0x8f0eb54 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.606: vcpu-1| SymBacktrace[8] 0x3e961f88 eip 0x8b14544 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.607: vcpu-1| SymBacktrace[9] 0x3e961fd8 eip 0x8f0b0cf in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.607: vcpu-1| SymBacktrace[10] 0x3e962008 eip 0x8b143fb in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.608: vcpu-1| SymBacktrace[11] 0x3e962038 eip 0x8b1489b in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.609: vcpu-1| SymBacktrace[12] 0x3e962078 eip 0x8b23001 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.609: vcpu-1| SymBacktrace[13] 0x3e9620a8 eip 0x8b2045b in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.610: vcpu-1| SymBacktrace[14] 0x3e9620f8 eip 0x8b1f728 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.611: vcpu-1| SymBacktrace[15] 0x3e962148 eip 0x8b22bad in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.611: vcpu-1| SymBacktrace[16] 0x3e962178 eip 0x8b8359a in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000

Nov 15 04:26:47.612: vcpu-1| SymBacktrace[17] 0x3e9a3218 eip 0x8cd04e4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000
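
In a dump like the one above, almost every line is a stack frame, and the lines that state what actually went wrong are easy to miss. Below is a minimal Python sketch that drops the frame lines and keeps the rest. The function name `find_panic_cause` and the regular expression are illustrative and assume only the line format visible in this excerpt; they are not part of any VMware tool.

```python
import re

# Assumed line layout, taken from the excerpt in this post:
#   "Nov 15 04:26:47.588: vcpu-1| <message>"
LINE_RE = re.compile(r"^(?P<ts>\w{3} +\d+ [\d:.]+): (?P<thread>[^|]+)\| (?P<msg>.*)$")

def find_panic_cause(lines):
    """Return (timestamp, thread, message) for every line that is not a backtrace frame."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank or unrecognised line
        msg = m.group("msg")
        if msg.startswith(("Backtrace", "SymBacktrace")):
            continue  # stack frame or "Backtrace:" header, not the cause
        hits.append((m.group("ts"), m.group("thread"), msg))
    return hits

# Four lines copied from the log excerpt above.
SAMPLE = """\
Nov 15 04:26:47.588: vcpu-1| Unrecoverable memory allocation failure at bora/lib/user/msg.c:221
Nov 15 04:26:47.589: vcpu-1| Panic loop
Nov 15 04:26:47.590: vcpu-1| Backtrace:
Nov 15 04:26:47.590: vcpu-1| Backtrace[0] 0x3e961138 eip 0x8f1027d
"""

for ts, thread, msg in find_panic_cause(SAMPLE.splitlines()):
    print(ts, thread, msg)
```

On the sample this leaves two lines, the "Unrecoverable memory allocation failure" message and "Panic loop", which are the ones worth quoting in a support case.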

wila
Immortal

Hi,

PMJI, the line:

Nov 15 04:26:47.588: vcpu-1| Unrecoverable memory allocation failure at bora/lib/user/msg.c:2

seems to indicate that you are having hardware problems with the memory on your host. But as you are talking about 3 different hosts, that seems unlikely.

It is also possible that your guest has a memory leak of some sort, especially if this only happens with a few isolated VMs.

I would check the memory usage in your guests.

Another reason might be that the problem happens when paging to disk and in that case it could actually be your storage.

As you are not seeing any details at host or storage level, I would try to see if you can find out somehow what was running when your guest is faulting.

Do check the log files in the guest.

Maybe Matt has some other ideas you can look into.



--

Wil

_____________________________________________________

VI-Toolkit & scripts wiki at http://www.vi-toolkit.com

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
dlhartley
Contributor

Thanks guys.

From what I can tell, and what I remember around the times that a couple of others faulted, the memory usage is generally around 75% of what's allocated (i.e. comparing what is allocated to what is consumed).

Storage being an issue did occur to me, but I'm not sure how I'd go about checking whether anything is wrong, as it seems to work 99% of the time, or whether there is anything I can change to either speed it up or make it more reliable. We're using iSCSI over gigabit Ethernet for the virtual disks, and each VM host has a dedicated gigabit link. The iSCSI target is on an x64 Windows UDS Server 2003 box, and has been working well.

The most recent VM to fault was stored locally on the VM host itself rather than on the SAN, just to complicate things a bit more!

All of the VM faults occur overnight, and I find out that the machine is off when I come into work in the morning, so it's hard to tell exactly what caused the issue. From the guest OS side of things, the machine isn't shut down cleanly; it's completely powered off, and there's nothing in the event logs to indicate that anything within the OS shut the VM down.

Is there anything on the iSCSI target that I can check, configuration-wise or logfile-wise, that might indicate an issue with the storage? I don't expect it to be a network issue either, but I'm open to suggestions.

- David

Rumple
Virtuoso

So the guest VMs are indicating an unexpected shutdown (event ID 6008)?

Can you find anything in the ESX host logs that might indicate that there was a vMotion happening at the time? Do you have VM heartbeat enabled on your cluster DRS settings? That's been known to shut down VMs...

dlhartley
Contributor

Correct, the event log shows events with ID 6008.

I can't see anything that would indicate a vMotion occurring at the same time, and the DRS history isn't showing anything like that either. The heartbeat was enabled, yes.

Rumple
Virtuoso

I would start by disabling the heartbeat. It's known to cause issues as it's too sensitive.

If you run just fine without the heartbeat on the VM, then you know where the problem is occurring...

dlhartley
Contributor

OK. I did that when you asked if it was on, so we'll see how we go. It hasn't been a regular thing, just strange and inconvenient!

dlhartley
Contributor

This issue has just popped up again, and the vmware.log for the VM in question contains exactly the same errors as last time (even though it was a different VM last time).

The last two lines read:

Nov 19 18:40:48.983: vcpu-1| SymBacktrace[16] 0x33700178 eip 0xab5959a in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0xaa8a000

Nov 19 18:40:48.984: vcpu-1| SymBacktrace[17] 0x33741218 eip 0xaca64e4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0xaa8a000

Any ideas? This is rather frustrating, about to log another case for it.

-- David
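
One way to check that the two crashes really hit the same spot is to subtract the "loaded at" address from each `eip`, which gives the frame's offset inside vmware-vmx regardless of where the binary was loaded. The Python sketch below does this for frames 16 and 17, the only frames quoted for both dates in this thread. The helper name `frame_offsets` and the regular expression are illustrative, and the comparison is only meaningful if both hosts run the same vmware-vmx build, which is an assumption here.

```python
import re

# Assumed frame layout, taken from the SymBacktrace lines quoted in this thread.
FRAME_RE = re.compile(
    r"SymBacktrace\[(?P<idx>\d+)\] \S+ eip (?P<eip>0x[0-9a-f]+) .* loaded at (?P<base>0x[0-9a-f]+)"
)

def frame_offsets(lines):
    """Map frame index -> (eip - load address), i.e. the offset inside the binary."""
    offsets = {}
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            offsets[int(m.group("idx"))] = int(m.group("eip"), 16) - int(m.group("base"), 16)
    return offsets

# Frames 16 and 17 from the Nov 15 log (first VM).
NOV_15 = [
    "Nov 15 04:26:47.611: vcpu-1| SymBacktrace[16] 0x3e962178 eip 0x8b8359a in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000",
    "Nov 15 04:26:47.612: vcpu-1| SymBacktrace[17] 0x3e9a3218 eip 0x8cd04e4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0x8ab4000",
]
# The same frames from the Nov 19 log (different VM).
NOV_19 = [
    "Nov 19 18:40:48.983: vcpu-1| SymBacktrace[16] 0x33700178 eip 0xab5959a in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0xaa8a000",
    "Nov 19 18:40:48.984: vcpu-1| SymBacktrace[17] 0x33741218 eip 0xaca64e4 in function (null) in object /usr/lib/vmware/bin/vmware-vmx loaded at 0xaa8a000",
]

first = frame_offsets(NOV_15)
second = frame_offsets(NOV_19)
for idx in sorted(first.keys() & second.keys()):
    print(idx, hex(first[idx]), hex(second[idx]), first[idx] == second[idx])
```

Both frames come out at the same offsets on both dates (0xcf59a and 0x21c4e4), which is consistent with the two VMs failing along the same code path in the vmx binary. It does not say why the allocation failed.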

mcowger
Immortal

Ahh - this is a failure of the vmx process itself :)

Have the latest patches? If so, this is definitely something for a case for VMware.

--Matt

VCP, vExpert, Unix Geek

VCDX #52 blog.cowger.us
dlhartley
Contributor

Yep have the latest patches - a severity 1 case has been logged, so hopefully it'll be resolved soon. I have a bad feeling that it has something to do with the iSCSI SAN, so we'll see!

Thanks for all your help thus far.

mcowger
Immortal

This almost certainly isn't the array.

--Matt

VCP, vExpert, Unix Geek

VCDX #52 blog.cowger.us
athlon_crazy
Virtuoso

Is the HA isolation response set to leave the VMs powered on, or to power them off? Just in case...

vcbMC-1.0.6 Beta

vcbMC-1.0.7 Lite

http://www.no-x.org