VMware Cloud Community
Bolgard
Contributor

The s*** has hit the fan

Hi there,

I'm having some trouble with VMware ESXi. It started when I created a new VM and installed Windows Server 2003 on it. After I shut it down, things started happening:

  • On the Configuration tab, under Health Status, all devices show "Unknown"

  • I can't use the console for any VM; they respond with "Connection terminated" or, if they're powered off, "A general system error occurred"

So I figured a reboot might solve it. I entered maintenance mode and then clicked "Reboot". And now it seems halted somewhere in between. I can't access the commands "Reboot" and "Shutdown" anymore, and I can't exit maintenance mode.

If I look at the logs, in /var/log/messages I get a lot of "init fn user failed with: Out of memory!" and "WorldInit failed: trying to cleanup.".

Anyone have any idea what's going on here?

EDIT: After a hard reboot I'm able to start the VMs again. I don't dare touch much, though. What could be the problem here?

Message was edited by: tom howarth. Profanity removed from subject line.

34 Replies
Bolgard
Contributor

I had a VM converted from VMware Server to VMware ESXi. When it was running on VMware Server I installed VMware Tools on it. I never uninstalled and re-installed VMware Tools when I moved it to ESXi. Could the problem have been caused by this VM and its VMware Tools that belonged to VMware Server?

I have now uninstalled and re-installed VMware Tools for ESXi. The problem hasn't recurred yet (though it usually takes a few weeks). I don't have an RCLI up and running now, but if it happens again I will set it up. It works fine to SSH into ESXi and run esxtop from there, and everything looks normal.

I'm running VMware ESX Server 3i, 3.5.0, 123629 according to VMware Infrastructure Client.

Dave_Mishchenko
Immortal

It may have been something like that. If you run esxtop on a daily basis you may be able to see if some process is slowly using more and more memory.
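A rough way to do that comparison once you have a few days of readings is sketched below. This is only an illustration: the process names and the numbers in `daily` are made up, and the 1 MB threshold is an arbitrary assumption. It flags any process whose memory figure never drops between readings and has grown by at least the threshold overall.

```python
from typing import Dict, List


def growing_processes(samples: Dict[str, List[float]],
                      min_growth_kb: float = 1024.0) -> List[str]:
    """Return processes whose memory never shrinks and grows by min_growth_kb or more."""
    suspects = []
    for name, values in samples.items():
        if len(values) < 3:
            # Too few readings to call anything a trend.
            continue
        never_shrinks = all(b >= a for a, b in zip(values, values[1:]))
        total_growth = values[-1] - values[0]
        if never_shrinks and total_growth >= min_growth_kb:
            suspects.append(name)
    return sorted(suspects)


# Hypothetical daily readings in KB, noted by hand from esxtop (not real data).
daily = {
    "sfcbd": [12000, 14500, 17100, 20300, 23900],
    "hostd": [45000, 44800, 45200, 44900, 45100],
    "inetd": [900, 900, 910, 905, 900],
}

print(growing_processes(daily))
```

A process that goes up and down from day to day is ignored; only steady growth is reported, which is the pattern a slow leak would probably show.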

charlesleaverdd
Contributor

EDIT2: Memtest has completed a pass now. "Pass complete, no errors, press Esc to exit". So the problem is not with the RAM. I'm starting to believe this is a bug...

Wow dude, that's pretty serious and incredibly annoying. I'm having to go to a lot of trouble to try and get these virtual machines off my ESXi box so that I can bounce it. I almost hoped that it was due to faulty RAM, just so the issue would be solved. Are you sure you memtested for long enough? I was planning on running it for a week or even more, because faulty RAM doesn't necessarily show its flaws immediately, does it? By the way, I also get this: "failure forking: Cannot allocate memory".

I'm amazed that all the virtual machines are 100% fine. Including the one that's hammering the box.

charlesleaverdd
Contributor

Oh dear. How did we overlook this: ?

... and I quote: "A memory corruption condition might occur in the virtual machine hardware. A malicious request sent from the guest operating system to the virtual hardware might cause the virtual hardware to write to uncontrolled physical memory." which is the first issue that the advisory mentions will be solved by the update. I had not done the update yet because I'm not able to bounce that box very easily due to change control etc.

Today the box finally gave in. Boom. Four notifications in my mailbox with the dreaded "Host DOWN alert for" in their subject. On arrival at the box I found this on the screen. I then hard powered the box off, powered back on, booted normally, and then everything that was meant to auto-start did exactly that and everything returned to normal.

As far as I'm concerned my issue is definitely caused by the fact that I have not updated. I'm not going to chase this anymore. I'll change my mind if after the upgrade I experience the same issue.

RS_1
Enthusiast

I got the same problem on an IBM 3850 M2 running ESX 3i 3.5.0 130755 (up to date):

vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.

vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: Heap: 1522: Heap_Align(globalCartel, 48/48 bytes, 4 align) failed. caller: 0x73a8ce

vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: World: vm 11666870: 910: init fn user failed with: Out of memory!

vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: World: vm 11666870: 1775: WorldInit failed: trying to cleanup.

inetd[1370]: fork: Cannot allocate memory

vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.

vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: Heap: 1522: Heap_Align(globalCartel, 48/48 bytes, 4 align) failed. caller: 0x73a8ce

vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: World: vm 11675061: 910: init fn user failed with: Out of memory!

vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: World: vm 11675061: 1775: WorldInit failed: trying to cleanup.
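If you want to count how often this shows up in a saved copy of the log, something like the following sketch may help. It is not a VMware tool; the two regular expressions are assumptions written only against the lines quoted above, so other builds may word these warnings differently. It tallies which heap is reported as full and collects the world IDs that failed to initialise.

```python
import re
from collections import Counter

# Patterns written against the log lines quoted in this post (an assumption).
HEAP_FULL = re.compile(r"Heap (\w+) already at its maximumSize")
WORLD_FAIL = re.compile(
    r"World: vm (\d+): \d+: init fn user failed with: Out of memory!"
)


def summarize(lines):
    """Count full-heap warnings per heap name and list worlds that failed to start."""
    heaps = Counter()
    failed_worlds = []
    for line in lines:
        match = HEAP_FULL.search(line)
        if match:
            heaps[match.group(1)] += 1
        match = WORLD_FAIL.search(line)
        if match:
            failed_worlds.append(match.group(1))
    return heaps, failed_worlds


# A few of the lines from above, used as sample input.
sample = [
    "vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.",
    "vmkernel: 27:18:33:59.650 cpu2:1376)WARNING: World: vm 11666870: 910: init fn user failed with: Out of memory!",
    "inetd[1370]: fork: Cannot allocate memory",
    "vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.",
    "vmkernel: 27:18:33:54.452 cpu6:1370)WARNING: World: vm 11675061: 910: init fn user failed with: Out of memory!",
]

heaps, worlds = summarize(sample)
print(dict(heaps))
print(worlds)
```

To run it over a real file, pass the open file object to `summarize` instead of `sample`.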

Bolgard
Contributor

charlesleaverdd: That patch is for the whole ESXi, right? Did you install the patch or just update the VMware Tools on the guests? My problem seems to be gone after updating VMware Tools on the guest.

3sh
Contributor

Have you run any diagnostics on your drives?

I assume the Adaptec has some utilities you can use to check the health of the drives and RAID array.

Bolgard
Contributor

Thanks for your interest, but the problem was solved by updating VMware Tools.

I currently have NO way of checking the RAID array, because VMware's idea of I/O compatibility does not cover monitoring and management ;) I recommend that people check that the RAID card they're buying can actually be monitored in VMware ESXi!

StuartLittle
Contributor

I've been reading your thread guys and have the exact same issue/log messages on our Dell PowerEdge 1950III server. Your post explained the exact scenario our server is in at the moment (follow thread )

Bolgard: can you confirm how you updated VMware Tools, please, and to what version? Did you simply use the VI Client to update/reinstall VMware Tools on all your guest VMs once your host was fully operational again, or did you need to download the latest VMware Tools from VMware's website and run it manually on the guest VMs?

The reason I'm asking is that I would have thought updating VMware Tools from the VI Client would simply reinstall the same version of VMware Tools on the guest VMs and not technically fix the issue in the long term.

Thanks for your help and for making this post (and to the other posters like charlesleaverdd), as it's been such a relief to see that someone else has experienced the pain and sheer terror of their host causing problems! (Not saying I enjoy reading about other people's pain, just that it's good to know I'm not alone on this issue.)

J1mbo
Virtuoso

Given the portability of VMs between hardware, I would approach this by moving all VMs elsewhere and then using the manufacturer's own diagnostics, destructive if necessary. Also, Microsoft have a particularly good memory diagnostic utility available on the Windows 7 CD (boot to recovery mode) or downloadable from .

As an aside I would advise against 3rd party components in a production server. The cost of downtime will far outweigh capital savings.

Schorschi
Expert

This may be a bit off topic, but we have seen some odd issues on quite a number of 6850s (say 2 or 3 per 100), single-core (declining), dual-core (declining), and quad-core based, as they age, or should I say, as the RAM ages along with them. VMware ESX seems to abuse RAM. We tend to use our hardware for a minimum of 3 years, and often for the full 5 years or so, which is the typical end-of-life support cycle from Dell as well as other hardware vendors. Between 3 and 4 years of age, we see an inconsistent pattern of RAM issues. Servers that have worked without issue for 3 or more years just seem to start showing their age.

Hard disk failures are expected, and once in a while a NIC or processor dies or a cable goes bad, but RAM almost never failed until the last 3 years, in which the frequency of RAM issues has dramatically increased. Be it a quality control issue, given the major increase in DIMM density, or a materials quality issue, somehow it happens. Most recently and most frequently it has been with Dell's OEM RAM providers, but I would not call it a Dell issue alone, since Dell relies on the same major RAM OEM providers as HP, IBM, etc. What is a surprise is to have a 6850 that is just fine running ESX 3.5.x get rebuilt as ESX 4 (ESXi 4) and, within hours of the initial reboot, start throwing DIMM faults.

here4now
Contributor

I presently run ESX 3i on multiple hardware platforms, HP and Dell, and I have seen many issues like this.

To recover the host without destroying your VMs: log into the console of the machine and be sure all VM guests are shut down. On the console there is an option to reset the machine. This will reset ESX to day one. You'll lose all your ESX config but not the guests. Then reconfigure your VM manager etc. and your network.

Most likely the issue you are having is with the CIM component in ESX. It looks like a memory leak, and the machine does seem to fall apart. Do a search on this site to find the doc on disabling CIM.

Good Luck..

Bolgard
Contributor

Sorry for the late reply StuartLittle!! I haven't had any troubles with my virtual environment for a while (cross fingers), so I haven't been checking back here very frequently 😃

I manually uninstalled VMware Tools on the Ubuntu VM guest that I suspected was causing the problem (I asked how here: http://communities.vmware.com/message/1140808#1140808), and then re-installed it using the version that came with my ESXi. My problem was that I had converted this VM from VMware Server without removing and re-installing VMware Tools. So I just installed the version from ESXi, as I hadn't tried that.

Hope it helps and good luck, I feel your pain...

StuartLittle
Contributor

Bolgard - thanks for your reply.

Just to update everyone on our situation: restarting the host resolved the issue and it has not recurred since, but I suspect that a guest's VMware Tools may also be causing the issue.

If it happens again I'll update all our guests to the latest version of VMware Tools.

Thanks for everyone's support, and for letting me piggyback on your original thread, Bolgard!

HMC-Frank
Contributor

I have seen the same errors, including the "General System" error and the WorldInit and out of memory errors. Look here for a similar discussion with a related VMware KB article in response. It came out in the past week. A restart of the host will clear memory, but most likely the sfcbd process will fill up memory again, causing the host and console to hang.

official VMware KB

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101257...
