VMware

This Question is Answered

34 Replies. Last post: Jul 27, 2009 11:46 AM by HMC-Frank

The s*** has hit the fan posted: Jul 3, 2009 11:46 PM

Bolgard (Hot Shot, 113 posts since Aug 11, 2007)
Hi there,

I'm having some trouble with VMware ESXi. It started when I created a new VM and installed Windows Server 2003 on it. After I shut it down, things started happening:
  • If I go to the Configuration tab, Health status, it says "Unknown" for all devices
  • I can't use the console for any VM; they respond with "Connection terminated" or, if they're powered off, "A general system error occurred"

So I figured a reboot might solve it. I entered maintenance mode and then clicked "Reboot". Now the host seems to have hung partway through: I can't access the "Reboot" and "Shutdown" commands anymore, and I can't exit maintenance mode.

If I look at the logs, in /var/log/messages I get a lot of "init fn user failed with: Out of memory!" and "WorldInit failed: trying to cleanup.".

Anyone have any idea what's going on here?

EDIT: After a hard reboot I'm able to start the VMs again. I don't dare touch much, though. What could be the problem here?

Message was edited by: tom howarth. Profanity removed from Subject line.

Re: The s*** has hit the fan

1. Jul 3, 2009 11:47 PM in response to: Bolgard
Jackobli (Master, 1,100 posts since Aug 28, 2007)
Tell us something about your hardware.

Re: The s*** has hit the fan

3. Jul 3, 2009 11:53 PM in response to: Bolgard
Jackobli (Master, 1,100 posts since Aug 28, 2007)
Bolgard wrote:
I'm guessing it has something to do with the RAID controller, even though it's supposed to be supported...? Any logs I could check?
Hmm, I don't know the 3405. Are there any logs in the BIOS configuration utility? Any messages saved in the logs viewable through the VI Client?
Are you running RAID 1 or RAID 5? SAS or SATA? Is a BBU installed?
I think any disk/RAID problem would lead to different errors. Are there ECC errors on your system? Bad RAM is always a source of errors.

Re: The s*** has hit the fan

5. Jul 3, 2009 11:54 PM in response to: Bolgard
Jackobli (Master, 1,100 posts since Aug 28, 2007)
Bolgard wrote:
I guess I would need to run memtest or something similar to check for ECC errors? Will do that the next time I have physical access to the server.
EDIT: Now that I come to think of it, there were some strange messages relating to RAM before the reboot. Something about heap and memory, and the log messages I mentioned in the first post in this thread.
memtest should reveal any RAM errors. If your server has ECC (which I strongly suspect), any one- or two-bit error in RAM will be detected; one-bit errors should be corrected and are usually logged somewhere in your server's BIOS.
Did you buy that server with its RAM installed? According to the Kingston homepage, it has four banks. For best performance they recommend installing modules in pairs.
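To illustrate the "correct one bit, detect two bits" behaviour described above, here is a toy sketch using an extended Hamming code over 4 data bits. It is only an illustration of the principle: real ECC memory typically protects much wider words in hardware, and the function names `encode` and `decode` are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity bits at 1, 2 and 4).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of all set bits
    parity_broken = sum(code) % 2 == 1

    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with intact overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                            # simulate a single-bit memory error
    print(decode(word))
    word[2] ^= 1                            # a second flipped bit in the same word
    print(decode(word))
```

Non-ECC modules have no check bits at all, so the same flipped bit would simply be read back as wrong data without any warning.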

Re: The s*** has hit the fan

7. Jul 3, 2009 11:55 PM in response to: Bolgard
Jackobli (Master, 1,100 posts since Aug 28, 2007)
Bolgard wrote:
Well, whatever I could possibly have done wrong with the RAM, I have done wrong, now that I think of it.
  • I added 2 GB RAM (not certified by FS since it was a lot cheaper, though I get a warning in the BIOS about that. I figured it wouldn't matter that much; I have had several different RAM types in workstations and they have always worked fine)
The harder a machine uses its RAM, the sooner you will see errors.
  • So there's a total of 3 GB of RAM in 3 slots, so the modules aren't grouped in pairs either.
Kingston says it's not recommended (for performance reasons), but it should (!) work.
  • And I'm not sure the added 2 GB is ECC (though I guess it has to be if the motherboard supports ECC?)
I'm quite sure the server should at least throw out a BIG warning, but most probably it has to be ECC.
So we've found the problem there, don't you think?
I've seen so many problems with RAM that I suspect it is the source. But you never know.
So try with 1 GB, or spend lots of bucks on original RAM, or some bucks on compatible RAM. I have (nearly) never had problems with Kingston RAM, and they will at least try to support you. Dunno where you live, but I found 2x2 GBytes KFJ-E50/4G for US $112.00 at the Kingston shop.

Re: The s*** has hit the fan

9. Jul 3, 2009 11:57 PM in response to: Bolgard
charlesleaverdd (Novice, 6 posts since Aug 7, 2008)
I have a Dell PowerEdge 6850 with the following specs:

  • 4x DUAL CORE XEON 7120M, 3.0GHZ, 4

  • 32GB (16X2GB DUAL RANK DIMMS) 400MHZ

  • 4x 73GB SCSI ULTRA320 15K HD 1IN 80 PIN HDD

  • EMBEDDED RAID

  • ESX 3i 3.5.0 build 123629.
There are five virtual machines on it. Utilisation was never more than half of the total available. I had also reserved 3000 MHz of CPU and 2048 MB of RAM for the system itself in case it ever got into a situation where it needed them.

Despite that I am also experiencing this problem. The entries in my logs look like this:

Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.
Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: Heap: 1522: Heap_Align(globalCartel, 48/48 bytes, 4 align) failed. caller: 0x73a8ae
Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: World: vm 12172104: 910: init fn user failed with: Out of memory!
Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: World: vm 12172104: 1775: WorldInit failed: trying to cleanup.
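If you have a copy of the log, the warnings above can be counted with a short script. This is a minimal sketch: the regular expression and the symptom substrings are modelled only on the four lines quoted in this post, so other ESX/ESXi builds may need adjustments, and the names `classify` and `summarize` are invented for the example.

```python
import re
from collections import Counter

# Pattern based solely on the log lines quoted in this thread (an assumption,
# not a documented vmkernel log format).
LINE_RE = re.compile(
    r"vmkernel: (?P<uptime>\S+) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"WARNING: (?P<subsystem>\w+): (?P<message>.*)$"
)

# Substrings copied from the quoted excerpts; checked in this order.
SYMPTOMS = {
    "heap_at_max": "already at its maximumSize",
    "heap_alloc_failed": "Heap_Align(",
    "world_out_of_memory": "Out of memory!",
    "world_init_failed": "WorldInit failed",
}


def classify(line):
    """Return a symptom label for a vmkernel WARNING line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    message = match.group("message")
    for label, needle in SYMPTOMS.items():
        if needle in message:
            return label
    return None


def summarize(lines):
    """Count how often each known symptom appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


SAMPLE = [
    "Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: Heap: 1397: Heap globalCartel already at its maximumSize. Cannot expand.",
    "Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: Heap: 1522: Heap_Align(globalCartel, 48/48 bytes, 4 align) failed. caller: 0x73a8ae",
    "Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: World: vm 12172104: 910: init fn user failed with: Out of memory!",
    "Jan 2 05:31:12 196.x.x.x vmkernel: 48:13:18:46.449 cpu13:12172097)WARNING: World: vm 12172104: 1775: WorldInit failed: trying to cleanup.",
]

if __name__ == "__main__":
    for label, count in summarize(SAMPLE).items():
        print(label, count)
```

To run it against a saved log instead of the sample, pass an open file object to `summarize`. A steadily growing count of the heap warnings would point at the globalCartel heap filling up rather than at a one-off event.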

I cannot log into the ESXi host by any of the three methods I have tried: the unsupported console on the machine itself, the RCLI, or the Windows VI Client. The connection eventually times out.

The five virtual machines are all still running perfectly, however. Is there a correct way I should be approaching this? I have found absolutely nothing other than this post. If there is nothing I can do, I was going to bounce the ESX box, but I am extremely wary of that, as all of those machines are mission-critical boxes that can't go down.

Thanks in advance for any ideas or suggestions.

Cheers, Charles.

Re: The s*** has hit the fan

11. Jul 3, 2009 11:59 PM in response to: Bolgard
charlesleaverdd (Novice, 6 posts since Aug 7, 2008)
I cannot do anything from VirtualCenter, as the box is not even connected there anymore, and there is absolutely no way for me to do anything to it or with it because it is completely dead to all of the normal mechanisms I would use (VirtualCenter, the RCLI and even the unsupported console on the physical machine). So I can only shut the virtual machines down by logging into them and typing halt. The RAID card is the built-in card that comes with that system, a PERC 4i. The RAM is all original and has never been added to or touched in any way, so yes, identical.


I am having the exact same problem as you, I'm pretty sure of that. So I see you have hard rebooted your box multiple times. Does it always come back fine? Does VirtualCenter get affected in any way? Do those hosts go back from being "Unknown" to being what they are really called?


By the way, I suspect one of the virtual machines is the cause of this, as it is continuously complaining about maxing out its RAM, which I think makes your suspicion of a memory leak likely. But hey, it could be faulty RAM too. It's not terribly easy for me to find out, as these virtual machines cannot go down, so I can't spend ages running memtest on the box. If I can hard reboot, then I can move them off to my other ESX server and then, sure, I can try memtest.

Re: The s*** has hit the fan

13. Jul 4, 2009 12:00 AM in response to: Bolgard
charlesleaverdd (Novice, 6 posts since Aug 7, 2008)
All of mine are running vmware tools. Whether I connect using VirtualCenter or directly to the machine via the VIclient it will not connect. I have tried restarting the management agents.

I was previously able to connect via the RCLI and also using VIclient or VirtualCenter, but now it has deteriorated past being able to do anything with it at all. Very impressive that the virtual machines are absolutely 100% fine still.

You should be able to find the logs for each machine, if no other way then at least by using the datastore browser and browsing to the directory that holds that machine's disk files. There you will find the logs for that machine. My VMs weren't imported. A few may have been clones of others, but I don't think that's relevant. The others are clearly running fine, whereas this one always complains. We blame the application that runs on there, but the developer gets very offended when we do. Hopefully he's not reading this right now. :)
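Once a VM's directory has been downloaded with the datastore browser, a rough sketch like the following can pull out the memory-related lines. It assumes the per-VM logs follow the usual `vmware.log` / `vmware-N.log` naming; the function names and the search terms are made up for this example, so adjust them to whatever your logs actually contain.

```python
from pathlib import Path


def find_vm_logs(vm_dir):
    """Return the vmware*.log files in vm_dir, most recently modified first."""
    logs = [p for p in Path(vm_dir).glob("vmware*.log") if p.is_file()]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


def grep_logs(vm_dir, needles=("memory", "heap")):
    """Return (file name, line number, text) for lines mentioning any needle.

    Matching is case-insensitive; the default needles are only a guess at
    what is worth looking for.
    """
    hits = []
    for log in find_vm_logs(vm_dir):
        with open(log, errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                lowered = line.lower()
                if any(needle in lowered for needle in needles):
                    hits.append((log.name, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python vmlogs.py /path/to/downloaded/vm/directory
    for name, lineno, text in grep_logs(sys.argv[1]):
        print(f"{name}:{lineno}: {text}")
```

Comparing the output for the VM that always complains against one of the healthy VMs is probably the quickest way to see whether it logs anything the others do not.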
