Good Afternoon,
Since upgrading to ESXi 5.1 last fall, we have been experiencing a problem with the ramdisk filling up on our hosts. I have not yet been able to determine the cause, but when the ramdisk hits the wall, it is almost impossible to troubleshoot. It usually appears after 20 or so days of continuous uptime. Once the ramdisk becomes full, it is not obvious that there is a problem until you do something like a vMotion, which then fails with the following message:
A general system error occurred
If you look at the host events, there are many of the following types of entries:
Hi,
What is the hardware vendor for the ESXi host?
Here is one KB
Regards
Mohammed
Seems bad. Can you still log in using the DCUI or a local shell instead of SSH?
Maybe you shouldn't restart the management agents yet; there's a chance they will fail to start again as well, leaving you with a disconnected, completely unmanageable host.
If you can still log in through the local shell, check whether it's really ramdisk space or ESXi inodes that is filling up the host.
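For the ramdisk space side, a quick local check is something like this (a rough sketch; vdf is the visorfs df utility that ships with ESXi, so verify the option on your build):

# vdf -h

The inode side is covered by the esxcli commands below, which you can also run from the local shell.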
If you can't log in at all, you might still be able to run these esxcli commands from a remote host with the vCLI installed (like the vMA):
# esxcli --server $server system visorfs ramdisk list
Ramdisk Name  System  Reserved   Maximum     Used      Peak Used  Free  Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point
------------  ------  ---------  ----------  --------  ---------  ----  -------------  --------------  ----------------  -----------  ---------------------------
root          true    32768 KiB  32768 KiB   3856 KiB  3864 KiB   88 %  88 %           8192            4096              2711         /
etc           true    28672 KiB  28672 KiB   316 KiB   356 KiB    98 %  98 %           4096            1024              458          /etc
tmp           false   2048 KiB   196608 KiB  6888 KiB  8508 KiB   96 %  0 %            8192            256               75           /tmp
hostdstats    false   0 KiB      654336 KiB  3340 KiB  3340 KiB   99 %  0 %            8192            32                4            /var/lib/vmware/hostd/stats
# esxcli --server $server system visorfs get
Total Inodes: 524288
Used Inodes: 3247
Unlinked Inodes: 0
Reserved Inodes: 0
Peak Used Inodes: 3338
Peak Unlinked Inodes: 2
Peak Reserved Inodes: 2
Free Inode Percent: 99
Lowest Free Inode Percent: 99
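If it does turn out to be inodes rather than plain ramdisk space, you can try to narrow down which directory is holding all the files from the local shell, roughly like this (just a sketch using the busybox find and wc that ship with ESXi; the two directories below are the usual suspects from the threads I've seen, adjust as needed):

# find /var/spool/snmp -type f 2>/dev/null | wc -l
# find /var/run/sfcb -type f 2>/dev/null | wc -l

Anything with hundreds or thousands of files is a good candidate for what is eating the inodes.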
Check this post
It seems to be caused by SNMP using all available inodes.
I was unable to vMotion machines off this host; after doing the above regarding SNMP, I am now able to evacuate the VMs from it, so I may update to Update 1, which should curb this problem.
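For anyone else hitting this, the SNMP cleanup boiled down to something like the following from the ESXi shell (rough sketch from memory; the .trp trap files under /var/spool/snmp and the esxcli options come from the linked post, so double-check them on your build):

# /etc/init.d/snmpd stop
# rm /var/spool/snmp/*.trp
# esxcli system snmp set --enable false

Disabling SNMP is only a stopgap until the host is patched.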
Hi Andrew,
This looks very similar to an issue that I had ~5 months ago, but on version 5.0. In my case, the sfcbd watchdog was exhausting inodes, causing the host to become unresponsive (as per the links posted by MKguy).
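If you want to check for the same thing, a rough manual check is something like this (not my monitoring script, just a sketch; treat the /var/run/sfcb path as something to verify on your build):

# find /var/run/sfcb -type f | wc -l
# /etc/init.d/sfcbd-watchdog stop

If that file count is in the thousands, sfcbd is the likely culprit; stopping the watchdog is only a temporary measure until the host can be patched or rebooted.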
I had a support case logged with VMware, who provided a hot patch to address this, but that patch has long been superseded. Until VMware provided the hot patch, I needed to monitor all hosts so that I could restart them well in advance of any unplanned downtime; a PowerShell monitoring script saved my bacon on two separate occasions.
Here are some links to my original posts:
http://communities.vmware.com/thread/430571?start=0&tstart=0
http://communities.vmware.com/thread/431987?start=0&tstart=0
I've updated the script quite significantly since those posts and continue to monitor in the background as a matter of course, but as randomly as the issue appeared, it seemingly disappeared.
Might be worth logging this one with VMware?
Cheers,
Jon