VMware Cloud Community
tawatson
Contributor

The file table of the ramdisk 'root' is full on ESXi 5.1

Good Afternoon,

Since upgrading to ESXi 5.1 last fall, we have been experiencing a problem with the ramdisk filling up on our hosts.  I have not yet been able to determine the cause, but once the ramdisk hits the wall, it is almost impossible to troubleshoot.  It usually appears after 20 or so days of continuous uptime.  Once the ramdisk becomes full, it is not obvious that there is a problem until you do something like a vMotion, which then fails with the following message:

A general system error occurred

If you look at the host events, there are many of the following types of entries:

The file table of the ramdisk 'root' is full.  As a result, the file
/var/lib/vmware/hostd/journal/1358360282.17 could not be created by
the application 'hostd-worker'.
error
1/16/2013 11:18:02 AM
So I attempt to start SSH on the host to see if I can find what is taking up all of the space.  I go to Configuration -> Security Profile -> Services Properties and enable SSH.  It appears that SSH has started, but in fact it has not; it looks like SSH will not start because the ramdisk is full.  The same thing happens when enabling local console access.  So now I am stuck: I have about 30 production VMs on this host with a condition that only a reboot seems to fix.  Has anyone else seen this issue?
Thanks,
Andrew Watson
4 Replies
memaad
Virtuoso

Hi,

What is the hardware vendor of the ESXi host?

Here is one KB that may be relevant:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203307...

Regards

Mohammed

Mohammed | Mark it as helpful or correct if my suggestion is useful.
MKguy
Virtuoso

That sounds bad. Can you still log in using the DCUI or the local shell instead of SSH?

Maybe you shouldn't restart the management agents yet; there's a chance they will fail to start again as well, leaving you with a disconnected, completely unmanageable host.

If you can still log in through the local shell, check whether it's really ramdisk space or ESXi inodes filling up the host:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203779...

http://kb.vmware.com/selfservice/documentLinkInt.do?micrositeID=&popup=true&languageId=&externalID=1...
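
If you do manage to get a local shell, something like this should quickly show whether it's ramdisk space or inodes running out (vdf reports the ramdisk usage, and visorfs reports the inode counts):

# vdf -h
# esxcli system visorfs ramdisk list
# esxcli system visorfs get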

If you can't log in, you might still be able to run these esxcli commands from a remote host with the vCLI installed (such as the vMA):

# esxcli --server $server system visorfs ramdisk list
Ramdisk Name  System   Reserved     Maximum      Used  Peak Used  Free  Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point            
------------  ------  ---------  ----------  --------  ---------  ----  -------------  --------------  ----------------  -----------  ---------------------------
root            true  32768 KiB   32768 KiB  3856 KiB   3864 KiB  88 %           88 %            8192              4096         2711  /                      
etc             true  28672 KiB   28672 KiB   316 KiB    356 KiB  98 %           98 %            4096              1024          458  /etc                   
tmp            false   2048 KiB  196608 KiB  6888 KiB   8508 KiB  96 %            0 %            8192               256           75  /tmp                   
hostdstats     false      0 KiB  654336 KiB  3340 KiB   3340 KiB  99 %            0 %            8192                32            4  /var/lib/vmware/hostd/stats

# esxcli --server $server system visorfs get
   Total Inodes: 524288
   Used Inodes: 3247
   Unlinked Inodes: 0
   Reserved Inodes: 0
   Peak Used Inodes: 3338
   Peak Unlinked Inodes: 2
   Peak Reserved Inodes: 2
   Free Inode Percent: 99
   Lowest Free Inode Percent: 99

-- http://alpacapowered.wordpress.com
Terry3
Contributor

Check this post

VMware KB: ESXi 5.1 host becomes unresponsive when attempting a vMotion migration or a configuration...

It seems to be caused by SNMP using all available inodes.

I was unable to vMotion machines off this host. After applying the SNMP workaround above, I was able to evacuate the VMs from the host, so I may update to Update 1, which should address this problem.
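
For anyone else hitting this, the workaround essentially just disables the SNMP agent until the host can be patched; from the local shell or the vCLI it's roughly the following (check the current state first, then turn the agent off):

# esxcli system snmp get
# esxcli system snmp set --enable false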

jrmunday
Commander

Hi Andrew,

This looks very similar to an issue that I had ~5 months ago, but on version 5.0. In my case, the sfcbd watchdog was exhausting inodes, causing the host to become unresponsive (as per the links posted by MKguy).
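
If you want a rough check for whether sfcbd is doing the same on your hosts, one option (assuming, as on my 5.0 hosts, that /var/run/sfcb is where the files pile up) is to compare the inode figures with the file count in that directory:

# esxcli system visorfs ramdisk list
# ls /var/run/sfcb | wc -l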

I had a support case logged with VMware, who provided a hot patch to address this, though it has long since been superseded. While waiting for the hot patch, I needed to monitor all hosts so that I could restart them well in advance of any unplanned downtime; a PowerShell monitoring script saved my bacon on two separate occasions.

Here are some links to my original posts:

http://communities.vmware.com/thread/430571?start=0&tstart=0

http://communities.vmware.com/thread/431987?start=0&tstart=0

I've updated the script quite significantly since those posts and continue to monitor in the background as a matter of course, but as randomly as the issue appeared, it seemingly disappeared.
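
The gist of the monitoring is just polling the inode figures on each host. A stripped-down sketch of that idea (this is not my PowerShell script; it assumes a hosts.txt with one host name per line and a configured vCLI credential store so esxcli doesn't prompt for credentials) run from the vMA would be something like:

#!/bin/sh
# Poll ramdisk and inode usage on each host so a low free-inode count
# can be spotted before the host becomes unmanageable.
for host in $(cat hosts.txt); do
    echo "=== $host $(date) ==="
    esxcli --server "$host" system visorfs ramdisk list
    esxcli --server "$host" system visorfs get | grep "Inode Percent"
done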

Might be worth logging this one to VMware?

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77