Hi All,
I just had an interesting issue and thought I would share it, as it might save you from co-ordinating planned downtime that could potentially be avoided.
We had a disconnected host where the guest VMs are all still running and can be accessed via RDP, but the host is not responsive through the iLO / DCUI, and SSH is not running and can't be started.
The host logged the following sequential events in vCenter:
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1481830/etc/hosts could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1482016/etc/sfcb/repository/root/interop/cim_listenerdestinationcimxml.idx could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1482194/etc/vmware/hostd/vmAutoStart.xml could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file /etc/vmware/esx.conf.LOCK.17554 could not be created by the application 'hostd-worker'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_threshold.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_hysteresis.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52c25dd2-064a-abee-ce4c-cafd051d527c could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sel_header.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52ca5a12-1d8d-7902-1e14-170d2c282951 could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_readings.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /etc/vmware/esx.conf.LOCK.17554 could not be created by the application 'hostd-worker'.
Unable to apply DRS resource settings on host. A general system error occurred: Invalid fault. This can significantly reduce the effectiveness of DRS.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/523777d0-72dc-9e0b-c6b0-9d32a5255317 could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52fc39a4-62d0-866e-50a3-663209c9ca28 could not be created by the application 'sfcb-CIMXML-Pro'.
The vSphere HA availability state of this host has changed to Unreachable
Host is not responding
Alarm 'Host connection state' on myhost.mydomain changed from Green to Red
Alarm 'Host connection state' on myhost.mydomain sent email to myemail@mydomain
vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
Alarm 'vSphere HA host status' on myhost.mydomain changed from Green to Red
vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
Cannot scan the host myhost.mydomain because its power state is unknown.
Host is not responding
I found this KB article, but was unable to start the process as I couldn't SSH onto the host.
Since I knew which guests were running on the affected host, I contacted the business and arranged emergency downtime to shut these guests down so that I could power-cycle the host and deal with the issue. After lots of co-ordination we finally agreed on a time that satisfied all business areas, and started the remediation.
Now here is the interesting part ... within seconds of shutting down guest VMs with a simple for loop and the shutdown command, the host's status changed to Green and it was connected to vCenter again.
for /f %i in (C:\_temp\targets.txt) do shutdown -s -m \\%i -t 0 -f
I enabled SSH and ran "stat -f /"; results below:
~ # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 324368 Available: 324368
Inodes: Total: 8192 Free: 55
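Side note: since the "Inodes: Free" line is the figure that matters here, a quick threshold check along these lines could be scripted (a sketch only: it assumes a stat that supports the -c format option with %d meaning free file nodes, as GNU/busybox stat does, and the 200 threshold is an arbitrary example, not a VMware recommendation):

```shell
# Warn when free inodes on / drop below a threshold.
# Assumes GNU/busybox 'stat', where '-f -c %d' prints free file nodes;
# the 200 threshold is an arbitrary example.
THRESHOLD=200
free=$(stat -f -c '%d' /)
if [ "$free" -lt "$THRESHOLD" ]; then
    echo "WARNING: only $free inodes free on /"
else
    echo "OK: $free inodes free on /"
fi
```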
After running through the above-mentioned KB article, the inodes were still exhausted:
/var/run/sfcb # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 324565 Available: 324565
Inodes: Total: 8192 Free: 122
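To see where the inodes actually went, a per-directory file count is a quick way to narrow it down (a sketch only; the directory list is just the suspects from the events above, not an exhaustive set, and on ESXi these paths all live on the in-memory visorfs root):

```shell
# Count files under likely suspects to find the inode hog.
# Directory list taken from the log events above; adjust as needed.
for d in /var/run/sfcb /var/log /etc /tmp; do
    echo "$d: $(find "$d" -type f 2>/dev/null | wc -l) files"
done
```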
So now that the host was available again, I put it into Maintenance Mode, rebooted it, and checked again after the reboot (plenty of free inodes):
~ # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 332942 Available: 332942
Inodes: Total: 8192 Free: 5721
All VMs that were shut down were then powered back up using PowerCLI.
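For anyone without PowerCLI to hand, the same power-on step can be scripted from the ESXi shell with vim-cmd (a sketch: the VM IDs below are hypothetical placeholders, the real ones come from "vim-cmd vmsvc/getallvms", and DRY_RUN=1 just echoes the commands instead of running them):

```shell
# Power the guests back on from the ESXi shell (alternative to PowerCLI).
# VM IDs are hypothetical; list real ones with: vim-cmd vmsvc/getallvms
DRY_RUN=1
for vmid in 10 11 12; do
    cmd="vim-cmd vmsvc/power.on $vmid"
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $cmd"
    else
        $cmd
    fi
done
```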
So the interesting takeaway is that next time this issue occurs, I might be able to resolve it by shutting down one or more running VMs rather than affecting all guests. Perhaps shut down the lowest-priority non-production VMs first, to see whether that frees enough inodes to get the host responsive again.
So two questions;
Cheers, & happy new year!
Jon
Note: Discussion successfully moved from VMware ESXi 5 to Availability: HA & FT
Bumping to the top ... Thanks.
Typically you should try to clean up certain directories. In most cases you will need to restart the management services before the host comes back in vCenter.
Here's a KB that tells which directories to look for:
http://kb.vmware.com/kb/2037798
Not sure about monitoring it.
Hi Duncan,
Just closing this thread and feeding back some information ...
I have (with the help of the PowerCLI community) written a script to monitor this; see the thread below:
http://communities.vmware.com/message/2178047#2178047
Cheers,
Jon
Awesome, thanks for sharing!