We are hosting a farm of 4 ESX hosts.
A few hours ago I discovered that one of them was shown as "disconnected" in the VirtualCenter interface. All VMs are still running fine.
I found that /dev/sda2 was full because of a few core dumps in /var/core.
I deleted a few of them after taking a backup, but this ESX host still appears as disconnected.
After deleting 300 MB of core dumps, df -h returns a strange result: disk usage is still 100%. Is there some kind of trash on VMware that I need to empty? How can I get this host back to the connected state?
/dev/sda2 4.9G 4.6G 0 100% /
As far as I know there is no trash. Can you post the output of the vdf -h command? Have you tried reconnecting to the host? It will not reconnect automatically: right-click the host in the inventory and choose Reconnect.
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 4.9G 4.6G 0 100% /
/dev/sda1 99M 26M 68M 28% /boot
none 132M 0 132M 0% /dev/shm
/dev/sda6 2.0G 144M 1.7G 8% /var/log
/vmfs/devices 10.0T 0 10.0T 0% /vmfs/devices
/vmfs/volumes/46f25c02-86eb125c-b2f1-001a64646fc8
203G 200G 3.6G 98% /vmfs/volumes/DATA1
/vmfs/volumes/46f25c20-5b2c6774-b669-001a64646fc8
204G 198G 6.2G 96% /vmfs/volumes/DATA2
/vmfs/volumes/471f5442-b8b1d556-fe80-001a64646fc8
419G 406G 13G 96% /vmfs/volumes/DATA3
/vmfs/volumes/471f545a-23995656-e774-001a64646fc8
217G 210G 6.6G 96% /vmfs/volumes/DATA4
/vmfs/volumes/471f85a3-90e292a4-08d6-001a64646fc8
119G 119G 140M 99% /vmfs/volumes/ORACLE1
/vmfs/volumes/471f916d-ae01a9aa-b364-001a64646fc8
203G 191G 12G 94% /vmfs/volumes/OS1
/vmfs/volumes/471f91db-8032e070-ff1f-001a64646fc8
204G 200G 4.4G 97% /vmfs/volumes/OS2
/vmfs/volumes/47221fce-a6512d7c-ef2d-001a64646fc8
59G 59G 141M 99% /vmfs/volumes/ORACLE2
/vmfs/volumes/47a7335e-d97a22d2-4e8e-001a64646fc8
878G 463G 415G 52% /vmfs/volumes/LUN_9-11
/vmfs/volumes/47b5be41-219e25e0-870f-001a64646fc8
477G 477G 328M 99% /vmfs/volumes/LUN_10
/vmfs/volumes/47ea63b1-0f30153c-7e0c-001a646469ee
60G 17G 43G 28% /vmfs/volumes/local VS03
The Reconnect option is not available; it appears greyed out.
I only have the Disconnect option (funny, for a host that is already disconnected).
Here's an idea: from the command line, execute the following as root.
cd /
du -hs *
This goes into the root directory and reports the size of each top-level folder along with its name. (Note: -a and -s conflict in du, so drop the -a when summarizing.) Once it finishes you can get a general idea of what is eating the space. Most of the time users put things in their home directories (which is why it's a good idea to create a separate partition for /home) and fill up the disk.
Post the output if you're not sure.
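To rank those directories largest-first, a kilobyte count with a numeric sort is safer than `sort -h`, which may not exist on an older service console (a sketch; adjust the path as needed):

```shell
# Summarize each top-level directory under / in kilobytes,
# sort descending, and keep the ten biggest offenders.
du -sk /* 2>/dev/null | sort -rn | head -10
```

Whatever tops this list (often /var, /tmp, or /home) is the place to start cleaning.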
I deleted all files in /var/core.
Disk space is back to normal values,
but the ESX host still appears as disconnected in VirtualCenter, and still does not respond over HTTP.
I found this in /var/log/messages from the moment the crash dumps were generated. Maybe I should restart '/usr/sbin/vmware-hostd -u', but I am not sure, and I do not want all the VMs to stop suddenly, as they are running well even though I cannot manage them.
Jul 31 01:40:02 reipesxvs02 VMware[init]: + Aborted (core dumped) setsid $CMD
Jul 31 01:40:02 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 289333 seconds
Jul 31 01:40:02 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:02 reipesxvs02 hostd-support: Failed to create directory hostd-support-28861
Jul 31 01:40:02 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:05 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 289333 seconds
Jul 31 01:40:05 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 1 seconds (quick failure 1)
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 0 seconds (quick failure 2)
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 0 seconds (quick failure 3)
Jul 31 01:40:06 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 1 seconds (quick failure 4)
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 0 seconds (quick failure 5)
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 0 seconds (quick failure 6)
Jul 31 01:40:07 reipesxvs02 watchdog-vpxa: End '/opt/vmware/vpxa/sbin/vpxa', failure limit reached
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 7 seconds (quick failure 1)
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:09 reipesxvs02 hostd-support: Failed to create directory hostd-support-1687
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 2)
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:09 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 1 seconds (quick failure 3)
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 4)
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:10 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 1 seconds (quick failure 5)
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u'
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: '/usr/sbin/vmware-hostd -u' exited after 0 seconds (quick failure 6)
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Jul 31 01:40:11 reipesxvs02 watchdog-hostd: End '/usr/sbin/vmware-hostd -u', failure limit reached
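For what it's worth, on classic ESX 3.x the hostd and vpxa agents are normally restarted through their init scripts rather than by launching the binaries by hand, and running VMs are generally unaffected by an agent restart; still, verify this against your ESX version before running it (a sketch, assuming the standard service names):

```shell
# Restart the host management agent (vmware-hostd) via its init/watchdog script
service mgmt-vmware restart
# Restart the agent that talks to VirtualCenter (vpxa)
service vmware-vpxa restart
```

With both agents back up, the Reconnect option in VirtualCenter should become available again.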
No, the option is greyed out.
I have the same problem, did you solve it?
My solution was to delete the core file and a VMwareClusterManager.trace file that was over 1.4 GB, then restart the management service. Hope this helps anybody with a similar problem.
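That cleanup can be sketched as follows; the trace-file location is an assumption (locate it first), and back up anything before deleting it:

```shell
# Locate any oversized trace file (path varies by install; >100 MB here)
find /var -name 'VMwareClusterManager.trace' -size +100000k 2>/dev/null
# Remove the core dumps (after backing them up), then restart the agent
rm -f /var/core/*
service mgmt-vmware restart
```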
Hello,
Until you free up space in /, you will have issues with managing ESX.
Find out which files are taking up the most space; perhaps it is something in /tmp, /var, or /home.
You may want to consider reinstalling but add 5GB partitions for the following filesystems:
/
/var
/var/log
/tmp
/home
/boot should be around 200 MB.
This will allow plenty of room so that / itself does not fill up and cause issues.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
SearchVMware Blog: http://itknowledgeexchange.techtarget.com/virtualization-pro/
Blue Gears Blogs - http://www.itworld.com/ and http://www.networkworld.com/community/haletky
As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization