VMware Cloud Community
cvann357
Contributor

Lost connection to HA Host - esx.conf errors in console

All the VMs seem to be running, but I cannot access the host from the vSphere client app (it shows as disconnected).

When I open up the console I get repeating errors:

Exception occured: Error interacting with configuration file /etc/vmware/esx.conf: Unable to create or open a LOCK file : Read-only file system

touch: cannot touch '/var/lock/subsys/sfcbd-watchdog': Read-only file system

Failed to remove stale ticket /var/run/vmware/tickets/vmtck-0474d28d-efca-4d

I'll finish by saying I'm new to VMware, having come from the dark side (MS Hyper-V), so take it easy on me, please.

2 Replies
peterdabr
Hot Shot

The file system turning read-only on the host could be related to a problem with the storage that holds the ESX files. I'm not sure whether you boot ESX from a local drive or from shared storage, but make sure that storage is accessible. In the past I had a similar problem with an OCZ-branded SSD that, over time, stopped accepting writes until the ESX host was rebooted. The OCZ drive had faulty firmware.

Your VMs are most likely running fine because they are on shared storage. If they are on the same local storage as the ESX host, then the file systems of all the VMs will become read-only as well, and manual intervention inside the VMs will be required to fix them.

Nonetheless, reboot the ESX host to make sure the LUN is accessible and that the host boots up fine. Then check the log files /var/log/messages and /var/log/vmkernel (if the storage is iSCSI) for clues about storage problems. Also make sure the underlying storage hardware, such as the RAID card and RAID set (if any), works properly and that the RAID set is synchronized. Check the RAID card logs for errors.
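As a rough illustration of that log check, here is a minimal sketch that scans the two log files named above for lines that look storage-related. It assumes a Python interpreter is available wherever the logs are read (on the host, or on a machine the logs were copied to). The keyword list and the helper names find_storage_errors and scan_log are my own assumptions, not anything VMware ships, so adjust the pattern to whatever your storage stack actually logs.

```python
import io
import os
import re

# Assumed keywords only; this is not an official list of ESX storage errors.
PATTERN = re.compile(
    r"read-only file system|i/o error|scsi|timeout|timed out",
    re.IGNORECASE,
)


def find_storage_errors(lines):
    """Return (line_number, text) pairs for lines matching PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one log file; return an empty list if the file does not exist."""
    if not os.path.isfile(path):
        return []
    with io.open(path, errors="replace") as handle:
        return find_storage_errors(handle)


if __name__ == "__main__":
    # Log paths taken from the reply above.
    for log in ("/var/log/messages", "/var/log/vmkernel"):
        for number, text in scan_log(log):
            print("%s:%d: %s" % (log, number, text))
```

A match only points at lines worth reading more closely. It does not by itself prove a storage fault.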

Also, make sure your ESX host has the latest patches.

ds236
Contributor

I'm seeing the same issue with ESXi 4.1 on two brand-new servers. I can configure things for a while after boot, and other things appear fine for a while after boot, then I stop being able to make any config changes, same as reported in this thread. Another reboot and I'm back in.

As for the comment about the disk: the drives (in my case local, brand new, and having run perfectly in tests with another OS) are still writable. Logging in via maintenance access (SSH), I can certainly create files without issue (a simple "touch foo" does not fail, for example). I also see the system successfully writing to /var/log/messages, something that would not happen if the file system had switched to read-only.
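One caveat with the "touch foo" test: a successful write in one directory only shows that the file system backing that directory is writable. A minimal sketch that repeats the test in each directory named in the original error messages might look like the following. It assumes Python is available on the host, and probe_write is a hypothetical helper name.

```python
import errno
import os
import tempfile


def probe_write(directory):
    """Create and delete a temp file in directory.

    Returns "ok", "read-only", or "error: <reason>".
    """
    try:
        fd, path = tempfile.mkstemp(prefix="probe-", dir=directory)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # Same condition the console reports as "Read-only file system".
            return "read-only"
        return "error: %s" % exc.strerror
    os.close(fd)
    os.remove(path)
    return "ok"


if __name__ == "__main__":
    # Directories taken from the error messages and logs quoted in this thread.
    for directory in (
        "/etc/vmware",
        "/var/lock/subsys",
        "/var/run/vmware/tickets",
        "/var/log",
    ):
        print("%s: %s" % (directory, probe_write(directory)))
```

If some directories report "ok" and others report "read-only", that would narrow the problem to specific mount points rather than the whole disk.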

Other things stop working too: IPMI status of the host sensors stops being reported. So it's not just updating configs that doesn't work, it's a lot of interaction with the system. Is it possible the hardware has issues? Sure. But two identical systems with the same issue make it unlikely (and a third that I used showed the same issues too).

Something running on a periodic basis appears to cause the problems, and rebooting appears to solve them. This sure sounds like a software problem and not hardware. Coming up with a way to diagnose this would surely be helpful, and it sounds like it might help a lot of folks.
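One possible way to start diagnosing it is to record exactly when things stop working after boot, so the time can be lined up against periodic jobs and log entries. The sketch below is only an assumption about how that could be done, not a VMware tool. The function name watch and the "ok" convention for the probe are hypothetical, and it needs a Python interpreter on the host.

```python
import time


def watch(probe, interval_s=60, max_checks=None,
          sleep=time.sleep, clock=time.time):
    """Call probe() every interval_s seconds.

    Returns the clock() value at the first check where probe() does not
    return "ok", or None if max_checks successful checks pass first.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if probe() != "ok":
            return clock()
        checks += 1
        sleep(interval_s)
    return None


# Example use (not run here): a probe that tries to create a file in one of
# the directories from the error messages. The file name is made up.
#
#   import os
#
#   def touch_probe(path="/var/lock/subsys/probe-test"):
#       try:
#           open(path, "a").close()
#           os.remove(path)
#           return "ok"
#       except (IOError, OSError):
#           return "failed"
#
#   failed_at = watch(touch_probe, interval_s=60)
#   print(time.ctime(failed_at))
```

Comparing the recorded failure time across several reboots would show whether the failure follows a fixed interval after boot, which is what a periodic software job would suggest.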
