VMware Cloud Community
trwagner1
Contributor

Odd Console Error after rebooting ESX 4.1

I received the following error on an ESX 4.1 box that was recently rebooted.

Error interacting with configuration file /etc/vmware/esx.conf: failed attempting to lock
file.  Another process has locked the file for more than 20 seconds.  The process
holding the lock is /usr/sbin/esxcfg-nas (2999).  This operation will complete if it is
run again after the lock is released.

Then, after seeing this twice on the console, I received the following error:

option with the name CIMEnabled already exists

If I look at this process, it is:  "/usr/sbin/esxcfg-nas -b".  I'm not sure what the -b option is for.

The server remains offline.  If I kill that esxcfg-nas process, nothing happens, but I didn't really expect it would.
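For reference, this is roughly what I ran from the service console to check on and kill the lock holder (just a sketch; the PID 2999 is the one reported in the error above, so substitute whatever your message shows):

    # confirm the process named in the lock error is still running
    ps auxwww | grep esxcfg-nas | grep -v grep
    # try to stop it using the PID from the error message
    kill 2999
    # escalate only if it refuses to exit
    kill -9 2999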

That's not something I've run across before.

Has anyone seen anything like this?

Thanks

Ted

4 Replies
trwagner1
Contributor

I neglected to add that I tried to restore the esx.conf file using the procedures found in KB 1004451, but that didn't fix the problem.
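The restore attempt boiled down to something like the following (a rough sketch, not the KB's exact procedure; the backup path is just wherever you have a known-good copy saved):

    # copy a known-good backup over the live config (backup path is a placeholder)
    cp /etc/vmware/esx.conf.bak /etc/vmware/esx.conf
    # restart the management agents so the configuration is re-read
    service mgmt-vmware restart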

trwagner1
Contributor

Well, in the end it was a painful day, but as it turns out the problem had to do with the configuration of vSwitch0 and the NFS mount.  I had to delete the NFS datastore from the host, then remove vmnic0 from vSwitch0.  Once I did that, the host was reachable via TCP and in vCenter again.  The odd thing was that all the vmnics were responding and could see CDP information.
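Roughly the commands I used from the service console, for anyone following along (a sketch; the datastore label "nfs_datastore1" is a placeholder for whatever yours is called):

    # list the NFS mounts and note the datastore label
    esxcfg-nas -l
    # remove the NFS datastore from the host
    esxcfg-nas -d nfs_datastore1
    # unlink vmnic0 from vSwitch0
    esxcfg-vswitch -U vmnic0 vSwitch0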

I'm still scratching my head a bit on this one.  If someone can explain why this would cause that, I'll give you credit for the answer rather than marking my own reply as the solution.

bretti
Expert

It sounds to me like the host was trying to remount the storage on boot-up and was unable to reach the NFS server.  I am curious, though, as to why removing vmnic0 would correct that problem.  It's possible that if the load-balancing policy on the vSwitch was not set to "Route based on IP hash" and you had two uplinks connected to the same vSwitch, the communication from the host to the NFS server was interrupted during the mount operation.  By removing one of the uplinks, all the traffic was forced down one connection.  Have you tried to put vmnic0 back in the configuration at this point?
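If you want to test that, something like this from the service console will show the current uplinks and let you re-add vmnic0 once the physical switch ports are sorted out (a sketch; check the NIC Teaming tab on the vSwitch in the vSphere Client to confirm the actual load-balancing policy):

    # show vSwitch configuration, uplinks and port groups
    esxcfg-vswitch -l
    # re-link vmnic0 to vSwitch0 once the switch ports match
    esxcfg-vswitch -L vmnic0 vSwitch0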

trwagner1
Contributor

That is entirely possible.  The vSwitch didn't have any special configuration for load balancing or failover; the NICs and NIC teaming were all left at the default settings.

Now that I'm reviewing these settings and writing this, I think I know why this happened.  And you are correct, the host does appear to have tried to remount the storage on boot-up.

This is something I'll have to address with our Cisco folks.  The gentleman who configured the ports on the Cisco switch for vmnic0 and vmnic1 is no longer with my company, but here's the rundown.

For some reason, vmnic0 thinks it's on a network with just one IP address.  It should be on a trunk port and see two different VLANs.  The ports on the switch should be configured identically, and they are not.  That would explain the host not being able to reach the NFS server.
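For comparison, the sort of port configuration both ESX uplinks should be getting on the Cisco side looks roughly like this (a sketch only; the interface name and VLAN IDs are placeholders, not our actual config):

    ! example trunk configuration; the ports for vmnic0 and vmnic1 should match
    interface GigabitEthernet0/1
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk allowed vlan 10,20
     spanning-tree portfast trunk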
