We have come across a curious issue: NAS mounts within a VM (RedHat 4, 32-bit) become degraded or unresponsive when the VM is hosted on ESX 4.1, and the problem goes away when the VM is migrated to a server running ESX 4.0 U2. We have opened a case with VMware.
The VM has two NAS mounts, and executing a simple 'ls' command on the mount point, or a 'df' command, will hang under 4.1. The odd part is that we cannot reproduce the problem on our internal test VMs, mounting the exact same NAS devices. So in summary: the problem is only reproducible on certain VMs, but on those VMs it is always caused by migrating to ESX 4.1 and fixed by migrating to ESX 4.0 U2. Migration itself does not fix it; for example, going from an ESX 4.1 server to another ESX 4.1 server does not fix it, and migrating from ESX 4.0 U2 to ESX 4.0 U2 does not cause it.
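For anyone wanting to script the check rather than risk wedging an interactive shell, here is a rough sketch of how we probe a mount point with a bounded 'df'. The mount path is a placeholder (substitute your actual NAS mount; '/' is used below only so the sketch runs anywhere), and the 5-second cutoff is an arbitrary choice, not something from VMware.

```shell
#!/bin/sh
# Probe a mount point with a time-bounded 'df' so a hung NFS mount
# does not block the calling shell. MOUNT is a placeholder; point it
# at the real NAS mount (e.g. the one that hangs under ESX 4.1).
MOUNT="/"

# Run df in the background and start a watchdog that kills it
# after 5 seconds -- if the watchdog fires, we have the hang.
df "$MOUNT" >/dev/null 2>&1 &
DF_PID=$!
( sleep 5; kill -9 "$DF_PID" 2>/dev/null ) &
WATCHDOG=$!

if wait "$DF_PID"; then
    RESULT="responsive"
else
    RESULT="hung"
fi
kill "$WATCHDOG" 2>/dev/null
echo "$MOUNT: $RESULT"
```

We use a background job plus watchdog rather than coreutils 'timeout' because the RHEL4-era coreutils does not ship it.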
It appears to be something about how these particular VMs interact with 4.1. We thought turning off IPTABLES within the VM helped, but the results are not consistent. We are currently working with a clone of one of the problematic VMs so that we can strip it down to the simplest configuration that still causes the problem.
Yes, I should have mentioned that we are running the most current version. Removing the tools and seeing whether the behavior changes is worth trying.
Just another update. We replicated this behavior on one of our small RH4 Linux test VMs, so we know it is not specific to something within the VM managed by the original administrator who brought the problem to our attention. We also know that the issue goes away under 4.1 if you switch from the vmxnet driver to E1000, although performance does degrade with E1000.
Will test with RedHat 5 next and see if other mount points cause the same issue. The ESX version definitely plays a role, since migrating back to ESX 4.0 U2 always solves the problem, whereas migrating to another ESX 4.1 server does not.