This is the first time this problem has occurred. The VM is a file server, and clients open documents over network shares on it. Ping is OK (there are no network problems). After some time the server VM (Windows 2003 R2) freezes and clients lose the server for 5-6 seconds. Then they continue to work.
There are no errors in Event Viewer. I found a similar problem in the communities, with these log entries:
vmx| GuestRpcSendTimedOut: message to toolbox timed out.
vmx| GuestRpc: app toolbox's second ping timeout; assuming app is down
I uninstalled VMware Tools but the problem remains. The freezes are more frequent when more clients work on the file server (VM). Storage is an IBM DS5100, ESX is 4.1. There are no snapshots on the VM.
Could this be a problem with I/O on the storage?
What is your experience with this type of log entry?
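To see how often these messages appear and whether they line up with the freezes, the VM's vmware.log can be scanned for the two entries quoted above. This is only a rough sketch: the match strings are copied from the quoted lines, while the function names and the idea of passing the log as a list of lines are my own assumptions.

```python
from collections import Counter

# Substrings copied from the two vmware.log lines quoted in the post above.
PATTERNS = (
    "GuestRpcSendTimedOut: message to toolbox timed out",
    "GuestRpc: app toolbox's second ping timeout",
)


def find_guestrpc_timeouts(lines):
    """Return (line_number, text) for every line containing one of PATTERNS."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern in line for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def count_by_pattern(lines):
    """Count how many lines contain each pattern (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

For a real log you would read the file first, for example `find_guestrpc_timeouts(open(path, errors="replace").readlines())`, where `path` points to the vmware.log in the VM's directory on the datastore.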
OK, it's a file server. I wonder if it's a pathing issue to your back-end storage?
No, it's not a SAN issue; this VM worked fine without problems until 2-3 days ago. The SAN was set up by the book a year ago. It's not a SAN issue.
The only things that changed or went wrong in these 2-3 days are:
1. Windows updates and antivirus updates.
2. A failed cache battery pack on the IBM DS5020 storage at the DR site (there is remote mirroring with the DS5100, but a connection to this issue seems illogical). I can't see any logical dependency here. (Maybe some bug?)
I want to be sure what kind of problem this is. Is it an I/O problem (storage), or is it a VMware problem (VMware Tools ...)?
Is your storage set for synchronous replication? With synchronous replication, writes are not committed until they have been written at your secondary site.
If the performance of your secondary array has degraded, that could in turn affect the performance of the primary storage.
Check the vmkwarning logs. You should get a clue if the problem is being caused by storage.
VMs may freeze if connectivity between the storage and the ESX host has issues or there is path thrashing going on.
If possible, try migrating the VM to local storage and check whether that resolves the issue.
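To judge whether a change such as the migration actually removes the freezes, it helps to measure them rather than rely on user reports. A minimal sketch, assuming you already have a sorted list of times (in seconds) at which a once-per-second probe against the file share succeeded. How the probe is collected is left open, and the function name and the 5-second threshold are illustrative assumptions based on the 5-6 second freezes described earlier.

```python
def find_freezes(timestamps, threshold=5.0):
    """Return (start, duration) for each gap of at least `threshold` seconds.

    `timestamps` is a sorted list of times (seconds) at which a probe
    succeeded; a long gap between two successes is treated as a freeze.
    """
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        duration = current - previous
        if duration >= threshold:
            gaps.append((previous, duration))
    return gaps
```

Running this over probe data from before and after the migration gives a simple count of freezes to compare.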
The storage uses asynchronous remote mirroring. I don't know how this could affect the primary LUNs.
OK, thanks for the suggestions. I have already contacted VMware support about this issue; I'll send feedback when the problem is solved.
Hi to all,
I want to share feedback on this problem with you all. The problem was definitely an I/O problem (not VMware Tools, not a hardware problem ...).
As I already mentioned, the storage system at the remote site had a failed battery (there was asynchronous replication with the primary storage). That replication could impact the primary site this way seems absolutely illogical, but once the cache battery was replaced the problem was gone. It is definitely some IBM storage async replication bug.