How frequently does it happen, and is this the first time? Please elaborate so we can assist you further.
What sort of VM? Have you got something taking snapshots of it?
Hello,
This is the first time this problem has occurred. This is a file server, and clients open documents over network shares on it. Ping is OK (there are no network problems). After some time the server on the VM (Windows 2003 R2) freezes and clients lose the server for 5-6 seconds. Then they continue to work.
There are no errors in Event Viewer. I found a similar problem in the communities with
vmx| GuestRpcSendTimedOut: message to toolbox timed out. and
vmx| GuestRpc: app toolbox's second ping timeout; assuming app is down
I uninstalled VMware Tools but the problem remains. The freezes are more frequent when more clients work on the file server (VM). Storage is IBM DS5100, ESX is 4.1. There are no snapshots on the VM.
Is it possible that the problem is with I/O on the storage?
What is your experience with this type of log error?
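For anyone who wants to see how often these messages show up in their own logs, here is a minimal Python sketch. It assumes the VM log is a plain text file; the helper name count_guestrpc_timeouts is made up for illustration, and the only strings it looks for are the two fragments quoted above.

```python
# Message fragments copied from the two log lines quoted in this post.
PATTERNS = (
    "GuestRpcSendTimedOut",
    "second ping timeout",
)


def count_guestrpc_timeouts(lines):
    """Count how many lines contain each fragment (hypothetical helper)."""
    counts = {pattern: 0 for pattern in PATTERNS}
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

To run it against a real file, pass an open file object, for example count_guestrpc_timeouts(open("vmware.log", errors="replace")), where the file name is an assumption about where your VM log lives. A count that rises with client load would fit the pattern described above, but it does not by itself say whether storage or VMware Tools is the cause.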
OK, it's a file server. I wonder if it's a pathing issue to your back-end storage?
No, it's not a SAN issue; this VM worked fine without problems 2-3 days ago. The SAN was set up one year ago, by the book. It's not a SAN issue.
The only things that changed or went wrong in these 2-3 days are:
1. Windows updates and antivirus updates.
2. Failed battery cache pack on IBM Storage DS5020 at the DR site (there is remote mirroring with the DS5100, but a connection to this issue seems a bit illogical). I can't point to any logical dependency here. (Maybe some bug?)
I want to be sure what kind of problem this is. Is it an I/O problem (storage), or is it some VMware problem (VMware Tools ...)?
Is your storage set for synchronous replication? If so, writes are not committed until they have been written at your secondary site.
If the performance of your secondary array has degraded, that could in turn affect the performance of the primary storage.
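To make that reasoning concrete, here is a toy model in Python. It is not a description of how the DS5000 series actually behaves. It assumes a synchronous write is acknowledged only after the local commit, the link round trip, and the remote commit have all completed in sequence, while an asynchronous write is acknowledged after the local commit alone. The function names and all millisecond figures are invented for illustration.

```python
def sync_write_latency_ms(local_ms, link_rtt_ms, remote_ms):
    """Toy model: the host waits for the local and the remote commit."""
    return local_ms + link_rtt_ms + remote_ms


def async_write_latency_ms(local_ms, link_rtt_ms, remote_ms):
    """Toy model: the host waits for the local commit only."""
    return local_ms


# Made-up figures: a healthy remote array versus a degraded one.
healthy = sync_write_latency_ms(local_ms=1, link_rtt_ms=2, remote_ms=1)
degraded = sync_write_latency_ms(local_ms=1, link_rtt_ms=2, remote_ms=20)
print(f"sync, healthy remote:  {healthy} ms")
print(f"sync, degraded remote: {degraded} ms")
print(f"async, either case:    {async_write_latency_ms(1, 2, 20)} ms")
```

In this simplified picture only synchronous replication passes a slow secondary straight through to the host. A real array may still be affected in asynchronous mode if it shares cache or controller resources with the mirroring process, which this sketch does not model.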
Failed battery cache pack on IBM Storage DS5020
Random guess, but is cache mirroring enabled on your controllers? Maybe it can't mirror the cache anymore and the I/O request fails.
Oli
Check the vmkwarning logs. They should give you a clue as to whether the problem is being caused by storage.
VMs may freeze if the connectivity between the storage and the ESX host has issues or if there is path thrashing going on.
If possible try to migrate the VM to local storage and check if that resolves the issue.
If possible try to migrate the VM to local storage and check if that resolves the issue.
This is a great suggestion! Especially if you can hot clone...
Oli
The storage uses asynchronous remote mirroring. I don't know how this could affect the primary LUNs.
OK, thanks for the suggestions. I have already contacted VMware support about this issue; I'll send feedback when the problem is solved.
Are you managing the host via Virtual Center? Could there be an issue with the number of connections or the VC services being restarted?
Hi to all,
I want to share feedback on this problem with you all. The problem was definitely an I/O problem (not VMware Tools, not a hardware problem ...).
As I already mentioned, the storage system at the remote site had a failed battery (there was asynchronous replication with the primary storage). The whole idea of replication impacting the primary site is absolutely illogical, but when the cache battery was replaced the problem went away. It is definitely some IBM storage async replication bug.