Hi all,
Our problem is the following:
Some of our guests occasionally freeze. When that happens, the guests don't respond to any network query (for example, a ping) and the console is black.
When I open the console in the Virtual Infrastructure Client and use the mouse or keyboard, the guest continues working normally.
Nothing unusual shows up in the event viewer of the Virtual Infrastructure Client or in the Event Viewer of the guest OS. All power-saving options are disabled and the latest VMware Tools are installed.
Our environment:
ESX 3.0.1
Virtual Infrastructure 2.0.1
The affected guests run "Windows 2000 Professional" and "Windows Server 2003 Standard".
Has anyone had the same problem?
PS: Sorry if my English is not quite correct...
Hi,
Are there any other VMs on the same ESX host that do not have this issue? Are they on the same virtual switch?
My first guess is a networking problem inside your ESX host. Do you have any kind of network failover configured on your virtual switch, and if so, how is failover configured (link status detection, beacon probing, ...)?
cu,
Alex
Hi Alex
Yes, there are other VMs on the same host and the same virtual switch, and they work properly.
The affected VMs I mentioned are distributed over different ESX hosts, but they are all on the same virtual switch, on different VLANs.
Failover settings:
Load Balancing: Port ID
Network Failure Detection: Link Status only
Notify Switches: Yes
Rolling: No
Active Adapters: 3 NICs connected to a Cisco switch (3 Gbit/s trunk)
Standby Adapters: None
Unused Adapters: None
Greetings, Cedrick
Are you using VST? If so, is it across an IOS and COS switch?
Sorry, but I don't know what VST is. Yes, we have a Cisco switch running IOS.
New info: while searching the vmware.log files of the affected VMs for suspicious entries, I found some that caught my attention.
The following events occurred during the time the VM was hanging:
vmware.log
Jan 23 14:42:18.811: mks| SOCKET 2 recv error 5: Input/output error
Jan 23 14:42:18.811: mks| SOCKET 2 destroying VNC backend on socket error: 5
Jan 23 14:43:45.386: mks| SOCKET 3 recv error 5: Input/output error
Jan 23 14:43:45.386: mks| SOCKET 3 destroying VNC backend on socket error: 5
Does anyone know what this means?
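To line these entries up against the times a VM froze, the two `mks| SOCKET` messages quoted above can be pulled out of a vmware.log with a short script. This is a minimal Python sketch, assuming the log lines look exactly like the excerpts in this thread (an optional `Mon DD HH:MM:SS.mmm:` timestamp prefix, since some excerpts omit it); the function and pattern names are my own, not a VMware tool.

```python
import re

# Matches the two mks socket messages seen in this thread, e.g.
#   Jan 23 14:42:18.811: mks| SOCKET 2 recv error 5: Input/output error
#   Jan 23 14:42:18.811: mks| SOCKET 2 destroying VNC backend on socket error: 5
# The timestamp prefix is optional because some excerpts omit it.
MKS_SOCKET_ERROR = re.compile(
    r"^(?:(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d+): )?"
    r"mks\| SOCKET (?P<sock>\d+) "
    r"(?:recv error (?P<recv>\d+)"
    r"|destroying VNC backend on socket error: (?P<destroy>\d+))"
)


def find_mks_socket_errors(lines):
    """Return (timestamp or None, socket number, event, error code) per match."""
    hits = []
    for line in lines:
        m = MKS_SOCKET_ERROR.match(line.strip())
        if not m:
            continue  # unrelated log line
        if m.group("recv") is not None:
            event, code = "recv error", m.group("recv")
        else:
            event, code = "VNC backend destroyed", m.group("destroy")
        hits.append((m.group("ts"), int(m.group("sock")), event, int(code)))
    return hits


if __name__ == "__main__":
    # Reads vmware.log from the current directory; adjust the path as needed.
    with open("vmware.log") as log:
        for hit in find_mks_socket_errors(log):
            print(hit)
```

Passing the open file object works because iterating it yields one line at a time; each line is stripped before matching, so trailing newlines don't matter.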
I don't know what it means, but I've experienced the same issue from time to time. Normally, the guests freeze up and don't respond (or respond VERY slowly to the console). Networking is gone.
This has happened on Windows 2000 Server and Windows Server 2003. Normally, just resetting the VM takes care of it. It doesn't happen very often, maybe one VM every few weeks at most. We had one the other day where we had to remove and re-add the vNIC on the guest.
Sorry, VST is Virtual Switch Tagging, aka running multiple VLANs across trunked connections.
We have this problem too and never got anywhere with support!
@cpfcg: Yes, we run multiple VLANs across the trunk.
I think we will open a call with VMware. If there are any new findings or answers, I will post them here.
Regards, Cedrick
We are experiencing the same issue. We use network trunking; the console is OK, but we are unable to ping the VM. We can occasionally get it back by reverting to a snapshot, but reverting to exactly the same snapshot again can cause the network to fail again. We have logged a call with VMware.
L
Hey LeeCarey
There's a small difference between our problems: I can get the VM back (fully functioning) just by opening the console and moving the mouse, or pressing a key on the keyboard.
Regards, c
I noticed you said that load balancing is set to route based on port ID. Are your trunks also set up as an EtherChannel? If so, you need to set load balancing to "Route based on IP hash"; otherwise packets can get dropped and your machines will be unresponsive to pings.
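To illustrate the difference between the two policies Randy mentions, here is a minimal Python sketch. Both functions are deliberately simplified models, not VMware's actual implementation: port-ID routing is modelled as "virtual port number modulo uplink count", and IP-hash routing as "source IP XOR destination IP, modulo uplink count". The port number and IP addresses are hypothetical; the uplink count of 3 matches the three active NICs listed earlier. The point is only that port-ID routing pins a VM to one uplink no matter who it talks to, whereas IP hash picks the uplink per source/destination address pair, which is closer to how an EtherChannel on the physical switch spreads traffic.

```python
import ipaddress


def uplink_by_port_id(virtual_port_id: int, num_uplinks: int) -> int:
    """Simplified 'route based on originating port ID': one fixed uplink per port."""
    return virtual_port_id % num_uplinks


def uplink_by_ip_hash(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Simplified 'route based on IP hash': depends on both source and destination."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


if __name__ == "__main__":
    # Hypothetical VM on virtual port 7 with IP 10.0.0.5, three active uplinks.
    for dst in ("10.0.0.1", "10.0.0.3", "10.0.0.7"):
        print(dst,
              "port ID ->", uplink_by_port_id(7, 3),
              "IP hash ->", uplink_by_ip_hash("10.0.0.5", dst, 3))
```

With port ID every destination maps to the same uplink, while with IP hash the three destinations land on three different uplinks.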
Randy
Hi there
Since my last post in this thread, the problem hasn't come back. Because I can't reproduce it, I have stopped looking for possible causes.
Thanks for all your answers.
Regards, Cedrick
Hi,
We are also facing the same issue with ESX 3.0.1. The VMs freeze with a black screen and we need to reset them to bring them back.
This is happening in my production environment. I have escalated it to the VMware support team, but they have not been able to tell me what exactly the problem is.
When this problem occurs, CPU usage peaks and stays at the peak.
Regards
Sreenath
Message was edited by:
ramram77
I have the same problem, and vmware.log records socket errors at the time of the lockups. Hopefully someone out there will stumble on this thread and know the answer.
Hi Guys,
It's been a while since this thread was last updated, but I'm experiencing a similar issue: 2 of the 5 virtual machines on one host freeze and ultimately lose network connectivity. The only way to rectify the problem is to open a console to the VM and disable and re-enable the network adapter. The other 3 VMs on this host do not experience this issue.
This is the only error I can find in the vmware.log files of both VMs. I have many other hosts in my environment running ESX 3.0.1; this particular host is running VMware ESX Server 3.0.0 build-27701. All the VMs share the same virtual switch, with no VST enabled and default network settings.
mks| SOCKET 16 recv error 5: Input/output error
mks| SOCKET 16 destroying VNC backend on socket error: 5
mks| SOCKET 17 recv error 5: Input/output error
mks| SOCKET 17 destroying VNC backend on socket error: 5
Is anyone still experiencing this problem?
Thanks
Hi,
The VNC backend being destroyed indicates that the KVM remote console has lost the network connection it needs to keep working.
Are you by any chance accessing it via a NATted network?
If so, check the Knowledge Base article below:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=749640
Hi,
I also get the same sort of errors, but only on one particular guest (which was converted from Microsoft Virtual Server 2005), and only when I try to commit a snapshot. Strange; still researching, and I have a call open with VMware.
Hello All,
I am also seeing the same error and the symptoms you have all highlighted.
Would be interested to see if anyone has identified a root cause.
Thanks,
Ruben
If it has been converted, make sure you are up to date with VMware patches (rollup 1 plus the patches released after rollup 1), then make sure your virtual SCSI controller is set to LSI Logic and not BusLogic.
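The controller type is recorded in the VM's .vmx file as a `scsiN.virtualDev` entry (for example `scsi0.virtualDev = "lsilogic"` or `"buslogic"`). A minimal Python sketch for spotting controllers that are still BusLogic, assuming the .vmx text uses that `key = "value"` layout; the function names are my own. It only reads the configuration and does not change the controller.

```python
import re

# Matches entries such as:  scsi0.virtualDev = "buslogic"
VIRTUAL_DEV = re.compile(
    r'^\s*(scsi\d+)\.virtualDev\s*=\s*"?([^"\s]+)"?\s*$', re.IGNORECASE
)


def scsi_controller_types(vmx_text: str) -> dict:
    """Map each SCSI controller (scsi0, scsi1, ...) to its virtualDev value."""
    controllers = {}
    for line in vmx_text.splitlines():
        m = VIRTUAL_DEV.match(line)
        if m:
            controllers[m.group(1).lower()] = m.group(2).lower()
    return controllers


def buslogic_controllers(vmx_text: str) -> list:
    """Return the controllers still configured as BusLogic, sorted by name."""
    return sorted(
        name for name, dev in scsi_controller_types(vmx_text).items()
        if dev == "buslogic"
    )


if __name__ == "__main__":
    import sys

    # Usage: python check_scsi.py <path to .vmx>
    with open(sys.argv[1]) as f:
        print(buslogic_controllers(f.read()) or "no BusLogic controllers")
```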
Hi,
We had the same issue on ESX 3.0.1. VMware support gave us a fix for it, and we have not faced the issue since.
According to VMware, the problem has been root-caused to an issue in the LSI Logic scsiport driver: a race condition in which the SCSI virtual adapter is disabled on entering standby while the code handling an interrupt is still being processed, which causes that code to loop indefinitely. The code that makes the LSI Logic SCSI virtual adapter behave differently is not activated by default.
To activate it, add the following option to your .vmx files:
lsilogic.reflectIntrMask = TRUE
After making the above change, reload the VMs with the following commands:
#vimsh
$ vmsvc/reload <VmId from the list>
#vmware-cmd <path to VM> start
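If you have several VMs to change, the .vmx edit itself can be scripted before running the reload commands. This is a minimal Python sketch, assuming the VM is powered off while its .vmx is edited. Writing the value in the quoted `key = "value"` form used by other .vmx entries is my assumption; the post above gives it unquoted. The function names and the example path are hypothetical. The script copies the original file to `<file>.bak` and removes any existing `lsilogic.reflectIntrMask` line first, so running it twice does not duplicate the entry.

```python
import re
import shutil

# Option from the thread; the quoted "TRUE" form is an assumption that follows
# the usual key = "value" style of .vmx entries.
OPTION_KEY = "lsilogic.reflectIntrMask"
OPTION_LINE = 'lsilogic.reflectIntrMask = "TRUE"'
KEY_RE = re.compile(r"^\s*" + re.escape(OPTION_KEY) + r"\s*=", re.IGNORECASE)


def add_reflect_intr_mask(vmx_text: str) -> str:
    """Return vmx_text with exactly one lsilogic.reflectIntrMask = "TRUE" line."""
    # Drop any existing setting for this key (e.g. "FALSE") before appending.
    kept = [line for line in vmx_text.splitlines() if not KEY_RE.match(line)]
    kept.append(OPTION_LINE)
    return "\n".join(kept) + "\n"


def patch_vmx_file(path: str) -> None:
    """Copy the .vmx to <path>.bak, then rewrite it with the option set."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_reflect_intr_mask(text))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python patch_vmx.py /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
    patch_vmx_file(sys.argv[1])
```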
Let me know if this resolves your issue.
Regards
Sreenath