VMware Cloud Community
crush
Enthusiast

ESX 3.0.1 Guests Freeze

Hi all,

Our problem is the following:

Some of our guests occasionally freeze: they stop responding to any network query (for example, a ping) and the console is black.

When I open the console in the Virtual Infrastructure Client and use the mouse or keyboard, the guest resumes working normally.

Nothing unusual is logged in either the events view of the Virtual Infrastructure Client or the Event Viewer of the guest OS. All power-saving options are disabled and the newest VMware Tools are installed.

Our Environment:

ESX 3.0.1

Virtual Infrastructure 2.0.1

The operating systems of the affected guests are "Windows 2000 Professional" and "Windows Server 2003 Standard".

Has anyone had the same problem?

PS: Sorry for my probably incorrect English... :)

22 Replies
kapplah
Enthusiast

Hi,

Are there any other VMs on the same ESX host that do not have this issue? Are they on the same virtual switch?

My first guess is that there is a networking problem inside your ESX host. Do you have any kind of network failover configured on your virtual switch, and if so, how is failover configured (link detection, beacon probing, ...)?

cu,

Alex

crush
Enthusiast

Hi Alex

Yes, there are other VMs on the same host and the same virtual switch, and they work properly.

The affected VMs are distributed over different ESX hosts, but they are all on the same virtual switch, on different VLANs.

About the failover settings:

Load Balancing: Port ID

Network Failure Detection: Link Status only

Notify Switches: Yes

Rolling: No

Active Adapters: 3 NICs connected to a Cisco switch (3 Gbit/s trunk)

Standby Adapters: None

Unused Adapters: None

Greetings, Cedrick

thickclouds
Enthusiast

Are you using VST? If so, is it across an IOS and a COS switch?

Charlie Gautreaux vExpert http://www.thickclouds.com

crush
Enthusiast

Sorry, but I don't know what VST is. Yes, we have a Cisco switch running IOS.

New info: while searching the vmware.log files of the affected VMs for suspicious entries, I found some that caught my attention.

The following events occurred at the time the VM was hanging:

vmware.log

Jan 23 14:42:18.811: mks| SOCKET 2 recv error 5: Input/output error

Jan 23 14:42:18.811: mks| SOCKET 2 destroying VNC backend on socket error: 5

Jan 23 14:43:45.386: mks| SOCKET 3 recv error 5: Input/output error

Jan 23 14:43:45.386: mks| SOCKET 3 destroying VNC backend on socket error: 5

Does anyone know what this means?
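To check whether these entries line up with the times of the freezes, the log can be filtered for them. The following is a minimal Python sketch, assuming the log lines have exactly the shape quoted above (an optional timestamp, then `mks| SOCKET <n> ...`). The function name `find_mks_socket_errors` is made up for this example and is not part of any VMware tool.

```python
import re

# Matches lines shaped like the ones quoted in this thread, e.g.
#   Jan 23 14:42:18.811: mks| SOCKET 2 recv error 5: Input/output error
# The timestamp prefix is optional because some excerpts omit it.
MKS_ERROR = re.compile(
    r"^(?:(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): )?"
    r"mks\| SOCKET (?P<sock>\d+) "
    r"(?P<msg>(?:recv error|destroying VNC backend on socket error).*)$"
)


def find_mks_socket_errors(lines):
    """Return (timestamp, socket_number, message) for each matching line.

    `lines` is any iterable of strings, for example an open vmware.log file.
    The timestamp is None when the line carries no timestamp prefix.
    """
    hits = []
    for line in lines:
        match = MKS_ERROR.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), int(match.group("sock")), match.group("msg"))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "Jan 23 14:42:18.811: mks| SOCKET 2 recv error 5: Input/output error",
        "Jan 23 14:42:18.811: mks| SOCKET 2 destroying VNC backend on socket error: 5",
        "some unrelated log line",
    ]
    for ts, sock, msg in find_mks_socket_errors(sample):
        print(ts, sock, msg)
```

Against a real log, the same function can be fed an open file, e.g. `with open(path, errors="replace") as f: find_mks_socket_errors(f)`, where `path` points at the VM's vmware.log.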

mmenne
Contributor

I don't know what it means, but I've experienced the same issue from time to time. Normally, the guests freeze up and don't respond (or respond VERY slowly to the console). Networking is gone.

This has happened on Windows 2000 Server and Windows Server 2003. Normally, just resetting the VM takes care of it. It doesn't happen very often, maybe one VM every few weeks at most. We had one the other day where we had to remove and re-add the vNIC on the guest.

thickclouds
Enthusiast

Sorry: VST is Virtual Switch Tagging, i.e. running multiple VLANs across trunked connections.

We have this problem too and never get anywhere with support!

Charlie Gautreaux vExpert http://www.thickclouds.com

crush
Enthusiast

@cpfcg: Yes, we run multiple VLANs across the trunk.

I think we will open a call with VMware. If there are any new findings or answers, I will post them here.

Kind regards, Cedrick

LeeCarey
Contributor

We are experiencing the same issue. We are using network trunking; the console is OK but we are unable to ping the VM. We can occasionally get it back by reverting to a snapshot of the system, but reverting to exactly the same snapshot can cause the network to fail again. We have logged a call with VMware.

L

crush
Enthusiast

Hey LeeCarey

There's a small difference between our problems: I can get the VM back (fully functioning) just by opening the console and moving the mouse or pressing a key on the keyboard.

g c

Randy_B
Enthusiast

I noticed you said that load balancing is set to route based on port ID. Do you also have your trunks set up as an EtherChannel? If so, you'd need to set your load balancing to "Route based on IP hash"; otherwise, packets can get dropped and your machines will be unresponsive to pings.
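To illustrate the difference between the two policies, here is a simplified Python model. It is only a sketch: the modulo and XOR formulas are assumptions chosen for illustration and are not claimed to be what ESX actually computes, and the function names are made up. The point is that a port-ID policy pins a VM to one uplink regardless of destination, while an IP-hash policy spreads traffic per source/destination pair, which is the behaviour an EtherChannel on the switch side expects.

```python
import ipaddress


def uplink_by_port_id(port_id: int, n_uplinks: int) -> int:
    """Illustrative model: the virtual port number alone picks the uplink."""
    return port_id % n_uplinks


def uplink_by_ip_hash(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Illustrative model: the source/destination address pair picks the uplink."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


if __name__ == "__main__":
    n_uplinks = 3      # three active adapters, as in the configuration above
    vm_port = 7        # hypothetical virtual port ID
    vm_ip = "10.0.0.5"  # hypothetical addresses
    for dst in ("10.0.0.9", "10.0.0.7", "10.0.0.4"):
        print(
            dst,
            "port-id uplink:", uplink_by_port_id(vm_port, n_uplinks),
            "ip-hash uplink:", uplink_by_ip_hash(vm_ip, dst, n_uplinks),
        )
```

In this model the port-ID column never changes for a given VM, while the IP-hash column varies with the destination.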

Randy

crush
Enthusiast

Hi there

Since my last entry in this thread, the problem has not come back. Because I can't reproduce it, I have stopped searching for possible causes.

Thanks for all your answers.

g Cedrick

ramram77
Contributor

Hi,

We are also facing the same issue with ESX 3.0.1. The VMs freeze with a black screen and we need to reset them to bring them back.

This is happening in my production environment. I have escalated it to the VMware support team, but they are not able to tell me what exactly the problem is.

When this problem occurs, CPU usage peaks and the VM doesn't release the CPU.

Regards

Sreenath

timbonz
Contributor

I have the same problem, and vmware.log is recording socket errors at the time of the lockups. Hopefully someone out there will stumble on this thread and know the answer :)

fbd
Contributor

Hi Guys,

It's been a while since this thread was last updated, but I'm experiencing a similar issue: 2 of the 5 virtual machines on one host freeze and ultimately lose network connectivity. The only way to rectify the problem is to open a console to the VM and disable/re-enable the network adapter. The other 3 VMs on this host do not experience this issue.

This is the only error I can find in the vmware.log files for both VM instances. I have many other hosts in my environment running ESX 3.0.1; this particular host is running VMware ESX Server 3.0.0 build-27701. All the VMs share the same virtual switch, with no VST enabled and default network settings.

mks| SOCKET 16 recv error 5: Input/output error

mks| SOCKET 16 destroying VNC backend on socket error: 5

mks| SOCKET 17 recv error 5: Input/output error

mks| SOCKET 17 destroying VNC backend on socket error: 5

Is anyone still experiencing this problem?

thanks

wila
Immortal

Hi,

The VNC backend being destroyed indicates that there's no network connection for the KVM remote console to continue working.

Are you by any chance accessing via a NATted network?

If so, then check the link to the Knowledge base article below:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=749640

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva

AlexPT
Contributor

Hi,

I also get the same sort of errors. However, I get them on one particular guest (which was converted from MS Virtual Server 2005), and only when I try to commit a snapshot. Strange; I'm still researching, and I have a call in with VMware.

rubenr
Contributor

Hello All,

I am also seeing the same error and the symptoms you have all highlighted.

Would be interested to see if anyone has identified a root cause.

Thanks,

Ruben

AlexPT
Contributor

If it's been converted, make sure you are up to date with VMware patches (rollup 1 + post 1 patches), then make sure your virtual SCSI controller is set to LSI Logic and not BusLogic.

sreenathmv
Contributor

Hi,

We had the same issue on ESX 3.0.1 and VMware support gave us a fix for it; since then we have not faced this issue.

According to VMware, the problem has been traced to an issue in the LSI Logic scsiport driver that causes a race condition: the virtual SCSI adapter is disabled when entering standby while the code handling an interrupt is still being processed, causing the code to loop indefinitely. The code that makes the LSI Logic virtual SCSI adapter behave differently is not activated by default.

To activate it, add the following option to your .vmx files:

lsilogic.reflectIntrMask = TRUE

After making the above change, reload the VMs by issuing the following commands:

# vimsh

$ vmsvc/getallvms

$ vmsvc/reload <VmId from the list>

$ quit

# vmware-cmd <path to VM> start
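If many .vmx files need the option, the edit itself can be scripted. The following is a minimal Python sketch of an idempotent "set this key" helper that works on the text of a .vmx file. The helper name `set_vmx_option` is made up for this example, and writing the value in double quotes is an assumption based on how .vmx entries are usually written (the option above is given unquoted). Back up the file first and make the change with the VM powered off.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set exactly once.

    An existing entry for the key is replaced in place; duplicates are
    dropped; if the key is absent, the entry is appended at the end.
    """
    new_line = f'{key} = "{value}"'  # quoting is an assumption, see above
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Keys are compared case-insensitively as a precaution.
        if "=" in line and name.lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip any further duplicate entries
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    original = 'displayName = "vm1"\nscsi0.virtualDev = "lsilogic"\n'
    print(set_vmx_option(original, "lsilogic.reflectIntrMask", "TRUE"))
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to apply to files that already carry the option.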

Let me know if this resolves your issue.

Regards

Sreenath
