Bridged networking just quit!

Created on: Sep 10, 2008 12:05 PM by tbozzo - Last Modified:  Sep 10, 2008 12:07 PM by tbozzo

All:

I've found several threads on this with no real resolutions, so I'm posting this in the hopes someone has since found a resolution and can share it...or that VMWare has found the bug and fixed it.

I had been accessing some test VMs on my machine using VMWare Server regularly, then didn't for a couple of weeks. When I tried to do so yesterday, none of the guests can communicate with the host or other guests...and the host can't even ping any of the guests. Guest OSes include Win2003 and CentOS...and they all have the same problem.

Any ideas? I did try to switch to Host-only, as that had been mentioned in one of the threads. No go.

Please advise. I was testing this to provide a go/no-go on deploying VMWare at a client site, and now I feel I must strongly advise against its use in anything resembling a production environment...a shame, as until now, I had no major reservations with it. Ugh.

Thanks in advance...
Mark

fyi...this post has been moved to the VMware Server forum.

Eric Siebert
VMTN User Moderator

I have had the same issue quite often, also when the VMs weren't accessed for a longer time.

What type of network card is in the host?

You could work around that by installing a scheduled job inside the VM that pings another machine from time to time.
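
A minimal sketch of such a keep-alive job, in Python. The function names, the five-minute interval, and the idea of running it from the guest's own scheduler are assumptions for illustration, not something VMware ships or recommends. It only builds and runs the ordinary ping command, using -n for the packet count on Windows and -c elsewhere.

```python
import platform
import subprocess
import time


def build_ping_command(target, system=None):
    """Return the argument list for a single ping to `target`."""
    system = system or platform.system()
    # Windows ping takes the packet count with -n; Linux and most Unix pings use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", target]


def keep_alive(target, interval_seconds=300, rounds=None,
               run=subprocess.call, sleep=time.sleep):
    """Ping `target` every `interval_seconds` and return the number of failed pings.

    With rounds=None the loop never ends, which is what a background job wants.
    `run` and `sleep` are parameters only so the loop can be exercised without
    touching the network.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if run(build_ping_command(target)) != 0:
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return failures
```

Inside a guest this could be started as keep_alive("192.168.144.10"), where the address is only a placeholder for some machine that is always up on your network. Whether regular traffic really prevents the dropouts is a guess based on the observation above, not a confirmed fix.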

Regarding the ping issue, this is strange...
Are you sure your guest machines are in the same IP range as your host?

During the time you didn't use your VMs, did you change anything on your host (service pack, updates, anything)?

What is your host OS?
I had similar issues with network communication between host and guests, but that was with sending/receiving files; ping was working fine.
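
For anyone who wants to check the "same IP range" question mechanically, here is a small sketch using Python's standard ipaddress module. The function name is made up for illustration; the addresses in the usage note are taken from ipconfig output posted later in this thread.

```python
import ipaddress


def same_subnet(host_ip, guest_ip, netmask):
    """Return True when guest_ip lies inside the network given by host_ip and netmask."""
    # strict=False lets us pass a host address (not the network address) with the mask.
    network = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(guest_ip) in network
```

With the host at 192.168.144.64 and mask 255.255.255.0, a bridged guest at 192.168.144.71 is in the same range, while a NAT guest at 192.168.42.128 is not. That is expected for NAT, but it would be a warning sign for a guest that is supposed to be bridged.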

I have two different network adapters: one wireless, one wired. The wireless is an Intel; the wired one is a Broadcom, I believe. I'll check later and let you know.

One thing I forgot to mention is that the host OS is WinXP, SP2.

I'm nearly certain that Microsoft updates were applied in the weeks between when the VMs worked and when I noticed that they had stopped. But no IP changes occurred during that time, so all ranges were the same as they had been when it worked - I double-checked it!

This sounds like a VMWare bug, and a very serious one. Why would VMWare stop allowing IP traffic between the host and all guests? BTW, I also double-checked a Solaris-x86 VM I have, and it doesn't communicate with anything anymore, either.

Any workarounds? Please? Thanks in advance...

Mark

Did you test pinging one VM from another, and pinging a VM from another machine on your network?

Also, can you test pinging any machine on your network from your host?

The wireless is an Intel; the wired one is a Broadcom,
I believe. I'll check later and let you know.

Please post the exact model, and also the operating system and driver information.
Please also post the type of network switch the system is connected to.

There are guesses that this problem is not related to VMware alone.

So the only way to dig deeper into this and systematically track down the error is to collect statistical information and see whether the problem is specific to certain setups.

What about starting to compile such a list within this thread?

OK, here are the specifics on my network cards:

1. Wired card is a Broadcom NetXtreme Gigabit Ethernet card, running driver version 8.27.1.0.
2. Wireless card is an Intel PRO/Wireless 2200BG with driver version 9.0.2.31.

I've tried pinging across VMs, from host to VM, from VM to host...everything I can think of. I can ping and ssh into other computers on the network from the host...but when it comes to communication to/from a VM, nothing works.

Again, this exact configuration worked previously. I had suspected an OS patch broke it, but several others have reported the same problems, and some of those people indicate no OS patch was applied in the interim.

Someone suggested that this happens a lot when VMs haven't been accessed in quite a while. But that is also inconsistent, as one post said it had worked one day in the morning and stopped working that same afternoon.

Again, any thoughts or suggestions would be greatly appreciated.

Maybe try this ?

1. Click Start, click Run, type regedit , and then click OK.
2. Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
3. On the Edit menu, point to New, click DWORD Value, and then type EnableRSS
4. Double-click EnableRSS, type 0 , and then click OK.
5. Exit Registry Editor.

If you are still experiencing problems (like slow file copying), you should also disable Offloading support:
1. Click Start, click Run, type regedit, and then click OK.
2. Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
3. In the right pane, make sure that the DisableTaskOffload registry entry exists. If this entry does not exist, follow these steps to add the entry:
a. On the Edit menu, point to New, and then click DWORD Value, and then type DisableTaskOffload .
4. Double-Click DisableTaskOffload, type 1, and then click OK.
5. Exit Registry Editor.

On your host ...
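
If you would rather not click through regedit, the two values above can also go into a .reg file that you import on the host. The sketch below only generates the file text. The helper name is hypothetical; the key path and value names are the ones from the steps above, and the header line is the standard one for .reg files on Windows 2000/XP and later.

```python
# Registry key named in the steps above.
TCPIP_PARAMETERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(key, dword_values):
    """Return .reg file text that sets the given DWORD values under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dword_values.items():
        # DWORD values are written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


reg_text = build_reg_file(
    TCPIP_PARAMETERS_KEY,
    {"EnableRSS": 0, "DisableTaskOffload": 1},
)
```

Save reg_text to a file such as disable_offload.reg (the file name is arbitrary), double-click it on the host, and reboot. As with the manual steps, back up the registry first, and treat this as something to try rather than a known fix.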

I'm having the same problem and it appears to have happened in the same timeframe that other people are having similar problems, about 7-10 days ago.

I first thought it was my host OS (Win XP) that had problems and so I did a reinstall (my PC needed it anyway), but this didn't help. I've tried an XP guest and a Knoppix guest and they both exhibit the same behaviour.

The problem is that they are able to get an IP address from the DHCP server, but nothing after that works. From the guest I can ping the IP of the host and from the host I can ping the guest. From the guest I can't get to anything else on the network. My host PC works fine (network wise).

I've read all of the threads and tried changing the EnableRSS & DisableTaskOffload registry keys with no success (yes, I rebooted).

Something that is interesting is that up until I reloaded my PC I was using VMware Server v1.0.0 (don't know the build number), so it's not just the current version of VMWare that has the problem. This makes me believe that it could well be a Windows update that has caused this problem as nothing else changed on my PC when the problem occurred (prior to the reload).

This makes me believe that it could well be a Windows update that has
caused this problem

I don't think so, since this problem also exists on Linux.

The reason I'm blaming Windows is that I didn't change anything on my PC and it just stopped working. I was using an old version of VMware, so it wasn't a VMware update that caused it (i.e. I hadn't updated VMware). I'm happy to be proven wrong though.

I'm not really worried about what the problem is, or about blaming MS, VMware, Linux or someone else; I just want to find a resolution for it!

Agreed. I've taken an alternate route to "solving" this on some of the VMs to which I need to gain immediate access by switching to NAT on a single virtual adapter, but this is not a real solution. VMWare, are you listening??!?!

Good idea, I hadn't thought to use NAT. This allows me to continue to use my VM, as it is mostly used for outbound network traffic, which NAT is fine for. If I were using it to host a server (inbound connections) I wouldn't be nearly as happy to use NAT.

Hi Mark,

There are a million+ users of VMware Server who don't have an issue with networking when it is properly configured.
This includes some of the Fortune 100 companies.

Post the output from "ipconfig /all" from your Windows host.

Post the output from "ipconfig /all" from your Windows guest or "ifconfig -a" from your Linux guest.

Post the .vmx file from the virtual machine.

If using Bridged networking for your VMs, make sure that you are bridged to the correct physical network adapter by turning off automatic bridging (Automatic Bridging tab) and selecting the physical network adapter in the dropdown menu for vmnet0 (Host Virtual Network Mapping tab).

Also check the properties of the physical network adapter and make sure that the VMware Bridge Protocol is installed and that there is a check mark next to it.

Kevin:

Perhaps there are 1 million+ users...but I have a few questions about this:

1. Are they using the free versions?
2. Do they get a resolution if/when this happens to them that we aren't getting?
3. This is not an isolated case. Just search this forum (and Google) to see many other examples of the same faulty behavior. Where does the problem lie, and why isn't it being addressed?

I note that most cases of this occur under a Windoze host, but there are examples of Linux hosts having the same problem. The common thread is that a VMWare product is loaded, works with several guests (VMs), then doesn't. No warnings, in many cases no host OS changes. Why?

I'll upload the information you mentioned when I'm at my desk tomorrow. Thank you (Kevin and all!) for your help in trying to resolve this.

Mark

Hi Mark,

Kevin:

Perhaps there are 1 million+ users...but I have a few
questions about this:

1. Are they using the free versions?


There is only one VMware Server product and it's free to everyone.

2. Do they get a resolution if/when this happens to
them that we aren't getting?

Companies that are using VMware Server in a production environment typically purchase a support contract so that they can get immediate assistance with configuration issues, etc.
The community forum is not VMware Technical Support. It is one of the self-help options (like the documentation and knowledge base) available to users who have decided not to purchase a support contract and to deal with their issues on their own, without the assistance of VMware Technical Support.

3. This is not an isolated case. Just search this
forum (and Google) to see many other examples of the
same faulty behavior. Where does the problem lie, and
why isn't it being addressed?

I can google any hardware/software product and find users that are having the same issue because they did not configure something correctly.


I note that most cases of this occur under a Windoze
host, but there are examples of Linux hosts having
the same problem. The common thread is that a VMWare
product is loaded, works with several guests (VMs),
then doesn't. No warnings, in many cases no host OS
changes. Why?

Again this means nothing, many users make the same common mistakes.

I'll upload the information you mentioned when I'm at
my desk tomorrow. Thank you (Kevin and all!) for your
help in trying to resolve this.

Yes, please do post the requested information so we can troubleshoot your issue.

Mark

Hi Mark,

Kevin:

Perhaps there are 1 million+ users...but I have a few questions about this:

1. Are they using the free versions?

There is only one VMware Server product and it's free to everyone.


My point - which you apparently missed - was that perhaps they are NOT using the free version, i.e. VMWare Server. You mentioned that 1 million+ users were using this product in a production environment with no problems; I responded with this question: are they using a free version? To get right to the point, are they using a product with a different code base...one that might have corrected an internal VMWare bug that exists in Server?

Either way, if this is a VMWare issue, can I trust their production code line if they knowingly leave bugs in their free version? This is just something that weighs on my mind...


2. Do they get a resolution if/when this happens to them that we aren't getting?

Companies that are using VMware Server in a production environment, typically purchase a support contract so that they can get immediate assistance with configuration issues..etc
The community forum is not VMware Technical Support, it is one of the self help options available (Like the documentation, knowledge base..etc) to users that have decided not to purchase a support contract to deal with their issues on their own without the assistance of VMware Technical Support.


Yes, I am well aware of that. I guess I didn't speak directly enough to the question earlier. My point here, as in my previous response above, is that IF VMWare realized this was a bug and they provide a fix for the paid versions and IF they withhold that fix from the free versions, can I really trust them for a production environment?

3. This is not an isolated case. Just search this forum (and Google) to see many other examples of the same faulty behavior. Where does the problem lie, and why isn't it being addressed?

I can google any hardware/software product and find users that are having the same issue because they did not configure something correctly.


So, if several people configure VMWare Server properly "enough" to work flawlessly for weeks, then it QUITS working for no apparent reason - with no changes to configuration or OS - this is due to our ineptitude configuring VMWare? I simply don't follow that thought process.


I note that most cases of this occur under a Windoze host, but there are examples of Linux hosts having the same problem. The common thread is that a VMWare product is loaded, works with several guests (VMs), then doesn't. No warnings, in many cases no host OS changes. Why?

Again this means nothing, many users make the same common mistakes.


Please see my previous response.

I'll upload the information you mentioned when I'm at my desk tomorrow. Thank you (Kevin and all!) for your help in trying to resolve this.

Yes, please do post the request information so we can troubleshoot your issue.


Mark

It doesn't appear that you are interested in trying to troubleshoot this, as you aren't listening to anything that I or many other contributors have posted. Never mind, but thank you ever so much for your responses.

Hi Mark,

I don't understand why you believe there are two versions of VMware Server.
As I said before, there is only one version of VMware Server and it is free.
There is no paid version alongside a free version.

The only thing you need to pay for is a service contract if you decide to use VMware Technical Support services. This is no different from any other software company.

If there is a bug in our software, we always want to know about it.
That's why there is VMware Server 1.0.3: it was released to address bugs found in the previous versions (1.0, 1.0.1, 1.0.2).

My point was that there are many users running VMware Server without any networking issues.

Looking at your postings, there was no information provided that anyone could really use to troubleshoot your issue.
So please post the requested information so I can help you troubleshoot your issue, or feel free to purchase a support contract to talk with VMware Technical Support directly.

If someone is willing to help, then I'm happy to accept. The output from the commands requested is below. This is after I have changed to NAT on the VM so that it works. If you'd like the output from when it's not working, let me know and I'll re-jig the VM and grab the output again. Both host and guest are Win XP SP2. As I mentioned in a previous post I did a reinstall of the host OS last week, but the VM is the same one (I copied it off and back again). Having said that, I created a new VM and ran knoppix in it and it shows the same problems.

I'm running this on my notebook, which is why there are two network connections (wireless and GB eth). I work in IT support, so feel free to ask any questions you like ;)

(on a side topic, I think the "non-free" version of VM Server Mark is referring to is ESX)

===== HOST ipconfig =====
Ethernet adapter VMware Network Adapter VMnet8:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : VMware Virtual Ethernet Adapter for VMnet8
Physical Address. . . . . . . . . : 00-50-56-C0-00-08
Dhcp Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.42.1
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :

Ethernet adapter VMware Network Adapter VMnet1:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : VMware Virtual Ethernet Adapter for VMnet1
Physical Address. . . . . . . . . : 00-50-56-C0-00-01
Dhcp Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.92.1
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :

Ethernet adapter Wireless Network Connection:

Description . . . . . . . . . . . : Intel(R) PRO/Wireless 2915ABG Network Connection
Physical Address. . . . . . . . . : 00-12-F0-5B-63-99
Dhcp Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IP Address. . . . . . . . . . . . : 192.168.144.64
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.144.10
DHCP Server . . . . . . . . . . . : 192.168.144.17
DNS Servers . . . . . . . . . . . : 192.168.144.17
192.168.144.36
192.168.144.33
Lease Obtained. . . . . . . . . . : Thursday, 5 July 2007 11:00:48 AM
Lease Expires . . . . . . . . . . : Thursday, 12 July 2007 11:00:48 AM

Ethernet adapter Local Area Connection 2:

Media State . . . . . . . . . . . : Media disconnected
Description . . . . . . . . . . . : Broadcom NetXtreme Gigabit Ethernet
Physical Address. . . . . . . . . : 00-0A-E4-C0-B6-FC

Ethernet adapter Local Area Connection:

Media State . . . . . . . . . . . : Media disconnected
Description . . . . . . . . . . . : Bluetooth LAN Access Server Driver
Physical Address. . . . . . . . . : 00-0E-9B-DE-84-29

=====================

===== GUEST ipconfig =====
Ethernet adapter Local Area Connection 3:

Media State . . . . . . . . . . . : Media disconnected
Description . . . . . . . . . . . : VMware Accelerated AMD PCNet Adapter

Physical Address. . . . . . . . . : 00-0C-29-FD-F7-00

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . : localdomain
Description . . . . . . . . . . . : VMware Accelerated AMD PCNet Adapter

Physical Address. . . . . . . . . : 00-0C-29-FD-F7-F6
Dhcp Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IP Address. . . . . . . . . . . . : 192.168.42.128
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.42.2
DHCP Server . . . . . . . . . . . : 192.168.42.254
DNS Servers . . . . . . . . . . . : 192.168.42.2
Primary WINS Server . . . . . . . : 192.168.42.2
Lease Obtained. . . . . . . . . . : Thursday, 5 July 2007 12:43:11 PM
Lease Expires . . . . . . . . . . : Thursday, 5 July 2007 1:13:11 PM
=====================

VMX config file

config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
memsize = "512"
ide0:0.present = "TRUE"
ide0:0.fileName = "Windows XP Professional.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
displayName = "Windows XP Professional"
guestOS = "winxppro"
priority.grabbed = "normal"
priority.ungrabbed = "normal"

ide1:0.autodetect = "TRUE"

ide0:0.redo = ""
ide1:0.startConnected = "TRUE"
ethernet0.addressType = "generated"
uuid.location = "56 4d 45 2d f1 5e 09 47-6c d8 6f 3c 62 4a bd 48"
uuid.bios = "56 4d c4 74 04 af fc e8-ad 71 a8 ea d4 fd f7 f6"
ethernet0.generatedAddress = "00:0c:29:fd:f7:f6"
ethernet0.generatedAddressOffset = "0"

floppy0.startConnected = "FALSE"
floppy0.autodetect = "TRUE"
Ethernet0.connectionType = "nat"
Ethernet0.vnet = "VMnet4"
Ethernet1.present = "TRUE"
Ethernet1.startConnected = "FALSE"
Ethernet1.connectionType = "custom"
Ethernet1.vnet = "VMnet2"
Ethernet1.addressType = "generated"
Ethernet1.generatedAddress = "00:0c:29:fd:f7:00"

ethernet1.generatedAddressOffset = "10"
tools.syncTime = "FALSE"


Fascinating. I also have Intel PRO/Wireless and Broadcom NetXtreme adapters. This seems beyond coincidental that we're both having the same issue with the same network adapters.

BTW, the VMs that were previously communicating - until they simply stopped - included a Solaris x86, multiple CentOS Linux installs, and a couple of Win2003 servers. Considering the commonalities presented, along with the fact that they previously worked with bridged networking, it seems highly improbable that we've just "configured something incorrectly."

As an interesting tie-in, search the VMWare forums (I don't remember which subforum) and you'll see that someone reported identical symptoms with their VMs communicating fine one morning, then failing that afternoon...with no configuration changes or platform updates. As a former software developer, this sounds very much like a bug. Of course, it's impossible to tell from where I sit now...but you have to marvel at the series of coincidences.

Mark

Hi tonyqa,

Not sure why you have NAT configured in your VM with VMnet4; typically NAT uses VMnet8, not VMnet4:

Ethernet0.connectionType = "nat"
Ethernet0.vnet = "VMnet4"

If you would like, please post your configuration with Bridged.

Also how is your host connected to the local network / internet? (Cable modem, DSL, Router, Switch..etc)

Is this a home or business environment?
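
The kind of mismatch pointed out above (a "nat" adapter that still carries a vnet line) can be spotted with a small script. This is only a sketch with made-up function names. It assumes that .vmx keys can be compared case-insensitively (the posted file mixes "Ethernet0" and "ethernet0" for the same adapter) and that a missing connectionType means bridged; neither assumption is confirmed anywhere in this thread.

```python
def parse_vmx(text):
    """Parse `key = "value"` lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def network_warnings(settings):
    """Return human-readable warnings about suspicious ethernetN entries."""
    warnings = []
    index = 0
    # Walk ethernet0, ethernet1, ... until an adapter is not marked present.
    while settings.get(f"ethernet{index}.present", "").upper() == "TRUE":
        prefix = f"ethernet{index}"
        # Assumption: an adapter without connectionType is treated as bridged.
        conn = settings.get(f"{prefix}.connectiontype", "bridged").lower()
        vnet = settings.get(f"{prefix}.vnet")
        if conn == "custom" and not vnet:
            warnings.append(f"{prefix}: custom connection has no vnet entry")
        elif conn != "custom" and vnet:
            warnings.append(
                f"{prefix}: connectionType is '{conn}' but a vnet entry '{vnet}' is also present"
            )
        index += 1
    return warnings
```

Run against the posted NAT configuration, this flags ethernet0 (connectionType "nat" together with vnet "VMnet4") and leaves ethernet1 (custom on VMnet2) alone. A warning only means "look at this line"; it does not prove that the line breaks anything.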

The line:

Ethernet0.vnet = "VMnet4"

appears to be an obsolete line from when I tried to see if manually bridging the wireless NIC to adapter vmnet4 and then making that ethernet0 in the VM would work. I don't know why it's still in the config file, I'll remove it manually.

OK, below is the output from ipconfig on the VM and the config file. I have manually configured my wireless NIC to be bridged to VMnet4 and my wired NIC to be bridged to VMnet2. These are then set to eth0 & eth1 on the VM.

This is a business environment. The host is connected via a Cisco wireless AP and Cisco switches to a Cisco PIX firewall, which is connected to two different ISPs using load balancing with BGP. I manage the network infrastructure, which is why having a network problem in a VM is so annoying ;)

It is on an IBM R52 notebook (type 1846-4LM).

======== GUEST ipconfig ========

Ethernet adapter Local Area Connection 3:

Description . . . . . . . . . . . : VMware Accelerated AMD PCNet Adapter

Physical Address. . . . . . . . . : 00-0C-29-FD-F7-00
Dhcp Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IP Address. . . . . . . . . . . . : 192.168.144.71
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.144.10
DHCP Server . . . . . . . . . . . : 192.168.144.17
DNS Servers . . . . . . . . . . . : 192.168.144.17
192.168.144.36
192.168.144.33
Lease Obtained. . . . . . . . . . : Thursday, 5 July 2007 2:17:11 PM
Lease Expires . . . . . . . . . . : Thursday, 12 July 2007 2:17:11 PM

Ethernet adapter Local Area Connection:

Description . . . . . . . . . . . : VMware Accelerated AMD PCNet Adapter

Physical Address. . . . . . . . . : 00-0C-29-FD-F7-F6
Dhcp Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IP Address. . . . . . . . . . . . : 192.168.144.78
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.144.10
DHCP Server . . . . . . . . . . . : 192.168.144.17
DNS Servers . . . . . . . . . . . : 192.168.144.17
192.168.144.36
192.168.144.33
Lease Obtained. . . . . . . . . . : Thursday, 5 July 2007 2:17:12 PM
Lease Expires . . . . . . . . . . : Thursday, 12 July 2007 2:17:12 PM
===========================

config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
memsize = "512"
ide0:0.present = "TRUE"
ide0:0.fileName = "Windows XP Professional.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
displayName = "Windows XP Professional"
guestOS = "winxppro"
priority.grabbed = "normal"
priority.ungrabbed = "normal"

ide1:0.autodetect = "TRUE"

ide0:0.redo = ""
ide1:0.startConnected = "TRUE"
ethernet0.addressType = "generated"
uuid.location = "56 4d 45 2d f1 5e 09 47-6c d8 6f 3c 62 4a bd 48"
uuid.bios = "56 4d c4 74 04 af fc e8-ad 71 a8 ea d4 fd f7 f6"
ethernet0.generatedAddress = "00:0c:29:fd:f7:f6"
ethernet0.generatedAddressOffset = "0"

floppy0.startConnected = "FALSE"
floppy0.autodetect = "TRUE"
Ethernet0.connectionType = "custom"
Ethernet1.present = "TRUE"
Ethernet1.startConnected = "TRUE"
Ethernet1.connectionType = "custom"
Ethernet1.vnet = "VMnet2"
Ethernet1.addressType = "generated"
Ethernet1.generatedAddress = "00:0c:29:fd:f7:00"

ethernet1.generatedAddressOffset = "10"
tools.syncTime = "FALSE"

Ethernet0.vnet = "VMnet4"


After changing back to NAT, it still has that line there, so the config program is being lazy and not removing lines that it doesn't need.

I removed it before I went back to bridging and now that I've changed back to NAT it's back again.

(the line is)
Ethernet0.vnet = "VMnet4"

Message was edited by:
tonyqa

To add some more information.
I am having the same problem with some VM's on Fedora Core 6. All are bridged and cannot be reached by (so far) ssh, httpd and RDP from the hosting server but are reachable from other VM's and other computers. These were working for a while. I believe the problem may have started after upgrading the server to 1.0.2. I cannot be positive because I don't access the VM's very often from the hosting server. I just upgraded to 1.0.3 but the problem is still there.

One of the VM's is a mailserver which is happily still accepting email from the outside world though it is having a problem (timeouts) sending outgoing messages. This may not be related though.

Packet captures show TCP checksum errors. A capture file is available if required.

Message was edited by:
scunningham
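
For reference, the checksum that a capture tool verifies is the 16-bit ones' complement Internet checksum described in RFC 1071. The sketch below computes it over raw bytes; a real TCP checksum also covers a pseudo-header built from the IP addresses, which is left out here, and the function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of `data` (RFC 1071 style)."""
    if len(data) % 2:
        # Pad odd-length input with a zero byte.
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        # Add each big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the example bytes 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071 this gives 0x220d. Keep in mind that when checksum offloading is enabled, a capture taken on the sending machine commonly shows "bad" checksums on outgoing packets because the NIC fills them in afterwards. So checksum errors in a capture may point at offloading (see the DisableTaskOffload suggestion earlier in the thread) rather than at corrupted traffic; I can't tell which applies here.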

So, let's try to make a list; maybe we can "see" something from that:

-SrMarcos-
VMware:
Server ???
Hardware/Driver:
Broadcom NetXtreme Gigabit Ethernet / 8.27.1.0. <--??
Intel PRO/Wireless 2200BG / 9.0.2.31 <--which is used for bridging?
Host:
WinXP SP2
Guests:
Solaris x86, multiple CentOS Linux installs, and a couple of Win2003 servers.
Symptoms:
Communication suddenly stopped

-tonyqa-
VMware:
Server ???
Hardware/Driver:
Intel(R) PRO/Wireless 2915ABG <---??
Broadcom NetXtreme Gigabit Ethernet <--which is used for bridging?
Host:
Win XP
Guests:
XP, Knoppix
Symptoms:
Guests get IP via dhcp, but no communication possible

-scunningham-
VMWare:
Server 1.0.3
Hardware/Driver:

please tell <
Host:
Fedora Core 6
Guests:
please tell <
Symptoms:
One of the VM's is a mailserver which is happily still accepting
email from the outside world though it is having a problem (timeouts)
sending outgoing messages. This may not be related though.
Packet captures show TCP checksum errors.

-devzero-
VMWare:
Throughout all hosted products; I have seen this on Workstation for a long time, but on GSX and Server too.
Hardware/Driver:
can't tell for now <
Host:
Mostly SuSE Linux
Guests:
Windows, Linux
Symptoms:
VMs lose their network connection and aren't pingable anymore, especially if they haven't been used for some time. It just "looks" like they lose their connection due to lack of usage or absence of communication.

In the information you posted, it looks like both virtual network adapters were able to get an IP address from your DHCP server in Bridged mode.

So what can't the VM's do?


Correct, the VM is getting an IP address via DHCP on both virtual NIC's. This is what is confusing me so much.

So what can't the VM's do?

Anything after that. They can't ping, can't HTTP, can't RDP, can't pass any traffic at all.

An interesting development. I went back through to reconfigure the networking (once again) and saw that in one of my previous futile reconfiguration attempts, I had set VMNet0 to the loopback adapter. I changed it back to the physical adapter and fired up a Solaris VM...and it worked. I then fired up a CentOS VM, reconfigured the network adapters within...and it worked. I haven't tested the others, but I suspect all will work as it did before.

I really don't know what happened to make this stop working, and I'm not sure what changed that allowed it to work again. I do understand why it didn't work in the middle, though, with VMNet0 set to the loopback adapter. Sigh.

I'll continue to watch this thread and contribute where I can, but until it breaks again, I don't know how much help I'll be. In the interim, here's another question: are all of these failures on laptops with identical network cards? Devzero's list looks like a good thing to complete...

Mark

Hi Mark,

Glad to hear that you were able to get your networking working again. :-)

Eureka !!

I just found what my problem is. It WAS an update, but not a Windows update nor a VMware update. It was an AV update. I've just UN-installed Trend Micro (AV app) and bridged networking is working again.

For those others having problems, give this a go if you're using Trend (or even some other AV app)

It's definitely Trend, as during the uninstall it dropped all of my network connections (open sessions dropped) and then they came back again after the uninstall program finished. I have no idea what Trend is doing, except that it must be running some sort of firewalling service. Grrrr ! BAD Trend, if I wanted to install a firewall I would do that !


I hadn't considered an anti-virus application update as a potential culprit, but that would make perfect sense. And with the frequency of updates (nearly daily) that those get, that could explain the sudden stopping (or even re-starting) of network connections. Sigh. Nothing like a constantly changing configuration to make troubleshooting more difficult...

That said, I'm not using Trend on my machine, but rather Symantec. The most obvious commonality still seems to be the network adapters...

-scunningham-
VMWare:
Server 1.0.3
Hardware/Driver:
00:10.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Driver: 8139too

Host:
Fedora Core 6
Guests:
Mixture of Linux distros and Windows 2000 Server

Symptoms:
One of the VMs is a mailserver which is happily still accepting
email from the outside world though it is having a problem (timeouts)
sending outgoing messages. This may not be related though.
Packet captures show TCP checksum errors.

The timeout problem for outbound mail is not related. My ISP is now blocking these.


I too am having all of my VMs disconnect after 3-4 hours.

VMserver 1.0.3
HP DL 140 G3 Server
(2) Duo core Xeon 2.2
(2) 146 gig scsi hard drives
(2) broadcom 1gig nic's (No Wireless)
4 gigs ram

Host OS Windows 2000 Terminal Server

VM#1 Windows XP Pro SP2
256 meg ram
8 gig partition

VM#2 Windows 2003 Standard
1 gig ram
(2) 20 gig partitions

VM#3 CentOS 4.5
1 gig ram
8 gig partition

All of the VMs can ping one another as well as ping to & from the host but no other machine on the network can ping a VM. The machines on the network can ping the host.

This was a fresh install of the host and all the VMs, and this problem was there from the start.

The only way I could get these to stay online for more than 3 or 4 hours was to disable one of the network cards in the host OS. I would like to use my second NIC for load sharing. I am also curious whether it will still crash after a few days now. Does anyone have any ideas how to fix this?


hello!

(2) broadcom 1gig nic's (No Wireless)

could you add the exact nic model and/or the driver version ?

furthermore, i think about extending the list a little bit and adding the switch model where the server is attached to.
so if you know what switch is used, you can add that, too

regards
roland

Nr. 2
Name: scunningham
VMWare: Server 1.0.3
Hardware/Driver: 00:10.0 Ethernet controller: Realtek
Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev
10), 8139too
Host: Fedora Core 6
Guests: Mixture or Linux Distros and Windows 2000
Server
Symptoms: One of the VM's is a mailserver which is
happily still accepting email from the outside world
though it is having a problem (timeouts) sending
outgoing messages. This may not be related though.
Packet captures show TCP checksum errors.
Comment: The timeout problem for outbound mail is
not related. My ISP is now blocking these.

some things to try on this one would be

-disable any of the offload features on the host physical NIC

-disable any of the offload features on the guest virtual NICs

-make sure you have current driver code for the realtek. I had really bad experiences with these interface types under linux (with both distro built drivers and compiles from vendor source) even without any virtualization running. i ended up installing e1000 cards after getting tired of the flakiness...that was last fall, perhaps since that time the linux driver has improved????
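
One reason the offload settings are worth a look: when checksum offload is enabled, a capture taken on the sending host can show "bad" TCP checksums simply because the NIC fills the checksum in after the capture point. That is a general observation, not a diagnosis of this particular case. To check a captured value by hand, here is a minimal Python sketch of the 16-bit one's-complement Internet checksum (RFC 1071). It only covers the arithmetic; building the TCP pseudo-header plus segment that the TCP checksum actually covers is left out.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071) of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF

if __name__ == "__main__":
    # Worked example bytes from RFC 1071.
    print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))
```

If the function is run over data that already contains a correct checksum field, the result is 0, which is a quick way to tell a genuinely corrupted packet from an offload artifact.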


Networking problems, to resolve:

Nr. 1
Name: SrMarcos
VMware: Server ???
Hardware/Driver: Broadcom NetXtreme Gigabit Ethernet
/ 8.27.1.0, Intel PRO/Wireless 2200BG / 9.0.2.31
<--which is used for bridging?
Host: WinXP SP2
Guests: Solaris x86, multiple CentOS Linux installs,
and a couple of Win2003 servers.
Symptoms: Communication suddenly stopped
Comment: for now, it started working again
after re-configuring the network - but no clue, why
it happened.


there are many issues related to the Broadcom gig nics and windows 2003...esp with the recent OS updates.

in short, a combination of a few factors are contributing to the problem....the microsoft "scalable networking" feature additions, the lack of full driver support from the NIC driver vendors, the lack of correct feature support in some earlier generations of the hardware

-update all your NIC drivers (both at the HOST, and at the GUEST) to current versions from your hardware vendors (the current versions may still not correctly support the NDIS 6 features that are ultimately supposed to resolve these issues)

-disable any of the problematic features (TCP offloading, Receive Side Scaling, DoS) and retest

some background info/reference here:

http://msexchangeteam.com/archive/2007/07/18/446400.aspx

http://support.microsoft.com/kb/936594

http://www.vmware.com/community/thread.jspa?threadID=78606&start=30&tstart=0

http://support.microsoft.com/kb/898468

Nr. 3
Name: devzero
VMWare: throughout all hosted products, have seen this
on workstation for a long time, but on gsx, server
too.
Hardware/Driver: can`t tell for now, but i think it
were different brands of nics involved
Host: Mostly SuSE Linux
Guests: Windows, Linux
Symptoms: VMs lose network connection and aren`t
pingable anymore. Especially if they haven`t been
used for some time. It just "looks" like they lose
their connection due to lack of usage or absence of
communication

this sounds like either a power management related problem (most physical nics have power saving enabled by default), or potentially a problem with the arp cache on either the host or the switch...alternatively a NAT configuration would show symptoms like this if the translation tables are getting exhausted or timing out too quickly

you may want to verify that if you force continuous activity to the systems that the connection stays steady and that its only during periods of a lack of use where the issue comes up....you may want to also look at the switch side logs to see if there are any errors getting reported on that end
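
To test the ARP cache theory on a Linux host, the kernel's table can be read from /proc/net/arp at the moment a guest drops off. Here is a minimal Python sketch that parses that file and checks whether a guest's IP still maps to the expected MAC. The IP addresses in the sample are placeholders, the MAC is the generated address from the .vmx posted earlier, and the function names are made up for illustration.

```python
ATF_COM = 0x02  # flag bit: entry is complete (the MAC address was resolved)

def parse_proc_net_arp(text):
    """Parse the contents of /proc/net/arp into {ip: {"mac", "complete", "device"}}."""
    entries = {}
    for line in text.strip().splitlines()[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries[ip] = {
            "mac": mac.lower(),
            "complete": bool(int(flags, 16) & ATF_COM),
            "device": device,
        }
    return entries

def check_guest(entries, ip, expected_mac):
    """Classify the ARP state for one guest: ok, missing, incomplete or mac-mismatch."""
    entry = entries.get(ip)
    if entry is None:
        return "missing"
    if not entry["complete"]:
        return "incomplete"
    if entry["mac"] != expected_mac.lower():
        return "mac-mismatch"
    return "ok"

# Sample in the /proc/net/arp layout; the addresses are placeholders.
SAMPLE = """
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.50     0x1         0x2         00:0c:29:fd:f7:f6     *        eth0
192.168.1.51     0x1         0x0         00:00:00:00:00:00     *        eth0
"""

if __name__ == "__main__":
    table = parse_proc_net_arp(SAMPLE)
    print(check_guest(table, "192.168.1.50", "00:0C:29:FD:F7:F6"))
    print(check_guest(table, "192.168.1.51", "00:0c:29:fd:f7:00"))
```

On a real host, pass open("/proc/net/arp").read() instead of SAMPLE. An "incomplete" or "mac-mismatch" result while the guest is unreachable would point toward ARP; an "ok" result would suggest looking elsewhere (power management, switch, NAT tables).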

thanks for your comments!

you may want to verify that if you force continuous activity to the systems that
the connection stays steady and that its only during periods of a lack of use
where the issue comes up....you may want to also look at the switch side logs
to see if there are any errors getting reported on that end
yes, if i configure some cron based periodical ping all is fine. i tested this on some systems and it seems it`s a "workaround".
for now i don`t have access to the switches and don`t know if they offer any logging feature. will ask our network-admin the next time i`m back in business...
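
The periodic ping workaround can also be written as a small script instead of a cron entry. Here is a minimal Python sketch, assuming the system ping binary is on the PATH and takes -c (Linux) or -n (Windows) for the packet count. The host addresses and the interval are placeholders. It logs every attempt with a timestamp, which also gives a record of when a guest actually dropped off.

```python
import platform
import subprocess
import time

def ping_command(host, system=None):
    """Build a single-packet ping command line for the given platform."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]

def ping_once(host, run=subprocess.run):
    """Send one ping; True if the ping command exits with status 0."""
    result = run(ping_command(host), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def keep_alive(hosts, interval_seconds=60, rounds=None, ping=ping_once, sleep=time.sleep):
    """Ping every host once per round; rounds=None means loop forever."""
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            ok = ping(host)
            print(time.strftime("%Y-%m-%d %H:%M:%S"), host, "ok" if ok else "NO REPLY")
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)

# Example call (placeholder addresses): keep_alive(["192.168.1.50", "192.168.1.1"], 300)
```

Note that later posts in this thread report mixed results with this approach, so treat it as a workaround to try, not a fix.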

The NICs are

(2) Embedded Broadcom NetXtreme 5721 pci-e gbit
Driver Version 9.52.0.0 (dated 5/15/06, This is the newest version for Windows 2000)

The switch it is attached to is a Cisco 9550 48 port.

I did see that power saving was enabled on the host OS NICs, so I disabled it. At this time I do not know if this fixed anything because I just did it. I will post again tomorrow and let everyone know if that fixed it.
Thanks for everyone's help.
Dallas


Just read this thread with great interest. I've had similar issues for the past 10 months. We converted a physical server into a VM. (Mainly for performance gain)

Since then we've had the network disconnections. It seems to have come in two forms. One where the machine can still be pinged and those where it can't.

First off, we disabled power management on the physical NIC (a Broadcom BCM5708C, by the way!) but this made no difference. We set up a continual ping; this made no difference either. I have now added another card, a 3com Gigabit (SX/TX), and the problem is still evident.

We currently run about 12 VM machines, this is our busiest. The disconnection does seem to happen at the servers quieter moments although we have witnessed it during a busy period.

The only difference on this server to our other vm's is the fact that it was created using the tool to convert from a physical server ( i forget the name)

The Nic connection is bridged by the way, when i have a few mins i'll gather some info up and post it here.

Vm Server 1.0.1 Build 29996
HW - Dell PE2900, 4 GB Ram, Dual Xeons 2ghz,
Host - Windows Server 2003 R2 SP1.
Guest - Windows Server 2003 R2 SP1. (Allocated 2.5gb Ram - Created from Physical Server)
Symptoms - Bridged network connection fails during quieter periods; sometimes it can reply to ping requests, sometimes not. A Disable / Enable of the VM adapter usually rectifies the problem. See above for other things tried.

Message was edited by:
elnino9

Hello, I have had a VMWare Server 1.0.3 running in production for over two months, quite happily I might add. Late last night though, something fell over, and I was looking for advice on how I might go about finding precisely what. However, it is sounding slightly familiar to things mentioned in this thread.

My first stop was /var/log/vmware/ and the serverd log files. These were very minimal however and showed nothing catastrophic. The log files in my /vm store also show no events over the past few days.

I noticed this thread when hunting for a solution, but I'm not sure if my symptoms are precisely the same. In my situation, none of the virtual servers could be contacted in any way, but neither could I log in to the host with the VMWare Server Console - the first time it complained that it could not establish a secure connection (or similar) and then afterwards just claimed a timeout.

I could log into the host with SSH (it is not running X). A quick glance at 'top' showed vmware-rtc was up the top as usual, but the vmx processes were rather far down, not up high as usual, indicating very low or no processing was going on. I noted vmware-serverd was on the list too, but I can't remember if vmnet-bridge was there. I was having a slight panic at the time and didn't think to look for that.

A './vmware stop' in /init.d got stuck on halting the VM's. I then foolishly did a './vmware start' and it promptly hung the server. In hindsight, good old 'shutdown -r now' would have likely been wiser.

The system booted successfully but VMWare did not come up. Now attempting a './vmware start' caused it to complain the virtual networking was not set up properly. I ran through the perl config script like I did when I originally installed VMWare Server, it rebuilt the vmmon module and then off everything went again, just fine. This is my first experience of this issue, but it has made me slightly nervous since I was happy with the setup and moved it to production nearly two months ago now.

That said, I'd still like an idea of how to find out what went wrong and how to prevent it, so I thought I'd post here. For the record it is 1.0.3 running on Linux 2.6.19.7, e1000 LAN cards (PCIE server adapter version) and an aacraid 2410SA RAID card.


hi rdthickman,

i`m not sure if your posting really fits into this thread and recommend posting it again as a separate one. this thread is about VMs sporadically losing network connectivity, so you may understand that it`s my intention to keep it "clean" to some degree. if people start to reply to your posting in this thread things get mixed up very quickly. maybe i`m wrong, but i think you have a different kind of problem

regards
roland

there are many issues related to the Broadcom gig nics and windows 2003...esp with the recent OS updates.
i really start thinking Broadcom is crap. We try to setup an unattended windows installation on a HP server-blade and the nic drivers don`t "eat" the ip configuration from unattend.txt....
anyway - it`s weird that we have many people with broadcom here. not sure if this is by chance or because the problem is related to broadcom or nic hardware/driver quality.
so let`s collect more data first......

anybody out there with VMs losing their network ?
:)

I've just set up Fedora 6 on VMware Server running on Windows 2003. I cannot get the guest Fedora box to ping anything on the network. I'm running an Intel(R) PRO/1000 EB Network Connection with I/O Acceleration. I've had a FreeNAS box running on the same server and the network sometimes worked and sometimes dropped, so I'm keen to find out why it's doing it.

Nr. 2
Name: scunningham
VMWare: Server 1.0.3
Hardware/Driver: 00:10.0 Ethernet controller: Realtek
Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev
10), 8139too
Host: Fedora Core 6
Guests: Mixture or Linux Distros and Windows 2000
Server
Symptoms: One of the VM's is a mailserver which is
happily still accepting email from the outside world
though it is having a problem (timeouts) sending
outgoing messages. This may not be related though.
Packet captures show TCP checksum errors.
Comment: The timeout problem for outbound mail is
not related. My ISP is now blocking these.

Sorry all.... Haven't had time to follow up on this.

The problem turned out to be the rtl drivers. Switched from the 8139too driver to 8139cp and problem went away.

Just a little update...

When posting my original message i set up some pings on my servers.

i set up a ping on my guest pinging a machine on our network

i also set up two pings on the host, one pinging the guest and one pinging another machine on the network.

I've not had the issue since.

Now it may be that my problem differs from yours but this seems to have done the trick (fingers crossed) by now i would have expected it to drop out.

I'm having this issue on a Dell PowerEdge 1750 with Broadcom integrated gigabit adapters running Ubuntu 6.06 Server. I'm trying the ping trick to see if it keeps them on the network. I just set this server up so I've only been having this issue for a day, but if I'm going to use this as a development environment I need the VMs to be on the network reliably. Figures, everything else about the setup went sooo smooth, and now this. :)

The ping trick didn't work, another one failed within 15 minutes. Crap. I guess Ubuntu is out the door since it's not supported by Dell.

I too have this same problem. I have 2 servers running and both lose connection randomly. The 2 vm's and the host are all running CentOS 5. The machine is a PowerEdge 2850. I'll grab the specs for this tomorrow but wondered if anyone has come up with a solid solution yet? I have tried to update to VMWare 1.0.4 and that doesn't seem to have fixed the problem. I lose connection at least 3 times a day.

Nr. 7
Name: curriertech
VMware Server 1.0.4 build-56528
Hardware: Dell PE1750, 8 GB Ram, Dual Xeons 3.06ghz,
Host: Ubuntu 6.06 LTS 2.6.15-26-server
Guests: Windows Server 2003 R2 SP2. (Allocated 2gb Ram - Created from scratch),broadcom NetXtreme NIC
Symptoms:
Bridge network connection fails during quieter periods, all ping requests fail. A timed perpetual ping to the VMs did not stop the problem. Restarting the guest temporarily fixes the problem.
UPDATE: I updated the kernel to 2.6.15-29-server and recompiled/installed VMWare Server. No network issues at all in the last 12 hours, but that may just be a fluke.


Networking problems, to resolve:

Nr. 1
Name: SrMarcos
VMware: Server ???
Hardware/Driver: Broadcom NetXtreme Gigabit Ethernet / 8.27.1.0, Intel PRO/Wireless 2200BG / 9.0.2.31 <--which is used for bridging?
Host: WinXP SP2
Guests: Solaris x86, multiple CentOS Linux installs, and a couple of Win2003 servers.
Symptoms: Communication suddenly stopped
Comment: for now, it started working again after re-configuring the network - but no clue, why it happened.


Nr. 2
Name: scunningham
VMWare: Server 1.0.3
Hardware/Driver: 00:10.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10), 8139too
Host: Fedora Core 6
Guests: Mixture or Linux Distros and Windows 2000 Server
Symptoms: One of the VM's is a mailserver which is happily still accepting email from the outside world though it is having a problem (timeouts) sending outgoing messages. This may not be related though. Packet captures show TCP checksum errors.
Comment: The timeout problem for outbound mail is not related. My ISP is now blocking these.


Nr. 3
Name: devzero
VMWare: throughout all hosted products, have seen this on workstation for a long time, but on gsx, server too.
Hardware/Driver: can`t tell for now, but i think it were different brands of nics involved
Host: Mostly SuSE Linux
Guests: Windows, Linux
Symptoms: VMs lose network connection and aren`t pingable anymore. Especially if they haven`t been used for some time. It just "looks" like they lose their connection due to lack of usage or absence of communication


Nr 4
Name: Dallas
VMware: VMserver 1.0.3
Hardware/Driver: HP DL 140 G3 Server, Duo core Xeon 2.2, 146 gig scsi hard drives, 4 gigs ram, 2 Embedded Broadcom NetXtreme 5721 pci-e gbit, Driver Version 9.52.0.0 dated 5/15/06, The switch it is attached to is a Cisco 9550 48 port.
Host: Host OS Windows 2000 Terminal Server
Guests: Windows XP Pro SP2, 256 meg ram, 8 gig partition - Windows 2003 Standard, 1 gig ram, 20 gig partitions - CentOS 4.5, 1 gig ram, 8 gig partition
Symptoms:
I too am having all of my VMs disconnect after 3-4 hours. All of the VMs can ping one another as well as ping to & from the host but no other machine on the network can ping a VM. The machines on the network can ping the host. This was a fresh install of the host and all the VMs, and this problem was there from the start. The only way I could get these to stay online for more than 3 or 4 hours was to disable one of the network cards in the host OS.


Nr. 5
Name: Solema
VMWare: Server 1.0.3
Hardware: Dell PowerEdge 2950, 2xQuad Core Xeon 1.6GHz
8GB of DDR2 FB-DIMM, 2x73GB 15K RAID1 OS, 4x300GB 15K RAID5 VM and storage drive, 2x Broadcom integrated Gigabit NICs (teamed, assigned to host OS), 2x Dual-Port Intel Pro1000 Gigabit Adapters, running in a load-balance /failover NIC team (no TCP/IP on NIC team, assigned to VMware)
Host: Windows 2003 Enterprise x64 Edition
Guests: VM's, each running 2003 Enterprise x64 , Each VM is set up with two processors, on ethernet. All VM's are bridged to the Intel NIC team.
Symptoms:
The VM's on the machine will periodically, and frequently (once every couple days) lose the ability to communicate with the network. It's frustrating the users since the shipping system is on one of the VM's installed on the server. I had tried enabling diagnostic logging on the shipping VM, but when it went offline the log file was over 1GB in size! I took one NIC out of the team and assigned it exclusively to the shipping VM. Still all VM's lose their connectivity at some point. I really want to be able to use NIC teaming for high-availability failover purposes if possible.


Nr. 6
Name: elnino9
VMWare: Server 1.0.1 Build 29996
Hardware: Dell PE2900, 4 GB Ram, Dual Xeons 2ghz,
Host: Windows Server 2003 R2 SP1
Guests: Windows Server 2003 R2 SP1. (Allocated 2.5gb Ram - Created from Physical Server),broadcom BCM5708C NIC
Symptoms:
VM is a P2V`ed system. Bridged network connection fails during quieter periods; sometimes it can reply to ping requests, sometimes not. A Disable / Enable of the VM adapter usually rectifies the problem. Disabling power management on the nic made no difference. Continual ping made no difference, either. Another NIC (3com Gigabit) was added and problem is still evident. We currently run about 12 VM machines, this is our busiest. The disconnection does seem to happen at the server's quieter moments although we have witnessed it during a busy period.


Nr. 7
Name: curriertech
VMware Server 1.0.4 build-56528
Hardware: Dell PE1750, 8 GB Ram, Dual Xeons 3.06ghz,
Host: Ubuntu 6.06 LTS 2.6.15-26-server
Guests: Windows Server 2003 R2 SP2. (Allocated 2gb Ram - Created from scratch),broadcom NetXtreme NIC
Symptoms:
Bridge network connection fails during quieter periods, all ping requests fail. A timed perpetual ping to the VMs did not stop the problem. Restarting the guest temporarily fixes the problem.
UPDATE: I updated the kernel to 2.6.15-29-server and recompiled/installed VMWare Server. No network issues at all in the last 12 hours, but that may just be a fluke.


Nr. 8
Name: dseibert
I too have this same problem. I have 2 servers running and both lose connection randomly. The 2 vm's and the host are all running CentOS 5. The machine is a PowerEdge 2850. I'll grab the specs for this tomorrow but wondered if anyone has come up with a solid solution yet? I have tried to update to VMWare 1.0.4 and that doesn't seem to have fixed the problem. I lose connection at least 3 times a day.


Since upgrading Ubuntu to 2.6.15-29 (from 2.6.15-26) I have not had any further problems with the network connectivity of my VMs.

We are having similar issues with certain VMs on a host losing network connectivity and a reboot of the VM fixes the issue.
Can someone tell me what details you need from me? I will be more than happy to post them here.

thanks

go back 2 posts and you will see ;)

The problem has returned for me after having no issues for the past week. :(

I wonder if this issue is getting any attention from VMWare?

I rebuilt my server with RHEL4 and for a week all was working great but as of today one of the VMs fell off the network again.

Hi, my name is Scott and I am having intermittent loss of network connection with a VMware free server. We built our company's SharePoint server on a guest running Windows Server 2003 Enterprise x64 with SP2. At first we noticed that after the weekend we would often come in and not be able to ping the guest server; now it happens throughout the week. Everything looked fine when we consoled in, and it would start working again after a reboot. Our host machine is a Dell Blade 1955 system running Windows Server 2003 Ent. x64 w/ SP2. The adapters are Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client). We had the adapters teamed, and I read in one of the threads that the Broadcom software can cause network problems with VMs, so I have just upgraded our drivers to version 3.4.10.0 and removed the teaming in an attempt to solve the issue. Our VMware Server is version 1.0.3 build-44356. We have two other VMs on this blade, one of which is the development server for SharePoint with the same configs, and they do not have these symptoms. What other information will you need to investigate this problem?


Another update, my system hasn't displayed the symptoms since September now. We were getting it 2-3 times a week.

we have a ping on the host pinging out to another machine on the net and the guest machine. I also have another machine on the net pinging the host.

Certainly seems to be working for me. I suggest anyone with the issue tries this.


Having random disconnect problems as well.
VMServer 1.0.4
Server hardware - Dell Power Edge, 8 GB Ram, Dual Xeon 5130 @ 2ghz
Host NIC - Broadcom BCM5708C NetXtreme II GigE (Driver: 2.6.14.0 Dated 4/3/2006)
Host - Windows Server 2003 R2 SP2 64 bit
Guest - Windows Server 2003 R2 SP2 64 bit. (2.75gb Ram, 2 CPU, using e1000 network driver) Running Exchange 2007 (Edge transport & Hub roles)
Symptoms - Bridged network connection fails randomly and a reboot of the guest restores connectivity. Sometimes in the middle of the night, sometimes in the middle of the day, the guest loses its network connection, which can be restored by a reboot of the guest VM. There are 2 other VMs on this server that are NOT having this problem; both are the same OS but are 32bit. Something that I tried resulted in some odd findings: when it failed, I moved from automatic bridging to a specific net (VMNet6). After moving the bridge over to the connection explicitly, network connectivity returned; then upon reverting back to automatic bridging, it failed again and would not come back. After a reboot of the host, the automatic bridging once again worked fine for the time being.

Are there any workarounds or fixes already known ?
In my case the network timeouts occur so often that it's impossible to use it in production !

HOST: Debian 4 (64bit)
GUEST: Windows Server 2003 Standard (32bit)
NETWORK CARD: RealTek 8168

My guest uses bridged networking with a fixed IP !


If you are using it "in production" I suggest that you use one of the VMware Server certified host operating systems.

Debian is not even mentioned there.

http://www.vmware.com/pdf/server_admin_manual.pdf

Page 7-9


Hi,

currently I do not use it in production ! I've just built up a testing environment and ran into these problems !
I know that Debian is not officially supported (yet ?) but I read about many people who run VMWare Server on Debian without
any problems ...

elnino9, what was the fix that you used to solve your problem? It sounds like your symptoms are similar to what we are dealing with.

Hi All,

We've purchased a Dell PowerEdge 1950 and we've been trying for a couple of weeks to run 2 Windows 2003 guest OSs. The virtual machines run fine for an undefined period of time and then connectivity is lost (the guest OSes do not respond to pinging); the host OS is still fine. After about 10-15 minutes the guest machines come back to life as if nothing ever happened. They did not crash, and there is absolutely nothing in the event logs. The guest servers use "bridged networking" and dynamic IPs, the network cards in the server are Broadcom BCM5708C NetXtreme II, and we use VMware Server 1.0.4. We've tried various versions of the host OS (Windows 2003 32-bit Standard and Enterprise editions and also Windows 64-bit Standard and Enterprise editions) and various versions of the network drivers from Dell and Broadcom. Same issue.

At the same time we were researching the issue on the internet, it seemed that some people point out that this behaviour might be caused by the new TOE (TCP/IP Offload Engine) that's in the Broadcom NICs. So we followed the procedure below supplied by Dell. Same issue.

Now, the identical 2 guest machines work just fine on two different machines and they also run fine on one other machine. So far one common thing that we have noticed when researching this issue is the use of the Broadcom NetXtreme II NICs. I would say to stay away from those network cards if you're purchasing a new server and using it with VMware. I am tempted to try Microsoft Virtual Server just to narrow down the problem, however due to time constraints and the return policy on the Dell server I will not have time to try it.

At this point, I think I'll return the server and do some more research before buying a new one.
Please let me know if anyone has resolved this issue using a "real" solution, not half-measures (i.e. constant pinging of the guest machines or switching to NAT for VMware network configuration)?

Thanks,
Arthur

======================================================================
For the interested, here is the Dell procedure to disable TOE:

How to disable TOE
1) Uninstall the Broadcom driver from the add-remove programs and power down the server,
2) Remove the TOE Key by following these steps.
Removing the TOE Key
Before removing the TOE key:
a. Disconnect the system from AC power and Remove the top cover.
b. Locate the TOE key.
c. Press in on the TOE key RJ-11 release latch.
d. Lift the TOE key out of the chassis.

3) Power on the server and reinstall the latest Broadcom driver.
4) Follow the steps of the article : http://support.microsoft.com/kb/927695
5) Follow the steps of the article http://support.microsoft.com/kb/904946
6) Turn off the TCP Chimney
To turn off TCP Chimney by using the Netsh.exe tool, follow these steps:
a. Click Start, click Run, type cmd, and then click OK.
b. At the command prompt, type Netsh int ip set chimney DISABLED, and then press ENTER


Hello,

i have the same Problem.

The only difference is, we have NO network connection when the VMs are started. They successfully obtain an IP address from our DHCP server, but they don't get an answer when we try to ping another PC on the network. Same issue when we ping from the network to the VMs. The only ping reply we get is when we ping from the VM server to the VMs.

The VMware Server 1.0.0.4 is running on a Dell Latitude D820 XP SP2 with one Broadcom NetXtreme 57xx Gigabit Controller and an Intel Pro Wireless 3945ABG Network Connection. The VMs are configured in bridged mode (auto and manual tested). NAT works.

VM1 = Linux Suse 10.x

VM2 = Windows 2000 Pro SP4

The VMs work fine on another VMware server (also a Latitude D820, same type/model). But I can't find a misconfiguration or any difference between them.

A few pages back I read about Trend Micro as a cause. We also had Trend Micro (OfficeScan) at work. I uninstalled it, but no change.

Thanks

Thomas



well, originally we had tried getting a ping request to the vm from another source but this didnt seem to work. in sheer desperation i ended up putting a ping on the host pinging out to another machine on the network and the guest machine. I also have another machine on the net pinging the host.
Sorry, my English is not very good. Did I understand right? You keep the VMs alive by running a continuous ping to the VMs?

That doesn't solve our problem. Our VMs never get a working network connection in bridged mode. They only obtain an IP address/DNS entries/etc. from our DHCP server; once they are started and up, ping or any other networking just won't work.

Edit @elnino9: sorry, I had not seen that your answer was a response to smuray. So, forget my reply.


Just FYI. I experienced the same problem over the weekend. We upgraded our corporate Trendmicro Anti-virus to the new 8.0 version. It appears this version does not play well with VMWare. it has something to do with the OfficeScan firewall. I found this article: http://esupport.trendmicro.com/support/viewxml.do?ContentID=EN-1035343&id=EN-1035343

I have not tested the above suggestion as I immediately uninstalled the OfficeScan client to get through the day (which seems to have resolved the issue for now).

I thought I would pass along my experiences. Hopefully it will be helpful to others.


No one with any Idea?

I guess we're on our own then. Hopefully they fix it in v2.0


Well, my problem is fixed. It's not VMware, it's the physical CD-ROM/DVD drive that was causing the machines to stop, and only if the drive was connected to a VM. Now I just disconnect the CD-ROM/DVD drive from all of my VMs and connect it only when I need it. Typically after the disconnect happens, I get this error in the Application Event Log on the host server:

Event ID: 9

The device, \Device\Ide\IdePort0, did not respond within the timeout period.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


This issuw sucks, I know. Here's what happened, VMWare for some crazy reason changed your NIC in your guest. If you had DHCP enabled most likely you got a differen't IP. If you have a Static IP my guess is that on your guest you're network connection will be called "Local Area Connection 2" or 3 or however many times this occured. When you try and assign your static IP your guest OS will say "There is already a device assigned this IP..blah blah blah" then it will assign it anyway and it's back online. This is a very very common problem but nobody has admitted to it. Just search for losing network connection, can't ping or whatever the typical symptom.

Personally I think the CD/DVD thing is a bug but again, no one will fess up to it. Having to remove one of the more importatnt devices on your system just for it to work properly means some code needs to be revisited.

I had the same problem just last week when I "ghosted" the primary virtual hard drive of one of my VMs (running Windows Server 2003 Standard) to a larger one. The virtual network card, which used to have a static IP address, was suddenly receiving DHCP addresses. When I put it back to its regular IP address, a warning appeared telling me that another adapter had the same address and asking whether I was sure I wanted to proceed.

The same thing happened to me many years ago on an old physical server that was running Win2K Adv. Server. The proper fix, IIRC, was to boot up in safe mode, go into the device mangler, delete all instances of the network card, and reboot normally and let the OS redetect the NIC. Perhaps the same thing will work in the VM.

I would try the fix in the VM I mentioned in the first paragraph, however, it is running normally and my users are not complaining so I'll just go ahead and leave it be.

Hopefully someone found this useful.



try this.

Install VMware Server on another machine, copy the directory that holds the VM you are having connectivity problems with to it, register the VM on the new machine and fire it up there. I had a similar issue, and it was because of certain Cisco router/switch rules that my machine was bound to. Also, try to get a machine that does not have a Broadcom network anything. Run out and buy an Intel Pro/1000. VMware Server has undocumented hissy fits with Broadcoms that no one will fess up to. Strangely, on any Windows machine the card works flawlessly. It's when VMware uses its drivers within the guest that the bug rears its nasty face.

A few other things to check...

Shut off the Windows Firewall...

Consider connecting your machine to a Gigabit switch. If you are physically linked to a 100Mbit connection, your VM is still trying to talk to any machine via TCP/IP at 1Gbit. If you're trying to connect to Windows 2003 Server, it will notice that data is getting sent to it faster than the NIC connection allows. That's when denial-of-service protection kicks in.....no more connection. So consider upgrading your switch to 1Gbit.

This NIC failure has plagued me for as long as I have been using VMware. Yes, these big-name companies and millions of users are using this product, but let's look at how many have a Broadcom NIC and how many of those unfortunate souls are having the same problems. We (users in the forum) can only see what is posted; VMware will not disclose how many incidents/calls they have had on this issue. I bet the majority of those big-name companies are users in this forum having the same problem.

I'm not slamming you VMWare but there is an issue here that needs to be addressed. Hopefully you fixed it in version 2.0.

I just downloaded it


COS wrote:
I'm not slamming you VMWare but there is an issue here that needs to be addressed. Hopefully you fixed it in version 2.0.
I just downloaded it
Good luck...
http://communities.vmware.com/community/beta/server2.0

Peter, thanks for the link....
I can't connect to the MUI on my Vista host. I'll post in the beta section.

There is no MUI, no VMware Server Console. There is Web Access UI and misplaced VMware Infrastructure Client.

Peter, I still can't get anything going. I posted my question here...
http://communities.vmware.com/thread/115215?tstart=0

I can not get to any console to build a VM.


I found that three of my VMs had the cd drive connected, so I disconnected them. Within 10 minutes, one of them fell off the network again, so I don't think the cdrom issue is the 'fix'. :(


Do you have a Broadcom NIC? If so consider getting an Intel Pro 1000 or something other than a Broadcom.

Yep, broadcom. I'll try changing to an Intel card and see if that fixes it.


All:

Here is a new twist. After finding and resolving the problem once before with my VMs stopping all communication with the host via Bridged Networking, all of my VMs (Solaris, multiple flavors of Linux, Windoze running on a WinXP host) STOPPED AGAIN. They were working yesterday, but today...nothing.

I found and resolved the problem again, and this time I am 100% CERTAIN that I did not make the change that caused it. I used to heap ridicule upon people who claimed their computer's (and applications') settings just "changed themselves", but I've reconsidered that now (sheepish grin). I made no changes to the Virtual Networking settings, yet there they were...

The VMNet0 was set to bridge to an automatically-selected network adapter. Apparently, it's now choosing the wrong one, as there are times my wireless adapter is active; other times, it's my wired adapter. Today I tried both with no luck. BUT...setting VMNet0 to a physical adapter restores all communication!

I hope this helps someone. I really, really like VMWare...but I'm beginning to distrust it.

Marcos


I just installed an Intel card and bound the VMs to it. We'll see how it goes. :)

Great!
Please let us know if it exhibits the same behavior. My desktop, which has an onboard Intel NIC, has never lost connection for my VMs; however, my server, which has a friggin Broadcom, has lost connection on the VMs at least 4 times.

Now here's some bad info for EVERYONE.....
We just received our HP DL380 servers and the onboard NICs are Broadcom!

WTF!


Hi All,

Just to contribute our own experience with this issue :-

Our servers are all based on Intel server platform components (ie. Intel Server Chassis, Motherboard, RAID controllers, etc.). The onboard NICs are also Intel GB NICs and the add-on quad port and dual port Gigabit NICs are also Intel ones.

The NICs have always been manually assigned (never automatic bridging) and the Host server runs either Windows 2003 server 64bit Standard or Enterprise Editions.

The problem also occurs with these servers. The frequency of the occurrence appears to be random - one windows server VM ran for months without any issues and then suddenly presented the problem (no network traffic in or out of the VM - host still has network connectivity). Other VMs could run a day or so and then encounter the problem. Often the problem presents itself overnight when the servers are relatively quiet (ie. little or no network activity) and only once or twice during working hours. No errors are logged in the Event logs of either VMs or Host. With Servers hosting multiple VMs, not all VMs are affected at the same time.

As the problem does not affect all VMs at the same time, rebooting the host to fix the problem was out of the question. We have found that when the problem occurs, disabling then re-enabling the network adapter within the Windows VM (not the host) usually fixes the problem (until it reoccurs) without needing to reboot either the VM or the host. We had less success with Linux VMs, as disabling and re-enabling the network interface does not always work and we usually have to resort to rebooting the Linux VM to fix the problem. As a workaround, following a suggestion made by another poster on this issue, opening a command prompt and running a continuous ping from the VMs to another PC / router / print server on the network appears to prevent the network from dropping off. If someone accidentally closes the command prompt, which ends the continuous pings, the problem will reoccur.
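
Since the continuous-ping workaround dies as soon as someone closes the command prompt, here is a rough sketch of the same idea as a small Python script that could be run as a scheduled task or service inside the guest instead. This is only an illustration, not something anyone in this thread has tested: the function names, the 30-second interval and the example address are my own assumptions, and it relies on `ping` being on the PATH.

```python
import platform
import subprocess
import time


def build_ping_command(target, system=None):
    """Return the argument list for a single ping to `target` on this OS."""
    system = system or platform.system()
    # Windows ping takes -n for the count; Linux and most Unix pings take -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", target]


def keepalive(target, interval_seconds=30, rounds=None,
              run=subprocess.call, sleep=time.sleep):
    """Ping `target` every `interval_seconds` and return the number of failed pings.

    rounds=None loops forever, which is what you want for a background task.
    `run` and `sleep` are parameters only so the loop can be exercised
    without touching the network.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        # A non-zero exit status generally means no reply came back.
        if run(build_ping_command(target)) != 0:
            failures += 1
        done += 1
        sleep(interval_seconds)
    return failures
```

Calling `keepalive("192.168.1.1")` (a placeholder address; use your router or another always-on host) keeps a trickle of traffic going the same way `ping -t` does. It is still a workaround, not a fix.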

No Windows updates have been performed on either host or VMs (automatic updates are disabled) after the initial setup. All server admins have been briefed on the issue and all know not to install anything or update anything without first checking with us. So far, we know that virtual servers built from scratch (i.e. server assembled, OS installed, all updates applied, VMware Server installed, and finally VMs created, configured and guest OSes installed, all updates applied) do encounter this issue at various client sites. Note that similarly built servers (i.e. same model boards, CPUs, etc.) running Microsoft Virtual Server at different client sites have never encountered this issue.

We are also hanging out for a resolution to this problem.


Don't hold your breath, it doesn't seem like VMWare is giving this issue any attention at all...and with them working on 2.0 now we'll probably just have to wait for it to get out of beta and then upgrade to fix this issue. :(


Just had the VM guest lose its network connection again. I noticed something odd this time: it had taken all of the CPUs allocated to it and pegged them out at 100% usage. I went into the guest and it seemed to function fine, but was showing high CPU usage. I disabled the network adapter inside of the guest and re-enabled it, and it all came back to life without any other drama and the CPU usage dropped back down to where it should be.

1.0.4 on Windows Server 2003 x64 SP2, the guest is 2003 x64 SP2 as well.

I am thinking this could be fixed with an alternate network adapter driver in the guest.... Has anyone loaded up 2.0 in a 64-bit environment and seen what network card / driver version it uses? Curious if that driver is newer, and if it is, can it be used in a 1.0.4 environment?


It's been two weeks on the Intel cards, and no issues yet.

Up to 3 weeks without incident now. I wish I'd installed the Intel NICs as soon as this issue became apparent!

That's great news!
We are purchasing different NIC's for our HP servers that are running VMWare Server. Darn those Broadcom NIC's.

Darn those Broadcom NIC's.
couldn't that be a broadcom driver issue, too?
anyway - NICs are cheap and broadcom gave me hassle in other circumstances, too....

What specific Broadcom NIC models are you replacing? We also experienced random VM disconnections and could not get anything useful out of VMware Support.


Ours are the NetXtreme integrated gigabit (x2) cards. Those seem to be on every server we have, but fortunately only two are being used with VMWare Server so only those two have had issues.


According to "Device Manager" our NICs are "Broadcom NetXtreme Gigabit Ethernet" and there are two of them onboard. We had problems with them losing connection, so we now use Intel Pro/1000 PCI-X cards. On the older servers, where we added the Intel cards 4 months ago, we have not had an issue since. With the Broadcoms it was once every two weeks or so.

I find this thread fascinating, but frustrating as well. I am having problems with the bridged network dying. Things were working fine. I shut down all guest systems, then rebooted the host. When I restarted things, I could not communicate with any of the guests from PCs on the LAN. One of the guest systems is a DC, so I could not get IP addresses for the other systems. I could ping a guest from another guest, but the guest could not ping other PCs on the LAN.

I'm using an Intel Pro/1000 MTW onboard card on the server. It did have the power mgmt set on, so I disabled that. I also forced the VMnet0 adapter to that NIC instead of having it automatically chosen. I am using VMware server 1.0.4 with Windows 2003 Server Enterprise SP1. All guest systems are using Server 2003 Standard. No windows updates have been done, so no changes were made to the O/S. I currently am not running any AV.

In your guest OS, right-click "My Network Places" and select "Properties".
You should see your network connection. Does it say "Local Area Connection" or "Local Area Connection 2"?
I am assuming that you have a static IP to your guest. Try reassigning the IP, if my guess is correct, it will say "There is already a device with this IP....." or something like that. It will assign it anyway. Some people just delete the NIC in the guest and rediscover it.


Just had the same intermittent network failure on my laptop (Vista). NIC is Realtek, not Broadcom.

Fixed it by running vmware-toolbox and selecting Ethernet as a shared device...


Some people have said they fixed this problem by changing the virtual NIC from an AMD to an Intel NIC.


I am having a similar issue but I'm not sure if the symptoms are exactly the same.

I have a Windows 2003 Enterprise server running VMware Server 1.0.4, with a few virtual machines running on it. Most are Windows but some are Linux. The virtual machine I am having problems with is a Windows 2003 Standard machine. Basically, the bridged network card will suddenly stop responding to network requests (both from the guest and to the guest). I can restore the connection by disconnecting the virtual card and reconnecting it; once I do that, the card immediately starts responding to pings. I know the exact time the machine stopped responding, and I checked the event logs on both the guest and host machines and there was nothing. I also checked the VMware logs and there was nothing there either. I did see a line item stating that the network card had been enabled at the time that I performed the disconnect and reconnect, but that's it.

The network configuration

I have checked everything I can think of to rule out windows issues. I have had this problem before with other Windows Vmware images running on Windows Vmware Servers. I never resolved the issue because I ended up abandoning the project I was working on.

I'm hoping someone has some useful insight into what might be causing this issue. It's very annoying to have to get up in the middle of the night to disconnect and reconnect the card.

Also, on a side note, I read in an earlier post that someone changed their virtual NIC from AMD to Intel. Can someone tell me how to do this?


Well I wish I wasn't here to say this right now, but after over 3 flawless weeks on the intel cards, I've had two VMs fall off the network in the last two days. :(


Unfortunately, I'm not surprised.

As mentioned in my earlier post, my servers have always been using Intel cards and they do encounter the problem, so I still think it's a VMware problem and nothing to do with the hardware. Why? Because a couple of these servers used to be MS Virtual Servers before being converted to VMware Servers, and they never had the problem under MS Virtual Server.


BTW, I forgot to mention in my previous post that the network cards we are using are Intel Pro/1000 EB w/ IO Acceleration cards, not Broadcom. I noticed in an earlier post that someone was using the same card and had the same issue. Is it possible that there are some Intel cards that are having this issue as well? This particular server has been running fine for years. I moved it to a new host server last weekend and that's when the problem started. Sounds to me like a VMware/hardware incompatibility, but then again I did upgrade to 1.0.4 at the same time, so who knows.


What I meant was: VMware tends to spuriously suppress my Ethernet device, and I don't know why.

It happened again this morning; so I just ran vmware-toolbox, selected "devices" folder and checked again the Ethernet device, and again it's working fine.

Might be because I frequently suspend and restart my virtual machine.


I have been experiencing the same exact thing - and am about to lose my mind.

Have 2k3 host OS running VMware server with 4 physical NICs. Only 1 is accessible on the host, the other 3 are bridged to the 2 VMs.

Each of the 2K3 VMs works perfectly - with all the networking and bridging happening as it should. However, after a few days or a few hours - POOF. Networking goes away. After a reboot, things are fine. But this thing is like a ticking time bomb.

I have the VMs set for AMD NICs and will change to Intel to see how it goes...

Anyone else with any ideas?

Get paid VMware support and engage them if you are planning to use VMware Server for the longer term.
If still not satisfied, use another product.

This is pulling at straws but can you confirm the link speed of the physical Ethernet switch the VMWare Host is connected to? Is it 100Mb or 1000Mb?
If it is 100Mb then try and set the virtual NIC's down to match the slowest link. I had some problems in the past with this. My Guests were linked at 1000Mb and my host was linked 100Mb. Theoretically this shouldn't matter because the Virtual Switch is supposed to do the "Dumbing down" of the link speed but it fixed my problems.

Here's the theory of why it may fix the problem. On 2003 Server and XP as well as Vista, when the OS receives more data than the link speed allows, it sees this as a DoS attack and shuts down the network.


Dkitch,

Have you tried disconnecting the virtual network card instead of rebooting the server? When my server goes offline, simply disconnecting the card and reconnecting it brings it back. May be a lot easier than rebooting the whole box.

Can you tell me how you plan on switching the network card from AMD to Intel? I have heard people say that that's how they fixed it, but no one will tell me how to do this.



COS,

I can confirm that both of my ends are 1000Mbps. I have an Intel Pro/1000 EB card connected to a Cisco 1000Mbps switch.

If Windows is shutting down the connection as a defensive response to a potential attack, wouldn't you think it would put something in the Windows event log saying that that's what happened? Both my host and guest machine had nothing in the event logs.



An update -

I have discovered that a simple unplug / re-plugging in of the cat5 cable jumpstarts the connection within seconds and all then is well. I've yet to go down the path of changing the NICs to Intel as opposed to AMD - that process is next.


Our VMs did not show anything in the logs either. It was recommended by KevinG that I make the links the same speed, and it fixed the problem. Actually he recommended we get a gigabit switch, but back then $$ was tight so we just stepped down to 100Mb at the guest.
Our symptoms were as follows....
Connection was stable....
Data transfers were stable...
A medium sized data transfer occurred...
NETWORK BLACKOUT at the guest...
Reboot VM and you're back online.

sound familiar?

COS,

After you took KevinG's advice and reset the link speed to 100Mb, how long have you been running without problems? Did you do anything about changing the adapter type to AMD, or simply reset the link speed? I join with the other comments in saying that if this is a Windows defense mechanism, it is weird that nothing shows up in the security log. Perhaps some more investigation is needed. Perhaps also the virtual switch is not working properly and IT is the one shutting things down. My vote is for the latter, as Windows is usually pretty good at complaining to the logs about things that are a real problem (BSODs are often the exception).


Oh yes! Here! All of a sudden! At least a similar issue.

I've been running a single XP VM on a single XP host for more than a year now - sometimes in bridged, sometimes in NAT mode. Host is VMware Server 1.0.3. Got a Broadcom "NetXtreme 57xx Gigabit Controller", WLAN is disabled. Now just today, back at the office, I tried to switch to bridged mode - and have no LAN contact out of the VM for the first time and for no reason I can think of. I'm not aware of having changed my host installation.

After dedicating the only NIC available as the VMnet0 adapter, VMware told me my bridged Ethernet interface was down. This persists after a host reboot and I have no idea how to get back to a functional bridged mode. NAT works fine.

After reading parts of this thread I updated my NIC driver - didn't help. There's not much software installed on the host system besides AVG Antivirus and a Cisco VPN Client.

Thanks for any help on that.

Herbert


It has been a little more than 2 years and the guests are all running fine with no incidents.
hmmm.....
I never quite put it together before, but these are Broadcom NICs on the host and they exhibited the same problems everyone is having. Could it be that dumbing the guest down to 100Mb may work around the problem? I think I'll try bringing up a VM on the host with Broadcom NICs and see if it loses its net connection if I set the guest NIC to 1000Mb but don't send a ton of data.
I'll let you know when I set this up.

Thanks


OK. I'm glad others are on board with my misery.

Tomorrow AM, I'm going to manually reconfigure each of my 2 bridged NICs in each of my 2 guest systems to step down to 100 meg. I'll keep the NICs in AMD mode and keep the HOST Broadcom NIC at gig speed.

The first time THAT goes down, I'll step back the HOST link speed to 100.

When THAT goes down, I'll most likely throw in a quad port intel NIC and disable the broadcoms altogether...

More to come...


Same problem as everyone here. I've created a batch file that starts with the guest OS (W2k3 Enterprise) and does a "ping -t" to the ADS server. Since then the NIC (Broadcom Giga) has worked without quitting.
No solution, but a workaround

Olasan


Below is a post I put on experts exchange that lists my setup and the issues I have. No solution yet, but you can use my hardware to compare to yours. Mine seems to happen during backup of the System State, which is probably pretty intense for the server.

I have a rather odd issue relating to VMWare Server, Backup Exec and Windows 2003 R2 EE 64-bit.

Issue:
Guest server NIC stops responding. Can't connect to the server (ping or shares) and the server can't connect to anything (ping or shares). To get the NIC working again I use VMware Server Console, open the guest server network connections, disable the NIC, re-enable the NIC.

Setup:
-I have 3 Dell PowerEdge 2950's, all identical (VS1, VS2, VS3). Dual Quad Cores, 6 NICs on each.
-Running VMware Server 1.0.3 and 1.0.4
-Symantec Backup Exec 11d
-I currently have the issue on two guests on two separate VMware Servers, VS2 & VS3. One runs Citrix, the other runs WSUS & Certificate Server.
-All VS Servers and guests are running Win2k3 R2 EE 64-bit.
-Intel 1000PT Dual-Port NICs
-Backup does not back up raw VMware files; I'm actually connecting to the BERemote agent on the VMServer, backing it up as if it were another physical server.
-/3GB is not used
-No AV software

What I've Found Thus Far:
-The NIC freeze seems to happen during nightly backups
-The NIC freeze seems to be a result of VSS creating a snapshot of the volume to be backed up
-This does not happen consistently. We may go 3-4 days with no issues and then it happens again
-Event viewer has no information. Just displays netlogon issues, etc. that are a result of the NIC loss.

What I've Done Thus Far:
-I've set up a mock backup job that runs every 5 min during the day and backs up the System State and Shadow Copy Components of the server. I can get it to crash with this job. Same symptoms though: might get 10-15 successes, then the NIC will stop responding.
-BIOS updates
-Updated NIC drivers (same version as on the guests that are solid)

Thanks,
Andy



Oh no! This was my own fault! Sorry for that.

Herbert


So far SQL has not restarted since I upgraded the network card however I just received this update to my board post. It's an interesting theory. If the problem returns I will have to start looking at BE and see if there is some type of conflict with BE and the new version of Vmware.

Trevor Your
Operations Manager
ACCURATE TECHNOLOGIES INC.
47199 Cartier Drive
Wixom, Michigan 48393, USA
Phone: (248) 848-9200, Ext 121
http://www.accuratetechnologies.com


OK. Well, I've just about had it with VMWARE.

I added an additional quad port Intel NIC to eliminate the Broadcoms from the equation and had 24 stable hours. About 30 mins ago, even with my pings hitting it from every which direction, it took a dive.

It wakes up fine if I un/re-plug a cable.

Is this still really a dead end? i.e. - no solution out there?



Have you ruled out a possible hardware problem?


NIC hardware should be ruled out; I have it occurring on separate servers. For us, it seems to happen after an MS snapshot is done and then there is heavy network traffic. I wonder if it has anything to do with the offloading on the NIC? I may try to disable this next.


Do you have packet errors on the physical switch trunk ports?

And if so what are the errors?



No packet errors:

GigabitEthernet1/0/13 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 001c.5754.a08d (bia 001c.5754.a08d)
Description: VS2 VMNET1
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 12/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:01, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 49623000 bits/sec, 4187 packets/sec
5 minute output rate 853000 bits/sec, 1648 packets/sec
966107898 packets input, 2745091341 bytes, 0 no buffer
Received 891792 broadcasts (0 multicast)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
399706018 packets output, 725775459 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
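
If anyone wants to pull the interesting numbers out of switch output like this without eyeballing it, here is a rough Python sketch. It assumes the text looks like the Cisco output pasted above; the function name and the handful of fields it extracts are my own choices, and other switch models may word things differently.

```python
import re

# A few lines copied from the switch output above.
SAMPLE = """\
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
5 minute input rate 49623000 bits/sec, 4187 packets/sec
5 minute output rate 853000 bits/sec, 1648 packets/sec
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 output errors, 0 collisions, 1 interface resets
"""


def summarize_interface(text):
    """Extract link utilization and error counters from `show interface` text."""
    bw_kbit = int(re.search(r"BW (\d+) Kbit", text).group(1))
    in_bps = int(re.search(r"5 minute input rate (\d+) bits/sec", text).group(1))
    out_bps = int(re.search(r"5 minute output rate (\d+) bits/sec", text).group(1))

    counters = {}
    for name in ("input errors", "CRC", "output errors", "collisions", "interface resets"):
        match = re.search(r"(\d+) " + re.escape(name), text)
        counters[name] = int(match.group(1)) if match else None

    capacity_bps = bw_kbit * 1000  # BW is reported in Kbit/sec
    return {
        "input_utilization_pct": 100.0 * in_bps / capacity_bps,
        "output_utilization_pct": 100.0 * out_bps / capacity_bps,
        "counters": counters,
    }
```

On the numbers above, the 5-minute input rate works out to roughly 5% of the gigabit link, which lines up with the `rxload 12/255` figure, and the error counters are all zero apart from the single interface reset.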



Ok,

Are you using network load balancing?

Are you connecting a single host to multiple physical switches?


My logs are clean as well - as far as i know.

My setup is as follows -

Nic 1 bound to Host and then bridged to VM1 and VM2


Nic 2 NOT BOUND to Host and only bridged / bound to VM1


Nic 3 NOT BOUND to Host and only bridged / bound to VM2


Nic 4 NOT BOUND to Host and bridged / bound to VM1 and VM2


All 4 are now on a single quad port Intel server NIC - I've eliminated the Broadcoms from the equation.

Thing is - it seems to be only NIC 4 that dumps out. A simple un/re-plug always does the trick.

My next plan of attack is to add a Nic 5 which would be Nic 4 split out to feed VM1 and VM2 separately - without a double bridge as this appears to be where the issue is.


Are all the NICs connected to one physical switch?

Mine are connected to a cisco 3750 switch.

I've got:


Intel 2port:
1- Host - Not bridged
2- VMNET0


Intel 2port:
3- VMNET1
4- Disconnected


-I have two different servers causing the issue, see EE post above.
-No NLB


-I'm now testing after disabling TCP Segmentation Offload on both Host & Guest.


This is falling on deaf VMware ears. I'd call VMware, get paid support, and post what they think is the solution.
One thing to try is to make sure every link runs at the same speed as your slowest link.
i.e. If your physical switch is 100Mb and your VMs are linked at 1000Mb, step them (or at least the one you know always fails) down to 100Mb. This eliminates the guest Windows OS DoS (Denial of Service) protection as the cause.

COS wrote:
i.e. If your physical switch is 100Mb and your VMs are linked at 1000Mb, step them (or at least the one you know always fails) down to 100Mb. This eliminates the guest Windows OS DoS (Denial of Service) protection as the cause.
That particular one can be easily removed by a simple registry fix.

It may have nothing to do with VMware. I had this exact symptom when I used to run GSX, and it was W2K3 that was the root cause.

I solved it by using VLAN capable NICs and drivers.

OK, so far we have ruled out the physical switches and NIC hardware.

Let's look at the VMware Server host OS.

Time lines?

Have the systems run OK from day one and this problem has started relatively recently?

Correlation?

Does it only occur under an increased packet load scenario?

Since SP2?


Peter, can you post the reg file?

DisableDOS
http://support.microsoft.com/kb/898468

Netsh int ip set chimney DISABLED
http://support.microsoft.com/kb/945977


I've troubleshot this in depth with all the variables listed. No solution yet.

My latest was disabling TCP Segmentation Offload. Server just crashed; it didn't work.

See my post on page 8 or 9 for a list of all the variables that I've got.

Onward we go with the troubleshooting...


Does everyone that experience this issue use Backup Exec? If so what version? I'm on 11d.

Are you able to turn on a perfmon to see what the peak network usage is when it drops?


We are using BackupExec 11d as well.

That being said, I believe I have resolved our issue. My VM was originally running on GSX 3.2. I moved the machine to VMware Server 1.0.4 recently, and right after that is when the problem started. I noticed one day that a new VM we had just created said that it was connected at 1.0Gbps, while the server I was having problems with said it was connected at 10Mbps. I know that this value is not reflective of the actual speed, but it was something to look into nonetheless. I upgraded the VMware Tools so that the new network driver was loaded. It's been a week now and I have had no network dropouts since. I am going to try backing out some of my diagnostic changes and see what happens.

Hope this helps someone else out there.


I've got the issue on both 1.0.4 and 1.0.3. The two vmware tools builds I've had it on are 56528 and 44356.

Are you running the vmxnet drivers in the guest or the intel drivers?

I've also setup a test backup job that runs every 5 min in the day backing up the system state. This lets me fast-forward the troubleshoot.


No Backup Exec here.

My 2 VM boxes only run MSFT ISA Server....


And they don't have the backup exec client loaded? Are the ISA servers heavy loaded?

So the issue must be related to congestion?


Are you able to turn on a perfmon to see what the peak network usage is when it drops?

If the peak usage exceeds the actual network capacity, then I see two possible fixes, which were already posted by Peter_vm and COS.

1) Disable the offending DOS patches

2) Throttle the nic speed.

otherwise we can dig further



From a CPU tick POV, neither guest VM ISA is loaded. With both VMs doing their thing, the host CPU barely cracks 15%. As far as traffic goes, I have it in a mostly eval environment. I have a few machines behind both VMs, a bunch of pings throwing packets back and forth, and a VPN link that I'm pushing some decent traffic through.

My next thing is to eliminate the double VMnet - i.e. my VMnet that appears on both VMs. By process of elimination, this needs to be looked at.

More to come....



Network running at ~12% during backup, less on nightly as it goes to tape. Max I see hit is 24%...



12-24% is not high enough to be a load issue.

How about a DoS issue? What are the TCP session states doing when the backup occurs?

Netstat -s

TCP Statistics for IPv4

Active Opens = 138654
Passive Opens = 2758
Failed Connection Attempts = 18849
Reset Connections = 17521
Current Connections = 37
Segments Received = 1006736405
Segments Sent = 1001887239
Segments Retransmitted = 45615

Do a before and after.
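
To make the before/after comparison less error-prone, something like the following Python sketch could do the subtraction for you. It assumes the Windows `netstat -s` layout shown above (a "TCP Statistics for IPv4" header followed by `Name = value` lines); the function names are made up for this example.

```python
import re
import subprocess

COUNTER = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s*$")


def parse_tcp_stats(netstat_output):
    """Return the 'TCP Statistics for IPv4' counters from `netstat -s` text."""
    stats = {}
    in_tcp_section = False
    for line in netstat_output.splitlines():
        if "Statistics" in line:
            # Section headers look like "TCP Statistics for IPv4".
            in_tcp_section = line.strip() == "TCP Statistics for IPv4"
            continue
        match = COUNTER.match(line)
        if in_tcp_section and match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def diff_stats(before, after):
    """Return how much each counter changed between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


def snapshot():
    """Run `netstat -s` on this machine and parse its TCP counters."""
    output = subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout
    return parse_tcp_stats(output)
```

Take one `snapshot()` before the backup starts and another right after the NIC drops, then look at `diff_stats(before, after)`. A big jump in Failed Connection Attempts or Reset Connections during the backup window would be the thing to look for.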


I'm showing 12, 9, 9... Not as high as yours...

We are not running any backup exec products on our guests. In fact, there is no backup anything installed on our guest. We backup each VM by running a vbscript to shut down a VM guest then copy the VM's directory to a nearstore. When the copy is finished the VM is fired back up and it goes to the next VM.
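
For anyone who wants to try the same stop / copy / restart approach, here is a rough Python sketch of the loop (our real script is VBScript, so this is an illustration rather than what we run). How you power a VM off and on depends on your setup, so `stop_vm` and `start_vm` are hypothetical functions you would supply yourself; the `.partial` staging folder is also just my choice for this example.

```python
import shutil
from pathlib import Path


def cold_backup(vm_dirs, backup_root, stop_vm, start_vm):
    """Back up each VM directory in turn while that VM is powered off.

    stop_vm and start_vm are caller-supplied (hypothetical) functions that
    take the VM directory and block until the power operation has finished.
    """
    copied = []
    for vm_dir in map(Path, vm_dirs):
        target = Path(backup_root) / vm_dir.name
        staging = target.with_name(target.name + ".partial")
        stop_vm(vm_dir)
        try:
            if staging.exists():
                shutil.rmtree(staging)       # leftover from an aborted run
            shutil.copytree(vm_dir, staging)
            if target.exists():
                shutil.rmtree(target)        # replace the previous good copy
            staging.rename(target)
            copied.append(target)
        finally:
            # Bring the guest back up even if the copy failed.
            start_vm(vm_dir)
    return copied
```

Only one VM is down at a time, and the old backup is only removed once the new copy has completed.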


Have we ruled out the chimney feature?

Netsh int ip set chimney DISABLED


I have not ruled out chimney, and actually saw that earlier. I think I will disable it now...

I did find one thing... I'm monitoring from a monitored port using wireshark, and it appears only TCP and ICMP stop responding... UDP continues on...

Will let you know of any further findings... Thanks!

Really?

That's bizarre; this would indicate a TCP receiver failure. Are you sure it is maintaining UDP conversations? Do you see bidirectional UDP activity?

When it fails, do a ping from the server outwards and from a node inwards with network monitor running and see if you capture it. If you do, this means that VMware is fine and it is an OS TCP stack problem.

You could try:

netsh interface tcp set global rss=disabled


Are you sure it is maintaining UDP conversations? Do you see bidirectional UDP activity?

that's a very interesting find.

is this 100% verified or can somebody acknowledge this?

to test for bidir udp connectivity - netcat/socat are very nice tools

Attached are snapshots of wireshark. (Server that fails is 10.10.1.10)

AtFailure.gif = Everything was working fine, then it wasn't.
PostFailure.gif = Data capture after the nic failed. Unable to ping or browse to server, but it seems to still try name queries.

These were captured from a separate workstation using a port mirror.

What can I use to test UDP on Windows platform?

I was able to crash the server with the TCP Chimney disabled. Is the rss disabled still viable?
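
On the question of testing UDP from Windows: if Python happens to be available on both machines, a tiny echo test like the sketch below would show whether UDP really works in both directions after the failure. This is only an illustration; the function names and the port number in the usage note are arbitrary choices of mine, and any firewall in between would need to allow the port.

```python
import socket


def serve_udp_echo(bind_host, port, max_datagrams=None):
    """Echo every datagram back to its sender (run this on the suspect server)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_host, port))
        handled = 0
        while max_datagrams is None or handled < max_datagrams:
            data, sender = sock.recvfrom(2048)
            sock.sendto(data, sender)
            handled += 1


def udp_round_trip(host, port, payload=b"probe", timeout=2.0):
    """Send one datagram and report whether the same bytes came back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            reply, _ = sock.recvfrom(2048)
        except OSError:
            # Covers timeouts as well as "port unreachable" style errors.
            return False
        return reply == payload
```

Run `serve_udp_echo("0.0.0.0", 50007)` on the server that fails (10.10.1.10 in the captures) and call `udp_round_trip("10.10.1.10", 50007)` from a workstation once TCP and ping have stopped. A `True` result would mean UDP is still getting through in both directions, not just outbound name queries.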


Looks like a UDP conversation to me! Confirmed!