VMware Communities
coalese
Enthusiast

Workstation 7: Can't SSH to guest from host....Bridged networking broken?

Hi:

Just finished upgrading to Workstation 7 (from 6.5.3 prior). Host is a WinXP Pro SP3 system. Guests are Ubuntu Linux (Jaunty, 9.04).

Everything worked fine with the prior 6.5.3 Workstation, where I was able to SSH in to the guest Linux VMs from the Windows host, typically using Putty or WinSCP.

Now with 7.0, and no other changes to the VMs, I cannot get an SSH connection to the guest Linux VMs from the Windows host using Putty or WinSCP. However, I CAN SSH in from an external machine (laptop or other server machines I have around). So I know that SSH is running fine on the guests, as it always has.

Network connections to the guests are bridged, as they always have been.

What is strange is that I can ping the guests successfully from the host and get a response. Just SSH isn't working for some reason. I just tested HTTPS access to the guest and that doesn't seem to be working either, though it did with 6.5.3. What's funny is that my Putty log looks like this:

2009-11-01 16:46:48 Looking up host "coalese-vm-rd"

2009-11-01 16:46:48 Connecting to 10.66.66.32 port 22

...but it never connects, though it has obviously found the vm on the network at the correct IP address. Same idea with HTTPS access to a web server on the guest....it just sits there saying it's connected, but the browser on the host never gets a response. Accessing SSH/HTTPS from a separate machine works just fine, which is why I suspect that the bridged networking in 7.0 is fubared.

It's almost like the bridge is blocking or not making available various ports like 22 and 443 on the guest to the host machine.

Anyone else run into such strangeness?
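The symptom described above (ping answers, but TCP services on ports 22 and 443 never respond) can be checked without PuTTY or a browser. The following is a minimal Python sketch, not anything from VMware; the helper names `tcp_port_open` and `probe` are made up for illustration, and the IP address in the usage note is simply the one from the PuTTY log.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is established within timeout."""
    try:
        # create_connection performs the full TCP handshake, unlike ping (ICMP).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Probe several TCP ports on one host and report which ones accept connections."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Running `probe("10.66.66.32", [22, 443])` once from the host and once from a separate machine gives a quick side-by-side comparison: if the ports only fail from the host, the guest's services are fine and the problem lies on the host-to-guest path.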

46 Replies
admin
Immortal

We'll get an XP SP3 host up and running ASAP and see if we're able to reproduce this.

One of our QA engineers loaned me his XP SP3 box with Intel pro/100m and NVidia nforce ethernet adapters, and I couldn't reproduce the problem while bridged to either of them. So this issue might be specific to your network hardware or drivers.

Greg

rezboom
Contributor

It looks like packets heading from the host to the guest are not being properly checksummed.

I disabled "Offload Checksum" on the physical network adapter the guest is bridged to, and now everything works as it should!

Go to the device manager, and look for this setting in the properties of the bridged network adapter. (The setting in question may be named differently on your setup)

I hope this works for you guys having similar problems as well!

(My mainboard is an ASUS P5K Premium WiFI-AP, I'm using the onboard Realtek RTL8169/8110 Family Gigabit Ethernet NIC).
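For readers wondering what "not being properly checksummed" means in practice: with checksum offload enabled, the operating system leaves the checksum field for the NIC to fill in, so a packet that never passes through the NIC hardware can arrive with an invalid checksum and be dropped by the receiver. The Python sketch below illustrates the 16-bit one's-complement Internet checksum described in RFC 1071. It is a simplified illustration only: the real TCP checksum also covers a pseudo-header, which is omitted here, and the function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement checksum described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def checksum_ok(data_with_checksum: bytes) -> bool:
    """Receiver-side check: summing data plus its checksum field must yield 0."""
    return internet_checksum(data_with_checksum) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")  # sample words from RFC 1071
filled = payload + internet_checksum(payload).to_bytes(2, "big")
unfilled = payload + b"\x00\x00"  # checksum field left empty for "hardware"

print(checksum_ok(filled))    # True
print(checksum_ok(unfilled))  # False: a receiver would discard this
```

Disabling the offload setting makes the host's network stack compute the checksum in software, which would explain why the workaround helps.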

coalese
Enthusiast

Could you do me one more favor and look into what offloading options you've enabled on the bridge adapter (e.g., checksumming, TCP segmentation offloading aka TSO, etc.)? Does host/guest communication work properly when those offloading options are disabled? What adapter model/driver are you using, for that matter?

I'm running a Netgear GA311 gigabit adapter with their latest drivers (v6.2). Never had any problem with this card.

Just like rezboom, I tried disabling Offload Checksum (it was set to Tx/Rx Checksum), and that fixed the problem for me as well! Good workaround.

However, what is confusing is that 6.5.3 (and earlier) never had this problem and the network card/driver has been flawless for a few years now. So I still think it's something strange that was introduced by 7.0.0 that is causing the root problem.

admin
Immortal

So I still think it's something strange that was introduced by 7.0.0 that is causing the root problem.

Yep, it looks like this is a regression. The logic that decides when to checksum packets changed between WS 6.5 and 7.0, and if we misdetect the offloading configuration of the bound adapter in WS 7.0, then the bridge is probably not going to do any checksumming at all.

I've filed a PR, so we'll get some developers looking at this. Thanks for bearing with me and providing us with useful feedback.

For all future visitors to this thread who are experiencing the same issue: please provide your NIC model and driver version, so that we can expand our hardware interop testing to watch for these failures in the future.

Thanks,

Greg

kingneutron
Expert

--FYI Greg, I reported this same issue during the RC phase.

http://communities.vmware.com/thread/236176?tstart=0

I have a Gigabyte motherboard with a Realtek Gigabit Ethernet card, and the problem with bridged networking only showed up on the Windows XP SP3 side. The Debian 5.03 64-bit host OS works fine with WS 7.

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

./. If you have appreciated my response, please remember to apply Helpful/Correct points. TIA

riepe
Contributor

I have a similar problem with NAT configured. The guest runs for a while without any problem and then suddenly stops working. Ping is still OK, but there is no other network activity (no nslookup, no HTTP, ...). Restarting the guest VM does not help. Restarting the VMware services (/etc/init.d/vmware) helps for a while. I also tried to disable checksum offload using ethtool -k ..., but this workaround did not work either.

Host:

SLES11: Linux testhost 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 i686 i686 i

NIC1: Ethernet controller: Intel Corporation 82566DM Gigabit Network Connection (rev 02) (used for VM-networking)

NIC2: Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06) (disabled)

Driver: e1000e

Guest:

Win XP SP3 with all patches and antivirus program

Running the VM-guest in a VMware Workstation 6.5.3 with no problem.

Maybe this helps in finding the failure.

Peter

donmichelangelo
Contributor

Another user report here: I had the same problem. I couldn't connect via SSH or SMB (I didn't check other services) to my Ubuntu 9.10 guest. At first I thought it was a firewall-related problem, then an SSH problem, and at the same time I found out that SMB didn't work as it had before. Besides nearly pulling out my hair (yes, I still have hair), I was close to blaming the Ubuntu upgrade I had done earlier. After reading this thread this morning and applying the workaround of disabling the TCP checksum offload on my Realtek NIC, it finally works.

Specs of the host machine:

Windows XP SP3 german

Gigabyte GA-P35-DS4 Mainboard with Realtek NIC 8111B

VMw Workstation 7

edit:

Realtek NIC driver version:

5.686.103.2008

I will later try out the newest version of the NIC driver...

jianhui
Contributor

I have a similar problem on VMware Workstation 7.

Host: Windows XP SP3

Guest: RHAS4

NIC: Realtek RTL8168C(P)/8111C(P) PCI-E Gigabit Ethernet NIC

When I disable offload checksum, it works fine.

kkeeton
Contributor

Same issue here.

Host: Windows 7, fully updated

Guest: Ubuntu 9.10

Driver: b57nd60x, version 10.100.4.0 (later updated to 12.x.x)

On the same Windows 7 host, a Server 2003 guest works with no problems. I only have a problem with Ubuntu.

I used the auto-install function (tested with the boot only on Ubuntu and got the same issues; testing with another distro now).

One interesting thing is that DHCP requests go through with no issues.

I can't find a way to turn off checksum offload in the Broadcom drivers yet, so I can't test that.

kkeeton
Contributor

I don't seem to have the issue on other builds of Linux, only the newest Ubuntu. I will test with other Ubuntu releases.

jkounis
Contributor

I have exactly the same problem on an Ubuntu 9.04 host / Win XP SP3 guest on VM Workstation 7 (see http://communities.vmware.com/message/1419849). I can't figure out how to disable "Offload Checksum" in the host OS, so I'm stuck.

With bridged networking, I can ping any IP address from the guest precisely 3 times before I lose connectivity. I don't understand it: Once I lose connectivity to one IP address, I can ping another IP address, but my quota is 3 pings per address.

NAT seems to work, but there are some weird timing issues that seem to affect network connectivity on the guest with it, too. For example, Outlook on the guest loses network connection to our IMAP server and has to be killed through Task Manager at least once every half-hour.

This network problem is rendering my Windows XP guest unusable.
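On a Linux host, offload settings are usually inspected and changed with `ethtool`: lowercase `-k` only displays the current offload settings, while uppercase `-K` changes them. The Python sketch below just builds and optionally runs those commands. The interface name `eth0` and the helper names are assumptions for illustration, changing settings requires root, and the change does not persist across reboots. Nothing in this thread confirms that this resolves the problem on Linux hosts; one poster above reported that it did not help in a NAT setup.

```python
import subprocess
from typing import List


def show_offload_cmd(iface: str) -> List[str]:
    """Command that prints the current offload settings (read-only)."""
    return ["ethtool", "-k", iface]


def disable_checksum_offload_cmd(iface: str) -> List[str]:
    """Command that turns off receive and transmit checksum offload."""
    return ["ethtool", "-K", iface, "rx", "off", "tx", "off"]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    line = " ".join(cmd)
    if not dry_run:
        # Needs root privileges; raises CalledProcessError if ethtool fails.
        subprocess.run(cmd, check=True)
    return line


# "eth0" is a placeholder; substitute the adapter the VM is bridged to.
print(run(show_offload_cmd("eth0")))
print(run(disable_checksum_offload_cmd("eth0")))
```

The dry-run default prints the commands without touching the system, so they can be reviewed before being run as root with `dry_run=False`.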

kingneutron
Expert

--I realize it may be a pain to do this, but if you're using a Linux host I would recommend backing out 9.x and installing Ubuntu 8.04-LTS. It's supported, long-term, and works well with Vmware.

http://releases.ubuntu.com/hardy/

--Additionally, I would go with the Desktop or Alternate ISO - not the Server.

./. If you have appreciated my response, please remember to apply Helpful/Correct points. TIA

jkounis
Contributor

Thank you for the suggestion. Unfortunately, my installation depends on recent versions of software that are specific to Jaunty. Going back two versions of Ubuntu to Hardy would entail a lot of work just to get Workstation 7 networking to work. In fact, due to the advantages of some newer versions of MySQL server and Squirrelmail in 9.10, I am seriously considering upgrading to Karmic Koala rather than downgrading to Hardy.

I don't think it should be necessary to regress all the versions of all my software packages to 18-month-old versions, just to accommodate a broken version of VMWare Workstation.

Furthermore, it's not clear to me that it will fix anything. It appears to me that there's a bug in the way checksums are calculated in IP packets that are passed between VMWare Workstation and the host operating system. The software worked fine in 6.5 but ended up getting broken in version 7.0. I'm hoping a version 7.0.1 will come out soon to fix this problem.

I wonder: Is bridged networking working for anyone on Workstation 7 with a Linux Host Operating System and a Windows Guest?

iase32
Contributor

OK, I can confirm the same problem with a Windows XP SP3 host and a Windows XP guest. The host and the guest can ping each other and they can resolve their SMB names (so ICMP and UDP seem to be working). However, there is no SMB networking and generally nothing that uses TCP works. Both the host and the guest have normal networking with any other machine on the network; the problem appears just between them.

All other possible reasons for the problem were ruled out (all firewalls disabled, etc.). The same VM was working flawlessly under Workstation 5.5.9. BTW, I hope that VMWare have solved the network instability problems in Workstation 6.0 and 6.5, which have kept us from upgrading so far. The instability manifested itself whenever the network was used in short intensive bursts and appeared as network errors. After some testing I am convinced that this is a race condition issue, but I have not had time to test it on Workstation 7 yet.

I used the proposed workaround and disabled offloading checksums, which seems to solve the problem. My NIC is Realtek RTL8168C(P)/8111C(P) PCI-E Gigabit Ethernet.

Gidian
Contributor

The disable "Offload Checksum" fix worked for me as well. Here is an overview of my system and some comments on this behavior.

Setup

Workstation 7

Realtek RTL8168C(P)/8111C(P) PCI-E Gigabit Ethernet NIC

Host: Windows XP - 64bit

Guests: Windows XP - 32bit, CentOS, Fedora, Ubuntu

Symptoms

Bridge connections worked for all guests as far as communicating outside the network was concerned but not to the host. Strangely pinging from host > guest and guest > host also worked fine. However, none of my Windows network shares and samba network shares worked between guest and host. And of course SSH did not work either. Guest to guest shares and SSH worked ok though.

I also wanted to mention a similar issue in WS 6.5: transferring a file over network shares was extremely slow, and a similar fix (disabling "Large Send Offload" on the NIC) fixed that issue. I already uninstalled WS 7, so I do not know if it is the same case here.

coalese
Enthusiast

Just upgraded to Workstation 7.0.1.

Problem has not been fixed in this release, despite it being brought to VMWare's attention many moons ago.

The same workaround where you disable Checksum Offload works with 7.0.1.

Let's just say, given how much info has been provided to VMWare long ago, that having this bug still be present in the recent 7.0.1 release is rather unimpressive.

If there's one thing I can't stand, it's incompetence, and this is cutting pretty close to that, IMO.

admin
Immortal

Hi Coalese,

The problem has been addressed (for many moons) on our development branches, so the next minor point release (7.1) will have the fix. For sub-catastrophic issues, we rarely crossport fixes to our sub-minor point release branches.

Sorry for the inconvenience,

Greg

coalese
Enthusiast

I suppose that depends on who defines "sub-catastrophic".

Many folks have run into the issue and not been able to communicate to their VMs as a result of this, and haven't found this thread which outlines the workaround.

I would humbly suggest that not being able to talk to a VM from your host machine might have been more serious than you guys have made it out to be.

But personally, since I already knew about the fix, it wasn't that big a deal. Just disappointing is all.

hebalder
Contributor

Hi,

The workaround works for me too.

http://communities.vmware.com/message/1488422#1488422

cu

Hugo

merwinspall
Contributor

I'm afraid I have to agree with coalese.

I've lost track of the number of hours I lost working around the fact the host<->guest networking did not work correctly on some VM setups, and all due to this darn bug.

It may have a simple workaround, but for most people, one must FIND THIS THREAD (or similar one) in order to implement it.

In my case, it came down to actually figuring it out when I remembered I had issues with WireShark not working because of the side effects of checksum offloading optimization, and theorizing that it probably also interfered with VMWare as well (which it does, at least in some cases).

I turned off the offloading and the problems evaporated.

Months passed and I experienced this issue with another VM at my new employer.

Having forgotten the solution, I ran a Google search and, luckily, found this thread, since my memory failed me.

Sure enough, once again, disabling the offloading optimization fixed my problems.

I'd be willing to bet that millions of dollars of employee time have been wasted trying to discover the cause of this issue; and VMWare considers it non-catastrophic?

How about catastrophic to the pocket book? To customer satisfaction?

You bet it's disappointing.

Shame on you VMWare.
