VMware Cloud Community
Myst
Contributor

Shorewall and ESX 4 - Strange Networking Issue

Hi, I am wondering if you could help me out with this very strange situation, before I go down the VMware Support route. I want to see if anyone else has had this problem before.

We have a couple of ESX4 DRS/HA clusters on our network, with different IP ranges, gateways and firewalls (and in the case of this weird issue, in completely different data centres!). We have not noticed this issue on the ESX3 clusters we still have.

The two builds I can reproduce this on (so I am able to get additional logs if required) are 175625 and U1 (208167).

We have built two Linux boxes, one with Fedora 11 and one with Fedora 12, and installed Shorewall on each (F11: Shorewall v4.2.10 || F12: Shorewall v4.4.3). Shorewall installs fine and we have no issues at that stage. However, once we set the server up and configure NAT and everything else for it, we start to get network issues, but only for servers that are on the same physical host as the Shorewall server and use it as their firewall.

e.g. Internet <-> Shorewall <-> Server

The moment we vMotion the Shorewall server, or the server behind it to a different physical host, the network issues stop.

The network issues are extended wait times when connecting to the server and getting data from it. For example, if the server is a web server and Shorewall has port 80 forwarded to it, loading the site behind the firewall can take up to 3 minutes; once we vMotion the two VMs away from each other, the same request takes 2 or 3 seconds. In this instance we tested with phpMyAdmin, a standard PHP script calling phpinfo(), a zip file download, and zip file uploads via FTP. The problem also shows up as a slow console over SSH etc., basically anything that uses the network.
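To put numbers on the slowdown before and after a vMotion, a small timing script may help. This is only a minimal sketch and not something from our setup: the helper names, the run count and any URL passed in are hypothetical, and it uses nothing beyond the Python standard library.

```python
import statistics
import time
import urllib.request


def time_fetch(url, timeout=300.0):
    """Return the seconds taken to fetch url completely (hypothetical helper)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # read the whole body so transfer time is included
    return time.monotonic() - start


def summarise(samples):
    """Summarise a list of timings given in seconds."""
    return {
        "count": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def measure(url, runs=5, fetch=time_fetch):
    """Fetch url `runs` times and summarise the timings."""
    return summarise([fetch(url) for _ in range(runs)])
```

Calling `measure()` against the forwarded site once with both VMs on the same host, and again after moving one of them, would give two summaries that can be compared directly.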

Taking one example, we have two physical hosts running ESX4 U1 in an HA/DRS cluster:

Network: - Standard Virtual Switches

>> vSwitch0 - 1 x Virtual Machine Port Group - NAT Network, 1x Service Console, 2 Physical adapters assigned to the vSwitch;

>> vSwitch1 - 1 x Virtual Machine Port Group - External Network, 1 Physical adapter assigned to the vSwitch;

These are available on both physical hosts.

The Shorewall server has two NICs, one assigned to the NAT network and one assigned to the External network.

The Linux server / Windows server has a single NIC assigned to the NAT network.

Both physical servers are connected to the same physical switches, with the External network switch and the NAT network switch being separate from each other. (The two physical adapters on vSwitch0 are both connected to the same physical switch.)
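For anyone picturing the two placements: on a standard vSwitch, traffic between two VMs in the same port group on the same host is switched inside that host and never reaches a physical adapter, whereas after a vMotion it has to cross the physical NAT switch. A minimal sketch of that difference follows; the host and VM names are hypothetical, and it models only the path, not the cause of the slowdown.

```python
def nat_path(placement, src, dst):
    """Return the hops between two VMs on the NAT Network port group.

    placement maps a VM name to the ESX host it currently runs on.
    """
    if placement[src] == placement[dst]:
        # Same host: frames stay inside vSwitch0, no physical adapter involved.
        return [src, f"vSwitch0@{placement[src]}", dst]
    # Different hosts: frames leave via an uplink and cross the physical switch.
    return [
        src,
        f"vSwitch0@{placement[src]}",
        "physical NAT switch",
        f"vSwitch0@{placement[dst]}",
        dst,
    ]


# Hypothetical placements: both VMs together, then after a vMotion.
together = {"shorewall": "esx-a", "webserver": "esx-a"}
apart = {"shorewall": "esx-a", "webserver": "esx-b"}
```

With these placements, `nat_path(together, "shorewall", "webserver")` has three hops and `nat_path(apart, "shorewall", "webserver")` has five, which is the only difference between the slow case and the fast case in the layout described above.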

Does anyone have any idea, or would you like to see some logs etc?

Thanks in advance

Colin

2 Replies
Rumple
Virtuoso

Are you using the vmxnet3 network adapter? Have you tried switching to a vmxnet2 or flexible network adapter to see if that makes any difference?

Myst
Contributor

On the Shorewall server, the network adapter is currently set to Flexible. I will try switching it to a vmxnet3 adapter and see if we still get the same problem. (I'll message back once we have done the work. It's a live system, so I just need to tell everyone it's going down.)

Col
