VMware Cloud Community
TryllZ
Expert

ESXi login page loads quite slow across private network

Hi,

I have two networks: 192.168.28.0 (in VMware Workstation) and 10.0.64.0 (on a physical Dell server). Each network has 3 ESXi hosts and 3 Windows Servers. Each network sits behind its own firewall, and the network between the firewalls is 192.168.1.0.

I'm trying to access the ESXi login page in the 10.0.64.0 network from a Windows Server in 192.168.28.0, and vice versa, and the pages are far too slow to load. The load times, in seconds, on each VM are as follows.

Loading 10.0.64.74 login page on:

192.168.28.40 - 80 secs
192.168.28.41 - 110 secs
192.168.28.43 - 110 secs

Loading 192.168.28.74 login page on:

10.0.64.40 - 110 secs
10.0.64.41 - 90 secs
10.0.64.43 - 4 secs

The only difference between the last VM (4 secs) and the others is that it has just 1 vNIC, while the rest have multiple vNICs.

Traceroutes showing the route the packets take:

ESXi Server to Windows Server

[root@esxi1s:~] traceroute 192.168.28.40
traceroute to 192.168.28.40 (192.168.28.40), 30 hops max, 40 byte packets
1 10.0.64.67 (10.0.64.67) 0.796 ms 0.869 ms 0.685 ms
2 192.168.1.21 (192.168.1.21) 2.187 ms 2.747 ms 2.874 ms
3 servermdc (192.168.28.40) 3.475 ms 3.752 ms 3.408 ms

Windows Server to ESXi Server

tracert 10.0.64.74
Tracing route to esxi1s.vlab.lab [10.0.64.74]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms 192.168.28.35
2 3 ms 3 ms 3 ms 192.168.1.31
3 3 ms 3 ms 3 ms esxi1s.vlab.lab [10.0.64.74]

Note that the laptop has no built-in Ethernet NIC; I'm using a USB 2.0 to 10/100 Ethernet adapter, while the server has a 10/100/1000 Gigabit Ethernet NIC.

I pinged between the networks for 60 seconds to check for connectivity issues, but ping looks fine even with large packets.

Ping from Windows to ESXi Server

ping /n 5000 /l 1500 10.0.64.74

Pinging 10.0.64.74 with 1500 bytes of data:
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=3ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
Reply from 10.0.64.74: bytes=1500 time=4ms TTL=62
* not displaying all pings due to character number limitation *
Ping statistics for 10.0.64.74:
Packets: Sent = 50, Received = 50, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 3ms, Maximum = 4ms, Average = 3ms
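
A side note on the "large packets" test: on Windows, `ping /l 1500` sets the ICMP payload to 1500 bytes, so each echo request is larger than a standard 1500-byte Ethernet MTU and is sent fragmented. The successful replies therefore show that fragments get through, not that a full-size unfragmented packet does. A minimal Python sketch of the arithmetic, assuming a 1500-byte MTU on every hop (the thread does not state the MTU) and an IPv4 header without options:

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest ping payload that fits in a single IPv4 packet for this MTU."""
    return mtu - IP_HEADER - ICMP_HEADER

def needs_fragmentation(payload: int, mtu: int = 1500) -> bool:
    """True if an ICMP echo with this payload size exceeds the MTU."""
    return payload + ICMP_HEADER + IP_HEADER > mtu

print(max_unfragmented_payload())   # 1472
print(needs_fragmentation(1500))    # True
print(needs_fragmentation(1472))    # False
```

Under that assumption, `ping /f /l 1472 10.0.64.74` (the `/f` switch sets Don't Fragment) would be the more telling test: if it fails while smaller sizes succeed, some hop on the path has a smaller MTU.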

Ping from Windows to Windows Server (192.168.28.40)

ping /n 5000 /l 1500 192.168.28.40

Pinging 192.168.28.40 with 1500 bytes of data:
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=5ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=5ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
Reply from 192.168.28.40: bytes=1500 time=4ms TTL=126
* not displaying all pings due to character number limitation *
Ping statistics for 192.168.28.40:
Packets: Sent = 63, Received = 63, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 3ms, Maximum = 5ms, Average = 4ms
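
The TTL values in the two ping outputs look consistent with the traceroutes. A rough Python sketch, assuming the common initial TTLs of 64 (typical for Linux-style stacks), 128 (typical for Windows) and 255; these defaults are an assumption and are not confirmed anywhere in this thread:

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # assumed defaults, not from the thread

def routers_crossed(observed_ttl: int) -> tuple[int, int]:
    """Return (assumed initial TTL, number of routers the reply crossed)."""
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl

print(routers_crossed(62))    # (64, 2)
print(routers_crossed(126))   # (128, 2)
```

Two routers in each direction matches the two firewall hops seen in the traceroutes, so the ping replies appear to take the same path.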

I ran tcpdump on the NIC of the firewall VMs on both the laptop and the Dell server to see whether anything was delaying the packets when I use the web browser. I got the output below, which I don't understand, as I had not initiated any network traffic myself.

The below is just a small chunk of the data; there are a lot of these packets flowing very rapidly and I don't know why.

The IP 192.168.1.25 is the laptop's NIC IP address.

Interface of Firewall VM on Laptop

root@firewallsm:~ # tcpdump -i em0 host 192.168.1.21 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:53.051214 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 4218103046:4218103234, ack 1055845067, win 513, length 188
18:34:53.051351 IP 192.168.1.25.59346 > 192.168.1.21.22: Flags [.], ack 188, win 4101, length 0
18:34:53.051722 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 188:456, ack 1, win 513, length 268
18:34:53.052019 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 456:604, ack 1, win 513, length 148
18:34:53.052179 IP 192.168.1.25.59346 > 192.168.1.21.22: Flags [.], ack 604, win 4106, length 0
18:34:53.052520 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 604:848, ack 1, win 513, length 244
18:34:53.052785 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 848:996, ack 1, win 513, length 148

Interface of Firewall VM on Dell Server

root@firewallsm:~ # tcpdump -i vmx0 host 192.168.1.31 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmx0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:36:42.419721 IP 192.168.1.31.22 > 192.168.1.25.59352: Flags [P.], seq 3277660158:3277660346, ack 306397935, win 513, length 188
18:36:42.420237 IP 192.168.1.31.22 > 192.168.1.25.59352: Flags [P.], seq 188:360, ack 1, win 513, length 172
18:36:42.420571 IP 192.168.1.31.22 > 192.168.1.25.59352: Flags [P.], seq 360:508, ack 1, win 513, length 148
18:36:42.420885 IP 192.168.1.31.22 > 192.168.1.25.59352: Flags [P.], seq 508:656, ack 1, win 513, length 148
18:36:42.421206 IP 192.168.1.31.22 > 192.168.1.25.59352: Flags [P.], seq 656:804, ack 1, win 513, length 148

I can see ACKs like those in the SYN-ACK of the TCP handshake. But why are there so many, and is it possible that they are delaying the web pages, given how many of them there are?
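
One way to see what this traffic is: tally the captured lines by port. In the excerpts above, one end of every packet is port 22 (SSH) and the other end is 192.168.1.25, the laptop, which suggests the capture is mostly recording the SSH session that is displaying the tcpdump output (every printed line generates more SSH packets). A minimal Python sketch, assuming tcpdump's default one-line format as shown above; the `flow_ports` helper is hypothetical:

```python
import re
from collections import Counter

# Matches "IP a.b.c.d.sport > e.f.g.h.dport:" in `tcpdump -n` output.
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def flow_ports(lines):
    """Count packets per service port (taken as the lower of the two ports)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[min(int(m.group(2)), int(m.group(4)))] += 1
    return counts

sample = [
    "18:34:53.051214 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 4218103046:4218103234, ack 1055845067, win 513, length 188",
    "18:34:53.051351 IP 192.168.1.25.59346 > 192.168.1.21.22: Flags [.], ack 188, win 4101, length 0",
    "18:34:53.051722 IP 192.168.1.21.22 > 192.168.1.25.59346: Flags [P.], seq 188:456, ack 1, win 513, length 268",
]
print(flow_ports(sample))   # Counter({22: 3})
```

If that reading is right, excluding the session from the capture, for example with `tcpdump -i em0 -n host 192.168.1.21 and not port 22`, should leave only the traffic of interest.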

13 Replies
jburen
Expert

There is a lot of info in your post but for me, it is not clear how the networks are connected and where those networks live. You talk about 192.168.28.0 in VMware Workstation but what kind of network is that? And how did you place three ESXi hosts on that Dell server? Maybe you can draw up a simple image?

 

Consider giving Kudos if you think my response helped you in any way.
TryllZ
Expert

I added everything I could test to find the issue. Apologies, I understand it's too much text.

The following is how the networks are connected end-to-end.

The ESXi hosts are running as normal VMs on the physical Dell ESXi server (nested virtualization). I understand it's not supported by VMware; this is just a test bed.

jburen
Expert

There are so many variables involved that answering your question is almost impossible. But you already have a direction: the ESXi host with one vmnic is faster so look into your IP configuration of your hosts. I would also make the subnetting less complex: use a full class C subnet (/24) for every network instead of subnetting that class C even further with a 27-bit subnet mask. Also look at the firewalls that sit between the two.

TryllZ
Expert

"the ESXi host with one vmnic is faster so look into your IP configuration of your hosts"

I'm sorry, but you have misunderstood: no ESXi host has 1 vNIC. It's a Windows VM (testVm on the Dell server) that has just 1 vNIC; everything else has more than 2 vNICs.

I even gave the testVm on Workstation just 1 vNIC to see if that's the issue, but it's not.

jburen
Expert

I'm sorry, but solving this remotely is impossible. Maybe start with a simple setup: a Windows client VM and an ESXi host in the same network. Then test page loads. Then split them into separate networks (but on the same "side") and test again. The third test is with your current setup. Then hopefully you can determine the source of your issue. I think in the end the issue has nothing to do with the software but with the configuration of your networks, firewalls, and TCP/IP stacks.

 

TryllZ
Expert

Thanks @jburen,

I understand. I can do simple setups, have done them, and am past that stage now.

I have set up Windows and ESXi on the same and on different networks in Workstation, and they work fine. The next step in Workstation was vCenter with a Distributed Switch, which I have also set up completely. Due to the laptop's RAM limitation I could not do more on Workstation.

This is the next step for me. As for solving it remotely, I can set up a TeamViewer session, if it's feasible for you to go through it.

Appreciate your help.

TryllZ
Expert

I seem to have found the source of the problem by capturing packets and analyzing them in Wireshark.

There are too many TCP Out-Of-Order and Retransmission packets. Does anyone have any idea why?

PCAP File - https://we.tl/t-FaqwAYkBNJ
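
For readers unfamiliar with these labels, the underlying idea can be sketched in a few lines of Python: a segment whose data lies entirely below the highest sequence number already seen in that direction has been seen before. This is a deliberate simplification and not Wireshark's actual analysis, which also uses timing and ACK information to tell retransmissions from out-of-order arrivals; the sequence numbers in the example are hypothetical and not taken from the PCAP.

```python
def classify_segments(segments):
    """Label each (seq, length) segment of one TCP direction.

    'new'      -> carries data beyond anything seen so far
    'repeated' -> all of its data was already seen (a retransmission or an
                  out-of-order arrival; telling them apart needs timing)
    'no-data'  -> pure ACK, carries no payload
    Sequence-number wraparound is ignored in this sketch.
    """
    highest_end = None
    labels = []
    for seq, length in segments:
        end = seq + length
        if length == 0:
            labels.append("no-data")
        elif highest_end is not None and end <= highest_end:
            labels.append("repeated")
        else:
            labels.append("new")
        if length and (highest_end is None or end > highest_end):
            highest_end = end
    return labels

# Hypothetical relative sequence numbers, not taken from the PCAP.
print(classify_segments([(1, 100), (101, 100), (1, 100), (201, 50)]))
# ['new', 'new', 'repeated', 'new']
```

A high share of repeated segments usually means packets are being dropped or reordered somewhere on the path.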

TryllZ
Expert

Also, just a thought..

Could the retransmissions be due to the server NIC supporting 10/100/1000 Mbps while the laptop NIC supports only 10/100 Mbps, causing a bottleneck?

jburen
Expert

I'm not sure if the speed difference plays a role in this situation. TCP retransmission occurs when a packet gets dropped, but the reasons for this can vary. Is it possible to use a 1 Gb NIC in your laptop? Or switch the server to 100 Mb?

What I did see was that your servers have two IP addresses:

10.0.64.138 servernfs.vlab.lab
10.0.64.148 servernfs.vlab.lab
10.0.64.43 servernfs.vlab.lab
10.0.64.53 servernfs.vlab.lab
192.168.28.40 servermdc.vlab.lab
192.168.28.50 servermdc.vlab.lab
192.168.28.72 esxi1w.vlab.lab
192.168.28.82 esxi1w.vlab.lab

Did you try to connect to a server using the IP address? Or do you use the DNS name? Why are you using multiple IP addresses for a server?

 

TryllZ
Expert

Thanks,

I'm afraid it's not possible to swap NICs: the server NIC is fixed, while the laptop's NIC is a USB-to-Ethernet adapter.

The 2 IP addresses are for redundancy; each IP is connected to a different firewall VM. Then there are separate NFS interfaces and management interfaces.

10.0.64.138 servernfs.vlab.lab - Management - Connected to firewall1
10.0.64.148 servernfs.vlab.lab - Management - Connected to firewall2
10.0.64.43 servernfs.vlab.lab - NFS interface - Connected to firewall1
10.0.64.53 servernfs.vlab.lab - NFS interface - Connected to firewall2

I tried both IP addresses and DNS names; either way it was too slow.

jburen
Expert

Even though the server nic is fixed you could probably set it to 100Mbps. But what is the speed when you connect to the GUI from a VM on the same server? So stay inside your laptop or server (in the same subnet) and see if the connection is still slow.

I really think it is the way you connected the whole infrastructure together. First, get things to work, then add stuff like redundancy. And use simple IP subnets. So I would use 192.168.28.0/24 for the workstation VMs, 192.168.38.0/24 for the workstation ESXi hosts, 10.0.64.0/24 for the server VMs, and 10.0.65.0/24 for the server ESXi hosts. Don't make it complex when it is not necessary.

And when you do add redundancy, I would use a team instead of two separately configured NICs: a team with two physical NICs but configured with one IP address. Or use separate subnets. I think that your current configuration creates some sort of loop where it is not clear where a specific IP address is located. It is way too complex.
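
To make the overlap concrete, here is a small Python sketch using the standard `ipaddress` module. It assumes the 27-bit mask mentioned earlier in the thread applies to every interface listed above, which the thread does not confirm:

```python
import ipaddress
from collections import defaultdict

# Addresses as listed earlier in the thread; the /27 prefix is an assumption.
HOSTS = {
    "servernfs": ["10.0.64.138", "10.0.64.148", "10.0.64.43", "10.0.64.53"],
    "servermdc": ["192.168.28.40", "192.168.28.50"],
    "esxi1w": ["192.168.28.72", "192.168.28.82"],
}

def shared_subnets(addresses, prefix=27):
    """Group addresses by network; return only networks holding more than one."""
    groups = defaultdict(list)
    for addr in addresses:
        net = ipaddress.ip_interface(f"{addr}/{prefix}").network
        groups[str(net)].append(addr)
    return {net: addrs for net, addrs in groups.items() if len(addrs) > 1}

for name, addrs in HOSTS.items():
    print(name, shared_subnets(addrs))
```

Under that assumption, every pair of addresses on a host lands in the same /27 (for example 192.168.28.40 and 192.168.28.50 are both in 192.168.28.32/27), so each host has two interfaces in one network and can reply on a different interface than the one a request arrived on.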

 

TryllZ
Expert

"Even though the server nic is fixed you could probably set it to 100Mbps." - I'll set the server NIC to 100 Mbps and observe.

"But what is the speed when you connect to the GUI from a VM on the same server? So stay inside your laptop or server (in the same subnet) and see if the connection is still slow." - I can understand it being slow for the 10.0.64.0 network, as the DNS server is in the 192.168.28.0 network. But it's slow even when an ESXi host in the 192.168.28.0 network is accessed by a VM in the 192.168.28.32 network (same side), with the DNS server on the same side as well. Anyway, I have noticed that the delay occurs only when loading the login pages, not once I have logged in to the ESXi servers. Once logged in, they behave normally with no delays. For now I'm assuming it has to do with the firewall rules.
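
Since the delay shows up only while the login page loads, it may help to time each phase of a connection separately: name resolution, TCP connect, and TLS handshake. A minimal Python sketch using only the standard library; the `time_phases` helper is hypothetical, and certificate verification is disabled only because ESXi hosts commonly present self-signed certificates:

```python
import socket
import ssl
import time

def time_phases(host, port=443, use_tls=True, timeout=120.0):
    """Return seconds spent in DNS lookup, TCP connect and TLS handshake."""
    timings = {}

    t0 = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    timings["dns"] = time.perf_counter() - t0

    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        t1 = time.perf_counter()
        sock.connect(addr)
        timings["tcp_connect"] = time.perf_counter() - t1

        if use_tls:
            ctx = ssl.create_default_context()
            ctx.check_hostname = False       # lab host, self-signed cert assumed
            ctx.verify_mode = ssl.CERT_NONE
            t2 = time.perf_counter()
            sock = ctx.wrap_socket(sock, server_hostname=host)
            timings["tls_handshake"] = time.perf_counter() - t2
    finally:
        sock.close()
    return timings
```

Running, say, `time_phases("esxi1s.vlab.lab")` and `time_phases("10.0.64.74")` from one of the slow VMs would show whether the seconds are spent in DNS, in the TCP connect, or in the TLS handshake. If all three phases are fast, the delay is more likely in fetching the page content itself.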

I agree it shouldn't be complex when that's not needed. I have done a simple design, and it works fine in a basic setup with firewalls. I have now moved it up a notch.

Thanks for the NIC-teaming advice.

jburen
Expert

So when it works fine in a basic setup, check the changes that you made afterward and see if you can revert those changes.

 
