VMware Cloud Community
ksattler
Enthusiast

After upgrading a single ESXi host to 5.5, PPTP connections no longer work

Hi,

I have a single ESXi host that I upgraded to 5.5. On the host I run a router VM: a small Debian 6 Linux install with Shorewall. The last configuration change was in Dec. 2012.

Since then, the host and the VM have been rebooted several times without any problems, and PPTP connections worked fine.

After I upgraded the host to 5.5, clients hang at "verifying username and password" when starting a PPTP connection. I have already rebooted the host, the VM and the DSL modem, without success.

I also restored an old backup of the router VM, and then installed a new VM with a minimal setup. Still no luck.

Can anyone confirm this?

edit: I have copied the VM to another standalone host running ESXi 5.1 Update 1. PPTP connections are working fine now. So the only difference is the ESXi version 5.1<->5.5. Any idea?

24 Replies
jshupe
Contributor

I don't have a solution, but packet captures indicate that GRE packets are not being passed to the VM.

ksattler
Enthusiast

Yes... the next step is to use a network sniffer so we can see what's going on.

But I'd be interested to know whether other people here have the same problem.

jshupe
Contributor

My situation is a little different. I have a virtual router in my ESXi environment that's not receiving the GRE packets to pass on. I see them exiting the upstream switch interface but never making it to the virtual machine's NIC.
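For anyone comparing a capture at the switch with one at the VM's NIC, here is a minimal Python sketch of how a PPTP data packet can be told apart from other traffic. It assumes raw IPv4 packets without an Ethernet header, and it relies on the enhanced GRE header that PPTP uses (IP protocol 47, GRE version 1, protocol type 0x880B). The function name `is_pptp_gre` is made up for illustration.

```python
import struct

GRE_IP_PROTOCOL = 47             # IP protocol number for GRE
PPTP_GRE_PROTOCOL_TYPE = 0x880B  # protocol type used by PPTP's enhanced GRE


def is_pptp_gre(ip_packet: bytes) -> bool:
    """Return True if a raw IPv4 packet carries PPTP-style GRE (version 1)."""
    if len(ip_packet) < 20:
        return False
    version = ip_packet[0] >> 4
    header_len = (ip_packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if version != 4 or header_len < 20 or len(ip_packet) < header_len + 4:
        return False
    if ip_packet[9] != GRE_IP_PROTOCOL:     # byte 9 of the IPv4 header is the protocol
        return False
    flags_version, protocol_type = struct.unpack(
        "!HH", ip_packet[header_len:header_len + 4]
    )
    gre_version = flags_version & 0x0007    # low three bits hold the GRE version
    return gre_version == 1 and protocol_type == PPTP_GRE_PROTOCOL_TYPE
```

If packets that pass this check show up at the upstream switch port but never inside the VM, the loss is somewhere between the physical NIC and the virtual NIC.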

djet
Enthusiast

Same problem with TMG. As long as it runs on 5.1 everything is fine, but after it is moved to 5.5 no client can establish a PPTP connection; they all fail with error 628.

Opened Support Request # 13384393710

ksattler
Enthusiast

Wow! Thanks!

I'm so happy that I'm not the only one with this problem... so it seems to be a general problem with ESXi 5.5.

aunlu
Contributor

Same here: PPTP fails with error 628. The VM cannot receive the encapsulated PPP packets.

djet
Enthusiast

Support's answer was:

1. Can we have the customer try vmxnet3 vNIC?

2. The MTU of pNICs was set to 9000, can we make sure the path MTU is also 9000 in the customer's environment?

3. Please capture the packet traces at below places when this issue happens:

a) at the peer device of the PPTP connection.

b) at the physical switch port which connects to the vmnic which is used by the affected vm.

c) at a vmknic which is in the same port group as the affected vm.

d) inside the affected vm.
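To keep the traces from those capture points comparable, it helps to use the same filter everywhere. The sketch below builds a tcpdump command line that captures both halves of PPTP: the control channel on TCP port 1723 and the GRE data channel (IP protocol 47). The helper name, interface names and file names are placeholders, not anything support asked for.

```python
def pptp_capture_command(interface, output_file):
    """Build a tcpdump argument list that records PPTP control and data traffic."""
    # TCP 1723 is the PPTP control connection; IP protocol 47 is GRE.
    capture_filter = "tcp port 1723 or ip proto 47"
    return [
        "tcpdump",
        "-i", interface,    # interface to listen on
        "-n",               # do not resolve addresses to names
        "-s", "0",          # capture full packets instead of a truncated snapshot
        "-w", output_file,  # write raw packets to a file for later comparison
        capture_filter,
    ]


# Example: one command per capture point (names are hypothetical).
for iface, name in [("em0", "inside-vm.pcap"), ("eth0", "peer.pcap")]:
    print(" ".join(pptp_capture_command(iface, name)))
```

The filter is passed as a single argument, so if you paste the printed line into a shell, put quotes around it.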

What vNIC are you using, E1000?

ksattler
Enthusiast

Yes, E1000 with MTU 1500 on all components.

I haven't had time to capture the packet traces yet... sorry...

djet
Enthusiast

Don't bother with packet traces unless you have an SR open. I was just quoting.

CLS06
Contributor

We had the same problem after upgrading to 5.5. Technical support couldn't work out what the problem was and just tried to pin it on the VPN server. That was even though I had set up the same Microsoft Routing and Remote Access configuration on a physical server (one we luckily have as a catalogue server), re-NAT'd to it, and the VPN connected fine again. The only thing I could find that might affect it in this way is the article below on how the new NSX networking uses GRE, which I believe could be misrouting the traffic.

vSphere 5.5 Improvements Part 8 - Network Virtualization with NSX

djet
Enthusiast

I've changed E1000 to VMXNET3 on a single host and PPTP started working. Tonight I'm going to switch all the other VMs.

ksattler
Enthusiast

This can only be a workaround. It has to work with E1000, too!

In my case, I'm running a lot of pfSense VMs. VMXNET3 is not recommended for pfSense because there are too many issues. So changing from E1000 to VMXNET3 is not an option for me.

But thanks for that workaround.

aunlu
Contributor

Test server running on ESXi 5.5 (pfSense - 192.168.1.2)

Test client running on ESXi 5.5 (Windows 7 - 10.0.0.2)

When I try to make a VPN connection from 10.0.0.2 to 192.168.1.2, it fails with error 628. (Both IPs were public; I've changed them for security reasons.)

ESXi 5.5 Dump :

[2.1-RELEASE][root@pfSense.localdomain]/(1): tcpdump -i em0 -n proto 47

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes

12:16:43.874640 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 0, length 54: LCP, Conf-Request (0x01), id 13, length 40

12:16:45.881637 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 1, length 54: LCP, Conf-Request (0x01), id 14, length 40

12:16:45.960797 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 2, length 37: LCP, Conf-Request (0x01), id 1, length 23

12:16:45.961107 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 2, ack 2, length 27: LCP, Conf-Reject (0x04), id 1, length 9

12:16:47.882007 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 3, length 54: LCP, Conf-Request (0x01), id 15, length 40

12:16:49.003177 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 5, length 34: LCP, Conf-Request (0x01), id 3, length 20

12:16:49.003550 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 4, ack 5, length 38: LCP, Conf-Ack (0x02), id 3, length 20

12:16:49.882132 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 5, length 54: LCP, Conf-Request (0x01), id 16, length 40

12:16:51.891305 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 6, length 54: LCP, Conf-Request (0x01), id 17, length 40

12:16:53.901111 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 7, length 54: LCP, Conf-Request (0x01), id 18, length 40

12:16:55.911158 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 8, length 54: LCP, Conf-Request (0x01), id 19, length 40

12:16:57.920925 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 9, length 54: LCP, Conf-Request (0x01), id 20, length 40

12:16:59.142864 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 10, length 32: LCP, Term-Request (0x05), id 5, length 18

12:16:59.143346 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 10, ack 10, length 24: LCP, Term-Ack (0x06), id 21, length 6

12:16:59.933333 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 11, length 54: LCP, Conf-Request (0x01), id 22, length 40

12:17:01.942955 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 12, length 54: LCP, Conf-Request (0x01), id 23, length 40

The client disconnected with error 628 and no more GRE packets followed.
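One thing that stands out in the ESXi 5.5 capture above: the server's GRE sequence numbers run 0 to 12 without a gap, but from the client only seq 2, 5 and 10 ever arrive. A small Python sketch for spotting this in tcpdump text output is below. It assumes the output format shown above and that each side starts numbering at 0; the function name `missing_gre_sequences` is made up for illustration. A capture taken on the server can only show that packets went missing on the way in, not where they were dropped.

```python
import re
from collections import defaultdict

# Matches tcpdump lines such as:
#   "... IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 2, length 37: ..."
GRE_LINE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): GREv1, call \d+(?:, seq (?P<seq>\d+))?"
)


def missing_gre_sequences(lines):
    """Map each sender to the GRE sequence numbers that never showed up."""
    seen = defaultdict(set)
    for line in lines:
        match = GRE_LINE.search(line)
        if match and match.group("seq") is not None:  # ack-only packets have no seq
            seen[match.group("src")].add(int(match.group("seq")))
    return {
        src: [n for n in range(max(seqs) + 1) if n not in seqs]
        for src, seqs in seen.items()
    }


# First few lines of the ESXi 5.5 capture, copied from the dump above.
SAMPLE = [
    "12:16:43.874640 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 0, length 54: LCP, Conf-Request (0x01), id 13, length 40",
    "12:16:45.881637 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 1, length 54: LCP, Conf-Request (0x01), id 14, length 40",
    "12:16:45.960797 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 2, length 37: LCP, Conf-Request (0x01), id 1, length 23",
    "12:16:45.961107 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 2, ack 2, length 27: LCP, Conf-Reject (0x04), id 1, length 9",
    "12:16:49.003177 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 5, length 34: LCP, Conf-Request (0x01), id 3, length 20",
]

print(missing_gre_sequences(SAMPLE))
# {'192.168.1.2': [], '10.0.0.2': [0, 1, 3, 4]}
```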

I then removed the server VM from the ESXi 5.5 inventory and added it to the ESXi 5.1 inventory. The VPN connected without any problem. The client VM is still on ESXi 5.5.

ESXi 5.1 dump:

[2.1-RELEASE][root@pfSense.localdomain]/(1): tcpdump -i em0 -n proto 47

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes

12:23:53.876265 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 0, length 54: LCP, Conf-Request (0x01), id 1, length 40

12:23:53.900049 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 0, ack 0, length 41: LCP, Conf-Request (0x01), id 0, length 23

12:23:53.900369 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 1, ack 0, length 27: LCP, Conf-Reject (0x04), id 0, length 9

12:23:53.901002 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 1, ack 1, length 38: LCP, Conf-Request (0x01), id 1, length 20

12:23:53.901285 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 2, ack 1, length 38: LCP, Conf-Ack (0x02), id 1, length 20

12:23:53.976196 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, ack 2, no-payload, length 12

12:23:55.884549 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 3, length 54: LCP, Conf-Request (0x01), id 2, length 40

12:23:55.885880 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 2, ack 3, length 39: LCP, Conf-Reject (0x04), id 2, length 21

12:23:55.886203 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 4, ack 2, length 43: LCP, Conf-Request (0x01), id 3, length 25

12:23:55.887259 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 3, ack 4, length 43: LCP, Conf-Ack (0x02), id 3, length 25

12:23:55.887583 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 5, ack 3, length 41: CHAP, Challenge (0x01), id 1, Value bb1e68abc0ad367cb6bbb24fa81dac18, Name

12:23:55.887654 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 4, length 32: LCP, Ident (0x0c), id 2, length 20

12:23:55.887828 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 5, length 40: LCP, Ident (0x0c), id 3, length 28

12:23:55.888034 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 6, length 38: LCP, Ident (0x0c), id 4, length 26

12:23:55.891684 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 7, ack 5, length 75: CHAP, Response (0x02), id 1, Value bcc9cb081598c65e7ae13179a596f70e00000000000000004698cc5e42f84a2c27f13dabb21737[|chap]

12:23:55.893023 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 6, ack 7, length 66: CHAP, Success (0x03), id 1, Msg S=2CAE6B1D565A377A1191EDF4B0B5504DBA62[|chap]

12:23:55.894117 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 7, length 30: IPCP, Conf-Request (0x01), id 1, length 18

12:23:55.894578 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 8, length 24: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 1, length 12

12:23:55.897595 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 8, ack 8, length 32: IP6CP, Conf-Request (0x01), id 5, length 16

12:23:55.897797 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 9, ack 8, length 40: LCP, Prot-Reject (0x08), id 1, length 22

12:23:55.897839 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 9, length 24: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 6, length 12

12:23:55.898090 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 10, ack 9, length 28: unknown ctrl-proto (0x80fd), Conf-Ack (0x02), id 6, length 12

12:23:55.898304 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 10, ack 9, length 52: IPCP, Conf-Request (0x01), id 7, length 36

12:23:55.898620 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 11, ack 10, length 40: IPCP, Conf-Reject (0x04), id 7, length 24

12:23:55.898848 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 11, ack 10, length 28: IPCP, Conf-Reject (0x04), id 1, length 12

12:23:55.899032 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 12, ack 11, length 28: IPCP, Conf-Request (0x01), id 2, length 12

12:23:55.899285 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 12, ack 11, length 28: unknown ctrl-proto (0x80fd), Conf-Nack (0x03), id 1, length 12

12:23:55.899491 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 13, ack 12, length 28: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 2, length 12

12:23:55.900282 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 13, ack 13, length 34: IPCP, Conf-Request (0x01), id 8, length 18

12:23:55.900548 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 14, ack 13, length 34: IPCP, Conf-Nack (0x03), id 8, length 18

12:23:55.900598 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 14, length 24: IPCP, Conf-Ack (0x02), id 2, length 12

12:23:55.900838 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 15, length 24: unknown ctrl-proto (0x80fd), Conf-Ack (0x02), id 2, length 12

12:23:55.901489 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 16, ack 14, length 34: IPCP, Conf-Request (0x01), id 9, length 18

12:23:55.901905 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 15, ack 16, length 34: IPCP, Conf-Ack (0x02), id 9, length 18

12:23:55.937740 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 17, ack 15, length 61: compressed PPP data

12:23:55.952221 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 18, length 88: compressed PPP data

12:23:55.953861 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 19, length 73: compressed PPP data

Client connected and working...

ksattler
Enthusiast

Oh, I see that you also use pfSense. Can you change the NIC to vmxnet3?

Another issue when changing to vmxnet3 is that you may get poor TCP performance (release notes for 4.1: "Poor TCP performance can occur in traffic-forwarding virtual machines with LRO enabled"). This also occurs on ESXi 5.0 and 5.1. There you have to disable LRO by setting Net.VmxnetSwLROSL, Net.Vmxnet3SwLRO, Net.Vmxnet3HwLRO, Net.Vmxnet2SwLRO and Net.Vmxnet2HwLRO to 0.
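For reference, a rough Python sketch that prints one command per LRO setting, so none of the five gets forgotten. It assumes the settings are changed with `esxcli system settings advanced set` and that an option named `Net.X` in the client maps to the path `/Net/X` on the command line. Please check both assumptions on your own build before running anything; the helper names are made up.

```python
# Advanced settings named in the post above.
LRO_OPTIONS = [
    "Net.VmxnetSwLROSL",
    "Net.Vmxnet3SwLRO",
    "Net.Vmxnet3HwLRO",
    "Net.Vmxnet2SwLRO",
    "Net.Vmxnet2HwLRO",
]


def esxcli_option_path(option_name):
    """Turn a dotted setting name into a slash path (assumed mapping)."""
    return "/" + option_name.replace(".", "/")


def disable_lro_commands(options=LRO_OPTIONS):
    """Return one esxcli command string per option, each setting it to 0."""
    return [
        f"esxcli system settings advanced set -o {esxcli_option_path(name)} -i 0"
        for name in options
    ]


for command in disable_lro_commands():
    print(command)
```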

Does this also apply to ESXi 5.5?

aunlu
Contributor

After spending 4 hours on it, pfSense 2.1 can now use the vmxnet3 interface without any problems and PPTP VPN works. It seems the E1000 interface is deprecated.

asifahmed
Contributor

Hi Aunlu,

Thank you for the updates.

I'm Asif from the VMware support team. I would like to capture support logs and tcpdump traces from your setup to investigate this further, and would appreciate it if you could spend some time on this.

Please let me know how I can get in touch with you to discuss this.

Thanks

Asif.

de2rfg
Enthusiast

aunlu
Contributor

Hi Asifahmed,

You can send email to ali.unlu@teknotel.com

Thanks,

thejfk
Contributor

In my case switching from E1000 to vmxnet3 didn't fix my problem. I have the exact same symptoms (PPTP works on 5.1, doesn't work on 5.5) although in my case the server is initiating PPTP connections.
