Hi,
I have a single ESXi host upgraded to 5.5. On the host I'm running a router VM: a small Debian 6 Linux install with Shorewall. The last configuration change was in Dec. 2012.
Since then, the host and the VM have been rebooted several times without any problems, and PPTP connections worked fine.
After I upgraded the host to 5.5, the clients hang while starting a PPTP connection at the point "verifying username and password". I have already rebooted the host, the VM and the DSL modem, without success.
I also restored an old backup of the router VM, and then installed a new VM with a minimal setup. Still no luck.
Can anyone confirm this?
edit: I have copied the VM to another standalone host running ESXi 5.1 Update 1. PPTP connections are working fine now. So the only difference is the ESXi version 5.1<->5.5. Any idea?
I don't have a solution, but packet captures indicate that GRE packets are not being passed to the VM.
Yes... the next step is to use a network sniffer so we can see what's going on.
But I'm interested in whether other people here have the same problem.
My situation is a little different. I have a virtual router in my ESXi environment that's not receiving the GRE packets to pass on. I see them exiting the upstream switch interface but never making it to the virtual machine's NIC.
Same problem with TMG. As long as it is running on 5.1, everything is fine. But after it is moved to 5.5, none of the clients can establish a PPTP connection; they fail with error 628.
Opened Support Request # 13384393710
Wow! Thanks!
I'm so happy that I'm not the only one with this problem... so it seems to be a general problem with ESXi 5.5.
Same here: PPTP fails with error 628. The VM cannot receive the encapsulated PPP packets.
The support's answer was:
1. Can we have the customer try vmxnet3 vNIC?
2. The MTU of pNICs was set to 9000, can we make sure the path MTU is also 9000 in the customer's environment?
3. Please capture the packet traces at below places when this issue happens:
a) at the peer device of the PPTP connection.
b) at the physical switch port which connects to the vmnic which is used by the affected vm.
c) at a vmknic which is in the same port group as the affected vm.
d) inside the affected vm.
What vNIC are you using, E1000?
Yes, E1000 with MTU 1500 on all components.
I haven't had time to capture the packet traces yet, sorry.
Don't bother with packet traces unless you have an SR open. I was just quoting.
We had this same problem after upgrading to 5.5. Technical support couldn't work out what the problem was and tried to pin it on the VPN server. However, I had set up the same Microsoft Routing and Remote Access configuration on a physical server (luckily we have one, used as a catalogue server) and re-NAT'd to it, and the VPN connected fine again. The only thing I could find that might affect it in this way is the article below on how the new NSX networking uses GRE, which I believe could be misrouting the traffic.
vSphere 5.5 Improvements Part 8 - Network Virtualization with NSX
I've changed E1000 to VMXNET3 on a single host and PPTP started working. Tonight I'm going to switch all the other VMs.
This can only be a workaround. It has to work with E1000, too!
In my case, I'm running a lot of pfSense VMs. VMXNET3 is not recommended for pfSense because there are too many issues. So changing from E1000 to VMXNET3 is not an option for me.
But thanks for that workaround.
Test server running on ESXi 5.5 (pfSense - 192.168.1.2)
Test client running on ESXi 5.5 (Windows 7 - 10.0.0.2)
When I try to make a VPN connection from 10.0.0.2 to 192.168.1.2, it fails with error 628. (Both IPs were public; I've changed them for security purposes.)
ESXi 5.5 dump:
[2.1-RELEASE][root@pfSense.localdomain]/(1): tcpdump -i em0 -n proto 47
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
12:16:43.874640 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 0, length 54: LCP, Conf-Request (0x01), id 13, length 40
12:16:45.881637 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 1, length 54: LCP, Conf-Request (0x01), id 14, length 40
12:16:45.960797 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 2, length 37: LCP, Conf-Request (0x01), id 1, length 23
12:16:45.961107 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 2, ack 2, length 27: LCP, Conf-Reject (0x04), id 1, length 9
12:16:47.882007 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 3, length 54: LCP, Conf-Request (0x01), id 15, length 40
12:16:49.003177 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 5, length 34: LCP, Conf-Request (0x01), id 3, length 20
12:16:49.003550 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 4, ack 5, length 38: LCP, Conf-Ack (0x02), id 3, length 20
12:16:49.882132 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 5, length 54: LCP, Conf-Request (0x01), id 16, length 40
12:16:51.891305 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 6, length 54: LCP, Conf-Request (0x01), id 17, length 40
12:16:53.901111 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 7, length 54: LCP, Conf-Request (0x01), id 18, length 40
12:16:55.911158 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 8, length 54: LCP, Conf-Request (0x01), id 19, length 40
12:16:57.920925 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 9, length 54: LCP, Conf-Request (0x01), id 20, length 40
12:16:59.142864 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 10, length 32: LCP, Term-Request (0x05), id 5, length 18
12:16:59.143346 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 10, ack 10, length 24: LCP, Term-Ack (0x06), id 21, length 6
12:16:59.933333 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 11, length 54: LCP, Conf-Request (0x01), id 22, length 40
12:17:01.942955 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 12, length 54: LCP, Conf-Request (0x01), id 23, length 40
The client disconnected with error 628, and no more GRE packets arrived from it.
I've removed the server VM from the ESXi 5.5 inventory and added it to the ESXi 5.1 inventory. The VPN connected without any problem. The client VM is still on ESXi 5.5.
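One way to read the 5.5 dump above: PPTP's GRE sequence numbers start at 0 for each session, so holes in the numbers seen from one sender suggest packets that never reached the capture point. The following is only a rough sketch of that check; the function name `find_seq_gaps` and the regular expression are my own, and it assumes tcpdump's one-line output format as pasted in this thread.

```python
import re
from collections import defaultdict

# Pulls the sender address and GRE sequence number out of one tcpdump line.
# Lines without a "seq" field (pure acks) are ignored.
LINE_RE = re.compile(r"IP (\S+) > \S+ GREv1, call \d+, seq (\d+)")

def find_seq_gaps(lines):
    """Return {sender: sorted list of sequence numbers never seen}."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            seen[m.group(1)].add(int(m.group(2)))
    gaps = {}
    for sender, seqs in seen.items():
        # Everything from 0 up to the highest number seen should be present.
        gaps[sender] = sorted(set(range(max(seqs) + 1)) - seqs)
    return gaps

# First seven lines of the ESXi 5.5 dump, copied from the post above.
sample = [
    "12:16:43.874640 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 0, length 54: LCP, Conf-Request (0x01), id 13, length 40",
    "12:16:45.881637 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 1, length 54: LCP, Conf-Request (0x01), id 14, length 40",
    "12:16:45.960797 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 2, length 37: LCP, Conf-Request (0x01), id 1, length 23",
    "12:16:45.961107 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 2, ack 2, length 27: LCP, Conf-Reject (0x04), id 1, length 9",
    "12:16:47.882007 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 3, length 54: LCP, Conf-Request (0x01), id 15, length 40",
    "12:16:49.003177 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 5, length 34: LCP, Conf-Request (0x01), id 3, length 20",
    "12:16:49.003550 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 4, ack 5, length 38: LCP, Conf-Ack (0x02), id 3, length 20",
]

gaps = find_seq_gaps(sample)
print(gaps)
```

On these lines the server (192.168.1.2) has no holes, while the client (10.0.0.2) is missing seq 0, 1, 3 and 4. That would fit the theory that inbound GRE packets are being dropped before they reach the VM's NIC, but the capture alone does not prove where they are lost.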
ESXi 5.1 dump:
[2.1-RELEASE][root@pfSense.localdomain]/(1): tcpdump -i em0 -n proto 47
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
12:23:53.876265 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 0, length 54: LCP, Conf-Request (0x01), id 1, length 40
12:23:53.900049 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 0, ack 0, length 41: LCP, Conf-Request (0x01), id 0, length 23
12:23:53.900369 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 1, ack 0, length 27: LCP, Conf-Reject (0x04), id 0, length 9
12:23:53.901002 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 1, ack 1, length 38: LCP, Conf-Request (0x01), id 1, length 20
12:23:53.901285 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 2, ack 1, length 38: LCP, Conf-Ack (0x02), id 1, length 20
12:23:53.976196 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, ack 2, no-payload, length 12
12:23:55.884549 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 3, length 54: LCP, Conf-Request (0x01), id 2, length 40
12:23:55.885880 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 2, ack 3, length 39: LCP, Conf-Reject (0x04), id 2, length 21
12:23:55.886203 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 4, ack 2, length 43: LCP, Conf-Request (0x01), id 3, length 25
12:23:55.887259 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 3, ack 4, length 43: LCP, Conf-Ack (0x02), id 3, length 25
12:23:55.887583 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 5, ack 3, length 41: CHAP, Challenge (0x01), id 1, Value bb1e68abc0ad367cb6bbb24fa81dac18, Name
12:23:55.887654 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 4, length 32: LCP, Ident (0x0c), id 2, length 20
12:23:55.887828 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 5, length 40: LCP, Ident (0x0c), id 3, length 28
12:23:55.888034 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 6, length 38: LCP, Ident (0x0c), id 4, length 26
12:23:55.891684 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 7, ack 5, length 75: CHAP, Response (0x02), id 1, Value bcc9cb081598c65e7ae13179a596f70e00000000000000004698cc5e42f84a2c27f13dabb21737[|chap]
12:23:55.893023 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 6, ack 7, length 66: CHAP, Success (0x03), id 1, Msg S=2CAE6B1D565A377A1191EDF4B0B5504DBA62[|chap]
12:23:55.894117 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 7, length 30: IPCP, Conf-Request (0x01), id 1, length 18
12:23:55.894578 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 8, length 24: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 1, length 12
12:23:55.897595 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 8, ack 8, length 32: IP6CP, Conf-Request (0x01), id 5, length 16
12:23:55.897797 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 9, ack 8, length 40: LCP, Prot-Reject (0x08), id 1, length 22
12:23:55.897839 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 9, length 24: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 6, length 12
12:23:55.898090 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 10, ack 9, length 28: unknown ctrl-proto (0x80fd), Conf-Ack (0x02), id 6, length 12
12:23:55.898304 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 10, ack 9, length 52: IPCP, Conf-Request (0x01), id 7, length 36
12:23:55.898620 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 11, ack 10, length 40: IPCP, Conf-Reject (0x04), id 7, length 24
12:23:55.898848 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 11, ack 10, length 28: IPCP, Conf-Reject (0x04), id 1, length 12
12:23:55.899032 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 12, ack 11, length 28: IPCP, Conf-Request (0x01), id 2, length 12
12:23:55.899285 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 12, ack 11, length 28: unknown ctrl-proto (0x80fd), Conf-Nack (0x03), id 1, length 12
12:23:55.899491 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 13, ack 12, length 28: unknown ctrl-proto (0x80fd), Conf-Request (0x01), id 2, length 12
12:23:55.900282 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 13, ack 13, length 34: IPCP, Conf-Request (0x01), id 8, length 18
12:23:55.900548 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 14, ack 13, length 34: IPCP, Conf-Nack (0x03), id 8, length 18
12:23:55.900598 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 14, length 24: IPCP, Conf-Ack (0x02), id 2, length 12
12:23:55.900838 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 15, length 24: unknown ctrl-proto (0x80fd), Conf-Ack (0x02), id 2, length 12
12:23:55.901489 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 16, ack 14, length 34: IPCP, Conf-Request (0x01), id 9, length 18
12:23:55.901905 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 15, ack 16, length 34: IPCP, Conf-Ack (0x02), id 9, length 18
12:23:55.937740 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 17, ack 15, length 61: compressed PPP data
12:23:55.952221 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 18, length 88: compressed PPP data
12:23:55.953861 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 19, length 73: compressed PPP data
Client connected and working...
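To compare the two dumps quickly, it can help to list which PPP protocols appear in each capture, in order of first appearance. This is only a rough sketch assuming the tcpdump one-line format pasted above; the name `phases_seen` is my own. tcpdump prints 0x80fd as "unknown ctrl-proto"; that protocol number is CCP, the PPP Compression Control Protocol.

```python
import re

# Captures the first token after the GRE header's "length N: " field,
# e.g. "LCP", "CHAP", "IPCP" or "compressed PPP data".
PROTO_RE = re.compile(r"GREv1, .*?length \d+: ([^,]+)")

def phases_seen(lines):
    """Return the distinct PPP protocol labels in order of first appearance."""
    order = []
    for line in lines:
        m = PROTO_RE.search(line)
        if m:
            label = m.group(1).strip()
            if label not in order:
                order.append(label)
    return order

# A few lines copied from the ESXi 5.5 dump in the post above.
dump_55 = [
    "12:16:43.874640 IP 192.168.1.2 > 10.0.0.2: GREv1, call 58405, seq 0, length 54: LCP, Conf-Request (0x01), id 13, length 40",
    "12:16:59.142864 IP 10.0.0.2 > 192.168.1.2: GREv1, call 8351, seq 10, length 32: LCP, Term-Request (0x05), id 5, length 18",
]

# A few lines copied from the ESXi 5.1 dump in the post above.
dump_51 = [
    "12:23:53.876265 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 0, length 54: LCP, Conf-Request (0x01), id 1, length 40",
    "12:23:55.887583 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 5, ack 3, length 41: CHAP, Challenge (0x01), id 1, Value bb1e68abc0ad367cb6bbb24fa81dac18, Name",
    "12:23:55.894117 IP 192.168.1.2 > 10.0.0.2: GREv1, call 47596, seq 7, length 30: IPCP, Conf-Request (0x01), id 1, length 18",
    "12:23:55.937740 IP 10.0.0.2 > 192.168.1.2: GREv1, call 43958, seq 17, ack 15, length 61: compressed PPP data",
]

print(phases_seen(dump_55))
print(phases_seen(dump_51))
```

On 5.5 the session never gets past LCP, which matches the clients hanging at "verifying username and password"; on 5.1 it goes through CHAP and IPCP and then carries data.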
Oh, I see that you also use pfSense. Can you change the NIC to vmxnet3?
Another issue when changing to vmxnet3 is that you may get poor TCP performance (from the 4.1 release notes: "Poor TCP performance can occur in traffic-forwarding virtual machines with LRO enabled"). This also occurs on ESXi 5.0 and 5.1. There you have to disable LRO by setting Net.VmxnetSwLROSL, Net.Vmxnet3SwLRO, Net.Vmxnet3HwLRO, Net.Vmxnet2SwLRO and Net.Vmxnet2HwLRO to 0.
Does this also apply to ESXi 5.5?
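For anyone who wants to apply the five LRO settings from the command line instead of the vSphere Client, here is a small sketch that only prints the commands. It assumes the usual esxcli convention that an advanced option such as Net.Vmxnet3SwLRO is addressed as /Net/Vmxnet3SwLRO; check each path with "esxcli system settings advanced list -o <path>" on your own host before setting anything. The function name is my own.

```python
# The five advanced options named in the post above.
LRO_OPTIONS = [
    "Net.VmxnetSwLROSL",
    "Net.Vmxnet3SwLRO",
    "Net.Vmxnet3HwLRO",
    "Net.Vmxnet2SwLRO",
    "Net.Vmxnet2HwLRO",
]

def esxcli_disable_commands(options):
    """Build one 'esxcli ... set' command per option, setting it to 0."""
    commands = []
    for name in options:
        # Assumption: dotted option name maps to a slash-separated path.
        path = "/" + name.replace(".", "/")
        commands.append(f"esxcli system settings advanced set -o {path} -i 0")
    return commands

commands = esxcli_disable_commands(LRO_OPTIONS)
for cmd in commands:
    print(cmd)
```

The script does not touch the host; it just produces text to review and paste into an ESXi shell.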
After spending 4 hours on it, pfSense 2.1 can now use the vmxnet3 interface without any problem, and PPTP VPN works. It seems the E1000 interface is deprecated.
Hi Aunlu,
Thank you for the updates.
I'm Asif from the VMware support team. I would like to capture support logs and tcpdump traces from your setup to investigate this further, and would appreciate it if you could spend some time on this.
Please let me know how I can connect with you to discuss more on this.
Thanks
Asif.
FYI, there is a new VMware KB article.
Point-to-Point Tunneling Protocol (PPTP) connections may not work on ESXi 5.5
In my case, switching from E1000 to vmxnet3 didn't fix my problem. I have exactly the same symptoms (PPTP works on 5.1, doesn't work on 5.5), although in my case the server is the one initiating the PPTP connections.