I am having a problem. I have a dedicated box from OVH, who are a kind of budget ISP, with fairly restricted ways of doing things. They give you these 'IP failover' addresses that supposedly 'route to $primaryip'. I have no idea what that means, but here's the issue:
Overview:
94.23.244.220 - physical box, running ESXi v4u1
Virtual MAC, vm name, ip, product
00:50:56:06:43:9d brie 94.23.159.62 vmware
00:50:56:07:f8:43 cara 94.23.159.64 vmware
These are what OVH refer to as 'failover IPs', and they claim they 'route to 94.23.244.220'.
I built a vm, brie, and it uses 94.23.159.62 (with a gateway of 94.23.159.254) and everything works fine.
I built a vm, cara, and it uses 94.23.159.64 (same gateway) and it cannot contact the network at all.
From CARA I can ping BRIE and get a response. That places this into BRIE's arp cache:
94-23-159-64.kimsufi.com (94.23.159.64) at 00:50:56:07:f8:43 on eth0
From BRIE (The working box) I cannot ping CARA.
Here's the strange thing:
If I put the "VM Network" port group on the vSwitch into promiscuous mode and then put the network card on cara into promiscuous mode, all traffic works fine. Cara can resolve DNS and all the good stuff. The weirder thing is that by sniffing on Brie I can see that the traffic comes in fine and ARP requests are answered correctly. How can I resolve this so things route properly?
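For the record, the workaround described above amounts to the following (a sketch: the guest-side interface name eth0 is an assumption, and the vSwitch side is set in the vSphere Client rather than from a shell):

```shell
# vSwitch side (ESXi): set the "VM Network" port group's security policy
# "Promiscuous Mode" to Accept (Configuration > Networking in the
# vSphere Client).
# Guest side, assuming a Linux guest whose NIC is eth0:
ip link set dev eth0 promisc on
# or, on older guests without iproute2:
# ifconfig eth0 promisc
```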
Edit: fixed.
Are there typos in this:
00:50:56:06:43:9d brie 94.23.159.62 vmware
00:50:56:07:f8:43 nickp 94.23.159.64 vmware
00:50:56:09:03:92 cara 94.23.158.76 vmware (on a different subnet, so it would need a route somewhere)
...
I built a vm, brie, and it uses 94.23.159.62 (agrees with the table above)
I built a vm, cara, and it uses 94.23.159.64 (listed as nickp above)
...
From CARA I can ping BRIE and get a response. That places this into BRIE's arp cache:
94-23-159-64.kimsufi.com (94.23.159.64) at 00:50:56:07:f8:43 (presumably this was pinging nickp, and appears correct)
...
From BRIE (the working box) I cannot ping CARA (expected if there's no route to 94.23.158.76).
HTH
Argh, the one that's labelled 'nickp' is really 'cara'; I meant to delete the third entry. There are only two VMs, not three, so just ignore the third one as a mistake.
Ah OK.
I wonder if whatever is upstream (i.e. the next hop out) has learnt 00:50:56:06:43:9d and is sticking with it; in other words, the IP side of things might be working, but return packets could be directed to that MAC address, hence it only working in promiscuous mode.
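If that theory is right, one way to poke at it (my suggestion, not something confirmed in the thread) would be to send gratuitous ARPs from cara so the upstream device relearns the IP-to-MAC binding:

```shell
# From inside cara (requires root and iputils arping; eth0 is an example
# interface name): broadcast unsolicited/gratuitous ARP replies for cara's
# own failover IP so neighbours update their ARP caches.
arping -U -I eth0 -c 3 94.23.159.64
```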
Thanks
That makes some sense. Is there any way to confirm this? I would think that sniffing the traffic on vmnic0 would show me incoming packets with the MAC address of the other VM. When I ran tcpdump on brie and wireshark on cara, all the packets appeared to have the correct destination MAC listed in the packet headers.
Edit:
Running tcpdump-uw -n -i vmk0 | grep 94.23.159 returns something rather disturbing:
21:29:01.422893 IP truncated-ip - 3 bytes missing! 8.8.8.8.53 > 94.23.159.64.51058: 50642 1/0/0 A[domain]
21:29:01.594041 IP truncated-ip - 54 bytes missing! 8.8.8.8.53 > 94.23.159.64.53095: 6434 3/0/0 CNAME[domain]
If I look for 94.23.159.62 (my working VM) specifically I don't see anything in tcpdump while generating traffic.
False lead: I needed '-s 0' in tcpdump so it captures full packets instead of truncating them (hence the 'truncated-ip' warnings above). Sniffing again, I see this:
21:41:26.905929 00:24:c3:84:04:00 > 00:30:48:be:34:a0, ethertype IPv4 (0x0800), length 88: 8.8.8.8.53 > 94.23.159.64.58890: 19361 1/0/0 A 207.246.126.10 (46)
21:41:30.906353 00:24:c3:84:04:00 > 00:30:48:be:34:a0, ethertype IPv4 (0x0800), length 88: 8.8.8.8.53 > 94.23.159.64.58890: 19361 1/0/0 A 207.246.126.10 (46)
00:30:48:be:34:a0 is the MAC of the ESXi server itself. So it seems that outbound packets from 94.23.159.64 go out with the MAC of the ESXi server, and replies then come back to that same MAC. I guess I need ESXi to rewrite incoming traffic for 94.23.159.64 to the MAC address of the VM. Does that make any sense? Is my provider crazy with broken switches?
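To confirm where frames for cara are actually being delivered, the capture can be post-filtered (a sketch I put together: it parses tcpdump's '-e' link-level output as shown above, where field 2 is the source MAC and field 4 is the destination MAC with a trailing comma):

```shell
# Hypothetical helper: reads tcpdump "-e" output on stdin and flags any
# frame carrying 94.23.159.64 whose destination MAC is not cara's virtual
# MAC (00:50:56:07:f8:43).
flag_wrong_mac() {
    awk '/94\.23\.159\.64/ && $4 != "00:50:56:07:f8:43," {
        print "frame for 94.23.159.64 delivered to MAC " $4
    }'
}
# On the ESXi host this would be fed live, e.g.:
#   tcpdump-uw -e -s 0 -n -i vmk0 | flag_wrong_mac
```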
Is the connection expecting to be directly connected to a device, rather than a switch?
I don't think so. I'm not sure I fully understand; the physical box is definitely supposed to be plugged into a switch. I have discovered another guy with a similar problem (but a totally different set of circumstances):
Only when he set the vSwitch to promiscuous mode (via setting a VM to promiscuous mode; it requires both to go promiscuous) did his traffic start flowing too. I almost wish there were a way to just make the vSwitch act like a vHub for now.