I am having a problem. I have a dedicated box from OVH, who are a kind of budget ISP, with fairly restricted ways of doing things. They give you these 'IP failover' addresses that supposedly 'route to $primaryip'. I have no idea what that means, but here's the issue:
Overview:
94.23.244.220 - physical box, running ESXi v4u1
Virtual MAC, vm name, ip, product
00:50:56:06:43:9d brie 94.23.159.62 vmware
00:50:56:07:f8:43 cara 94.23.159.64 vmware
These are what OVH refer to as 'failover IPs', and they claim they 'route to 94.23.244.220'.
I built a vm, brie, and it uses 94.23.159.62 (with a gateway of 94.23.159.254) and everything works fine.
I built a vm, cara, and it uses 94.23.159.64 (same gateway) and it cannot contact the network at all.
From CARA I can ping BRIE and get a response. That places this into BRIE's arp cache:
94-23-159-64.kimsufi.com (94.23.159.64) at 00:50:56:07:f8:43 on eth0
From BRIE (The working box) I cannot ping CARA.
Here's the strange thing:
If I put the "VM Network" port group on the vSwitch into promiscuous mode and then put the network card on cara into promiscuous mode, all traffic works fine. Cara can resolve DNS and all the good stuff. The weirder thing is that by sniffing on Brie I can see that the traffic comes in fine and ARP requests are answered correctly. How can I resolve this so things route properly?
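For the record, the workaround described above amounts to the following (a sketch: the guest-side interface name eth0 is an assumption, and the vSwitch side is set in the vSphere Client rather than from a shell):

```shell
# vSwitch side (ESXi): set the "VM Network" port group's security policy
# "Promiscuous Mode" to Accept (Configuration > Networking in the
# vSphere Client).
# Guest side, assuming a Linux guest whose NIC is eth0:
ip link set dev eth0 promisc on
# or, on older guests without iproute2:
# ifconfig eth0 promisc
```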
Edit: fixed.
Are there typos in this:
00:50:56:06:43:9d brie 94.23.159.62 vmware
00:50:56:07:f8:43 nickp 94.23.159.64 vmware
00:50:56:09:03:92 cara 94.23.158.76 vmware (on a different subnet, so it would need a route somewhere)
...
I built a vm, brie, and it uses 94.23.159.62 (agrees with the table above)
I built a vm, cara, and it uses 94.23.159.64 (listed as nickp above)
...
From CARA I can ping BRIE and get a response. That places this into BRIE's arp cache:
94-23-159-64.kimsufi.com (94.23.159.64) at 00:50:56:07:f8:43 (presumably this was pinging nickp, and appears correct)
...
From BRIE (the working box) I cannot ping CARA (expected if there's no route to 94.23.158.76).
HTH
Argh, the one that's labelled 'nickp' is really 'cara'; I meant to delete the third entry. There are only two VMs, not three, so just ignore the third one as a mistake.
Ah OK.
I wonder if whatever is upstream (i.e. the next hop out) has learnt 00:50:56:06:43:9d and is sticking with it; in other words, the IP side of things might be working, but return packets could be directed to that MAC address, hence it only working in promiscuous mode.
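If that theory is right, one way to poke at it (my suggestion, not something confirmed in the thread) would be to send gratuitous ARPs from cara so the upstream device relearns the IP-to-MAC binding:

```shell
# From inside cara (requires root and iputils arping; eth0 is an example
# interface name): broadcast unsolicited/gratuitous ARP replies for cara's
# own failover IP so neighbours update their ARP caches.
arping -U -I eth0 -c 3 94.23.159.64
```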
Thanks
That makes some sense. Is there any way to confirm this? I would think that sniffing the traffic on vmnic0 would show me incoming packets with the MAC address of the other VM. When I ran tcpdump on brie and wireshark on cara, all the packets appeared to have the correct destination MAC listed in the packet headers.
Edit:
Running tcpdump-uw -n -i vmk0 | grep 94.23.159 returns something rather disturbing:
21:29:01.422893 IP truncated-ip - 3 bytes missing! 8.8.8.8.53 > 94.23.159.64.51058: 50642 1/0/0 A[domain]
21:29:01.594041 IP truncated-ip - 54 bytes missing! 8.8.8.8.53 > 94.23.159.64.53095: 6434 3/0/0 CNAME[domain]
If I look for 94.23.159.62 (my working VM) specifically I don't see anything in tcpdump while generating traffic.
False lead: I needed '-s 0' in tcpdump so it captures full packets instead of truncating them (hence the 'truncated-ip' warnings above). Sniffing again, I see this:
21:41:26.905929 00:24:c3:84:04:00 > 00:30:48:be:34:a0, ethertype IPv4 (0x0800), length 88: 8.8.8.8.53 > 94.23.159.64.58890: 19361 1/0/0 A 207.246.126.10 (46)
21:41:30.906353 00:24:c3:84:04:00 > 00:30:48:be:34:a0, ethertype IPv4 (0x0800), length 88: 8.8.8.8.53 > 94.23.159.64.58890: 19361 1/0/0 A 207.246.126.10 (46)
00:30:48:be:34:a0 is the MAC of the ESXi server itself. So it seems that outbound packets from 94.23.159.64 go out with the MAC of the ESXi server, and replies then come back to that same MAC. I guess I need ESXi to rewrite incoming traffic for 94.23.159.64 to the MAC address of the VM. Does that make any sense? Is my provider crazy with broken switches?
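To confirm where frames for cara are actually being delivered, the capture can be post-filtered (a sketch I put together: it parses tcpdump's '-e' link-level output as shown above, where field 2 is the source MAC and field 4 is the destination MAC with a trailing comma):

```shell
# Hypothetical helper: reads tcpdump "-e" output on stdin and flags any
# frame carrying 94.23.159.64 whose destination MAC is not cara's virtual
# MAC (00:50:56:07:f8:43).
flag_wrong_mac() {
    awk '/94\.23\.159\.64/ && $4 != "00:50:56:07:f8:43," {
        print "frame for 94.23.159.64 delivered to MAC " $4
    }'
}
# On the ESXi host this would be fed live, e.g.:
#   tcpdump-uw -e -s 0 -n -i vmk0 | flag_wrong_mac
```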
Is the connection expecting to be directly connected to a device, rather than a switch?
I don't think so. I'm not sure I fully understand; the physical box is definitely supposed to be plugged into a switch. I have discovered another guy with a similar problem (but a totally different set of circumstances):
Only when he set the vSwitch to promiscuous mode (via setting a VM to promiscuous mode; it requires both to go promiscuous) did his traffic start flowing too. I almost wish there were a way to just make the vSwitch act like a vHub for now.