VMware Cloud Community
t3chsp0t
Contributor

Inter-ESXi-host IPv6 multicast traffic is not seen by the destination VM

Hi,

Disclaimer: I do not have in-depth knowledge of VMware, so please excuse any wrong wording, misconceptions, and ignorance in the post below.

The current topology is as follows:

[Image: esxi network.png — network topology diagram]

Each ESXi 4.1 Update 3 host (DL380 G8) is connected to both layer 2 switches.

On each host, the vSwitch has the two NICs configured as active/active with the default NIC teaming policy (route based on originating virtual port ID).

Everything else is default.

The switches are connected via a trunk link (not stacked).

I have two Windows Server 2008 R2 VMs in the same subnet and have enabled IPv6 on them (default settings).

When the two VMs are on the same physical host, ping -6 destination_ipv6_address works (I just use the link-local address).

When the two VMs are on different hosts, ping fails with a "destination unreachable" message, which usually means the Neighbor Discovery process fails (analogous to ARP in IPv4, where the source VM can't get the destination VM's MAC address).
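(As a generic way to see this from the Windows side, the neighbor cache can be inspected with:

netsh interface ipv6 show neighbors

In the failing case I would expect the entry for the target to show Incomplete or Unreachable rather than Reachable; that is just the general expectation, not output from my capture.)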

When the two VMs are on the same physical host, a packet capture shows that the Neighbor Solicitation message is sent to an IPv6 multicast address.

When they are not on the same physical host, a packet capture on the destination VM shows that it never receives the IPv6 multicast packet.

I then connected two DL380 G8s to the switches in a similar way, installed Windows Server 2008 R2 on them directly without virtualisation, and ping -6 works perfectly.

My questions are:

- Am I missing a configuration somewhere to allow IPv6 multicast to work? Or is there a way to remove any "logic" and just treat it as a broadcast?

On network switches you can do this by turning off IGMP snooping, which makes the switch flood multicast frames like broadcasts.

However, I can't find a similar setting anywhere in ESXi.

- I saw an "Enable IPv6" option on ESXi, but I assume this is only useful if the host itself wants to participate in IPv6 and is therefore not applicable to my particular case?

The only similar issue I found from searching is at the link below, which suggests hard-coding the neighbor table on the VMs, which is not ideal.

I can confirm, though, that hard-coding the neighbor table on both VMs works. So the problem seems to lie in how the ESXi vSwitches handle IPv6 multicast traffic.

ESX4 and multicast

Ideas and insights are greatly appreciated.

Ed

6 Replies
MKguy
Virtuoso

- Am I missing a configuration somewhere to allow IPv6 multicast to work? Or is there a way to remove any "logic" and just treat it as a broadcast?

On network switches you can do this by turning off IGMP snooping, which makes the switch flood multicast frames like broadcasts.

However, I can't find a similar setting anywhere in ESXi.

No, it should work out of the box. The vSwitch makes forwarding decisions based on the layer 2 addresses and shouldn't care about the protocols above.

Can both your VMs communicate via IPv4/layer2-broadcasts when spread across hosts?

I've tested this myself on ESXi 5.1 U1 with two Windows 2008 R2 VMs using the vmxnet3 vNIC, and ICMPv6 Neighbor Discovery for link-local IPv6 addresses works fine on the same host as well as across hosts.

The destination IPv6 address of the solicitation message is the usual solicited-node address of the target, and the layer 2 destination MAC is derived from it and falls in the IPv6 multicast MAC range with the 33:33 prefix:

[Screenshot: pastedImage_8.png — packet capture of the Neighbor Solicitation]
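To spell the derivation out with a made-up link-local address (not the one from the screenshot): for a target of fe80::1234:56ff:fe78:9abc, the solicited-node group is ff02::1:ff plus the last 24 bits of the target address, i.e. ff02::1:ff78:9abc, and the Ethernet destination MAC is 33:33 plus the last 32 bits of that group, i.e. 33:33:ff:78:9a:bc. From the vSwitch's point of view it is simply a frame addressed to a 33:33:xx:xx:xx:xx multicast MAC.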

- I saw an "Enable IPv6" option on ESXi, but I assume this is only useful if the host itself wants to participate in IPv6 and is therefore not applicable to my particular case?

Correct, that is only for the host management network itself and not related to VM traffic.

-- http://alpacapowered.wordpress.com
t3chsp0t
Contributor

Hi,

Thank you for looking at the issue.

Over the weekend, I managed to test the issue in two other environments.

- Current staging environment, with the problem:

Physical DL380 G8 servers

Catalyst switches

ESXi 4.1 update 3

Failure in inter-host ICMPv6


- Dev environment

Physical DL380 G7 servers

Catalyst switches

ESXi 5.5

Success in inter-host ICMPv6

- Prod environment

DL380 G6 blades with HP Virtual Connect

Catalyst switches

ESXi 4.1 update 3

Success in inter-host ICMPv6


I've also found a possible culprit:

A combination of the DL380 G8 and VMware:

https://communities.vmware.com/message/2321555


For now, I am creating a startup script on the source and target VMs that adds each other's MAC address to the neighbor table, roughly along the lines of the sketch below.
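Something along these lines, with the interface name, IPv6 address and MAC below as placeholders rather than our actual values:

netsh interface ipv6 add neighbors "Local Area Connection" fe80::1234:56ff:fe78:9abc 00-50-56-aa-bb-cc

If I read the netsh syntax right, there is also a store=persistent option, which would make the entry survive reboots without the startup script.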

What worries me is that we just purchased DL380 G8 blades that will replace the production blades; luckily, the issue only affects IPv6 multicast.

In all environments I can confirm that IPv4 multicast is working; that is how the first part of "ping -6 target_host" works, resolving the name over LLMNR.

The second part, where the Neighbor Solicitation message is sent over IPv6 multicast, is the broken part.
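(A generic way to confirm this on the wire is to filter the captures down to Neighbor Solicitations, e.g. with the Wireshark display filter icmpv6.type == 135 on both the source and the destination VM.)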

I'll keep hunting, and I'll try to test this on our prod cluster as well.

MKguy
Virtuoso

- Prod environment

DL380 G6 blades with HP Virtual Connect

Catalyst switches

ESXi 4.1 update 3

Success in inter-host ICMPv6

we just purchased DL380 G8 blades

A bit confusing here. I suppose you're referring to HP BL blades? A DL (380) system is a rack server and not a blade.

- Current staging environment, with the problem:

Physical DL380 G8 servers

Catalyst switches

ESXi 4.1 update 3

Failure in inter-host ICMPv6

Looking at the other thread, did you make sure the firmware and drivers are up to date? What NICs do you have in the 380 Gen8 servers where you experience the problem?

Check and post the output of esxcfg-nics -l and ethtool -i vmnicX for each physical NIC on these hosts.

You can find more recent drivers for ESXi 4.1 here or on the HP support site:

https://my.vmware.com/group/vmware/info?slug=datacenter_cloud_infrastructure/vmware_vsphere/4_1#driv...

-- http://alpacapowered.wordpress.com
t3chsp0t
Contributor

Hi,

MKguy wrote:

- Prod environment

DL380 G6 blades with HP Virtual Connect

Catalyst switches

ESXi 4.1 update 3

Success in inter-host ICMPv6

we just purchased DL380 G8 blades

A bit confusing here. I suppose you're referring to HP BL blades? A DL (380) system is a rack server and not a blade.

It is indeed BL380; apologies for the shoddy copy-and-paste job.

- Current staging environment, with the problem:

Physical DL380 G8 servers

Catalyst switches

ESXi 4.1 update 3

Failure in inter-host ICMPv6

Looking at the other thread, did you make sure the firmware and drivers are up to date? What NICs do you have in the 380 Gen8 servers where you experience the problem?

Check and post the output of esxcfg-nics -l and ethtool -i vmnicX for each physical NIC on these hosts.

You can find more recent drivers for ESXi 4.1 here or on the HP support site:

https://my.vmware.com/group/vmware/info?slug=datacenter_cloud_infrastructure/vmware_vsphere/4_1#driv...

Hmm, they are actually DL360 G8s, which have the same Broadcom NICs as the DL380s.

Here is the output:

~ # esxcfg-nics -l

Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description

vmnic0  0000:03:00.00 tg3         Up   1000Mbps  Full   3c:4a:92:b2:2b:fc 1500   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet

vmnic1  0000:03:00.01 tg3         Up   1000Mbps  Full   3c:4a:92:b2:2b:fd 1500   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet

vmnic2  0000:03:00.02 tg3         Up   1000Mbps  Full   3c:4a:92:b2:2b:fe 1500   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet

vmnic3  0000:03:00.03 tg3         Up   1000Mbps  Full   3c:4a:92:b2:2b:ff 1500   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet

~ # ethtool -i vmnic0

driver: tg3

version: 3.123b.v40.1

firmware-version: 5719-v1.29 NCSI v1.0.80.0

bus-info: 0000:03:00.0

~ # ethtool -i vmnic1

driver: tg3

version: 3.123b.v40.1

firmware-version: 5719-v1.29 NCSI v1.0.80.0

bus-info: 0000:03:00.1

~ # ethtool -i vmnic2

driver: tg3

version: 3.123b.v40.1

firmware-version: 5719-v1.29 NCSI v1.0.80.0

bus-info: 0000:03:00.2

~ # ethtool -i vmnic3

driver: tg3

version: 3.123b.v40.1

firmware-version: 5719-v1.29 NCSI v1.0.80.0

bus-info: 0000:03:00.3

Let me check the drivers; I assume this is something I have to install from the console?

Cheers

Ed

MKguy
Virtuoso
Accepted Solution

I'm not sure whether this will really solve your specific problem, but it's worth a try to update the NIC firmware and driver.

Seems like this is an HP NC 331FLR NIC (the default DL Gen8 4-port NIC with the BCM5719 chip).

There's no update binary you can run from 4.1, but you can update all firmware components with the current HP Service Pack for ProLiant image:

HP Service Pack for ProLiant

Or boot the server into a live-Linux of your choice and use the Linux update binary:

http://www.hp.com/swpublishing/MTX-ec0e18db6a8e4d978b57aa95d1

These will update the 331FLR NIC to Boot Code version 1.37/NCSI 1.2.37.

Then update the tg3 driver in ESXi with this offline bundle to 3.129d.v40.1:

https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI4X-BROADCOM-TG3-3129DV401&productId=...

You need the offline bundle file (BCM-tg3-3.129d.v40.1-offline_bundle-1033618.zip) from this package. You can import it into vCenter Update Manager for easier deployment, or (probably) install it from the ESXi shell with esxupdate --bundle=/tmp/BCM-tg3-3.129d.v40.1-offline_bundle-1033618.zip

I'm a bit rusty in the ESXi 4.1 CLI department though; you might have to resort to the vihostupdate utility or PowerCLI's Install-VMHostPatch remotely (rough sketches of both are below the link):

https://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.upgrade.doc_41/esx...
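Roughly like this, with the host name and paths as placeholders (double-check the exact options against the docs for your vCLI/PowerCLI version):

vihostupdate --server esxi01.example.com --install --bundle C:\tmp\BCM-tg3-3.129d.v40.1-offline_bundle-1033618.zip

Install-VMHostPatch -VMHost esxi01.example.com -LocalPath C:\tmp\BCM-tg3-3.129d.v40.1-offline_bundle-1033618.zip

Either way the host should be in maintenance mode first, and after the reboot ethtool -i vmnicX should report the new 3.129d.v40.1 driver version.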

-- http://alpacapowered.wordpress.com
t3chsp0t
Contributor

Awesome,

Thank you for detailing the upgrade process.

Will get this scheduled, if it ever gets approved (not sure we can get by with one less host for now; it may have to be postponed until we get another box).

Thanks again for your help; I now have a workaround as well as a plan to try to fix it.

Ed
