VMware Cloud Community
bmeadows
Enthusiast

VMkernel issue after reboot.

I'm having an issue with VMkernel ports after rebooting an ESXi 5.0 host.  The case is currently open with VMware support, but I wanted to see if anyone else has experienced this issue.

After a reboot, VMkernel ports do not respond to vmkpings (in or out) if an adapter is set to Unused in the port group settings.  This is on a fresh installation (not upgraded from 4.1).  I've even gone so far as to take the physical switch configuration out of the equation by connecting the two hosts' vMotion ports directly with a crossover cable.  Here is the test setup I'm currently using to recreate the issue.

ESXi 5.0 host

vSwitch0 - vmnic0

     VMNetwork

     Management Network

vSwitch1 - vmnic1, vmnic3

     vMotion = vmk3 = 10.79.9.4/16 (configured with vmnic1 Active, vmnic3 Unused)

ESXi 4.1 U1 host

vSwitch1 - vmnic6

     vMotion = vmk4 = 10.79.250.6/16

vmnic1 in the 5.0 host is connected to vmnic6 in the 4.1 host.  If I reboot the ESXi 5.0 host with this configuration, vmkping 10.79.9.4 from the 4.1 host fails.  Both vmnics show a link at 1000 Full.  If I unplug the crossover cable and plug it back in, vmkpings between the hosts succeed.  Likewise, if I remove vmnic3 from vSwitch1 on the 5.0 host, I can reboot the 5.0 host and vmkpings respond as expected.  Does anyone have any ideas?
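For anyone reproducing this setup, one thing worth ruling out first is an addressing mismatch between the two vMotion interfaces. The following is a minimal sketch using Python's standard ipaddress module. The helper name same_subnet is made up for illustration, and the addresses are the ones listed in the configuration above.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """Return True if two interface addresses in CIDR notation share a network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# vMotion addresses from the two hosts described in this post
esxi50_vmk3 = "10.79.9.4/16"
esxi41_vmk4 = "10.79.250.6/16"

print(same_subnet(esxi50_vmk3, esxi41_vmk4))  # prints True
```

Both addresses fall inside 10.79.0.0/16, so the failed vmkping is not explained by the IP configuration.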

Edit:  The 5.0 host is a Dell 2900 with Broadcom NetXtreme II BCM5708 and BCM5709 NICs (the issue occurs on both types), with all the latest firmware. The 4.1 host is a Dell R610.

3 Replies
bmeadows
Enthusiast

I have some additional information, thanks to a blog post by Chris Wahl (@chriswahl): http://wahlnetwork.wordpress.com/2011/10/11/explicit-failover-shenanigans-when-upgrading-to-esxi-5-x...

Even though the vmkernel is set to explicitly use vmnic1 and vmnic3 is set to Unused, the vmkernel is still trying to use vmnic3 as its active adapter:

PORT-ID   USED-BY  TEAM-PNIC  DNAME     PKTTX/s  MbTX/s  PKTRX/s  MbRX/s  %DRPTX  %DRPRX
33554436  vmk3     vmnic3     vSwitch1     0.00    0.00     0.00    0.00    0.00    0.00
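To spot this kind of mismatch without reading the columns by eye, a row from the output above can be checked against the adapter that the port group is configured to use. This is only a rough sketch. The function names are hypothetical, and it assumes the first four columns (PORT-ID, USED-BY, TEAM-PNIC, DNAME) contain no whitespace, which holds for the row shown here but may not hold for every row of the network view.

```python
def parse_port_row(row: str) -> dict:
    """Split one whitespace-separated port row into its first four named fields."""
    fields = row.split()
    return {
        "port_id": fields[0],
        "used_by": fields[1],
        "team_pnic": fields[2],
        "dname": fields[3],
    }


def uplink_mismatch(row: str, expected_active: str) -> bool:
    """Return True if the port is teamed to a different pNIC than the one expected."""
    return parse_port_row(row)["team_pnic"] != expected_active


# Row copied from the output above; vmnic1 is the adapter configured as Active
row = "33554436  vmk3  vmnic3  vSwitch1  0.00  0.00  0.00  0.00  0.00  0.00"
print(uplink_mismatch(row, "vmnic1"))  # prints True
```

Here the check reports a mismatch, because vmk3 is teamed to vmnic3 while the port group is configured with vmnic1 as the only Active adapter.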

P2Ver
Contributor

Hi bmeadows,

I'm experiencing the same issue on my 4.1 hosts.

They're functioning just fine, but after a reboot the vmkernel network isn't reachable in or out.

What I do to fix this is delete the vmkernel portgroup, re-add it, and check all the boxes in the settings of the portgroup (leaving all options at their defaults); then it works again.

But, this is not how it's supposed to be.

Can you tell me what support said to you?

With my environment, the problem seems to have started after STP had been enabled on the physical switches.

But I'm not really sure whether this is the actual cause.

Hope to hear from you soon.

P2Ver
Contributor

After testing, it definitely has something to do with the vmnic order on the switch you use for the vmkernel portgroup.

I have two dedicated vmnics on one vswitch for the vmkernel portgroup, both set to Active. After a reboot, connectivity is lost.

I changed one of the vmnics to standby on the vswitch (the portgroup settings remained greyed out), but with no result.

Then I tried the other vmnic and set that one to standby, and there we go: connectivity is back.

Does this mean that vmkernel traffic only goes through one vmnic at a time? So teaming both vmnics as Active isn't quite OK?

I'm testing this again now. Rebooting... hoping the host picks up the right vmnic for the vmkernel traffic.

Message was edited by: P2Ver

My theory worked out just fine: find out which one of the two adapters is active and set the other one to standby. After a reboot my host picked up the vmkernel network perfectly! Connectivity is back. But... this is not what I want... I want both vmnics set to Active. I assume bmeadows wants this too?
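For reference, the behaviour both posters expect from an Active / Standby / Unused order can be written down as a small model. This is a simplified sketch of the intended semantics only, not the VMkernel's actual selection logic. The function name, the link_up map, and the adapter names vmnicA and vmnicB in the second example are all hypothetical. Unused adapters are modelled by leaving them out of both lists.

```python
from typing import Dict, List, Optional


def expected_uplink(active: List[str], standby: List[str],
                    link_up: Dict[str, bool]) -> Optional[str]:
    """Pick the first Active adapter with link, else the first Standby with link."""
    for nic in active:
        if link_up.get(nic, False):
            return nic
    for nic in standby:
        if link_up.get(nic, False):
            return nic
    return None  # no eligible adapter; Unused adapters are never considered


# bmeadows' port group: vmnic1 Active, vmnic3 Unused (so it appears in neither list)
print(expected_uplink(["vmnic1"], [], {"vmnic1": True, "vmnic3": True}))  # prints vmnic1

# The workaround described above: one adapter Active, the other Standby
print(expected_uplink(["vmnicA"], ["vmnicB"], {"vmnicA": False, "vmnicB": True}))  # prints vmnicB
```

Under this model vmk3 should never end up on vmnic3, which is what makes the observed TEAM-PNIC value after reboot look like a bug rather than a configuration error.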
