VMware Cloud Community
scogen
Enthusiast

vMotion failed criteria 128

I posted this in the Dell forums. Maybe someone here can help.

I have over a dozen PowerEdge M610 and M910 servers across different blade chassis, all with X520-k 10Gb mezzanine cards and all experiencing the same problem. I recently upgraded the servers to ESXi 6.0 U3 using Dell's custom .iso, version A10. After the upgrade I had to manually remove the Mellanox nmlx5-core driver, otherwise esxupdate failed with error 99. I don't think this is related, but I'm mentioning it for completeness.
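
(For anyone hitting the same error 99: this was a standard esxcli VIB removal, roughly along these lines; confirm the exact VIB name on your build with the list command first.)

     esxcli software vib list | grep -i nmlx5
     esxcli software vib remove -n nmlx5-core   # VIB name taken from above; verify on your host

The host typically needs a reboot afterwards for the removal to take effect.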

The main issue is that since the upgrade, initiating a vMotion often causes the NICs to go offline due to "Failed criteria: 128." vMotion and VM networks share the same NICs, so all my VMs on that host drop off the network as well. Below is the exact error:

"2018-07-25T13:50:19.871Z: [netCorrelator] 508215881911us: [vob.net.pg.uplink.transition.down] Uplink: vmnic4 is down. Affected portgroup: vMotion-1. 1 uplinks up. Failed criteria: 128"

BTW, I am using multi-NIC vMotion. I haven't seen the NICs go offline under normal VM traffic, only vMotion. The failing NICs are Dell X520-k (Intel 82599 10Gb) using the ixgben 1.6.5 driver, which is listed on the VMware HCL. VMware support said it's most likely a driver/firmware issue and to contact Dell. The drivers (provided by VMware) are current and on the HCL. Firmware is listed as n/a; nevertheless, I found and applied a newer firmware to one of the devices, but it didn't help.
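
For reference, the driver and firmware actually in use on each uplink can be confirmed from the ESXi shell (vmnic4 is taken from the log above; substitute your own uplink names):

     esxcli network nic list
     esxcli network nic get -n vmnic4   # Driver Info section shows driver name, driver version, firmware version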

A quick "fix" is to administratively cycle the nics:

     esxcli network nic down -n vmnicX

     esxcli network nic up -n vmnicX

     esxcli network nic down -n vmnicY

     esxcli network nic up -n vmnicY
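
For example, assuming the two vMotion uplinks are vmnic4 and vmnic5 (vmnic4 is from the log above; vmnic5 is only a placeholder), both can be cycled in one pass from the ESXi shell:

     for nic in vmnic4 vmnic5; do     # replace with your own uplink names
         esxcli network nic down -n $nic
         esxcli network nic up -n $nic
     done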

The equipment is so old that vendor support is no longer offered. Even so, everything I've checked shows this hardware is supported and should work on ESXi 6.0, and I have all the recommended firmware/driver versions.

3 Replies
SupreetK
Commander
Accepted Solution

Can you switch to the async driver (ixgbe) and see if the issue persists?

Command to check whether the async driver is installed:

     esxcli software vib list | grep -i ixgbe

Commands to disable the native module (ixgben) and enable the async module (ixgbe):

     esxcli system module set -e true -m ixgbe
     esxcli system module set -e false -m ixgben

Reboot the host for the changes to take effect. Once rebooted, check whether the NICs are still using the native driver (ixgben) or have switched to the async one (ixgbe).
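
One quick way to verify after the reboot (a sketch; NIC names will vary per host):

     esxcli system module list | grep ixgb   # ixgbe should show as enabled/loaded, ixgben should not
     esxcli network nic list                 # the Driver column should now show ixgbe for the Intel NICs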

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

Cheers,

Supreet

scogen
Enthusiast

That may have done it.  After switching to the async driver, I've performed a dozen vMotions with no network drops.

I'm going to push this out to a few more servers but so far very promising.

SupreetK
Commander

Good to hear that! :)

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

Cheers,

Supreet
