VMware Cloud Community
RLI2013
Enthusiast

iSCSI SW Initiator balanced with different MTU Size question

Imagine the following situation exists 🙂 ...

We have iSCSI Storage Connections in our ESXi V6.5U1 environment as follows:

Fabric1, iSCSI1: vSwitch1 (MTU 1500) -> VMkernel Port1 on vmnic1 (MTU 1500) -> physical iSCSI Switch1 (MTU 9000 support enabled) -> storage NIC1 (MTU 1500)

Fabric2, iSCSI2: vSwitch2 (MTU 1500) -> VMkernel Port2 on vmnic2 (MTU 1500) -> physical iSCSI Switch2 (MTU 9000 support enabled) -> storage NIC2 (MTU 1500)

Both adapters are configured for iSCSI (software initiator) and ESXi load-balances over the two vSwitches (PSP = Nimble NCM plugin, which is a kind of round robin...). Everything was working fine in this configuration (even when we restarted a physical switch for maintenance).
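
For reference, this is roughly how the MTU at the vSwitch and VMkernel layers and the PSP per device can be checked from the ESXi shell (just a sketch with standard esxcli commands, nothing environment-specific assumed):

esxcli network vswitch standard list      # shows the MTU configured on each standard vSwitch
esxcli network ip interface list          # shows the MTU of each VMkernel port (vmk#)
esxcli storage nmp device list            # shows the Path Selection Policy claimed for each device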

Now we shut down iSCSI Switch1 in Fabric1 -> everything still works fine, because all storage traffic goes via iSCSI2. While iSCSI Switch1 was shut down, we reconfigured all iSCSI1 interfaces (VMkernel Port1 on vmnic1 and storage NIC1) to MTU 9000. We did not change the MTU of vSwitch1 (yes, this is a mistake... so don't do it, just imagine...).

So we had the following conditions:

iSCSI1: vSwitch1 (MTU 1500) -> VMkernel Port1 on vmnic1 (MTU 9000) -> physical iSCSI Switch1 (MTU 9000 support enabled) -> storage NIC1 (MTU 9000) -> powered off

iSCSI2: vSwitch2 (MTU 1500) -> VMkernel Port2 on vmnic2 (MTU 1500) -> physical iSCSI Switch2 (MTU 9000 support enabled) -> storage NIC2 (MTU 1500) -> powered on
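
In hindsight, the mismatch could have been spotted before powering Switch1 back on with a don't-fragment ping through the affected VMkernel port (a sketch; vmk1 stands for our iSCSI1 VMkernel port and <storage-nic1-ip> is a placeholder for the address of storage NIC1):

vmkping -I vmk1 -d -s 8972 <storage-nic1-ip>   # 8972 = 9000 minus 20 bytes IP and 8 bytes ICMP header; must succeed for a working MTU 9000 path
vmkping -I vmk1 -d -s 1472 <storage-nic1-ip>   # baseline check for a standard MTU 1500 path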

After powering iSCSI Switch1 in Fabric1 back on, all iSCSI1 interfaces came up again and we had a complete production outage... ESXi at 100% CPU, unreachable in vCenter, VMs online but "frozen", etc...

I'm not sure if HA events caused a "problem chain" - I do not think they are the root cause... it seems this was more a reaction, maybe to an APD (All Paths Down) condition.

But how can this happen? What does ESXi do when a load-balanced iSCSI software initiator sees two fabrics (mis)configured as described above? What happens on Fabric1 if vSwitch1 is configured with MTU 1500 and all other interfaces with MTU 9000? Will this result in not seeing the storage at all, even though Fabric2 is still configured correctly? Is ESXi fine with balancing storage paths over a misconfigured network, or could this generate an All Paths Down (APD) condition? And what does ESXi do in that condition?

Any input is much appreciated!

2 Replies
RLI2013
Enthusiast

Dear all

From VMware Support I got the answers to the magic that happened.

The vSwitch is not able to fragment packets at all. So if the endpoints want to communicate with MTU 9000 and the vSwitch MTU is set lower (1500 in my case), the path does not work and should therefore simply be down. But according to Support it is worse than that: if you create a condition like this, ESXi burns high CPU cycles on fragmentation attempts that are not possible. This can end in a CPU panic, which is what we saw on some servers during the outage.

So the misconfiguration was our fault - and with it we would not have reached our goal of MTU 9000 anyway - but the behaviour of the ESXi server in this case is a bug. Support agreed that it should not be possible to set an MTU on a VMkernel port higher than the MTU currently set on the vSwitch. At the moment (6.5U1) this is still possible, so pay attention to it.
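
For anyone doing the same change, a sketch of the order we should have used per fabric - raise the vSwitch MTU first, then the VMkernel port, then verify end to end (vSwitch1/vmk1 are the names from my example and <storage-nic1-ip> is a placeholder; adapt them to your environment):

esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000   # vSwitch first, so it is never lower than any VMkernel port on it
esxcli network ip interface set --interface-name=vmk1 --mtu=9000         # then the iSCSI VMkernel port
esxcli network vswitch standard list                                     # verify the vSwitch MTU
esxcli network ip interface list                                         # verify the VMkernel port MTU
vmkping -I vmk1 -d -s 8972 <storage-nic1-ip>                             # verify jumbo frames end to end before relying on the path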

I hope this information helps others avoid hitting the same bug by mistake.

Regards

RParker
Immortal

This can end in a CPU panic, which is what we saw on some servers during the outage. So the misconfiguration was our fault - and with it we would not have reached our goal of MTU 9000 anyway -

Also, 9000 packet size is overrated... we are talking about less than a 7% difference between 1500 and 9000 in overall performance, and that is for file transfers and vMotion migrations... on the order of 120 seconds vs 129 seconds to migrate the same VM, things like that... how 'efficient' is saving a few seconds or a little bit of traffic?

So you may not even NEED 9000 jumbo frames in the first place... I know you were told something else, but google 'MTU 1500 vs 9000 vmware' and see for yourself, there are PLENTY of people who have done the research and come to the same conclusion... the difference is negligible.

The idea behind a higher MTU is to benefit the switch: larger packets mean less switch CPU (because there is less packet traffic), but if the switches are not highly utilized, MTU 9000 won't even be felt at the switch level.
