Hi. We are thinking of enabling Jumbo frames on a VSAN vmKernel port (MTU change 1500 to 9000).
We currently have a 3-node VSAN cluster whose ESXi VSAN VMkernel ports are set to an MTU of 1500. The plan is to change this on each host to enable jumbo frames. The MTU has already been raised on the physical switches, on the corresponding vmnics (uplinks), and on the vDS that carries the VMkernel port group.
The VSAN cluster and vsanDatastore are already set up, so they already host a number of running VMs and vShield Edges (191 in total).
We would like to determine the impact of making this change.
(Because the change has to be made per host, we will change the value on one host after another. This means that until all hosts have the new MTU value, the cluster as a whole will be in an inconsistent state. Will we experience a vsanDatastore disconnection?)
Would you recommend powering off all VMs before making this change? They all have a VSAN storage policy of host failures to tolerate = 1.
Based on the Network Design document, I would ask why you are going through this exercise. Are you hoping to improve performance in some way?
This document states "Otherwise, jumbo frames are not recommended as the operational cost of configuring jumbo frames throughout the network infrastructure could outweigh the limited CPU and performance benefits."
Yes, I understand that there may be some doubt about the overall benefits. For now I would rather not get into whether those benefits justify the change; we are in the process of implementing a "best practice". As mentioned, we believe the change is needed only on the VSAN VMkernel ports, and that is the only place it will be implemented. End-to-end connectivity for these ports is guaranteed to pass only through the switches the vmnics are connected to.
Our main concern at the moment is the impact of making the change itself (rather than the end result). As mentioned, the VMs have a storage policy of FTT=1. Even so, will some VMs lose I/O access to their VMDKs during the change?
Would this warrant powering off the VMs, or will the storage policy guarantee I/O access during the change?
I have tested this in my lab, and VM objects remained accessible while I changed the MTU of one of the hosts to 9000. I will do further testing, but ESXi should be able to fragment the larger packets where required, at the cost of a performance hit.
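To give a rough sense of where that performance hit comes from, here is a minimal sketch of generic IPv4 fragmentation arithmetic, assuming a 9000-byte IP packet with a plain 20-byte header (no options) has to cross a link with a 1500-byte MTU. The function name is hypothetical, and the sketch does not model how ESXi or TCP actually handle vSAN traffic; it only shows how many packets one jumbo packet turns into.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_sizes(packet_len: int, link_mtu: int) -> list[int]:
    """Return the on-wire size of each IPv4 fragment when a packet of
    packet_len bytes (header included) crosses a link with link_mtu."""
    if packet_len <= link_mtu:
        return [packet_len]
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = (link_mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk + IPV4_HEADER)  # each fragment gets its own header
        payload -= chunk
    return sizes


frags = fragment_sizes(9000, 1500)
print(len(frags), frags)  # 7 [1500, 1500, 1500, 1500, 1500, 1500, 120]
```

Each jumbo packet becomes seven smaller ones with 120 extra header bytes, plus the per-packet work of splitting and reassembling them.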
However, the StatsDB object became temporarily inaccessible during the MTU change. The host I chose to enable for jumbo frames happened to be the master for the StatsDB (i.e. the host responsible for collecting the stats). This resolved itself after a very short time, once the election completed.
Currently, the only failing health check is, unsurprisingly, the MTU ping test; all data tests are passing and all objects and components remain ACTIVE.
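For anyone reproducing that MTU ping test by hand: the largest ICMP echo payload that fits in a single unfragmented IPv4 packet is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A minimal sketch of that arithmetic, assuming IPv4 without options (the function name is hypothetical):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(mtu, max_ping_payload(mtu))
# 1500 1472
# 9000 8972
```

This is why 8972 is the usual payload size for a don't-fragment `vmkping` test of a 9000-byte path.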
In fact, the MTU size I chose was not supported by the virtual switch my ESXi hosts were connected to either. So even without end-to-end jumbo frame support, my objects remained accessible.
Loss of communication with a single host should not cause FTT=1 objects to become inaccessible, provided they are healthy and compliant with their storage policy before this occurs.
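The reasoning can be sketched with a simplified model of vSAN object quorum. Assume an FTT=1 (RAID-1) object has two replica components and one witness, one vote each, spread across the three hosts. The object stays accessible while more than half of its votes and at least one full replica are reachable. Real vSAN may assign votes differently; the class, function and host names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    host: str
    kind: str       # "replica" or "witness"
    votes: int = 1  # simplified: real vSAN may give a component more votes


def is_accessible(components: list[Component], reachable_hosts: set[str]) -> bool:
    """Simplified RAID-1 quorum check: more than 50% of the votes must be
    reachable and at least one full replica must survive."""
    total = sum(c.votes for c in components)
    alive = [c for c in components if c.host in reachable_hosts]
    alive_votes = sum(c.votes for c in alive)
    has_replica = any(c.kind == "replica" for c in alive)
    return alive_votes * 2 > total and has_replica


# Hypothetical FTT=1 layout on a 3-node cluster.
vmdk = [
    Component("esx01", "replica"),
    Component("esx02", "replica"),
    Component("esx03", "witness"),
]
hosts = {"esx01", "esx02", "esx03"}

# Isolate each host in turn, as happens while its MTU is being changed.
for isolated in sorted(hosts):
    print(isolated, is_accessible(vmdk, hosts - {isolated}))
```

Losing any single host leaves two of three votes and at least one replica, so the object stays accessible; losing two hosts at once would not. That is why the change should be done strictly one host at a time.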
However, any VMs running on a host that loses communication will become inaccessible. I would therefore advise vMotioning all running VMs off the host being configured, then checking communication (e.g. the host is still a vSAN cluster member, all components still show as state 5 / ACTIVE, objects are storage policy compliant, etc.). If all looks good, proceed to do the same on the next host.
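The host-by-host procedure above can be sketched as a loop that stops at the first host whose checks fail, so the remaining hosts are never touched. This is a minimal sketch, not a vSphere API: `evacuate`, `set_mtu` and `healthy` are hypothetical callables standing in for the vMotion step, the VMkernel MTU change and the post-change checks.

```python
from typing import Callable, Iterable, Optional


def rolling_mtu_change(
    hosts: Iterable[str],
    evacuate: Callable[[str], None],      # hypothetical: vMotion running VMs off the host
    set_mtu: Callable[[str, int], None],  # hypothetical: change the vSAN vmknic MTU
    healthy: Callable[[str], bool],       # hypothetical: member, components ACTIVE, compliant
    mtu: int = 9000,
) -> tuple[list[str], Optional[str]]:
    """Apply the MTU change one host at a time.

    Returns (hosts completed, first failing host or None). Stops at the
    first host whose checks fail and leaves the remaining hosts untouched.
    """
    done: list[str] = []
    for host in hosts:
        evacuate(host)
        set_mtu(host, mtu)
        if not healthy(host):
            return done, host  # halt before moving on to the next host
        done.append(host)
    return done, None


# Usage with fake callables that only record what would happen.
log: list[str] = []
result = rolling_mtu_change(
    ["esx01", "esx02", "esx03"],
    evacuate=lambda h: log.append(f"evacuate {h}"),
    set_mtu=lambda h, m: log.append(f"set {h} mtu={m}"),
    healthy=lambda h: h != "esx02",  # pretend the checks fail on esx02
)
print(result)  # (['esx01'], 'esx02')
```

Passing the steps in as callables keeps the ordering and the stop-on-failure rule separate from however each step is actually carried out.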