What could be causing this warning, which recently showed up in a client's environment? There have been no recent configuration changes.
I double-checked the vmk MTU on all hosts and they are all set to 1500, which according to KB2108285 is fully supported and should not cause any warnings:
"What can cause a failure is if the vmknic has an MTU of 9000 and then the physical switch enforces an MTU of 1500. This is because the source does not fragment the packet and the physical switch will drop the packet. However, if there is an MTU of 1500 on the vmknic and an MTU 9000 on the physical switch (for example, there is also an iSCSI running which is using 9000) then there is no issue and the test passes."
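To make the KB's two cases concrete, here is a minimal Python sketch of the rule it describes. It is only a simplified model of the quoted text, not how the health check itself is implemented; the function name and parameters are made up for illustration.

```python
def large_ping_passes(vmknic_mtu: int, switch_mtu: int) -> bool:
    """Simplified model of the KB statement (hypothetical helper).

    Assumption: the source sends an unfragmented packet as large as its
    own vmknic MTU, and the physical switch drops anything larger than
    the MTU it enforces.
    """
    packet_size = vmknic_mtu
    return packet_size <= switch_mtu


# Case 1 from the KB: vmknic at 9000, switch enforces 1500 -> dropped
print(large_ping_passes(9000, 1500))

# Case 2 from the KB: vmknic at 1500, switch allows 9000 -> passes
print(large_ping_passes(1500, 9000))
```

By this rule a vmk MTU of 1500 everywhere should pass, which is why the warning is unexpected here.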
That might very well be so. It's a ROBO office communicating with a remote witness, and the alarm triggers while backups are running.
Since the latency allowed to a witness site is over 100 ms and the alarm goes off at over 10 ms, this triggers a lot of unnecessary alarms. Is there any way around this?
I will bump this since it's still a nuisance, and after looking deeper into the configuration I can't find a reason for the alarm to trigger.
I've got two clusters with a similar issue, located on completely different sites.
One cluster is connected with 10 Gb links and has no witness site. The vmk MTU is set to 1500, and manual testing shows that a ping with large packets is fragmented as expected, with 0.2-0.3 ms latency.
Still, the MTU check triggers a warning every time the test is run.
The second cluster does have an offsite witness with limited bandwidth, so that might be a reason for it to fail, but why would cluster #1 with 10 Gb links fail in the same way?
There is a bug or some sort of known issue with the way this test works through the VIC card. The virtualization layer between the physical connection and the hypervisor's connection will generate random MTU check failures. It also causes a lot of ARP traffic.
I wish I had a link to it, but if you shake the VMware and Cisco support trees they may have an official answer.
I recommend disabling the MTU check with anything using the VIC cards. I've experienced it multiple times in different C series deployments just like yours.
Interesting. Just like you, I have seen this on more than one C-series installation. Thanks for the input!
Is there a way to disable the MTU check now? Pretty sure there wasn't when I created this thread.
Edit: Here's how to disable certain health checks:
The easier option to disable the MTU check is just to modify the Health Check options on the network switch.
To enable or disable vSphere Distributed Switch health check in the vSphere Web Client:
I used the RVC command to silence the MTU check and that worked fine.
vsan.health.silent_health_check_configure -a largeping <CLUSTER>
RVC is proving more and more useful every day.
Would be nice to figure out why the Cisco VIC triggers this error though.
I get similar MTU errors using Dell FX2s. Could it be a more generic relationship to IOAs or converged adapters? Do you have a link to the Cisco notes, so I could see if it correlates?
Same problem here using VxRail E series and a remote witness. I was using a local witness at first and the error was not showing up. Could it be because the max payload to talk to the witness is not 1500 but 1410 due to the VPN overhead?
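The VPN-overhead theory can be checked with some quick arithmetic. Below is a rough Python sketch, assuming plain IPv4 without options (20-byte header) and an 8-byte ICMP header. The 1410 figure is taken from the post above and treated here as the tunnel's path MTU, which is an assumption; the helper names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(path_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one packet of path_mtu bytes."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmentation(payload: int, path_mtu: int) -> bool:
    """True if a ping with this payload needs no fragmentation on the path."""
    return payload <= max_icmp_payload(path_mtu)


# Largest unfragmented payload on a standard 1500-byte path
full_size = max_icmp_payload(1500)
print(full_size)

# The same ping over a path limited to 1410 bytes (assumed tunnel MTU)
print(max_icmp_payload(1410))
print(fits_without_fragmentation(full_size, 1410))
```

If the check sends a packet sized for a 1500 MTU with fragmentation disallowed, it would not fit through a tunnel limited to 1410, which would match the error appearing only with the remote witness.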