VMware Cloud Community
CygAL
Enthusiast

MTU Check warning

What could be causing this warning, which recently showed up in a client's environment? There have been no recent configuration changes.

MTU.JPG

I double-checked the vmk MTU on all hosts. They are all set to 1500, which according to KB2108285 is fully supported and should not cause any warnings:

"What can cause a failure is if the vmknic has an MTU of 9000 and then the physical switch enforces an MTU of 1500. This is because the source does not fragment the packet and the physical switch will drop the packet. However, if there is an MTU of 1500 on the vmknic and an MTU 9000 on the physical switch (for example, there is also an iSCSI running which is using 9000) then there is no issue and the test passes."
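The rule quoted from the KB can be modeled in a few lines. This is only an illustrative sketch, not how the vSAN health check is implemented; the function name `df_probe_passes` and its parameters are made up for this example. It assumes the source sends a packet as large as its own MTU with fragmentation disallowed.

```python
def df_probe_passes(source_mtu: int, path_mtus: list[int]) -> bool:
    """Return True if a full-size, don't-fragment packet survives the path.

    source_mtu: MTU configured on the sending vmknic (bytes).
    path_mtus:  MTUs enforced by each device along the path (bytes).
    """
    # The source builds packets up to its own MTU and never fragments them,
    # so every hop must accept a packet of that size.
    return all(source_mtu <= hop_mtu for hop_mtu in path_mtus)


# vmknic at 9000, physical switch enforcing 1500: the switch drops the packet.
print(df_probe_passes(9000, [1500]))  # False
# vmknic at 1500, physical switch at 9000: the packet fits and the test passes.
print(df_probe_passes(1500, [9000]))  # True
```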

Regards

12 Replies
admin
Immortal

That alert can be triggered by a spike in latency. A ping response time above 10 ms during the large-frame test will cause this warning status.

CygAL
Enthusiast

That might very well be so. It's a ROBO office communicating with a remote witness, and the alarm triggers at the time when backups are running.

Since the latency allowed to a witness site is more than 100 ms and the alarm goes off above 10 ms, this triggers a lot of unnecessary alarms. Is there any way around this?

CygAL
Enthusiast

I will bump this since it's still a nuisance, and after looking deeper into the configuration I can't find a reason for the alarm to trigger.

I have two clusters with a similar issue, located at completely different sites.

One cluster is connected with 10 Gb links and has no witness site. The vmk MTU is set to 1500, and manual testing shows that a ping with large packets is fragmented as expected, with 0.2-0.3 ms latency.

Ping.JPG
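For anyone repeating the manual test, the arithmetic behind "large packets" at MTU 1500 can be sketched as follows. It assumes plain IPv4 with a 20-byte header (no options) and an 8-byte ICMP header; the helper names are made up for this example. On ESXi the payload size would typically be passed to vmkping with -s, and -d added to forbid fragmentation.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_unfragmented_payload(mtu: int) -> int:
    """Largest ICMP payload that still fits in a single packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fragment_count(icmp_payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed when fragmentation is allowed."""
    ip_payload = icmp_payload + ICMP_HEADER
    # Every fragment except the last carries a multiple of 8 bytes of data.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(max_unfragmented_payload(1500))  # 1472
print(max_unfragmented_payload(9000))  # 8972
print(fragment_count(1472, 1500))      # 1
print(fragment_count(8972, 1500))      # 7
```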

Still, the MTU check triggers a warning every time the test is run.

The second cluster does have an offsite witness with limited bandwidth, so that might be a reason for it to fail, but why would cluster #1 with 10 Gb links fail in the same way?

Kind Regards

JasonTh_C
Contributor

Are these UCS C series servers using 10Gbps VIC cards?

CygAL
Enthusiast

Hi Jason,

Yes, they certainly are!

JasonTh_C
Contributor

There is a bug or some sort of known issue with the way this test works through the VIC card. The virtualization layer between the physical connection and the hypervisor's connection will generate random MTU check failures. It also causes a lot of ARP traffic.

I wish I had a link to it, but if you shake the VMware and Cisco support trees they may have an official answer.

I recommend disabling the MTU check with anything using the VIC cards. I've experienced it multiple times in different C series deployments just like yours.

CygAL
Enthusiast

Interesting. Just like you, I have seen this on more than one C-series installation. Thanks for the input!

Is there a way to disable the MTU check now? Pretty sure there wasn't when I created this thread.

Edit: Here's how to disable certain health checks:

How to silence VMware vSAN Health Checks | Virten.net

//A

JasonTh_C
Contributor

The easier option to disable the MTU check is just to modify the Health Check options on the distributed switch.

Enabling vSphere Distributed Switch health check in the vSphere Web Client (2032878) | VMware KB

To enable or disable vSphere Distributed Switch health check in the vSphere Web Client:

  1. Browse to a vSphere distributed switch in the vSphere Web Client.
  2. Click the Manage tab.
  3. Click Settings and then click Health check.
  4. To enable or disable health check, click Edit.
  5. Select from the dropdown to enable or disable health check options.

    The options include:
    • VLAN and MTU: Reports the status of distributed uplink ports and VLAN ranges
    • Teaming and Failover: Checks for any configuration mismatch between ESXi and the physical switch used in the teaming policy.

Bleeder
Hot Shot

The problem there is that you cannot disable the MTU check separately from the VLAN check :(

CygAL
Enthusiast

I used the RVC command to silence the MTU check and that worked fine.

vsan.health.silent_health_check_configure -a largeping <CLUSTER>

RVC is proving more and more useful every day.

Would be nice to figure out why the Cisco VIC triggers this error though.

timalexanderINV
Enthusiast

I get similar MTU errors using Dell FX2s. Could it be a more generic relationship to IOAs or converged adapters? Do you have a link to the Cisco notes, so I could see whether it correlates?

lux209
Contributor

Same problem here using VxRail E Series and a remote witness. I was using a local witness at first and the error was not showing up. Could it be because the max payload to talk to the witness is not 1500 but 1410 due to the VPN overhead?
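The VPN question can be checked with simple arithmetic. The sketch below takes the 1500 and 1410 figures from the post and derives a 90-byte tunnel overhead from them; that overhead is an assumption, since real VPN overhead depends on the tunnel type and cipher. The helper names are made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes

# Assumption derived from the figures in the post: 1500 - 1410 = 90 bytes.
ASSUMED_VPN_OVERHEAD = 1500 - 1410


def effective_mtu(link_mtu: int, tunnel_overhead: int) -> int:
    """Largest inner packet that fits once the tunnel adds its headers."""
    return link_mtu - tunnel_overhead


def fits_without_fragmentation(packet_size: int, link_mtu: int,
                               tunnel_overhead: int) -> bool:
    """True if a don't-fragment packet of this size can cross the tunnel."""
    return packet_size <= effective_mtu(link_mtu, tunnel_overhead)


mtu_via_vpn = effective_mtu(1500, ASSUMED_VPN_OVERHEAD)
print(mtu_via_vpn)                                                   # 1410
print(fits_without_fragmentation(1500, 1500, ASSUMED_VPN_OVERHEAD))  # False
# Largest ICMP payload that would still cross the tunnel unfragmented:
print(mtu_via_vpn - IPV4_HEADER - ICMP_HEADER)                       # 1382
```

If these numbers hold, a full 1500-byte probe with fragmentation disallowed would not reach the witness, while the same probe to a local witness would.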
