VMware Cloud Community
Gl_Proxy
Contributor

ESXi 6.5 vSAN: all 6 hosts crash

Hello guys,

We have 6 Lenovo x3650 M5 servers joined in a vSAN cluster.

From time to time, all servers in the cluster go down with a purple screen (PSOD); a screenshot is attached below.

Any ideas? We don't have VMware support for this case, please help!

The same error, "exception 14 vmnic5", appears on all hosts.

10 Replies
dalbyit
Contributor

You have network problems.

From my point of view, if you can afford it, throw away the useless Broadcom LAN adapters and install Intel adapters; the majority of your nightmares (now and in the future) will go away.

TheBobkin
Champion

Hello Gl_Proxy,

Can you share a screenshot of the whole backtrace?

You seem to have snipped just the top of the image, which doesn't tell us much at all.

Bob

RAJ_RAJ
Expert

Could you please update the firmware and install the latest drivers?

Gl_Proxy
Contributor

Thank you for the answer.

All servers in the cluster show the same error. I have attached two full crash screens.

Gl_Proxy
Contributor

Yes, the vSAN information shows full compatibility for every service; we even downgraded the HBA driver for full compliance.

Gl_Proxy
Contributor

We use Intel network adapters with the latest driver.

TheBobkin
Champion

Hello Gl_Proxy,

You are likely hitting one of the known issues with these NICs. The NIC driver is called out in your backtrace, and the relevant elements of the backtrace appear to match those described here:

https://kb.vmware.com/s/article/2126909

Do as the KB advises and either use the native drivers or disable TSO/LRO to prevent these PSODs from occurring.
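As a rough sketch of the native-driver option, something like the following could be run in the ESXi shell on each host, one host at a time and in maintenance mode. It assumes the affected uplink is vmnic5 (from the error message), that the legacy module is named i40e and the native one i40en, and that the i40en VIB is already present on the host. The `run` helper and the `APPLY` variable are hypothetical names for this example only; by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: swap the legacy i40e driver module for the native i40en module.
# Assumptions: uplink is vmnic5, and the i40en VIB is already installed.
# Commands are only printed unless APPLY=1 is set in the environment.

NIC="${NIC:-vmnic5}"

# Hypothetical helper: execute the command when APPLY=1, otherwise print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

swap_to_native_driver() {
    # Show the driver name, driver version and firmware the NIC uses now.
    run esxcli network nic get -n "$NIC"
    # List installed VIBs; look for the i40e and i40en entries in the output.
    run esxcli software vib list
    # Disable the legacy module and enable the native one.
    run esxcli system module set --enabled=false --module=i40e
    run esxcli system module set --enabled=true --module=i40en
    echo "# reboot the host afterwards so the module change takes effect"
}

swap_to_native_driver
```

After the reboot, `esxcli network nic list` should show which driver each vmnic is actually bound to.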

Also, your vSAN controller is not causing this. I would advise keeping it on the latest driver/firmware pair listed on the vSAN HCL for this device.

Bob

Gl_Proxy
Contributor

OK, thank you, Bob.

I think this is good advice. Do I understand correctly that we need to replace the "i40e" driver with "i40en"?

TheBobkin
Champion

Hello Gl_Proxy,

Correct.

Or, if you are not constrained for CPU resources, consider disabling TSO/LRO (~5% more CPU usage when configured like this).
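A minimal sketch of the TSO/LRO route, assuming the host-wide advanced options `/Net/UseHwTSO`, `/Net/UseHwTSO6` and `/Net/TcpipDefLROEnabled` are the ones relevant to your build (check them against the KB before applying anything). The `run` helper and the `APPLY` variable are hypothetical names for this example; without `APPLY=1` the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: disable hardware TSO (IPv4/IPv6) and the VMkernel default LRO.
# Assumption: these option names apply to your ESXi build; verify with the KB.
# Commands are only printed unless APPLY=1 is set in the environment.

OPTIONS="/Net/UseHwTSO /Net/UseHwTSO6 /Net/TcpipDefLROEnabled"

# Hypothetical helper: execute the command when APPLY=1, otherwise print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

disable_tso_lro() {
    # Set each advanced option to 0 (disabled).
    for opt in $OPTIONS; do
        run esxcli system settings advanced set -o "$opt" -i 0
    done
}

show_tso_lro() {
    # Display the current value of each option for verification.
    for opt in $OPTIONS; do
        run esxcli system settings advanced list -o "$opt"
    done
}

disable_tso_lro
show_tso_lro
```

A host reboot after changing these settings is the safest way to make sure they are in effect everywhere.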

Bob

Gl_Proxy
Contributor

OK, thank you for the reply.

We will install the new network driver, and I will update this topic later, once we are sure that all hosts work fine, maybe next week.
