VMware Cloud Community
timalexanderINV
Enthusiast

ESXi 6.5u1 netdump failing

I am currently rolling out new VCSAs (6.5u1) and new clusters running ESXi 6.5u1. Historically we have used Auto Deploy, so netdump has been a must to retain information from a PSOD. Although we are now using SD cards, the centralised source of dump information that netdump provides is too good to lose.

The issue I am hitting is that netdump starts but immediately fails with the following:

(via NMI in the iDRAC)

Starting network coredump from IP-HOST to IP-VCSA

Cannot continue NetDump from DF/NMI/MCE stack

(via vsish -e set /reliability/crashMe/Panic 1)

Starting network coredump from IP-HOST to IP-VCSA

Netdump: FAILED: Couldn't attach to dump server at IP-HOST IP-VCSA.

Stopping Netdump.

It then proceeds to coredump instead.

I have made sure that the netdumper service is started (and set to auto), that the /storage/netdump partition has been increased from 1GB to 10GB, that the file size limit has been changed in /etc/sysconfig/netdumper, and that the ESXi host is configured to use the vCenter and the correct VMkernel interface (in our case vmk2). I can test connectivity from an SSH session on the ESXi hosts and that passes, and I can also send UDP on port 6500 to the target (nc -z -w 1 -s VMkernelIPAddress -u DumpCollectorIPAddress DumpCollectorPortNumber). I have even unloaded the firewall on the ESXi host and get the same result. I am baffled and not sure where to go from here. /var/log/vmware/netdumper/netdumper.log also contains no information (except confirmation that the check command has completed).
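For anyone wanting to repeat the nc test above from a script, here is a minimal Python sketch of the same idea: bind to a chosen source address (the equivalent of nc -s) and send a single UDP datagram to the collector port. The function name send_udp_probe and the payload are made up for illustration, and it assumes a Python interpreter is available wherever it is run. Because UDP is connectionless, a successful send only shows that the datagram left the source interface; it does not prove that netdumper received or accepted it.

```python
import socket


def send_udp_probe(source_ip, collector_ip, collector_port=6500,
                   payload=b"netdump-probe"):
    """Send one UDP datagram from source_ip to the collector.

    Returns the number of bytes handed to the network stack. This does
    not confirm delivery, because UDP has no acknowledgement.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Binding to the source address forces the datagram out of the
        # interface that owns it (port 0 lets the OS pick a source port).
        sock.bind((source_ip, 0))
        return sock.sendto(payload, (collector_ip, collector_port))


# Hypothetical usage, with placeholders in the same spirit as the nc command:
#   send_udp_probe("VMkernelIPAddress", "DumpCollectorIPAddress", 6500)
```

A passing result here is consistent with the nc check passing while netdump still fails, since neither test exercises the netdump protocol itself.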

EDIT: OK, so looking at netdumper.log I can see the following entry when netdumper is started:

2017-11-23T22:49:12.060Z| netdumper| I125: Configured size limits: 5 GB per file, 10 GB per host, 20 GB for all

The coredumps on a host are some 7.4GB. Is the issue that a single file is greater than 5GB? If so, does anyone know where this parameter is configured?
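To make that comparison explicit, here is a small Python sketch that pulls the three limits out of the log entry quoted above and checks a dump size against the per-file value. The helper names parse_limits and exceeds_per_file are hypothetical, and the regular expression assumes the line always has exactly the wording shown in this log. It only does the arithmetic; whether the per-file limit is what actually makes netdump fail is still the open question.

```python
import re

# Matches the wording of the netdumper.log entry quoted above.
LIMITS_RE = re.compile(
    r"Configured size limits: (\d+) GB per file, "
    r"(\d+) GB per host, (\d+) GB for all"
)


def parse_limits(log_line):
    """Return the configured limits in GB, or None if the line does not match."""
    match = LIMITS_RE.search(log_line)
    if match is None:
        return None
    per_file, per_host, total = (int(value) for value in match.groups())
    return {"per_file_gb": per_file, "per_host_gb": per_host, "total_gb": total}


def exceeds_per_file(dump_size_gb, limits):
    """True if a single dump of dump_size_gb is larger than the per-file limit."""
    return dump_size_gb > limits["per_file_gb"]


line = ("2017-11-23T22:49:12.060Z| netdumper| I125: Configured size limits: "
        "5 GB per file, 10 GB per host, 20 GB for all")
limits = parse_limits(line)
print(limits)
print(exceeds_per_file(7.4, limits))  # True: 7.4 GB is above the 5 GB limit
```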

Message was edited by: Tim Alexander - new detail

4 Replies
hussainbte
Expert

Please share the ESXi network configuration for the management network.

Is it on a DVS or a standard switch?

If you found my answers useful please consider marking them as Correct OR Helpful Regards, Hussain https://virtualcubes.wordpress.com/
timalexanderINV
Enthusiast

Management is on a dvswitch over vmk2.  The netdump configuration on the host reflects this.

msripada
Virtuoso

When you trigger the panic, does the screenshot show that the dump was unsuccessful, or that there was a space issue, etc.?

You can also check vobd.log after the reboot to see whether it says anything about a configured space issue. Even if it is a space issue, I would still expect it to write a partial dump, but I have never tried this to confirm.

Thanks,

MS

mattieh
Contributor

Did you manage to get this working?

We are also running 6.5u1 and get exactly the same error message (Cannot continue NetDump from DF/NMI/MCE stack).

# esxcli system coredump network check

Verified the configured netdump server is running

Also, checking from ESXi to VCSA with nc seems to work fine.

No errors displayed on the VC service side.
