Hi experts,
1. We are running
vmware -v
VMware ESX Server 3.5.0 build-110268
2. and we have a service console (COS) core
# file cos-core-XX
cos-core-XX: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV)
3. and also the corresponding vmlinux image
file vmlinux-2.4.21-57.ELvmnix
vmlinux-2.4.21-57.ELvmnix: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
4. along with System.map file
5. Installed crash
crash -v
crash 4.0
...
GNU gdb 6.1
...
This GDB was configured as "i686-pc-linux-gnu".
6. When we try to debug the core, crash itself crashes with a segmentation fault.
crash vmlinux-2.4.21-57.ELvmnix cos-core-HWSVR02
crash 4.0
...
..
Segmentation fault
7. Then we compiled crash 4.1.0 from source and ran it under gdb:
Program received signal SIGSEGV, Segmentation fault.
0x08124860 in dump_Elf32_Nhdr (offset=4096, store=1) at netdump.c:1734
1734 netdump_print("%08lx ", *uptr++);
It looks like crash segfaults while it is dumping the ELF32 note headers.
We are aware that the vmkdump utility exists for debugging vmkernel crash dumps, but what is the best way to debug the COS core?
We intend to open an SR with VMware for this, but posting questions on VMTN has helped many times.
So far many of our questions have been answered here, and we hope to get a response to this one as soon as possible.
Many thanks in advance.
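For the segfault in dump_Elf32_Nhdr described above, here is a minimal Python sketch of how ELF32 note headers can be walked with bounds checks. It is not the code from crash or from any netdump.c; the names walk_notes and align4 are made up for illustration, and it assumes a little-endian core with the usual 4-byte padding after the name and the descriptor. The idea is that a header whose size fields point past the note data gets reported instead of being followed.

```python
import struct

# Elf32_Nhdr: n_namesz, n_descsz, n_type (three little-endian 32-bit words)
NHDR = struct.Struct("<III")


def align4(n):
    """Round n up to the next multiple of 4 (note padding rule)."""
    return (n + 3) & ~3


def walk_notes(segment):
    """Yield (name, n_type, desc) for each note in a PT_NOTE segment.

    Raises ValueError as soon as a header claims more bytes than remain,
    which is the situation an unchecked dump loop would run into.
    """
    pos = 0
    while pos < len(segment):
        if pos + NHDR.size > len(segment):
            raise ValueError(f"note header at {pos} runs past the segment")
        namesz, descsz, n_type = NHDR.unpack_from(segment, pos)
        name_start = pos + NHDR.size
        desc_start = name_start + align4(namesz)
        end = desc_start + align4(descsz)
        if end > len(segment):
            raise ValueError(
                f"note at {pos} claims {end - pos} bytes, "
                f"only {len(segment) - pos} remain"
            )
        name = segment[name_start:name_start + namesz].rstrip(b"\0")
        yield name, n_type, segment[desc_start:desc_start + descsz]
        pos = end
```

Feeding it the bytes of the note segment either lists the notes or stops at the first header that does not fit, which narrows down where a dump routine would start reading garbage.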
Hi, do you have these files?
vmlinux-2.4.21-57.ELvmnix and vmlinux-2.4.21-57.ELvmnix.debug
It looks like a malformed header.
Please replace netdump.c with the one attached to this post, recompile the crash utility, and try again.
Note that this is not a patch; it is the actual netdump.c.
$ tar xvzmf crash-4.1.0.tar.gz
...
$ cd crash-4.1.0
$ cp path/to/attached/netdump.c netdump.c
$ make
Many thanks, let me try out netdump.c as soon as possible. Yes, I can provide all the required files: vmlinux-2.4.21-57.ELvmnix and vmlinux-2.4.21-57.ELvmnix.debug
Where do I upload these files? We are really in need of help here.
You can upload them somewhere and let me know the location privately.
The files we have are:
System.map
vmlimnux-2.4.21-57.ELvmnix.21-752-594
vmlinuz-2.4.21-57.ELvmnix-281-752-594
cos-core-HWSVR02
vmlinux-2.4.21-57.EL.debug
We don't have the exact files you mentioned. I compiled crash with the netdump.c you provided and there is no segfault now, but I get this error:
crash vmlinux-2.4.21-57.EL.debug vmlimnux-2.4.21-57.ELvmnix.21-752-594 cos-core-HWSVR02
crash 4.1.0
crash: vmlimnux-2.4.21-57.ELvmnix.21-752-594: not a supported file format
I think the syntax might be wrong:
crash [-h ][-v][-s][-i file][-d num]
This patch contains the kernel source rpm (kernel-source-350.2.4.21-57.EL.110268.i386.rpm):
http://download3.vmware.com/software/esx/ESX350-200808201-UG.zip
Maybe use the -g flag to rebuild the debug image and then try again.
please try :
crash
You may need the debug image from VMware support, or you can build one from the
source. Also make sure that the same version of ESX Server is used to build the debug image.
Thanks for the update. I'll upload the core I could reproduce on my test system (ESX 3.5.4) and send you a private note.
Meanwhile, let me try compiling the kernel.
Another point I want to add is that I couldn't find the *.debug file under /usr/lib/debug/boot.
1. Compiled crash with the new netdump.c
2. Recompiled the ESX 3.5 service console kernel with the -g option
3. Generated a cos core file
4. # pwd
/usr/src/linux-2.4.21-47.0.1.EL
5. crash -S System.map vmlinux /vmfs/volumes/469e8d3d-5bda02b2-5a67-0015c5eaf19d/cos-core-thorpc205
Again, got a segmentation fault:
crash 4.1.0
crash: overriding /boot/System.map with System.map
WARNING: possibly corrupt ELF32_Nhdr: n_namesz: 0 n_descsz: 0 n_type: 0
Segmentation fault
Will upload the files and provide a location soon.
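The warning above prints the three fields of an Elf32_Nhdr, all of them zero. One way to look at the same header outside crash is to read the program headers of the core and unpack the first 12 bytes of each PT_NOTE segment. What follows is a rough Python sketch, assuming a little-endian ELF32 core; first_note_headers is a hypothetical helper and not part of crash.

```python
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct("<8I")                # Elf32_Phdr, 32 bytes
NHDR = struct.Struct("<III")               # Elf32_Nhdr, 12 bytes
PT_NOTE = 4


def first_note_headers(core):
    """Return [(p_offset, (n_namesz, n_descsz, n_type)), ...].

    One entry per PT_NOTE segment whose first note header lies inside
    the bytes given in `core`.
    """
    if core[:5] != b"\x7fELF\x01":          # magic plus ELFCLASS32
        raise ValueError("not an ELF32 file")
    fields = EHDR.unpack_from(core, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    found = []
    for i in range(e_phnum):
        p_type, p_offset, _, _, p_filesz, *_ = PHDR.unpack_from(
            core, e_phoff + i * e_phentsize
        )
        in_file = p_offset + NHDR.size <= len(core)
        if p_type == PT_NOTE and p_filesz >= NHDR.size and in_file:
            found.append((p_offset, NHDR.unpack_from(core, p_offset)))
    return found
```

If an entry comes back as (0, 0, 0), the note segment itself starts with zeros in the file, so the warning would reflect the contents of the core rather than a parsing slip in crash.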
Have you checked this?
http://www.vmware.com/support/vi3/doc/vi3_esx35u4_rel_notes.html#knownissues
See if anything there is applicable to the cause of the panic.
If the core is not debuggable, there is nothing we can do, as it is a proprietary
OS with some modifications. VMware support needs to be actively involved.
Maybe this is applicable:
ESX Server hosts become unresponsive during a network broadcast storm
When a network broadcast storm occurs, ESX Servers might become unresponsive due to an issue with the tg3 network driver. During this time, service console or virtual machines that use the tg3 NIC might lose network connectivity. Rebooting the machine or unloading/loading the driver restores connectivity, but does not resolve the issue.
ESX hosts with tg3 port cannot send or receive packets after being subjected to a broadcast storm. The following error messages might be logged in VMkernel:
1. WARNING: Net: 1082: Rx storm on 1 queue 3501>3500, run 321>320
2. VMNIX: WARNING: NetCos: 1086: virtual HW appears wedged (bug number 90831), resetting
Core dumps are lost when multiple ESX hosts share a dump partition
If multiple ESX hosts that share a dump partition fail and save core dumps simultaneously, the core dumps might be lost.
What is in the messages output?
Maybe something is wrong with the hardware. Is it a VMware HA environment?
Does the panic happen on all nodes, and are you unable to debug the others as well?
From what I see, the core is not valid. Even with gdb you might not be able to do anything.
Do you get any error when running gdb? For example, do you get
"core file is truncated"
or
"Cannot access memory at address 0x0"?
We might not be able to do anything further without VMware support assistance.
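On the truncation question, a quick check that does not need gdb is to compare the size of the core on disk with the furthest byte that its program headers refer to. This is only a sketch under assumptions: a little-endian ELF32 core whose program header table sits within the first megabyte, and the helper names expected_size and missing_bytes are invented for this example.

```python
import os
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct("<8I")                # Elf32_Phdr, 32 bytes


def expected_size(header_bytes):
    """Smallest file size that holds every segment the headers describe."""
    fields = EHDR.unpack_from(header_bytes, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    need = e_phoff + e_phnum * e_phentsize
    for i in range(e_phnum):
        ph = PHDR.unpack_from(header_bytes, e_phoff + i * e_phentsize)
        p_offset, p_filesz = ph[1], ph[4]
        need = max(need, p_offset + p_filesz)
    return need


def missing_bytes(path):
    """Return how many bytes the file at `path` is short by (0 if complete)."""
    with open(path, "rb") as f:
        head = f.read(1 << 20)  # assumption: headers fit in the first 1 MiB
    return max(0, expected_size(head) - os.path.getsize(path))
```

A non-zero result from missing_bytes means the file ends before the data its headers promise, which would fit a "core file is truncated" message; zero does not prove the contents are valid, only that the size is plausible.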
Is there a difference between the readelf -a output of the core files you generated and that of the bad core file
you are dealing with?
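To make that readelf -a comparison less tedious, the two outputs can be diffed mechanically. The sketch below assumes readelf (from binutils) is on the PATH of the machine where it runs; readelf_all and diff_text are names chosen for this example only.

```python
import difflib
import subprocess
import sys


def readelf_all(path):
    """Return the text printed by `readelf -a <path>`."""
    result = subprocess.run(
        ["readelf", "-a", path], capture_output=True, text=True
    )
    return result.stdout


def diff_text(good, bad, good_name="good core", bad_name="bad core"):
    """Return unified-diff lines between two readelf outputs."""
    return list(
        difflib.unified_diff(
            good.splitlines(),
            bad.splitlines(),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python this_script.py GOOD_CORE BAD_CORE
    for line in diff_text(readelf_all(sys.argv[1]), readelf_all(sys.argv[2])):
        print(line)
```

Addresses and sizes will differ between any two cores, so the lines worth reading are the structural ones: the number and types of program headers, and whether a notes section is printed at all.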
Many thanks once again. I will read through your post and respond. I did check the ESX 3.5.4 release notes for another issue, but it's not applicable here.
Meanwhile, here is the location of the core files. I'm pretty sure these core files are not corrupt.
Name (ftp.veritas.com:support): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
230 Logged in anonymously.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> bin
200 Type okay.
ftp> hash
Hash mark printing on (8192 bytes/hash mark).
ftp> prompt
Interactive mode off.
ftp> cd /pub/support/281-752-594
250 "/pub/support/281-752-594" is new cwd.
ftp> ls
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
System.map
cos-core-thorpc205
vmlinux
#
226 Listing completed.
41 bytes received in 0.0067 seconds (5.95 Kbytes/s)
ftp>
Could you please post the readelf -a output of this file and of the other core file that
you created in the lab, the one on which you were able to run crash?