When running Log Browser ESXi Crashes

efeurich · ‎03-24-2014

Hi, I am pretty new to VMware and ESXi with the vSphere Web Client.

Lately I have been having some performance issues with my ESXi server. I wanted to investigate the logs, so I went to the Host tab in the Web Client and clicked Log Browser and then Retrieve now.

After several minutes the system stopped responding, and when I walked over to the server I saw that it had crashed.

Now I want to investigate what the hell is wrong, but I don't know where to start.

I have looked through some log files but can't seem to find the cause.

I am using ESXi 5.5

Can anyone point me in the right direction?

Thanks in advance.

Eric

zXi_Gamer · ‎03-24-2014

You can post the PSOD screen and collect a vm-support bundle; in the bundle you will find the vmkernel.log file that was captured during the PSOD. That file can give you more insight into what happened before and during the crash. You can upload it too, if you want multiple eyes to look at it.

efeurich · ‎03-24-2014

Thanks for the reply and the direction.

Attached you'll find the vmkernel.log. As far as I can see there are problems with ScsiDeviceIO.

I can't attach the PSOD because I didn't catch the screen and I don't want to make the system crash again actually.

zXi_Gamer · ‎03-24-2014

"I don't want to make the system crash again actually"

Actually, if it crashes again, there is a better chance of finding the root cause than by waiting for an unexpected crash at a critical time.

"Attached you'll find the vmkernel.log"

During a crash or reboot, vmkernel.log is rotated, so the attached vmkernel.log only covers the system since it booted and does not contain much information about the crash.

1. Check under /var/core

2. If you find a ***zdump.1 file, run the command in step 3.

3. vmkdump_extract -l ***zdump.1

4. That gives you a vmkernel.log.1 file.

5. Can you upload that file?
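The steps above can be sketched as a small Python 3 helper, for example run on a workstation against a copy of /var/core. It only lists candidate dump files and builds the extraction command line; it does not execute anything. The function names are hypothetical, the `*zdump*` pattern is an assumption about how the dump files are named, and the tool name is a parameter (defaulting to `vmkdump_extract`) so it can be adjusted to match your host.

```python
import glob
import os
import shlex


def find_dumps(core_dir="/var/core", pattern="*zdump*"):
    """Return dump files found in core_dir, newest first."""
    paths = glob.glob(os.path.join(core_dir, pattern))
    return sorted(paths, key=os.path.getmtime, reverse=True)


def extract_command(dump_path, tool="vmkdump_extract"):
    """Build the shell command that extracts the log from one dump file."""
    return " ".join(shlex.quote(part) for part in (tool, "-l", dump_path))


if __name__ == "__main__":
    # Print one command per dump file; copy and run them on the host.
    for dump in find_dumps():
        print(extract_command(dump))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; the extraction itself still has to happen on the ESXi host.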

On a side note, it is also good to raise a support request if you have a valid support contract.

efeurich · ‎03-24-2014

I'll look at our contracts and try to create a support call.

Attached are 2 files. There were 2 kernel dumps.

Thanks for the help man. Much appreciated!

zXi_Gamer · ‎03-24-2014

Your crash resembles the issue described here: VMware KB: ESXi 5.5 host fails with a purple diagnostic screen and the error: Usage error in dlmallo...

But if you are not using the bnx2 driver, then I would strongly suggest that you raise a support request.

OTOH, there are indeed a lot of storage errors on your vmhba0:0:0, and luckily there are also KB articles for working on that: VMware KB: vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using I...

Hope it helps. PS: looking at the logs, I am really sorry to say that you might hit this issue more often.

efeurich · ‎03-24-2014

Thanks for the articles.

I will have a look at this. I think this will help.

efeurich · ‎03-27-2014

Hi zXi_Gamer,

I have updated the bnx2 driver and the system seems much more responsive and faster. I still have to check whether the same error occurs when I pull the logs.

I still have a recurring message in the vmkernel.log:

2014-03-27T10:44:57.367Z cpu4:33485)WARNING: LinScsi: SCSILinuxQueueCommand:1207: queuecommand failed with status = 0x1056 Unknown status vmhba33:0:0:0 (driver name: ahci) - Message repeated 17 times

2014-03-27T10:44:57.367Z cpu2:32789)ScsiDeviceIO: 2324: Cmd(0x412e80878e00) 0x2a, CmdSN 0x864 from world 32779 to dev "t10.ATA_____SAMSUNG_HD103SJ_________________________S246J9AB512738______" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

I have looked it up and this is the article that I found: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103038...

It states that this is a temporary condition and will resolve itself.

VMK_SCSI_DEVICE_BUSY = 0x8

vmkernel: 1:02:02:02.206 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x410005078e00) to NMP device "naa.6001e4f000105e6b00001f14499bfead" failed on physical path "vmhba1:C0:T0:L100" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.

This status is returned when a LUN cannot accept SCSI commands at the moment. As this should be a temporary condition, the command is tried again.
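The H:, D: and P: fields in these messages are the host, device and plugin status codes. A minimal Python sketch, assuming only the line format shown in this thread, that pulls the three values out of a log line and names the device status. The function name `decode_status` and the lookup table are illustrative; the names in the table are the standard SCSI status codes, where 0x8 is BUSY (the value the excerpt above calls VMK_SCSI_DEVICE_BUSY).

```python
import re

# Standard SCSI device status codes (the D: field).
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def decode_status(line):
    """Return host, device and plugin status from a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "host": host,
        "device": device,
        "plugin": plugin,
        "device_name": DEVICE_STATUS.get(device, "unknown"),
    }


if __name__ == "__main__":
    sample = "failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0."
    print(decode_status(sample)["device_name"])  # BUSY
```

Run over a whole vmkernel.log, this makes it easier to see whether every failure is the same BUSY status or whether other device statuses are mixed in.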

Can you comment on this?

Eric

zXi_Gamer · ‎03-27-2014

Glad to hear about the increase in performance after the new driver. I doubt that you will hit the PSOD again. OTOH,

2013-11-08T17:41:22.839Z cpu0:2056)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1056 Unknown status vmhba0:0:0:0 (driver name: ahci) - Message repeated 162 times

Interestingly, as posted above, I have the same messages on my server. But it looks like in your case the controller is vmhba33. Can you let me know what vmhba33 is?
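To compare how often this warning appears per controller, the LinScsi lines can be tallied with a short script. This is a minimal Python sketch that assumes the message format of the two lines quoted in this thread; the function name `tally_failures` is hypothetical. It sums the repeat counts reported in the lines and treats a line without the "Message repeated" suffix as one occurrence.

```python
import re
from collections import Counter

WARNING_RE = re.compile(
    r"queuecommand failed with status = (0x[0-9a-fA-F]+).*?"
    r"(vmhba\d+):\d+:\d+:\d+ \(driver name: (\w+)\)"
    r"(?: - Message repeated (\d+) times)?"
)


def tally_failures(lines):
    """Sum reported repeat counts per (adapter, driver, status)."""
    totals = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # Not a queuecommand warning; skip it.
        status, adapter, driver, repeated = match.groups()
        totals[(adapter, driver, status)] += int(repeated) if repeated else 1
    return totals


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < vmkernel.log
    for key, count in tally_failures(sys.stdin).most_common():
        print(key, count)
```

Grouping by adapter and driver shows at a glance whether the warnings come from a single controller or from every device behind the same driver.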

On a side note, I am trying the following option:

esxcli system settings kernel set --setting=iovDisableIR -v TRUE

I will update you if the errors disappear after the setting.

efeurich · ‎03-27-2014

vmhba33 is one of the SCSI devices in the server. See screenshot attached.

It is the SCSI device for datastore6 in my server. See screenshot.

efeurich · ‎03-27-2014

The iovDisableIR setting is already enabled on my system.

See screenshot

efeurich · ‎03-28-2014

Hi zXi_Gamer,

Any change after altering the setting?

zXi_Gamer · ‎03-31-2014

No luck there. Still getting the messages.

zXi_Gamer · ‎04-03-2014

Upgraded to ESXi 5.5 Update 1, and the messages are no longer spewing.

efeurich · ‎04-03-2014

I'll try the update. Just yesterday I had a very slow system while cloning a server.

Thanks for the update.
