Hi, I am pretty new to VMware and ESXi with the vSphere Web Client.
Lately I have been having some performance issues with my ESXi server. I wanted to investigate the logs, so I went to the host tab in the Web Client and clicked Log Browser and then Retrieve now.
After several minutes the system stopped responding, and after walking to the server I saw that it had crashed.
Now I want to investigate what the hell is wrong, but I don't know where to start.
I have looked through some of the log files but can't seem to find the cause.
I am using ESXi 5.5.
Can anyone point me in the right direction?
Thanks in advance.
Eric
You can post the PSOD screen and collect a vm-support bundle; in the bundle you will find the vmkernel.log file that was captured during the PSOD. That file can give more insight into what happened before and during the crash. You can upload it too, if you want multiple eyes to look at it.
Actually, I don't want to make the system crash again.
Actually, if it crashes again, there is a better chance of finding the root cause than if you wait for an unexpected crash at a critical time.
Attached you'll find the vmkernel.log.
During a crash or reboot, vmkernel.log is rotated, so the attached vmkernel.log only covers the period since the last boot and would not contain much information about the crash.
1. Check under /var/core.
2. If you find a ***zdump.1 file, then run:
3. vmkdump_extract -l ***zdump.1
4. This will give you a vmkernel.log.1 file.
5. Can you upload that file?
On a side note, it is also a good idea to raise a support request if you have a valid support contract.
Your crash resembles the issue described here: VMware KB: ESXi 5.5 host fails with a purple diagnostic screen and the error: Usage error in dlmallo...
But if you are not using the bnx2 driver, then I would strongly suggest that you raise a support request.
OTOH, there are indeed a lot of storage errors on your vmhba0:0:0, and luckily there are also KBs for working on that: VMware KB: vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using I...
Hope it helps. PS: looking at the logs, I am really sorry to say that you might hit this issue more often.
Thanks for the articles.
I will have a look at them. I think they will help.
Hi zXi_Gamer,
I have updated the bnx2 driver and the system seems much more responsive and faster. I still have to check whether the same error occurs when I pull the logs.
I still have a recurring message in the vmkernel.log:
2014-03-27T10:44:57.367Z cpu4:33485)WARNING: LinScsi: SCSILinuxQueueCommand:1207: queuecommand failed with status = 0x1056 Unknown status vmhba33:0:0:0 (driver name: ahci) - Message repeated 17 times
2014-03-27T10:44:57.367Z cpu2:32789)ScsiDeviceIO: 2324: Cmd(0x412e80878e00) 0x2a, CmdSN 0x864 from world 32779 to dev "t10.ATA_____SAMSUNG_HD103SJ_________________________S246J9AB512738______" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
I looked it up and this is the article I found: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103038...
It states that this is a temporary condition and will resolve itself.
vmkernel: 1:02:02:02.206 cpu3:4099)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x410005078e00) to NMP device "naa.6001e4f000105e6b00001f14499bfead" failed on physical path "vmhba1:C0:T0:L100" H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
This status is returned when a LUN cannot accept SCSI commands at the moment. As this should be a temporary condition, the command is tried again.
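To make the H:/D:/P: triple in these lines easier to read, here is a minimal Python sketch that pulls the three values out of a log line and names the device status. The function name `decode_status` and the small lookup table are my own for illustration; the table only lists a few device status codes from the SCSI standard, and the host (H:) and plugin (P:) values are returned as plain numbers because I have not mapped them here.

```python
import re

# A few device (D:) status codes as defined by the SCSI standard.
# This table is deliberately incomplete; anything else is reported as "unknown".
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Matches the "H:0x0 D:0x8 P:0x0" triple as it appears in the lines quoted above.
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def decode_status(line):
    """Return the host/device/plugin status found in a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "host": host,
        "device": device,
        "plugin": plugin,
        "device_name": DEVICE_STATUS.get(device, "unknown"),
    }
```

For the lines quoted in this thread, D:0x8 comes back as BUSY, which fits the article's description of a LUN that cannot accept commands at the moment.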
Can you comment on this?
Eric
Glad to hear about the increase in performance after the new driver. I doubt that you will hit the PSOD again. OTOH,
2013-11-08T17:41:22.839Z cpu0:2056)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1056 Unknown status vmhba0:0:0:0 (driver name: ahci) - Message repeated 162 times
Interestingly, as posted above, I have the same messages on my server. But it looks like in your case the controller is vmhba33. Can you let me know what vmhba33 is?
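To compare how often each adapter logs this warning, here is a rough Python sketch that tallies the "queuecommand failed" lines per vmhba from a saved copy of vmkernel.log. The function name `tally_queue_failures` is made up for this example, and it assumes the line format is exactly as quoted in this thread. It treats "Message repeated N times" as N occurrences and a line without that suffix as one occurrence, which is a simplification.

```python
import re
from collections import Counter

# Captures the adapter name (e.g. vmhba33) and, when present,
# the count from the "Message repeated N times" suffix.
QUEUE_FAIL_RE = re.compile(
    r"queuecommand failed with status = 0x[0-9a-fA-F]+"
    r".*?(vmhba\d+):\d+:\d+:\d+"
    r"(?:.*?Message repeated (\d+) times)?"
)


def tally_queue_failures(lines):
    """Count 'queuecommand failed' warnings per adapter."""
    totals = Counter()
    for line in lines:
        match = QUEUE_FAIL_RE.search(line)
        if match is None:
            continue
        adapter, repeated = match.groups()
        # Assumption: the repeat count stands for the number of occurrences.
        totals[adapter] += int(repeated) if repeated else 1
    return totals
```

With a log file copied off the host, it could be used as `tally_queue_failures(open("vmkernel.log"))`, where the file name is just wherever you saved the log.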
On a side note, I am trying the following option:
esxcli system settings kernel set --setting=iovDisableIR -v TRUE
I will update you if the errors disappear after changing the setting.
Hi zXi_Gamer,
Any change after altering the setting?
No luck there. Still getting the messages.
Upgraded to ESXi 5.5 Update 1, and the messages are no longer spewing.
I'll try the update. Just yesterday I had a very slow system while cloning a server.
Thanks for the update.