I installed ESXi 6 on an IBM server, but the host keeps rebooting randomly for no apparent reason.
Before I installed ESXi 6, Windows Server 2008 was running on this machine without any such problem.
I have searched all the logs but cannot find the root cause. Can anyone advise how I can investigate this issue?
Thanks
Does anyone have any idea about this?
Is the host on VMware's HCL list?
Have you checked the hostd.log on the host? /var/log/hostd.log
Have you checked your server is on the HCL? (unsupported hardware can cause that)
Has the host generated any core dumps?
What is in the vmksummary log? You can check it at http://HostIP/host
What is the make and model of the hardware?
In /var/core, are you seeing any core dump files?
If it is HP, is ASR enabled in the iLO?
What do the iDRAC or iLO logs say?
Thanks & Regards
Arjun Dooti
The server model is IBM x3550 M4 (Xeon E5-2620), and it's on the HCL.
No core dump file was found.
There is no useful information in hostd.log or vmksummary.log.
There are some messages in vmkwarning.log, but they may not be related.
0:00:00:05.508 cpu0:32768)WARNING: VMKAcpi: 2448: Bus 13 (81) is already defined
2016-12-31T01:07:46.756Z cpu21:33266)WARNING: LinuxSignal: 541: ignored unexpected signal flags 0x2 (sig 17)
2016-12-31T01:07:50.032Z cpu1:33291)WARNING: LinNet: LinNet_CreateDMAEngine:4011: vusb0, failed to get device properties with error Not supported
2016-12-31T01:07:50.032Z cpu1:33291)WARNING: LinNet: LinNet_ConnectUplink:11920: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device
2016-12-31T01:07:50.481Z cpu6:33291)WARNING: LinNet: LinNet_CreateDMAEngine:4011: vusb0, failed to get device properties with error Not supported
2016-12-31T01:07:50.481Z cpu6:33291)WARNING: LinNet: LinNet_ConnectUplink:11920: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device
2016-12-31T01:07:50.728Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T9:L0 : Not found
2016-12-31T01:07:50.730Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T11:L0 : Not found
2016-12-31T01:07:50.732Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T12:L0 : Not found
2016-12-31T01:07:50.735Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T13:L0 : Not found
2016-12-31T01:07:50.737Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T14:L0 : Not found
2016-12-31T01:07:50.739Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T15:L0 : Not found
2016-12-31T01:07:50.741Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T16:L0 : Not found
2016-12-31T01:07:50.743Z cpu19:33355)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T17:L0 : Not found
2016-12-31T01:07:53.005Z cpu2:33210)WARNING: NetDVS: 659: portAlias is NULL
2016-12-31T01:08:01.854Z cpu3:33404)WARNING: RDT: RDTModInit:1074: Kernel is not configured for IPv6
2016-12-31T01:08:02.874Z cpu8:33528)WARNING: Supported VMs 171, Max VSAN VMs 400, SystemMemoryInGB 32
2016-12-31T01:08:02.874Z cpu8:33528)WARNING: MaxFileHandles: 5130, Prealloc 1, Prealloc limit: 32 GB, Host scaling factor: 1
2016-12-31T01:08:02.874Z cpu8:33528)WARNING: DOM memory will be preallocated.
2016-12-31T01:08:05.404Z cpu5:33583)WARNING: FTCpt: 476: Using IPv4 address to start server listener
2016-12-31T01:08:09.449Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T9:L0 : Not found
2016-12-31T01:08:09.451Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T11:L0 : Not found
2016-12-31T01:08:09.453Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T12:L0 : Not found
2016-12-31T01:08:09.455Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T13:L0 : Not found
2016-12-31T01:08:09.457Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T14:L0 : Not found
2016-12-31T01:08:09.459Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T15:L0 : Not found
2016-12-31T01:08:09.461Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T16:L0 : Not found
2016-12-31T01:08:09.463Z cpu11:33775)WARNING: ScsiScan: 1643: Failed to add path vmhba1:C0:T17:L0 : Not found
2016-12-31T01:08:12.590Z cpu14:34232)WARNING: PCI: 157: 0000:06:00.0: Bypassing non-ACS capable device in hierarchy
2016-12-31T01:08:12.590Z cpu14:34232)WARNING: PCI: 157: 0000:06:00.1: Bypassing non-ACS capable device in hierarchy
2016-12-31T01:08:12.591Z cpu14:34232)WARNING: PCI: 157: 0000:06:00.2: Bypassing non-ACS capable device in hierarchy
2016-12-31T01:08:12.591Z cpu14:34232)WARNING: PCI: 157: 0000:06:00.3: Bypassing non-ACS capable device in hierarchy
2016-12-31T01:08:30.274Z cpu14:35400)WARNING: NetDVS: 659: portAlias is NULL
Some other messages in vmkernel.log:
2017-01-12T23:37:38.784Z cpu0:32807)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a18508d5c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:38:34.260Z cpu10:34416)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x439d8695ec40, 0) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
2017-01-12T23:38:34.283Z cpu10:34416)ScsiDeviceIO: 2651: Cmd(0x439d8695ec40) 0x1a, CmdSN 0x3d39 from world 0 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2017-01-12T23:38:34.283Z cpu10:34416)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x85 (0x439d8695ec40, 34416) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:38:34.283Z cpu10:34416)ScsiDeviceIO: 2651: Cmd(0x439d8695ec40) 0x4d, CmdSN 0x9b5 from world 34416 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2017-01-12T23:38:34.283Z cpu10:34416)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x439d8695ec40, 34416) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
2017-01-12T23:38:34.283Z cpu10:34416)ScsiDeviceIO: 2651: Cmd(0x439d8695ec40) 0x1a, CmdSN 0x9b6 from world 34416 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2017-01-12T23:38:34.284Z cpu10:34416)ScsiDeviceIO: 2651: Cmd(0x439d8695ec40) 0x1a, CmdSN 0x3d3e from world 0 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2017-01-12T23:38:34.284Z cpu10:34416)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x85 (0x439d8695ec40, 34416) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:42:38.782Z cpu15:32822)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a180310700, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:47:38.781Z cpu21:32828)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a18036bf40, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:52:38.778Z cpu21:32828)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a1870e1f00, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-12T23:57:38.783Z cpu15:32783)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a180308240, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-13T00:01:02.009Z cpu19:35388)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a18036c9c0, 0) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
2017-01-13T00:01:02.037Z cpu19:35388)ScsiDeviceIO: 2651: Cmd(0x43a18036c9c0) 0x1a, CmdSN 0x3d48 from world 0 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2017-01-13T00:01:02.522Z cpu10:32860)ScsiDeviceIO: 2651: Cmd(0x439d85532500) 0x1a, CmdSN 0x3d4d from world 0 to dev "naa.600605b0054459d01fbde6d065e4443f" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2017-01-13T00:01:38.775Z cpu15:33298)NMP: nmp_ResetDeviceLogThrottling:3349: last error status from device naa.600605b0054459d01fbde6d065e4443f repeated 2 times
2017-01-13T00:02:38.782Z cpu18:32825)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a1870d5dc0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-13T00:02:38.782Z cpu18:32825)ScsiDeviceIO: 2635: Cmd(0x43a1870d5dc0) 0x1a, CmdSN 0x3d4e from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2017-01-13T00:07:38.784Z cpu10:32817)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x43a184f737c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2017-01-13T00:08:34.292Z cpu10:34416)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x1a (0x439d802f4900, 0) to dev "naa.600605b0054459d01fbde6d065e4443f" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
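For anyone reading the NMP lines above, here is a rough sketch of how the status fields can be decoded. The helper names `decode_opcode` and `decode_sense` are made up for illustration, and the tables only cover the values that actually appear in this excerpt; the meanings are the standard SCSI ones. Read this way, the entries look like the device rejecting commands it does not support rather than failing I/O, but that is an interpretation and not something confirmed in this thread.

```shell
# Hypothetical helpers: translate the SCSI values seen in the log excerpt.
# Only the codes that occur in the excerpt are listed.

# $1 = command opcode as printed after "Cmd" (for example 0x1a)
decode_opcode() {
    case "$1" in
        0x1a) echo "MODE SENSE(6)" ;;
        0x4d) echo "LOG SENSE" ;;
        0x85) echo "ATA PASS-THROUGH(16)" ;;
        *)    echo "opcode $1 (not in this table)" ;;
    esac
}

# $1 = sense key, $2 = ASC, $3 = ASCQ, as printed after "Valid sense data:"
decode_sense() {
    case "$1" in
        0x5) key="ILLEGAL REQUEST" ;;
        *)   key="sense key $1 (not in this table)" ;;
    esac
    case "$2/$3" in
        0x20/0x0) detail="INVALID COMMAND OPERATION CODE" ;;
        0x24/0x0) detail="INVALID FIELD IN CDB" ;;
        *)        detail="ASC/ASCQ $2/$3 (not in this table)" ;;
    esac
    printf '%s: %s\n' "$key" "$detail"
}
```

For example, `decode_opcode 0x1a` together with `decode_sense 0x5 0x24 0x0` describes the recurring entry for the naa.6006... device. The D:0x2 field in the same lines is the SCSI device status CHECK CONDITION, which is what makes the sense data valid.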
The last reboot was around 12/31/2016 9:08:21 AM.
The disk is configured as RAID 00, with no redundancy.
Looks to me like there is a problem with your USB (controller?)
2016-12-31T01:07:50.032Z cpu1:33291)WARNING: LinNet: LinNet_CreateDMAEngine:4011: vusb0, failed to get device properties with error Not supported
2016-12-31T01:07:50.032Z cpu1:33291)WARNING: LinNet: LinNet_ConnectUplink:11920: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device
2016-12-31T01:07:50.481Z cpu6:33291)WARNING: LinNet: LinNet_CreateDMAEngine:4011: vusb0, failed to get device properties with error Not supported
2016-12-31T01:07:50.481Z cpu6:33291)WARNING: LinNet: LinNet_ConnectUplink:11920: vusb0: Failed to create DMA engine with error Not supported, it maybe a pseudo device
Again, are your server and its components on the HCL for ESXi 6.0? And did you install ESXi with an IBM customized ISO?
Thanks for your reply. Yes, it's on the HCL.
It's not the IBM customized ISO, only the generic version. I didn't know there was a customized version.
I don't use USB on this server. Can a USB failure cause the server to reboot?
Can I just disable the USB device?
Hi,
Is it possible to upload the last vmksummary.log and vmkernel.log files from before the reboot?
cd /var/run/log
Please also post the output of the command below:
less vmksummary.log | grep boot
Verify whether a core dump partition is configured. If it is, follow the KB below and generate the core dump file.
Configuring ESXi coredump to file instead of partition (2077516) | VMware KB
If a core dump file is not configured, follow the KB below and configure one.
Generating a VMkernel zdump manually from a dump file in ESXi host (2081902) | VMware KB
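A minimal sketch of the coredump check described above, wrapped in a function so it can be pasted into the ESXi shell. The function name `show_coredump_targets` is made up for illustration; the `esxcli system coredump` commands are the ones the KB articles are built around, but the exact output differs between builds.

```shell
# Hypothetical wrapper: show every coredump target the host knows about.
# Must be run in the ESXi shell, where esxcli is available.
show_coredump_targets() {
    echo "== coredump partition =="
    esxcli system coredump partition get
    echo "== coredump file =="
    esxcli system coredump file list
}
```

If neither target shows up as active, a dump file can usually be created with `esxcli system coredump file add` and then activated with `esxcli system coredump file set --smart --enable true`. Please check both against KB 2077516 for your build before relying on them.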
Thanks & Regards
Arjun Dooti
Sorry, I didn't see your reply.
You can download a customized ISO from Lenovo's website here: https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5098036
Maybe try reinstalling with it, since this might be a driver issue.
Thanks Arjun.
I have attached vmkernel.log and vmksummary.log, but I don't know how to find the vmkernel.log from before the reboot; the file only has the latest information.
I followed your links and found that no coredump was generated on the partition, so I have now created a file for coredumps. There is also no coredump message in vmksummary.log.
Here is the output of "less vmksummary.log | grep boot":
2016-11-15T19:03:47Z bootstop: Host has booted
2016-11-15T19:08:09Z bootstop: Host is rebooting
2016-11-15T19:11:42Z bootstop: Host has booted
2016-11-18T15:41:03Z bootstop: Host is rebooting
2016-11-18T07:51:42Z bootstop: Host has booted
2016-11-20T12:38:12Z bootstop: Host has booted
2016-11-20T12:54:33Z bootstop: Host has booted
2016-11-21T09:46:26Z bootstop: Host has booted
2016-11-21T13:18:27Z bootstop: Host has booted
2016-11-28T04:16:23Z bootstop: Host has booted
2016-11-28T04:31:43Z bootstop: Host has booted
2016-11-28T06:53:09Z bootstop: Host has booted
2016-11-28T18:31:56Z bootstop: Host has booted
2016-11-29T19:23:56Z bootstop: Host has booted
2016-11-29T22:56:02Z bootstop: Host has booted
2016-12-03T00:07:02Z bootstop: Host has booted
2016-12-03T13:27:01Z bootstop: Host has booted
2016-12-05T20:16:45Z bootstop: Host has booted
2016-12-06T16:17:42Z bootstop: Host has booted
2016-12-09T23:18:14Z bootstop: Host has booted
2016-12-19T02:39:22Z bootstop: Host is powering off
2016-12-19T03:27:27Z bootstop: Host has booted
2016-12-19T03:31:44Z bootstop: Host is powering off
2016-12-19T03:42:24Z bootstop: Host has booted
2016-12-26T13:59:29Z bootstop: Host has booted
2016-12-27T20:07:23Z bootstop: Host has booted
2016-12-30T21:01:43Z bootstop: Host has booted
2016-12-30T21:23:24Z bootstop: Host has booted
2016-12-31T01:08:27Z bootstop: Host has booted
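In the list above, most "Host has booted" entries have no "Host is rebooting" or "Host is powering off" entry before them, which is what an unplanned reset would look like. As a rough sketch, they can be picked out automatically. The function name `unclean_boots` is made up for illustration, and it assumes the bootstop lines keep the format shown in this output.

```shell
# Hypothetical helper: print the timestamp of every "Host has booted" entry
# that is not directly preceded by a clean reboot or power-off entry.
# Reads vmksummary-style lines on stdin; other lines are ignored.
unclean_boots() {
    awk '
        /bootstop: Host is (rebooting|powering off)/ { clean = 1; next }
        /bootstop: Host has booted/ {
            if (!clean) print $1   # no clean shutdown was logged before this boot
            clean = 0
        }
    '
}
```

It could be run as `unclean_boots < /var/run/log/vmksummary.log`. The very first boot entry in the file is always reported, because the log has no history before it, so that one should be ignored.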
Thanks vxav.
I have downloaded the customized ISO.
The file is only several MB and I don't know how to use it; the readme file is not clear either.
Can you help explain it? Thanks.
You have uploaded the kernel logs from 18th to 20th January, and I don't see any reboot event in them. When did the last reboot occur? Based on one of your replies, it was in December. Is the server running fine now, for more than 20 days?
Hi,
Have you noticed any recent reboots? If yes, upload the vmkernel files from the /var/run/log folder.
Hi ,
ESXi runs in the UTC time zone. The attached vmkernel log files seem to be from after the reboot.
Please upload the vobd.log file and the vmkernel files from /var/run/log.
You will find similar files with names like vmkernel.01.gz; upload all of them.
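To locate the boot events across the current and rotated vobd logs before uploading, something like the sketch below may help. The function name `find_boot_events` is made up for illustration, and it assumes the rotated files sit next to vobd.log, have names starting with `vobd`, and are gzip-compressed when they end in `.gz`.

```shell
# Hypothetical helper: list "Host has booted" audit events from vobd.log and
# any rotated copies in the given directory, sorted by timestamp.
# $1 = log directory (defaults to /var/run/log)
find_boot_events() {
    dir=${1:-/var/run/log}
    for f in "$dir"/vobd*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        case "$f" in
            *.gz) zcat "$f" ;;           # rotated, compressed log
            *)    cat "$f" ;;            # current, plain-text log
        esac
    done | grep 'esx.audit.host.boot' | sort
}
```

The timestamps it prints are in UTC, so they need to be shifted to local time before comparing them with the reboot time seen on the console.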
Thanks & Regards
Arjun Dooti
Last time, I believe you configured a core dump partition. Check the /var/core/log folder. If you notice any core files, let me know; otherwise follow the KB below, extract the core dump, and let me know.
Is ESXi configured with VSAN?
There seems to be a local disk error in the vmkernel file. Run an intensive hardware diagnostic test on the server and open a ticket with the hardware vendor.
If core dump files exist, let me know and I will send you the next steps.
Reboot time:
2017-01-28T18:15:50.613Z: [UserLevelCorrelator] 49090900us: [esx.audit.host.boot] Host has booted.
2017-01-28T18:15:16.430Z: [netCorrelator] 14908101us: [esx.audit.net.firewall.port.hooked] Port vmk0 is now protected by Firewall.
2017-01-28T18:15:16.430Z: An event (esx.audit.net.firewall.port.hooked) could not be sent immediately to hostd; queueing for retry.
2017-01-28T18:15:20.625Z: [scsiCorrelator] 19102463us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba1:C2:T0:L0 changed state from dead
2017-01-28T18:15:20.638Z: [scsiCorrelator] 19115806us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T0:L0 changed state from dead
2017-01-28T18:15:25.114Z: [GenericCorrelator] 23591060us: [vob.user.coredump.configured2] At least one coredump target is enabled.
2017-01-28T18:15:25.114Z: [UserLevelCorrelator] 23591060us: [vob.user.coredump.configured2] At least one coredump target is enabled.
2017-01-28T18:15:25.114Z: [UserLevelCorrelator] 23591568us: [esx.clear.coredump.configured2] At least one coredump target has been configured. Host core dumps will be saved.
2017-01-28T18:15:25.114Z: An event (esx.clear.coredump.configured2) could not be sent immediately to hostd; queueing for retry.
2017-01-28T18:15:31.507Z: [GenericCorrelator] 29984024us: [vob.user.dcui.enabled] The DCUI has been enabled
2017-01-28T18:15:31.507Z: [UserLevelCorrelator] 29984024us: [vob.user.dcui.enabled] The DCUI has been enabled
2017-01-28T18:15:31.507Z: [UserLevelCorrelator] 29984554us: [esx.audit.dcui.enabled] The DCUI has been enabled.
2017-01-28T18:15:31.507Z: An event (esx.audit.dcui.enabled) could not be sent immediately to hostd; queueing for retry.
2017-01-28T18:15:50.613Z: [UserLevelCorrelator] 49090465us: [vob.user.host.boot] Host has booted.
2017-01-28T18:15:50.613Z: [GenericCorrelator] 49090465us: [vob.user.host.boot] Host has booted.
2017-01-28T18:15:50.613Z: [UserLevelCorrelator] 49090900us: [esx.audit.host.boot] Host has booted.
Thanks & Regards
Arjun Dooti
Hi, Arjun
There is no VSAN. I cannot find a core dump on the partition either; none was generated.
Do you mean the hardware diagnostic tools in the server firmware?