VMware Cloud Community
cmartins88
Contributor

ESXi crash every week

Hi, I'm appealing to your experience, because I'm desperate! For the last year, I have had an ESXi machine restarting randomly, at least twice a month. It started on ESXi 6.5, and the issue persists now on 6.7. Before I throw the whole machine away, I'd like you to look at the logs in case you spot something I have missed. I have cut the logs to the parts right before the crash and after the reboot. Thanks
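Because the excerpts are cut around the crash, one way to locate the restart boundary mechanically is to look for the largest jump between consecutive timestamps in the vmkernel log. The following is a minimal Python sketch, assuming lines carry the `YYYY-MM-DDTHH:MM:SS.mmmZ` prefix seen in this post; the function names `parse_ts` and `largest_gap` are hypothetical, and lines without that prefix (boot-loader output, `[...]` markers) are simply skipped.

```python
import re
from datetime import datetime, timezone

# Timestamp prefix as it appears on vmkernel lines in this thread,
# e.g. "2019-07-03T07:24:40.246Z cpu39:2142585)..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z ")


def parse_ts(line):
    """Return the UTC timestamp at the start of a log line, or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return ts.replace(tzinfo=timezone.utc)


def largest_gap(lines):
    """Return (gap, last_before, first_after) for the biggest forward jump
    between consecutive timestamped lines, or None if fewer than two exist."""
    prev = None
    best = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # untimestamped line, ignore
        if prev is not None:
            gap = ts - prev
            if best is None or gap > best[0]:
                best = (gap, prev, ts)
        prev = ts
    return best
```

Called as `largest_gap(open("vmkernel.log"))` (the file name is an assumption), it returns the silent interval together with the last timestamp before it and the first one after it. A long gap only marks where logging stopped; it does not say why the host restarted.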

== vmkwarning ==

[...]

0:00:00:05.879 cpu0:2097152)WARNING: VMKAcpi: 318: \_SB_.PC00.LPC0.TMR_: skipping GSIV 0 conflict

0:00:00:05.938 cpu0:2097152)WARNING: Chipset: 396: Bus 4 (03) is already defined

2019-07-03T07:28:46.146Z cpu36:2097958)WARNING: ScsiPath: 8915: Adapter Invalid does not exist

2019-07-03T07:28:46.146Z cpu22:2097960)WARNING: PCI: 1209: 0000:00:14.0 is nameless

2019-07-03T07:28:57.472Z cpu58:2097943)WARNING: Failed to init interrupt.

2019-07-03T07:28:57.621Z cpu58:2097943)WARNING: Failed to init interrupt.

2019-07-03T07:28:58.155Z cpu11:2098053)WARNING: etherswitch: PortCfg_ModInit:910: Skipped initializing etherswitch portcfg for VSS to use cswitch and portcfg module

2019-07-03T07:29:01.900Z cpu50:2098197)WARNING: FBFT not enabled

2019-07-03T07:31:52Z mark: storage-path-claim-completed

2019-07-03T07:29:10.846Z cpu37:2097888)WARNING: NFS: 1227: Invalid volume UUID 5b0fb779-1e819ada-6579-ac1f6b0a8e14

2019-07-03T07:29:10.904Z cpu37:2097888)WARNING: NFS: 1227: Invalid volume UUID 5d11e186-2192402e-d957-ac1f6b0a8e14

2019-07-03T07:29:10.942Z cpu37:2097888)WARNING: NFS: 1227: Invalid volume UUID 5d11e19a-1c628786-d349-ac1f6b0a8e14

2019-07-03T07:31:49.194Z cpu38:2099014)WARNING: APEI: 319: Could not initialize EINJ

2019-07-03T07:31:54.867Z cpu37:2099342)WARNING: NTPClock: 1561: system clock synchronized to upstream time servers

=============

==== vmkernel ====

[...]

2019-07-03T07:13:28.568Z cpu46:2142585)nvme_SetupPrps: prp2, next prp not a page boundary:2d581fbe10 completed:512 remaining len:1536

2019-07-03T07:13:35.632Z cpu26:2097977)NMP: nmp_ThrottleLogForDevice:3781: Cmd 0x93 (0x459ade55af80, 2142588) to dev "t10.NVMe____INTEL_SSDPEDMX012T7_CVPF725100W11P2JGN__00000001" on path "vmhba2:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5$

2019-07-03T07:13:35.632Z cpu26:2097977)ScsiDeviceIO: 3082: Cmd(0x459ade55af80) 0x93, CmdSN 0x7f5d19 from world 2142588 to dev "t10.NVMe____INTEL_SSDPEDMX012T7_CVPF725100W11P2JGN__00000001" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2019-07-03T07:13:35.957Z cpu36:2142585)nvme_SetupPrps: prp2, next prp not a page boundary:2d581fbe10 completed:512 remaining len:1536

2019-07-03T07:13:56.987Z cpu59:2097995)NMP: nmp_ThrottleLogForDevice:3781: Cmd 0x93 (0x459a9f390340, 2097694) to dev "t10.NVMe____INTEL_SSDPEDME800G4_CVMD434500FY800BGN__00000001" on path "vmhba5:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5$

2019-07-03T07:13:56.987Z cpu59:2097995)ScsiDeviceIO: 3082: Cmd(0x459a9f390340) 0x93, CmdSN 0x28a1bd from world 2097694 to dev "t10.NVMe____INTEL_SSDPEDME800G4_CVMD434500FY800BGN__00000001" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2019-07-03T07:14:09.802Z cpu46:2098954)SunRPC: 1099: Destroying world 0x20b2ae

2019-07-03T07:14:17.021Z cpu59:2097995)ScsiDeviceIO: 3082: Cmd(0x459aa29f4f00) 0x93, CmdSN 0x28a1d9 from world 2140248 to dev "t10.NVMe____INTEL_SSDPEDME800G4_CVMD434500FY800BGN__00000001" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2019-07-03T07:14:53.801Z cpu15:2098071)NMP: nmp_ResetDeviceLogThrottling:3575: last error status from device t10.NVMe____INTEL_SSDPEDME800G4_CVMD434500FY800BGN__00000001 repeated 1 times

2019-07-03T07:15:12.803Z cpu46:2098954)SunRPC: 1099: Destroying world 0x20b2b0

2019-07-03T07:16:14.800Z cpu42:2098954)SunRPC: 1099: Destroying world 0x20b2bc

2019-07-03T07:17:15.801Z cpu42:2098954)SunRPC: 1099: Destroying world 0x20b2be

2019-07-03T07:18:16.799Z cpu36:2098954)SunRPC: 1099: Destroying world 0x20b2c0

2019-07-03T07:18:58.874Z cpu12:2097986)NMP: nmp_ThrottleLogForDevice:3781: Cmd 0x93 (0x459aa29511c0, 2140248) to dev "t10.NVMe____INTEL_SSDPEDKE020T7_BTLE72920DJ62P0IGN__00000001" on path "vmhba3:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5$

2019-07-03T07:18:58.874Z cpu12:2097986)ScsiDeviceIO: 3082: Cmd(0x459aa29511c0) 0x93, CmdSN 0x4bc900 from world 2140248 to dev "t10.NVMe____INTEL_SSDPEDKE020T7_BTLE72920DJ62P0IGN__00000001" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2019-07-03T07:19:08.740Z cpu41:2142585)nvme_SetupPrps: prp2, next prp not a page boundary:252fb7c5a0 completed:512 remaining len:3584

2019-07-03T07:19:17.797Z cpu40:2098954)SunRPC: 1099: Destroying world 0x20b2c2

2019-07-03T07:20:03.806Z cpu36:2142585)nvme_SetupPrps: prp2, next prp not a page boundary:252fb7c5a0 completed:512 remaining len:2048

2019-07-03T07:20:18.795Z cpu40:2098954)SunRPC: 1099: Destroying world 0x20b2c4

2019-07-03T07:21:19.794Z cpu56:2098954)SunRPC: 1099: Destroying world 0x20b2d0

2019-07-03T07:22:04.930Z cpu16:2142588)nvme_SetupPrps: prp2, next prp not a page boundary:252fb7c5a0 completed:512 remaining len:2560

2019-07-03T07:22:21.793Z cpu41:2098954)SunRPC: 1099: Destroying world 0x20b2d2

2019-07-03T07:23:22.792Z cpu42:2098954)SunRPC: 1099: Destroying world 0x20b2d4

2019-07-03T07:23:51.611Z cpu2:2097986)NMP: nmp_ThrottleLogForDevice:3781: Cmd 0x93 (0x459aa55bb400, 2140248) to dev "t10.NVMe____INTEL_SSDPEDKE020T7_BTLE72920DJ62P0IGN__00000001" on path "vmhba3:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 $

2019-07-03T07:23:51.611Z cpu2:2097986)ScsiDeviceIO: 3082: Cmd(0x459aa55bb400) 0x93, CmdSN 0x4bc9e6 from world 2140248 to dev "t10.NVMe____INTEL_SSDPEDKE020T7_BTLE72920DJ62P0IGN__00000001" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

2019-07-03T07:24:24.793Z cpu42:2098954)SunRPC: 1099: Destroying world 0x20b2d6

2019-07-03T07:24:40.246Z cpu39:2142585)nvme_SetupPrps: prp2, next prp not a page boundary:2d581fbe10 completed:512 remaining len:1536

VMB: 66: Reserved 4 MPNs starting @ 0x4a0

VMB: 113: mbMagic: 1badb005, mbInfo 0x600000

VMB: 106: Changed PAT MSR from 0x7040600070406 to 0x7010600070106

VMB_ACPI: 597: No SPCR table found.

VMB_SERIAL: 264: Serial port set to default configuration.

VMB_MEMMAP: 2317: memmap[0]: addr 0, len 9b000, type 1

VMB_MEMMAP: 2317: memmap[1]: addr 9b000, len 5000, type 2

VMB_MEMMAP: 2317: memmap[2]: addr e0000, len 20000, type 2

VMB_MEMMAP: 2317: memmap[3]: addr 100000, len 6a61a000, type 1

VMB_MEMMAP: 2317: memmap[4]: addr 6a71a000, len 2100000, type 2

VMB_MEMMAP: 2317: memmap[5]: addr 6c81a000, len 17c000, type 1

VMB_MEMMAP: 2317: memmap[6]: addr 6c996000, len ab7000, type 4

VMB_MEMMAP: 2317: memmap[7]: addr 6d44d000, len 1efb000, type 2

VMB_MEMMAP: 2317: memmap[8]: addr 6f348000, len 4b8000, type 1

VMB_MEMMAP: 2317: memmap[9]: addr 6f800000, len 20800000, type 2

VMB_MEMMAP: 2317: memmap[10]: addr fd000000, len 1800000, type 2

VMB_MEMMAP: 2317: memmap[11]: addr fed20000, len 25000, type 2

VMB_MEMMAP: 2317: memmap[12]: addr ff000000, len 1000000, type 2

VMB_MEMMAP: 2317: memmap[13]: addr 100000000, len 3f80000000, type 1

[...]

0:00:00:22.294 cpu0:2097152)VMKernel loaded successfully.

2019-07-03T07:28:31.388Z cpu24:2097301)ScsiCore: 175: Starting taskMgmt watchdog world 2097301

2019-07-03T07:28:31.388Z cpu12:2097302)ScsiCore: 175: Starting taskMgmt watchdog world 2097302

2019-07-03T07:28:31.388Z cpu30:2097737)VSCSI: 2974: Starting reset watchdog world 2097737

2019-07-03T07:28:31.388Z cpu28:2097736)VSCSI: 2776: Starting reset handler world 2097736/1

2019-07-03T07:28:31.388Z cpu61:2097803)ScsiCore: 104: Starting taskmgmt handler world 2097803/1

2019-07-03T07:28:31.388Z cpu46:2097804)ScsiCore: 104: Starting taskmgmt handler world 2097804/1

2019-07-03T07:28:31.393Z cpu0:2097152)Power: 1545: Current power management policy was set to "Balanced"

2019-07-03T07:28:31.396Z cpu0:2097152)Boot: 572: 24815 symbols, 566275

2019-07-03T07:28:35.426Z cpu0:2097152)BootModule: 1066: Loading kernel boot module chardevs.b00

2019-07-03T07:28:35.426Z cpu0:2097152)Loading module chardevs ...

2019-07-03T07:28:35.427Z cpu0:2097152)Elf: 2101: module chardevs has license VMware

2019-07-03T07:28:35.428Z cpu0:2097152)TTY: 262: Allocated 4 out of 4 ttys

2019-07-03T07:28:35.429Z cpu0:2097152)SerialDev: 237: Added serial device uart0, index 15, id 0

2019-07-03T07:28:35.429Z cpu0:2097152)SerialDev: 237: Added serial device uart1, index 16, id 1

2019-07-03T07:28:35.429Z cpu0:2097152)Mod: 4962: Initialization of chardevs succeeded with module ID 1.

2019-07-03T07:28:35.429Z cpu0:2097152)chardevs loaded successfully.

2019-07-03T07:28:35.430Z cpu0:2097152)BootModule: 1066: Loading kernel boot module user.b00

2019-07-03T07:28:35.430Z cpu0:2097152)Loading module user ...

2019-07-03T07:28:35.433Z cpu0:2097152)Elf: 2101: module user has license VMware

2019-07-03T07:28:35.476Z cpu0:2097152)Mod: 4962: Initialization of user succeeded with module ID 2.

2019-07-03T07:28:35.476Z cpu0:2097152)user loaded successfully.

2019-07-03T07:28:35.476Z cpu0:2097152)BootModule: 1066: Loading kernel boot module procfs.b00

2019-07-03T07:28:35.476Z cpu0:2097152)Loading module procfs ...

2019-07-03T07:28:35.477Z cpu0:2097152)Elf: 2101: module procfs has license VMware

2019-07-03T07:28:35.477Z cpu0:2097152)FSS: 1147: Registered fs procfs, module 3, fsTypeNum 0xdff6

2019-07-03T07:28:35.477Z cpu0:2097152)Mod: 4962: Initialization of procfs succeeded with module ID 3.

2019-07-03T07:28:35.477Z cpu0:2097152)procfs loaded successfully.
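The `ScsiDeviceIO` failure lines in the vmkernel excerpt above can be tallied per device to see which drives report them and how often. The following is a minimal Python sketch, assuming the line layout shown in this post; the name `summarize` and the small lookup tables are hypothetical, and the tables cover only the codes that appear in this thread (opcode 0x93 is WRITE SAME(16), device status 0x2 is CHECK CONDITION, sense key 0x5 is ILLEGAL REQUEST).

```python
import re
from collections import Counter

# Matches ScsiDeviceIO failure lines in the layout seen in this thread.
FAIL_RE = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+), .*? to dev "([^"]+)" '
    r"failed H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)

# Only the values that occur in the excerpt; anything else is shown as hex.
OPCODES = {0x93: "WRITE SAME(16)"}
DEVICE_STATUS = {0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}


def summarize(lines):
    """Count failures per (device, opcode, device status, sense key)."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m is None:
            continue  # not a ScsiDeviceIO failure line
        opcode, dev, _host, status, _plugin, key, _asc, _ascq = m.groups()
        opcode, status, key = int(opcode, 16), int(status, 16), int(key, 16)
        counts[(
            dev,
            OPCODES.get(opcode, hex(opcode)),
            DEVICE_STATUS.get(status, hex(status)),
            SENSE_KEYS.get(key, hex(key)),
        )] += 1
    return counts
```

Iterating over `summarize(open("vmkernel.log")).items()` (the file name is an assumption) gives one row per device and error signature. The tally only shows which devices rejected which command; on its own it does not establish whether these rejections are related to the restarts.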

6 Replies
daphnissov
Immortal

What hardware is ESXi running on?

cmartins88
Contributor

esxi10.png

The drives are Intel SSD DC Series PCIe.

RajeevVCP4
Expert

What is the back-end storage?

Is it an EMC XtremIO storage array?

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark helpful or correct if my answer is useful for you
cmartins88
Contributor

Thanks for your answer, but I'm not sure I understand it. The only storage in use is three independent Intel SSD DC PCIe drives, each one mapped to its own datastore.

daphnissov
Immortal

The first thing I'd try is to bring this host fully up to date. I see it's connected to a vCenter, which means you must update the vCenter first. Also make sure this Supermicro host has updated firmware and BIOS.

RajeevVCP4
Expert

What is the ESXi build number?

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark helpful or correct if my answer is useful for you