Contributor

bootstop: Host has booted / unexpected reboot: how to debug?

Hello guys,

For some time now I've been having problems with unexpected reboots of my ESXi host.

I have looked at this article:

https://kb.vmware.com/s/article/1019238#:~:text=If%20your%20VMware%20ESXi%20host,faulty%20components....

It did not turn up anything unusual.

I guess the last reboot happened between 2020-09-19T08:42:24.481Z and 2020-09-19T08:51:14.365Z.

I don't see anything in hostd.log:

2020-09-19T08:41:51.482Z verbose hostd[2099022] [Originator@6876 sub=Default opID=esxui-8c0c-17bc user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdatesEx'

2020-09-19T08:41:51.560Z verbose hostd[2099058] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 7. Sent notification immediately.

2020-09-19T08:41:51.561Z verbose hostd[2099131] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 7. Sent notification immediately.

2020-09-19T08:41:53.849Z verbose hostd[2099059] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest, 7. Sent notification immediately.

2020-09-19T08:41:53.849Z verbose hostd[2099059] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: summary.guest, 7. Sent notification immediately.

2020-09-19T08:41:54.484Z verbose hostd[2099023] [Originator@6876 sub=Default opID=esxui-1cf5-17bd user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdatesEx'

2020-09-19T08:42:20.553Z verbose hostd[2099518] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 2. Sent notification immediately.

2020-09-19T08:42:20.615Z verbose hostd[2099024] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest, 2. Sent notification immediately.

2020-09-19T08:42:20.615Z verbose hostd[2099024] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: summary.guest, 2. Sent notification immediately.

2020-09-19T08:42:21.485Z verbose hostd[2099059] [Originator@6876 sub=Default opID=esxui-ae15-17be user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdatesEx'

2020-09-19T08:42:21.561Z verbose hostd[2099062] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 7. Sent notification immediately.

2020-09-19T08:42:21.561Z verbose hostd[2099515] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 7. Sent notification immediately.

2020-09-19T08:42:23.833Z verbose hostd[2099517] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest, 7. Sent notification immediately.

2020-09-19T08:42:23.833Z verbose hostd[2099517] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: summary.guest, 7. Sent notification immediately.

2020-09-19T08:42:24.481Z verbose hostd[2099522] [Originator@6876 sub=Default opID=esxui-d074-17bf user=root] AdapterServer: target='vmodl.query.PropertyCollector:ha-property-collector', method='waitForUpdatesEx'

2020-09-19T08:51:14.365Z - time the service was last started, Section for VMware ESX, pid=2098958, version=6.7.0, build=16316930, option=Release

2020-09-19T08:51:14.365Z warning -[2098958] [Originator@6876 sub=Default] Failed to load vsansvc configuration file: N7Vmacore22AlreadyExistsExceptionE(Already Exists)

--> [context]zKq7AVICAgAAAAL6+AAJLQAALE42bGlidm1hY29yZS5zbwAAsL4bAL6dFwCeShcBbX7JaG9zdGQAAYc9yQGbs2ICfRkCbGliYy5zby42AAGt1mI=[/context]

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Default] Supported VMs 334

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Handle checker] Setting system limit of 3740

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Handle checker] Set system limit to 3740

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Default] Setting malloc mmap threshold to 32 k

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Default] getrlimit(RLIMIT_NPROC): curr=4096 max=8192

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Default] Glibc malloc guards disabled.

2020-09-19T08:51:14.365Z info -[2098958] [Originator@6876 sub=Default] Initialized SystemFactory

Same for vmksummary.log:

2020-09-19T07:00:00Z heartbeat: up 3d20h57m50s, 2 VMs; [] []

2020-09-19T08:00:00Z heartbeat: up 3d21h57m50s, 2 VMs; [] []

2020-09-19T08:51:23Z bootstop: Host has booted

2020-09-19T09:00:01Z heartbeat: up 0d0h12m8s, 1 VM; [] []

What else can I look at to understand and solve this?

The host is a Dell R710 with the latest BIOS and firmware installed.

Current ESXi version: 6.7.0 Update 3 (Build 16316930). I had the same reboot problem with previous versions.

I have only a few VMs on this ESXi host; it happens even with only 2 VMs running.

Any ideas?

Thanks !

Best regards,

Bob

NB : Logs attached

6 Replies
User Moderator

ESXi usually halts with a PSOD (purple screen of death) on errors, i.e. it doesn't reboot automatically.

Unless already done, check the logs within iDRAC to find out whether they contain any hints about the reason for the reboot.


André

Contributor

Absolutely nothing on the iDRAC side.

I just changed the PSU, just in case :)

Wait and see...

Enthusiast

Looks like a hardware issue.

From vmkernel.log:

2020-09-19T08:42:07.413Z cpu12:2097897)DVFilter: 6054: Checking disconnected filters for timeouts

2020-09-19T08:42:14.415Z cpu12:2098339)SunRPC: 1099: Destroying world 0x20a4eb

VMB: 66: Reserved 4 MPNs starting @ 0x4a0

VMB: 113: mbMagic: 1badb005, mbInfo 0x600000

VMB: 106: Changed PAT MSR from 0x7040600070406 to 0x7010600070106

VMB_SERIAL: 264: Serial port set to default configuration.

From the VMware Knowledge Base:

Determine if the VMware ESX host hardware abruptly rebooted. When the VMware ESX host hardware abruptly reboots, it generates a series of events similar to:

localhost logger: (1265803308) hb: vmk loaded, 1746.98, 1745.148, 0, 208167, 208167, 0, vmware-h-59580, sfcbd-7660, sfcbd-3524
localhost vmkhalt: (1268149486) Starting system...
localhost logger: (1268149540) loaded VMkernel

If your VMware ESX host has experienced an outage and it was not the result of a kernel error, deliberate reboot, or shut down, then the physical hardware may have abruptly restarted on its own. Hardware is known to reboot abruptly due to power outages, faulty components, and heating issues. To investigate further, engage the hardware vendor.
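In the vmkernel.log excerpt above, the timestamped runtime entries stop and the untimestamped boot banner (`VMB: ...`) follows directly. A minimal Python sketch for locating that boundary in a saved copy of the log is shown below. The function names and the `VMB:` prefix check are my own assumptions based only on the lines quoted in this thread, so treat it as a starting point rather than a definitive parser.

```python
import re
from datetime import datetime, timezone

# Runtime entries in the excerpt start with an ISO-8601 UTC timestamp;
# the early boot banner lines (e.g. "VMB: 66: ...") do not.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s")


def last_entries_before_boot(lines, boot_prefix="VMB:"):
    """Return the last timestamped line seen before each boot banner.

    boot_prefix is an assumption taken from the excerpt in this thread.
    """
    results = []
    last_stamped = None
    in_banner = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TS.match(line):
            last_stamped = line
            in_banner = False
        elif line.startswith(boot_prefix):
            # Record only once per banner, at its first line.
            if not in_banner and last_stamped is not None:
                results.append(last_stamped)
            in_banner = True
    return results


def timestamp_of(line):
    """Parse the leading timestamp of a runtime entry as an aware datetime."""
    match = TS.match(line)
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc)


# Lines copied from the vmkernel.log excerpt in this thread.
SAMPLE = [
    "2020-09-19T08:42:07.413Z cpu12:2097897)DVFilter: 6054: Checking disconnected filters for timeouts",
    "2020-09-19T08:42:14.415Z cpu12:2098339)SunRPC: 1099: Destroying world 0x20a4eb",
    "VMB: 66: Reserved 4 MPNs starting @ 0x4a0",
    "VMB: 113: mbMagic: 1badb005, mbInfo 0x600000",
    "VMB_SERIAL: 264: Serial port set to default configuration.",
]

for entry in last_entries_before_boot(SAMPLE):
    print("last entry before boot:", timestamp_of(entry).isoformat(), "|", entry)
```

If the last entry before the banner is ordinary runtime activity like the SunRPC line here, with nothing that looks like a shutdown sequence, that is consistent with the abrupt hardware restart the KB describes.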


Hey, hope you are doing fine

Usually when an ESXi host reboots for "no reason", it is because of a PSOD (purple screen of death). Those are mostly caused by a hardware or driver failure (same as on Windows).

Do you have some sort of management console (iLO/iDRAC/BMC)?
Those consoles usually show whether the host was rebooted by the hardware or whether a component is failing.

Another thing to check is whether ESXi core dumps are configured and available.

This article might help: VMware Knowledge Base

If not, configure them and check the core dump the next time the server fails.

Please visit my blog at https://nachogonzalez.com.ar
Contributor

Hi,

Many thanks for your answer.

According to the iDRAC, there is no specific problem.

But this is an iDRAC 6, and its diagnostics are really limited!

I also suspect the hardware.

I've enabled core dumps on ESXi, and I've also replaced the server's power supply.

I'm monitoring...

I'll keep my eyes open :)

Commander

Regardless of the disk dump or net dump configuration, it is also worth investigating the following log files: vmkernel.log and vmksummary.log

Because the dump information may not be easy to interpret ;)
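As one way of reading vmksummary.log, the heartbeat lines quoted earlier in the thread carry an uptime counter, so subtracting it from the heartbeat's own timestamp gives a rough estimate of when the host started booting. Here is a small Python sketch of that arithmetic. The function names are hypothetical, and it assumes the `up XdYhZmWs` value counts from the start of the current boot.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches heartbeat lines as they appear in the vmksummary.log excerpt, e.g.
# "2020-09-19T09:00:01Z heartbeat: up 0d0h12m8s, 1 VM; [] []"
HEARTBEAT = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z heartbeat: up (\d+)d(\d+)h(\d+)m(\d+)s"
)


def parse_heartbeat(line):
    """Return (logged_at, uptime) for a heartbeat line, or None for other lines."""
    match = HEARTBEAT.match(line.strip())
    if match is None:
        return None
    logged_at = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    logged_at = logged_at.replace(tzinfo=timezone.utc)
    days, hours, minutes, seconds = (int(v) for v in match.groups()[1:])
    uptime = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    return logged_at, uptime


def estimated_boot_time(line):
    """Estimate boot start as heartbeat time minus reported uptime.

    Assumes the uptime counter starts at boot; returns None for non-heartbeat lines.
    """
    parsed = parse_heartbeat(line)
    if parsed is None:
        return None
    logged_at, uptime = parsed
    return logged_at - uptime


# First heartbeat after the reboot, copied from the thread.
first_after = "2020-09-19T09:00:01Z heartbeat: up 0d0h12m8s, 1 VM; [] []"
print(estimated_boot_time(first_after).isoformat())
```

For the first heartbeat after the reboot this gives roughly 08:47:53Z, which sits between the last hostd.log entry at 08:42:24Z and the "Host has booted" line at 08:51:23Z, so under that assumption the failure window is about five and a half minutes wide.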
