VMware Cloud Community
Adrian0
Contributor

vSphere ESXi 6.7 unexpected reboot

Hello,

I've recently installed a new vSphere host. Several VMs are running fine on it, but from time to time the host suddenly reboots. There is no PSOD on the console. From the logs I can see that it happens anywhere from once every 2 days to as often as 3 times a day. Neither the number of running VMs nor the load seems to play any role.

The host is running build 11675023. What can I do to capture the reboot event? The logs are not telling me anything.


Accepted Solutions
minivlab
Enthusiast

Sounds like a hardware issue more than an ESXi problem (unless everything gracefully shuts down and reboots).

In the case of hardware, I would check the power supplies and operating temperatures. An overheating CPU could cause the host to shut down or reboot. A few questions:

- What server hardware are you running on?

- Is the hardware on the HCL?

- Does your server have lights-out/out-of-band management (Dell has iDRAC, HP has iLO, etc.)? If so, check those logs.

7 Replies
a_p_
Leadership

Welcome to the Community.

Most server vendors provide integrated logs that are independent of the operating system, e.g. through iLO or iDRAC, which may be helpful in a case like this.

André

pragg12
Hot Shot

Hi,

Can you give more information about the underlying hardware?

Is the hardware certified by the vendor to run ESXi 6.7?

Has the hardware vendor provided any best practices or BIOS settings for VMware ESXi 6.7? If so, have they been applied?

Are the firmware and driver combinations for the hardware components (HBA/storage controller/NIC) in line with the VMware HCL?

You can check the ESXi logs (hostd, vmkernel, vmkwarning, vobd) under /var/run/log. If the logs are wiped when ESXi reboots, you can configure a syslog server for the host to push its logs to. VMware has a KB article covering this.
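To illustrate what a remote syslog receiver does with forwarded messages, here is a minimal Python sketch. It assumes the host sends classic UDP syslog datagrams that start with a `<PRI>` value, where PRI encodes facility and severity as in the syslog RFCs. The names `decode_pri` and `listen`, and port 5514, are made up for this example; this is not a replacement for a real syslog server.

```python
import re
import socket

# Facility and severity names in the numeric order used by the syslog PRI field.
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7",
]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_pri(raw):
    """Split '<PRI>rest' into (facility, severity, rest); None if PRI is missing or invalid."""
    match = PRI_RE.match(raw)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value is local7.debug = 23 * 8 + 7
        return None
    # PRI = facility * 8 + severity
    return FACILITIES[pri >> 3], SEVERITIES[pri & 7], match.group(2)


def listen(bind_addr="0.0.0.0", port=5514):
    """Print every datagram with its source IP (hypothetical port; runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (src_ip, _src_port) = sock.recvfrom(8192)
            decoded = decode_pri(data.decode("utf-8", errors="replace"))
            if decoded is not None:
                facility, severity, message = decoded
                print(f"{facility}.{severity}\t{src_ip}\t{message}")
```

For instance, `decode_pri("<166>Hostd: info")` yields facility `local4` and severity `info`. On the ESXi side the target is normally set through the `Syslog.global.logHost` advanced setting; a production receiver (rsyslog, syslog-ng, or similar) is the better choice for actual troubleshooting.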

Adrian0
Contributor

Thanks for the welcome and all the suggestions.

The server is a self-built system with two Xeon Silver 4114 CPUs and an ASUS Z11PA-D8 mainboard. It has an integrated ASMB9-iKVM module, which I've configured to send syslog data to another hardware server. The same syslog server also receives data from the ESXi host. I thought that would make it easy to find out what was happening on the host just before the crash/reboot, but it doesn't. At least I couldn't figure it out. Maybe you guys can read the log better?

2019-04-01 13:36:54    Local4.Info    192.168.0.44    2019-04-01T11:36:09.614Z vm-server-iv.dc01.local Hostd: info hostd[2099821] [Originator@6876 sub=Solo.HTTP server /host user=root] Sent OK response for GET /host/vmkwarning.log

2019-04-01 13:36:54    Local4.Debug    192.168.0.44    2019-04-01T11:36:10.021Z vm-server-iv.dc01.local Rhttpproxy: verbose rhttpproxy[2100709] [Originator@6876 sub=Proxy Req 00104] Resolved endpoint : [N7Vmacore4Http16LocalServiceSpecE:0x000000a58710a1c0] _serverNamespace = /vpxa action = Allow _port = 8089

2019-04-01 13:36:55    Local4.Debug    192.168.0.44    2019-04-01T11:36:10.461Z vm-server-iv.dc01.local Rhttpproxy: verbose rhttpproxy[2099149] [Originator@6876 sub=Proxy Req 00098] Resolved endpoint : [N7Vmacore4Http16LocalServiceSpecE:0x000000a587103830] _serverNamespace = /sdk action = Allow _port = 8307

2019-04-01 13:36:55    Local4.Debug    192.168.0.44    2019-04-01T11:36:10.981Z vm-server-iv.dc01.local Rhttpproxy: verbose rhttpproxy[2100708] [Originator@6876 sub=Proxy Req 00091] Resolved endpoint : [N7Vmacore4Http16LocalServiceSpecE:0x000000a5871035b0] _serverNamespace = /host action = Allow _port = 8309

2019-04-01 13:36:55    Local4.Info    192.168.0.44    2019-04-01T11:36:10.992Z vm-server-iv.dc01.local Hostd: info hostd[2099823] [Originator@6876 sub=Solo.HTTP server /host user=root] Sent OK response for GET /host/vmkeventd.log

2019-04-01 13:36:57    Local4.Error    192.168.0.44    2019-04-01T11:36:13.267Z vm-server-iv.dc01.local Hostd: error hostd[2099834] [Originator@6876 sub=Hostsvc.NsxSpecTracker] Object not found/hostspec disabled

2019-04-01 13:36:59    User.Notice    192.168.0.44    Injector: Sleeping!

vm-server-iv.dc01.local sdrsInjector:

2019-04-01 13:37:01    Cron.Info    192.168.0.66    1 2019-04-01T13:37:01.728046+02:00 192 CROND 30805 - -  (root) CMD (. /etc/profile.d/VMware-visl-integration.sh; /usr/lib/applmgmt/backup_restore/scripts/SchedulerCron.py >>/var/log/vmware/applmgmt/backupSchedulerCron.log 2>&1)

2019-04-01 13:37:01    Cron.Info    192.168.0.66    1 2019-04-01T13:37:01.735139+02:00 192 CROND 30806 - -  (root) CMD ( test -x /usr/sbin/vpxd_periodic && /usr/sbin/vpxd_periodic >/dev/null 2>&1)

2019-04-01 13:37:02    Local4.Debug    192.168.0.44    2019-04-01T11:36:17.443Z vm-server-iv.dc01.local Rhttpproxy: verbose rhttpproxy[2100708] [Originator@6876 sub=Proxy Req 00084] Resolved endpoint : [N7Vmacore4Http16LocalServiceSpecE:0x000000a58710a1c0] _serverNamespace = /vpxa action = Allow _port = 8089

2019-04-01 13:37:02    Local4.Info    192.168.0.44    2019-04-01T11:36:17.444Z vm-server-iv.dc01.local Vpxa: info vpxa[2100170] [Originator@6876 sub=vpxLro opID=PollQuickStatsLoop-74856499-f1] [VpxLRO] -- BEGIN lro-210 -- vpxa -- vpxapi.VpxaService.fetchQuickStats -- 52f46758-89a9-823b-c2af-9541947c6b40

2019-04-01 13:37:02    Local4.Info    192.168.0.44    2019-04-01T11:36:17.445Z vm-server-iv.dc01.local Vpxa: info vpxa[2100170] [Originator@6876 sub=vpxLro opID=PollQuickStatsLoop-74856499-f1] [VpxLRO] -- FINISH lro-210

2019-04-01 13:37:02    User.Info    192.168.0.66    1 2019-04-01T13:37:03.051189+02:00 192 updatemgr - - -  2019-04-01T13:37:03:051Z 'Activation' 140668836292352 INFO  [activationValidator, 368] Leave Validate. Succeeded for integrity.VcIntegrity.retrieveHostIPAddresses on target: Integrity.VcIntegrity

2019-04-01 13:37:02    User.Info    192.168.0.66    1 2019-04-01T13:37:03.058737+02:00 192 updatemgr - - -  2019-04-01T13:37:03:051Z 'VcIntegrity' 140668836292352 INFO  [vcIntegrity, 1519] Getting IP Address from host name: 192

2019-04-01 13:37:02    User.Info    192.168.0.66    1 2019-04-01T13:37:03.064899+02:00 192 updatemgr - - -  2019-04-01T13:37:03:064Z 'VcIntegrity' 140668836292352 INFO  [vcIntegrity, 1536] Cannot get IP address for host name: 192

2019-04-01 13:37:03    User.Debug    192.168.0.66    1 2019-04-01T13:37:03.499180+02:00 192 updatemgr - - -  2019-04-01T13:37:03:499Z 'JobDispatcher' 140668863104768 DEBUG  [JobDispatcher, 415] The number of tasks: 0

**********************************

*** I think reset happened here ***

**********************************

2019-04-01 13:37:07    Local0.Warning    192.168.0.45    Apr  1 12:37:07 vm-server-iv-kvm IPMIMain: [640 : 704 WARNING][IPMBIfc.c:727]IPMBIfc.c : Error sending IPMB packet to Slave 0x16

2019-04-01 13:37:12    Local0.Critical    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm IPMIMain: [640 : 734 CRITICAL][PnmTask.c:520]NMAPI.c : Error fetching messages from NM_IPMB_MSG_Q

2019-04-01 13:37:12    Local0.Critical    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm IPMIMain: [640 : 735 CRITICAL][NMAPI.c:152]PnmTask.c : Error fetching messages from NM_RESPONSE_MSG_Q

2019-04-01 13:37:12    Kernel.Warning    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.790000] NCSI(eth1): Link is Down

2019-04-01 13:37:12    Kernel.Warning    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.790000] NCSI(eth1): Unknown Speed and Duplex

2019-04-01 13:37:12    Kernel.Warning    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.800000] NCSI(eth1): Link is Down

2019-04-01 13:37:12    Kernel.Warning    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.800000] NCSI(eth1): Unknown Speed and Duplex

2019-04-01 13:37:12    Kernel.Debug    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.810000] NCSI(eth1): Channel 0.0 Disabled

2019-04-01 13:37:12    Kernel.Debug    192.168.0.45    Apr  1 12:37:12 vm-server-iv-kvm kernel: [1134290.820000] NCSI(eth1): Channel 1.0 Disabled

2019-04-01 13:37:13    Kernel.Info    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm kernel: [1134291.120000] LPC RESET

2019-04-01 13:37:13    Kernel.Warning    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm kernel: [1134291.120000] Reset ioctl unlocked

2019-04-01 13:37:13    Local0.Critical    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 686 CRITICAL][BTIfc.c:68] LPC Reset Occurred

2019-04-01 13:37:13    Local0.Critical    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 726 CRITICAL][SensorEvent/SensorDevice/Sensor.c:1956]Error in getting TLS data 640

*** 20 repetitions of previous line

2019-04-01 13:37:13    Local0.Critical    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 726 CRITICAL][SensorEvent/SensorDevice/Sensor.c:1480]Error in getting TLS data 640

2019-04-01 13:37:13    Local0.Warning    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 701 WARNING][IPMBIfc.c:727]IPMBIfc.c : Error sending IPMB packet to Slave 0x16

2019-04-01 13:37:13    Local0.Warning    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 701 WARNING][IPMBIfc.c:727]IPMBIfc.c : Error sending IPMB packet to Slave 0x16

2019-04-01 13:37:13    Local0.Critical    192.168.0.45    Apr  1 12:37:13 vm-server-iv-kvm IPMIMain: [640 : 726 CRITICAL][SensorEvent/SensorDevice/Sensor.c:1956]Error in getting TLS data 640

*** 110 repetitions of previous line

2019-04-01 13:37:14    Local0.Critical    192.168.0.45    Apr  1 12:37:14 vm-server-iv-kvm IPMIMain: [640 : 726 CRITICAL][SensorEvent/SensorDevice/Sensor.c:1956]Error in getting TLS data 640

2019-04-01 13:37:14    Local0.Warning    192.168.0.45    Apr  1 12:37:14 vm-server-iv-kvm IPMIMain: [640 : 701 WARNING][IPMBIfc.c:727]IPMBIfc.c : Error sending IPMB packet to Slave 0x16

2019-04-01 13:37:14    Kernel.Debug    192.168.0.45    Apr  1 11:36:29 vm-server-iv-kvm kernel: [1134292.840000] NCSI(eth1): Channel 0.0 Enabled

2019-04-01 13:37:14    Kernel.Debug    192.168.0.45    Apr  1 11:36:29 vm-server-iv-kvm kernel: [1134292.850000] NCSI(eth1): Channel 1.0 Enabled

2019-04-01 13:37:15    Cron.Info    192.168.0.45    Apr  1 11:36:29 vm-server-iv-kvm /usr/sbin/cron[5655]: (CRON) INFO (pidfile fd = 4)

2019-04-01 13:37:15    Cron.Info    192.168.0.45    Apr  1 11:36:29 vm-server-iv-kvm /usr/sbin/cron[5657]: (CRON) STARTUP (fork ok)

2019-04-01 13:37:15    Cron.Info    192.168.0.45    Apr  1 11:36:30 vm-server-iv-kvm /usr/sbin/cron[5657]: (CRON) INFO (Running @reboot jobs)

The message sources are:

192.168.0.44: ESXi Host

192.168.0.45: iKVM

192.168.0.66: VCSA
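One way to read an excerpt like this is to look for the last message the host sent before the BMC reported the reset. If the host goes quiet with no shutdown or panic messages, that points towards an abrupt hardware reset. The Python sketch below does this using the receiver's timestamp in the first column, since the embedded clocks in this excerpt disagree (the host logs in UTC, and the iKVM stamps jump from 12:37 to 11:36 after the reset). The column layout, the function names and the shortened sample lines are assumptions based on the lines above.

```python
import re
from datetime import datetime

# Receiver timestamp, facility.severity, source IP, message (layout assumed from the excerpt).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s+(.*)$")


def parse_line(line):
    """Return (received_at, level, source_ip, message), or None for non-log lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    received_at = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return received_at, match.group(2), match.group(3), match.group(4)


def silence_before_reset(lines, host_ip, bmc_ip, marker="LPC Reset Occurred"):
    """Return (last_host_entry, reset_entry, gap_seconds), or None if the marker never appears."""
    last_host = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        received_at, _level, source_ip, message = entry
        if source_ip == host_ip:
            last_host = entry
        elif source_ip == bmc_ip and marker in message:
            gap = (received_at - last_host[0]).total_seconds() if last_host else None
            return last_host, entry, gap
    return None


# Shortened copies of lines from the excerpt above.
SAMPLE = [
    "2019-04-01 13:37:02    Local4.Info    192.168.0.44    vm-server-iv.dc01.local Vpxa: [VpxLRO] -- FINISH lro-210",
    "2019-04-01 13:37:03    User.Debug    192.168.0.66    updatemgr 'JobDispatcher' The number of tasks: 0",
    "*** I think reset happened here ***",
    "2019-04-01 13:37:12    Kernel.Warning    192.168.0.45    vm-server-iv-kvm kernel: NCSI(eth1): Link is Down",
    "2019-04-01 13:37:13    Local0.Critical    192.168.0.45    vm-server-iv-kvm IPMIMain: [BTIfc.c:68] LPC Reset Occurred",
]

result = silence_before_reset(SAMPLE, host_ip="192.168.0.44", bmc_ip="192.168.0.45")
if result is not None:
    last_host, reset, gap = result
    print(f"Last host message: {last_host[3]}")
    print(f"Reset reported {gap:.0f} s later: {reset[3]}")
```

On the sample lines the gap is 11 seconds, and the last host messages are routine Vpxa statistics polling, with nothing from the host that hints at a shutdown.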

A hardware issue is possible. Memtest86 is currently running on the machine (though it's ECC memory).

Adrian0
Contributor

Hi minivlab,

Your guess was right: it was the power supply. After I replaced it, the problem disappeared. Thanks a lot!

serveradminist2
Contributor

good

serveradminist2
Contributor

Whenever you face this type of issue, fetch the IML and DSET logs. These logs help diagnose whether the issue is hardware or software.

Always remember that.
