VMware Cloud Community
ewedemeyer
Contributor

Help! ESX has unexpectedly rebooted multiple times

Where in the logs can I try and find a solution?

Logs are included

The hardware is a SUN M2 4600, and we have 6 others running fine. 128 GB RAM, 8 CPUs.

17 Replies

kooltechies
Expert

Hi,

You should check in /var/log/vmkernel , /var/log/messages , /var/log/vmksummary.

Thanks,

Samir

Blog : http://thinkingloudoncloud.com || Twitter : @kooltechies || P.S : If you think that the answer is correct/helpful please consider rewarding points.

TCronin
Expert

What hardware do you have it installed on?

Tom Cronin, VCP, VMware vExpert 2009 - 2021, Co-Leader Buffalo, NY VMUG

ewedemeyer
Contributor

vmksummary attached

kooltechies
Expert

Hi,

I went through the logs and can see the entries showing the multiple reboots, but the other log files do not give conclusive evidence of what the problem is. Can you recall the sequence of events before each reboot? That would help with troubleshooting.

Mar 13 00:00:16 vhaiswvsh10 logger: (1236852016) loaded VMkernel

Mar 13 00:25:42 vhaiswvsh10 vmkhalt: (1236853543) Starting system...

Mar 13 00:26:07 vhaiswvsh10 vmkhalt: (1236853567) Rebooting system...

Mar 13 00:29:13 vhaiswvsh10 vmkhalt: (1236853753) Starting system...

Mar 13 00:29:18 vhaiswvsh10 logger: (1236853758) loaded VMkernel

Mar 13 00:45:08 vhaiswvsh10 vmkhalt: (1236854708) Starting system...

Mar 13 00:45:13 vhaiswvsh10 logger: (1236854713) loaded VMkernel
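For anyone who wants to measure how long the host stayed up between boots, here is a rough Python sketch that works on vmksummary lines shaped like the ones quoted above. The regular expression and the function names are my own assumptions, based only on this excerpt; this is not a VMware tool, and other ESX builds may log a different format.

```python
import re

# Assumed line shape, taken from the excerpt in this post:
#   "Mar 13 00:00:16 vhaiswvsh10 logger: (1236852016) loaded VMkernel"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+(\S+):\s+\((\d+)\)\s+(.*)$"
)


def parse_vmksummary(lines):
    """Return (epoch_seconds, source, message) for each line that matches."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            _host, source, epoch, message = match.groups()
            events.append((int(epoch), source, message))
    return events


def uptimes_between_boots(events):
    """Seconds between consecutive 'loaded VMkernel' entries."""
    boots = [t for t, _source, msg in events if msg == "loaded VMkernel"]
    return [later - earlier for earlier, later in zip(boots, boots[1:])]


# The seven lines quoted in this post.
sample = [
    "Mar 13 00:00:16 vhaiswvsh10 logger: (1236852016) loaded VMkernel",
    "Mar 13 00:25:42 vhaiswvsh10 vmkhalt: (1236853543) Starting system...",
    "Mar 13 00:26:07 vhaiswvsh10 vmkhalt: (1236853567) Rebooting system...",
    "Mar 13 00:29:13 vhaiswvsh10 vmkhalt: (1236853753) Starting system...",
    "Mar 13 00:29:18 vhaiswvsh10 logger: (1236853758) loaded VMkernel",
    "Mar 13 00:45:08 vhaiswvsh10 vmkhalt: (1236854708) Starting system...",
    "Mar 13 00:45:13 vhaiswvsh10 logger: (1236854713) loaded VMkernel",
]

print(uptimes_between_boots(parse_vmksummary(sample)))  # [1742, 955]
```

On the quoted lines this gives gaps of 1742 and 955 seconds between kernel loads, so the box stayed up for well under half an hour each time.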

Thanks,

Samir

ewedemeyer
Contributor

There is basically nothing happening on the box at this point. All VMs are down and it still reboots. The only activity is that I am trying to scp one of the vital systems off the host, to no avail.

kooltechies
Expert

Are you getting any errors at boot-up? I have observed a few errors related to iSCSI.

Thanks,

Samir

ewedemeyer
Contributor

There are no boot-up errors. However, the "Restoring S/W iscsi volumes" step takes a long time to complete, and at this point in time we are not using any iSCSI connections.

ewedemeyer
Contributor

No boot-up errors. The "Restoring S/W iscsi volumes" step takes a very long time. At this point we have no iSCSI connections.

kooltechies
Expert

If you are not using swiscsi, can you disable it the next time you have a chance to log in to ESX? Also consider opening a support ticket with VMware to troubleshoot this.

Thanks,

Samir

kchawk
Contributor

Has the server been physically moved? We moved a new server to our DR site after building it, and it rebooted daily. We reseated the memory, CPUs, and PCI cards, and have had two months of bliss since.

ewedemeyer
Contributor

The server has not been moved at all.

Erik_Zandboer
Expert

Hi,

I have seen reboots caused by a memory leak in hostd. You could check out my blog on the subject and see if that is your problem:

Visit my blog at http://www.vmdamentals.com

ewedemeyer
Contributor

Could you point out to me where and in which log you saw the hostd memory issues?

mike_laspina
Champion

Hi,

It looks like the host lost connectivity to its iSCSI datastores, and then a rescan was performed.

Mar 11 09:33:34 vhaiswvsh10 watchdog-cimserver: '/var/pegasus/bin/cimserver daemon=false' exited after 117 seconds

Mar 11 09:33:34 vhaiswvsh10 watchdog-cimserver: Executing '/var/pegasus/bin/cimserver daemon=false'

Mar 11 09:33:40 vhaiswvsh10 cimserver: trying to popen /sbin/modprobe edd 2>&1

Mar 11 09:33:40 vhaiswvsh10 cimserver: trying to popen /sbin/modprobe edd 2>&1

Mar 11 09:33:40 vhaiswvsh10 vmware-hostd[1899]: Accepted password for user root from 127.0.0.1

Mar 11 09:33:41 vhaiswvsh10 cimserver: created VICimInstanceBuilder

Mar 11 09:33:41 vhaiswvsh10 cimserver: created VICimMethodMgr

Mar 11 09:35:02 vhaiswvsh10 vmkiscsid[22622]: cannot make connection to 10.208.55.70:3260: Connection refused

Mar 11 09:35:02 vhaiswvsh10 vmkiscsid[22622]: Connection to Discovery Address 10.208.55.70 failed

Mar 11 09:35:03 vhaiswvsh10 vmkiscsid[22622]: cannot make connection to 10.208.55.70:3260: Connection refused

Mar 11 09:35:03 vhaiswvsh10 vmkiscsid[22622]: Connection to Discovery Address 10.208.55.70 failed

Does it still have any connectivity to that datastore?

Did the iSCSI client port get closed locally or was it a remote event?
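One quick way to check the first question is to see whether anything accepts TCP connections on the discovery address and port from the log above (10.208.55.70:3260). A minimal Python sketch follows; `can_connect` is a made-up helper name, and it assumes a Python interpreter recent enough to have `socket.create_connection`, run from a machine on the same network. It only tests TCP reachability and says nothing about whether the iSCSI target itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers "Connection refused", unreachable hosts, and timeouts.
        return False
    conn.close()
    return True


# Example call, using the address and port from the vmkiscsid lines above:
#   can_connect("10.208.55.70", 3260)
```

A result of False matches the "Connection refused" lines in the log; True would suggest the failure was on the host side or has since cleared.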

http://blog.laspina.ca/ vExpert 2009

ewedemeyer
Contributor

Actually there were no iscsi datastores connected. The firewall was on and swiscsi was enabled, but there was no connection.

mike_laspina
Champion

OK, then as already advised, disable it for now.

I would also disable the Pegasus cimserver.

The reset occurs after this event:

Mar 13 00:42:13 vhaiswvsh10 cimserver: trying to popen /sbin/modprobe edd 2>&1

Mar 13 00:45:16 vhaiswvsh10 syslogd 1.4.1: restart.

Mar 13 00:45:16 vhaiswvsh10 syslog: syslogd startup succeeded

Every time!

Use

service pegasus stop
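To confirm the pattern over the whole of /var/log/messages rather than by eye, a small Python sketch like this could list the last entry logged before each syslogd restart. The function name is hypothetical, and the marker string is copied from the excerpt above, so it would need adjusting for a different syslogd version.

```python
def last_line_before_restarts(lines, marker="syslogd 1.4.1: restart."):
    """For each line containing `marker`, return the non-empty line before it.

    A restart found with no earlier line yields None in that position.
    """
    previous = None
    found = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # the log excerpt in this thread is double-spaced
        if marker in line:
            found.append(previous)
        previous = line
    return found


# The three lines quoted in this post.
sample = [
    "Mar 13 00:42:13 vhaiswvsh10 cimserver: trying to popen /sbin/modprobe edd 2>&1",
    "Mar 13 00:45:16 vhaiswvsh10 syslogd 1.4.1: restart.",
    "Mar 13 00:45:16 vhaiswvsh10 syslog: syslogd startup succeeded",
]

for entry in last_line_before_restarts(sample):
    print(entry)
```

If the cimserver modprobe line shows up before every restart in the full log, that supports stopping Pegasus as a test; if other entries appear there too, the correlation is weaker than it looks.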

bluedrake
Contributor

We had a similar problem with one of our ESX servers, with no errors in the logs besides what you had. It was a production ESX server, so we did not have time to troubleshoot or to risk it rebooting randomly. We backed up the logs and reinstalled the server.

We went through the logs again but found nothing. After the reinstall, the ESX server never rebooted again.
