My Dell PowerEdge 2900 III server restarts every few weeks. Previous restarts were caused by a UPS battery overload. The problem is that my server restarted yesterday, but I don't see any events on my UPS.
How do I narrow down why my server restarted?
Which log files do I review?
How do I know where the restart was initiated from (ESX, vCenter)?
Would a syslog server help me in the future to easily narrow down the problem? Which version do you recommend?
Thanks a lot, guys.
Start with the host. Look for indications of issues with the system in
Most ESX host restarts are caused by an issue at the host level, whether hardware or software. You should concentrate your investigation on the host itself.
I have not worked with Dells for some time, but I presume they have server agent software similar to HP systems that can help troubleshoot hardware failures and can integrate with SNMP to provide extended monitoring (although the HP agents can cause a fair amount of trouble on a system, so it is a bit of a trade-off).
If you find no errors or indications of the cause of the failure, you may want to run a memory test on the system or, if you cannot afford the downtime, just swap out the RAM. Bad DIMMs often cause sudden failures that generate little in the way of logging.
Also update the firmware on the system hardware.
Finally, if you have any third-party agents running on the host (e.g. backup agents), check their logs and also check the vendor's support pages for a list of known bugs.
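If you end up with a pile of host and agent logs to go through, a small script can help pull out the lines worth reading first. This is only a rough sketch: the keyword list is my own guess at what a hardware or power problem might leave behind, not anything Dell or VMware documents, and `find_suspect_lines` is just a helper name I made up. Point it at a copy of whichever log file you are reviewing.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical keyword list (an assumption, not an official one);
# adjust it to whatever your logs actually contain.
SUSPECT_PATTERN = re.compile(
    r"\b(panic|machine check|mce|nmi|ecc|watchdog|reboot|shutdown|power)\b",
    re.IGNORECASE,
)


def find_suspect_lines(
    lines: Iterable[str], pattern: "re.Pattern[str]" = SUSPECT_PATTERN
) -> List[str]:
    """Return the log lines that match any of the suspect keywords."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Usage: python scan_log.py <path-to-a-copied-log-file>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for hit in find_suspect_lines(handle):
            print(hit)
```

It will not tell you the cause on its own, but it cuts down the amount of log you have to read around the time of the restart.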
I would recommend installing the Dell OpenManage application on your ESX host.
We have it installed on 18 hosts so far and it has not given us any issues. We find it extremely useful because we pull the logs into the Dell management station. Also, when you need to proceed with warranty claims, Dell needs this application installed.
Install the DRAC (Dell Remote Access Controller) module as well.
Then you will be able to review all hardware logs using a web browser.
You will need to open the ESX firewall for this port.
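Once the firewall rule is in place, you can quickly confirm from your management station that the port is actually reachable. A minimal sketch, assuming plain TCP; `port_is_reachable` is just a helper name I made up, and the host name and port you pass in are whatever applies to your setup (take the port number from the Dell documentation).

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If this returns False after you have opened the firewall, check that the service is running on the host before suspecting the network.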
Here is documentation from Dell that you can follow; you need to review page 10.