VMware Cloud Community
ZkZk
Contributor

ESX memory failure

Hi everybody,

I suspect a memory failure (in one of four memory modules) on an ESX server that is far away from me.

Before opening a hardware ticket, I need to find the exact origin of the problem.

Is there a command to check the hardware on ESX 3.0.1?

Thanks in advance for answering.

Bye,

Maurizio

6 Replies
harryc
Enthusiast

Have you checked /var/log/messages for error messages?

Goodspd
Enthusiast

Hi,

What's the server brand? Vendors such as HP and IBM normally provide hardware diagnostic tools that you can run.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

ZkZk
Contributor

Thanks to both of you.

Yes, I read the messages log, but it showed nothing interesting; I will look through the other logs now. The problem is that I need the hardware configuration to find out which bank is failing.

As for the tools (it is an IBM Xserver 3950), the problem is that I can't switch off the host. I have to diagnose the problem from the command line, and the question is: is that possible?

Bye,

Maurizio
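
For the "which bank" part of the question, one read-only idea is to look at the SMBIOS memory inventory. The sketch below assumes that `dmidecode` is available on the service console and that its output has been saved to a text file; neither is confirmed anywhere in this thread. It only lists the memory slots the firmware reports (slot label, size). It does not test memory and cannot say which module is bad. The function name `parse_memory_devices` is made up for this example.

```python
def parse_memory_devices(text):
    """Collect the "Memory Device" records from saved dmidecode output.

    Returns a list of dicts, one per memory slot, holding the
    "Key: value" fields found under each "Memory Device" heading.
    """
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            # A new SMBIOS record begins, so close the previous one.
            current = None
        elif line == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


def summarize(devices):
    """Return one short line per slot, e.g. for pasting into a ticket."""
    return [
        "%s: %s" % (d.get("Locator", "unknown slot"), d.get("Size", "unknown size"))
        for d in devices
    ]
```

To use it, read the saved file into a string, pass it to `parse_memory_devices`, and print the lines from `summarize`. The slot labels can then be matched against whatever bank or DIMM identifier shows up in the logs, if any does.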

harryc
Enthusiast

Probing memory is usually done at the BIOS level (OpenBoot PROM on SPARC). I would be very hesitant to poke at the memory of a running system that I wanted to keep running.

If you do find a tool, please let us know. It would be very convenient to be able to test this without bringing down ESX (after moving all the important VMs off).

In the SPARC world I have never seen a memory error that was not reflected in the "messages" log.

mittim12
Immortal

The last time we had memory issues, there were entries in the messages log and the vmkernel log. If you don't have any management software that you can run online, the only course of action is to reboot and run something like memtest.
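
A rough sketch of how the log check could be scripted is below. The keyword list is only a guess at what memory-related entries tend to contain, not an official list, and the paths `/var/log/messages` and `/var/log/vmkernel` are assumed locations for the two logs. It is written for Python 3, so it is meant to be run against copies of the logs on a workstation rather than on the service console itself.

```python
import re

# Assumed keywords; adjust them to whatever your hardware actually logs.
PATTERN = re.compile(
    r"\b(ecc|mce|machine check|memory error|parity)\b", re.IGNORECASE
)


def find_memory_errors(lines):
    """Return the lines that mention one of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_logs(paths):
    """Scan each log file; map path -> matching lines (None if unreadable)."""
    hits = {}
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                hits[path] = find_memory_errors(handle)
        except OSError:
            hits[path] = None
    return hits
```

For example, `scan_logs(["messages", "vmkernel"])` on local copies returns a dictionary of matching lines per file. An empty list means none of the assumed keywords appeared, which is not proof that the memory is healthy.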

If you found this or any other post helpful please consider the use of the Helpful/Correct buttons to award points

Goodspd
Enthusiast

If there are no error entries in the logs, you will have to either evacuate all VMs to another host, if you have that possibility, or plan downtime, and then check the hardware with your manufacturer's tools or third-party tools.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!