VMware Cloud Community
ekimneems
Contributor

Newbie here - how to pinpoint where a crash is happening

Hey all,

I recently inherited support of an ESXi 4 server with two virtual machines, and I'm relatively new to this. The host has more than enough resources for both VMs; however, for the third time now, one of the VMs went down and became unreachable. After this happens, the other VM slows to a crawl. The only fix I've found is to reboot the ESXi server.

How can I pinpoint exactly what's causing this? Which system logs should I be looking at? The ones I downloaded through the vSphere Client didn't seem to contain anything from before the reboot (it's as if a reboot removes all previous logs). Is this true, and if so, is there a way to make sure the logs are kept?

Thanks so much.

4 Replies
Techstarts
Expert

  1. Check the VM's logs first. If this is Windows, check whether a memory dump was created. If there is no memory dump, use the perfmon tool in WS2008.
  2. What is the relationship between VM1 and VM2?
  3. Check what the ESX performance charts show when these VMs slow down.
With Great Regards,
ekimneems
Contributor

1. Both are Windows servers. The one in question is running WS2003, so I guess I can't use perfmon. I don't see anything in the Event Viewer from the time of the crash to indicate what it might be, and there is nothing in c:\Windows\Minidump either. My guess was that some program was maxing out CPU usage for a long period of time, which then slowed the other server to a crawl and forced the restart. But I'm not sure how to verify this.

2. The one crashing is just a file server for proprietary salon software (which is poorly written and certainly could be the cause). The other one is the primary domain controller and Exchange server. I'm not sure about VM-specific relationships if any, as I am new to ESXi.

3. My issue is that this is a production environment, so when this happens we have to reboot immediately to get the salon up and running again (there isn't necessarily time to review logs). I manage this remotely, too, so it can be difficult. I was hoping there was some way I could get the logs from when it happens, but it appears they are all deleted after I reboot. Or am I just looking in the wrong place?

logiboy123
Expert

ESX stores its logs in memory, so when you reboot you lose them. Consider creating a vMA syslog server to house these logs for you. I have created a walkthrough for this purpose.

http://vrif.blogspot.com/2011/07/configure-vmware-vma-as-syslog-server.html

Once your system is logging, you can use WinSCP to browse the log files in the syslog server's directories.
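
To illustrate the general idea of keeping logs off the host, here is a minimal Python sketch of a UDP syslog receiver. It is not the vMA setup from the walkthrough above, only a stand-in showing what a remote syslog target does. It assumes the ESXi host has already been configured to forward syslog over UDP to the machine running it; the file name `esxi-remote.log` and the helper names `format_record`, `SyslogHandler`, and `serve` are made up for this example.

```python
import socketserver
from datetime import datetime, timezone

# Hypothetical output file; pick any path that survives an ESXi reboot.
LOG_PATH = "esxi-remote.log"


def format_record(sender_ip, payload, now=None):
    """Build one log line: receive time, sender address, raw syslog text."""
    now = now or datetime.now(timezone.utc)
    # Syslog payloads are usually text; replace any undecodable bytes.
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{now.isoformat()} {sender_ip} {text}\n"


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to LOG_PATH."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(format_record(self.client_address[0], data))


def serve(host="0.0.0.0", port=514):
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. Port 514 is the conventional syslog UDP port and typically needs elevated privileges to bind, so a higher port can be passed instead if the sender is configured to match. Each message is stamped with the time it was received, which makes it easier to see what the host reported just before it went down.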

Regards,

Paul

ekimneems
Contributor

Thanks for that. It seems like it will take me some time to get that configured.

In the meantime, I did get this error message:

msg.hbacommon.corruptredo:The RedoLog of Server-000001.vmdk has been detected to be corrupt. The virtual machine needs to be powered off. If the problem still persists, you need to discard the RedoLog.

Could this have something to do with it?
