VMware Cloud Community
ccboe
Contributor

ESX server "freezes"

I have an ESX 4.0 server that "freezes" randomly.

When I look at performance, I can see blank gaps in all the "switch to" areas.

I don't see any errors or warnings.

Server:

HP DL380 G6

2 x 5520 processors

48 GB RAM

I have another one just like it, but with 36 GB RAM, and it has no problems.

Both are running the same version of ESX.

It kind of looks like hardware, but I get no errors or warnings.

Thanks

Tom

6 Replies
JCMorrissey
Expert

Hi Tom,

Do you have any monitoring software (e.g. OpenManage, OpenView) on the system? If so, can you check its logs for signs of a hardware issue? Also check the iLO homepage for hardware faults, and check the iLO logs for sudden resets.

If the freezes are very frequent, a good test (maybe not possible in your environment) is to leave the machine sitting in the BIOS overnight. If it has frozen or rebooted by the morning, you are definitely looking at hardware, and by the looks of things that is the likely cause anyway.

Please consider marking as "helpful", if you find this post useful. Thanks!... http://johncmorrissey.wordpress.com/
Sreejesh_D
Virtuoso

Check the log file /var/log/messages for events during the host freeze. It can give us details on what's going on at that time.

There are various reasons for a host to enter a zombie state.
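
To narrow the log down to the minutes around a freeze, something like the following could help. This is only a rough sketch, meant to be run on a workstation against a copy of the log. It assumes the lines start with a traditional syslog timestamp such as "Mar  3 14:05:09"; the function name `lines_in_window` and the `year` parameter are made up for illustration (syslog timestamps of this style do not record the year, so it has to be supplied).

```python
from datetime import datetime


def lines_in_window(lines, start, end, year=2010):
    """Return the lines whose leading syslog-style timestamp falls in [start, end]."""
    selected = []
    for line in lines:
        try:
            # Assumed format: "Mon DD HH:MM:SS" in the first 15 characters.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # Skip lines that do not start with a timestamp.
        if start <= stamp <= end:
            selected.append(line)
    return selected
```

With a copied file, the call would look roughly like `lines_in_window(open("messages"), datetime(2010, 3, 3, 14, 5), datetime(2010, 3, 3, 14, 10))`, with the window set to when the freeze was observed.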

ccboe
Contributor

No monitoring software.

iLO logs look good.

I guess I did not explain well.

The "freeze" only lasts 1-5 minutes. Then everything goes back to working.

Tom

ccboe
Contributor

The log looks fine.

I see many auth lines coming from my client and some syslog restarts.

Tom

ccboe
Contributor

I have tailed messages while this "freeze" happens. Nothing is being written.

Should syslog be restarting often?
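
Besides watching a live tail, the silent periods can be looked for after the fact in a copy of the log. The sketch below is only an illustration under the same assumption of syslog-style "Mon DD HH:MM:SS" timestamps; `find_gaps`, `min_gap` and `year` are hypothetical names. A quiet log is normal on an idle host, so a gap is only a hint and not proof of a freeze.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=1), year=2010):
    """Return (previous_time, next_time) pairs where nothing was logged for at least min_gap."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # Assumed format: "Mon DD HH:MM:SS" in the first 15 characters.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # Skip lines that do not start with a timestamp.
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Gaps that line up with the 1-5 minute freezes seen on the performance charts would suggest that the service console itself stalls during the freeze.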

Tom

ccboe
Contributor

I think I got it figured out.

Looking at /var/log/vmkernel I saw many errors. They turned out to be caused by dead LUNs that I had removed from the SAN.

I refreshed the HBAs and am not seeing the problems now.
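
For anyone checking their own vmkernel log for the same thing, a quick tally of suspicious lines might be a starting point. This is a minimal sketch: the thread does not quote the actual error messages, so the keywords are left to the caller, and `tally_matches` is a hypothetical helper name.

```python
from collections import Counter


def tally_matches(lines, keywords):
    """Count how many lines contain each keyword, ignoring case."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts
```

Run against a copy of the log, for example `tally_matches(open("vmkernel"), ["error", "failed"])`, with the keyword list adjusted to whatever the errors in your log actually say.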

MANY THANKS for the help.

Tom
