Hi Guys
I am having a very strange issue. I have 2 RHEL 5.5 VMs in my environment. The VMs freeze for about 5 seconds for no apparent reason and then resume. One VM is running Oracle and the other is running a business application. The two VMs are running on separate hosts in the cluster. The hosts are ESXi 4.1, and resources are not the issue here because nothing else is using the cluster. Each host has 12 cores and 96 GB RAM. The storage is on SAN.
The application VM has 8 GB RAM and 1 vCPU, and the database VM has 16 GB RAM and 1 vCPU.
I don't see anything in /var/log/messages regarding this freeze.
Can someone please provide some guidance for troubleshooting?
Regards
Shail
Where are you noticing that it's freezing, and how often is it doing it?
We have seen it in several ways:
1. Pings drop
2. Console hangs
3. Application client disconnects
Could it be that a snapshot is being created during that time?
From that, it's all sounding like a network issue.
When you run ping -t, are you seeing "Request timed out", or do the replies just take longer?
That's a thought, or some element of vMotion going on, but I'd doubt that as it's been mentioned they are on different hosts.
It's timing out. We suspected the network, but it does not look like the network, because I can see the VM itself freezing.
No. Both VMs have snapshots attached to them, but there is no new snapshot activity.
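One way to separate a guest freeze from a network drop is to watch the clock from inside the VM: a loop that sleeps for a short interval should wake up on schedule, and a wake-up that arrives seconds late suggests the guest itself stalled. Below is a minimal sketch, assuming a Python 3 interpreter is available on the guest (the stock Python on RHEL 5 is older and would need changes). The function names and the thresholds are made up for illustration. It only catches stalls that the guest clock notices, so if the hypervisor hides the pause from the guest, timestamps taken from an outside machine would be needed instead.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start, gap) pairs where two consecutive samples are
    further apart than interval + threshold seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + threshold:
            stalls.append((prev, gap))
    return stalls


def watch(interval=0.5, threshold=2.0, duration=60.0):
    """Sample the monotonic clock every `interval` seconds for roughly
    `duration` seconds, then report any gaps that look like stalls."""
    samples = [time.monotonic()]
    end = samples[0] + duration
    while samples[-1] < end:
        time.sleep(interval)
        samples.append(time.monotonic())
    return find_stalls(samples, interval, threshold)
```

Calling `watch()` while the pings are running and comparing the reported gaps with the ping timeouts would show whether the two line up.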
Is there nothing in the logs to indicate an issue?
Try looking at the vSphere Client event log for the VM to see which VM-level operations ran around the time of the freezes. If you have a support contract with VMware, you could collect the logs and open a case with them; it might be faster to nail down the root cause.
Unfortunately there is nothing in the logs. I checked /var/log/messages and dmesg. I ran tcpdump but saw no network activity.
The kernel log module is not loaded. Red Hat decided to take that off without notice. I will load the module tomorrow so that I can see the kernel messages easily, but for the time being dmesg is my friend.
Yeah, that sounds reasonable.
Thanks for the advice, guys.
I solved it myself
That's cool. I'd be interested to know what resolved it, in case anyone else comes a cropper with it.
Hi JS
I think the problem is within my HP Virtual Connect (VC) Flex modules. I installed mezzanine cards in one of the ESXi hosts and used the passthrough modules on the enclosure to connect to the SAN. I moved the VMs to this new host and the freezing issue is gone.
I monitored the old hosts in the cluster and noticed a huge number of vmkernel NMP errors. It looks like some path flapping is happening. I am using an HDS AMS2500 storage array, which is an active/active array. I have seen those vmkernel errors before with EMC CLARiiON. I have reported this to HP and VMware. Because the enclosure uses FCoE internally, it might be an issue with the VC Flex firmware. VMware thinks it might be a bug as well.
My VMs are running fine on the other host and I am doing some more troubleshooting on this. Will keep you posted if I find something.
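For anyone wanting to compare hosts the same way, here is a rough sketch of counting the NMP-related lines in a copy of a host's vmkernel log. The function names are hypothetical, the log path is left for you to supply, and matching on the substring "nmp" is an assumption about how those messages are tagged, so adjust the search string to whatever your log actually shows.

```python
def count_nmp_lines(lines, needle="nmp"):
    """Return (count, matching_lines) for lines containing `needle`,
    ignoring case. `needle` is an assumed marker, not a fixed format."""
    needle = needle.lower()
    hits = [line.rstrip("\n") for line in lines if needle in line.lower()]
    return len(hits), hits


def scan_file(path, needle="nmp"):
    """Run count_nmp_lines over a log file copied off the host."""
    with open(path, errors="replace") as handle:
        return count_nmp_lines(handle, needle)
```

Running `scan_file` on a log from an affected host and on one from the passthrough host gives a quick side-by-side count, which is easier to hand to support than a general impression of "lots of errors".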
