VMware Cloud Community
HUNGAMA
Enthusiast

Linux VM freezes

Hi Guys

I am having a very strange issue. I have two RHEL 5.5 VMs in my environment. Both freeze for about 5 seconds for no apparent reason and then resume. One VM runs Oracle and the other runs a business application. The VMs are on separate hosts in the same cluster. The hosts run ESXi 4.1, and resources should not be the issue because nothing else is using the cluster. Each host has 12 cores and 96 GB RAM. The storage is on a SAN.

The application VM has 8 GB RAM and 1 vCPU, and the database VM has 16 GB RAM and 1 vCPU.

I don't see anything in /var/log/messages regarding this freeze.

Can someone please provide some guidance for troubleshooting?

Regards

Shail

14 Replies
Jsfair
Contributor

Where are you noticing that it's freezing, and how often does it happen?

HUNGAMA
Enthusiast

We have confirmed it several ways:

1. Pings drop

2. console hangs

3. Application client disconnects

idle-jam
Immortal

Could it be that a snapshot is being created at that time?

Jsfair
Contributor

From that, it all sounds like a network issue.

When you run ping -t, are you seeing "Request timed out", or do the replies just take longer?
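One way to make the "timed out versus just slower" distinction concrete is to summarize a run of probe results. The Python 3 sketch below is only an illustration: the function name `summarize_probes`, the 100 ms "slow" threshold, and the convention that `None` means "no reply" are all assumptions, and the round-trip times would have to be collected separately (for example by hand from ping output).

```python
def summarize_probes(rtts_ms, slow_ms=100.0):
    """Summarize probe results.

    rtts_ms: round-trip times in milliseconds, one entry per probe,
             with None standing for a probe that got no reply.
    slow_ms: hypothetical threshold above which a reply counts as slow.
    """
    lost = sum(1 for r in rtts_ms if r is None)
    slow = sum(1 for r in rtts_ms if r is not None and r > slow_ms)

    # Longest run of consecutive lost probes.
    longest = run = 0
    for r in rtts_ms:
        run = run + 1 if r is None else 0
        longest = max(longest, run)

    return {"lost": lost, "slow": slow, "longest_outage": longest}


print(summarize_probes([1.0, None, None, None, 250.0, 2.0]))
```

With one probe per second, `longest_outage` approximates the outage length in seconds, so a value near 5 would line up with the freeze described in the first post.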

Jsfair
Contributor

That's a thought, or some element of vMotion going on, but I'd doubt that since it's been mentioned that they are on different hosts.

HUNGAMA
Enthusiast

It's timing out. We suspected the network, but it does not look like a network problem because I can see the VM itself freezing.
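A freeze of the guest itself, as opposed to a network drop, can sometimes be confirmed from inside the VM with a simple heartbeat loop: sleep for a fixed interval, record the time, and flag any gap that is much longer than requested. The Python 3 sketch below is a minimal illustration, not a definitive tool. The names `find_stalls` and `record_heartbeats` and the tolerance value are assumptions, and it only works if the guest clock catches up after the pause.

```python
import time


def find_stalls(timestamps, interval, tolerance):
    """Return (last_good_time, extra_delay) for each oversized gap.

    timestamps: heartbeat times in seconds, in recording order.
    interval:   the sleep requested between heartbeats, in seconds.
    tolerance:  extra delay in seconds that is still considered normal.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        extra = (cur - prev) - interval
        if extra > tolerance:
            stalls.append((prev, extra))
    return stalls


def record_heartbeats(count, interval):
    """Sleep `interval` seconds `count` times, recording wall-clock time."""
    stamps = [time.time()]
    for _ in range(count):
        time.sleep(interval)
        stamps.append(time.time())
    return stamps


if __name__ == "__main__":
    # Short demonstration run; a real session would use a far larger count.
    beats = record_heartbeats(count=5, interval=0.2)
    for last_good, extra in find_stalls(beats, interval=0.2, tolerance=1.0):
        print(time.strftime("%H:%M:%S", time.localtime(last_good)),
              "stalled for about %.1f s" % extra)
```

Wall-clock time is used here so that any reported stall can be matched against timestamps in the host logs. The trade-off is that a clock step (for example from NTP) would also show up as a gap.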

HUNGAMA
Enthusiast

No. Both of the VMs have snapshots attached to them, but there is no new snapshot activity.

Jsfair
Contributor

Is there nothing in the log to indicate an issue?

idle-jam
Immortal

Try looking at the vSphere Client event logs for the VM to see what VM-level operation might have caused this. If you have a support contract with VMware, you could collect the logs and open a case with them; that might be the faster way to nail down the root cause.

HUNGAMA
Enthusiast

Unfortunately, there is nothing in the logs. I checked /var/log/messages and dmesg. I also ran tcpdump, but there was no network activity.

The kernel log module is not loaded. Red Hat decided to take that off without notice. I will load the module tomorrow so that I can see the kernel messages easily, but for the time being dmesg is my friend.

HUNGAMA
Enthusiast

Yeah, that sounds reasonable.

HUNGAMA
Enthusiast

Thanks for the advice, guys.

I solved it myself.

Jsfair
Contributor

That's cool. I'd be interested to know what resolved it, in case anyone else comes a cropper with it.

HUNGAMA
Enthusiast

Hi JS

I think the problem is within my HP Virtual Connect (VC) Flex modules. I installed mezzanine cards in one of the ESXi hosts and used the pass-through modules on the enclosure to connect to the SAN. I moved the VMs to this new host and the freezing issue is gone.

I monitored the old hosts in the cluster and noticed a huge number of vmkernel NMP errors. It looks like some path flapping is happening. I am using an HDS AMS2500 storage array, which is an active/active array. I have seen those vmkernel errors before with EMC CLARiiON. I have reported this to HP and VMware. Because the enclosure uses FCoE internally, it might be an issue with the VC Flex firmware. VMware thinks it might be a bug as well.

My VMs are running fine on the other host, and I am doing some more troubleshooting on this. I will keep you posted if I find something.
