VMware Cloud Community
caddo
Enthusiast

Major issue with storage performance and vSphere 4.1

Hi everyone,

I hope someone can help me with this, since I am no expert in troubleshooting storage issues.

Here's the thing: I have an IBM BladeCenter configuration with 2 HS21 XM blades (ESX 3.5 Update 4 Enterprise) and a 2 Gb Fibre Channel DS4300 storage subsystem. I set this up a couple of years ago and everything has been working like a charm. Yesterday I added 2 HS22 blades, upgraded to vCenter 4.1, created a second 2-node cluster, and enabled EVC so I could vMotion every VM there. It all worked well, except that very often all the VMs experience VERY slow storage performance. It happens randomly: at times they work normally, and at times they are almost frozen for seconds or even minutes.

The machine with the biggest problems is the one with the biggest virtual disks, the file server. It's not that big actually, but with 1 250 GB it's the biggest around. It took almost an hour just to boot it, while it takes 7 minutes if I move it back to the old ESX 3.5 servers.

Since my storage subsystem is old, I'm not expecting huge performance, but I would like to get at least the same performance as before. I'm thinking it could be related to SIOC (Storage I/O Control) or to a different queue depth setting, but I am not sure how to identify the problem.

I admit that my storage subsystem is certified for vSphere 4.0 Update 1 and not vSphere 4.1; I decided to try the latest version anyway because I intend to install View 4.5.

Can anyone suggest some troubleshooting steps, at least to identify where the problem is? That it's a storage issue is purely my assumption.

Thanks.

caddo
Enthusiast

Following this article (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008205), I think I found what happens when the virtual machines look frozen.

Sometimes on the 4.1 machines the DAVG/cmd and GAVG/cmd values in esxtop are above 5000 ms, which means SCSI commands are timing out.

On the 3.5 machines those values never go above 15.

Maybe this additional info will make it possible for someone to help me. I've attached screenshots of the values I mentioned.
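
To show how I am reading those numbers, here is a rough Python sketch. It relies on the fact that esxtop's GAVG/cmd is approximately DAVG/cmd (time at the device: HBA, fabric, array) plus KAVG/cmd (time in the VMkernel, including queuing). The class name, the function name and the two limits are my own placeholders, not anything from the KB article, so adjust them to your environment.

```python
from dataclasses import dataclass


@dataclass
class LatencySample:
    """One esxtop reading for a device, typed in by hand (hypothetical helper)."""
    davg_ms: float  # DAVG/cmd: latency at the device (HBA, fabric, array)
    kavg_ms: float  # KAVG/cmd: latency inside the VMkernel, including queuing

    @property
    def gavg_ms(self) -> float:
        # GAVG/cmd as seen by the guest is roughly DAVG/cmd + KAVG/cmd.
        return self.davg_ms + self.kavg_ms


def classify(sample, device_limit_ms=25.0, kernel_limit_ms=2.0):
    """Return which side looks slow: 'device', 'kernel', both, or 'ok'.

    The default limits are assumed placeholders, not official thresholds.
    """
    findings = []
    if sample.davg_ms > device_limit_ms:
        findings.append("device")
    if sample.kavg_ms > kernel_limit_ms:
        findings.append("kernel")
    return findings or ["ok"]


if __name__ == "__main__":
    # Values like the ones I saw: a 4.1 host during a freeze, then a 3.5 host.
    for name, s in [("4.1 host", LatencySample(5000.0, 1.0)),
                    ("3.5 host", LatencySample(15.0, 0.5))]:
        print(name, "GAVG ~", s.gavg_ms, "ms ->", classify(s))
```

If DAVG alone is high, as in my case, the delay would seem to be on the array or fabric side rather than in ESX queuing, but that is only my reading of it.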
