chadwickking
Expert

Troubleshooting Disk Latency with vscsiStats


Hello forums.

I could use some expert insight on a problem I am having with some VMs. Resource utilization is nowhere near capacity and there is no CPU contention, but the VMs are running slow; typing in them is often bad, and sometimes they are just slow overall. The VMs did not always behave this way, but we have been adding more and more VMs to our lab for a while now, which could explain the latency.

I would like some assistance with using vscsiStats to troubleshoot virtual disk problems. On the VM I was looking at, perfmon showed a somewhat long average disk queue length. I ran vscsiStats and got the following after 30 minutes. Please let me know if I understand this correctly.

I ran vscsiStats with the latency histogram only, so the output covers the latency of all I/Os and is not split into reads and writes. Both disks are on the same datastore.

VMDISK 1

Histogram: latency of IOs in Microseconds (us) virtual machine worldGroupID

min      121
max      14440236
mean     327165
count    15524

Frequency    Histogram Bucket Limit
0            1
0            10
0            100
1488         500
1929         1000
1453         5000
1618         15000
1688         30000
1070         50000
1546         100000
4732         > 100000

VMDISK 2

Histogram: latency of IOs in Microseconds (us) virtual machine worldGroupID

min      159
max      14245418
mean     478629
count    3446

Frequency    Histogram Bucket Limit
0            1
0            10
0            100
1722         500
146          1000
136          5000
65           15000
59           30000
46           50000
88           100000
1184         > 100000

I really don't understand our storage layout very well, and I apologize for that; I am trying to learn it. We do use NetApp, but I don't think all of our storage is NetApp. These disks are SATA disks attached through 4 Gbps HBAs.

My understanding from the above (VMDISK 1) is the following:

4732 I/Os took > 0.1 second (more than 100 ms)

1546 I/Os took <= 0.1 second (up to 100 ms)

1070 I/Os took <= 0.05 second (up to 50 ms)

My math may be a bit off, but is this a lot of latency for disk I/O? Your help is appreciated.
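For reference, the unit conversion behind those numbers can be sketched in a few lines of Python. This is only an illustration: the values are copied from the VMDISK 1 output above, and the helper name `us_to_ms` is made up for this example.

```python
# Summary values copied from the VMDISK 1 output above (microseconds).
summary_us = {"min": 121, "max": 14440236, "mean": 327165}

# Upper bucket limits shown in the histogram (microseconds).
bucket_limits_us = [1, 10, 100, 500, 1000, 5000, 15000, 30000, 50000, 100000]


def us_to_ms(microseconds):
    """Convert microseconds to milliseconds (1 ms = 1000 us)."""
    return microseconds / 1000.0


summary_ms = {name: us_to_ms(value) for name, value in summary_us.items()}
limits_ms = [us_to_ms(limit) for limit in bucket_limits_us]

# Print each summary value in milliseconds and in seconds.
for name, value in summary_ms.items():
    print(f"{name}: {value:.3f} ms ({value / 1000.0:.3f} s)")
print("bucket limits in ms:", limits_ms)
```

Read this way, the 100000 bucket limit is 100 ms (0.1 second), the 50000 limit is 50 ms (0.05 second), and the reported mean of 327165 us is roughly a third of a second.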

Cheers,

Chad King

VCP-410 | Server+

Twitter: http://twitter.com/cwjking | virtualnoob.wordpress.com

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

Accepted Solutions
depping
Leadership

You are correct, this is the latency per I/O. It means that a large portion of your I/Os have a latency higher than 100 milliseconds. This is fairly high, to be honest, and I am not surprised that you notice overall sluggishness. Have you checked, for instance, whether your disks are aligned? Misalignment could contribute to this level of latency. I also wonder how many links you have going back to your filer.

Another thing worth investigating is the utilization of the processors of the filer.
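To put a number on "a large portion", here is a minimal Python sketch that works out the share of I/Os in the slowest bucket. The counts are copied from the histograms in the original post; treating the last row of each histogram as the "above 100000 us" bucket is an assumption based on how the poster read it, and `share_over_limit` is just an illustrative name.

```python
# Per-bucket counts copied from the post; the final entry of each list is
# assumed to be the overflow bucket (latency above 100000 us, i.e. 100 ms).
histograms = {
    "VMDISK 1": [0, 0, 0, 1488, 1929, 1453, 1618, 1688, 1070, 1546, 4732],
    "VMDISK 2": [0, 0, 0, 1722, 146, 136, 65, 59, 46, 88, 1184],
}

# The "count" values reported for each disk, used as a sanity check.
reported_counts = {"VMDISK 1": 15524, "VMDISK 2": 3446}


def share_over_limit(counts):
    """Return the fraction of I/Os that landed in the last (overflow) bucket."""
    total = sum(counts)
    return counts[-1] / total if total else 0.0


shares = {}
for disk, counts in histograms.items():
    # The bucket counts should add up to the reported I/O count.
    assert sum(counts) == reported_counts[disk]
    shares[disk] = share_over_limit(counts)
    print(f"{disk}: {shares[disk]:.1%} of I/Os took longer than 100 ms")
```

With the posted numbers this comes out at roughly 30% of I/Os for VMDISK 1 and roughly 34% for VMDISK 2.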



Duncan

VMware Communities User Moderator | VCDX


Blogging: http://www.yellow-bricks.com | Twitter: http://www.twitter.com/DuncanYB

5 Replies
chadwickking
Expert

If anyone could help, it would be appreciated! Thanks.

chadwickking
Expert

Thanks Duncan,

Sorry for getting back to you so late. I have been very busy at work and have had little time for VM troubleshooting. I suppose by links you mean the number of paths? I know we use two 4 Gbps cards per host connected back to the SAN. This is also in our lab, but I am glad to see that I am reading vscsiStats correctly.

Apparently they added more memory to the VM and they are not having the problem anymore. I don't think this is what fixed it, because I expect it to come back when they get busy again. They also rebooted the VM. Either way, I am going to do some further research in the lab and see what I can gather from other hosts.

Thanks again Duncan!






chadwickking
Expert

Considering this is just one VM, I suppose I could gather a larger sample and see what it looks like. Do you have any other recommendations?
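One way to work with a larger sample would be to add up the per-bucket counts from several vscsiStats runs before judging the distribution. A rough Python sketch follows, using the two histograms already posted as stand-in samples. The function name `merge_histograms` is hypothetical, and the last bucket is assumed to mean "above 100000 us".

```python
from itertools import accumulate

# Bucket labels in microseconds; the last one is assumed to be the overflow bucket.
BUCKET_LABELS_US = ["1", "10", "100", "500", "1000", "5000",
                    "15000", "30000", "50000", "100000", ">100000"]

# Stand-in samples: the VMDISK 1 and VMDISK 2 counts from the first post.
samples = [
    [0, 0, 0, 1488, 1929, 1453, 1618, 1688, 1070, 1546, 4732],
    [0, 0, 0, 1722, 146, 136, 65, 59, 46, 88, 1184],
]


def merge_histograms(histograms):
    """Sum the counts bucket by bucket across all sampled histograms."""
    return [sum(bucket) for bucket in zip(*histograms)]


merged = merge_histograms(samples)
total = sum(merged)
cumulative = list(accumulate(merged))

# Show each bucket with its combined count and the running share of all I/Os.
for label, count, running in zip(BUCKET_LABELS_US, merged, cumulative):
    print(f"{label:>8} us  {count:>6}  {running / total:6.1%} cumulative")
```

The same function should work unchanged for more disks or more VMs, as long as every sample uses the same bucket limits.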






chadwickking
Expert

They have two different types of storage, all connected by FC, but the one getting hammered is the SATA NetApp storage. It appears they turned off deduplication because of the high I/O problems, though I would like more detail on this and would love to read into it. They had a lot of VMs hitting that storage and moved them over to SAS-FC storage, and since then they have not had nearly as many issues. You would think that after going through this hell in their VDI environment they would know this. Thanks again.
