Solved: Re: ESXI 6.0 round robin multipathing latency problem

Ugur_Demirciogl · ‎03-04-2020

Hi All,

I am using ESXi 6.0 and have changed my storage system. After I moved all VM datastores to the new storage, I ran into a latency issue. While troubleshooting, I found that when I change the path policy from round robin to fixed, the latency returns to normal. I want to use the round-robin policy, but at this point I can't. Has anyone experienced the same problem before? I would be glad if someone could share an idea on the subject. Thanks.

Ardaneh · ‎03-04-2020

Hi

First of all, your storage must support round robin (RR), so check this first. Keep in mind that round robin only stops using a path when the path is dead. So if the latency is 50 ms on one path and 1 ms on the other, it will still use both paths equally.
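To put a number on the 50 ms / 1 ms case, here is a minimal Python sketch. It is a simplified model, not how ESXi is implemented: it ignores queueing and the fact that ESXi switches paths after a configurable number of I/Os rather than after every single I/O. The function name `round_robin_mean_latency` is made up for this illustration.

```python
from itertools import cycle


def round_robin_mean_latency(path_latencies_ms, num_ios):
    """Mean latency when I/Os are dealt to the paths in strict rotation."""
    paths = cycle(path_latencies_ms)
    total = sum(next(paths) for _ in range(num_ios))
    return total / num_ios


# Fixed policy on the healthy path only (1 ms).
healthy_only = round_robin_mean_latency([1.0], 1000)

# Round robin across a degraded path (50 ms) and a healthy path (1 ms).
mixed = round_robin_mean_latency([50.0, 1.0], 1000)

print(healthy_only, mixed)  # prints: 1.0 25.5
```

In this model, half of the I/Os land on the slow path, so the average seen by the VMs is dominated by it even though one path is perfectly healthy.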

When you are using multipathing and you have this kind of issue, you need to validate all physical path components (cables, modules, ports).

In ESXi 6.7 there is a latency-based option for the RR policy. When enabled, ESXi samples the paths every 3 minutes with 16 I/Os. It then calculates the average latency for those I/Os and decides (in comparison to the other paths) whether or not to use that path. If a path is deemed too unhealthy, it is excluded until the next sampling period begins 3 minutes later, when it is re-evaluated.
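The sampling idea can be sketched in Python as follows. This is only an illustration of the description above, not ESXi's actual algorithm: the post does not say how paths are compared, so the `tolerance` factor, the function name `pick_paths`, and the path labels are all assumptions.

```python
from statistics import mean

SAMPLE_IOS = 16  # I/Os sampled per path in each cycle, as stated in the post


def pick_paths(samples_ms, tolerance=2.0):
    """Keep paths whose average sampled latency is within `tolerance`
    times the best path's average (the comparison rule is assumed)."""
    averages = {
        path: mean(latencies[:SAMPLE_IOS])
        for path, latencies in samples_ms.items()
    }
    best = min(averages.values())
    return sorted(p for p, avg in averages.items() if avg <= best * tolerance)


# Hypothetical samples: one healthy path and one degraded path.
samples = {
    "path_a": [1.0] * SAMPLE_IOS,
    "path_b": [50.0] * SAMPLE_IOS,
}
print(pick_paths(samples))  # prints: ['path_a']
```

Re-running `pick_paths` with fresh samples every 3 minutes would give the excluded path a chance to return once its latency recovers.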

You can find more information about how to configure this policy in this link.

Hope this helps.

MikeStoica · ‎03-04-2020

Do you have iSCSI storage? If yes, have you checked Best Practices For Running VMware vSphere On iSCSI | VMware?

sjesse · ‎03-04-2020

If the above guide doesn't help, you should reach out to your storage vendor, or at the very least let us know who that vendor is, as each has different requirements that you need to follow. For example, Nimble arrays have a plugin you need to install that handles the multipath options for you, and other vendors have similar plugins.

Ugur_Demirciogl · ‎03-04-2020

Hi Mike,

I am using the FC protocol, not iSCSI. By the way, I presented a datastore over iSCSI just as a test, and there was no issue on iSCSI, but I need to use FC.

Ugur_Demirciogl · ‎03-04-2020

I am using NetApp FAS series storage.

sjesse · ‎03-04-2020

Contact NetApp and let them know. If you switch to fixed and it works, one of the paths is causing problems. We saw something similar when an SFP went bad, which caused the light levels to drop out of range. There isn't much you can do on the VMware configuration side.

MikeStoica · ‎03-04-2020

Check the NetApp documentation: https://community.netapp.com/fukiw75442/attachments/fukiw75442/fas-and-v-series-storage-systems-disc... and FAS Storage Systems Resources | NetApp Documentation

andres_prieto_a · ‎03-04-2020

Hi

I would suggest, as others have, contacting the vendor to find out the best policy to apply for the storage. Keep in mind that the policy needs to be aligned between the storage and ESXi; otherwise, as you are experiencing, it can cause problems, because the storage expects one policy and VMware is using another.

We suffered from this with EMC in the past.

Regards

Ugur_Demirciogl · ‎03-04-2020

Hi,

Yesterday I checked all physical path components and found that one port on the Brocade SAN switch had a large number of CRC errors. I replaced that port's FC cable, and now round robin works well.

Thank you all.
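For anyone chasing a similar fault, the check that found the bad cable can be sketched in Python: compare two snapshots of per-port CRC error counters and flag the ports whose counter is still climbing. The function name `ports_with_crc_growth` and the counter values are hypothetical, and collecting the counters from the switch is left out because it depends on the switch model and firmware.

```python
def ports_with_crc_growth(before, after, min_increase=1):
    """Return {port: increase} for ports whose CRC error counter rose
    by at least `min_increase` between two snapshots."""
    flagged = {}
    for port, count in after.items():
        delta = count - before.get(port, 0)
        if delta >= min_increase:
            flagged[port] = delta
    return flagged


# Made-up counters keyed by switch port number, taken some time apart.
before = {0: 0, 1: 12, 2: 0}
after = {0: 0, 1: 5321, 2: 0}
print(ports_with_crc_growth(before, after))  # prints: {1: 5309}
```

A counter that keeps growing under load points at that port's cable or SFP, which matches the fix described above.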
