VMware Cloud Community
seamusobr1
Enthusiast

High latency on vSAN stretched cluster

Good afternoon

We are running a stretched cluster with 16 nodes and 10 Gbps uplinks.

The version is 6.5 Update 3.

We had an alarm raised because some of the VMs experienced read/write latencies of about 800 ms.

I think I have traced the issue back to a disk group

All of the disks in the disk group have been showing results like the one below

pastedImage_0.png

Not seeing any issues with cache destage rates

Does anyone know why there would be high physical/firmware layer latency on all disks in the group?

Thanks in advance


Accepted Solutions
TheBobkin
Champion

Hello Zifu_invzion,

I don't see how dedupe would correlate with this issue. For a start, the issue impacted multiple Disk-Groups, and vSAN dedupes only per Disk-Group. If you mean device latency or strain from the extra load that enabling or disabling dedupe causes (as it basically has to read and re-write all data), that is also ruled out: the graphs indicate the issue occurred over the course of a few minutes rather than a prolonged duration, and OP likely would have mentioned it if they were performing such activities.

My assumption would still be a controller issue or potentially a knock-on issue on the controller caused by some misbehaving attached device.

I also don't really see how wiping and recreating a Disk-Group would help in any way. The issue doesn't appear to have been prolonged, so it was likely dealt with by automated functions rather than human intervention.

Bob

4 Replies
TheBobkin
Champion

Hello Seamus,

Is there only one Disk-Group on that host? If so, it could be an issue on the controller.

If there are multiple Disk-Groups on the host, then it is more likely an issue with the Cache-tier or, if dedupe is enabled, potentially one Capacity-tier device.

What do you see in vmkernel.log and vobd.log at the time the latency occurred?
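
If it helps with narrowing the logs down, here is a minimal Python sketch for pulling out only the lines around the latency event from a copy of either log. It assumes each entry starts with an ISO-8601 UTC timestamp (with an optional trailing colon, as vobd.log entries tend to have); the function names `parse_ts` and `lines_near` and the 5-minute default window are made up for illustration, so adjust them to whatever your logs actually look like.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if there is none.

    Assumption: entries begin with something like 2019-11-05T12:31:10.123Z,
    optionally followed directly by a colon.
    """
    parts = line.split(None, 1)
    if not parts:
        return None
    token = parts[0].rstrip(":")
    try:
        ts = datetime.strptime(token, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return ts.replace(tzinfo=timezone.utc)


def lines_near(lines, event_time, window_minutes=5):
    """Yield lines stamped within +/- window_minutes of event_time.

    Lines without a parseable timestamp (e.g. continuation lines) are skipped.
    """
    delta = timedelta(minutes=window_minutes)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - event_time) <= delta:
            yield line.rstrip("\n")
```

To use it, open the copied log file, pass the file object as `lines` together with the time of the alarm as a timezone-aware `datetime`, and print what comes back. Widening `window_minutes` is worthwhile if the first pass shows nothing, since device or controller messages may start a little before the latency shows up in the graphs.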

Bob

seamusobr1
Enthusiast

Thanks, I will take a look.

Zifu_invzion
Enthusiast

Hi,

As TheBobkin said, in vSAN 6.5 dedupe could be a reason for high latency. If you are able to put the host in maintenance mode and re-create the disk group, that may resolve the problem.

BR!
