VMware Cloud Community
TRottig
Enthusiast

De-staging causing write speed drop on all-NVMe setup

Hi,

I've been wondering for ages why vSAN performs rather poorly for me. I have a 4-node setup with Xeon E3-1245 v6 CPUs and a single disk group per host (P4800X 375 GB / P3600 or P4510 2 TB).

Read performance was quite good, especially if I started multiple vMotions at the same time; I was able to reach several GB/s (moving to an NFS filer).

However, moving VMs back to vSAN was always kind of meh. Today I actually found a cause (though no explanation).

Moving a single VM (480 GB) from NFS to vSAN performed quite well for a while, until it suddenly became slow.

I checked the vSAN performance pages and found that performance basically tanked once the cache disk was full and de-staging began.
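
A rough back-of-the-envelope sketch of why the drop only shows up after a while: as long as data arrives faster than it is de-staged, the write buffer fills, and once it is full the sustained write rate cannot exceed the de-staging rate. The model below is a simplification with constant rates. The 375 GB buffer size is taken from the P4800X, while the 1000 MB/s ingest rate and the 150 MB/s de-staging rate are assumptions for illustration, not measurements.

```python
def seconds_until_buffer_full(buffer_gb, ingest_mb_s, destage_mb_s):
    """Seconds until a write buffer of buffer_gb fills, or None if it never does.

    Simplified model: constant ingest and de-staging rates, decimal units
    (1 GB = 1000 MB). Real vSAN behaviour is more complex than this.
    """
    net_fill_mb_s = ingest_mb_s - destage_mb_s
    if net_fill_mb_s <= 0:
        return None  # de-staging keeps up, so the buffer never fills
    return buffer_gb * 1000 / net_fill_mb_s


# Assumed numbers: 375 GB buffer, hypothetical 1000 MB/s ingest,
# hypothetical 150 MB/s de-staging.
ingest_mb_s = 1000
t = seconds_until_buffer_full(375, ingest_mb_s, 150)
written_gb = ingest_mb_s * t / 1000
print(f"buffer full after ~{t / 60:.1f} min, ~{written_gb:.0f} GB written by then")
```

With numbers in this range, a 480 GB VM would run at full speed for most of the transfer and then fall back to whatever the capacity tier absorbs.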

 

TRottig_1-1614202944074.png

TRottig_0-1614202883034.png

TRottig_2-1614203026248.png

Now the question is, of course, why an Intel P4510 NVMe drive cannot ingest data fast enough to keep the transfer from dropping to some 100-200 MB/s?

Of course, firmware is at the recommended level, inter-node connectivity is 56 Gbit/s, and the CPU had at least some headroom (peak usage was 12 GHz; the CPU has 4 cores x 3.7 GHz).
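
To put the CPU figure in perspective, here is the arithmetic as a small sketch. It assumes that aggregate capacity is simply cores times clock speed, ignoring hyper-threading and turbo; the function name is made up for illustration.

```python
def cpu_headroom(peak_ghz, cores, ghz_per_core):
    """Return (capacity_ghz, utilization, headroom) for aggregate CPU usage."""
    capacity_ghz = cores * ghz_per_core      # total clock capacity of the host
    utilization = peak_ghz / capacity_ghz    # fraction used at peak
    return capacity_ghz, utilization, 1 - utilization


# Figures from the post: 12 GHz peak on 4 cores x 3.7 GHz.
capacity, used, free = cpu_headroom(peak_ghz=12, cores=4, ghz_per_core=3.7)
print(f"{capacity:.1f} GHz total, {used:.0%} used at peak, {free:.0%} free")
```

That works out to roughly 81% used at peak, so there is headroom, but not a lot of it.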

 

I have no idea of the inner workings of vSAN, so maybe there is a very good explanation; I'd love to hear it so I can understand what the issue is.

Thanks

4 Replies
TristramJ
Contributor

Hi,

 

Came here to say the same thing, actually: 2 x 32-core EPYC platform, 2 TB RAM, P4800X with P4510 drives, and when vMotioning to it, bam, it sucks:

TristramJ_0-1614293337820.png

TristramJ_1-1614293391022.png

Looks like it really starts to chug once the write buffer is full. I was thinking it might be an issue with the EPYC platform and NVMe, but seeing you have the same problem, I wonder if it's a P4510 bug?

TRottig
Enthusiast

Hi,

Unfortunately, this does not seem to be limited to the P4510s.

I have P3600s in my other nodes and they exhibit similar issues.

TRottig_0-1614529229469.png

 

Is there anybody who gets better de-staging performance with a single disk group?

If so, what hardware do you run?

Thanks

TristramJ
Contributor

Did you end up opening a case? I got hit with this again last night. I've tested the base platform with a plain file copy from the cache-tier drive to the capacity-tier drive and saw no issues. As soon as we set up vSAN and the write buffer drops to 15-30% free, latency goes through the roof.

TRottig
Enthusiast

Hi,

Can't. VMUG only, so no way of opening a case for this.

If you can, it would be great to get somebody to look at this. Happy to support with further data.

Cheers
