VMware Horizon Community
Yasser_
Contributor

VDI Slowness

Hi

I have the following problem.

We have a VDI solution with 3 ESXi 6 servers and Horizon View 6, on Fujitsu storage.

We have two thin provisioning pools on the storage:

1st: TTP0 has 13 disks of 300 GB --> this TTP0 has 4 volumes

2nd: TTP1 has 9 disks of 900 GB --> this TTP1 has 6 volumes

The VMs on the TTP0 volumes are OK.

The VMs on the TTP1 volumes suffer from latency and low performance. We can see this latency in ESXTOP, in the LAT/wr counter.

When we migrate a VM from a TTP0 volume to a TTP1 volume, it shows the same latency.

Can anyone help with this issue?

4 Replies
jasnyder
Hot Shot

First, I'm not familiar with any Fujitsu devices, and I don't believe I've come across the TTP0 terminology.

But let's assume that TTP0 and TTP1 are similar to disk groups or some comparable concept, and that within each disk group you have physical spindles, like you mention: one has 13 and the other 9.  From there you usually specify a RAID configuration.  Depending on the RAID configuration and the read/write profile of the workload, I'm guessing this is likely the cause of your issue.  You have more spindles in TTP0, as well as fewer volumes and less total storage.  Generally you can guess the total IOPS available in a disk group from the IOPS per drive for that type of drive * number of drives.  The RAID level will either induce a performance penalty or a storage penalty.  Certain RAID types are better suited for certain workloads.

So without more information, it's hard to say definitively what your problem is, but if I had to guess, you don't have enough spindles in TTP1 to handle the total load required by the number of workloads running off that disk group.  If these are 7200RPM SATA drives, you're getting maybe 900 IOPS out of that group.  If you're running RAID5 or RAID6 you're taking some performance penalty as well.  You didn't say how many desktops you're running, but I would guess you'd max out at 18-36 total desktops before you really see it tank.  The desktops would also be contending with any server workloads you might be hosting out of those groups.

From Wikipedia, here are some good guesses for IOPS per spindle for common drive types:

Device                   Type  IOPS            Interface
5,400 rpm SATA drives    HDD   ~15-50 IOPS     SATA 3 Gbit/s
7,200 rpm SATA drives    HDD   ~75-100 IOPS    SATA 3 Gbit/s
10,000 rpm SATA drives   HDD   ~125-150 IOPS   SATA 3 Gbit/s
10,000 rpm SAS drives    HDD   ~140 IOPS       SAS
15,000 rpm SAS drives    HDD   ~175-210 IOPS   SAS
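
As a rough illustration of the "type of drive * number of drives" estimate, here is a minimal Python sketch. The per-spindle ranges are copied from the figures quoted above; the function name and the dictionary keys are made up for this example, and the result ignores RAID penalties, cache, and controller limits, so treat it as a ballpark only.

```python
# Per-spindle IOPS ranges (low, high), copied from the figures quoted above.
# The dictionary keys are hypothetical labels chosen for this sketch.
IOPS_PER_SPINDLE = {
    "5400_sata": (15, 50),
    "7200_sata": (75, 100),
    "10k_sata": (125, 150),
    "10k_sas": (140, 140),
    "15k_sas": (175, 210),
}


def raw_group_iops(drive_type, drive_count):
    """Return (low, high) raw IOPS for a disk group, before any RAID penalty."""
    low, high = IOPS_PER_SPINDLE[drive_type]
    return (low * drive_count, high * drive_count)


# 9 spindles of 7200RPM SATA, the case guessed at earlier in this reply.
print(raw_group_iops("7200_sata", 9))
```

The upper bound for 9 x 7200RPM SATA drives is where the "maybe 900 IOPS" guess comes from.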

Hopefully you're using 15K SAS or, better yet, SSD.  I would expect you wouldn't have a problem if you were using SSD unless you really had a lot of desktops running.  On a small array like I am imagining you have, there may be a small SSD/NVMe cache layer, and you might be exceeding its size or performance (maybe you can upgrade that piece).

Another question is how the storage is mounted - NFS or iSCSI?  (I am assuming not Fibre Channel based on the size).  You may need some network optimization there as well.  NIC speed may matter.  Having jumbo frames enabled may matter.  Having more interfaces off the array attached and spreading the load across interfaces may help.

So, there are a lot of variables at play.  I would look at the differences between TTP0 and TTP1 in terms of disk speed, configuration, number of workloads running on each, and the performance profile of those workloads.

Yasser_
Contributor

Sorry for the lack of information.

Here is the story:

We have 3 servers for VDI and 3 different servers that host virtual servers for infrastructure such as AD, SQL, and Exchange. The workload of the VDI servers (hosts) is separated from the infrastructure servers.

The 300GB and 900GB disks are all 10K rpm, and I use RAID 5 for both configurations.

The number of VDI desktops is around 200 VMs.

The history of the problem:

In the past we had 100 VMs working fine without problems on 13 x 300 GB disks in RAID5 --> TTP0. Then we increased the number of VMs, so we added another pool, TTP1, with 5 x 900GB 10K disks as RAID5, and we started to see low performance. Then we added 5 more 900GB disks as another RAID5 group, expanded TTP1, and rebalanced all the VMs in TTP1 across both RAID groups.

But the problem was not solved.

Note: I mentioned in my previous post that TTP1 has 9 disks; the correct number is 10 disks.

Yasser_
Contributor

Also, the SAN is FCoE, not NAS.

RyanHardy
Enthusiast

If TTP0 is a RAID5 with 13 disks, then you have quite a few disks working together, which gives acceptable performance - see RAID Performance Calculator - WintelGuy.com

13 disks (10k FC, each 150 IOps) in one RAID5 would net 780 IOps (50% read).

Now your TTP1 was one RAID5 with 5 bigger disks - but despite the bigger size you can still only count on about 150 IOps for each disk. So 5 disks in one RAID5 result in only 300 IOps (again, 50% read). Even adding another RAID5 group only adds another 300 IOps.
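
For anyone who wants to redo these numbers without the calculator, here is a minimal Python sketch. It assumes the commonly used RAID5 write penalty of 4 back-end I/Os per front-end write, which is not stated in this thread but does reproduce the 780 and 300 figures; the function name is made up for this example.

```python
def effective_iops(disks, iops_per_disk, read_fraction, write_penalty=4):
    """Estimate front-end IOPS of a RAID group.

    write_penalty=4 is the usual RAID5 assumption (read data, read parity,
    write data, write parity). It is an assumption, not a measured value.
    """
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each front-end read costs 1 back-end I/O, each write costs write_penalty.
    return raw / (read_fraction + write_fraction * write_penalty)


# 13 x 150 IOps disks at 50% read, then one 5-disk group at the same mix.
print(effective_iops(13, 150, 0.5))
print(effective_iops(5, 150, 0.5))
```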

Now first I'd like to state that in my experience 50% read isn't necessarily true for VDI environments - there may be a lot more writing involved (resulting in way lower performance on HDDs).

The other thing is that more RAID groups with fewer disks each is going to be a problem for performance. VMs on one of the two 5-disk groups will only be able to use max. 300 IOps, while VMs on the old RAID5 could use 780 IOps max. A VM might need more disk performance only for a short time (booting comes to mind), and then TTP0 with its 13 disks may be more than twice as fast as your new TTP1.
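
To put a rough number on that, the sketch below computes how many back-end disk I/Os one front-end I/O costs on average for a given read share. It again assumes a RAID5 write penalty of 4, and the 20% read mix is a hypothetical example, not something measured in this environment.

```python
def backend_ios_per_frontend_io(read_fraction, write_penalty=4):
    """Average back-end I/Os caused by one front-end I/O.

    write_penalty=4 is the usual RAID5 assumption, not a measured value.
    """
    write_fraction = 1.0 - read_fraction
    return read_fraction + write_fraction * write_penalty


# Compare the 50% read mix used above with a hypothetical write-heavy mix.
for read_fraction in (0.5, 0.2):
    cost = backend_ios_per_frontend_io(read_fraction)
    print(f"{read_fraction:.0%} read: {cost:.1f} back-end I/Os per front-end I/O")
```

The higher that cost, the fewer front-end IOps the same spindles can deliver.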

You can save yourself a lot of headache with SSDs (if they are built to sustain a lot of writes). When using HDDs, carefully design your storage.
