We seem to have started getting some screen lag, especially when typing; it feels slow and sometimes input doesn't appear for a second or two. It became noticeable after upgrading to View 7.5, but I can't confirm that is the cause.
We have 1Gb lines between our offices and our datacenter.
One office is suffering a lot more than the rest, but all have the symptoms.
I have been monitoring the bandwidth at the sites and at the datacenter, and the lines are barely doing anything; we are not coming anywhere near 1Gb on any of them. I have also been monitoring user session stats like memory and CPU usage, and most of the time the VMs are flatlining at a very low level, so the users aren't even making the machines sweat. We have fairly new and powerful hardware and storage, so I don't have a clue what could be causing this.
Some days it is better than others.
All the endpoints are HP T310 Zero Clients using PCoIP, with non-persistent linked clones.
I would start with esxtop to confirm the ESXi hosts and storage are performing as expected. Since you mentioned you have new hosts/storage, you could have a bad connection between host and storage, a misconfigured setting on the host, or high CPU ready times.
Well, the hosts and storage are about a year and a half old, but they are quite high end. Unfortunately it was all set up by another company, and I have found constant issues with the config, but I am fairly new to quite a lot of it. It's always a lot harder to fix someone else's config screw-ups!
I have run esxtop, and it looks like the %RDY time is pretty high according to the cheat sheet. The current base image is configured with 2 sockets and 4 cores each, so 8 vCPUs in total.
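One thing worth remembering when reading those cheat sheets: esxtop shows %RDY per VM "group" summed across all of the VM's vCPUs, so an 8-vCPU desktop can legitimately show a much bigger number than a 2-vCPU one. A minimal sketch of that per-vCPU conversion, assuming the commonly quoted ~10% per-vCPU warning threshold (a rule of thumb, not an official VMware limit):

```python
# Sketch of the common esxtop %RDY rule of thumb. The %RDY value shown
# per VM ("group") is summed across all of the VM's vCPUs, so an 8-vCPU
# VM showing %RDY of 40 is really ~5% per vCPU. The 10% per-vCPU
# threshold below is an assumption (a widely used rule of thumb).

def per_vcpu_ready(group_rdy_pct: float, num_vcpus: int) -> float:
    """Convert the esxtop group %RDY figure to a per-vCPU figure."""
    return group_rdy_pct / num_vcpus

def ready_looks_high(group_rdy_pct: float, num_vcpus: int,
                     threshold_pct: float = 10.0) -> bool:
    """True if per-vCPU ready time exceeds the rule-of-thumb threshold."""
    return per_vcpu_ready(group_rdy_pct, num_vcpus) > threshold_pct

# Example: an 8-vCPU desktop showing 100 group %RDY
print(per_vcpu_ready(100.0, 8))    # 12.5 (% per vCPU)
print(ready_looks_high(100.0, 8))  # True - worth investigating
```

The useful part is the division: a group %RDY of 16 on an 8-vCPU VM is only 2% per vCPU and probably fine, while the same 16 on a 1-vCPU VM is a real problem.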
We have two sets of servers; one is for standard users. See the esxtop screenshot below -
The other is for CAD users, which are on slightly different hardware but the same CPU config -
I'm guessing I need to lower the CPU/core count? What is the best way to go about it: remove a socket or remove some cores?
From watching the CPU usage for each user, it is not very high throughout the day on either the standard or the CAD hosts.
That %RDY seems a tad high...
We had this problem, and our issue was that our Dell PowerEdge server BIOS was not set to Performance; it was on Performance per Watt. Also be sure you have the vSphere power policy set to High Performance.
We are also on 7.5 with non-persistent linked clones. We have around 17-18 users per host; some hosts have fewer users because we are using NVIDIA GRID cards as well. W7 x64, 8GB RAM.
We are using HP 3PAR with SSD storage.
I have also noticed that our datastores are showing as 'Non-SSD'. Is this something to worry about, or does it not matter a great deal? I found out yesterday that this can be changed on the hosts by running a couple of commands, but we would have to power down the whole environment to do it.
It really doesn't make a difference unless you want to use the storage for vFRC (vSphere Flash Read Cache). I would open a ticket with HP; the array isn't sending the correct T10 information to let the host know it's SSD storage. This could be a setting on the array, or a code upgrade may be needed to resolve it.
At a glance your CPUs are overprovisioned, which would explain your initial issue. 8 vCPUs for a virtual desktop is significant; we find that 2 vCPUs work for most users, and in rare cases 4 vCPUs or more are needed. Can you share the full screenshot, which includes the host information? What model CPU do the hosts have?
Agreed - our bottleneck is RAM. Right now we are on 32-bit W7 with 4GB of RAM, but View uses some of that, so the users get 3GB. With W10 soon we will go to 2 vCPUs and 8GB of RAM - we just beefed up our servers to 512GB of RAM and 2TB Intel NVMe per host.
These users better be happy....:)
I dropped it down to 1 socket and 4 cores last night, and that has improved things quite a bit. I have only done it on the standard base image/hosts so far. I will see how things are for a week, and if there are no complaints and the screen lag clears up, I will leave it as is. If it continues, I will try dropping down to 2 vCPUs. The CAD hosts have 2x Intel Xeon E5-2643 v4 @ 3.4GHz; the standard hosts have 2x Intel Xeon E5-2690 v4 @ 2.6GHz.
RAM is definitely not an issue here; we have way more than we need. We barely use half of what we have in the standard hosts (400GB), and just over half in the CAD hosts (800GB).
Those ready time numbers look much better, but be aware they will change based on the CPU demand of the VMs (e.g., if all of your VMs are heavily utilized, the numbers will be higher). I'm going to ignore relaxed co-scheduling and other scheduler improvements for the sake of this conversation. I aim to keep our vCPU:pCPU ratio under 5:1, and as we approach 6:1 we look at adding capacity or shifting things around. For your environment you may find a higher or lower number works for your users.
Your 2 x Intel Xeon E5-2643 v4 only net you 12 physical cores (2x6). When you had 8 vCPU, you could really only execute 1 VM at a time, which explains your high ready times. With your VMs at 4 vCPU, you can have 3 VMs using the CPU at once, with the other VMs waiting their turn. If you get down to 2 vCPU, you will be able to execute 6 VMs at once. For a 5:1 ratio I would try to limit these hosts to 60 vCPU.
Your 2 x Intel Xeon E5-2690 v4 on the other hand net you 28 physical cores (2 x 14). So at 4 vCPU you can simultaneously execute 7 VMs and at 2 vCPU you can do 14 VMs. For a 5:1 ratio I would try to limit these hosts to 140 vCPU.
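The arithmetic in those two posts can be sketched in a few lines. This assumes strict co-scheduling (all of a VM's vCPUs must get physical cores at the same instant) for simplicity, which, as noted above, deliberately ignores the relaxed co-scheduling the real ESXi scheduler uses:

```python
# Sketch of the consolidation arithmetic from the posts above.
# Assumes strict co-scheduling for simplicity; the real ESXi
# scheduler is more relaxed, so treat these as worst-case figures.

def simultaneous_vms(physical_cores: int, vcpus_per_vm: int) -> int:
    """How many VMs can be executing at the same instant."""
    return physical_cores // vcpus_per_vm

def vcpu_budget(physical_cores: int, ratio: float = 5.0) -> int:
    """Total vCPUs to allow on a host at a given vCPU:pCPU ratio."""
    return int(physical_cores * ratio)

# CAD hosts: 2 x E5-2643 v4 = 12 physical cores
print(simultaneous_vms(12, 8))  # 1 - explains the high ready times
print(simultaneous_vms(12, 4))  # 3
print(simultaneous_vms(12, 2))  # 6
print(vcpu_budget(12))          # 60

# Standard hosts: 2 x E5-2690 v4 = 28 physical cores
print(simultaneous_vms(28, 4))  # 7
print(simultaneous_vms(28, 2))  # 14
print(vcpu_budget(28))          # 140
```

The 5:1 ratio is the target quoted in the post above, not a universal rule; the right ratio for a given environment depends on how busy the desktops actually are.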
I would recommend ordering a copy of the VMware vSphere 6.5 Host Resources Deep Dive and reading through it.
Thanks for all the help and replies guys, much appreciated.
I have dropped our CAD base image down to 4 vCPUs for now to see if this improves things without a negative effect on the users; if that doesn't fully resolve it, I will drop them down a little more.
So, we have been testing this for a week now. At first it all seemed very smooth, but now the screen lag is back again and is really bad today. We have 1Gb lines between our datacenter and our site offices, and they are using about 100Mb of the 1Gb with little to no spikes; if they do spike, it's barely over 200-300Mb. We ran a transfer of over 1TB yesterday and maxed the line out, and this didn't seem to affect the VMs in any way either.
What else can I look at to get this extremely frustrating problem resolved? Is this something VMware could help with if I raised a support ticket with them?
I'd open a ticket, but what do the ready numbers look like now? They were higher before. Do you see this issue from Windows-based clients, or just the zero clients? I didn't notice it in any of the comments: have you tried updating the firmware on the zero clients?
The ready values are between 2 and 4 at the moment.
We don't really have many Windows clients; they are almost all zero clients. My zero client has the latest firmware available and still has the lag issues.