I am hoping to get some feedback and/or advice to help make some VDI performance-related decisions. Below is some information about our current setup:
- ESXi hosts: 2-socket Intel, 12 cores/socket, HT enabled, 2.6GHz CPU / 512GB RAM
- All-flash SAN using 16Gb storage switches, etc.
- Horizon View 7 / vCenter 6.5
- Windows 10 1809/1709 virtual desktops. (running VMware tools/Horizon View desktop agent)
- Linked Clone desktop pools
Currently we are running 2 vCPU + 6GB RAM per desktop. This is under-provisioned: our CPU Load AVG is usually between .5-.6, and RAM is way under-provisioned. With all our security tooling plus Office 2016, Google Chrome, etc., performance is starting to suffer. By this I mean that, at the end user, CPU stays at 100% longer when launching browsers/apps (it comes down, but not as fast as it used to), page file access is higher, and there is just general "latency." We are currently running a 3:1 vCPU to pCPU consolidation ratio. Our storage metrics show that we are in no way suffering there (IOPS, service time, etc.).
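For what it's worth, here is the ratio arithmetic I'm using (my own back-of-envelope sketch against physical cores, not HT threads; at the 40 VMs/host density I describe below, 2 vCPU actually works out a bit above the 3:1 I've been quoting):

```python
# Back-of-envelope vCPU:pCPU consolidation ratio for these hosts.
# 2 sockets x 12 cores = 24 physical cores; HT threads deliberately
# not counted, since I size ratios against physical cores.
PHYSICAL_CORES = 2 * 12

def consolidation_ratio(vms_per_host: int, vcpus_per_vm: int) -> float:
    """Total vCPUs scheduled on the host divided by physical cores."""
    return (vms_per_host * vcpus_per_vm) / PHYSICAL_CORES

# 40 VMs x 2 vCPU -> ~3.33:1 (I round this to the 3:1 quoted above).
print(round(consolidation_ratio(40, 2), 2))  # 3.33
```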
We are recomposing this weekend to 8GB RAM (and reserving 4GB), as we have enough RAM per host to handle this easily. I expect this will help certain workloads, but I doubt it will offset all the additional security tooling that has been added to the environment. I have been able to get Endpoint Security configured per the vendor's best practices for scanning and exclusions, but it is still taking a toll. I am currently planning on testing 3 different virtual machine configurations, but I am wondering what others may be running, or if anyone has additional feedback on settings I can try adjusting for better overall performance.
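One side effect of the reservation worth noting: ESXi sizes each VM's .vswp file at configured memory minus the reservation, and the reservations add up against host RAM. Rough capacity math (my own sketch, assuming the 40 VMs/host density from my tests below):

```python
# Rough memory capacity math for the 8GB / 4GB-reservation config.
HOST_RAM_GB = 512
VMS_PER_HOST = 40          # density assumed from my test configs below
CONFIGURED_GB = 8
RESERVED_GB = 4

# ESXi sizes the per-VM .vswp file at configured memory minus reservation.
vswp_per_vm_gb = CONFIGURED_GB - RESERVED_GB          # 4 GB swap per VM
total_reserved_gb = VMS_PER_HOST * RESERVED_GB        # 160 GB hard-reserved
total_configured_gb = VMS_PER_HOST * CONFIGURED_GB    # 320 GB if fully consumed

# Plenty of headroom against 512 GB/host either way.
print(vswp_per_vm_gb, total_reserved_gb, total_configured_gb)
```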
Test 1: (Running on 1 host separate cluster/storage LUN for testing)
4 vCPUs (2 sockets / 2 cores) + 8GB RAM (4GB reservation)
- This configuration is showing "spikes" of high CPU RDY and CO-STOP. By spikes I mean that with ESXTOP refreshing at 2-second intervals, I will see a spike for one interval that then "recovers" on the next refresh. The CPU Load AVG is usually around .90-1.09. I swapped the LSI SCSI adapter for a PVSCSI adapter to see if I could save some CPU cycles. This setup takes my vCPU:pCPU ratio to 6:1 at 40 VMs per host. I also see pretty bad NUMA numbers (8-14 of the 40 VMs will be below 70% on the N%L metric, with some of those down below 20%). I cannot go below 40 VMs per host or I will not have enough hardware to support my environment.
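To make the spikes easier to eyeball, I've been normalizing %RDY per vCPU, since esxtop reports it summed across a VM's vCPUs. A quick helper I use against values exported from esxtop batch mode (the ~5% ready per vCPU and ~3% co-stop thresholds are the commonly cited rules of thumb, not hard VMware limits):

```python
def flag_vm(name: str, rdy_pct: float, cstp_pct: float, vcpus: int):
    """Apply rule-of-thumb thresholds to esxtop CPU counters for one VM.

    esxtop reports %RDY summed across all of a VM's vCPUs, so divide by
    vCPU count first. Thresholds (~5% ready/vCPU, ~3% co-stop) are common
    rules of thumb, not official limits."""
    rdy_per_vcpu = rdy_pct / vcpus
    issues = []
    if rdy_per_vcpu > 5.0:
        issues.append(f"high ready ({rdy_per_vcpu:.1f}%/vCPU)")
    if cstp_pct > 3.0:
        issues.append(f"high co-stop ({cstp_pct:.1f}%)")
    return (name, issues)

# Hypothetical 4-vCPU desktop showing one of the spikes I described:
print(flag_vm("win10-desktop-07", 28.0, 4.5, vcpus=4))
```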
Test 2: (going to start next week)
3 vCPU (1 socket / 3 cores) + 8GB RAM (4GB reservation)
- Has anyone run 3 vCPU with Win10 1709/1809? Any thoughts on this configuration? It would get me to, I hope, at worst a 5:1 vCPU to pCPU ratio at 40 VMs per host. I am considering a PVSCSI adapter for this as well, just to see if I can save any CPU cycles at all. Thoughts?
Test 3:
4 vCPUs (1 socket / 4 cores) + 8GB RAM (4GB reservation)
- I know that cores per socket, etc., doesn't make much difference anymore. This will use the 6:1 vCPU to pCPU ratio again, as I need at least 40 VMs/host. Undecided on the SCSI adapter type. Honestly, I don't expect this one to show any better results than Test 1. Thoughts?
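Putting the three candidate sizes side by side (my own quick sketch; the NUMA check just asks whether the vCPU count fits inside one 12-core socket/node):

```python
# Compare the candidate desktop sizes at my required density.
CORES_PER_SOCKET = 12          # one NUMA node per socket on these hosts
PHYSICAL_CORES = 2 * CORES_PER_SOCKET
VMS_PER_HOST = 40              # minimum density I have to support

for vcpus in (2, 3, 4):
    ratio = VMS_PER_HOST * vcpus / PHYSICAL_CORES
    fits_one_node = vcpus <= CORES_PER_SOCKET
    print(f"{vcpus} vCPU: ratio {ratio:.2f}:1, fits one NUMA node: {fits_one_node}")
```

Note that even a 4-vCPU VM fits comfortably inside one 12-core node, so the poor N%L I'm seeing at 4 vCPU looks like contention-driven scheduling/memory locality rather than VMs being wider than a node.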
Bottom line: in my simple mind, 2 vCPU + 6GB RAM is under-provisioned, and based on ESXTOP reviews, 4 vCPU over-subscribes our CPU resources and impacts NUMA. Customers report that 4 vCPU does feel more responsive, and Stratusphere shows that CPU queues, logon times, and overall CPU performance are better on average. But ESXTOP shows 4 vCPU producing spikes of high RDY and CO-STOP at fairly regular intervals, so I am concerned this will start showing up on the end-user side sooner rather than later. This is why I am thinking 3 vCPU may provide close to the same performance gain as 4 vCPU with a slightly lower impact on the back-end resources. Still, it is hard to commit to 3 vCPU when my binary mind wants to stick with 1-2-4-8, etc.
Any advice, feedback, or any crazy ideas/thoughts that I can try?
We disabled View Storage Accelerator (CBRC) when we switched over to the flash-based SAN. I am thinking about turning it back on to see if it will help reduce host CPU utilization during customer logon. I don't like the idea of slowing down Refresh/Recompose operations, but the flash SAN should make that more tolerable. Does anyone have metrics on how much CPU utilization is reduced by using VSA? I am not worried about logon times, as they are well within our expected baseline; it just looks like the majority of the CPU stress occurs during customer logon.