Extended stun times when snapshotting SSD-stored VMs
We use VMware vSphere Essentials Plus and have 3 hosts, all running ESXi 6.5 (build 5310538) deployed from the HP custom ISO.
Shared storage is a DotHill AssuredSAN 4824 with dual controllers, each with 4 x 1 Gb/s Ethernet ports for iSCSI. Each host has 4 x 1 Gb/s Ethernet ports dedicated to iSCSI.
The SAN contains a RAID 6 array of 11 x 10k RPM disks and a RAID 1 array of 800 GB SSDs.
Round-robin multipathing is set up on each host, and performance testing produces the expected results, with the SSD array significantly outperforming the HDD array.
While investigating a SQL Server VM that becomes unresponsive during Veeam backups, I narrowed the problem down to the moments when the snapshot is created and consolidated. During these periods the VM is unresponsive for 20-40 seconds, which causes an application server on another VM to raise errors and sometimes fail.
After further investigation I have found that it has nothing to do with SQL Server running on the VM or with VSS quiescing times; it comes down to whether the VM is stored on the SSD-based datastore or on an HDD-based one.
When the VM is stored on an HDD-based datastore, snapshot stun times are about 10x shorter. The issue can't be related to I/O performance, since the SSD array clearly outperforms the HDD array in testing.
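To compare the SSD and HDD datastores with numbers rather than by observation, the stun durations can be pulled out of each VM's vmware.log after a snapshot create and consolidate cycle. Below is a minimal Python sketch. It assumes that each stun is logged as a line containing `Checkpoint_Unstun: vm stopped for <N> us` (microseconds); verify that this matches the lines in your own logs before relying on it. The script name `stun_times.py` is hypothetical.

```python
import re
import sys

# Assumed vmware.log format: "... Checkpoint_Unstun: vm stopped for <N> us"
# where <N> is the stun duration in microseconds. Check this against real logs.
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_times_seconds(lines):
    """Return every stun duration found in the given log lines, in seconds."""
    times = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            times.append(int(match.group(1)) / 1_000_000)
    return times


def summarize(lines):
    """Count the stuns and report the longest and total stun time in seconds."""
    times = stun_times_seconds(lines)
    if not times:
        return {"count": 0, "max_s": 0.0, "total_s": 0.0}
    return {"count": len(times), "max_s": max(times), "total_s": sum(times)}


if __name__ == "__main__":
    # Hypothetical usage: python stun_times.py /path/to/vmware.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            print(summarize(log))
```

Running it against the log of the same VM after one backup on each datastore gives a like-for-like figure for the longest stun (`max_s`), which is what the application server on the other VM actually experiences.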