VMware Cloud Community
SimnetMSi
Contributor

Performance Issues

Hi,

We have been experiencing a strange issue for a few weeks. We run ESXi 5.5 Essentials on two servers connected via SAS to an HP MSA HDD array.

For the past few weeks, every hour, exactly at xx:00 (01:00, 02:00, ...), everything starts to go wrong for five to ten minutes:

  • CPU usage on both servers rises by 50%
  • Storage latency rises to 500 ms

but

  • RAM usage doesn't change
  • Network traffic stays the same

The environment is monitored by Veeam ONE.

Any ideas?

1 Reply
iiToby
Enthusiast

Hi SimnetMSi,

Because of the round times, xx:00 (01:00, 02:00), I would suspect some kind of scheduled task is running. It may be inside the VMs or outside them. I would suggest looking inside the VMs first, since you would probably already have noticed anything running outside them in the console.

Internal

  • Backups (if it only lasts 5-10 minutes, it could be an incremental backup)
  • SQL maintenance plans (maybe these are set to dump the DBs to flat files at these hours)
  • Virus scanning (if servers are set to run a scheduled virus scan, this would explain the high I/O and CPU usage with no change in memory)
  • Defrag (are some of the servers Windows machines that were P2V'd, or has an administrator configured a scheduled defrag?)

External

  • Storage system deduplication (if your storage system has a deduplication process that runs on the volumes late at night, it can also account for high latency)
  • Backups and snapshots (I have seen some poorly programmed backup tools cause this type of problem when they release their snapshots)

These are a few of the more common things to look out for. Because you have Veeam ONE, I suggest you dig deeper using its metrics and tools.

  • Find VMs with high I/O usage and latency around that time and investigate further.
  • Check the latency metrics for your datastores to find out whether all of them are affected or only some.
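
If you can export the latency samples per VM or per datastore, a small script can rank them by how strongly they spike at the top of the hour. This is only a rough sketch in Python: the input format (timestamp and latency in ms per sample), the function names, and the demo values are all my assumptions and made up for illustration, not something Veeam ONE produces in this shape.

```python
from datetime import datetime
from statistics import mean


def top_of_hour_ratio(samples, window_minutes=10):
    """Mean latency in the first `window_minutes` of each hour divided by
    the mean latency during the rest of the hour.

    `samples` is a list of (datetime, latency_ms) tuples (assumed format).
    Returns None when either side has no samples or the baseline is zero.
    """
    inside = [lat for ts, lat in samples if ts.minute < window_minutes]
    outside = [lat for ts, lat in samples if ts.minute >= window_minutes]
    if not inside or not outside or mean(outside) == 0:
        return None
    return mean(inside) / mean(outside)


def rank_by_hourly_spike(samples_by_name, window_minutes=10):
    """Sort VM or datastore names by their top-of-hour ratio, worst first."""
    ranked = []
    for name, samples in samples_by_name.items():
        ratio = top_of_hour_ratio(samples, window_minutes)
        if ratio is not None:
            ranked.append((name, ratio))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up demo data: "vm-a" spikes at xx:00, "vm-b" stays flat.
demo = {
    "vm-a": [(datetime(2024, 1, 1, 1, 0), 500.0),
             (datetime(2024, 1, 1, 1, 30), 10.0)],
    "vm-b": [(datetime(2024, 1, 1, 1, 5), 12.0),
             (datetime(2024, 1, 1, 1, 45), 12.0)],
}
ranked = rank_by_hourly_spike(demo)
```

A ratio close to 1 means the VM or datastore behaves the same all hour. A name with a much higher ratio than the others is a good place to start looking for the scheduled task.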

If you do find that it is a process that needs to run, you can set up Storage I/O Control (SIOC), assuming your vSphere license includes it. This will allow the process to run while mitigating the cascading effects on the other VMs in your environment.

Have fun

@iiToby