VMware Cloud Community
PK3030
Contributor

ESXi 5.5 raw disk performance

A few months ago, our vSphere Data Protection appliance backups started failing, and I discovered that our disk performance had tanked (write speeds as slow as 2 MB/s).  This is NON-CACHED disk performance I am referring to here.  The operations impacted are backups, cold migrations, etc., which all exhaust any cache very quickly.  VM operations are all nominal (presumably because caching is doing its job).

We run three vSphere/ESXi 5.5 hosts connected by two redundant 10 Gb/s switches.  Management is via a vCenter Server Appliance.  Our primary storage is Virtual SAN, but we have local VMFS datastores on each host and a FreeNAS box that all exhibit the same slowness.  The condition presents from within VMs right down to the ESXi CLI, so it feels like some ESXi parameter has changed, but we have done no updates to the environment this year.  I found Disk.BandwidthCap and Disk.ThroughputCap, but these are all still set at their defaults (2^32).  Likewise, all the VMkernel adapters are running with traffic shaping disabled, as they have been since we brought the environment up in 2015.  Just thought I would throw this out there before I open a ticket with VMware support...
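For reference, here is how I checked those two caps from the ESXi shell (a sketch of the esxcli syntax as of 5.5; run on the host itself):

```shell
# List the two throttling settings mentioned above (ESXi shell only).
esxcli system settings advanced list -o /Disk/BandwidthCap
esxcli system settings advanced list -o /Disk/ThroughputCap
```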

Reply
0 Kudos
2 Replies
RParker
Immortal

This is NON-CACHED disk performance I am referring to here.  Operations impacted are backups, cold migrations, etc which all exhaust any cache you have very quickly.  VM operations are all nominal (presumably because caching is doing its job).

They all share a commonality: the network or switch.  If you have tried different storage, hosts, and VMs and they all show the problem, then the problem is likely your network.

Or the storage itself: set up a new NFS export on that storage (or an iSCSI target) and, from a PHYSICAL machine not part of VMware, try to copy files to it.

Do you get the same slow performance?  If yes, that's your problem; if not, the network or switch configuration is the culprit.  I wouldn't put this on VMware just yet...
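A minimal sketch of that copy test, assuming the export is already mounted on the physical machine (the mount point is hypothetical; /tmp stands in for the NFS mount here so the sketch is self-contained):

```shell
#!/bin/sh
# Hypothetical copy test from a physical machine. On a real run, DEST would be
# the freshly mounted NFS export (e.g. /mnt/nfstest); /tmp is a stand-in here.
DEST="${DEST:-/tmp}"
SRC="$DEST/copysrc.bin"

# Create a 64 MB source file, then time the copy with a coarse wall clock.
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null
START=$(date +%s)
cp "$SRC" "$DEST/copytest.bin"
sync
END=$(date +%s)
echo "copied 64 MB in $((END - START))s"
rm -f "$SRC" "$DEST/copytest.bin"
```

If the copy is slow here too, the storage is implicated; if it is fast, look at the path between the hosts and the storage.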

VMware is a lot like networking: it's easy to blame because it's the backbone for everything.  But in my experience VMware has never really been the cause; maybe a few bugs here and there, but ultimately it is user configuration and/or some other problem outside VMware's control...

Reply
0 Kudos
PK3030
Contributor

The network was my first suspect, but I can pump a sustained 1.13 GB/s between all three vmnics on both switches.  And writing from the ESXi command line (the old dd test) directly to a local datastore exhibits the slowness with no vmnic activity whatsoever.  These two factors combined have me fairly convinced this has nothing to do with the network.  Additionally, slow I/O from ESXi to local datastores is consistent across all three hosts and began at the same time.  It is not reasonable to suspect three simultaneous, independent failures of the underlying storage.  Logical troubleshooting dictates looking for commonality, and what the hosts have in common (besides the network) is the vCenter Server Appliance.  So, although I am not ready to indict VMware, it is certainly my top suspect.
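For anyone wanting to reproduce the dd test mentioned above, here is a sketch. The datastore path is the only ESXi-specific piece; /tmp stands in so the sketch runs anywhere, and conv=fsync forces the data to disk so cache cannot mask slow storage (ESXi's busybox dd may lack conv=fsync, in which case follow the write with a plain sync):

```shell
#!/bin/sh
# The classic dd write test. On an ESXi host, set TARGET to a datastore path
# such as /vmfs/volumes/<datastore-name>; /tmp is a runnable stand-in here.
TARGET="${TARGET:-/tmp}"
SIZE_MB=128

START=$(date +%s)
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count="$SIZE_MB" conv=fsync 2>/dev/null
END=$(date +%s)
ELAPSED=$((END - START))
[ "$ELAPSED" -lt 1 ] && ELAPSED=1   # avoid divide-by-zero on very fast storage
echo "wrote ${SIZE_MB} MB at roughly $((SIZE_MB / ELAPSED)) MB/s"
rm -f "$TARGET/ddtest.bin"
```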

Reply
0 Kudos