I have a very strange issue: when DRS migrates some virtual machines to other hosts (or when I migrate them manually), the CPU IO Wait metric becomes high for those machines. I then have to migrate the machine again, and the metric drops back down. As far as I know, this metric is related to a poorly performing storage system or to a CD-ROM being attached to the VM, but all of my hosts are connected to the same storage systems. Is there any parameter that I should check?
Can anyone help me with this?
CPU IO Wait is somewhat of a misnomer. It is basically wait minus idle minus swap wait for the vCPU worlds (VMWAIT - SWPWT in esxtop terms), so it covers anything a vCPU / VMM can block on. That could be _non-guest I/O_ that has to happen, such as snapshot metadata updates, but also held resources such as a lock or mutex. Those might ultimately take longer because of underperforming storage, but they don't have to.
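To make the arithmetic concrete, here is a minimal Python sketch of that subtraction. The function name, the millisecond units, and the sample numbers are all made up for illustration; only the relationship (wait minus idle minus swap wait) comes from the description above.

```python
def cpu_io_wait(wait_ms: float, idle_ms: float, swap_wait_ms: float) -> float:
    """Blocked time that is neither idle time nor swap wait.

    In esxtop terms this corresponds to VMWAIT - SWPWT, where
    VMWAIT is itself wait minus idle. Clamped at zero because
    sampled counters can be slightly inconsistent with each other.
    """
    vm_wait_ms = wait_ms - idle_ms          # VMWAIT
    return max(0.0, vm_wait_ms - swap_wait_ms)


# Hypothetical sample: 1000 ms waiting, of which 700 ms idle and 50 ms swap wait.
print(cpu_io_wait(1000, 700, 50))
```

The point of the sketch is that nothing in the formula refers to storage latency directly, so a high value only says the vCPU was blocked on something other than idling or swap.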
Sadly, it is pretty hard to identify exactly what the vCPUs are blocking on, and why, without detailed debug logs (e.g. stats vmx, schedtraces, custom vprobes), so it probably makes more sense to eliminate possible causes methodically. Also, just because the hosts share the storage doesn't mean they share all of the fiber or the IO devices / HBAs.
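One possible first elimination step is to check whether the high value follows particular hosts rather than particular VMs. The Python sketch below assumes you have already exported (host, CPU IO Wait) samples for one VM from your monitoring tool. The host names, the values, and the 3x threshold are hypothetical, not recommended settings.

```python
from collections import defaultdict
from statistics import mean


def mean_io_wait_by_host(samples):
    """Average the metric per host; samples are (host_name, io_wait_ms) pairs."""
    by_host = defaultdict(list)
    for host, io_wait_ms in samples:
        by_host[host].append(io_wait_ms)
    return {host: mean(values) for host, values in by_host.items()}


def suspect_hosts(samples, factor=3.0):
    """Hosts whose average exceeds `factor` times the best host's average."""
    averages = mean_io_wait_by_host(samples)
    baseline = max(min(averages.values()), 1.0)  # avoid a zero baseline
    return sorted(h for h, avg in averages.items() if avg > factor * baseline)


# Hypothetical samples for a single VM as it moved between three hosts.
samples = [
    ("esx01", 40), ("esx01", 55),
    ("esx02", 900), ("esx02", 1100),
    ("esx03", 60),
]
print(suspect_hosts(samples))
```

If the same one or two hosts keep showing up for different VMs, that points toward something host-specific, such as the paths or HBAs on those hosts, and away from the shared storage array itself.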