VMware Cloud Community
TJGalat
Contributor

IO Wait State over 96%

I have a 6-node cluster with DRS running, and 2 of the ESX hosts are experiencing over 96% I/O wait. How can I read either "top" or "esxtop" output to determine which process(es) are using up all the CPU cycles? Attached is the current process list; any help would be appreciated.


Accepted Solutions
nick_couchman
Immortal

You shouldn't be looking for a process using up CPU cycles; you should be looking for a process or VM doing heavy disk, network, or memory writes. I/O wait is the percentage of time the CPU sits idle while I/O requests (usually disk transactions) are still outstanding, so a high value means the host is waiting on something else to complete, not that something is consuming CPU. The following process is a bit odd:

root 11645 11266 0 2008 ? 00:00:00

Were you doing something on the service console with vmkfstools? If so, it looks like it may have hung. Also, you have several copies of the vmbackup script running, along with several defunct crond processes. I'm going to guess your backup routine/script is not working properly, and this is probably at least one of the things contributing to the high I/O wait. Unfortunately, you probably have a process hung on I/O, and such processes are very difficult to get rid of: a process stuck in uninterruptible sleep will not respond even to kill -9 until its I/O completes. Since you're running a cluster, it's time to migrate your VMs to the other ESX hosts and reboot this one. Then you'll want to spend some time figuring out why your backup script never exits properly.
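The point about I/O wait is easier to see from how the number is computed. As far as I know, the Linux "top" derives its "wa" figure from the cumulative counters on the "cpu" line of /proc/stat. The sketch below is a minimal Python 3 illustration, assuming a 2.6-style /proc/stat where iowait is the fifth counter (older 2.4 kernels may not report it at all); the function names are hypothetical. It shows that iowait is idle time with I/O pending, not time any process spends running.

```python
import time
from typing import List

# Assumed 2.6-style field order on the aggregate "cpu" line of /proc/stat:
# user nice system idle iowait irq softirq steal [guest guest_nice]
IOWAIT_FIELD = 4


def parse_cpu_line(line: str) -> List[int]:
    """Return the cumulative jiffy counters from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    counters = [int(x) for x in fields[1:]]
    if len(counters) <= IOWAIT_FIELD:
        raise ValueError("this kernel does not report an iowait counter")
    return counters


def iowait_percent(before: str, after: str) -> float:
    """Percentage of elapsed CPU time spent idle while I/O was outstanding."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Total time is user..steal (first 8 fields); later guest fields are
    # already included in user/nice on kernels that report them.
    deltas = [x - y for x, y in zip(a[:8], b[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[IOWAIT_FIELD] / total


def sample_iowait(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Read the 'cpu' line twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return iowait_percent(first, second)
```

For example, between samples "cpu 100 0 50 800 50 0 0 0" and "cpu 110 0 60 830 100 0 0 0", half of the elapsed time is iowait even though user and system time together account for only a fifth of it. On ESX, a reading like this only describes the service console's own Linux kernel; esxtop is the tool for VMkernel-level statistics.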

4 Replies
vmid
Contributor

This might help you...

KB Article 1003496

VmVic
drummonds
Hot Shot

Did you read the documents on this community? Any questions on them?

Scott

More information on my blog and on Twitter: http://vpivot.com http://twitter.com/drummonds
TJGalat
Contributor

Thanks to all. There were several hung vmbk processes; I killed them and am now reworking the backup script. Also, the document links provided are outstanding. Thanks!
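To check whether any hung or leftover backup processes remain after killing them, a process listing can be scanned for the two states the accepted answer describes as hard to get rid of. This is a sketch, assuming the output of `ps -eo pid,ppid,stat,args` (procps syntax); the helper names are hypothetical, and "vmbk" is only the name mentioned above. State D is uninterruptible sleep, typically waiting on I/O, and such a process ignores even SIGKILL until the I/O returns. State Z is defunct: a zombie disappears only when its parent reaps it or exits.

```python
import subprocess
from typing import List, NamedTuple, Tuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    stat: str
    args: str


def parse_ps(output: str) -> List[Proc]:
    """Parse `ps -eo pid,ppid,stat,args` output, skipping the header line."""
    procs = []
    for line in output.splitlines()[1:]:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank or malformed line
        args = parts[3] if len(parts) == 4 else ""
        procs.append(Proc(int(parts[0]), int(parts[1]), parts[2], args))
    return procs


def stuck_processes(procs: List[Proc]) -> Tuple[List[Proc], List[Proc]]:
    """Split out processes in uninterruptible sleep (D) and zombies (Z)."""
    hung = [p for p in procs if p.stat.startswith("D")]
    defunct = [p for p in procs if p.stat.startswith("Z")]
    return hung, defunct


def matching(procs: List[Proc], needle: str) -> List[Proc]:
    """Processes whose command line contains `needle`, e.g. a backup script name."""
    return [p for p in procs if needle in p.args]


def snapshot() -> List[Proc]:
    """Run ps on the local machine and parse its output (Python 3.7+)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

On a live system, `stuck_processes(snapshot())` returns both lists, and an empty result from `matching(snapshot(), "vmbk")` suggests no copies of the backup script are still running. The ppid column helps explain zombies: defunct crond children point back at the parent that has not reaped them.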
