VMware Cloud Community
desperadeo
Contributor

ESX hosts disconnected from vCenter because of poor storage performance?

Hello there,

We recently deployed an EMC storage array to replace our old SAN devices. Since migrating all VMs to the new array, we have been running into issues every Friday night, when high-I/O jobs such as full backups and antivirus scans run.

Some ESX hosts get disconnected from vCenter Server because an ESX host is holding a lock on a LUN.

I can use vCLI commands to identify the lock and release it. The ESX hosts are part of a VMware cluster configured with DRS (fully automated) and HA.

After reviewing the SAN performance data, I noticed that the disks are overloaded (up to 400 IOPS on 10K SAS disks) and, unsurprisingly, the response time is very high.
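
As a rough sanity check on that figure, here is a small back-of-the-envelope sketch. It assumes the 400 IOPS is a per-disk value, that the load is mostly random I/O, and that the average seek time is about 4.5 ms (a typical value I am assuming for a 10K SAS drive, not something measured on our array). It ignores array cache and command queueing, so treat the result as an order of magnitude only.

```python
# Rough estimate of the random IOPS a single spinning disk can sustain.
# The seek time is an assumed typical value, not a measurement from the array.

def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm

def max_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate ceiling on random IOPS for one disk (no cache, no queueing)."""
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms

ASSUMED_SEEK_MS = 4.5   # assumption: typical average seek for a 10K SAS drive
OBSERVED_IOPS = 400     # peak figure from the SAN monitoring, assumed per disk

sustainable = max_random_iops(10_000, ASSUMED_SEEK_MS)
overload = OBSERVED_IOPS / sustainable
print(f"sustainable ~{sustainable:.0f} IOPS, observed load is {overload:.1f}x that")
```

Under these assumptions a 10K disk tops out at roughly 130 random IOPS, so a sustained 400 IOPS would be about three times what it can serve, which is consistent with the response times I am seeing.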

Last weekend I was able to identify the sequence of events leading to the incident. Because of the load from the weekly jobs, fully automated DRS triggered vMotion operations for some VMs. Unfortunately, one of these operations could not complete, and the LUN hosting that VM remained locked. As a consequence, some ESX hosts could no longer access this LUN and, after a while, were disconnected from vCenter Server.

We currently run ESX 4.1. This version does not handle the loss of storage paths (APD) very well and relies heavily on SCSI-2 reservations for VMFS operations, so I can understand these symptoms.

But my question is: can degraded storage performance easily lead to a locked LUN during an operation such as a vMotion?

Thank you for your help.
