VMware Cloud Community
MariusRoma
Expert

HA and vmhba failure

Imagine a vSphere 5.x Enterprise infrastructure based on 2 ESXi nodes.

Both nodes boot from local disks and are connected via iSCSI to the same LUNs; HA is enabled.

What are the possible effects if all the vmhbas on an ESXi host fail while VMs are running on that host?

I presume that ESXi will go on working on both nodes because ESXi is installed on a local disk and that both nodes will be able to communicate via LAN using other network interfaces.

Can I presume that HA will move the VMs from the host that loses the link to its LUNs to the other host? Is it a standard HA feature?

If one ESXi node fails HA moves VMs to other nodes, but what if an ESXi node simply loses the connection to the LUNs where the VMs reside?

To keep the VMs running, should I implement any additional component and/or feature?

I would like to test this in a lab, but it’s quite difficult to reproduce such a problem…

Regards

marius

2 Replies
tomtom901
Commander

There are some options you can use for this, depending on your vSphere version. Starting with vSphere 5.0 U1, changes were made to how a PDL (Permanent Device Loss) is handled in an APD (All Paths Down) situation, which is basically what happens when a vmhba fails. Without quoting the entire HA whitepaper, pages 9 and 10 of the following PDF should give you a better understanding of how this works.

http://www.vmware.com/files/pdf/techpaper/vmw-vsphere-high-availability.pdf

Reference documentation:

http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc%2FGUID-AA39FBEF-...

Gortee
Hot Shot

In addition to the PDL situations, keep in mind that we are talking about iSCSI: if you don't have other NICs in play, you may have an HA failure. Since both storage and network will be down, your host isolation response will come into play (the default in 5.1 is "Leave powered on"). For iSCSI you really want a power-off response.

What are the possible effects if all the vmhbas on an ESXi host fail while VMs are running on that host?

->The VMs will be in a failed state. If you still have networking, PDL handling will take place. If you don't have networking, the HA host isolation response will take place.

I presume that ESXi will go on working on both nodes because ESXi is installed on a local disk and that both nodes will be able to communicate via LAN using other network interfaces.

->The ESXi operating system will continue to run, which is good in both situations.

Can I presume that HA will move the VMs from the host that loses the link to its LUNs to the other host?

->It depends on whether you have networking. If your networking is still up and HA is configured, a restart will be attempted. If you don't have networking, the isolation response applies again.

Is it a standard HA feature?

->Yes, but it depends on the networking and PDL settings described above.

If one ESXi node fails HA moves VMs to other nodes, but what if an ESXi node simply loses the connection to the LUNs where the VMs reside?

->Look at the PDL document linked above.

To keep the VMs running, should I implement any additional component and/or feature?

->It depends. Keeping VMs working without storage is impossible. If you want VMs to fail over to another node when storage is gone, look into the PDL document; it will tell you what to enable.

I would like to test this in a lab, but it’s quite difficult to reproduce such a problem…

->The best way to reproduce this would be to run a nested ESXi environment where you can shut down NICs as needed.
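
For a lab run, the cases described above can be written down as a small table of expected outcomes to check against. The Python sketch below only illustrates the answers in this thread and is not a VMware API: the names `Outcome` and `expected_outcome` are made up here, "networking" is assumed to mean the network HA uses between the hosts, and the sketch ignores timers, datastore heartbeats and the PDL settings that have to be enabled first.

```python
from enum import Enum
from itertools import product


class Outcome(Enum):
    """Hypothetical labels for the behaviours described in this thread."""
    NO_ACTION = "storage reachable, VMs keep running"
    PDL_RESTART = "PDL handling applies; HA tries to restart the VMs elsewhere"
    ISOLATION_RESPONSE = "host is isolated; the configured isolation response applies"


def expected_outcome(storage_paths_up: bool, network_up: bool) -> Outcome:
    """Map a failure case to the behaviour described in the replies.

    Assumption: losing the network always means host isolation,
    whether or not the storage paths are still up.
    """
    if not network_up:
        return Outcome.ISOLATION_RESPONSE
    if not storage_paths_up:
        return Outcome.PDL_RESTART
    return Outcome.NO_ACTION


if __name__ == "__main__":
    # Print every combination as a checklist for the nested-ESXi lab.
    for storage_up, net_up in product([True, False], repeat=2):
        outcome = expected_outcome(storage_up, net_up)
        print(f"storage_up={storage_up!s:5} network_up={net_up!s:5} -> {outcome.value}")
```

With iSCSI sharing the same NICs as the HA network, a single NIC failure lands in the "both down" row, which is why the isolation response setting matters in that design.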

Joseph Griffiths http://blog.jgriffiths.org @Gortees VCDX-DCV #143