If I have 3 hosts, all connected to the same SAN, and one host's HBA fails, causing that host to lose connectivity to the SAN, will VMware automatically start bringing up the VMs on one of the other hosts?
I understand that this works when the host itself fails, but is HA clever enough to detect SAN connection failures?
Thanks
No. Monitoring SAN connectivity is not part of HA's job. I believe it's on the wish list for future updates.
I'm about to post a situation that happened in my environment last week along these lines. An FC switch panicked and rebooted itself, which created a zombie state on the LUNs connected to SP-A of our EVA8100 array and brought ESX DOWN!! HA did nothing. All of our Windows hosts were able to recover fine, but ESX was completely hosed. If you're not in production yet, test these scenarios: reset an FC switch port, reboot a switch, disconnect an array controller (if you can), etc. To answer your question though, the VMs should never go down: if ESX can find an alternate path to the LUNs, you should be fine. If you have redundant HBAs in each host, there would be no need to fail over to another host.
Excellent stuff (well, not excellent, but you know what I mean). Thanks guys!
Just one other thought: if, say, host 1 is running 3 VMs and loses its connectivity to the SAN, I would have to manually VMotion them to another host to restore their access to the SAN. However, would there still be a disk lock in place on the LUNs that host 1 was accessing?
No, this is not an HA feature. If you want VMs to keep running through an HBA failure, you will need redundant HBAs, switches, etc.
Tom Howarth
VMware Communities User Moderator
Possibly on the disk lock, but if a host loses its SAN connection you won't be able to VMotion anyway.
I thought you would still be able to manually move the guest VMs to another host, as the other host would have visibility of the LUNs that the original host could see. If you can't, isn't this potentially a major flaw, unless someone has a workaround?
Sorry, yes. If HA doesn't move them, there are no locks on the LUN, and the surviving host can see the LUN/VM, then you can VMotion to repoint the VM's host.
Same answer as everyone else... No. But I've tested it. Not intentionally.
Thanks everyone! I've now persuaded our PM to purchase 3 extra HBAs!
My issue wasn't the HBA. It was the fiber switch on the back-end. I've never seen a fiber card go bad. For me, it's always been the switch at the back end. My "Test" was a switch just dying and the backup switch not being zoned correctly. On Windows servers, I've seen bad ports on the switch or misconfigured ports on the switch but I've never seen an issue with an HBA.
Redundant HBAs on their own will not prevent a failure like the one you describe; you will also need a redundant fabric to your SAN. That means dual switches, multiple paths to each LUN in the SAN, and redundant paths from the SAN to the fabric.
Tom Howarth
VMware Communities User Moderator
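To make the point about redundant fabrics concrete, here is a minimal Python sketch that checks a host's paths to a LUN for single points of failure. It is only an illustration: the component names (`hba0`, `switchA`, `SP-A`, and so on) and the three example topologies are hypothetical, and it assumes each path is a simple chain of HBA, switch, and storage processor. It does not query ESX or the array; you would fill in the paths from your own zoning and multipathing layout.

```python
def single_points_of_failure(paths):
    """Return the components that appear on every path.

    Each path is a tuple of component names (HBA, switch, storage
    processor). If a component is on all paths, losing it cuts the
    host off from the LUN entirely.
    """
    if not paths:
        return set()
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return common


# Hypothetical layouts for one host (names are made up for illustration).

# One HBA cabled to two switches: the HBA is shared by every path.
single_hba = [
    ("hba0", "switchA", "SP-A"),
    ("hba0", "switchB", "SP-B"),
]

# Two HBAs, but both go through the same switch.
dual_hba_single_switch = [
    ("hba0", "switchA", "SP-A"),
    ("hba1", "switchA", "SP-B"),
]

# Two HBAs, two switches, two storage processors.
fully_redundant = [
    ("hba0", "switchA", "SP-A"),
    ("hba1", "switchB", "SP-B"),
]

print(sorted(single_points_of_failure(single_hba)))              # ['hba0']
print(sorted(single_points_of_failure(dual_hba_single_switch)))  # ['switchA']
print(sorted(single_points_of_failure(fully_redundant)))         # []
```

The second layout is the case described above: extra HBAs alone still leave the switch as the one component whose failure takes out every path.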
You can read more details on SAN architecture in the design guide here to give you more insight. Generally speaking, people nowadays architect their SAN using redundant Fibre Channel switches with multipath HBA solutions, so both ends should be covered as well. At a minimum, each ESX host should have 2 HBAs connected to 2 different paths to the SAN for redundancy.
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Regards,
Stefan Nguyen
iGeek Systems Inc.
VMware, Citrix, Microsoft Consultant
Thanks - I realise you need redundant switches and multiple paths to the SAN. This is a project I've only just come on board with. The previous design was set up like that, but whoever did it only allocated one HBA to each host. I just needed to verify that HA wouldn't detect a failure of the connection to the SAN if the single HBA failed.
Cheers
