Re: FC Failover and reset

Josh26 · ‎01-12-2011

Hi,

We have been testing a new environment. It is running ESXi 4.1 in an HP c7000 blade enclosure with FC interconnects. The multipath policy has been configured as Round Robin.

During our testing, we hot-pulled one FC switch. Failover worked as expected: there was no downtime, and the logs only reported the paths on one vmhba being lost.

We found, however, that when we plug the switch back in, the additional paths do not come back. Please note that rescanning the adapters was the first thing we tried, and it does not correct the issue.
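To pin down exactly which paths fail to return, it can help to compare the path states recorded before the switch was pulled with those seen after it was reconnected and rescanned. The following is only a minimal Python sketch under stated assumptions: the two snapshots are hypothetical hand-entered dictionaries (for example, transcribed from `esxcfg-mpath -l` output), the path names and states are illustrative and not taken from this environment, and the function names `adapter_of` and `paths_not_restored` are invented for this example.

```python
from collections import defaultdict


def adapter_of(path_name):
    # Runtime path names look like "vmhba1:C0:T0:L1"; the adapter is the first field.
    return path_name.split(":", 1)[0]


def paths_not_restored(before, after):
    """Group, by adapter, the paths that were active before but are not active after."""
    missing = defaultdict(list)
    for path, state in sorted(before.items()):
        if state == "active" and after.get(path) != "active":
            missing[adapter_of(path)].append(path)
    return dict(missing)


# Hypothetical snapshot taken before the switch was pulled.
before = {
    "vmhba1:C0:T0:L0": "active",
    "vmhba1:C0:T1:L0": "active",
    "vmhba2:C0:T0:L0": "active",
    "vmhba2:C0:T1:L0": "active",
}

# Hypothetical snapshot taken after the switch was reconnected and a rescan was run.
after = {
    "vmhba1:C0:T0:L0": "active",
    "vmhba1:C0:T1:L0": "active",
    "vmhba2:C0:T0:L0": "dead",
    "vmhba2:C0:T1:L0": "dead",
}

for adapter, paths in paths_not_restored(before, after).items():
    print(adapter, "has", len(paths), "path(s) not restored:", ", ".join(paths))
```

If every missing path sits behind the same vmhba, as in this made-up data, that is consistent with the HBA on that side never logging back in to the fabric rather than with a problem on individual LUNs.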

The FC switch reports all internal ESXi-connected ports as "connected" but does not detect the WWN of any attached HBA. After rebooting an individual host, the switch detects that host's HBA, and VMware reports the path as alive again on that host. The SAN's HBA reports as attached immediately after the switch is turned on, hence we feel the issue is with VMware.

It would probably be acceptable to reboot each host if a failback (not an accurate word really for an active/active environment) is required after a complete switch failure, but ideally we would avoid it.

I did find a suggestion to unload the kernel module for the HBA and reload it, but this requires enabling Technical Support Mode, and when I tried it, the unload failed with "symbols in use".

mpverr · ‎01-12-2011

We have the same configuration:

4.1i; c7000; bl680g5

However, we found that the paths do show back up. Have you ensured that the BIOS of the blades, the VC modules, and the OA are all up to date from HP?

Josh26 · ‎01-12-2011

Hi,

All blades were updated to the latest firmware using Smart Update DVD 9.2, and the B-Series SAN switch is running v6.2.3b. I then checked for specific updates that postdated this DVD, but the only relevant one I found was for the iLO. VC is up to date, but we are not using it for the fibre connections. We can confirm that when pulling a Flex-10 module, the network does not experience this issue when the module is reconnected.

ESXi was updated with Update Manager as of today, and we then applied the HP NMI and agents bundle. We just tested this with the second SAN switch and reproduced the issue.