VMware Cloud Community
Rohail2004
Enthusiast

dead path

Last night one of my storage paths went offline after a physical switch was rebooted. Once the switch came back online, ESXi 4.1 showed only one path available and the second path as “dead”.  I rescanned the host; the “dead” entry went away and the path showed as blank.  I had to reboot the ESXi 4.1 host to restore the dead path.

I logged into the ESXi host through SSH and it showed only two HBAs, with two missing; after the reboot it showed all four HBAs.

My question: has anyone seen this issue in ESXi 4.1? The path should have been restored after rescanning the storage adapter, but it was not.  Why do I have to reboot to restore a dead path?

ESXi 4.1, vCenter 4.1

thanks


Accepted Solutions
kjb007
Immortal

I don't really have an answer as to why the HBAs disappeared on you, but the fabric login has caused annoyances for me.  That is why a reboot is required, however.  In your case, instead of rebooting the host, you could disable/enable the switch ports themselves, which would also cause a fabric login to occur.  Rebooting is sometimes the easier option if a different team handles the SAN switches.  Otherwise, it is more convenient to bounce the switch port, as you don't incur the penalty of an ESX reboot.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

kjb007
Immortal

How many hops away is your storage from your ESX host?  Meaning, is your ESX host connected to the same switch as your storage?  If not, how many switches are between the two?

I've seen similar issues, with multiple or cascaded switches between storage and host, and when an uplink switch fails.  It causes a fault on the HBA, and until that HBA is "reset", a rescan does not result in the path re-appearing.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
Rohail2004
Enthusiast

Only 1 hop. The ESXi hosts are running in Cisco UCS, so it goes from the fabric interconnect directly into the SAN switches.

We tested the failover in the past and it worked, but this time in production it failed and I had to reboot the host. How can I avoid this in the future?

kjb007
Immortal

What jumps to my mind is flipping (disabling and re-enabling) the ESXi host's port when the switch reboots.  It's odd, but when ESXi reboots, it also performs a full fabric login, which is the same thing I've achieved with an HBA reset.  While this should not be occurring with a directly connected switch, it may be a login problem.

Can you look at the switch logs to see if the HBA failed to log back in after the switch reboot?

Can you look at the ESXi logs to see what kind of errors you received after issuing the rescan?

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
Rohail2004
Enthusiast

I am not seeing anything in the ESXi logs. They do not even show that the missing HBAs exist; only two HBAs show as active. After a reboot, all 4 showed up.

After the rescan it shows no error at all.

I have not looked at the SAN switch log yet.
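
To make the "two of four HBAs missing" check repeatable, here is a rough Python sketch that compares the adapters you expect against a saved adapter listing. It assumes each line of the listing starts with the adapter name (for example `vmhba1`); the function name, the sample listing, and the adapter names are all made up for illustration and are not real output from this host.

```python
def missing_adapters(expected, listing_text):
    """Return the expected adapter names that do not appear in the listing.

    Assumption: every adapter line begins with its name (e.g. "vmhba1"),
    followed by whitespace-separated details that are ignored here.
    """
    seen = set()
    for line in listing_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("vmhba"):
            seen.add(fields[0])
    return sorted(set(expected) - seen)


# Hypothetical listing captured after the switch came back (not real output).
sample_listing = """\
vmhba1 fc-driver link-up
vmhba2 fc-driver link-up
"""

expected = ["vmhba1", "vmhba2", "vmhba3", "vmhba4"]
print(missing_adapters(expected, sample_listing))
```

Saving one listing while the host is healthy and comparing it to one taken after a switch event would show whether the adapters really vanished or only their paths did.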

kjb007
Immortal

From the time of the failure, and before the reboot, do you see any timeouts in the ESXi logs?  Any SCSI events at all?
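
If the logs are large, a small filter can help pull out candidate lines. The Python sketch below keeps only lines that mention timeouts, SCSI, or a link going down. The keyword list is my own guess at what is worth looking for, and the sample lines are invented for illustration, not taken from a real ESXi log.

```python
import re

# Assumed keywords of interest; adjust to whatever your logs actually contain.
PATTERN = re.compile(r"timeout|timed out|scsi|link down", re.IGNORECASE)


def storage_events(lines):
    """Return the lines that match any storage-related keyword."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


# Invented sample lines, only to show the filter working.
sample = [
    "host started rescan of adapters\n",
    "command on path timed out\n",
    "SCSI event reported for device\n",
    "user logged in over ssh\n",
]

for event in storage_events(sample):
    print(event)
```

To run it against a real file, pass an open file object instead of the list, since iterating a file yields its lines. Restrict the input to the window between the switch reboot and the host reboot so unrelated events do not drown out the interesting ones.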

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
Rohail2004
Enthusiast

Nope.  I even sent the logs to VMware and they say they don't see any errors.  It only shows 2 HBAs active, and 2 of them didn't even exist after the switch came back up.  It seems like the switch had a login failure, as you stated.

Josh26
Virtuoso

I had a similar issue with HBAs seeming not to talk to a switch after it was rebooted. I posted it on the HP forums here:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1467048

I would consider enabling/disabling a switch port a much easier solution, but that server is live and I can't really test it now. I actually thought of that, but never understood how it could be different from power cycling the FC switch.

It would be very nice if there were a way to manage this fabric logon automatically from VMware. I did find I could reset the HBA by unloading and reloading the kernel module, but it pulled both HBAs offline and was more effort than rebooting.
