Hi guys, the 'crisis' is over now, but I'm trying to understand why it happened. We have a customer whose entire infrastructure is virtualized: 14 virtual servers across three HP blade servers running ESX Std/Foundation, back-ending to an MSA2012fc SAN. The VC server itself is physical, but its DB is on the virtualized SQL server.
At the weekend we needed to perform some firmware updates to the blade infrastructure to resolve a known issue. Following the firmware update to the blades, blade #1 failed completely (HP will resolve that with a new system board), but the real panic set in when we restarted the other two blades and they lost visibility of one of the LUNs on the SAN. After some research and reading, we found that we could force a resignature/re-evaluation of the SAN LUNs by changing an advanced setting, and after a bit we got everything back up and running OK (albeit across two blades, not three).
The question is: why did this happen? According to the SAN Guide, this can occur when things such as SAN snapshotting or mirroring result in a LUN being presented to ESX with the same signature as an already existing LUN. At least, that is how I read it. Given that our SAN is storage only, and we do nothing at the SAN level, why did this occur and, more importantly, how do we prevent it from occurring again?
Well, to begin with, you have to make sure that the LUNs are presented to all the ESX servers in the same order.
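As a rough way to check this, here is a minimal Python sketch. It assumes "same order" means each volume shows up under the same LUN number on every host, and it assumes you have already collected the LUN mappings by hand from each host or from the SAN management interface. The host names, volume names, and the find_lun_mismatches helper are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def find_lun_mismatches(host_luns):
    """Return {volume: {host: lun_number}} for every volume that is not
    presented under the same LUN number on all hosts (or is missing on one).

    host_luns maps a host name to a dict of {lun_number: volume_name}.
    """
    # Regroup the data per volume: volume -> {host: lun_number}
    by_volume = defaultdict(dict)
    for host, luns in host_luns.items():
        for lun_number, volume in luns.items():
            by_volume[volume][host] = lun_number

    mismatches = {}
    for volume, seen in by_volume.items():
        differs = len(set(seen.values())) > 1   # different LUN numbers
        missing = len(seen) < len(host_luns)    # not visible on every host
        if differs or missing:
            mismatches[volume] = dict(seen)
    return mismatches


# Hypothetical inventory, typed in by hand (not read from ESX or the SAN).
host_luns = {
    "esx-blade2": {0: "vol-a", 1: "vol-b"},
    "esx-blade3": {0: "vol-a", 2: "vol-b"},
}

for volume, seen in find_lun_mismatches(host_luns).items():
    print(volume, "is presented inconsistently:", seen)
```

An empty result means every volume has the same LUN number on every host in the inventory you supplied; it says nothing about signatures on the volumes themselves.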
Now, is the firmware update we are talking about here a blade firmware update, or an update of the blade center firmware itself? In either case, if you took the entire environment down, you should have disconnected the FC from the blade center.
If you were doing a blade-by-blade upgrade without taking the entire environment down, you should have removed the zoning for the blade undergoing the upgrade. This prevents the FC/HBA cards from seeing the SAN during the upgrade process (i.e. if you are upgrading the firmware of all the system components with the HP smart boot CD).
Hope this is helpful.
We had successfully updated the HP C3000 blade chassis Onboard Administrator firmware earlier in the week, to v2.41 I think. We were updating the blades by booting the HP ProLiant Firmware Maintenance CD v8.40 from an iLO-mounted ISO. And no, the blades were not disconnected from the fabric, or zoned out, before we performed the upgrade. Advice taken, and we shall certainly do that next time. I'm not sure whether that was the cause, but unless someone comes up with another explanation it certainly seems logical to me now (wish we'd thought of that when we did the update!). Many thanks for your quick response.