I have opened a ticket with support, but so far they don't see anything wrong.
Something else is wrong. It's not a VMware issue, which is why they haven't found anything. It's not a VM issue because, as you said, your server rebooted, so at that point it's hardware. I suspect you have the QLogic BIOS enabled, and the HBAs are doing something weird to your fiber switch, and that's what is causing the LUNs to go offline.
At any rate, we run 2950s, NetApp, and fiber, and have been at this for almost 4 years with almost 30 ESX hosts. We have NEVER seen this before. This is a hardware issue, or maybe it was just a fluke.
The manual ALSO states that ANY time you perform an upgrade, you ALWAYS disconnect from the SAN. So it's a hard lesson to learn, but follow ALL the instructions for ESX upgrades / updates and check the HBA for problems.
This is a hardware issue, not a result of the ESX update. It was coincidence that it happened, and a reboot of the ESX before the update probably would have done the same thing.
Thank you for your reply, but I had rebooted this server shortly before the migration, and once the weekend before while doing some unrelated maintenance. Rebooting the server under ESX 3.5 never caused this problem in the past. The BIOS is NOT enabled, and I also have a lot of experience and a very large environment (over 50 hosts since 2004!). Like I said, this is the first time this has happened to me, but so far it appears that this server is now crippled, and to get it back to normal I have to re-install ESX 3.5. I don't have an issue doing this, but I was hoping someone knew of some HBA/ESX version/Ontap or other incompatibility that I have not yet discovered. I have never disconnected from the SAN for an in-place upgrade and have not had a problem with the process before. I can see how disconnecting might prevent a failed upgrade, but I don't see how it relates to the problem I have right now.
thanks anyway,
-d
I also wanted to add that the crashing happens after ESX attempts to start, not before the OS loads.
It hangs on the following messages for over 15 minutes, and during this time it makes the storage unavailable to the other ESX hosts:
storage-drivers...
Starting Path Claiming and SCSI Device Discovery...
Eventually it boots up, but without the storage, and the rest of the hosts get disconnected.
This is why I like iSCSI. Have you checked the zones and everything else on the switch lately? When I dealt with fiber in the past, most of my issues stemmed from the switch in the middle. I would investigate there. Hope this helps.
Just an FYI, I had to upgrade Ontap to resolve this issue. Nothing else would resolve it. I upgraded from 7.2.2 to 7.2.7, and after that everything was golden. NetApp support stated that 7.2.7 is officially the earliest tested version of Ontap for use with ESX 4.0, but that some earlier versions like 7.2.6.1 or 7.2.5.1 may work as well. Indeed, I have other sites running 7.2.5.1 over FC with ESX 4 with no problems, so I suspect that 7.2.2 is just too old a version to work with ESX 4.0. Thanks for the suggestions.
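
For anyone checking several sites before an upgrade, here is a minimal Python sketch of that version comparison. The function names and the `MIN_TESTED_ONTAP` constant are made up for illustration, and the only fact it encodes is the 7.2.7 minimum that NetApp support quoted above. It is not a NetApp or VMware tool, and a version below the minimum may still work, as 7.2.5.1 did for me.

```python
# Hypothetical helper: compare an Ontap version string against the
# earliest release NetApp support described as tested with ESX 4.0.
MIN_TESTED_ONTAP = (7, 2, 7)  # taken from the post above, not from an official matrix


def parse_ontap_version(version):
    """Turn a dotted version such as '7.2.5.1' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(version, minimum=MIN_TESTED_ONTAP):
    """Return True if the version is at or above the tested minimum."""
    return parse_ontap_version(version) >= minimum


if __name__ == "__main__":
    for site_version in ("7.2.2", "7.2.5.1", "7.2.6.1", "7.2.7"):
        status = "tested" if meets_minimum(site_version) else "below tested minimum"
        print(site_version, "->", status)
```

Tuples compare element by element, so 7.2.5.1 sorts below 7.2.7 even though it has more components. Anything flagged as below the minimum is worth confirming with NetApp support before upgrading the hosts.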