Solved: Re: unavailable ESX after a rescan

emavmware · ‎05-07-2009

Hi,

We had an issue with our infrastructure after an HBA rescan. The ESX host became unresponsive, and we couldn't access the datastores.

I think I may be close to the problem. Do you know if we can connect one ESX host to different storage devices? For example, one ESX host to both an IBM DS4700 and an IBM DS4300? If so, how should we set up zoning? Does there have to be one zone for every storage device, or can we create a single zone that contains both?

Thanx

CiscoKid · ‎05-07-2009

When you say "unresponsive", are you still able to ping and/or SSH into the ESX server? If not, you may have a classic case of an IRQ conflict. I had a client with HP ProLiant BL480c blades that had the same issue. It turned out that you need to stop the USB 1.1/2.0 controller drivers from loading into the service console to remove the conflict. Ever since that change, the rescans have worked like a champ. Another bit of advice: if you are not using legacy VMFS volumes from ESX 2.x, you can also stop the VMFS2 driver from loading, which should speed up your rescans. Lastly, if there are no new VMFS volumes to scan for, try unchecking the bottom check box in the HBA rescan dialog in the VI Client. Hope that helps.
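If you want a quick way to spot shared IRQs, something along these lines might help. It is only a rough sketch: it assumes the standard Linux /proc/interrupts layout (per-CPU counters, then the controller type, then a comma-separated device list), and the sample text and device names below are made up for illustration. The interrupt listing on your ESX build may be formatted differently.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [devices]} for IRQs claimed by more than one device.

    Assumes the usual /proc/interrupts layout; this is an assumption,
    not something verified against a particular ESX release.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # header line, or named rows such as NMI/ERR
        fields = rest.split()
        # Skip the per-CPU interrupt counters.
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is the controller type; everything after it is the device list.
        devices = [d.strip() for d in " ".join(fields[i + 1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared


# Hypothetical sample, not captured from a real host.
SAMPLE = """\
           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
 16:       9876       5432   IO-APIC-level  usb-uhci, qla2300
 17:       1111       2222   IO-APIC-level  eth0
NMI:          0          0
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, devices)
```

An IRQ that lists both a USB controller and your HBA driver would be the kind of sharing to look into.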

depping · ‎05-07-2009

It's supported to have multiple arrays attached to an ESX host. To make your life easier, though, I would recommend using unique LUN IDs. It's not a requirement when your arrays use NAA identifiers, but from a management perspective it's definitely a smart thing to do.

Anyway, it shouldn't be causing the issues you are seeing. Take a look at the following KB though:

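In terms of zoning, use single-initiator / single-target zoning, meaning that every HBA is in a zone with one specific storage processor, so each array's storage processors end up in their own zones rather than in one zone containing both. It's a lot of work, but it will reduce the amount of overhead on your FC fabric.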
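To make the zoning idea concrete, here is a minimal sketch that lists the zones you would end up with on one fabric. The host names, port names, and WWPNs are hypothetical placeholders, and the zone naming scheme is just an assumption; it only enumerates pairs and does not talk to any switch.

```python
from itertools import product


def single_initiator_single_target_zones(hba_wwpns, target_wwpns):
    """Return one zone per (HBA port, storage processor port) pair.

    Both arguments map a friendly name to a WWPN. In a dual-fabric
    setup you would call this once per fabric, passing only the ports
    that are actually cabled to that fabric.
    """
    zones = {}
    for (hba_name, hba_wwpn), (tgt_name, tgt_wwpn) in product(
        hba_wwpns.items(), target_wwpns.items()
    ):
        zones[f"z_{hba_name}_{tgt_name}"] = [hba_wwpn, tgt_wwpn]
    return zones


# Hypothetical names and WWPNs for one fabric.
hbas = {"esx01_hba1": "21:00:00:e0:8b:00:00:01"}
targets = {
    "ds4700_ctrlA": "20:04:00:a0:b8:00:00:01",
    "ds4300_ctrlA": "20:02:00:a0:b8:00:00:02",
}

if __name__ == "__main__":
    for name, members in single_initiator_single_target_zones(hbas, targets).items():
        print(name, members)
```

With one HBA port and one controller port per array on the fabric, that gives two zones, each with exactly two members.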

Duncan

VMware Communities User Moderator

Blogging: http://www.yellow-bricks.com

If you find this information useful, please award points for "correct" or "helpful".

CiscoKid · ‎05-07-2009

You are probably using ESX 3.5. Here is an updated link to a more current way to identify and resolve IRQ conflicts:

Please award points if solution is accurate. Thanks.

emavmware · ‎05-08-2009

CiscoKid, I think you hit it.

What the link you sent describes is almost exactly what I'm seeing on my ESX host.

Thanks so much. I'm going to make all the suggested corrections.

CiscoKid · ‎05-08-2009

I am glad to have delivered accurate information. The thing with HP is that there is a feature called Automatic Server Recovery (ASR), and it is enabled in the BIOS by default. What happened was that a rescan would lock the server up, and after 10 minutes of unresponsiveness ASR would kick in and reboot the ESX host. We looked at other options, such as disabling USB 1.1/2.0 in the BIOS, but then you couldn't use iLO effectively. So we just ended up removing the drivers from modules.conf. The only downside is that you can't attach virtual CD-ROMs/floppies to the console. It was a trade-off for a more stable system that the client was willing to accept. Keep us posted on the resolution. Thanks.
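For what it's worth, the modules.conf change can be sketched roughly like this. It assumes the USB drivers are loaded through lines beginning with "alias usb-controller"; that prefix and the sample contents are assumptions for illustration, so check your own file first. The function works on text only and comments the lines out rather than deleting them, so the change is easy to back out.

```python
def comment_out_usb_aliases(modules_conf_text, prefix="alias usb-controller"):
    """Return the config text with USB controller alias lines commented out.

    The default prefix is an assumption about how the USB drivers are
    declared; adjust it to match the lines in your own modules.conf.
    """
    out = []
    for line in modules_conf_text.splitlines():
        if line.strip().startswith(prefix):
            out.append("# " + line)  # keep the original line for easy rollback
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file contents, not copied from a real host.
SAMPLE_CONF = """\
alias eth0 e1000
alias usb-controller usb-uhci
alias usb-controller1 ehci-hcd
"""

if __name__ == "__main__":
    print(comment_out_usb_aliases(SAMPLE_CONF), end="")
```

Keep a backup copy of the original file before writing anything back; the change would only take effect once the drivers are no longer loaded.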
