VMware Cloud Community
gittro
Contributor

Boot-from-SAN problem after changing HBA and WWN

Hello:

I've got an IBM BladeCenter H and IBM HS21XM blades linked via SAN (Brocade) to a Sun 6140 SAN.

In order to increase the number of network cards available per blade, I changed the cards inside my HS21XM blade that was acting as a VMware ESX V3.5 host. It's Boot-from-SAN since that blade has no local hard disk. It all worked beautifully until I tried to reboot with the newly configured cards. I followed this process:

1) Modified Brocade switch I/O ports in BladeCenter to accommodate more NIC modules. This part worked fine.

2) Removed old HBA in blade, and installed new NIC module, and new dual NIC/HBA card. This gave the blade a new WWN and HBA model.

3) Updated FC switch configs so that the SAN could "see" the new WWN.

4) Created new initiators on the SAN and linked them to the boot-from-SAN partition on the Sun 6140.

No problem there - the blade could "see" the SAN boot partition, etc....everything could "see" everything just fine via the Ctrl-Q (QLogic HBA config) interface on the blade.
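
Steps 3 and 4 amount to making the new WWNs known at two layers: the fabric zoning and the initiator mapping on the array. A minimal Python sketch of that cross-check follows. It is only an illustration: the WWN values are made-up placeholders, and `normalize` and `missing_access` are hypothetical helpers, not part of any Brocade, Sun or VMware tool.

```python
def normalize(wwn: str) -> str:
    """Strip separators and lowercase so differently formatted WWNs compare equal."""
    return wwn.replace(":", "").replace("-", "").lower()


def missing_access(new_wwns, zoned_wwns, array_initiators):
    """Return, for each new WWN, the layers that do not list it yet."""
    zoned = {normalize(w) for w in zoned_wwns}
    mapped = {normalize(w) for w in array_initiators}
    gaps = {}
    for wwn in new_wwns:
        key = normalize(wwn)
        missing = []
        if key not in zoned:
            missing.append("fabric zone")
        if key not in mapped:
            missing.append("array initiator mapping")
        if missing:
            gaps[wwn] = missing
    return gaps


# Placeholder values only; real WWNs come from the HBA BIOS or the switch.
new_hba = ["21:00:00:00:00:00:00:0A", "21:00:00:00:00:00:00:0B"]
zone = ["21:00:00:00:00:00:00:0a", "21:00:00:00:00:00:00:0b"]
initiators = ["210000000000000a"]  # second port not mapped on the array yet

print(missing_access(new_hba, zone, initiators))
```

An empty result would only say that both layers list the new WWNs, which matches what the Ctrl-Q screen showed. It says nothing about why the ESX boot itself fails.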

I then tried to boot the blade so that it would see the Boot-from-SAN partition and load the VMware ESX V3.5 kernel. This part failed - it didn't think the partition was right.

NEW INFO: I've confirmed through the ESX V3.5 I/O Compatibility Guide that the driver should NOT change due to the hardware changes I made.

I'm going FROM:

QMI2472 QLogic 4Gb FC Dual Port BladeCenter Expansion Card: Driver: qla2300_707_vmw Version: 7.08-vm32

TO:

QMI3472 QLogic 4Gb FC Dual Port BladeCenter Expansion Card: Driver: qla2300_707_vmw Version: 7.08-vm32

Any ideas? I'm hoping I'm just missing a step somewhere, since if Boot-from-SAN becomes a problem as soon as a piece of hardware changes, that's not going to be a good thing (e.g. if you wanted to fail-over to a different blade to boot the same ESX host from an existing Boot-from-SAN partition).

Interestingly, I had the SAME issue with another blade that was booting Windows 2003 from SAN...it did not like the Boot-from-SAN partition after I changed the HBA for a dual HBA/NIC card (different WWN, etc.). I assume for the Windows 2003 server it's a driver issue. It doesn't appear to be the same issue for ESX 3.5 Boot-from-SAN.

There is a change in the slot, bus and device numbers associated with the new QMI3472 QLogic card.

i.e. via the Ctrl-Q QLogic HBA setup interface I went FROM:

Adapter Type   Address   Slot   Bus   Device   Function
QMI2472        5000      01     08    01       0
QMI2472        5100      01     08    01       1

TO:

Adapter Type   Address   Slot   Bus   Device   Function
QMI3472        5000      03     0C    00       0
QMI3472        5100      03     0C    00       1
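
To make the difference between the two Ctrl-Q listings explicit, here is a small Python sketch that compares them field by field. The values are copied from the listings above; the `HbaPort` type and `changed_fields` helper are names made up for this illustration. Whether any of these changes is what ESX objects to is not established here; the sketch only shows what moved.

```python
from typing import NamedTuple


class HbaPort(NamedTuple):
    adapter: str
    address: str
    slot: str
    bus: str
    device: str
    function: str


# Values as shown by the QLogic Ctrl-Q interface before and after the swap.
before = [
    HbaPort("QMI2472", "5000", "01", "08", "01", "0"),
    HbaPort("QMI2472", "5100", "01", "08", "01", "1"),
]
after = [
    HbaPort("QMI3472", "5000", "03", "0C", "00", "0"),
    HbaPort("QMI3472", "5100", "03", "0C", "00", "1"),
]


def changed_fields(old: HbaPort, new: HbaPort) -> dict:
    """Map each field that differs to its (old, new) pair."""
    return {
        name: (getattr(old, name), getattr(new, name))
        for name in HbaPort._fields
        if getattr(old, name) != getattr(new, name)
    }


for old, new in zip(before, after):
    print(f"port at address {old.address}: {changed_fields(old, new)}")
```

For both ports the adapter type, slot, bus and device differ, while the address and function stay the same.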

MORE INFO:

One additional point of clarity: I know the SAN and FC fabric and BladeCenter connectivity is all "there" even with this Boot-from-SAN issue since, when I boot the blade, it "sees" the boot partition just after the Ctrl-Q prompt (for QLogic HBA setup)....so it SEES the Boot-from-SAN partition, but doesn't seem to like it...

If I reverse out all the changes, everything boots up fine from the SAN.

So...any idea WHY the blade "looks" at that Boot-from-SAN partition and refuses to load it? In the kernel setup, what aspect is hard-coded? If I KNOW the WWNs BEFORE the transition, can I pre-populate some setting(s) in VMware to avoid a Boot-from-SAN rejection after the hardware change?

Worst case, I can always build some new VMware ESX hosts (on the NEW config) and then link those to the VM partitions on the SAN - the only "work" would then be to build a new host and re-define all the VM configs. A pain in the %&^$%$% but not impossible. BUT, I'd still like to UNDERSTAND how this is supposed to work, or does it simply NOT work? i.e. does Boot-from-SAN FAIL if you change the underlying blade HBAs? Is this true?

Any suggestions or insights are greatly appreciated.

Many thanks.

kgottleib
Enthusiast

Hi there - I don't have an answer to your problem, but if you could, I would like it very much if you could share some of your experiences with Boot from SAN. I'm not a proponent of boot from SAN due to the added complexity and would like to understand more about the pros and cons. For example:

How do you boot from SAN at your DR site if your hardware is different and what are some of the complications that can arise?

My associate director wants me to explore this option for him, and so far all I see are cons. The only pro, as far as I can tell, is a faster recovery in the event of an outage, but for that to happen the hardware would need to be identical, as well as the environment itself, so that additional configuration changes aren't needed.

Your insight is appreciated.

Gman

gittro
Contributor

I find Boot-from-SAN very useful, personally. I was afraid of it at first, when you read all the caveats, etc....but I've got a pretty simple design - one IBM BladeCenter H with 13 identical IBM HS21XM blades (all bought at same time with identical firmware and component levels) - so booting a different blade (in a fail-over) from a single Boot-from-SAN image isn't a huge issue for me - no hardware compatibility issues to worry about.

Also, most of my Boot-from-SAN images are of VMware ESX V3.5 - which isn't a particularly complex install - so worst case, creating a new Boot-from-SAN partition isn't that intimidating. All VMFS partitions are safely in other LUNs.

I like the saving in local hard disk costs (no RAID controller requirements, etc.). Plus, with HS21XM blades you can only install a SINGLE NON-hot-swap hard disk, unless you connect it to an expansion blade, which ends up taking up TWO BladeCenter slots for ONE server - not a great use of limited real estate. So I save some money and BladeCenter real estate and get some additional O/S partition disk fault tolerance by utilizing Boot-from-SAN.

If you have a complex and diverse hardware base, then I could imagine that Boot-from-SAN may have limited use from a fail-over perspective....my prime motivators were savings on RAID controllers and savings on BladeCenter H slots for our relatively small environment.

The complexity of Boot-from-SAN somewhat disappears with familiarity, as with anything else...I'm not too intimidated by the concept any more. It has advantages and disadvantages...it's no panacea, but it's still pretty "cool" I guess - no local hard disks in the blades, etc. If a blade fails, put an identical blade in the same slot and power it up - it should boot from that same partition (same server name and IP, etc.) as long as your FC switch config is correct and you have the right initiators defined on the SAN end. It's not a HOT swap under my current config (I'll get there some day), but downtime is minimal after a few FC and SAN config changes.

One caveat for Boot-from-SAN if you have a busy SAN is ensuring sufficient I/O throughput so the blade never loses connection to its boot partition on the SAN. With heavy SAN I/O you could see enough latency for the O/S to panic about losing its Boot-from-SAN partition, so some SAN analysis is required. Our SAN (Sun 6140) doesn't work too hard, with only 400 users overall to support (amongst a few dozen servers, mostly low utilization). But you need SOLID/STABLE I/O between the server and its Boot-from-SAN partition, so that's one thing to confirm as part of planning. Do a test server first and see how it behaves with high SAN I/O.
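
As a rough idea of what "see how it behaves" could look like in numbers, here is a minimal Python sketch that summarizes I/O latency samples collected on a test server under load. Everything in it is an assumption for illustration: the sample values are invented, the 30 ms stall threshold is arbitrary rather than a VMware or Sun recommendation, and `summarize_latencies` is a hypothetical helper that says nothing about how the samples are gathered.

```python
import math
import statistics


def summarize_latencies(samples_ms, stall_threshold_ms):
    """Summarize latency samples (milliseconds) and count those above a threshold."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
        "stalls": sum(1 for s in ordered if s > stall_threshold_ms),
    }


# Invented sample data; real numbers would come from the test server under heavy SAN I/O.
samples = [2.0, 3.0, 2.5, 40.0, 2.2]
print(summarize_latencies(samples, stall_threshold_ms=30.0))
```

The point is to look at the worst cases rather than the average, since it is the occasional long stall that would make the O/S think it has lost its boot partition.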

Hope this helps....something to play with in a PREPROD or DEV environment first - to see if the effort is worth it or not.
