VMware Cloud Community
dmarshallx
Contributor

Boot issues with ESX 3.5.1 (and ESX 3.5) on IBM LS21 using QLA4022 and EqualLogic SAN

We installed ESX 3.5.1 on a new IBM LS-21 blade in an IBM BladeCenter H. The LS-21 is running:

- BIOS 1.05

- QLogic add-on card with BIOS 1.09a and firmware 2.00.00.62

- 32 GB of RAM

- Two 3.0 GHz AMD dual-core processors

The LS-21 blade boots from our EqualLogic SAN. During boot, whether from the install CD or the installed OS, when the system reaches the “loading qla4022…” message on the console it takes 16-22 minutes to continue. We saw the same problem loading ESX 3.5 and decided to try 3.5.1 after reviewing the release notes; however, this does not seem to have improved the situation.

All of our other LS-21 systems are running ESX 3.0.2 from the same EqualLogic SAN and they do not exhibit this behavior.

We’ve monitored the activity on the EqualLogic SAN console. During the boot, the ESX 3.5.1 system repeats a pattern of logins and resets.

The pattern:

- Server logs on to the SAN

- Server stays connected for about 2 minutes; no bytes are reported read or written

- SAN session is reset

This pattern repeats, within seconds of each session reset, 5 or 6 times before the system stays connected and begins loading the remainder of the OS.

We took a network sniffer trace of this activity. The BIOS load completes around packet 150, the qla4022 loading message appears around packet 3150, the OS continues loading around packet 5300, and by packet 61000 the OS is loaded and up.

The QLogic card is configured with:

- Jumbo frames

- Manual IP (no gateway)

- Header and data digests

- Target IP address and target strings set

Is this a known issue? If so, what is the resolution?
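For reference, once the host finally comes up, I can run service-console checks along these lines to see whether the qla4022 driver loaded and what it logged (a rough sketch; log paths and grep patterns may differ on your build):

  # confirm the qla4022 VMkernel module is loaded
  vmkload_mod -l | grep qla4022
  # look for the HBA's iSCSI login / reset messages
  dmesg | grep -i qla4022
  grep -i qla4022 /var/log/vmkernel
  # list the vmhba adapters and LUNs the host can see
  esxcfg-vmhbadevs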

i2ambler
Contributor

Progress at last. I was able to install the older firmware, and now the qla4022.o driver is loading. However, when trying to do the install, ESX comes up and says no drives are available. I have the LUN mapped to the card and all advanced settings set to disabled. There is a local disk in the box as well on a PERC 6; however, I want to boot from SAN. Any ideas?

adehart
Contributor

You may be in a catch-22 situation, as this is exactly the problem you will have with the card when using the older firmware. The newer firmware was necessary for the card to be recognized by VMware; that is exactly what happened to me initially, until I put the 3.0.1.33 firmware on the card.

i2ambler
Contributor

Yeah, I guess I'm SOL... I don't understand why something would be 'certified' and on the HCL yet be so difficult to get working, if at all. The qla driver will not load if the firmware is .33, but it will if it's .27. If I do an lsmod it shows the driver, and dmesg shows the card. I'm not sure what the problem is here. I'm a little more than frustrated at this point, and not getting help from anywhere except what I have gotten from the community.
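For what it's worth, my understanding is that lsmod only lists the console OS's modules and the VMkernel keeps its own list, so I've been checking both and then forcing a rescan of the adapter (a rough sketch; vmhba2 is just an example, use whichever vmhba the QLogic card appears as):

  # console OS view vs. VMkernel view of the driver
  lsmod | grep qla
  vmkload_mod -l | grep qla4022
  # if the card is claimed but no LUNs appear, rescan it and list what shows up
  esxcfg-rescan vmhba2
  esxcfg-vmhbadevs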

i2ambler
Contributor

Progress at last. I reflashed the cards to .33 and it found the LUNs. Yeah! I'm thinking there was something about the originally included .33 firmware that it did not like. So I have a bunch of cards to reflash. Too bad this took so long to figure out! Thanks for the help, guys.

RODDYM
Contributor

I was able to resolve the issue I had with the QLogic 4060 cards and an IBM x3650 server by performing the following steps.

STEP 1

I downgraded the firmware of the cards to the .27 revision. I was able to obtain the firmware from IBM's site at the following link:

Create a bootable ISO from the download and you can flash the firmware of the cards straight from the CD.

(VMware Support had me upgrade to the latest revisions.)

STEP 2

Assigned IP addresses to the QLogic cards through their BIOS utility at server boot-up. Of course, the IP addresses and gateway are on the iSCSI network.

(VMware Support had me use default settings and not assign IP addresses)

STEP 3

Went into the BIOS of each card (Ctrl-Q) at start-up and, under Advanced Settings, changed these default values per the readme text for the latest BIOS revision of the 4060 cards.

(VMware support had me reset to adapter defaults...I had to make the following changes.)

6.1.3. Advanced Adapter Settings

- Delayed ACK: Enter toggles between Enabled and Disabled. The default is Enabled.

- MTU: Enter selects either 1500 or 9000. The default is 1500.

- Primary Boot Data Digest: Enter toggles between Enabled and Disabled. The default is Disabled.

- Primary Boot Header Digest: Enter toggles between Enabled and Disabled. The default is Disabled.

- Alternate Boot Data Digest: Enter toggles between Enabled and Disabled. The default is Disabled.

- Alternate Boot Header Digest: Enter toggles between Enabled and Disabled. The default is Disabled.

TheCleaner
Contributor

If I have no need to boot from SAN for the ESX host, can I disable IPv4 and IPv6 completely?

I've never done it before, always leaving them as DHCP.

How will the host communicate with the SAN then through iSCSI? (I'm new to this)

Does it use the initiator info and LUN info to find each other somehow?

AlexNG_
Enthusiast

Hi TheCleaner,

No, do not disable IPv4. You can disable IPv6 if it is not used. If you disable IPv4, the iSCSI hardware initiator won't be able to communicate with the storage.
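To your other question: the initiator and target find each other by IP address and iSCSI names (IQNs), and the LUNs are presented behind the target. Purely as an illustration (every value below is made up), a target entry on the HBA looks roughly like this:

  Target IP:     192.168.10.20 (TCP port 3260)
  Target IQN:    iqn.1992-08.com.netapp:example-target
  Initiator IQN: iqn.2000-04.com.qlogic:example-initiator

The storage side then maps its LUNs to that initiator IQN (or its IP), which is why the hardware initiator needs a valid IPv4 address on the iSCSI network.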

AlexNG

TheCleaner
Contributor

OK, thanks AlexNG.

I tried the settings after downgrading the firmware, and I still get a long delay.

1. Is it possible to simply replace the HBA card with a different QLogic card that will work just as well, without any errors?

2. Is there supposed to be a real fix for this ever? Some kind of firmware patch, etc.?

All I'm trying to do is get the server (x3650) up and operational WITHOUT connecting it to iSCSI storage yet (it will boot ESX locally).

RODDYM
Contributor

Assign an IP address to the HBAs. Make sure they are connected to a LAN.

iSCSI HBAs sometimes require network connectivity to initialize. If you don't have the separate VLANs and a Cisco 3650 switch (the best for iSCSI I have found), plug them into a Linksys gigabit switch or something to get them working.
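One quick sanity check, assuming you've already given the HBA an address (the address below is just an example): the HBA has its own TCP/IP stack, separate from the service console, so ping it from another box on the same iSCSI segment.

  # from another host on the iSCSI segment
  ping 192.168.10.31

If that gets no reply, the HBA has no link or the address/VLAN is wrong, which would explain it sitting there during boot.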

AlexNG_
Enthusiast

Hi TheCleaner,

I recently installed an environment where we ran iSCSI with those HBAs... it was a pain to reboot each node (x3650)!

If you just want to boot your ESX without remote storage, just unplug the HBAs.

Just in case, which HBAs do you have? And which storage? Just remember that IBM iSCSI HBAs are not supported with the DS3300!

AlexNG

TheCleaner
Contributor

Thanks Alex... it's the QLE4062C in an x3650 attached to an IBM N3300.

RODDYM
Contributor

TheCleaner,

I had the exact same configuration, servers, HBAs, SAN. Once I configured the HBAs in their CMOS with an IP address and then powered up the server, ESX loaded and got past the initialization of the HBA during startup. The firmware revision I am using is .27.

If that does not work and ESX hangs at startup, wait about 15 to 20 minutes. It will finally time out and continue to boot the OS. At least that has been the case for me every time. Boot it... go to lunch... come back.

If you have an issue with the configuration of the cards, I posted this some months ago for another site I set up... These were the 4060 cards.

http://communities.vmware.com/message/978589

RODDYM

AlexNG_
Enthusiast

Right,

That was my case also! But finally, after some strange behaviour, I rechecked the HCLs and found that the QLA4062c (IBM OEM) was not supported with the DS3300, so in the end we swapped the hardware initiators for dual network cards and configured software iSCSI initiators.
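In case it helps anyone going the same route, enabling the software initiator from the ESX 3.x service console is roughly this (a sketch; the target IP is a placeholder, and the software initiator's vmhba number can differ, so check the VI Client or esxcfg-vmhbadevs first):

  # open the console firewall for the software iSCSI client
  esxcfg-firewall -e swISCSIClient
  # enable the software iSCSI initiator
  esxcfg-swiscsi -e
  # add the array as a SendTargets discovery address (vmhba40 is just an example)
  vmkiscsi-tool -D -a 192.168.10.20 vmhba40
  # rescan so the LUNs appear
  esxcfg-rescan vmhba40

The discovery address and CHAP settings can also be entered in the VI Client under Storage Adapters instead of with vmkiscsi-tool.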

AlexNG

RODDYM
Contributor

Software initiators? I don't like the overhead with those, but if it works, it works. I also had issues with the 4062s and the DS3300. Chucked them and got the 4060s. Rock solid: three IBM x3650 hosts, dual QLogic HBAs in each host, a Cisco 3750 gigabit switch, and an IBM DS3300 SAN. Running 27 VMs with no issues and room to grow.

A lot of times iSCSI performance problems can be traced back to the switch... kind of off topic. Always try to go with Cisco if you can get them to buy off on it. I run those switches separate from the corporation's backbone, only connecting them to the network if the client wants to remotely manage them. I use VLANs to segment off iSCSI and management traffic and to isolate VMotion, HA, and DRS on their own separate VLAN within the same iSCSI switch. I then try to run redundant switches, but budget always comes into play on those designs.
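On the ESX side, the matching piece is a VLAN-tagged VMkernel port group per traffic type. A rough sketch from the service console (the vSwitch name, port group names, VLAN ID and address are all just examples):

  # port group for VMotion on its own VLAN
  esxcfg-vswitch -A VMotion vSwitch1
  esxcfg-vswitch -v 100 -p VMotion vSwitch1
  esxcfg-vmknic -a -i 192.168.100.11 -n 255.255.255.0 VMotion
  # a separate port group / VLAN for software iSCSI would be added the same way

The physical switch ports then just need to carry (trunk or access) those VLANs.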

TheCleaner
Contributor

Mine isn't a DS3300 but an N3300 (a NetApp FAS270, OEM'd).

My setup is two x3650 hosts that boot from local storage. They are connected via 4062 HBAs directly to the two Ethernet ports on the N3300. The N3300 isn't on the network at all; it is simply iSCSI storage for the two hosts.

I went this route for simplicity, given the very unlikely chance of an HBA or NIC failing...

RODDYM
Contributor

So is this up and running? You still have to assign IP addresses to the HBAs and use crossover cables for the iSCSI HBAs to the SAN.

You still have a single point of failure... the controllers on the SAN, and your ESX hosts have no redundant route to communicate with the other controller port. That is why you want to put a switch in the mix and then set ACLs and resource groups for both ESX hosts and associate them with both iSCSI HBA ports on the SAN. If one port goes down, you have a redundant path... Plus, this SAN looks like it has only two GigE ports. If both are being used for iSCSI, how are you going to manage the SAN?

AlexNG_
Enthusiast

You're right Roddym, but in our case we had no choice....

TheCleaner
Contributor

Yeah, you are correct Roddym about the single point of failure and the "how are you going to manage the SAN" part.

The SAN has a management port on it as well that I can use to manage it remotely...not the best idea, but it'll work for me.

I'm not sure what other choice I have, to be honest. If I connect both of those GigE ports to a VLAN on the switch and put the HBAs on the same VLAN, I don't see what that buys me overall. I guess it buys me a little redundancy for the hosts communicating with the SAN, but that seems to be about it. I don't mind going that route, but I need to be convinced that it really is the best option overall.

AlexNG_
Enthusiast

Hi TheCleaner,

If you put in a switch and separate traffic with VLANs, you'll gain some performance because normal network traffic won't be on the same network. That could be important if you have high network traffic.

AlexNG

TheCleaner
Contributor

Yeah, understood, Alex... I decided to go ahead and put them on their own VLAN, etc.

Question: In the VIC it only shows a single HBA connected and only gives me the opportunity to assign a single IP address. Shouldn't it show two HBAs, or at least two IP addresses? I mean, the HBA has two ports, so how does it know whether a link fails or not on that HBA?
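In the meantime I'm planning to check from the service console what was actually detected; each physical port should normally show up as its own storage adapter (its own vmhba). A rough sketch (exact log and proc paths can vary by build):

  # how many QLogic ports were found at boot?
  grep -i qla4 /var/log/vmkernel
  # each detected adapter gets its own vmhba entry
  ls /proc/vmware/scsi/
  esxcfg-vmhbadevs

If only one vmhba shows up there as well, I'll look at the card's own BIOS (Ctrl-Q) to see how the second port is configured.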
