VMware Cloud Community
thomps01
Enthusiast

ESX 4.x & HP Flex-10 v2.30 firmware issues

Anyone else seeing the issue below?

We are using HP BL685c G6 blades, but I think this issue also happens with the half-height blades.

Virtual Connect Flex-10 modules running firmware v2.30

ESX 4.0 or 4.0 U1, either shows the issue.

What we're seeing is that when an uplink from the Flex-10 modules is pulled, the status of the vmnics does not change. The vmnics should show as 'down', but they continue to show as active.

Although an alert is shown in vCenter to say redundancy is lost, you do not see any red crosses next to the adapters in the networking configuration screen.

Because of this issue, unless you configure network failover detection to use beacon probing, you will lose connectivity if an uplink goes down.

Also, if you're planning to use the Nexus 1000v (we've tested this), there isn't an option for beacon probing.
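For reference, this is how the symptom shows up from the service console - the vmkernel still reports the FlexNICs as up after the uplink is pulled (vmnic numbering will differ per host):

# List the physical NICs and the link state the vmkernel believes they have.
# With this issue the Flex-10 FlexNICs stay "Up" even after the module's
# uplink has been unplugged.
esxcfg-nics -l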

0 Kudos
34 Replies
DSeaman
Enthusiast

Ya, I'm pretty disappointed in the support and timely updates for this driver. The HP blades and Virtual Connect are supposed to be hot sellers and widely deployed. You'd think HP and VMware would be more responsive about problems and about getting updated drivers to customers.

Derek Seaman
0 Kudos
conradsia
Hot Shot

I've been running Virtual Connect for years now with no issues; the problems appeared with Flex-10 and now the DCC support needed for SmartLink. I didn't use SmartLink at all, as it really wasn't needed since I was controlling the connections and failover at the chassis level. But now that I am trying to get faster failover times by having the blades fail over the connections when the switch goes down, I need SmartLink to work to accomplish this. I also need it in order for a channeled vSwitch to work. So far, though, I haven't had any issues with the new driver.

0 Kudos
mattkr
Contributor

We just installed ESXi 4.1, which seems to have solved the PSOD issue, but we lost SmartLink capability. Has anyone tested with 4.1 yet? The installed driver looks to be completely new...

/var/log # esxupdate query --vib-view | grep -i bnx2x

deb_vmware-esx-drivers-net-bnx2x_400.1.54.1.v41.1-1vmw.0.0.260247 installed 2010-05-19T00:12:49+00:00

cross_vmware-esx-drivers-net-bnx2x_400.1.52.12.v40.3-1.0.4.00000 retired 2010-07-22T11:49:12.679307+00:00
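For what it's worth, you can also cross-check which bnx2x build the vmkernel has actually loaded (rather than just what the VIB database lists) with something like the following, where vmnic0 is whichever FlexNIC you want to check:

# Report the driver name and version currently bound to the NIC
ethtool -i vmnic0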

0 Kudos
DSeaman
Enthusiast

We haven't tried 4.1 yet, but if it's true that VMware broke SmartLink, then that's a major problem. Now that v4.1 can do more granular network-level "QoS", I'm wondering if Flex-10 is now "outdated" going forward? With all of the issues with HP Flex-10, it almost seems more trouble than it's worth given where VMware is going with I/O controls.

Derek Seaman
0 Kudos
bebman
Enthusiast

There is another thread going that discusses the same issues: http://communities.vmware.com/thread/273033

To that point, HP has just released a Customer Advisory about not using SmartLink with ESX 4.1 unless you are using Beacon Probing.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02476622&jumpi...

So your choices are these: if you want SmartLink, use an ESX version lower than 4.1 and run with downgraded NIC firmware, or don't use SmartLink and run all the most recent versions. Way to go, HP - not staying up with the current versions.

Oh, and lastly, if anyone didn't know, the G1 blades are NOT on the ESX 4.1 compatibility list. So if you were thinking about using them for DR or lab use - not with ESX 4.1. Even though the chipsets for most of the G1s are similar to equivalent supported rackmounts, HP chooses not to certify them. Thanks again, HP.

Virtually,

VMware Certified Professional

NOTE: If your problem or question has been resolved, please mark this thread as answered and award points accordingly.

0 Kudos
KFM
Enthusiast

Interesting, bebman - thanks for your contribution. I've just read the other similar thread, and it looks like HP and VMware have a lot of frustrated customers!

I think for now I will just stick with ESX 4.0 U2 but with the older 1.48 driver. It didn't have SmartLink/DCC, but at least it didn't PSOD my hosts. In my environment, a PSOD was more likely to happen (due to our workload characteristics) than an uplink failing (and thus VMware failing over to the other active path). I don't think I'll even upgrade to ESX 4.1 yet - it seems more problematic, and hopefully a new driver will come out before we require the new features of 4.1!

Cheers,

KFM

0 Kudos
bebman
Enthusiast

Okay, somebody just shoot me now - it would be less misery. Beacon probing, as recommended by HP, does not work with LACP (802.3ad) Ethernet trunks. My configuration has four 1Gb ports per side of the chassis trunked together for uplinks to the physical switch. I thought that since the trunk was on the upstream side of the Flex-10, beaconing wouldn't notice. That turned out to be about as fun as a root canal. The hosts and the VMs started to "flap" from side to side with no warning, and during the changeover they would be unreachable from outside the chassis. (It probably didn't help that beacon probing really wants at least three uplinks to tell which link has actually failed; with only two it is left guessing.) This took about two hours to correct by catching everything at the right time. I am going back to an active/passive configuration.

Virtually,

VMware Certified Professional

NOTE: If your problem or question has been resolved, please mark this thread as answered and award points accordingly.

0 Kudos
julianwood
Enthusiast

If you are going upstream to Cisco Nexus switches, you can try spreading your LACP group across two switches using a virtual port channel (vPC). This should make it very unlikely for your upstream links to go down, as they span multiple physical switches yet are still in one LACP group, so they are aggregated and presented as a single FlexNIC down to your blades.
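For illustration, a rough NX-OS sketch of that idea - the domain ID, port-channel numbers and interface names are placeholders, and the same configuration (with a matching vpc number) goes on both Nexus switches:

feature lacp
feature vpc

vpc domain 10
  peer-keepalive destination <peer-mgmt-ip>

! vPC peer-link between the two Nexus switches
interface port-channel 1
  switchport mode trunk
  vpc peer-link

! Port-channel towards the Virtual Connect uplinks
interface port-channel 20
  switchport mode trunk
  vpc 20

! Physical uplink port from the Flex-10 module
interface Ethernet1/10
  switchport mode trunk
  channel-group 20 mode active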

http://WoodITWork.com
0 Kudos
bebman
Enthusiast

That is correct, but this puts you in an active/passive configuration on the uplinks. If you want to take advantage of the bandwidth offered by both sides of the chassis with an active/active configuration, with your host controlling NIC failover when an issue occurs, you have to look at what the VMware host can do, what Virtual Connect can do, and what the upstream switches can do (Nexus or otherwise). That's besides the fact that the perfectly good Catalyst switch will NOT be replaced at this time just because Cisco wants more revenue in its coffers.

Virtually,

VMware Certified Professional

NOTE: If your problem or question has been resolved, please mark this thread as answered and award points accordingly.

0 Kudos
DSeaman
Enthusiast

Actually, with the Nexus line you can "cluster" two switches and have an LACP channel split between the switches, with both sides active.

Derek Seaman
0 Kudos
julianwood
Enthusiast

That's what I meant. A vPC is a virtual port channel that spans two switches. For LACP to work, though, you need the uplinks coming from a single Flex-10 module.

I always try to make all links active, otherwise you are just wasting ports, especially with 10Gb. What I do is make a single ESX vSwitch with two NICs, each side going through a separate Virtual Connect Ethernet network. Using port groups with active and standby uplinks, I send LAN traffic over one side and NAS storage traffic over the other, but each can use the other for failover, which would take quite an outage to trigger when the uplinks are spread across the Nexus switches.
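A rough sketch of that layout from the service console (vSwitch/port group names and vmnic numbers are just examples; the per-port-group active/standby ordering itself is set under NIC Teaming in the vSphere Client, as esxcfg-vswitch doesn't expose teaming policy):

# One vSwitch with both 10Gb FlexNICs, one from each Flex-10 module,
# each mapped to its own Virtual Connect Ethernet network:
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -L vmnic3 vSwitch1

# Port groups for the two traffic types; LAN runs active on one FlexNIC,
# NAS active on the other, each with the opposite FlexNIC as standby:
esxcfg-vswitch -A "VM-LAN" vSwitch1
esxcfg-vswitch -A "NAS" vSwitch1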

http://WoodITWork.com
0 Kudos
DSeaman
Enthusiast

Yes, that's basically what I do as well. But we have two stacked chassis, each with two Flex-10 modules. So I have one network uplink off the left bay in chassis 1, and the other network uplink off the right bay of the other chassis. The stacking links take care of inter-chassis comms, so both LACP channels are active/active.

Derek Seaman
0 Kudos
julianwood
Enthusiast

That's the way it should be done. I think people often don't take advantage of the chassis stacking links, or don't have enough input from networking people, and as server people they see each chassis as a separate entity.

We have three chassis linked together per rack, so there is a chassis between your two with no uplinks of its own, but you can go up to four. We went with three per rack due to power/cooling.

I really should write a post about our setup as an example, as I've found the Ethernet cookbooks are maybe a little too broad. You can often do more with less, and the cookbooks want to sell the benefits of Flex-10 by using all the FlexNICs when you don't need to.

http://WoodITWork.com
0 Kudos
vigen
Contributor

Hi,

We are experiencing this problem as well on ESXi 4.1. Could you please advise how to roll the driver back from version 1.54 to version 1.48 on ESXi 4.1?

Thanks

0 Kudos
ViFXStu
Contributor

New driver released just now for 4.0, version 1.52.12.v40.8

The driver for 4.1 came out last week, version 1.60.50.v41.2

==

Will be testing real soon on 4.0 U1
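
Once it's in, the installed build can be confirmed the same way mattkr showed above:

# Confirm which bnx2x VIB is now marked as installed
esxupdate query --vib-view | grep -i bnx2x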

0 Kudos