VMware Cloud Community
christian_ellge
Contributor

PSOD on Dell PowerEdge 29xx running ESX 3.5 when rescanning storage adapters or VMFS

Hi,

I have an issue with Dell PowerEdge 29xx servers and ESX 3.5.

BIOS 2.2.6 (the latest) is installed. Please make sure you have a BIOS higher than 2.1.x installed, because earlier BIOS versions did not boot ESX 3.5 when the megaraid_sas driver started.

When running a rescan of the storage adapters or of VMFS volumes, ESX reproducibly crashes with a PSOD.

Verified on different systems and double-checked with VMware support.

Other vendors are also affected (HP, FSC, etc.).

No solution yet.

Sometimes a rescan from the COS (esxcfg-rescan vmhba) works in our Dell environments.

Any ideas?

11 Replies
opbz
Hot Shot

Seems to be a pretty common problem.

I have seen it with iSCSI and with FC SANs.

I know VMware is working on a patch for this.

A simple band-aid fix for this is to configure SSH access and disable USB at the BIOS level. This definitely works for iSCSI rescans.

MarkusGehm
Enthusiast

OK, same problem here: after a rescan, all of our brand-new servers got a PSOD. I will try your solution ASAP.

It's a very strange fault.

MarkusGehm
Enthusiast

OK, if I disable the internal USB port and reboot the system, I can see my LUNs, but a rescan still results in a PSOD.

But as a workaround it helps ;-)

Now I have a final workaround that works 100%: after you have disabled the internal USB controller, edit /etc/modules.conf and comment out all lines that include USB (in my case, the last two).

After a reboot, a manual rescan works fine.
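
The modules.conf edit described above can be sketched as follows. This is only an illustration, assuming that "lines that include USB" means any line containing the string "usb" in any case, and that `#` starts a comment. The function name and the sample file contents are hypothetical and not taken from a real host, so check your own /etc/modules.conf and keep a backup before changing it.

```python
def comment_out_usb_lines(conf_text: str) -> str:
    """Return conf_text with every not-yet-commented line mentioning USB commented out."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        # Assumption: a case-insensitive match on "usb" identifies the USB module lines.
        if "usb" in stripped.lower() and not stripped.startswith("#"):
            out.append("#" + line)
        else:
            out.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Hypothetical sample contents, for illustration only.
SAMPLE = (
    "alias eth0 e1000\n"
    "alias scsi_hostadapter megaraid_sas\n"
    "alias usb-controller uhci-hcd\n"
    "alias usb-controller1 ehci-hcd\n"
)

if __name__ == "__main__":
    print(comment_out_usb_lines(SAMPLE), end="")
```

The function only transforms text, so it can be tried on a copy of the file first; running it twice gives the same result because already commented lines are skipped.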

christian_ellge
Contributor

I have worked with VMware support over the last few weeks and received a patch that unfortunately does not work. No patch is available yet, and ESX 3.5 Update 1 does not fix the issue.

Please have a look at this: The PSOD is occurring because of command pointer aliasing that's caused by a previous device offline, which in turn is caused by an abort/reset failure.

The abort/reset failure is reported to the console OS by the vmkernel when we don't find the commands we are trying to abort/reset on the adapter's command list. This is due to a delay in the command being put on the list by the vmkernel storage. If we wait for the commands to appear on the adapter's command list, then we will not return failure to the console OS and we will not hit the issue.

Some customers have removed the USB modules from modules.conf; for them, ESX does not crash on rescans from the GUI. Other customers have disabled USB devices in the BIOS settings. A storage rescan causes a rescan of USB devices too, which hits the code that causes this issue (the PSOD).
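
To make the timing problem in that explanation easier to follow, here is a toy model in Python. It is not vmkernel code: the class, its method names and the "tick" idea are all invented for illustration. It only shows the general pattern support describes, namely that an abort issued before the command has reached the adapter's command list fails, while an abort that waits for the command to show up succeeds.

```python
from collections import deque


class ToyAdapter:
    """Toy model only: commands wait in a pending queue before reaching the adapter list."""

    def __init__(self):
        self.pending = deque()  # issued, but not yet on the adapter's command list
        self.active = set()     # stands in for the adapter's command list

    def issue(self, cmd_id):
        self.pending.append(cmd_id)

    def tick(self):
        """Move one pending command onto the adapter's command list."""
        if self.pending:
            self.active.add(self.pending.popleft())

    def abort(self, cmd_id, max_wait_ticks=0):
        """Abort cmd_id, waiting up to max_wait_ticks for it to appear on the list."""
        waited = 0
        while cmd_id not in self.active:
            if waited >= max_wait_ticks:
                return False  # command not found on the list: abort reported as failed
            self.tick()
            waited += 1
        self.active.remove(cmd_id)
        return True


if __name__ == "__main__":
    impatient = ToyAdapter()
    impatient.issue("cmd-1")
    print("abort without waiting:", impatient.abort("cmd-1"))

    patient = ToyAdapter()
    patient.issue("cmd-1")
    print("abort with waiting:", patient.abort("cmd-1", max_wait_ticks=1))
```

In this model the first abort fails because the command is still pending, and the second succeeds because the abort waits one tick for it to land on the list.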

Looking forward to receiving the latest information from VMware support soon.

Dan_Jost
Contributor

I had this problem on some other Dell server models (2850s). The only solution we had was to rescan the individual HBAs (right-click on them in the GUI); this stopped the system from crashing. On some 6850s, disabling the DRAC redirection prevented the rescans from bringing the system down.
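
The same one-adapter-at-a-time idea can also be applied from the service console with esxcfg-rescan, which was mentioned earlier in the thread. The sketch below only builds the command lines and prints them; it does not run anything. The adapter names vmhba0 and vmhba1 are placeholders, so substitute the names your host actually reports.

```python
def rescan_commands(adapters):
    """Build one esxcfg-rescan command line (as an argv list) per adapter name."""
    return [["esxcfg-rescan", name] for name in adapters]


if __name__ == "__main__":
    # Placeholder adapter names; they are assumptions, not taken from a real host.
    for argv in rescan_commands(["vmhba0", "vmhba1"]):
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to try anywhere; on an affected host each line would be run by hand, one adapter at a time.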

Hope this helps

Dan

SuperGrobi73
Enthusiast

Good morning from Cologne,

I have the same issue with one specific SAS controller, the PERC 6/i, in ESX 3.5. All Dell PE 29x0 systems with Intel Xeon 54xx and 52xx series CPUs are affected.

If you don't have these new Xeons, the rescan doesn't result in a PSOD. The AMD-based PE 2970 and 6950 are not affected either.

Systems with a PERC 5/i are not affected!

The problem was first seen in January this year, when the new Xeons arrived. I have had a lot of trouble with this, because first the BIOS had to be updated to install ESX 3.5, and then the BIOS race began! Within two weeks the version went from 2.0.1 to 2.1.0 (this was the first BIOS to run ESX 3.5.x) and on up to 2.2.6.

The problem has been assigned to Dell ProSupport and escalated to VMware, so I too am waiting for it to be fixed, maybe in ESX 3.5.0 Update 2.

Any update to this?

Carsten

-- My blog: http://www.datenfront.de
MarkusGehm
Enthusiast

Hi Grobi,

I'm waiting for a debug patch from VMware support. As long as I don't get it, the fix will not become GA.

But I haven't heard anything from VMware about this.

christian_ellge
Contributor

As discussed with VMware tech support, a solution should be released in September/October 2008, possibly more as a "patch". Looking forward to receiving a solution earlier.

mmurrin
Contributor

Does this only happen with the software iSCSI initiator, or also with HBAs? We have two PowerEdge 2950s and this issue has not happened yet. We use iSCSI HBAs.

SuperGrobi73
Enthusiast

I've checked this PSOD error and it is still there in every ESX(i) 3.5.0 U2 version X-(.

But it seems that Dell is the only vendor affected. I just tested Fujitsu-Siemens (FSC; Fujitsu in the U.S.) T/RX 300 S4 servers with the same LSI Logic chipset, and the SAN rescan works fine, although all USB, SATA and SAS controllers are enabled and no modifications have been made to any configuration file in the COS.

The solution is a little bit tricky: boot your server into the BIOS and disable the onboard SATA controller. After the next reboot of your Dell PE 29x0 III, rescanning the SAN works as expected, with no more PSODs.

Indeed, you will lose the local CD/DVD-ROM drive of your host, but in my opinion you will not need this optical drive until you reinstall your server.

Good luck to everybody, and I'm looking forward to any feedback.

Carsten 8-)

-- My blog: http://www.datenfront.de
sfg34
Contributor

Greetings all,

Can anyone confirm whether the patch released on September 18th addresses this issue?

Patch ESX350-200808402-BG

Thanks in advance for your replies.

Simone
