VMware Cloud Community
porschenm
Contributor

PSOD on Dell R515 BIOS 2.0.2 - ESXi 5.0 Updated

Hi,

Today I had a purple screen (PSOD) on a new Dell R515 with the latest BIOS (2.0.2).

We bought 2 of these machines in 2012 with BIOS 1.10.0 and had no problems.

Yesterday I installed 2 additional R515s with BIOS 2.0.2 (installed at the factory), and one of the two crashed last night.

Are there BIOS settings which I should disable?

Like DMA Virtualization?

C1E?

Power settings set to High Performance in the BIOS instead of OS Control?

On one of the new machines I will try downgrading the BIOS and firmware to the versions on the first 2 machines and test it.

Regards Michael

[Windows 7 Help|http://windows-7-board.de]
44 Replies
dxcrp
Contributor

Add the Dell R715 to the list of affected servers.  I am using AMD 6274 processors.  I think VMware 4.1 Update 3 is allergic to Dell BIOS upgrade 3.0.4.  I am getting occasional purple screens for no apparent reason too.  I disabled C1E, disabled VMotion, and tried both local storage and SAN storage, all to no avail.  I've spent a lot of time diagnosing all the hardware but can't find any problems.  I also contacted Dell support to see if they knew anything, but they say all the hardware checks out and just referred me to VMware for support.  I just love being stuck in the middle, don't you?  I've just applied all the patches for 4.1; we'll see how it goes over the next 2 weeks.  If I still get the purple screen, I'll contact VMware for support and update everyone.

ActiveX2
Enthusiast

Hello,

I am currently on a business trip and have only limited access to my e-mail. I will be back on April 15.

In urgent cases, please contact Mr. Breuning: mbreuning@sievers-group.com

Thank you for your understanding.

Kind regards

i.A. Stefan Ohlmeyer

Deputy Head of IT Services

Phone: +49 (541) 9493-160

Fax: +49 (541) 9493-260

sohlmeyer@sievers-group.com

SIEVERS-SNC Computer & Software GmbH & Co. KG

A SIEVERS-GROUP company

Hans-Wunderlich-Straße 8

49078 Osnabrück

General partner:

SIEVERS-SNC Beteiligungs GmbH

Osnabrück District Court, HRB 19289

Managing directors:

Dipl.-Kfm. Klaus Gerdes-Röben

Marco Naber

Dipl.-Wirtschaftsing. Rüdiger Sievers

Osnabrück District Court, HRA 6465

www.sievers-group.com

VCP2/3/4/5/VCAP-DCD4/5/VCAP-DCA4/5
brennanmichaelj
Contributor

For all of you Dell customers: you really need to educate Dell on "microcode".  This is something Dell provides in its BIOS updates, and which it receives from AMD.  Microcode updates are software fixes for CPU bugs.

You can read the errata here:

http://sources.progress-linux.org/gitweb/?p=releases/artax-backports/packages/amd64-microcode.git;a=...

In particular, when we experienced this problem, we were running on microcode 0x600062e (or earlier), and the HP BIOS upgrade got us to microcode 0x6000629, which resolved our problems.  Microcode 0x600062e was by far the worst: while it resolved some problems with PSODs, it created other problems, such as individual VMs dropping off with "VMM64 fault 14" errors in the vmkernel logs, like this:

vmwareserver1: Feb 15 22:54:40 vmwareserver1 vmkernel: 14:07:58:16.617 cpu63:7755)WARNING: World: vm 7755: 9985: vmm1:virtualmachine1:vcpu-1:VMM64 fault 14: src=MONITOR rip=0xfffffffffc215821 regs=0xfffffffffc008670

vmwareserver2: Feb 27 14:15:03 vmwareserver2 vmkernel: 1:21:34:46.630 cpu3:4660)WARNING: World: vm 4660: 9985: vmm1:virtualmachine2:vcpu-1:VMM64 fault 14: src=MONITOR rip=0xfffffffffc27b36a regs=0xfffffffffc008d60
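
To check a larger set of vmkernel logs for the same symptom, the VM name and fault number can be pulled out of each matching line with a short script. This is only a minimal sketch in Python: the regular expression is an assumption derived from the two sample lines above and may need adjusting for other ESX/ESXi log formats, and the function name find_vmm_faults is made up for illustration.

```python
import re

# Assumption: the pattern is modeled only on the two sample log lines above,
# e.g. "... vmm1:virtualmachine1:vcpu-1:VMM64 fault 14: src=MONITOR rip=0x... regs=0x..."
FAULT_RE = re.compile(
    r"vmm\d+:(?P<vm>[^:]+):vcpu-(?P<vcpu>\d+):VMM64 fault (?P<fault>\d+): "
    r"src=(?P<src>\S+) rip=(?P<rip>0x[0-9a-fA-F]+)"
)


def find_vmm_faults(lines):
    """Return one dict per log line that reports a VMM64 fault."""
    hits = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    # The two log lines quoted in this post, used as sample input.
    sample = [
        "vmwareserver1: Feb 15 22:54:40 vmwareserver1 vmkernel: 14:07:58:16.617 "
        "cpu63:7755)WARNING: World: vm 7755: 9985: vmm1:virtualmachine1:vcpu-1:"
        "VMM64 fault 14: src=MONITOR rip=0xfffffffffc215821 regs=0xfffffffffc008670",
        "vmwareserver2: Feb 27 14:15:03 vmwareserver2 vmkernel: 1:21:34:46.630 "
        "cpu3:4660)WARNING: World: vm 4660: 9985: vmm1:virtualmachine2:vcpu-1:"
        "VMM64 fault 14: src=MONITOR rip=0xfffffffffc27b36a regs=0xfffffffffc008d60",
    ]
    for hit in find_vmm_faults(sample):
        print(hit["vm"], "vcpu", hit["vcpu"], "fault", hit["fault"], "rip", hit["rip"])
```

Feeding it the lines of a saved vmkernel log gives a quick list of which VMs hit the fault, which is handy to attach to a support case.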

HP did the same thing to me, stating that hardware checks out ok and to go blame VMware for this.  VMware found that this issue was AMD microcode related and their backline support worked with HP Engineering directly on this.  It took a while for HP to understand and get it.  It sounds like you're running into that same problem with Dell, so keep pointing them at microcode.

One thing you can do to point Dell in the right direction is to reference HP's BIOS update, where they have fixed this specific issue:

http://alerts.hp.com/r?2.1.3KT.2ZR.yrG0Y.IwG%5fog..T.c41G.7mom.bW89MQ%5f%5fDGeGFSE0

Any AMD 6200 Series processor is affected, so the only solution for this problem is a microcode update.  Technically, this is in Dell's court to fix.  I believe VMware can also push down microcode in ESXi 5, so you may be able to pursue this route with VMware.

If you are a Dell customer experiencing this problem, I would be interested in the hardware model, BIOS revision and AMD microcode revision.  The next time you get a PSOD, pop in a CentOS Live CD and boot it up.  Once it has booted to the console, type "dmesg | grep micro" and post the output to this forum.

For Example:

BL685c G7

Version: A20

Release Date: 12/09/2012

microcode_amd_fam15h.bin

patch_level=0x600062e

DL585 G7 BIOS 12/09/12

Version: A16

Release Date: 12/09/2012

microcode_amd_fam15h.bin

patch_level=0x600062e
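
If you are collecting this from several hosts, the patch level can be extracted from the saved dmesg output with a few lines of Python. This is a rough sketch, not a definitive tool: it only looks for the "patch_level=0x..." token shown in the example above, the function name microcode_levels is made up for illustration, and the full "microcode: CPU0: ..." line layout in the sample input is an assumption that may differ between kernel versions.

```python
import re

# Matches the "patch_level=0x..." token shown in the example output above.
PATCH_RE = re.compile(r"patch_level=(0x[0-9a-fA-F]+)")


def microcode_levels(dmesg_text):
    """Return the sorted distinct microcode patch levels (as ints) found in the text."""
    return sorted({int(value, 16) for value in PATCH_RE.findall(dmesg_text)})


if __name__ == "__main__":
    # Assumption: sample lines in a plausible dmesg layout; only the
    # patch_level token itself is taken from the example in this post.
    sample = (
        "microcode: CPU0: patch_level=0x600062e\n"
        "microcode: CPU1: patch_level=0x600062e\n"
    )
    levels = microcode_levels(sample)
    for level in levels:
        print(f"patch_level=0x{level:x}")
    if len(levels) > 1:
        print("warning: CPUs report different microcode patch levels")
```

Converting the values to integers means that 0x600062e and 0x0600062e are treated as the same revision, which makes it easier to compare output posted from different hosts.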

patm521
Contributor

Dell released updated BIOS version 3.0.5 for the R815 yesterday (4/24/13).  In the Fixes and Enhancements section it states it corrects the PSOD hang with Opteron 62xx series procs.  I have not yet confirmed if there was an update for the R515.

http://www.dell.com/support/drivers/us/en/555/DriverDetails/Product/poweredge-r815?driverId=F8FCX&os...

dxcrp
Contributor

Thanks for the update.  After what happened, I think I'll wait another few months before trying the new BIOS.  FYI, a few weeks back I contacted Dell and pointed them to this thread.  I think it was brennanmichaelj's note that they paid close attention to, and they escalated my problem to their engineering team.  They then suggested going back to the 2.9.0 BIOS.  I did, and so far so good: no PSOD.  I think I'll stick with what works for now.  Thanks for the information, everyone.
