Hi,
Today I had a purple screen on a new Dell R515 with the latest BIOS (2.0.2).
We bought 2 of these machines in 2012 with BIOS 1.10.0 and had no problems.
Yesterday I installed 2 additional R515s with BIOS 2.0.2 (installed at the factory), and one of the two crashed last night.
Are there BIOS settings I should disable?
Like DMA Virtualization?
C1E?
Power settings set to High Performance in the BIOS instead of OS control?
On one of the new machines I will try to downgrade the BIOS and firmware to the versions on the first 2 machines and test it.
Regards Michael
Add the Dell R715 to the list of affected servers too. I am using AMD 6274 processors. I think VMware 4.1 Update 3 is allergic to Dell BIOS upgrade 3.0.4. I am also getting occasional purple screens for no apparent reason. I disabled C1E, disabled vMotion, and tried both local storage and SAN storage, to no avail. I've spent a lot of time diagnosing all the hardware but can't find any problems. I also contacted Dell support to see if they knew anything, but they say all the hardware checks out and just referred me to VMware for support. I just love being stuck in the middle, don't you? I've just applied all the patches for 4.1; we'll see how it goes for the next 2 weeks. If I still get the purple screen I'll contact VMware for support and update everyone.
For all of you Dell customers, you really need to educate Dell on "microcode". This is something Dell receives from AMD and ships in its BIOS updates. Microcode updates are software fixes for CPU bugs.
You can read the errata here:
In particular, when we experienced this problem, we were running on microcode 0x600062e (or earlier) and the HP BIOS upgrade got us to microcode 06000629, which resolved our problems. Microcode 0x600062e was by far the worst and while it resolved some problems with PSODs, it created other problems, like individual VMs dropping off with "vmm64 fault 14" errors in the vmkernel logs, like this:
vmwareserver1: Feb 15 22:54:40 vmwareserver1 vmkernel: 14:07:58:16.617 cpu63:7755)WARNING: World: vm 7755: 9985: vmm1:virtualmachine1:vcpu-1:VMM64 fault 14: src=MONITOR rip=0xfffffffffc215821 regs=0xfffffffffc008670
vmwareserver2: Feb 27 14:15:03 vmwareserver2 vmkernel: 1:21:34:46.630 cpu3:4660)WARNING: World: vm 4660: 9985: vmm1:virtualmachine2:vcpu-1:VMM64 fault 14: src=MONITOR rip=0xfffffffffc27b36a regs=0xfffffffffc008d60
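If you collect vmkernel messages on a syslog host, a small script can pull these faults out and count them per VM. This is only a rough sketch: the regular expression is modeled on the two sample lines above, the function names are made up for illustration, and other ESX/ESXi builds may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the two sample log lines above (an assumption;
# other ESX/ESXi versions may word the message differently).
VMM_FAULT = re.compile(
    r"vmm\d+:(?P<vm>[^:]+):vcpu-(?P<vcpu>\d+):"
    r"VMM64 fault (?P<fault>\d+): src=(?P<src>\S+) rip=(?P<rip>0x[0-9a-fA-F]+)"
)


def find_vmm_faults(lines):
    """Return one dict per log line that contains a VMM64 fault."""
    faults = []
    for line in lines:
        match = VMM_FAULT.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


def faults_per_vm(lines):
    """Count VMM64 faults per VM name."""
    return Counter(fault["vm"] for fault in find_vmm_faults(lines))
```

Feed it the lines of whatever file your syslog server writes the vmkernel messages to. A VM that shows up repeatedly here is worth mentioning when you open the support case.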
HP did the same thing to me, stating that hardware checks out ok and to go blame VMware for this. VMware found that this issue was AMD microcode related and their backline support worked with HP Engineering directly on this. It took a while for HP to understand and get it. It sounds like you're running into that same problem with Dell, so keep pointing them at microcode.
One thing you can do to point Dell in the right direction is to reference HP's BIOS update, where they have fixed this specific issue:
http://alerts.hp.com/r?2.1.3KT.2ZR.yrG0Y.IwG%5fog..T.c41G.7mom.bW89MQ%5f%5fDGeGFSE0
Any AMD 6200 Series processor is affected, so the only solution for this problem is a microcode update. Technically, this is in Dell's court to fix. I believe VMware can also push down microcode in ESXi 5, so you may be able to pursue this route with VMware.
If you are a Dell customer experiencing this problem, I would be interested in the hardware model, BIOS revision and AMD microcode revision. The next time you get a PSOD, pop in a CentOS Live CD and boot it up. Once it has booted to the console, type "dmesg | grep micro" and post the output to this forum.
For Example:
BL685c G7
Version: A20
Release Date: 12/09/2012
microcode_amd_fam15h.bin
patch_level=0x600062e
DL585 G7 BIOS 12/09/12
Version: A16
Release Date: 12/09/2012
microcode_amd_fam15h.bin
patch_level=0x600062e
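If you have many hosts to check, the patch level can be pulled out of saved dmesg output with a few lines of Python. This is a minimal sketch that assumes the output contains "patch_level=0x..." as in the examples above; the function names are hypothetical.

```python
import re

# Matches the "patch_level=0x..." field shown in the examples above.
PATCH_LEVEL = re.compile(r"patch_level=(0x[0-9a-fA-F]+)")


def microcode_patch_levels(dmesg_text):
    """Return the distinct patch levels found in dmesg output, sorted, as integers."""
    return sorted({int(value, 16) for value in PATCH_LEVEL.findall(dmesg_text)})


def format_levels(levels):
    """Render patch levels as hex strings for posting."""
    return [hex(level) for level in levels]
```

Save the dmesg output to a file, read it in, and pass the text to microcode_patch_levels. Normally you should get a single value back; more than one would mean the lines report different patch levels, which is worth including in your post.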
Dell released updated BIOS version 3.0.5 for the R815 yesterday (4/24/13). In the Fixes and Enhancements section it states it corrects the PSOD hang with Opteron 62xx series procs. I have not yet confirmed if there was an update for the R515.
Thanks for the update. After what happened I think I'll wait another few months before trying the new BIOS. FYI, a few weeks back I contacted Dell and pointed them to this thread. I think it was BrennanMichael's note that they paid close attention to, and they escalated my problem to their engineering team. They then suggested going back to the 2.9.0 BIOS. I did, and so far so good, no PSOD. I think I'll stick with what works for now. Thanks for the information, everyone.
