kbulgrien4freed
Enthusiast

PowerEdge 2950: ESXi 6.0 boot: VMB: 85 Halting. E1420 CPU BUS PERR.

A PowerEdge 2950 II running VMware ESXi, 6.0.0, 5050593 Image Profile (Updated) ESXi-6.0.0-20170202001-standard has been running without issue for quite some time, and the underlying hardware has had no issues for several years.  Recently, an Intel 350T2V2 NIC was installed and configured for use, then a Dell SAS 6 GB HBA External Controller Card 7RJDT was installed.  Neither installation had a negative impact on system stability.

Next, upon replacing four (4) Crucial 4GB 240 Pin 512Mx72 DDR2 PC2-5300 CL5 ECC DIMMs with eight (8) A-TECH 8G DDR2 PC2-5300 ECC FULLY BUFFERED DIMMs, the BIOS memory check passed, but seemed to proceed very slowly.  ESXi started to boot, but took an extraordinarily long time at the /sb.v00 and /s.v00 steps of the "Loading VMware Hypervisor" stage.  Eventually, a very long time later, a message appeared stating "Relocating modules and starting up the kernel...".  Again, a significant amount of time transpired.  Then the screen blacked out and this appeared:

VMB: 398: Unexpected exception 2 @0x41800e06957e

VMB: 405: cr0 0x8001003d cr2 0x0 cr3 0x100803000 cr4 0x30

VMB: 407: error code 0x2 rip 0x41800001eee0 cs 0x8

VMB: 409: rflags 0x86 rsp 0x42800001eee0 ss 0x0

VMB: 411: rax 0x12345678 rcx 0x101ffff rdx 0xffff4c000

VMB: 413: rbx 0x0 rbp 0x0 rsi 0x1000

VMB: 415: rdi 0xffff81100004c000 r8 0x2 r9 0x23

VMB: 417: r10 0x8000000000000003 r11 0x0 r12 0xffff4c

VMB: 419: r13 0x420000045221 r14 0xd r15 0x0

VMB: 420: gs 0x10 fs 0x10

VMB: 422: FSbase:0x0 GSbase:0x417rce236200 kernelGSbase:0x0

VMB: 139: [0x42800001eee0] 0x41800e06957e

VMB: 139: [0x42800001ef00] 0x41800e06a0ad

VMB: 139: [0x42800001ef900] 0x41800e814c24

VMB: 139: [0x42800001efc0] 0x41800e000fb8

VMB: 85: Halting.

At the same time, the PowerEdge 2950 front panel LCD switched from blue to amber and reported:

  E1420 CPU BUS PERR

At this point the system is dead and must be powered off.
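
For anyone reading a similar dump: if the number in the "Unexpected exception" line is the x86 exception vector (an assumption on my part, not something the VMB output states), then vector 2 is the non-maskable interrupt, which would be consistent with the hardware flagging a bus parity error. A minimal sketch for decoding that line follows; the function names are made up for illustration and the table lists only a few common vectors.

```shell
# Map an x86 exception vector number to its architectural name.
# Only a handful of vectors are listed; anything else falls through.
x86_exception_name() {
    case "$1" in
        0)  echo "Divide Error (#DE)" ;;
        1)  echo "Debug (#DB)" ;;
        2)  echo "Non-Maskable Interrupt (NMI)" ;;
        3)  echo "Breakpoint (#BP)" ;;
        6)  echo "Invalid Opcode (#UD)" ;;
        8)  echo "Double Fault (#DF)" ;;
        13) echo "General Protection (#GP)" ;;
        14) echo "Page Fault (#PF)" ;;
        18) echo "Machine Check (#MC)" ;;
        *)  echo "unknown or not listed" ;;
    esac
}

# Pull the vector number out of a VMB "Unexpected exception" line.
# Reads text on stdin; prints nothing if no such line is present.
vmb_exception_vector() {
    sed -n 's/^VMB: [0-9]*: Unexpected exception \([0-9][0-9]*\) .*/\1/p'
}
```

Typing the first line of the dump into a file and piping it through `vmb_exception_vector`, then passing the result to `x86_exception_name`, gives the name for the vector.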

The RAC System Event Log shows entries like:

  Entry 007 of 007

  Severity: Non-Recoverable

  Date and Time: Wed May 10 13:48:12 2017

  Description:

  CPU Bus PERR: Processor sensor, transition to

  non-recoverable was asserted.
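
Since the box does boot the CentOS live DVDs, the same event log can probably also be read from Linux rather than through the RAC. The sketch below only filters text for PERR entries; feeding it from `ipmitool sel elist` is an assumption (ipmitool is not something used in this thread, and it needs root plus the IPMI kernel modules loaded). The function names are hypothetical.

```shell
# Print only the event log lines that mention a parity error (PERR).
# Reads log text on stdin, so it works with output saved from any tool.
sel_perr_lines() {
    grep -i 'PERR'
}

# Count the matching lines; prints 0 (and still succeeds) when none match.
sel_perr_count() {
    grep -ci 'PERR' || true
}
```

Possible usage, assuming ipmitool is installed: `ipmitool sel elist | sel_perr_lines`.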

Dell forums show a flurry of PowerEdge 1950/2950 CPU Bus PERR reports in the Apr-May 2008 time frame, but no conclusive resolutions were spotted, though it seemed apparent that Dell eventually acknowledged an issue and that a related RHEL OS patch was issued. Xeon E5xxx processors were mentioned, and this system has Xeon E5345 CPUs.  Various posts seemed to suggest the issue might be related to virtualization.

Various BIOS setting changes have been tested per a number of Dell / VMware forum posts to no avail.

The system successfully boots both the CentOS 7 1503 Live KDE 64-bit and CentOS 6.5 Live KDE 32-bit DVDs, though one gets the impression that the system may be running a bit slow.

One is led to suspect the new DIMMs triggered this situation, but it seems overly hasty to remove a 64GB upgrade and return to a 16GB configuration, since 16GB of RAM is not going to support the VMs planned for this system.  To this end, research continues.
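
Before blaming the DIMMs outright, it may be worth confirming from a live Linux session that all eight modules are actually detected at full size. A rough sketch, assuming `dmidecode --type memory` output is available (run as root); the function name is hypothetical, and the parsing only looks at the per-module "Size:" lines.

```shell
# Sum the sizes of populated DIMMs from `dmidecode --type memory` text on stdin.
# Prints the total in MB. Empty slots ("Size: No Module Installed") are skipped.
# Handles both "Size: 8192 MB" and "Size: 8 GB" style lines.
total_dimm_mb() {
    awk '$1 == "Size:" && $3 == "MB" { sum += $2 }
         $1 == "Size:" && $3 == "GB" { sum += $2 * 1024 }
         END { print sum + 0 }'
}
```

Possible usage: `dmidecode --type memory | total_dimm_mb`. With eight 8 GB modules all reported, the total should come to 65536.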

2 Replies
kbulgrien4freed
Enthusiast

The system BIOS is 2.0.1, and it appears that a 2.7.0 is available.  I have a Dell Server Update Utility (SUU_741_x32_96) DVD issued March, 2014 and found a later SUU published December, 2014 still supports the PowerEdge 2950, so I'm downloading it.  The SUUs issued in 2015 seem to have dropped support for the PowerEdge 2950.
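
A quick way to check whether the installed BIOS is older than the available one from a Linux shell, as a sketch only: it assumes GNU `sort` with the `-V` (version sort) option, and the function name is made up for illustration.

```shell
# Return success (0) when version $1 is strictly older than version $2.
# Relies on GNU sort's -V option to order dotted version strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Possible usage, as root: `version_lt "$(dmidecode -s bios-version)" 2.7.0 && echo "BIOS is older than 2.7.0"`.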

Attempts at using the SUU on CentOS 7 1503 64-bit Live KDE DVD fail in a number of ways.  The suu utility script bombs out with basic shell errors, and while attempting to use the repository resources directly, the Linux environment seems to have either bogged down and become unusable or has cratered.

Using the CentOS 6.5 i386 Live KDE DVD seems a reasonable next attempt.

kbulgrien4freed
Enthusiast

The suu utility runs better in the CentOS 6.5 32-bit Live DVD environment, but still bombs on various arguments tried while figuring out how to use it.

The SUU DVD has a repository/PE2950_BIOS_LX_2.7.0.BIN file that contains a BIOS upgrade.  Running it gives an error that reports a compat-libstdc.i686 RHEL package is needed to use the utility.  There does not appear to be any such package in CentOS, but `yum search libstdc` shows compat-libstdc++-296.i686 and compat-libstdc++-33.i686 packages are available.  Having recalled that compat-libstdc++-296-2.96-132.7.2.i386.rpm was once installed on a Mandriva OS when trying to get Dell tools to work on a PowerEdge 4400, compat-libstdc++-296.i686 is attempted first, but the BIOS utility still fails.

Upon installing compat-libstdc++-33.i686, the BIOS update applies.
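
For future reference, the working sequence boils down to two commands. The sketch below just prints them for review rather than running anything against the firmware. The `/mnt/suu` mount point is a hypothetical placeholder, and invoking the package through `sh` is an assumption (the post above does not record exactly how it was run); the package and file names are the ones mentioned above.

```shell
# Where the SUU DVD is mounted. /mnt/suu is a placeholder; override as needed.
SUU_MOUNT="${SUU_MOUNT:-/mnt/suu}"
BIOS_PKG="$SUU_MOUNT/repository/PE2950_BIOS_LX_2.7.0.BIN"

# Print the update steps, one command per line, so they can be checked
# before being run by hand as root.
bios_update_plan() {
    echo "yum install -y compat-libstdc++-33.i686"
    echo "sh $BIOS_PKG"
}
```

Running `bios_update_plan` prints the two steps; nothing is installed or flashed until they are typed in manually.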

At this juncture, VMware ESXi boot does not slow down at /sb.v00 and /s.v00, and in fact starts up without triggering an exception!

By this time, the newer SUU released December, 2014 has finished downloading.

The CentOS 6.5 32-bit Live DVD environment is booted again, and `suu -u` is attempted again.  This time, the utility works, albeit with numerous console errors apparently triggered as firmware for the system components is upgraded.

In the end, most system components seem to have updated:  the PERC RAID controller, system baseboard, RAC, RAC Utility, etc.

The system seems to be fine, so attention returns to spinning up VMs... What an adventure!

This conversation has been posted in case it helps someone (me?) in the future, though I'm probably nuts for running this old hardware.
