VMware Cloud Community
SCampbell1
Enthusiast

Illegal Opcode (Red Screen of Death) after upgrade to ESX35u3

After using CDROM media to upgrade the first HP DL580G5 in an ESX cluster from ESX35u2 to ESX35u3, I get an illegal opcode on reboot.

I did not think to detach from the SAN during the upgrade (it's an upgrade, for goodness' sake; why should I have to?), so perhaps the installer reconfigured the system to boot from one of the SAN drives. After I detached from the SAN and ran the upgrade again, it clearly wrote the master boot record and upgraded the system on the correct local drive, but the server still RSODs on reboot.

Question: Is there an easy and safe way I can sneak in and fix this boot configuration? Is this the likely problem, or are there other possibilities?

I had used the same ESX35u3 media to upgrade a SAN-attached DL380G5 (not in an ESX cluster) with no problems, so was quite surprised and disappointed when this one failed.

I have cruised the forums for this, and there are hints of doing stuff to the QLA drivers from 2006 but no concrete pointers. If necessary, I'll try a full reinstall, but that will be somewhat unpleasant.

Thanks!!!

13 Replies
Texiwill
Leadership

Hello,

> After using CDROM media to upgrade the first HP DL580G5 in an ESX cluster from ESX35u2 to ESX35u3, I get an illegal opcode on reboot.

I would do several things, but primarily I would update the BIOS on the DL580 as well as the firmware for your PCI cards and devices. The other is to verify that there are no obvious hardware issues.

> I had used the same ESX35u3 media to upgrade a SAN-attached DL380G5 (not in an ESX cluster) with no problems, so was quite surprised and disappointed when this one failed.

A 380 is very different from a 580.

> I have cruised the forums for this, and there are hints of doing stuff to the QLA drivers from 2006 but no concrete pointers. If necessary, I'll try a full reinstall, but that will be somewhat unpleasant.

I had similar problems, and a BIOS upgrade fixed them.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

SearchVMware Blog: http://itknowledgeexchange.techtarget.com/virtualization-pro/

Blue Gears Blogs - http://www.itworld.com/ and http://www.networkworld.com/community/haletky

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
SCampbell1
Enthusiast

Thanks for this.

The BIOS was Jan 2008 which I thought would be recent enough.

I'll give the Sep 2008 version a whirl and will post the result.

SuryaVMware
Expert

Any chance of posting the PSOD (Purple Screen of Death) screenshot here? Guess you could use iLO to capture the screen.

-Surya

SCampbell1
Enthusiast

Sorry for the delay. I didn't get the posting notification.

I applied the Sep 2008 HP580G5 BIOS update today and the result was the same. I will be onsite again on Tuesday and will spend the time needed to resolve this.

I did write down the PSOD screen (pink, red, purple: so many colours for the same problem) and will post that for your information, as well as the final result.

My plan is:

  • Post the Illegal Opcode screen (It's 4-5 lines, the first is "Illegal Opcode" and the remaining lines are register settings) to this forum

  • Confirm the System BIOS change was applied. I am 99 34/100ths % certain it was.

  • Confirm the QLA BIOS versions and upgrade the BIOS if required

  • Re-apply the ESX update

  • Re-install ESX, reformatting the local drive

During the above plan, the HBAs will be disconnected from the switch fabric. I'll plug them back in once ESX comes up.

If anyone has a comment, it would be muchly appreciated.

Again thanks to all

SuryaVMware
Expert

Having worked for VMware for almost 5 years, I have seen a lot of PSODs, and most of them are hardware related. Looking at the PSOD screen, I will be able to tell you which component is causing the problem. If you could post the screen, that would be helpful.

-Surya

SCampbell1
Enthusiast

Anyone who can understand register settings is A-OK in my books. Thanks for this!!!

Here's what the screen says from my hand-written notes:

Illegal Opcode

EAX=0000543E BX=00007000 CX=00646165 DX=00000080

EBP=0000FBFB ESI=00007C19 EDI=00000000

DS=0000 ES=0800 FS=0000 GS=0004

CS:EIP=0000:0000836C SS:ESP=0000:0000FFFF

EFlags 00000202

(It is possible BX=00607000)
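
For readers unfamiliar with this kind of dump: at this stage the CPU is still in 16-bit real mode, where a segment:offset pair maps to the physical address segment * 16 + offset. The following is a minimal Python sketch (the function names are illustrative, not from any VMware or GRUB tool) that decodes the CS:EIP value from the notes above and checks whether it falls inside the 512-byte boot sector that a BIOS conventionally loads at 0x7C00.

```python
BOOT_SECTOR_BASE = 0x7C00   # conventional BIOS load address for the boot sector
BOOT_SECTOR_SIZE = 512      # one sector


def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


def parse_seg_off(text: str) -> tuple[int, int]:
    """Parse a 'SSSS:OOOOOOOO' hex pair as written in the dump."""
    seg, off = text.split(":")
    return int(seg, 16), int(off, 16)


def in_boot_sector(addr: int) -> bool:
    """True if addr lies within the sector loaded at 0x7C00."""
    return BOOT_SECTOR_BASE <= addr < BOOT_SECTOR_BASE + BOOT_SECTOR_SIZE


cs, eip = parse_seg_off("0000:0000836C")   # value copied from the notes above
fault_addr = real_mode_linear(cs, eip)
print(hex(fault_addr), in_boot_sector(fault_addr))
```

The faulting address works out to 0x836C, which is in low memory just past the boot sector itself. That is consistent with a failure in early boot-loader code rather than in the ESX kernel, although the register values alone do not prove which loader stage is at fault.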

Again, thanks.

SuryaVMware
Expert

This is definitely coming from the GRUB loader. Can you get to the grub prompt?

If so, can you try the following?

grub> device (hd0) /dev/cciss/c0d0

grub> root (hd0,0)

grub> setup (hd0)

Let me know if this helps.

-Surya

SCampbell1
Enthusiast

Thanks Surya,

This Illegal Opcode pops up as soon as the POST finishes and the grub window doesn't show at all.

Is there some key I can press to try to grab grub?

When I ran the update the first time, the HBA was connected although I'm not aware of any Unix LUNs presented to the server, but perhaps something happened then.

When I ran update each time after that, the HBA was not connected, and the boot loader record was definitely written to cciss\d0p0 (I may have the letters wrong, but it was definitely the P400 adapter)

After booting from the update CDROM, the update does find the existing ESX implementation and updates it as you expect.

Again, thanks for this.

SuryaVMware
Expert

You need to press 'c' when the grub menu shows up. Let me know how this works.

-Surya

Texiwill
Leadership

Hello,

If the GRUB menu does not appear, then either GRUB is having issues, the MBR is corrupt, or both. You can try to repair the MBR by booting from the boot disk and pressing ALT-F1 (when in text mode) after it asks you for a mouse. You then have access to a shell, where you need to follow the usual Linux MBR repair steps, which means you need to know which partitions hold /boot and /.
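
Before attempting a repair, it can help to confirm whether the first sector of the boot disk still looks like an MBR at all. The following is a minimal Python sketch, not a VMware tool. It checks only the 0x55 0xAA signature in the last two bytes of the sector and looks for the string "GRUB" in the boot code area, on the assumption that the installed loader is GRUB legacy, whose first stage normally embeds that string. The device path in the usage comment is the one from the GRUB commands earlier in the thread. Python may not be available in the installer shell, so the sector may have to be copied to a file with dd and inspected elsewhere.

```python
SECTOR_SIZE = 512
BOOT_CODE_END = 446          # bytes 0..445: boot code; 446..509: partition table


def inspect_mbr(sector: bytes) -> dict:
    """Run a few sanity checks on the first sector of a disk."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    sector = sector[:SECTOR_SIZE]
    boot_code = sector[:BOOT_CODE_END]
    return {
        # A valid MBR ends with the bytes 0x55 0xAA.
        "has_boot_signature": sector[510:512] == b"\x55\xaa",
        # Assumption: GRUB legacy stage1 contains the literal text "GRUB".
        "mentions_grub": b"GRUB" in boot_code,
        # An all-zero boot code area means no loader is installed here.
        "boot_code_blank": not any(boot_code),
    }


def read_first_sector(path: str) -> bytes:
    """Read the first 512 bytes of a device or image file."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


# Example (requires root; path assumed from the thread):
# print(inspect_mbr(read_first_sector("/dev/cciss/c0d0")))
```

A sector that passes these checks can still hold a broken loader, for example one that points at a stage file that has moved, so a clean result narrows the problem down rather than ruling out the MBR.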

The other option, and the one I would take, is to reinstall while preserving VMFS. Make sure no fibre devices are attached during the install.


Best regards,

Edward L. Haletky

SCampbell1
Enthusiast

Thank you both for your assistance.

The PSOD occurred before the grub menu appeared so even though I pressed the c key continuously after the POST, the PSOD always appeared.

In the end, I followed the "safe" route and reinstalled with the SAN disconnected, and reconfigured the server to match its previous settings.

NZSolly
Contributor

I just don't understand why you didn't use Update Manager in VC, or am I missing something?

SCampbell1
Enthusiast

This was our first implementation of 3.5u3 on a production server, and I thought this would be a safer alternative to remediating the one server. We also wanted to use the shutdown to flip the NX flag in the BIOS so we could get ready to implement EVC on the cluster.

I guess I was wrong, and in another life will use VUM even for minor version upgrades.

Thanks...
