I was the customer who worked with HP on the 12/17/12 BIOS fix.
For those Dell customers who are still affected, you want to push Dell to update the AMD microcode in their BIOS to microcode level 0x06000629.
VMware indicated that they think the bug we were encountering is related to AMD Errata #734.
We have had HP BIOS 12/17/12 in our environment for a couple of weeks now, and the spontaneous reboots, Machine Check Exceptions (MCEs), VMM64 page fault 14 errors, PF #14 PSODs (for single VMs or vmotionStream), and other issues involving virtual machine memory corruption have all gone away.
We're running ESX 4.1, but this will affect both ESX and ESXi 4.x and 5.x. We specifically had problems with HP DL585 G7s and BL685c G7s, all running AMD 6200 Series processors.
HP BIOS 03/19/12 - ESX servers crashing, MCEs, PSODs (PF #14 on individual VMs)
HP BIOS 08/15/12 - ESX servers crashing, MCEs, PSODs (PF #14 on individual VMs)
HP BIOS 12/09/12 - MCEs, PSODs (not as many, but PF #14 on individual VMs or vmotionStream), introduction of VMM64 page fault 14s (which cause VMs to crash, both Linux and Windows) and memory corruption errors on VMs (Windows DLL crashes in Event Viewer)
If you need to figure out the microcode level, boot something like a CentOS LiveCD on the ESX host and run 'dmesg | grep "micro"'. It should output something like:
microcode_amd_fam15h.bin
patch_level=0x600062e
If the patch level is not at least 0x06000629, then you will experience problems. Push your vendor to fix the problem.
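If you have a lot of hosts to check, the comparison can be scripted. This is only a rough sketch, not anything HP or VMware supplied: it assumes the dmesg output contains a `patch_level=0x...` field like the sample above (the exact wording varies between kernel versions), and the function names are made up for illustration. The numeric comparison only makes sense for the same processor family (the AMD 6200 Series hosts discussed here).

```python
import re

# Minimum microcode level named in this post (AMD 6200 Series hosts).
MIN_PATCH_LEVEL = 0x06000629


def find_patch_levels(dmesg_text):
    """Return every patch_level value found in the text, as integers.

    Assumes lines of the form 'patch_level=0x600062e'; other kernels
    may format this differently.
    """
    matches = re.findall(r"patch_level=(0x[0-9a-fA-F]+)", dmesg_text)
    return [int(value, 16) for value in matches]


def is_patched(dmesg_text, minimum=MIN_PATCH_LEVEL):
    """True if every reported level is >= minimum, False if any is lower,
    None if no patch_level field was found at all."""
    levels = find_patch_levels(dmesg_text)
    if not levels:
        return None
    return min(levels) >= minimum


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python check_microcode.py
    result = is_patched(sys.stdin.read())
    if result is None:
        print("No patch_level found in input")
    elif result:
        print("Microcode is at or above 0x%08x" % MIN_PATCH_LEVEL)
    else:
        print("Microcode is below 0x%08x, push your vendor for a BIOS update" % MIN_PATCH_LEVEL)
```

From the LiveCD you would pipe `dmesg` into it. A result of "No patch_level found" means the output format on that kernel is different and you need to read it by eye.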
HP was able to reproduce the bug by sending traffic back and forth between VMs, using some kind of network-specific stress utility.