I have a new server with a fresh installation.
Inside I have 2 EG0600FCSPL disks in RAID 1.
There are currently 3 VMs running: 2 Server 2012 R2 and one Windows 8.1.
I got the first PSOD 14 days ago; it popped up during installation. I updated the firmware directly from the server.
Now that the server is ready to go live (domain is up, computers are in the domain, etc.), I got a PSOD again yesterday. After reading some forums, I thought the problem may be the HP power saving features, so I set everything to high performance. During the night I got a new PSOD. In the attachment are the PSODs from yesterday and from last night. All were about "No heartbeat".
I have also noticed that when I log in to the server in the morning, it takes about 30 seconds to log in. After the first login it takes about 5 seconds, much faster. I have noticed this from the start until today. Maybe it is connected somehow.
Any idea what can be wrong?
Tnx
Mathjaz.
Regarding power management.
I've never had any success with power management enabled.
Power management means PSOD.
In the BIOS you should set power management to high performance, disable all "C" states and set cooling to maximum.
We have suffered PSOD before due to power management and now ALWAYS disable it.
It looks like you have the same problem as this other thread: HP ProLiant DL380e Gen8 - ESXi 5.5.0 build 2143827 - PSoD
I recommend following HP's VMware Recipe for building out your server. After rebuilding according to HP's specs for firmware, drivers and the custom ISO image, we have not had any problems like this with the servers in our environment.
Here's the recipe we used from HP for our own servers: http://vibsdepot.hp.com/hpq/recipes/June2014VMwareRecipe13.0.pdf
Yesterday after posting I noticed that I had installed ESXi in October, and the newest version is the same build but with a November date. So I reinstalled (clean install). So far so good, but it has only been up for 1 day. I hope to come back after a week with a good reply.
PSOD again.
Today I tried to upgrade the firmware from Intelligent Provisioning. It found new firmware for the network device. I tried to upgrade it, but when it finished and rescanned the hardware and firmware, the same network device was listed again with the old firmware...
After rebooting, the server went crazy (fans at 100%, iLO status unknown). I was still connected to iLO and ping was working, but no command worked. I pressed power off and it went off, but I couldn't power it on from iLO. After manually pressing power on, it went crazy again... We manually held the power button for 5 seconds and cut the electricity for 1 minute, and after that power-on was normal.
The same thing happened last time, but I thought it was a one-time incident...
And the PSOD was the same as the one I posted in the first post. Does anyone know what it means?
Hi there,
you had a Non-Maskable Interrupt (see "Non-maskable interrupt" on Wikipedia) interfering with the hardware. The last instruction before the host crashed is related to a spinlock (see "Spinlock" on Wikipedia). It seems that the CPU was waiting for instructions but didn't get any (it got locked up), and after two ticks there was a check that induced a kernel panic via NMI to maintain data integrity.
I'd suggest checking the iLO for any error or critical events. My suspicion is a faulty CPU or motherboard, given that the host went crazy as you say; this does not seem to be tied to any other hardware or driver. Get HP support involved and have your hardware checked and exchanged.
Hey Mathjaz.
When you update firmware on these HP servers I'd suggest you do NOT update individual firmware.
Download the HP SPP 2014.09.00 and use it as your base package.
As mentioned by LeslieBNS9, HP labs also release what they call a recipe. This is a suggested firmware versus software compatibility matrix.
Personally I'd just go with the SPP 2014.09.00.
From an ESXi perspective:
Download the HP custom ESXi 5.5 U2 image from VMware.
Once installed you need to disable 2 drivers if you want this to be stable:
The SMX driver, which interacts with SIM (which I doubt you are using),
and the AMS driver, which interacts via PCC with power management on the Gen8 servers. You don't need this unless you want PSODs.
PuTTY into your server and run the commands below. Once complete, reboot the host.
# Get current state of CIMvmw_hp-smx-providerProviderEnabled
esxcfg-advcfg -g /UserVars/CIMvmw_hp-smx-providerProviderEnabled
# Set CIMvmw_hp-smx-providerProviderEnabled state to 0 (disabled)
esxcfg-advcfg -s 0 /UserVars/CIMvmw_hp-smx-providerProviderEnabled
# Get the state of CIMvmw_hp-smx-providerProviderEnabled again to confirm the change
esxcfg-advcfg -g /UserVars/CIMvmw_hp-smx-providerProviderEnabled
# Restart the CIM watchdog and check its status
/etc/init.d/sfcbd-watchdog stop
/etc/init.d/sfcbd-watchdog start
/etc/init.d/sfcbd-watchdog status
# Disable the AMS driver
/etc/init.d/hp-ams.sh stop
# Run this command to remove the VIB
esxcli software vib remove -n hp-ams
The site changed my hash characters to "1."; if you see a "1." anywhere, replace it with a hash (Shift+3 on your keyboard). Lines starting with a hash are comments, not commands.
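If you want to confirm after the reboot that the hp-ams VIB is really gone, a small check like the one below could be used. This is only a sketch: the function names vib_present and check_ams_removed are made up for illustration, and it assumes the VIB name appears at the start of a line in the output of esxcli software vib list.

```shell
# Sketch only: vib_present and check_ams_removed are hypothetical helper names.

# Reads `esxcli software vib list` output on stdin.
# Succeeds (exit 0) if a line starts with the VIB name given as $1.
vib_present() {
    grep -q "^$1[[:space:]]"
}

# Reports whether hp-ams is still installed on this host.
# Assumes it runs in the ESXi shell, where esxcli is available.
check_ams_removed() {
    if esxcli software vib list | vib_present hp-ams; then
        echo "hp-ams is still installed"
        return 1
    fi
    echo "hp-ams is not installed"
}
```

Paste both functions into the SSH session and run check_ams_removed. For the SMX setting, rerunning the esxcfg-advcfg -g command above is enough.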
Hello
We have replaced the motherboard of the server, and the PSOD problem is still the same.
I tried what markzz suggested, and the server was up for only a few hours, then PSOD again.
I don't have SPP 2014.09, but I was also trying to update the firmware through F10 Intelligent Provisioning. Again a problem: now I get "Unable to contact update server" every time I try (I have been trying for over a week now).
I also set the server to maximum performance.
Any more ideas?
I really don't know what to do anymore.
Hi there,
you can get the SPP from here: HP® Servers - Service Pack for ProLiant
The error is still reported on PCPU0, so my best bet would be to exchange the 1st CPU. If you run some stress tests on the whole NUMA node and get it to crash (while the next node does not), you can be almost certain that this is a hardware fault.
Alistar may have something of an answer there.
How about you remove CPU1?
Move CPU2 to CPU1's socket and test again.
Regarding the SPP:
You can either create a DVD from the ISO or mount the ISO via the iLO.
Once booted you only need to choose to update firmware from DVD media. Don't try to go online; it all gets messy with proxies etc.
That's a great idea! You will find an example stress test (albeit for RAM) in my article: Stress Testing an ESXi Host with Windows Server VMs | VMXP
Tnx Alistar,
I have downloaded SPP 2014.09.00 and automatically updated the server. Afterwards I also checked F10 Intelligent Provisioning, which is now working OK, and no new updates are available.
Meanwhile I decided to insert a new SD card and installed vSphere 5.5 U1 Dec 2014, since it has a newer date. (Release build 1746018)
A stress test is also a good idea in some way.
But since I think the problem occurs when the server goes to sleep and then PSODs (not every time), I don't know if it is the best test.
I will stress test it for a few days. Hope it survives.
Thanks all, I will reply, hopefully in a few days, not tomorrow.
Mathjaz.
Today the system is still stable.
Next week the server will go into production.
Thanks all for the help.