-
1. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
Texiwill Sep 7, 2007 4:17 PM (in response to RUG201110141)
Hello,
All NMIs are produced by the hardware, generally by CPU- or memory-related faults. This is definitely a hardware issue. It could be heat-related or something similar, which means diagnostics must run for more than 10 hours, sometimes even 100 hours, to reproduce the problem. Short runs will not vet the hardware if it is that type of problem.
Best regards,
Edward
-
2. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
RUG201110141 Sep 10, 2007 12:22 PM (in response to Texiwill)
Yeah, definitely hardware related. I don't know exactly what has failed, but I happened to have a spare server of the exact same model. So I took the CPU tray, hard drives, HBAs, and memory out of the faulty machine, placed them in the spare, and voilà, no problems whatsoever. Now I get to argue with IBM some more.
-
3. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
alexonline2 Sep 19, 2007 7:24 AM (in response to RUG201110141)
After the upgrade from 3.0.1 to 3.0.2 I have exactly the same problem. Our server ran fine with 3.0.1; with that version it had worked for several months without any problems. I don't think it is a hardware problem.
-
4. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
a.wolf Nov 22, 2007 5:47 AM (in response to alexonline2)
I have the same problem when I try to install ESX 3.0.2. With 3.0.1 everything works fine!
Do you have an LSI Logic SCSI controller in your server?
I have an LSI 53c1030 onboard (Intel motherboard), and when I installed patch ESX-7408807 the server crashed the same way as with 3.0.2.
-
5. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
Rob.Bohmann Dec 18, 2007 1:50 PM (in response to a.wolf)
Just got this today on a DL585 G1 (dual-core 2.4 GHz AMD 880s) running ESX 3.0.1 build 39823, though the core dump file posted a message about build 40087...
Just fishing to see whether anyone besides the poster above has seen this error message. I have an SR open and am pursuing all avenues.
16:22:33:11.714 cpu1:1152)Heartbeat: 469: PCPU 0 didn't have a heartbeat for 61 seconds. may be locked up
16:22:35:11.714 cpu1:1152)Heartbeat: 469: PCPU 0 didn't have a heartbeat for 181 seconds. may be locked up
16:22:38:41.812 cpu0:1024)Host: 3293: BEGIN
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... log
If the service console is bound to CPU 0 (PCPU 0) and cannot communicate (I guess no heartbeat implies that), then how is the service console keeping track of time to know how long it has gone without a heartbeat?
Inquiring minds would like to know...
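The log excerpt above hints at one possible answer: the warnings carry the prefix `cpu1:1152`, so they appear to be written by a check running on CPU 1, not by the stalled PCPU 0 itself. The Python sketch below illustrates that generic cross-CPU watchdog pattern under stated assumptions; it is not VMkernel's actual code. The class, its methods, and the 60/180-second thresholds are hypothetical, chosen only to mirror the 61 s and 181 s messages quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    # Hypothetical warning thresholds in seconds, picked only to mirror
    # the 61 s and 181 s messages in the log excerpt.
    thresholds: tuple = (60, 180)
    last_beat: dict = field(default_factory=dict)  # pcpu -> time of last beat
    warned: dict = field(default_factory=dict)     # pcpu -> thresholds already reported

    def beat(self, pcpu, now):
        # Called by the monitored CPU itself while it is still making progress.
        self.last_beat[pcpu] = now
        self.warned[pcpu] = 0

    def check(self, now):
        # Meant to run on a *different* CPU, using a shared clock, so a hung
        # PCPU cannot stop the check that measures how long it has been silent.
        messages = []
        for pcpu, last in self.last_beat.items():
            silent = now - last
            level = self.warned.get(pcpu, 0)
            if level < len(self.thresholds) and silent > self.thresholds[level]:
                messages.append(
                    f"PCPU {pcpu} didn't have a heartbeat for {int(silent)} seconds. may be locked up"
                )
                self.warned[pcpu] = level + 1
        return messages


monitor = HeartbeatMonitor()
monitor.beat(0, 0.0)         # PCPU 0 beats once, then hangs
print(monitor.check(61.0))   # first warning, reported by the checking CPU
print(monitor.check(181.0))  # second, escalated warning
```

The point of the design is that elapsed time is measured by whichever CPU runs `check`, so the stalled CPU never needs to keep time for itself.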
-
6. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
astronyth Feb 8, 2008 1:55 PM (in response to Rob.Bohmann)
I encountered the same problem this morning on an HP BL460c that had been running 3.0.1 with no problems. A few weeks ago I upgraded it to 3.0.2, and based on the posts of others here, I can't help but wonder whether the two are related. I'm planning to upgrade to 3.5 this weekend.
-
7. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
mixolydian Feb 18, 2008 12:58 PM (in response to astronyth)
Have you or anyone else found a solution for this? I have the same problem with 3.5 on an IBM x3500. I can recreate the problem by doing a storage rescan in the Management Interface. I reinstalled ESX and applied patches one at a time, testing after each patch, with the same results.
Thank you,
Brian
-
8. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
astronyth Feb 19, 2008 7:28 AM (in response to mixolydian)
To clarify my experience: while I was running 3.0.1 I never saw this problem. After upgrading to 3.0.2, during the couple of weeks before I upgraded to 3.5, it happened a handful of times on 3 different hosts. Since I upgraded to 3.5, the problem has not happened again.
-
9. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
brandt_triple Feb 29, 2008 5:00 AM (in response to mixolydian)
Same problem, also on an IBM x3500. Has anyone found a solution?
-
10. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
Friendlyware Mar 9, 2008 6:59 PM (in response to brandt_triple)
Hi, we have the same problem on two IBM x3400 machines. The machines were working fine with v3.0.2, but since we updated to v3.5 we get a similar error: CPU1:1075 Heartbeat 470 PCPU0 didn't have a heartbeat for 18s - may be locked up.
Has anybody found a solution for that?
-
11. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
brandt_triple Mar 12, 2008 7:06 AM (in response to brandt_triple)
Updated both the BIOS and the BMC on the server to the newest versions; now everything works.
-
12. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
Friendlyware Mar 12, 2008 4:58 PM (in response to brandt_triple)
Same here: we updated the IBM x3400s to the latest BIOS, and both our systems now run stable on ESX 3.5.
-
13. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
credfern Mar 24, 2008 1:44 PM (in response to Friendlyware)
I have this same problem, but on an IBM x346. I used the UpdateXpress CD to update the BIOS and BMC, but it did NOT fix the problem.
Any suggestions?
-
14. Re: NMI: 1193 and PCPU didn't have a heartbeat for 181 seconds
Henry Dorset Case Jun 16, 2008 11:31 PM (in response to credfern)
Hello,
I got the same errors on two brand-new IBM x3650s with BIOS 1.10, and in addition I got BIOS error 00180103, "Device resource allocation error". My QLogic FC adapter did not show up at boot time, and consequently the qla2300 module could not be loaded. I found an IBM support document at https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=MIGR-61047 (though it covers the HS20 with some additional hardware) stating that there may not be enough Option ROM space left at boot time. After I disabled PXE boot on the second onboard NIC, the machine came up clean and the PCPU error is no longer logged in vmkwarning.
Regards,
HDC