So, it's confirmed: without the BMC, ESXi has been stable for four days.
A friend told me that there are also issues with IPMI (i.e. the BMC) under Linux (a freeze after about two hours of uptime), and advised me to "never use IPMI on an eServer 326m".
The question/problem now: how do I disable IPMI on the BMC under ESXi?
I just tried two things (after reinstalling the BMC):
- on one server, I tried to disable CIM by unchecking the "Misc.CimEnabled" parameter
- on another server, I tried to disable the IPMI driver (with the command "esxcfg-module -d ipmi_si_drv" from the unsupported console)
Just wait and see...
Neither of these two "solutions" works... both of my servers were frozen this morning...
Will look for other solutions...
Maybe I've found a solution!
I've tried a lot of tuning: disabling the ipmi modules, no success (the modules are marked as disabled, but still get loaded).
I've tried stopping the sfcbd daemons, no success.
A lot of parameters, no success...
And finally, I deleted the ipmi modules (ipmi_devintf.o, ipmi_msghandler.o and ipmi_si_drv.o): that seems to be a working solution. My server has been up for 2 days now; I will wait a bit more, but it looks OK (before, it froze after a few hours at most).
You can't delete these modules directly in the /mod directory, because that directory is rebuilt at startup.
Here is how to do it:
- boot a live Linux (from a CD)
- mount /dev/sda5 and /dev/sda6 (these are the two "banks" of the firmware)
- extract the binmod.tgz archive, remove the 3 ipmi modules, rebuild this archive, and replace the original.
That's all. The ipmi modules will no longer be loaded, and the eServer will not crash or freeze anymore.
It will be necessary to redo that after each firmware update.
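The archive surgery in the steps above can be sketched as shell commands. Note the assumptions: the internal layout of binmod.tgz (a mod/ directory holding the .o files) and the stand-in file names are my guesses, not confirmed by this thread, so check the real archive structure on your firmware bank first. The sketch works on a scratch copy in /tmp so the commands can be tried safely; on the real box you would do the same inside the mounted firmware bank (/dev/sda5 and /dev/sda6 as described above).

```shell
# Simulated in /tmp: build a stand-in binmod.tgz, then perform the actual
# extract / remove / repack procedure from the steps above.
mkdir -p /tmp/bank && cd /tmp/bank

# Stand-in for the real archive: a mod/ tree with the three ipmi modules
# plus one other module (other_driver.o is a made-up placeholder name).
mkdir -p mod
touch mod/ipmi_devintf.o mod/ipmi_msghandler.o mod/ipmi_si_drv.o mod/other_driver.o
tar czf binmod.tgz mod && rm -rf mod

# The actual procedure: keep a copy, extract, drop the three ipmi modules,
# rebuild the archive in place of the original.
cp binmod.tgz binmod.tgz.orig
tar xzf binmod.tgz
rm mod/ipmi_devintf.o mod/ipmi_msghandler.o mod/ipmi_si_drv.o
tar czf binmod.tgz mod
rm -rf mod

tar tzf binmod.tgz    # verify: no ipmi_* entries remain in the listing
```

The same tar commands apply unchanged on the mounted bank; only the working directory differs.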
I've got to add some more RAM to ours, so I'll try this later in the week when we have an outage window.
My eServer has now been up for 5 days, so I think this is the working solution!
Conclusion: IPMI and the IPMI drivers are the problem with this kind of IBM eServer.
And the solution described in my previous message works well.
This issue also exists in ESX 4.0 (Update 1). Unfortunately, in this case just updating the BIOS and BMC, disabling ipmi and/or deleting the ipmi modules doesn't help (or not much). After 20-30 minutes you get a Purple Screen or "PCPU x didn't have a heartbeat for xxx seconds", and no ping reply. Just the same as in ESX 3.5. I've spent the last 3 days working on a solution, and I think I've got it.
The solution is to turn off ACPI by adding "acpi=off" to the kernel options in GRUB. The server has now been up for 4 days without any problems so far. Maybe this solution could be helpful for someone and save them a lot of time, so I'm publishing it. I couldn't find any help for this problem on the internet.
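For "classic" ESX 4.0 (the edition with a Service Console, which boots via GRUB), the edit could look like the sketch below. The /boot/grub/grub.conf path and the sample kernel line are typical defaults rather than details from this thread, so verify them on your own host; the sketch edits a stand-in copy in /tmp so it can be tried safely.

```shell
# Append acpi=off to the kernel line of a grub.conf. A stand-in file in
# /tmp is used here; on a real ESX 4.0 Service Console the file would
# normally be /boot/grub/grub.conf (check the path on your host).
GRUB_CONF=/tmp/grub.conf
printf 'title VMware ESX 4.0\n\tkernel /vmlinuz-2.6 ro root=/dev/sda2 quiet\n' > "$GRUB_CONF"

cp "$GRUB_CONF" "$GRUB_CONF.bak"                 # always keep a backup first
sed -i '/kernel /s/$/ acpi=off/' "$GRUB_CONF"    # append the option to the kernel line
grep 'kernel ' "$GRUB_CONF"                      # verify: the line now ends in acpi=off
```

Alternatively, the same option can be typed once at the GRUB boot menu (press "e" to edit the kernel line) to test it before making it permanent.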
Is the issue still present in ESXi 4.1? I've got an eServer 326m which I'd like to use to run ESXi, but I'd rather stay away from ESXi if it's not fully compatible.
"The solution is to turn off ACPI by adding "acpi=off" to the kernel options in GRUB."
How can I do that?
So yeah, I couldn't even get to the 4.1 installer, but 4.0 installed almost without an issue. Almost, because out of the two SCSI drives, ESXi can only see one.
I had FreeBSD installed on my first drive (da0) and made a complete copy of it onto the second drive (da1) just in case. That wasn't in vain, as ESXi only offered one disk to install on (da0), and I was hoping to boot it up and quickly add a FreeBSD VM with the second disk (da1) in direct access mode (and then transfer services into either a newer FreeBSD or a CentOS VM one by one). However, when I go into "Devices" within ESXi, it doesn't even list the second drive.
I'm assuming all 326m machines are identical, with the same SCSI controller, right? Has anyone else had the problem of ESXi seeing only one disk? Is it possible to resolve it?
My previous question still stands: how can I turn ACPI off for ESXi permanently? Is there a file I need to edit? Please help.
PS. I can still boot into FreeBSD if I select the second disk as the startup drive in the BIOS, so yeah, it's there and it still has FreeBSD on it.