I have a VMware cluster running on a blade server, and all but one of the hosts report this alert: "Host IPMI System Event Log status". I did a Google search and came up with this fix: VMware KB: The Host IPMI System Event Log Status alarm is triggered repeatedly in VMware vCenter ... In step 4 I get this: "The system event log is empty." Any help with this would be great.
To resolve this issue, stop the alarm from triggering repeatedly and clear the IPMI System Event log and reset the sensors.
To clear the log and reset the sensors:
I cannot help you with that, but I have the very same problem.
It seems to be connected to Dell servers and mostly to Supermicro mainboards. I never had it on an HP server yet. It began several months ago on a Dell server and every week or so another machine starts to throw those errors. Currently I see them on about 20 hosts, spread across customers in Austria and Hungary.
I followed the VMware KB article, with no success: the system event log is empty, so there is no clear button. The `localcli` command did nothing.
I re-flashed the BMC firmware with the latest versions, reset everything to defaults, and cleared all logs, again with no success.
I wonder if it is connected to the IPMI vulnerabilities that are currently being exploited on many systems I know.
We reacted by adding ACLs, either in the IPMI firmware itself or in the routers. Since then we have had no more bandwidth issues, but I'm unsure whether we really secured the IPMI/BMC systems. Maybe the IPMI error in the vCenter servers reflects another hack or attempt?
Same issue here with 1 server out of a fleet of 70 HP DL360 G8s. Event log is empty and I am running 5.0 so no chance of using the command line.
Hello,
Since I updated my ProLiant DL360p Gen8 servers, I have had the same problem.
I got more than 100 alerts during the last 24 hours from my 3 ESXi hosts ([VMware vCenter - Alarme alarm.SELHealthAlarm])
I did all the resets advised above, but nothing changed.
Is there someone who can help us?
PARENT Thomas.
I've logged a case with VMware Support. I will report back here.
I had the same issue and used some of the tricks above; sometimes they worked, sometimes they didn't.
But somewhere I found a trick from someone: remove the ESX host from the inventory and re-add it. It seems something hangs in vCenter or its database and keeps bringing these messages back. Sometimes this was a solution for me.
I had a remote session with VMware support. I only have 1 affected server so we did a tail on the /var/log/syslog.log file and found the following repeated over and over:
2014-07-08T00:55:34Z sfcb-vmware_raw[3569]: IpmiIfcSelGetAll: no record count
2014-07-08T00:56:12Z sfcb-vmware_raw[3569]: IpmiIfcSelReadEntry: data length mismatch req=19,resp=3
2014-07-08T00:56:15Z sfcb-vmware_raw[3569]: IpmiIfcSelReadEntry:error 203.
2014-07-08T00:56:15Z sfcb-vmware_raw[3569]: IpmiIfcSelReadEntry:error 203.
2014-07-08T00:56:15Z sfcb-vmware_raw[3569]: IpmiIfcSelReadEntry:error 203.
2014-07-08T00:56:15Z sfcb-vmware_raw[3569]: IpmiIfcSelReadEntry: retry expired.
2014-07-08T00:56:15Z sfcb-vmware_raw[3569]: IpmiIfcSelReadNew: failed call to IpmiIfcSelReadEntry cc = 0xff
Which led to an article that didn't exactly match the symptoms: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=207769...
But the fix does appear to have rectified the issue. It was to restart the management agents from the console using services.sh.
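To check whether a host shows the same symptom before restarting anything, the tail on the log can be turned into a count. This is only a rough sketch: the function name count_sel_errors is made up for illustration, the pattern is taken from the log lines quoted above, and the default path is the /var/log/syslog.log file mentioned in the post.

```shell
#!/bin/sh
# Count SEL read failures logged by the sfcb-vmware_raw provider.
# count_sel_errors is a hypothetical helper name, not a VMware tool.
count_sel_errors() {
    # Default to the log file named in the thread; pass another path to override.
    log="${1:-/var/log/syslog.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c 'sfcb-vmware_raw.*IpmiIfcSel' "$log" || true
}
```

Running count_sel_errors before and after restarting the management agents gives a quick indication of whether new errors are still being logged; a count that keeps growing suggests the restart did not help on that host.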
For information, I solved my problem with HP support. It was caused by the service pack update of the server BIOS (DL360p). I had to shut down the ESX host, connect to the HP console interface (Intelligent Provisioning) and delete all the logs of the physical server. I then restarted the physical server. In my vCenter interface, I updated the hardware status and reset the sensors. It is OK for me now. I hope this will be helpful for someone. Thomas.
After installing the recent ESXi updates, all the IPMI issues vanished. I don't know which of the 7 updates in particular is responsible, but I have had no IPMI errors for several days.
What also works is to log in through SSH.
Clear the IPMI log files with:
localcli hardware ipmi sel clear
Restart the hardware status provider:
/etc/init.d/sfcbd-watchdog restart
You can also list the logs with:
localcli hardware ipmi sel list
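For anyone repeating this on several hosts, the three commands can be wrapped so that a failed step stops the sequence. A minimal sketch, assuming an ESXi shell where localcli and the watchdog script exist (an earlier post notes the command line was not an option on 5.0); the function name reset_sel and the LOCALCLI/WATCHDOG override variables are made up for illustration.

```shell
#!/bin/sh
# Sketch of the SSH steps: list the SEL, clear it, restart the CIM watchdog.
# LOCALCLI and WATCHDOG are hypothetical override variables so the flow can
# be dry-run on a machine that is not an ESXi host.
LOCALCLI="${LOCALCLI:-localcli}"
WATCHDOG="${WATCHDOG:-/etc/init.d/sfcbd-watchdog}"

reset_sel() {
    echo "Current SEL entries:"
    $LOCALCLI hardware ipmi sel list || return 1
    echo "Clearing SEL..."
    $LOCALCLI hardware ipmi sel clear || return 1
    echo "Restarting hardware status provider..."
    $WATCHDOG restart
}
```

Setting LOCALCLI=echo and WATCHDOG=echo before calling reset_sel prints the commands instead of running them, which is a cheap way to confirm the order before touching a production host.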
The last time I noticed these messages was when upgrading to the latest HP iLO version.
Hi,
What do you get when you run /bin/enum_instances CIM_RecordLog?
In my case, vCenter was showing server had more than 61000 entries on the log. Using the iDRAC (Dell R710), I could only see 19 entries, which matched the number of logs I saw using localcli hardware ipmi sel list.
Restarting the management agents on the ESXi host did the trick here. vCenter now shows only 19 records in the IPMI SEL event log.
The unknown IPMI issue was fixed by restarting the Small Footprint CIM Broker daemon (sfcbd):
/etc/init.d/sfcbd-watchdog restart
I have a PowerCLI script for this - we are facing the same bug on HP blades in our environment.
It does basically everything described above, but without the hassle of connecting to each host manually. It is somewhat modular, so take a look at the code first to see whether you understand it; if not, just run it, enter your vCenter and ESXi host name (don't forget to check the $hosts value - it should contain your domain name!) and wait a moment.
Write-Host "Reset sensors, refresh HW data & their views on an ESXi host`n" -ForegroundColor Green

<# Uncomment this to enable connection to vCenters defined in a text file
#
$vcenterlist = Get-Content .\vcenters.txt
ForEach ($vcenter in $vcenterlist) {
    # Define vCenter
    Write-Host "`nConnecting to vCenter $vcenter"
    # Connect to vCenter
    Connect-VIServer $vcenter | Out-Null
}
#>

# Define a blank array for the hosts
$hosts = @()

# Input checking loop: prompt for a vCenter if no list was loaded above
if ($null -eq $vcenterlist) {
    do {
        $vcenter = Read-Host "Define a vCenter where the host is located"
        # Replace() returns a new string, so the result has to be assigned
        $vcenter = $vcenter.Replace(' ', '')
        if ($vcenter -notlike "") { $match = $true }
        else {
            Write-Host "The value must not be null. Try again or CTRL+C to break.`n" -ForegroundColor Red
            $match = $false
        }
    } until ($match -eq $true)
    # Use the entered name as a one-item list so the connection loop below runs
    $vcenterlist = @($vcenter)
}

# ESXi host definition ($input is an automatic variable in PowerShell, so another name is used)
$esxName = Read-Host "Enter a name of ESXi host where you want to reset the HW sensors"

# Generate FQDN and store into an array
$hosts += "${esxName}.yourdomain.lab"

# Connect to vCenter
ForEach ($vcenter in $vcenterlist) {
    Write-Host "`nConnecting to vCenter $vcenter..."
    Connect-VIServer $vcenter | Out-Null
}

# The VMhost needs to be stored into an array with Get-VMHost for further processing
$vmhosts = Get-VMHost -Name $hosts
# Get all vmhosts for the connected vCenter sessions
#$vmhosts = Get-VMHost

ForEach ($vmhost in $vmhosts) {
    Try {
        # Initialize calls for refreshing hardware status
        Write-Host "Restarting CIM Server service on $vmhost"
        Get-VMHost $vmhost | Get-VMHostService | Where { $_.Key -eq "sfcbd-watchdog" } | Restart-VMHostService -Confirm:$false | Out-Null
        Start-Sleep -Seconds 15

        Write-Host "Starting to refresh HW info on $vmhost (this can take a while)"
        # Define variables for system calls
        $hv = Get-View $vmhost
        $hss = Get-View $hv.ConfigManager.HealthStatusSystem

        Write-Host "Resetting HW Sensors..."
        $hss.ResetSystemHealthInfo()
        Start-Sleep -Seconds 15

        Write-Host "Refreshing Data..."
        $hss.RefreshHealthStatusSystem()
        Start-Sleep -Seconds 15

        Write-Host "Refreshing Data View..."
        $hss.UpdateViewData()
        Start-Sleep -Seconds 15
    }
    Catch [System.Exception] {
        Write-Host "There was an error while trying to refresh the hardware data.`nPlease check the ESXi host's Hardware Status Tab." -ForegroundColor Red
    }
    Finally {
        Write-Host "Done processing $vmhost." -ForegroundColor Green
    }
}

# Disconnect only after every host has been processed
Write-Host "Disconnecting from the vCenter Server..."
ForEach ($vcenter in $vcenterlist) {
    Disconnect-VIServer $vcenter -Confirm:$false
}
Hello, I had the same problem. You should do the following, because this is how I solved mine, and it is quite logical.
Make sure the time on your host hardware is correct (change it and save it), and that the time on your vCenter server and the NTP time of the ESXi software are properly synchronized....
Then do the following and clear the alarm...
To clear the IPMI System Event.log file and reset the sensors:
Regards, and let me know how it goes......
This is what I saw prior to the spam of IPMI events:
2015-01-06T20:24:10Z watchdog-sensord: '/usr/lib/vmware/bin/sensord ++min=0,max=10' exited after 7092587 seconds 1
2015-01-06T20:24:10Z watchdog-sensord: Executing '/usr/lib/vmware/bin/sensord ++min=0,max=10'
2015-01-06T20:24:12Z watchdog-sensord: '/usr/lib/vmware/bin/sensord ++min=0,max=10' exited after 2 seconds (quick failure 1) 0
2015-01-06T20:24:12Z watchdog-sensord: Executing '/usr/lib/vmware/bin/sensord ++min=0,max=10'
2015-01-06T20:24:15Z watchdog-sensord: '/usr/lib/vmware/bin/sensord ++min=0,max=10' exited after 3 seconds (quick failure 2) 0
2015-01-06T20:24:15Z watchdog-sensord: End '/usr/lib/vmware/bin/sensord ++min=0,max=10', failure limit reached
2015-01-06T20:30:44Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: data length mismatch req=19,resp=15
2015-01-06T20:30:47Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: data length mismatch req=19,resp=3
2015-01-06T20:30:49Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: EntryId mismatch req=0001,resp=0002
2015-01-06T20:30:50Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: EntryId mismatch req=0001,resp=0002
2015-01-06T20:31:14Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: data length mismatch req=19,resp=15
2015-01-06T20:31:15Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: data length mismatch req=19,resp=3
2015-01-06T20:31:18Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: EntryId mismatch req=0001,resp=0002
2015-01-06T20:31:20Z sfcb-vmware_raw[10147]: IpmiIfcSelReadEntry: EntryId mismatch req=0001,resp=0002
Restarting the watchdog as noted above did the trick.
Can the restart of the sfcbd-watchdog be done during production hours, without impact on the VMs running on the host?
Or would it be best to migrate the VMs off the host and then do the restart?
Thank you
Hi,
restart of this service is non-disruptive so you can do it at any time.
This is one that's been bugging me for a while. I've just ignored it cuz I've had other more 'pressing' issues to deal with than bother with this error. It finally bugged me enough, and things have calmed down around me enough to where I can now focus on resolving this nit-pick issue. What worked for me is what Wh33ly posted (SSH to problem Host(s) & view the events: localcli hardware ipmi sel list ; then if any show, clear the events: localcli hardware ipmi sel clear ; then restart the hardware service: /etc/init.d/sfcbd-watchdog restart ).
Thanks!
@coolsport00 (Shane)
Thanks for sharing your tip. I had this issue with an IBM x3650 M4 after I upgraded the IMM. Restarting services.sh did it.