VMware Cloud Community
maros141
Contributor

SNMP trap: This trap is sent when a virtual machine detects a loss of, or regains, the guest heartbeat.

Hello,

I am implementing System Center Operations Manager 2007 R2 with the Quest QMX extension for VMware.

I started monitoring SNMP traps from the ESX servers with the QMX tool. Since then I have received these traps for all virtual machines:

VMWare: This trap is sent when a virtual machine detects or regains the guest heartbeat.

vmID=1 vmConfigFile=/vmfs/volumes/4b694757-1f79af5f-c0fd-00215e08aefa/srvspi03/srvspi03.vmx (alarm supplied by VirtualAgent.js EXIT)

VMWare: This trap is sent when a virtual machine detects a loss in guest heartbeat.

vmID=1 vmConfigFile=/vmfs/volumes/4b694757-1f79af5f-c0fd-00215e08aefa/srvspi03/srvspi03.vmx (alarm supplied by VirtualAgent.js EXIT)

-


VMWare: This trap is sent when a virtual machine detects a loss in guest heartbeat.

vmID=1 vmConfigFile=/vmfs/volumes/4b694794-d10567e3-3f3d-00215e08aefa/srvspi02/srvspi02.vmx (alarm supplied by VirtualAgent.js EXIT)

VMWare: This trap is sent when a virtual machine detects or regains the guest heartbeat.

vmID=1 vmConfigFile=/vmfs/volumes/4b694794-d10567e3-3f3d-00215e08aefa/srvspi02/srvspi02.vmx (alarm supplied by VirtualAgent.js EXIT)

-


I reinstalled VMware Tools on the virtual machines, but it did not help. I still receive the SNMP traps for loss/regain of the guest heartbeat.

I don't understand why all VMs have vmID=1.

Can you help me resolve this problem?

Thanks

Maros

10 Replies
geddam
Expert

Can you be more precise about your environment details?

Which versions of ESX and VC are you running, and on which version of Windows do you generally receive this?

vmID: This is the ID of the affected virtual machine generating the trap. If the vmID is non-existent (such as for a power-off trap), 1 is returned.

Thanks,,

Ramesh. Geddam,

VCP 3&4, MCTS(Hyper-V), SNIA SCP.

Please award points, if helpful

maros141
Contributor

I have four ESX 4 Update 1 hosts in a cluster, plus a vSphere 4 server (a physical server running Windows 2008 Standard x86, 32-bit).

The guest VMs run Windows 2003 Standard.

The SCOM server is a physical machine running Windows 2008 Standard x64.

System Center Operations Manager R2 (SCOM) with the Quest QMX extension for VMware and IBM Director are installed on this server.

All ESX servers have the Net-SNMP service enabled through a proxy configuration of the VMware SNMP agent.

SNMP traps from all servers are sent to the SCOM server and passed to SCOM through the QMX extension.

AndyChip
Contributor

We've got a very similar situation with the spurious SNMP heartbeat traps. We have 3 hosts with about 3-15 VMs on each. They are all identical Dell hardware and all machines behave perfectly - apart from when I started monitoring SNMP. It's driving me mad.

Did you ever get a fix for this?

maros141
Contributor

I have not managed to solve this problem.

AndyChip
Contributor

That's a shame. I'm looking at trying to decrease the sensitivity of the trap (if possible) but still searching for answers.

Thanks for replying anyway.

A.

tdubb123
Expert

Any fix for this? Dell OME alerts go crazy when I put a host in maintenance mode.

VirtualCop
Contributor

Hello guys,

Have you figured out a fix for this issue?

I'm still struggling with this bug, too.

Cop

OzBoyBlue
Contributor

Bump!

I also have the same issue, with Dell's OpenManage Essentials running in a guest VM and monitoring the ESXi host.

I already have an ignore rule for 'vmwCimOmHeartbeat'; otherwise I get an email almost exactly every 5 minutes to let me know it's still alive, with the description 'vmwEnvIndicationTime'. But I don't really want to keep creating rules to ignore things that at some point I might want to know about.

The above seemed very consistent and usual behaviour, so I am happy to ignore it, but the 'machine detects or regains the guest heartbeat' alert is highly erratic: sometimes it won't occur for 6-7 hours, and then there might be 3-4 alerts within a 2-hour period.

The guest VMs the alerts refer to are not experiencing any problems: no network dropouts, no OS issues, everything is running smoothly. Yet these heartbeat alerts keep coming, and so far searching and reading across various forums has turned up nothing.

If anyone has some more insight into this it would be greatly appreciated.

lof
Contributor

We are having this exact same problem. Our hardware is 3 Dell R610s, each running ESXi 5.5.0, with vCenter 6.0.0.

We recently set up OpenManage Essentials to receive SNMP traps from the Dell R610s.

Like others reporting here, we receive regular messages which generate e-mail alerts:  Severity:Warning, Message:Virtual machine detects a loss in guest heartbeat

Yet we can see no problem with the VMs at all. We also don't see any indication of this heartbeat loss in vCenter.

I will say that we see more of these during Veeam backups - but they also occur intermittently throughout the day.

We went into vCenter and set the heartbeat timeout to 60 seconds - the default was 30.  This made no difference.  This seems like a really long time!  Does this setting have anything to do with the reported problem?

lof
Contributor

After a bit of research, here is what I've come up with.  If anyone has anything to add or correct, please feel free...

These heartbeat errors have nothing to do with vCenter.  This can be seen by powering off the vCenter server and noting that these errors are still generated with the same random intervals.

These heartbeat errors are generated in ESXi and sent to OpenManage via the SNMP service in ESXi.

Most of the web info about heartbeat errors is all about vCenter heartbeat - that seems to be a different issue than what I’m seeing.  No indication of these is in the vmware.log files within the VM.  Everything I have found on the web about heartbeat timeouts seems completely irrelevant to my issue because it has nothing to do with vCenter.

I think this heartbeat lies between VMware tools running on the VM, and ESXi.  Stopping VMware tools on the client will generate this trap.  I also found that doing snapshots on VMs will often trigger this trap - which explains why I sometimes see it during Veeam backups.  I could find no information on how to change the timeout value of this heartbeat. 

The SNMP trap is called:  vmHBLost

.1.3.6.1.4.1.6876.4.1

Generic = 6

Specific = 3

Sample:  Message:Virtual machine detects a loss in guest heartbeat: vmID=3, vmConfigFile=/vmfs/volumes/StoragePath/VMpath/vm.vmx , vmDisplayName=servername
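
For anyone who wants to filter these in their own trap handler rather than in OpenManage, here is a rough Python sketch of how the trap could be recognised and its message split into fields. The OID, generic and specific values and the field names are the ones quoted above; the function names and the regular expression are my own assumptions, not part of any VMware or Dell tool, and the parsing assumes the fields are comma-separated as in the sample (a path containing a comma would break it).

```python
import re

# Values quoted in this post for the vmHBLost trap.
VMHBLOST_ENTERPRISE_OID = "1.3.6.1.4.1.6876.4.1"
VMHBLOST_GENERIC = 6
VMHBLOST_SPECIFIC = 3

# Hypothetical helper: field names taken from the sample message above.
_FIELD_RE = re.compile(r"(vmID|vmConfigFile|vmDisplayName)=([^,]*)")


def is_vmhblost(enterprise_oid, generic, specific):
    """Return True if the trap identifiers match vmHBLost as quoted above."""
    return (
        enterprise_oid.lstrip(".") == VMHBLOST_ENTERPRISE_OID
        and generic == VMHBLOST_GENERIC
        and specific == VMHBLOST_SPECIFIC
    )


def parse_trap_message(message):
    """Extract the key=value fields from a trap message string."""
    return {key: value.strip() for key, value in _FIELD_RE.findall(message)}


sample = (
    "Message:Virtual machine detects a loss in guest heartbeat: vmID=3, "
    "vmConfigFile=/vmfs/volumes/StoragePath/VMpath/vm.vmx , "
    "vmDisplayName=servername"
)
fields = parse_trap_message(sample)
print(is_vmhblost(".1.3.6.1.4.1.6876.4.1", 6, 3), fields)
```

The leading dot on the OID is stripped before comparing, since some receivers report it with the dot and some without.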

Some info on the web suggests this trap is deprecated.  The only solution I can come up with is to ignore this trap.

I chose to change the e-mail alert settings in OpenManage so that it doesn’t e-mail when this trap is sent.

  1. Sign in to OpenManage Essentials as administrator
  2. Manage > Alerts > Alert Actions > E-mail
  3. Edit the action for “E-Mail for OpenManage Alerts”. 
  4. Step through the config pages, and on the “category and sources association” page, deselect VMWare ESX Server > vmHBLost
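
A less blunt option than muting the trap entirely might be to alert only when a loss is not followed by a regain within some grace period, since the snapshot-related losses seem to recover quickly. I have not tried this in OpenManage; the Python sketch below only shows the idea. The event tuples, the 'lost'/'regained' labels and the 120-second grace period are all assumptions for illustration.

```python
def unresolved_losses(events, grace_seconds=120.0):
    """Return (timestamp, vm) for heartbeat losses worth alerting on.

    events: iterable of (timestamp_seconds, vm_name, kind), sorted by time,
    where kind is 'lost' or 'regained' (hypothetical labels).
    A loss is suppressed if the same VM regains its heartbeat within
    grace_seconds; losses with no regain seen at all are reported.
    """
    pending = {}  # vm_name -> timestamp of the earliest open loss
    alerts = []
    for timestamp, vm, kind in events:
        if kind == "lost":
            # Keep the earliest loss if several arrive before a regain.
            pending.setdefault(vm, timestamp)
        elif kind == "regained" and vm in pending:
            lost_at = pending.pop(vm)
            if timestamp - lost_at > grace_seconds:
                alerts.append((lost_at, vm))
    # Anything still open never regained its heartbeat in this event list.
    alerts.extend((lost_at, vm) for vm, lost_at in pending.items())
    return sorted(alerts)


events = [
    (0, "a", "lost"), (10, "a", "regained"),     # short blip, suppressed
    (100, "b", "lost"), (400, "b", "regained"),  # long outage, reported
    (500, "c", "lost"),                          # never regained, reported
]
print(unresolved_losses(events))
```

The trade-off is that a real outage is reported later, by up to the grace period, so the value would need tuning against how long snapshots take in a given environment.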

Alternatively, it looks like the trap could be disabled in ESXi:

https://pubs.vmware.com/vsphere-50/index.jsp#com.v...
