VMware Cloud Community
zaspam
Enthusiast

VM HA status permanently red with a vSphere HA virtual machine monitoring error

Hi all,

I have two VMs that have a recurring vSphere HA virtual machine monitoring error.

It recurs: every time I reset it to green, it turns red again.

I have the following recurring errors in the fdm.log:

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::ReportVmMetricsResult] reset vm /vmfs/volumes/5b7d31cc-e7ec791b-6587-0017a4770820/YYYYYYYYYYYYY.vmx: false

2019-10-28T13:18:34.783Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::ReportVmMetricsResult] reset vm /vmfs/volumes/5cb1f615-60e54c97-8233-0017a4770438/XXXXXXXX.vmx: false

2019-10-28T13:18:34.783Z verbose fdm[404CB70] [Originator@6876 sub=Hal] No stats listeners! Nothing to do!

2019-10-28T13:18:36.380Z info fdm[3F89B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::LogState] hostId=host-261 state=Slave master=host-47194 isolated=false host-list-version=276 config-version=26355 vm-metadata-version=86995 slv-mst-tdiff-sec=0

2019-10-28T13:18:37.780Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [MonitorHeartbeatState::MoveToResetStateIfTimersExpired] VM /vmfs/volumes/5cb1f615-60e54c97-8233-0017a4770438/XXXXXXXXXX.vmx is going to reset state because of GOS crash.Reset no: 1 out of Max allowed reset count: 3

2019-10-28T13:18:37.780Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [MonitorHeartbeatState::MoveToResetStateIfTimersExpired] VM /vmfs/volumes/5b7d31cc-e7ec791b-6587-0017a4770820/YYYYYYYYYYYY.vmx is going to reset state because of GOS crash.Reset no: 1 out of Max allowed reset count: 3

2019-10-28T13:18:37.780Z info fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::PerformCheckIoStats] Checking io stats on  a list of 2 VMs.

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 2 at 0 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 58 at 1 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 109 at 2 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 1 at 3 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 2 at 4 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 3 at 5 for metric 196608

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 0 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 1 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 2 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 3 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 4 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 5 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 0 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 1 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 2 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 3 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 4 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 5 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 23 at 0 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 24 at 1 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 23 at 2 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 19 at 3 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 23 at 4 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 20 at 5 for metric 589827

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 0 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 1 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 2 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 3 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 4 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 5 for metric 589826

2019-10-28T13:18:34.779Z verbose fdm[404CB70] [Originator@6876 sub=Policy] [VmOperationsManager::CompleteCheckVmMetrics] IO metrics value is 0 at 0 for metric 589826
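To see which VMs keep hitting the GOS-crash reset path and what HA logged as the reset decision, the fdm.log entries above can be tallied per .vmx path. This is a minimal Python sketch. It assumes the message wording matches the lines quoted here (other ESXi builds may word them differently) and that you run it against a copy of the host's fdm.log. `summarize` is a hypothetical helper name, and reading "reset vm ...: false" as "HA did not reset the VM" is an interpretation, not something VMware documents in this thread.

```python
import re
import sys
from collections import defaultdict

# Message patterns copied from the fdm.log lines quoted in this thread.
# Assumption: other ESXi builds may word these entries differently.
GOS_CRASH = re.compile(
    r"^(?P<ts>\S+) .*?VM (?P<vmx>\S+\.vmx) is going to reset state because of GOS crash\."
    r"Reset no: (?P<n>\d+) out of Max allowed reset count: (?P<max>\d+)"
)
RESET_RESULT = re.compile(
    r"^(?P<ts>\S+) .*?\[VmOperationsManager::ReportVmMetricsResult\] "
    r"reset vm (?P<vmx>\S+\.vmx): (?P<result>true|false)"
)


def summarize(lines):
    """Tally GOS-crash reset attempts and logged reset decisions per .vmx path."""
    summary = defaultdict(lambda: {"gos_crash": 0, "highest_reset_no": 0,
                                   "max_allowed": 0, "reset_true": 0, "reset_false": 0})
    for line in lines:
        m = GOS_CRASH.search(line)
        if m:
            entry = summary[m["vmx"]]
            entry["gos_crash"] += 1
            entry["highest_reset_no"] = max(entry["highest_reset_no"], int(m["n"]))
            entry["max_allowed"] = int(m["max"])
            continue
        m = RESET_RESULT.search(line)
        if m:
            # Assumption: ": false" appears to mean HA decided not to reset the VM.
            key = "reset_true" if m["result"] == "true" else "reset_false"
            summary[m["vmx"]][key] += 1
    return dict(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fdm_summary.py fdm.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for vmx, counts in summarize(log).items():
            print(vmx, counts)
```

A VM whose "Reset no" never climbs past 1 while its reset results stay "false" matches the pattern described in this thread: HA keeps flagging the VM but never completes a reset.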

From what I have seen, the VMs did crash (confirmed in the Windows Server event logs); however, they rebooted and are working correctly.

The crash was related to ntoskrnl.exe:

ntoskrnl.exe    ntoskrnl.exe+3ed39c    fffff800`0181b000    fffff800`01df8000    0x005dd000    0x5d803c60    9/17/2019 2:52:32 AM

VMware Tools is up to date; however, the virtual hardware version is "ESXi 5.0 and later" (VM version 8).

I am on ESXi 6.5 U1, and I know the hardware version could be upgraded to a higher one.

Yet the ESXi host still reports these errors every minute or so.
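To check how regularly the error actually recurs, one could measure the gap between consecutive GOS-crash entries for each VM. This is a minimal Python sketch, again assuming the message wording quoted above and UTC timestamps in the `2019-10-28T13:18:37.780Z` format; `recurrence_gaps` is a hypothetical helper name.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Same GOS-crash message as quoted in this thread (assumption: wording is stable).
GOS_CRASH = re.compile(
    r"^(?P<ts>\S+) .*?VM (?P<vmx>\S+\.vmx) is going to reset state because of GOS crash\."
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2019-10-28T13:18:37.780Z (UTC)


def recurrence_gaps(lines):
    """Return the seconds between consecutive GOS-crash entries, per .vmx path."""
    stamps = defaultdict(list)
    for line in lines:
        m = GOS_CRASH.search(line)
        if m:
            stamps[m["vmx"]].append(datetime.strptime(m["ts"], TS_FORMAT))
    gaps = {}
    for vmx, times in stamps.items():
        times.sort()  # log lines are not always in timestamp order
        gaps[vmx] = [(later - earlier).total_seconds()
                     for earlier, later in zip(times, times[1:])]
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fdm_gaps.py fdm.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for vmx, gaps in recurrence_gaps(log).items():
            print(vmx, gaps)
```

A steady gap (for example, roughly 60 seconds every time) would point to a periodic HA check that keeps re-evaluating the same VM rather than to new guest crashes.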

I know the trivial answer is to migrate the VM off that host and see whether the error persists. However, I would like to troubleshoot this properly: it is a recurring theme, the issue is likely to reappear, and I need to know what causes it.

I would be very grateful if you could point me to further troubleshooting steps!


Accepted Solutions
KocPawel
Hot Shot

There is a KB article for an issue like yours. Take a look:

VMware Knowledge Base

A second idea is to migrate those VMs (compute and storage), check, and later migrate them back.

zaspam
Enthusiast

Hi Pawel,

Thank you for the answer. That did resolve the problem.

Do you have any insight into why the issue appears?

As far as I understand, either VM monitoring is not aware that the guest is up and running correctly, or it has a flag set somewhere saying the guest needs to be reset but then fails to execute the reset.

I'm more worried that having VM monitoring turned on may cause unwanted downtime, as this event is present on some other hosts and VMs in my environment.

Do you have any resources that might help me understand the circumstances under which the error occurs?

KocPawel
Hot Shot

Unfortunately, I don't know why it happens.

In the KB we can see this note: "If the issue is repeatedly triggered, file support request with VMware Support to troubleshoot", so I guess there could be a bug. Keep VMware Tools and ESXi up to date and observe.
