kondrichRHI
Enthusiast

VCSA 6.5 statsmonitor service stops unexpectedly

Hi,

From time to time, the VCSA 6.5e statsmonitor service stops unexpectedly. This breaks the ability to back up VCSA data via a PowerCLI script. Another consequence is that no CPU, memory, or database statistics are collected in the VCSA management backend.

A vRealize Log Insight excerpt for the relevant timeframe, filtered by "statsmonitor", is attached. I cannot tell from this log why it tried to add the service, tried to start it, and then failed.

The most recent stop occurred on June 13th, and the service has remained stopped since then. Before that, I noticed the problem and fixed it on June 8th.

I see this problem on one of our VCSA 6.5 instances but not on the others.

Does anyone else see this behavior, or have any idea what's wrong?

msripada
Virtuoso

The events show that the file below does not exist or cannot be read:

Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'

Can you open a PuTTY session or log in to the VCSA shell and run:

cd /var/vmware/applmgmt/

Then execute "ls -lah" and check whether statsmonitor_health.xml exists and how large it is.

Please post the output of ls -lah.
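The same check can be written as a small reusable function. This is a minimal sketch; `report_health_file` is a hypothetical name, not a VCSA tool. It prints whether the file exists and its size in bytes, defaulting to the path from the error above:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VCSA): report whether the statsmonitor
# health file exists and how large it is, like checking "ls -lah" by eye.
report_health_file() {
  local f="${1:-/var/vmware/applmgmt/statsmonitor_health.xml}"
  if [ -f "$f" ]; then
    # wc -c counts bytes; $(( )) strips any padding some wc versions add
    echo "present: $f ($(( $(wc -c < "$f") )) bytes)"
  else
    echo "missing: $f"
    return 1
  fi
}
```

Running `report_health_file` with no argument checks the default path; the non-zero return code on a missing file makes it usable in other scripts.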

Thanks,

MS

kondrichRHI
Enthusiast

Yes, I saw this message, too. When the service is running, the file exists.

ls -lah currently reports (service is running):

total 28M
drwxr-xr-x 5 root root 4.0K Jul 19 08:02 .
drwxr-xr-x 5 root root 4.0K Mar 27 11:04 ..
-rwxr-xr-x 1 root root  509 Jul  2 05:59 appliance-health.xml
-rwxr-xr-x 1 root root 5.3K Jul  2 05:59 appliance.stats
-rw-r--r-- 1 root root  27M Jul 19 07:32 appliance_stats.sqlite
-rw-r--r-- 1 root root  32K Jul 19 08:02 appliance_stats.sqlite-shm
-rw-r--r-- 1 root root 1.0M Jul 19 08:02 appliance_stats.sqlite-wal
-rwxr-xr-x 1 root root  35K Jul 18 20:49 backupRestore-history.json
drwxr-xr-x 2 root root 4.0K Jun 19 07:23 metadata
drwxr-xr-- 2 root root 4.0K Jun 19 07:24 patch-history
-rw-r--r-- 1 root root  602 Jun  8 04:39 ResourceBundle.zip
drwx------ 2 root root 4.0K Jul 18 20:49 session
-rw-r--r-- 1 root root  369 Jul 19 08:02 statsmonitor_health.xml

kondrichRHI
Enthusiast

I updated VCSA to 6.5u1 last Friday morning. Right after the update and reboot, the statsmonitor service was running and data was displayed in the FAMI. However, it stopped at some point before Friday evening. VCSA backups have failed since then.

Please note: 6.5u1 adds alarms for services that are new in 6.5, such as statsmonitor (see the VMware vCenter Server 6.5 Update 1 Release Notes). So we can configure an alarm to send an email if the service stops.

kondrichRHI
Enthusiast

I had to restart VCSA 6.5u1 last Friday (I triggered the restart via the FAMI backend). Statsmonitor did not start automatically after this reboot and stayed stopped over the weekend. Log Insight tells me:

2017-09-01T13:38:48.609769+02:00 VDCSVCENTER1 vmon 2390 - -  Service statsmonitor api health check cmd returned unknown exit code 1

2017-09-01T13:38:48.609197+02:00 VDCSVCENTER1 vmon 2390 - -  Service statsmonitor api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'

2017-09-01T13:38:39.096032+02:00 VDCSVCENTER1 vmon 2390 - -  Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonEventPublisher.py --eventdata statsmonitor,UNKNOWN,UNHEALTHY,0

2017-09-01T13:38:39.095899+02:00 VDCSVCENTER1 vmon 2390 - -  Re-check service statsmonitor health since it is still initializing.

2017-09-01T13:38:39.095760+02:00 MyvCenterHost vmon 2390 - -  Service statsmonitor api health check cmd returned unknown exit code 1

2017-09-01T13:38:38.657214+02:00 MyvCenterHost vmon 2390 - -  Service statsmonitor api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'

2017-09-01T13:38:32.515739+02:00 MyvCenterHost vmon 2390 - -  Constructed command: /usr/lib/vmware-statsmonitor/statsMonitor.sh /etc/vmware/statsmonitor/statsMonitor.xml

2017-09-01T13:38:32.492675+02:00 MyvCenterHost vmon 2390 - -  Executing op START on service statsmonitor...

2017-09-01T13:38:32.392698+02:00 MyvCenterHost vmon 2390 - -  Adding service statsmonitor.

This happens again and again. Now, after manually starting the statsmonitor service, the file statsMonitor.xml exists in /etc/vmware/statsmonitor.
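Log Insight lists events newest first, which makes the sequence above harder to follow. A minimal sketch for reading it in order, assuming the lines have been exported to a text file; `statsmonitor_timeline` is a hypothetical helper. Because every timestamp in the excerpt is ISO 8601 with the same +02:00 offset, a plain byte-wise sort puts the lines in chronological order:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print exported log lines that mention statsmonitor,
# oldest first. Reads the file given as $1, or stdin if no file is given.
# Assumes all timestamps are ISO 8601 with the same UTC offset, so that a
# byte-wise (LC_ALL=C) sort is also a chronological sort.
statsmonitor_timeline() {
  grep -F 'statsmonitor' "${1:--}" | LC_ALL=C sort
}
```

Read in that order, the excerpt shows "Adding service" and "Executing op START" at 13:38:32, then vMon re-checking health because the service is "still initializing", and the first missing-health-file error about six seconds after the start.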

tim_841
Enthusiast

I started seeing this issue after upgrading to 6.5U1. I also had a similar issue with vmware-content-library, but I found the fix for that.

This is more of an annoyance, though; it shouldn't have any direct performance implications. But I would love to get it fixed, along with the update health issue.

kondrichRHI
Enthusiast

I just installed Patch a on top of 6.5.0u1 and rebooted. The "vmware-statsmonitor" service is stopped after the reboot, and /var/vmware/applmgmt/statsmonitor_health.xml is missing.

After manually starting vmware-statsmonitor, the file statsmonitor_health.xml exists (369 bytes). This is reproducible after every reboot.

Raj1988
Enthusiast

This is a known issue with vSphere 6.5 and should be fixed in the next release.

Please take a snapshot of the VCSA (without memory) and then run through the steps below.

1. SSH to the VCSA and log in as root.

2. Modify the statsmonitor service config for vMon to set a higher startup timeout:

sed -i '/StartTimeout/d' /etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json
sed -i '/ApiHealthFile/a "StartTimeout": 600,' /etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json

3. Send a HUP signal to vMon so that it reloads the modified config:

kill -HUP $(cat /var/run/vmon.pid)

4. Stop and start the statsmonitor service explicitly:

  /usr/lib/vmware-vmon/vmon-cli -k statsmonitor
  /usr/lib/vmware-vmon/vmon-cli -i statsmonitor

Now reboot the VCSA and check again after 10-15 minutes; the statsmonitor service should start automatically.
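Step 2 can be wrapped so that the original file is backed up and the result is checked before vMon reloads it. A minimal sketch, assuming GNU sed (for the one-line `a text` form used in step 2) and a Python interpreter for the JSON check; `set_start_timeout` and the `PYTHON` variable are hypothetical names, and the two edits are the sed commands from step 2:

```shell
#!/usr/bin/env bash
# Sketch only: the two sed edits from step 2, plus a backup and a JSON check.
# PYTHON defaults to python3 (an assumption about the host); if no such
# interpreter is found, the JSON check is skipped.
set_start_timeout() {
  local cfg="$1" timeout="${2:-600}" py="${PYTHON:-python3}"
  cp -p "$cfg" "$cfg.bak" || return 1
  # Drop any existing StartTimeout line, then append a new one directly
  # after the line containing ApiHealthFile (GNU sed one-line "a" form).
  sed -i '/StartTimeout/d' "$cfg"
  sed -i "/ApiHealthFile/a \"StartTimeout\": ${timeout}," "$cfg"
  # If the edit left invalid JSON, put the original file back.
  if command -v "$py" >/dev/null 2>&1 && ! "$py" -m json.tool "$cfg" >/dev/null 2>&1; then
    cp -p "$cfg.bak" "$cfg"
    echo "edit produced invalid JSON; restored $cfg from backup" >&2
    return 1
  fi
}
```

For example, `set_start_timeout /etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json 600`, followed by steps 3 and 4. Because the appended line ends with a comma, the result is only valid JSON when ApiHealthFile is not the last key in the object; the check catches that case instead of leaving vMon a broken config.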

bbiandov
Enthusiast

I can confirm that the same issue exists on the following appliance, with a slight difference: the error is exactly the same, but once the appliance has booted, the monitoring service can be started manually. It just doesn't start automatically, even though it is set to start automatically.

Type: vCenter Server with an embedded Platform Services Controller

Product: VMware vCenter Server Appliance

Version: 6.7.0.21000 Build number 11726888

Below is the error I get at boot, when the VMware Appliance Monitoring Service attempts to auto-start:

<statsmonitor> Service api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'
loungehostmaste
Enthusiast

And why the $&$#@! can't VMware fix such issues?

[root@vcenter:~]$ cat /usr/local/bin/system-erros.sh

#!/usr/bin/bash
dmesg | grep -i warn
dmesg | grep -i fail
dmesg | grep -i error
grep -i warn /var/log/messages
grep -i fail /var/log/messages
grep -i error /var/log/messages

When I run that script after a reboot, I could puke when I compare the output to my production guests, which reboot in 5 to 10 seconds and don't spit out anything in that context. Maybe that's because they are unsupported Fedora guests maintained by someone who knows what he is doing.

I get tired of all that stuff: sendmail (who uses that to begin with?) broken for months after every update, lost network with the Broadcom native drivers, BS about corrupt or unsupported formats when migrating machines to an NFS store (after some retries and praying, it works), vMotion not working after every ESXi reboot until I disable and re-enable the firewall rule, and so on.

NOTE:  Content edited by Moderator for profanity

bbiandov
Enthusiast

LOL, I hear you, man. This is a frustrating profession: lack of accountability, lack of anything other than the will to take one's money for "contracts" and "licensing", and then good luck to all of us :)
