From time to time, the VCSA 6.5e statsmonitor service stops unexpectedly. This breaks the ability to back up VCSA data via a PowerCLI script. Another consequence is that no CPU/memory/database statistics are collected in the VCSA management backend.
A vRealize Log Insight excerpt for the relevant timeframe, filtered by "statsmonitor", is attached. I cannot tell from this log why it tried to add this service, tried to start it, and then failed.
The most recent stop occurred on June 13th, and the service has remained stopped since then. Before that, I noticed the problem and fixed it on June 8th.
I see this problem on one of our VCSA 6.5 instances but not on the others.
Does anyone else see this behavior or have any idea what's wrong?
From the events, it looks like the file below does not exist or cannot be read:
Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'
Can you open a PuTTY session or log in to the shell of the VCSA,
run the "ls -lah" command in /var/vmware/applmgmt, and check whether statsmonitor_health.xml exists and what size it is?
Please post the output of ls -lah.
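If it helps, the same check can be wrapped in a small function so it is easy to repeat after each reboot. This is only a sketch: the function name check_health_file is made up for illustration, and the default path is simply the one from the error message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCSA): report whether the statsmonitor
# health file exists and how large it is.
# The default path is taken from the error message in this thread.
check_health_file() {
    file="${1:-/var/vmware/applmgmt/statsmonitor_health.xml}"
    if [ -f "$file" ]; then
        # wc -c counts bytes; tr strips the padding some wc builds add
        size=$(wc -c < "$file" | tr -d ' ')
        echo "present: $file ($size bytes)"
    else
        echo "missing: $file"
        return 1
    fi
}
```

Calling check_health_file with no argument checks the default path; the non-zero return value on a missing file makes it usable in an if statement.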
Yes, I saw this message, too. When the service is started, the file exists.
ls -lah currently reports (service is running):
drwxr-xr-x 5 root root 4.0K Jul 19 08:02 .
drwxr-xr-x 5 root root 4.0K Mar 27 11:04 ..
-rwxr-xr-x 1 root root 509 Jul 2 05:59 appliance-health.xml
-rwxr-xr-x 1 root root 5.3K Jul 2 05:59 appliance.stats
-rw-r--r-- 1 root root 27M Jul 19 07:32 appliance_stats.sqlite
-rw-r--r-- 1 root root 32K Jul 19 08:02 appliance_stats.sqlite-shm
-rw-r--r-- 1 root root 1.0M Jul 19 08:02 appliance_stats.sqlite-wal
-rwxr-xr-x 1 root root 35K Jul 18 20:49 backupRestore-history.json
drwxr-xr-x 2 root root 4.0K Jun 19 07:23 metadata
drwxr-xr-- 2 root root 4.0K Jun 19 07:24 patch-history
-rw-r--r-- 1 root root 602 Jun 8 04:39 ResourceBundle.zip
drwx------ 2 root root 4.0K Jul 18 20:49 session
-rw-r--r-- 1 root root 369 Jul 19 08:02 statsmonitor_health.xml
I updated VCSA to 6.5u1 last Friday morning. Right after the update and reboot, the statsmonitor service was running and data was displayed in FAMI. However, it stopped at some point before Friday evening. VCSA backup has failed since then.
Please note: 6.5u1 adds alarms for services that are new in 6.5, like statsmonitor (please see: VMware vCenter Server 6.5 Update 1 Release Notes). So we can configure it to send an email if the service stops.
I had to restart VCSA 6.5u1 last Friday. Statsmonitor was not started automatically after this reboot (I triggered it via the FAMI backend); it stayed stopped during the weekend. Log Insight tells me:
2017-09-01T13:38:48.609769+02:00 VDCSVCENTER1 vmon 2390 - - Service statsmonitor api health check cmd returned unknown exit code 1
2017-09-01T13:38:48.609197+02:00 VDCSVCENTER1 vmon 2390 - - Service statsmonitor api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'
2017-09-01T13:38:39.096032+02:00 VDCSVCENTER1 vmon 2390 - - Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonEventPublisher.py --eventdata statsmonitor,UNKNOWN,UNHEALTHY,0
2017-09-01T13:38:39.095899+02:00 VDCSVCENTER1 vmon 2390 - - Re-check service statsmonitor health since it is still initializing.
2017-09-01T13:38:39.095760+02:00 MyvCenterHost vmon 2390 - - Service statsmonitor api health check cmd returned unknown exit code 1
2017-09-01T13:38:38.657214+02:00 MyvCenterHost vmon 2390 - - Service statsmonitor api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'
2017-09-01T13:38:32.515739+02:00 MyvCenterHost vmon 2390 - - Constructed command: /usr/lib/vmware-statsmonitor/statsMonitor.sh /etc/vmware/statsmonitor/statsMonitor.xml
2017-09-01T13:38:32.492675+02:00 MyvCenterHost vmon 2390 - - Executing op START on service statsmonitor...
2017-09-01T13:38:32.392698+02:00 MyvCenterHost vmon 2390 - - Adding service statsmonitor.
Again and again and again. Now, after manually starting the statsmonitor service, the file statsMonitor.xml exists in /etc/vmware/statsmonitor.
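To put a number on "again and again", the failed health checks can be counted in an exported log file. A rough sketch, assuming the Log Insight results were exported to a plain-text file; the function name count_health_failures is made up, and the pattern is only modelled on the vmon lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count vmon lines that report the missing
# statsmonitor health file. The pattern is modelled on the log lines
# quoted in this thread and may need adjusting for other builds.
count_health_failures() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function's exit status at 0 when the count is zero
    grep -c "statsmonitor.*api-health command's stderr: .*No such file or directory" "$1" || true
}
```

The only argument is the path to the exported log file.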
I started seeing this issue after upgrading to 6.5U1. I also had a similar issue with the vmware-content-library, but I found the fix for that.
This is more of an annoyance, though; it shouldn't have any direct performance implications. But I would love to get it fixed, along with the update health issue.
I just installed Patch a after 6.5.0u1 and rebooted. The service "vmware-statsmonitor" is stopped after the reboot and /var/vmware/applmgmt/statsmonitor_health.xml is missing.
After manually starting vmware-statsmonitor, the file statsmonitor_health.xml exists (369 bytes). This is reproducible after every reboot.
This is a known issue with vSphere 6.5 and should be fixed in the next release.
Please take a snapshot of the VCSA without memory and run through the steps below.
1. SSH to the VCSA as root.
2. # Modify the statsmonitor service config for vMon to set a higher startup timeout:
sed -i '/StartTimeout/d' /etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json
sed -i '/ApiHealthFile/a "StartTimeout": 600,' /etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json
3. kill -HUP $(cat /var/run/vmon.pid)
4. # Stop and start the statsmonitor service explicitly:
/usr/lib/vmware-vmon/vmon-cli -k statsmonitor
/usr/lib/vmware-vmon/vmon-cli -i statsmonitor
Now reboot the VCSA and check after 10-15 minutes; the statsmonitor service should start up automatically.
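Before rebooting, it may be worth confirming that the sed commands actually left a StartTimeout entry in the config. The following is a minimal sketch, not an official tool: the function names are made up, and it assumes the JSON file keeps one key per line, which is what the sed-based edit in step 2 relies on.

```shell
#!/bin/sh
# Default path is the one used by the sed commands in step 2.
VMON_CFG_DEFAULT=/etc/vmware/vmware-vmon/svcCfgfiles/statsmonitor.json

# Hypothetical helper: print the StartTimeout value from the config,
# or nothing if the key is absent. Assumes one key per line.
get_start_timeout() {
    cfg="${1:-$VMON_CFG_DEFAULT}"
    sed -n 's/.*"StartTimeout"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | head -n 1
}

# Hypothetical helper: succeed only if StartTimeout is present and at
# least the wanted number of seconds.
timeout_is_at_least() {
    want="$1"
    have=$(get_start_timeout "${2:-$VMON_CFG_DEFAULT}")
    [ -n "$have" ] && [ "$have" -ge "$want" ]
}
```

After step 2, get_start_timeout should print 600, and timeout_is_at_least 600 should succeed; if it prints nothing, the ApiHealthFile line that the second sed command anchors on was probably not found.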
I can confirm that the same issue exists on the following appliance, with one slight difference: the error is exactly the same, but once the appliance has booted, the monitoring service can be started manually. It just doesn't start automatically even though it is set to auto-start.
Type: vCenter Server with an embedded Platform Services Controller
Product: VMware vCenter Server Appliance
Version: 126.96.36.19900 Build number 11726888
The error I get upon boot and auto start attempt for VMware Appliance Monitoring Service is below:
<statsmonitor> Service api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /var/vmware/applmgmt/statsmonitor_health.xml. Error: [Errno 2] No such file or directory: '/var/vmware/applmgmt/statsmonitor_health.xml'
and why the $&$#@! can't VMware fix such issues?
[root@vcenter:~]$ cat /usr/local/bin/system-erros.sh
dmesg | grep -i warn
dmesg | grep -i fail
dmesg | grep -i error
cat /var/log/messages | grep -i warn
cat /var/log/messages | grep -i fail
cat /var/log/messages | grep -i error
when i run that script after reboot i could puke when i compare it to my production guests which reboot between 5 and 10 seconds and don't spit anything in that context
maybe because they are unsupported Fedora guests maintained by someone knowing what he is doing
i get tired about all of that stuff like sendmail (who is using that to begin with) broken for months after every update, lost network with the Broadcom native drivers, BS about corrupt or unsupported format when migrating machines to an NFS store until after some retries and praying it works, vMotion not working after every ESXi reboot until you disable/enable the firewall rule, and so on
NOTE: Content edited by Moderator for profanity
LOL I hear you man, this is a frustrating profession, lack of accountability, lack of anything other than the will to take one's money for "contracts" and "licensing" and then good luck to all of us