UM3.5 - The Usage Meter services have been restarted because of an error

kevin_croughs · ‎02-14-2017

Hi,

We receive a lot of e-mails from our Usage Meter 3.5 appliance saying "The Usage Meter services have been restarted because of an error", without any further details.

However, when I log in to the web interface of the appliance, I can't see any errors or failures; all collections were successful.

Can you help me find what is causing these e-mails?

Thank you.

Kind regards,

Kevin Croughs

James_Tsiao · ‎02-14-2017

Hi Kevin,

How often are you receiving these email messages? Do they follow any sort of pattern, such as every midnight?

The email is sent by a cron job that monitors the availability of Usage Meter. It checks every 20 minutes and, if Usage Meter fails to respond, it sends out the email and restarts Usage Meter. Usage Meter can fail to respond for a variety of reasons, from an internal lockup to operating-system-level issues.
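
To make the mechanism concrete, here is a rough Python sketch of what one pass of such a watchdog could look like. It is not the appliance's actual script. The names `is_responding` and `check_once`, and the idea of probing over HTTP, are assumptions for illustration only.

```python
import urllib.error
import urllib.request


def is_responding(url, timeout=30):
    """Return True if an HTTP request to url gets any HTTP response in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Refused, timed out, unreachable, or (in this sketch) a TLS failure.
        return False


def check_once(probe, restart, notify):
    """Run one watchdog pass; notify and restart only when the probe fails."""
    if probe():
        return "ok"
    notify("The Usage Meter services have been restarted because of an error")
    restart()
    return "restarted"
```

A scheduler such as cron would call `check_once` every 20 minutes. Because the probe only reports whether a response arrived, the resulting email cannot say why the service was unresponsive, which is why the logs are the next place to look.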

The Usage Meter logs are located in /var/log/usgmtr/. You can log in to the console and check whether any errors are logged there. Also, since it could be a system-level issue, please check the system logs, such as /var/log/messages, to see whether anything happened at the system level, such as processes like Tomcat or Postgres being killed.
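
If reading the logs by eye is impractical, a small script can narrow them down. The following is a minimal sketch, assuming the logs are plain text and that Python is available wherever you run it. The keyword list is my own guess at what is worth flagging (including the Linux kernel's out-of-memory messages), not a list of strings Usage Meter is known to log.

```python
import re
from pathlib import Path

# Keywords are assumptions chosen for illustration; adjust as needed.
PATTERN = re.compile(
    r"error|exception|fatal|killed process|out of memory|oom-killer",
    re.IGNORECASE,
)


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_paths(paths):
    """Scan each readable file and map its path to the matching lines."""
    results = {}
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                hits = find_suspicious_lines(handle)
        except OSError:
            continue  # missing, a directory, or unreadable (may need root)
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    candidates = sorted(Path("/var/log/usgmtr").glob("*"))
    candidates.append(Path("/var/log/messages"))
    for path, hits in scan_paths(candidates).items():
        for number, text in hits:
            print(f"{path}:{number}: {text}")
```

The output is only a starting point: a matching line is not necessarily the cause, and a clean result does not rule out a system-level problem.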

Thanks,

James.

kevin_croughs · ‎02-14-2017

Hi James,

There doesn't seem to be a pattern in the frequency of the emails. Some days we receive them multiple times, and some days we don't receive any at all. The time of day also differs each time.

What I also noticed is that the appliance sometimes uses all of its CPU (100% usage for extended periods of time), which makes the web interface unresponsive. The only thing that seems to help then is to power the appliance off and on again (a VMware Tools reboot doesn't respond either).

I will have a look at the log files you mentioned and will come back with more information asap.

Thank you.

Kind regards,

Kevin

kevin_croughs · ‎02-14-2017

Hi James,

I *tried* to have a look at the log files. In /var/log/usgmtr I can find several log files (error.log, error.log.1, error.log.2, etc.). At first sight I can't find anything serious in there, but I'm afraid I'm not a skilled Linux log analyzer 🙂

Also, /var/log/messages doesn't seem to show any relevant information to me.

Maybe it's better to create a log bundle in the web interface and log a support ticket?

Kind regards,

Kevin

James_Tsiao · ‎02-15-2017

Hi Kevin,

Before you create a support bundle, go to the Support page in the UI and change the Log Level to "Trace". It's located just above the generate support bundle button. Then wait for the next email and generate a support bundle. In your support ticket, mention the time when the email was sent so we can hopefully correlate it with events in the logs. Also mention this thread in the support ticket, so that if another engineer looks at it, he or she has a frame of reference.
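
For anyone who wants to try the same correlation themselves, here is a minimal Python sketch that keeps only the log lines stamped shortly before the email. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the real Usage Meter log layout, so adjust the pattern as needed. The function name `lines_near` is hypothetical. The default look-back of 20 minutes reflects the interval of the monitoring cron job.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line; adjust to the real logs.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def lines_near(lines, email_time,
               before=timedelta(minutes=20), after=timedelta(minutes=5)):
    """Return lines stamped within [email_time - before, email_time + after]."""
    start, end = email_time - before, email_time + after
    selected = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # unstamped lines (e.g. stack traces) are skipped in this sketch
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `lines_near(open("error.log"), datetime(2017, 2, 15, 11, 0))` would return the entries logged between 10:40 and 11:05 on that day, assuming the appliance clock and the email timestamp use the same time zone.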

Thanks,

James.

nitin17786 · ‎02-20-2017

I'm also facing the same issue and get the same message: "The Usage Meter services have been restarted because of an error."

I have 2 different deployments of vCenter 6.0 with Usage Meter 3.5 in 2 different locations, and both setups recently started to experience the same issue.

Initially, no such errors were reported for more than a week after deployment, but for the last 5 days I have been getting emails every hour with the same error message, even though Usage Meter is able to run the hourly reports with no issues.

I checked the logs in /var/log/usgmtr/ and /var/log/messages as mentioned above, but since I'm not a Linux expert, I was not able to find any information or errors that could suggest the cause of this issue.

To me, it looks like a bug or known issue in vCUM 3.5, since both setups have only 1 vCenter deployed for testing, carry minimal load, and have sufficient resources available for vCUM to function.

Please advise on further troubleshooting or a fix for it asap.

kevin_croughs · ‎02-22-2017

Meanwhile, we have a support case open for this issue (SR 17382555502), and VMware is investigating what is going wrong in our environment.

Also, the problem with the 100% CPU usage and the unresponsive web interface keeps coming back a few days after each reboot of the virtual appliance.

To be continued...
