ManivelR
Enthusiast

vCloud Usage Meter Hourly Collection, Errors Detected

Hi All,

We are getting the following message every 3 hours. We logged a case with VMware, and the investigation is still ongoing.

Product: vCenter Server, Hostname: 0.0.0.0, Message: Collection didn't finish successfully. Please check the log.

As per their observations, we have done the following so far, but with no luck.

1) Upgraded UM from 3.6.1 to 3.6.1 Hot Patch 2.

2) Increased the UM heap size to 10 GB.

Our UM has only 8 vCenter servers, and I don't know why this alert is being generated so frequently.

Time: 2019-07-28 16:35:00, Product: vCenter Server, Hostname: 0.0.0.0, Message: Collection didn't finish successfully. Please check the log.

Collection log:

2019-07-28 13:20:00,479 ERROR [Primary collection timer] collect.Collector: Collection didn't finish within 45 minutes.
java.util.concurrent.TimeoutException: Futures timed out after [45 minutes]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.ready(package.scala:86)
    at com.vmware.cloud.usgmtr.collect.Collector$.collectAll(Collector.scala:256)
    at com.vmware.cloud.usgmtr.collect.Collector$.com$vmware$cloud$usgmtr$collect$Collector$$notifyAndCollect(Collector.scala:195)
    at com.vmware.cloud.usgmtr.collect.Collector$$anon$1.runIfNotBusy$1(Collector.scala:138)
    at com.vmware.cloud.usgmtr.collect.Collector$$anon$1.run(Collector.scala:152)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

Can someone help me troubleshoot further? I don't know much about this Java timeout error. Is there any option to increase the Java timeouts?

Regards,

Manivel RR

jhammons2
VMware Employee

Hi Manivel,

From the SR and our previous discussion with development: this is environmental and due to network latency.
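If you want a rough first look at latency from a machine on the same network as the Usage Meter appliance, something like the sketch below may help. It only times TCP connects to port 443, which is a crude proxy and says nothing about how long the vCenter API itself takes to answer. The function names (`tcp_connect_ms`, `probe`) and the host name in the usage note are made up for illustration; this is not a Usage Meter tool.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout_s=5.0):
    """Return the time in milliseconds to open (and close) one TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        pass
    return (time.perf_counter() - start) * 1000.0


def probe(host, port=443, attempts=5, connect=tcp_connect_ms):
    """Take several connect timings and summarize them.

    `connect` is injectable so the summary logic can be exercised
    without touching the network.
    """
    samples = [connect(host, port) for _ in range(attempts)]
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Calling `probe("vcenter01.example.com")` (a placeholder host) for each of the vCenter servers and comparing the medians should at least show whether one of them is much further away than the rest.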

To explain the error: Usage Meter is failing to get a full reply from all endpoints added to Usage Meter within 45 minutes. We cannot increase the timeout. The investigation has to be done on the environment, and the issue is most likely network latency.
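To illustrate why a single slow endpoint is enough to trigger this, here is a minimal sketch of the same pattern in Python. It is not Usage Meter's code (the stack trace shows that is Scala, waiting on futures with a 45 minute limit); `collect_all` and `collect_one` are hypothetical names, and the timeout is a parameter here only so the behavior can be tried in seconds rather than minutes.

```python
import concurrent.futures as cf


def collect_all(endpoints, collect_one, timeout_s):
    """Collect from every endpoint in parallel under one shared deadline.

    If any endpoint has not answered when the deadline passes, the whole
    collection is reported as failed, even if all the others finished.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(endpoints)))
    futures = {pool.submit(collect_one, ep): ep for ep in endpoints}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on the stragglers
    if not_done:
        slow = sorted(futures[f] for f in not_done)
        raise TimeoutError(
            f"Collection didn't finish within {timeout_s} s; "
            f"still waiting on: {slow}"
        )
    return {futures[f]: f.result() for f in done}
```

With eight endpoints where seven answer quickly and one is slow, the call still fails as a whole, which is why the alert does not tell you which vCenter is the problem and why splitting the environment can make it go away.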

The issue was resolved by splitting up the environment. The reason the error took 24 hours to disappear is that Usage Meter can still run into an issue with the Tomcat service and EventsMonitor; basically, the entire appliance can fail until a healthcheck service restarts Tomcat. The healthcheck script itself can also error out.

ManivelR
Enthusiast

Thanks very much for the response, John.
