Solved: Re: Get a list of VMs restarted by HA break the vRO

ririmia · ‎04-12-2020

Hello,

I'm trying to create a workflow to get a list of VMs restarted by HA.

The script below works as expected in the sense that it retrieves the list of VMs, but at the very end, when it should exit the last "for" loop, it literally breaks vRO.

for each (var vm in allVMs){
    var beginTime = new Date();
    //Set the start date from which we parse the events
    beginTime.setHours(beginTime.getHours()-24));
    //Create Event Specifications
    var vcEventFilterSpec = new VcEventFilterSpec();
    var spec = new VcEventFilterSpec();
    var vcEventFilterSpecByTime = new VcEventFilterSpecByTime();
    vcEventFilterSpec.time = vcEventFilterSpecByTime;
    vcEventFilterSpecByTime.beginTime = beginTime;
    //Filter by the Type ID of HA Event
    spec.eventTypeId = ["com.vmware.vc.ha.VmRestartedByHAEvent"];
    spec.entity = new VcEventFilterSpecByEntity();
    spec.entity.entity = vm;
    spec.entity.recursion = VcEventFilterSpecRecursionOption.self;
    spec.time = vcEventFilterSpecByTime;
    var events = vm.sdkConnection.eventManager.queryEvents(spec);
    //For each VM, go through the VM restart events of the last 24 hours
    for each (var ev in events) {
        //System.log(ev);
        System.debug("VM: " + ev.vm.name);
        System.debug("Hostname: " + ev.host.name);
        System.debug("Cluster: " + ev.computeResource.name);
        System.debug("Date: " + ev.createdTime);
        restartLogs.push(ev);
    }
}

After this error message the vRO service is stopped and a large core.java file is dumped to disk.

vRO76:/storage/ext/core # ls -altrh
total 1.6G
drwxr-xr-x 3 root root 4.0K Mar 21 2019 ..
drwxrwxr-x 2 root coredump 4.0K Apr 11 20:57 .
-rw------- 1 vco vco 3.1G Apr 11 20:57 core.java.5245

I tried with versions 7.1, 7.2 and 7.6. No special vRO customization, just out of the box. Same result on all versions.

Can somebody give me a clue? Is it just a vRO bug?

Thanks.
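
One experiment that could help isolate the problem (an assumption, not something confirmed anywhere in this thread): instead of pushing the raw event objects into restartLogs, keep only plain strings taken from the same fields the script already logs. The helper names summarizeRestartEvent and collectRestartSummaries are made up for illustration, and the sample data is a stand-in shaped like those fields, so the sketch also runs outside vRO.

```javascript
// Hypothetical helper: reduce one event to plain values, using the same
// fields the posted script reads (vm, host, computeResource, createdTime).
function summarizeRestartEvent(ev) {
  return {
    vm: ev.vm ? ev.vm.name : null,
    host: ev.host ? ev.host.name : null,
    cluster: ev.computeResource ? ev.computeResource.name : null,
    date: String(ev.createdTime)
  };
}

// Hypothetical helper: build an array of plain summaries from an event list.
function collectRestartSummaries(events) {
  var summaries = [];
  if (!events) {
    return summaries; // nothing to summarize
  }
  for (var i = 0; i < events.length; i++) {
    summaries.push(summarizeRestartEvent(events[i]));
  }
  return summaries;
}

// Stand-in data (not real vCenter output), only for trying the helpers.
var sampleEvents = [
  {
    vm: { name: "vm-01" },
    host: { name: "esx-01" },
    computeResource: { name: "cluster-01" },
    createdTime: "2020-04-11T20:57:00Z"
  }
];
var restartSummaries = collectRestartSummaries(sampleEvents);
```

If the crash disappears with plain values and comes back with raw events, that would narrow down where to look; if it crashes either way, the event objects are probably not the trigger.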

eoinbyrne · ‎04-12-2020

It's possible that vRO is crashing due to an OutOfMemoryError. Check the file /var/log/vco/app-server/server.log (or just run grep OutOfMemory *.log in that directory).

If that is the case, you'll need to increase the Java heap allocation for the vRO JVM:

- KB for standalone vRO: VMware Knowledge Base

- KB for embedded vRO in a vRA appliance: VMware Knowledge Base

Do you have any idea how many events you would expect for your 24-hour window?

jonathanvh · ‎04-13-2020

Hi,

I've tried running this in my home lab (vRO 8.0.1) and there it runs without crashing the Orchestrator.

Your code also has one ")" too many.

beginTime.setHours(beginTime.getHours()-24));

=>

beginTime.setHours(beginTime.getHours()-24);
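
For reference, the corrected line can be tried in plain JavaScript outside vRO. This is a minimal sketch; the function name beginTimeHoursAgo and its parameters are made up for illustration. Date.prototype.setHours accepts out-of-range values and rolls the date over, so subtracting 24 from the current hour lands on the previous day.

```javascript
// Hypothetical helper: return a new Date shifted back by hoursBack hours,
// using the same setHours() arithmetic as the corrected line above.
function beginTimeHoursAgo(now, hoursBack) {
  var beginTime = new Date(now.getTime()); // copy, so "now" is not modified
  beginTime.setHours(beginTime.getHours() - hoursBack);
  return beginTime;
}

var now = new Date();
var beginTime = beginTimeHoursAgo(now, 24);
```

Because setHours works in local time, the elapsed time can differ from exactly 24 hours by an hour when a daylight-saving change falls inside the window.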

ririmia · ‎04-13-2020

@jonathanvh: You can disregard the ")" in the code. This is just a typo in my attempt to simplify the code for this forum by removing a variable.

Initial code: "beginTime.setHours(beginTime.getHours()-(24*lastNumberOfDays));"

Thanks.

ririmia · ‎04-13-2020

@eoinbyrne: I'm expecting only one VM, and the debug log lists it correctly.

Below are the logs with OutOfMemoryError:

vRO76:/var/log/vco/app-server # cat server.log | grep OutOfMemoryError

2020-04-12 17:40:17.345+0000 [pool-1-thread-1] WARN {} [CacheConfiguration] Cache: vcoPrincipalCache has a maxElementsInMemory of 0. This might lead to performance degradation or OutOfMemoryError at Terracotta client.From Ehcache 2.0 onwards this has been changed to mean a store with no capacity limit. Set it to 1 if you want no elements cached in memory

2020-04-12 17:40:17.373+0000 [pool-1-thread-1] WARN {} [CacheConfiguration] Cache: vcoOriginatorBySessionCache has a maxElementsInMemory of 0. This might lead to performance degradation or OutOfMemoryError at Terracotta client.From Ehcache 2.0 onwards this has been changed to mean a store with no capacity limit. Set it to 1 if you want no elements cached in memory

2020-04-12 17:40:17.373+0000 [pool-1-thread-1] WARN {} [CacheConfiguration] Cache: vcoSessionCache has a maxElementsInMemory of 0. This might lead to performance degradation or OutOfMemoryError at Terracotta client.From Ehcache 2.0 onwards this has been changed to mean a store with no capacity limit. Set it to 1 if you want no elements cached in memory

2020-04-12 17:40:50.316+0000 [Thread-27] INFO {} [O11N] InputArguments: [-Djava.util.logging.config.file=/var/lib/vco/app-server/conf/logging.properties, -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager, -Djdk.tls.ephemeralDHKeySize=2048, -Djava.protocol.handler.pkgs=org.apache.catalina.webresources, -Dorg.apache.catalina.security.SecurityListener.UMASK=0027, -Djava.awt.headless=true, -Dch.dunes.install-path=/var/lib/vco/app-server/.., -Djavax.net.ssl.trustStore=/var/lib/vco/app-server/../app-server/conf/security/jssecacerts, -Dtomcat.truststoreFile=/var/lib/vco/app-server/../app-server/conf/security/tctruststore, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote, -Djsse.enableSNIExtension=false, -Dfile.encoding=UTF-8, -XX:+UnlockDiagnosticVMOptions, -XX:+PrintCommandLineFlags, -XX:+PrintFlagsFinal, -Xmx2560m, -Xms2560m, -Xmn896m, -XX:MetaspaceSize=512m, -XX:MaxMetaspaceSize=1024m, -Xss256k, -XX:SurvivorRatio=4, -XX:TargetSurvivorRatio=90, -XX:MaxTenuringThreshold=8, -XX:+ParallelRefProcEnabled, -XX:+CMSParallelSurvivorRemarkEnabled, -XX:+AggressiveOpts, -XX:+UseConcMarkSweepGC, -XX:+UseParNewGC, -XX:+CMSParallelRemarkEnabled, -XX:+ScavengeBeforeFullGC, -XX:+CMSScavengeBeforeRemark, -XX:CMSInitiatingOccupancyFraction=65, -XX:+UseCMSInitiatingOccupancyOnly, -XX:ConcGCThreads=2, -XX:ParallelGCThreads=2, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSMaxAbortablePrecleanTime=10000, -XX:+DisableExplicitGC, -XX:+CMSClassUnloadingEnabled, -XX:+OptimizeStringConcat, -XX:+UseCondCardMark, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/vco/app-server/../app-server/logs/vco.hprof, -XX:ErrorFile=/var/lib/vco/app-server/../app-server/logs/vro_%p.error_log, -XX:-PrintGCCause, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -XX:+PrintGCDateStamps, -Xloggc:/var/lib/vco/app-server/../app-server/logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=5, -XX:GCLogFileSize=50M, -XX:+PrintTenuringDistribution, 
-javaagent:/usr/lib/vco/app-server/bin/notsoserial.jar, -Dnotsoserial.blacklist=/var/lib/vco/app-server/../app-server/conf/remote_facade_serialization_blacklist.txt, -Dignore.endorsed.dirs=, -Dcatalina.base=/var/lib/vco/app-server, -Dcatalina.home=/usr/share/tomcat, -Djava.io.tmpdir=/var/lib/vco/app-server/temp]

Increasing the appliance memory and the Java heap size, in a lab with 2 ESXi hosts and 2 powered-on VMs, just so that vRO can retrieve one event from the VM logs seems a bit too much. Don't you think?

I increased the heap size as you suggested, and the only difference is that vRO stays up a little longer before it eventually crashes.

The vRO workflow ends with the same error:

[2020-04-13 14:18:19.475] [E] Workflow execution stack:

***

', state: 'failed', business state: 'null', exception: 'null'

*** End of execution stack.

I mean, this is surely a vRO bug and not a code error.
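
Note that the grep matches quoted above are cache-configuration warnings and the JVM argument list (-XX:+HeapDumpOnOutOfMemoryError); none of them is an error that was actually thrown. A thrown error normally appears in Java logs with its full class name, for example "java.lang.OutOfMemoryError: Java heap space". Below is a small sketch of a stricter filter. The function name findThrownOomLines is made up, the first two sample lines are excerpts from the output above, and the third line is a hypothetical example that does not come from this server.

```javascript
// Hypothetical helper: keep only lines that contain the fully qualified
// class name, which is how a thrown OutOfMemoryError is normally logged.
function findThrownOomLines(lines) {
  var pattern = /java\.lang\.OutOfMemoryError/;
  return lines.filter(function (line) {
    return pattern.test(line);
  });
}

var sampleLines = [
  // Excerpt from the thread: a configuration warning, not a thrown error.
  "Cache: vcoPrincipalCache has a maxElementsInMemory of 0. This might lead to performance degradation or OutOfMemoryError at Terracotta client.",
  // Excerpt from the thread: a JVM startup flag, not a thrown error.
  "-XX:+HeapDumpOnOutOfMemoryError",
  // Hypothetical line showing what a real occurrence would look like.
  "java.lang.OutOfMemoryError: Java heap space"
];
var thrown = findThrownOomLines(sampleLines);
```

On the appliance itself the same idea is simply a grep for java.lang.OutOfMemoryError instead of OutOfMemoryError.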

eoinbyrne · ‎04-13-2020

I agree, it's certainly a server problem, and I would not expect your code to be the issue. Since the server is still dropping a core dump of the JVM, it should report an Exception/Error in the server.log or catalina.out files for the instance, so it should be clear what the underlying problem is.

What error messages are reported prior to the crash? (the timestamp of the core file will indicate when it was created and the JVM would have been spitting errors before that if you need to isolate a timeline for this)

ririmia · ‎04-16-2020

@eoinbyrne: Thanks for your help, but this investigation is going too deep and the result will not help much.

I was expecting a VMware employee to step in and give some real feedback. There are just a few lines of code to test, and that could lead to fixing a vRO bug.

Thank you again!

I'll go with @jonathanvh's suggestion and test vRO 8.0.1. That said, v8.x is really lacking, given all the features that are missing compared with the Java client of the previous versions.

jonathanvh · ‎05-13-2020

Just an FYI, it works in Orchestrator 8.1.0 too in my home lab.
