Solved: guest info failures for elasticappa0 elasticappb0 elasticdb0?

niceguy001 · ‎06-11-2018

as shown in fig. attached below:

before this happened, i re-provisioned all the tiles (0 to 3),

and also tried rebooting the PrimeClient, the Client VMs and all the workload VMs before starting the VMmark run.

(each VM's IP is mapped correctly in the PrimeClient's /etc/hosts file, and i could manually staf ping the VMs from a terminal)

but this strange issue occurred, and it prevents the test from succeeding:

"staxprocessstarterror signal raised. continuing job.", "full status unknown state", "guest info failures for the following machines"

basically the logs only report the errors, not the reasons (neither stx_job nor guestinfofiles, etc.)

the test result folder is zipped and attached,

can anyone help???

fredab2 · ‎06-13-2018

From the PrimeClient and Client0, can you ssh into the following 3 VMs: ElasticDB0, ElasticAppA0, and ElasticAppB0? If you can, can you then ssh back to Client0 and the PrimeClient from each of those 3 VMs?

If not, try rebooting those 3 VMs and see if ssh works, and if so, try the run again.

Fred
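
For anyone who wants to script the first half of this check, here is a minimal sketch in Python. It only tests whether TCP port 22 accepts a connection, which is weaker than a real ssh login, so treat it as a quick first pass. The function names `port_open` and `check_hosts` are made up for illustration, and the VM names are the ones from this thread; they must resolve on the machine where the script runs.

```python
import socket

# VM names taken from the thread; adjust to match your own tile.
ELASTIC_VMS = ["ElasticDB0", "ElasticAppA0", "ElasticAppB0"]


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(hosts, port=22):
    """Map each host name to whether its port accepted a connection."""
    return {host: port_open(host, port) for host in hosts}
```

Calling `check_hosts(ELASTIC_VMS)` on the PrimeClient and on Client0 gives one True/False per VM. The return direction (from each Elastic VM back to Client0 and the PrimeClient) would need the same script run on those VMs with the client names in the list.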

fredab2 · ‎06-13-2018

Also, can you zip up and attach the provisioning directory from when you provisioned the VMs again?

Fred

niceguy001 · ‎06-14-2018

fredab2 hi, thanks for the reply

the related folders are attached.

ssh works just fine between the PrimeClient, Client0 and the other VMs; they can all ssh to each other with no problems.

i did a 4-tile turbo run and got the same issue, and i couldn't figure out why (screenshot below):

(ssh to the failed VMs works fine)

i've tried rebooting all the VMs and performed another test run, but it still failed, as shown in the fig. below:

(these workloads can be staf pinged from the PrimeClient)

RebeccaG · ‎06-14-2018

Hi there, when you did a 'staf ping' from the prime client to the ElasticAppA0, ElasticAppB0, and ElasticDB0, did you use the hostname that is present in the prime client's hosts file?

And when you ssh into ElasticAppA0, ElasticAppB0, and ElasticDB0, does 'staf primeclient ping ping' return 'pong'?

Have you altered any of the VMs' hosts files since provisioning?

Try looking in the troubleshooting section of the VMmark User's Guide, "STAF and STAX Issues". Since the staf ping is not working, the issue is either network related, or STAF is not running correctly on the Elastic VMs. I think you've already understood this by trying to reboot the VMs in question, but try taking a look at that section if you haven't already.
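
The two checks asked about above (hostnames present in the hosts file, and 'staf <machine> ping ping' answering) can be sketched in Python roughly as follows. This is an illustration under assumptions, not part of VMmark: `parse_hosts` and `staf_ping` are hypothetical helper names, the `staf` binary is assumed to be on the PATH, and a reply is treated as good only if the command exits with 0 and its output contains PONG.

```python
import subprocess


def parse_hosts(text):
    """Parse hosts-file text into a {lowercased name: ip} mapping."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # keep the first entry for a name
    return mapping


def staf_ping(machine, run=subprocess.run, timeout=15):
    """Run 'staf <machine> ping ping'; True if the reply contains PONG."""
    try:
        result = run(
            ["staf", machine, "ping", "ping"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # staf binary missing, or the request hung past the timeout.
        return False
    return result.returncode == 0 and "PONG" in result.stdout.upper()
```

On the PrimeClient, one could read /etc/hosts, pass its contents to `parse_hosts`, confirm that elasticappa0, elasticappb0 and elasticdb0 are all keys, and then call `staf_ping` with exactly those names. The `run` parameter exists only so the function can be exercised without STAF installed.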

niceguy001 · ‎06-20-2018

hi RebeccaG

thanks for the prompt!

after a deep check of the physical network, i discovered that there were physical issues on my switch's ports,

which caused the following errors repeatedly: "get guest info failures...", "could not staf ping the following X machines..." and "staxprocessstarterror signal raised. continuing job.", etc.

the test ran just fine after i switched to another, working switch
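
Since one-off manual staf pings and ssh sessions succeeded here while the run still failed, a repeated probe may be more telling than a single check when a switch port is suspect. The sketch below is only a guess at how one might look for that; whether the faulty ports in this case dropped traffic intermittently is an assumption, and `loss_rate` and `tcp_probe` are hypothetical names.

```python
import socket


def loss_rate(probe, attempts=20):
    """Call probe() repeatedly; return the fraction of calls that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts


def tcp_probe(host, port, timeout=2.0):
    """Build a zero-argument probe that tries one TCP connection."""
    def probe():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe
```

For example, `loss_rate(tcp_probe("ElasticDB0", 22), attempts=50)` returns 0.0 when every attempt connects; anything above that on a local network would be a reason to look at cabling and switch ports.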

RebeccaG · ‎06-21-2018

I'm glad to hear you were able to resolve the issue. Thanks for following up here about the solution.
