PowerRails
Contributor

STAF Service Failing Frequently When Starting a JOB

Hi,

I have been having frequent issues when starting jobs with VMmark3. I am using the latest version and, on average, 6 out of 10 job submissions fail.

I can tell a submission has failed when the process stays "stuck" at the VMmark Initialization Phase.

PowerRails_0-1610021852302.png

 

When I try to display the logs I get this error:

PowerRails_1-1610021891084.png

 

My usual workaround is simply to restart the PrimeClient VM, but this has become a major pain recently, as the software seems very unreliable at getting a job started. Once a job starts, it works fine, and I have already been able to run a couple of benchmarks, but it's really annoying to have to reboot the VM several times.

 

Here is some of the log output:

journalctl _SYSTEMD_UNIT=startstaf.service

Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140690240665344;00000100;Error accepting on server socket, socket RC: 24
Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140690240665344;00000100;Error accepting on server socket, socket RC: 24
Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140690240665344;00000100;Error accepting on server socket, socket RC: 24
Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140690240665344;00000100;Error accepting on server socket, socket RC: 24
Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140684513220352;00000100;Received signal 11 (SIGSEGV)
Jan 07 07:08:45 primeclient startSTAFProc.sh[767]: 20210107-07:08:45;140684513220352;00000100;Received signal 6 (SIGABRT)
Jan 07 07:08:45 primeclient stopSTAFProc.sh[6743]: Error registering with STAF, RC: 21
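One possible reading of the log above: if the repeated `socket RC: 24` is a raw operating-system errno (an assumption; the log itself does not confirm it), then on Linux it would be `EMFILE`, "Too many open files". A minimal Python sketch to decode the number and print the open-file limits of the current process (which are not necessarily the limits the STAF daemon runs with):

```python
import errno
import os
import resource


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an OS errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def fd_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Assumption: the "socket RC" value in the STAF log is an errno.
    print(describe_errno(24))
    soft, hard = fd_limits()
    print(f"open-file limit: soft={soft}, hard={hard}")
```

If that reading is right, a low soft limit or a descriptor leak on the PrimeClient would be worth checking before the crash (SIGSEGV/SIGABRT) that follows in the log.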

 

Could you help me find the cause of this instability?

 

Thanks!

 

4 Replies
PowerRails
Contributor

I just got these errors:

 

PowerRails_0-1610022903147.png

PowerRails_1-1610022923745.png

 

 

It's really unstable ... and I feel lucky when I get a job submitted.

 

PowerRails
Contributor

I am starting to suspect this might be a network configuration issue.

I have 2 interfaces on the PrimeClient VM, one external and one internal.

I removed the external interface from the VM and unconfigured it in CentOS, but when I start VMmark it still complains about the old IP:

PowerRails_0-1610030149238.png

 

Is there a way to "force" VMmark to use the internal IP only?

jpschnee
VMware Employee

Apologies for the delay in response. VMmark doesn't dictate which network to use; it simply issues the commands, and the system (via its routing tables) sends the traffic. This has not been an issue before. My suggestion would be to make sure that you didn't miss a step in the user guide's instructions for configuring the Prime Client and that nothing has modified the routing tables.
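To see which local address the routing table actually selects for a given destination, something like the following Python sketch may help. It is not part of VMmark or STAF; `source_ip_for` is a hypothetical helper name, and the destination is whatever host address you want to test.

```python
import socket


def source_ip_for(destination: str, port: int = 9) -> str:
    """Return the local IP the OS routing table selects to reach destination.

    Connecting a UDP socket sends no packets; it only asks the kernel to
    choose a route and a local address for the socket.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback is used here only so the example runs anywhere;
    # substitute the address of a host in your test bed.
    print(source_ip_for("127.0.0.1"))
```

If the address printed for an internal host is not the PrimeClient's internal IP, the routing tables (or a leftover interface configuration) would be the place to look.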

What version of STAF/STAX are you using?

-Joshua
PowerRails
Contributor

Thanks Josh,

 

No problem, it wasn't really urgent.

It turns out that I no longer have that issue. All jobs are able to start quickly.

I believe it was due to a misconfiguration on my PrimeClient VM (most likely the /etc/hosts file).
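For anyone hitting the same symptom, a quick way to spot a stale hosts entry is to list every address the file maps to the PrimeClient's hostname. This is a minimal Python sketch; `hosts_entries` is a hypothetical helper, and the sample content and addresses are made-up documentation placeholders, not my actual file.

```python
def hosts_entries(hosts_text: str, hostname: str) -> list[str]:
    """Return every IP address that the hosts-file text maps to hostname."""
    matches = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            matches.append(ip)
    return matches


if __name__ == "__main__":
    # Hypothetical content; on a real system read /etc/hosts instead.
    sample = """
    127.0.0.1    localhost
    192.0.2.10   primeclient   # stale external address
    198.51.100.5 primeclient
    """
    print(hosts_entries(sample, "primeclient"))
```

More than one address for the same name, or an address belonging to an interface that no longer exists, would be a likely culprit.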

 
