VMware Performance Community
mathieugont
Enthusiast

VMmark: staf returns RC = 17 (file-system problem)

Dear community

I am trying to run VMmark (Weathervane/Auction). Simple STAF commands such as "staf $host MISC WHOAMI" succeed, but the run consistently fails on file-system (FS) operations with RC = 17.

In other words, staf says there is an issue with the filesystem.

Any idea?

dmorse
VMware Employee

Hi mathieugont

In other words, staf says there is an issue with the filesystem.

Right; this suggests there is an issue with your underlying storage infrastructure. Did you see my request on the other thread you created?

Namely:

Can you elaborate on:

  • Server make/model (ESXi hosts)
  • Storage make/model, including the vendor and capacity of your drives, whether they're HDDs (and if so, what speed) or SSDs, and what RAID/LUN configuration they are in. Are you using vSAN?
  • ESXi and vCenter version, including exact build numbers

Thanks, David

mathieugont
Enthusiast

Hi David

Sorry, I should have given more details.

To answer your questions:

- Dual socket BDW E5-2699v4 servers

- 780GB DRAM per server

- 1x local ATA disk (1.8TB HDD) per server (Western Digital?), partition format GPT

- ESXi-6.7.0.13006603

- Each server's HDD is a datastore based on VMFS-6.82

- no vSan (i guess)

- vSphere Client 6.7.0.30000

- All my clients are on a hardware server (ESXi001), except the tile and the prime clients hosted on another one (ESXi005).

mathieugont
Enthusiast

Below is some additional info...

The HDD of my ESXi server was full because I had created additional disks. Could that be the reason for RC = 17?

After deleting these disks, I freed up 600GB and reran the benchmark. The stdout/stderr is attached.

According to the first error (the others are the same), it is now RC = 16:

The process failed to start, RC: 16, STAFResult: STAFConnectionProviderConnect: Client SSL handshake timed out: 22, Endpoint: ssl://AuctionLB0

But I can run:

>>> time staf AuctionNoSQL0 ping ping
Response
--------
PONG

real    0m20.076s
user    0m0.003s
sys     0m0.004s

>>> ping -c 5 AuctionNoSQL0
PING AuctionNoSQL0 (172.22.197.237) 56(84) bytes of data.
64 bytes from AuctionNoSQL0 (172.22.***.***): icmp_seq=1 ttl=64 time=0.576 ms
64 bytes from AuctionNoSQL0 (172.22.***.***): icmp_seq=2 ttl=64 time=0.635 ms
64 bytes from AuctionNoSQL0 (172.22.***.***): icmp_seq=3 ttl=64 time=0.659 ms
64 bytes from AuctionNoSQL0 (172.22.***.***): icmp_seq=4 ttl=64 time=0.664 ms
64 bytes from AuctionNoSQL0 (172.22.***.***): icmp_seq=5 ttl=64 time=0.663 ms

--- AuctionNoSQL0 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.576/0.639/0.664/0.040 ms
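
The two checks above disagree: the STAF ping succeeds but takes about 20 seconds, while ICMP round trips take under a millisecond. A minimal sketch for repeating the STAF check across several hosts and decoding the return code is shown here. It assumes that the staf command's exit status mirrors the STAF RC, that the RC names in the table match the STAF return code reference (worth verifying against your STAF version), and that 5 seconds is a reasonable "slow" threshold; the threshold and the helper names are hypothetical.

```python
import subprocess
import sys
import time

# A few STAF return codes. Names are taken from memory of the STAF
# return code reference; verify them against your STAF documentation.
STAF_RC = {
    0: "Ok",
    16: "No Path To Endpoint",
    17: "File Open Error",
    21: "STAF Not Running",
    22: "Communication Error",
}

SLOW_SECONDS = 5.0  # arbitrary threshold, an assumption for this sketch


def describe_rc(rc):
    """Return a short name for a STAF return code, if it is in the table."""
    return STAF_RC.get(rc, "see the STAF return code reference")


def timed_run(cmd):
    """Run a command and return (exit status, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, time.perf_counter() - start


def check_host(host):
    """Time 'staf <host> PING PING' and summarize the result in one line."""
    rc, seconds = timed_run(["staf", host, "PING", "PING"])
    note = " (slow)" if seconds > SLOW_SECONDS else ""
    return f"{host}: RC={rc} ({describe_rc(rc)}), {seconds:.1f}s{note}"


if __name__ == "__main__":
    # Usage: python staf_check.py AuctionNoSQL0 AuctionLB0
    for host in sys.argv[1:]:
        try:
            print(check_host(host))
        except FileNotFoundError:
            sys.exit("staf executable not found on PATH")
```

A host that answers PONG only after many seconds, as AuctionNoSQL0 does above, would be flagged as slow, which may help explain an SSL handshake timeout even though the network path itself is fine.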

To be continued...

dmorse
VMware Employee

Hi mathieugont

Thank you for providing your hardware specs; just to quote them again:

- 1x local ATA disk (1.8TB HDD) per server (Western Digital?), partition format GPT

- ESXi-6.7.0.13006603

- Each server's HDD is a datastore based on VMFS-6.82

- no vSan (i guess)

I see two problems here:

1. You don't have vSAN, as you guessed, but that is not a requirement.  What is required is shared storage (i.e. some kind of SAN, iSCSI, NFS, etc.).
Per the VMmark User's Guide:

System Under Test Storage Requirements

Configure the system under test with enough shared datastore capacity to hold the disks and paging files for all the virtual machines required for the VMmark Benchmark runs. This is approximately 891GB per tile (not counting the storage space required by the prime and tile clients, addressed in “Storage Requirements for VMmark Clients” on page 35).

NOTE In order to provide source and target datastores for the storage relocation operations that are part of the benchmark, VMmark requires a minimum of two datastore partitions.

If using VMware vSAN™ as the primary storage solution, a secondary storage solution will be needed for infrastructure operations.

Additionally, the benchmark requires that all ESXi hosts used in a test have access to the same shared storage.

2. The fact that you only have one local (not shared) HDD per ESXi host / server is problematic for several reasons:

  • A single local 1.8TB ATA HDD is probably a 10K RPM drive (not an SSD), so its IOPS will be very low (~500?), which is well below the requirements of VMmark.

Per the User's Guide:

The VMmark benchmark needs high-throughput, low-latency storage. While the exact bandwidth requirements will vary based on other aspects of the environment, a single VMmark tile can drive about 3500 IOPS (Input/Output Operations Per Second), with additional tiles typically each driving somewhat less. The latency requirements will also vary based on other aspects of the environment; a review of published VMmark results will provide a sense of the storage solutions that work with VMmark. Thus in addition to ensuring that you have enough storage capacity, you should also make sure your storage system will have adequate performance.
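
To put the quoted figures side by side with the hardware described in this thread, here is a rough back-of-the-envelope sketch. The 891GB and 3500 IOPS values come from the User's Guide excerpts above; the 600GB of free space is the figure reported earlier in the thread, the ~500 IOPS is only the estimate given above, and treating 1.8TB as 1800GB (ignoring VMFS overhead) is an assumption. The function names are hypothetical.

```python
GB_PER_TILE = 891      # approximate capacity per tile, from the User's Guide
IOPS_PER_TILE = 3500   # approximate load of a single tile, from the User's Guide


def tiles_by_capacity(free_gb):
    """Number of whole tiles that fit in the given free space (capacity only)."""
    return int(free_gb // GB_PER_TILE)


def iops_shortfall(available_iops):
    """How many times more IOPS one tile can drive than the storage delivers."""
    return IOPS_PER_TILE / available_iops


print(tiles_by_capacity(600))    # 0
print(tiles_by_capacity(1800))   # 2
print(iops_shortfall(500))       # 7.0
```

Under these assumptions, 600GB of free space does not hold even one tile, and a single tile could drive roughly seven times the estimated IOPS of the local HDD. This is a capacity and throughput comparison only; it does not address the separate requirement that the storage be shared across all ESXi hosts.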

So, I think a lot of the problems you have with the VMmark3 benchmark are due to the lack of a high-performance, shared storage array. I would recommend looking at the VMmark 3.x Results -- those PDFs have a section on their storage hardware, most of which consists of all-flash storage arrays.

mathieugont
Enthusiast

Hi David

Thanks for your advice.

Please look at the new post

VMmark/Auction fails with RC=1

(I could not reply to this one... weird).
