VMware Cloud Community
aNinjaneer
Contributor

Witness appliance PSOD on heavy writes

I have a two-node vSAN 6.7U2 cluster that I'm testing, set up with a witness appliance running on a separate vSAN cluster, and I'm using HCIBench to generate load. Every time I run a 100% write workload, the witness appliance PSODs, and the failure kills the appliance's capacity drive. It's 100% repeatable. Has anyone else seen this, or does anyone know how to tune or fix it? This is a fairly high-performance configuration, hitting about 15-20K write IOPS per node, so perhaps the writes are somehow overwhelming the storage layer and causing the failure. This is my first time doing performance testing with the witness appliance, so any insight would be helpful. I've run numerous other tests on 4-node clusters without the appliance and have never had an issue like this.

3 Replies
Jasemccarty
Immortal

aNinjaneer


Interesting, as writes don't go to the vSAN Witness Host. Only metadata changes do: powering VMs on or off, taking snapshots, etc.

Have you reached out to GSS to open a ticket?

I'd like to know more. I'm curious what type of hardware your vSAN Witness Appliance is running on.

Also, can you send me a private message with your email?

Thanks,

Jase

Jase McCarty - @jasemccarty
gerrywei4455
Contributor

Just want to make sure: there are no VMs deployed on the witness, right? I mean, in that 2-node cluster you only have two physical hosts, and the witness is not in the cluster?

How many VMs did you deploy? What's the number and size of the data disks? What does the workload look like?

aNinjaneer
Contributor

I deleted all disk groups and recreated them, and I haven't seen the same failure since. However, I now have a new issue, one I've also seen in previous testing: I'm unable to create more than three disk groups. When I create a fourth disk group, it shows no disk format version and I get an Operation Health warning for that disk group. I have tested with numerous different drives, both NVMe and SATA, and it always does the same thing. I will create a separate discussion for that.
