3 Replies Latest reply on May 10, 2019 8:53 AM by aNinjaneer

    Witness appliance PSOD on heavy writes

    aNinjaneer Novice

      I'm testing a two-node vSAN 6.7U2 cluster that I set up with a witness appliance running on a separate vSAN cluster, and I'm using HCIBench to generate load. Every time I run a 100% write workload, I get a PSOD on the witness appliance, and it kills the capacity drive on the appliance. It's 100% repeatable. Has anyone else seen this, or does anyone know how to tune or fix it? This is a fairly high-performance configuration, hitting about 15-20K write IOPS per node, so perhaps the writes are somehow overpowering the storage layer and causing the failure. This is my first time doing performance testing with the witness appliance, so any insight would be helpful. I've run numerous other tests on 4-node clusters without the appliance and have never had an issue like this.

        • 1. Re: Witness appliance PSOD on heavy writes
          Jasemccarty Champion
          vExpert, VMware Employees

          aNinjaneer


          Interesting, as writes don't go to the vSAN Witness Host. Only metadata changes do, such as those from powering VMs on or off, taking snapshots, etc.

          Have you reached out to GSS to open a ticket?

          I'd like to know more. I'm curious what type of hardware your vSAN Witness Appliance is running on.

          Also, can you send me a private message with your email?

          Thanks,

          Jase

          • 2. Re: Witness appliance PSOD on heavy writes
            gerrywei4455 Novice

            Just want to make sure there are no VMs deployed on the witness, right? I mean, in that 2-node cluster, you only have two physical hosts, and the witness is not in the cluster?

            How many VMs did you deploy? What are the number and size of the data disks? What does the workload look like?

            • 3. Re: Witness appliance PSOD on heavy writes
              aNinjaneer Novice

              I deleted all disk groups and recreated them, and I haven't seen the same failure since. However, I now have a new issue, one that I've also seen in previous testing: I'm unable to create more than three disk groups. When I create a fourth disk group, it shows no disk format version and I get an Operation Health warning for that disk group. I have tested with numerous different drives, both NVMe and SATA, and it always does the same thing. I will create a separate discussion for that matter.