VMware Cloud Community
MarcLaf
Enthusiast

Need to move vSAN tag to new vmknic but job keeps failing

Our 2-Node vSAN cluster has been humming away for quite some time, and only now have a few failing health checks been noticed. It turns out that the networking on the Witness is misconfigured: both vmk0 (Management) and vmk1 (WitnessPG) are on the same subnet, which is an unsupported configuration. The health checks that are failing are "hosts with connectivity" and "all hosts have a vSAN vmknic configured", which is a result of the multi-homing issue (https://kb.vmware.com/s/article/2010877).
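
As a quick way to confirm the same-subnet condition described above, here is a minimal Python sketch using only the standard-library `ipaddress` module. The addresses and prefix lengths are hypothetical placeholders, not values from this environment; substitute the real vmk0 and vmk1 settings from the Witness.

```python
import ipaddress


def same_subnet(iface_a: str, iface_b: str) -> bool:
    """Return True if two interface addresses (CIDR notation) share a network."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    # overlaps() also catches the case where one network contains the other
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    # Hypothetical addresses for illustration only
    vmk0 = "192.168.10.21/24"  # Management
    vmk1 = "192.168.10.22/24"  # WitnessPG
    if same_subnet(vmk0, vmk1):
        print("vmk0 and vmk1 share a subnet (multi-homing condition)")
    else:
        print("vmk0 and vmk1 are on different subnets")
```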

For the rest of the cluster, the physical hosts have vSAN Witness traffic tagged on their management interfaces.

Having gone over the 2-Node Cluster Guide again, it appears that to reach a supported config all I need to do is move the vSAN tag from the Witness's vmk1 to vmk0 (Management). Is that all, or am I missing something?

Also, is there a supported process for re-tagging the Witness vmknics? I tried to uncheck the vSAN tag in vCenter and check the other interface as vSAN, but the changes didn't commit. Whenever I try to enable the vSAN tag on the Management interface, the job fails because one of the tasks returns "A general system error occurred: Unable to load module /usr/lib/vmware/vmkmod/cmmds: Busy". Does the Witness need to be in maintenance mode first? Or do I need to make these changes in esxcli even though the options are there in vCenter?


Accepted Solutions
TheBobkin
Champion

@MarcLaf the health check you mentioned triggers if the configuration information of a vSAN-tagged vmk for a node isn't returned to the health check. This can occur due to none being tagged, or due to some other issue preventing this information from being returned to the query in a timely manner. From the other points you mentioned, this looks to be the latter.

"A general system error occurred: Unable to load module /usr/lib/vmware/vmkmod/cmmds: Busy" is quite a severe sign of general unhealthiness of this Witness. Was this health check preceded by any events such as the Witness VM losing access to its disks, or by any unsupported activities that can impair Witnesses, such as SvMotion or taking snapshots of the Witness VM's disks? I would start by validating that the VM isn't running on snapshots, then rebooting it and checking whether the issue persists. If it does, you can troubleshoot further, or you can just replace this Witness with a new healthy one.

MarcLaf
Enthusiast

@TheBobkin 

Thank you for the response. While I was working on it last week I did end up rebooting the Witness (well, first I tried placing it in Maintenance Mode, but it just sat there for many hours without progressing). After the reboot I was able to successfully move the vSAN tag to the other vmknic, which totally resolved the other health checks. I don't know what happened previously or when, but I'm glad it's working now. I was also prepared to deploy a new Witness, as I had originally forgotten that was a valid option.
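
For anyone who would rather do the re-tag from the command line, the following Python sketch prints an esxcli sequence that could be run on the Witness. It is only an illustration under assumptions: that `esxcli vsan network ip add -i` and `esxcli vsan network ip remove -i` behave on the Witness appliance as they do on a regular ESXi host, and that tagging the new interface before untagging the old one is the preferred order. The helper name `retag_commands` is hypothetical, and none of this was needed for the fix in this thread, which was done through vCenter after a reboot.

```python
from typing import List


def retag_commands(old_vmk: str, new_vmk: str) -> List[str]:
    """Build an esxcli sequence that moves the vSAN tag between vmknics.

    Assumption: add the tag to the new interface first, then remove it
    from the old one, listing the vSAN network config before and after.
    """
    return [
        "esxcli vsan network list",
        f"esxcli vsan network ip add -i {new_vmk}",
        f"esxcli vsan network ip remove -i {old_vmk}",
        "esxcli vsan network list",
    ]


if __name__ == "__main__":
    # Interface names taken from the thread: the tag moves from vmk1 to vmk0
    for command in retag_commands(old_vmk="vmk1", new_vmk="vmk0"):
        print(command)
```

The script only prints the commands; it does not connect to or change any host.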
