VMware Cloud Community
TRottig
Enthusiast

Witness host fails to join cluster

Hi,

I am trying to set up a vSAN ROBO cluster. The cluster creation itself succeeds, but the witness host is unable to join the cluster, and I cannot tell why.

In the GUI I get no error message on creation. The only indication that something is wrong is the on-disk format version:

[screenshot: pastedImage_0.png]

When I now run RVC (vsan.health.health_summary), run the vSAN Health Check, or try to configure the stretched cluster, the only result is

[screenshot: pastedImage_3.png]

or, in RVC, "SystemError: ascii".

I have also tried joining the witness host to the cluster manually (and creating the cluster entirely via esxcli); the result is always that the witness host stays in standalone mode.

Cluster created by Wizard:

Cluster Information
   Enabled: true
   Current Local Time: 2016-08-19T07:17:10Z
   Local Node UUID: 5752ae98-d678-8a07-e68a-0cc47a696ea0
   Local Node Type: NORMAL
   Local Node State: BACKUP
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 581ba10d-8210-0304-3b56-0025905db4d9
   Sub-Cluster Backup UUID: 5752ae98-d678-8a07-e68a-0cc47a696ea0
   Sub-Cluster UUID: 52b7b85f-65f6-1f99-88aa-423148eab28d
   Sub-Cluster Membership Entry Revision: 1
   Sub-Cluster Member Count: 2
   Sub-Cluster Member UUIDs: 581ba10d-8210-0304-3b56-0025905db4d9, 5752ae98-d678-8a07-e68a-0cc47a696ea0
   Sub-Cluster Membership UUID: 6eb0b657-1e87-c006-4dde-0025905db658

Witness after joining the Cluster manually

Cluster Information
   Enabled: true
   Current Local Time: 2016-08-19T07:17:34Z
   Local Node UUID: 570c0e4e-51bd-8980-05d4-001e6758faf3
   Local Node Type: WITNESS
   Local Node State: STANDALONE
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID:
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: 52b7b85f-65f6-1f99-88aa-423148eab28d
   Sub-Cluster Membership Entry Revision: 0
   Sub-Cluster Member Count: 1
   Sub-Cluster Member UUIDs: 570c0e4e-51bd-8980-05d4-001e6758faf3
   Sub-Cluster Membership UUID: 00000000-0000-0000-0000-000000000000
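For anyone comparing the two listings, here is a rough Python sketch that parses "Key: value" output of this shape (the format printed by esxcli vsan cluster get) and spells out what the two hosts disagree on. The function names and the wording of the findings are made up for illustration, and the sample text is abridged from the output above.

```python
def parse_cluster_info(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def member_uuids(info):
    """Return the set of UUIDs listed under 'Sub-Cluster Member UUIDs'."""
    raw = info.get("Sub-Cluster Member UUIDs", "")
    return {u.strip() for u in raw.split(",") if u.strip()}


def diagnose(data_node_text, witness_text):
    """Compare a data node's view of the cluster with the witness's view."""
    data = parse_cluster_info(data_node_text)
    witness = parse_cluster_info(witness_text)
    findings = []
    if data.get("Sub-Cluster UUID") == witness.get("Sub-Cluster UUID"):
        findings.append("both hosts are configured with the same Sub-Cluster UUID")
    else:
        findings.append("the hosts are configured with different Sub-Cluster UUIDs")
    if witness.get("Local Node UUID") not in member_uuids(data):
        findings.append("the witness is missing from the data node's member list")
    if witness.get("Local Node State") == "STANDALONE":
        findings.append("the witness sees no other members (STANDALONE)")
    return findings


# Abridged from the two listings in this post.
DATA_NODE = """Cluster Information
   Local Node UUID: 5752ae98-d678-8a07-e68a-0cc47a696ea0
   Local Node State: BACKUP
   Sub-Cluster UUID: 52b7b85f-65f6-1f99-88aa-423148eab28d
   Sub-Cluster Member UUIDs: 581ba10d-8210-0304-3b56-0025905db4d9, 5752ae98-d678-8a07-e68a-0cc47a696ea0
"""

WITNESS = """Cluster Information
   Local Node UUID: 570c0e4e-51bd-8980-05d4-001e6758faf3
   Local Node State: STANDALONE
   Sub-Cluster Master UUID:
   Sub-Cluster UUID: 52b7b85f-65f6-1f99-88aa-423148eab28d
   Sub-Cluster Member UUIDs: 570c0e4e-51bd-8980-05d4-001e6758faf3
"""

for finding in diagnose(DATA_NODE, WITNESS):
    print("-", finding)
```

With the values above, the Sub-Cluster UUID matches on both sides, yet neither host lists the other as a member. So the witness is configured for the right cluster but never forms membership with it.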

The witness host is reached over routed L3, and all ports are open.

Any idea what might be the issue?

Thanks

3 Replies
TRottig
Enthusiast

I have double-checked all network connectivity, stood up another box as witness host in the local datacenter, and tried various combinations of configurations, all to no avail.

For some reason the witness box(es) are not able to communicate with the vSAN cluster ...

I have also checked the logs, but they are not really conclusive (to me):

Witness hostd log:

2016-08-21T09:53:26.742Z info hostd[469C2B70] [Originator@6876 sub=Solo.Vmomi opID=1b37c58f-6785-11e6-6b-31ee user=:com.vmware.vsan.health] Activation [N5Vmomi10ActivationE:0x1f448330] : Invoke done [RetrieveCapability] on [vim.host.VsanDiskManagementSystem:ha-vsan-disk-management-system]

2016-08-21T09:53:26.742Z info hostd[469C2B70] [Originator@6876 sub=Solo.Vmomi opID=1b37c58f-6785-11e6-6b-31ee user=:com.vmware.vsan.health] Throw vim.fault.NotAuthenticated

2016-08-21T09:53:26.742Z info hostd[469C2B70] [Originator@6876 sub=Solo.Vmomi opID=1b37c58f-6785-11e6-6b-31ee user=:com.vmware.vsan.health] Result:

--> (vim.fault.NotAuthenticated) {

-->    faultCause = (vmodl.MethodFault) null,

-->    object = 'vim.host.VsanDiskManagementSystem:ha-vsan-disk-management-system',

-->    privilegeId = "System.Read",

-->    msg = ""

--> }

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

Accepted password for user vpxuser from 192.168.119.2

2016-08-21T09:53:26.930Z info hostd[47580B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=1b37c58f-6785-11e6-6b-31f0 user=:com.vmware.vsan.health] Event 338 : User vpxuser@192.168.119.2 logged in as VMware-client/5.1.0

2016-08-21T09:53:26.958Z info hostd[FFCC1B70] [Originator@6876 sub=SysCommandPosix opID=1b37c58f-6785-11e6-6b-31f4 user=vpxuser:com.vmware.vsan.health] ForkExec(/usr/bin/sh)  255708

2016-08-21T09:53:48.317Z info hostd[47580B70] [Originator@6876 sub=Solo.Vmomi opID=9c303143 user=root] Activation [N5Vmomi10ActivationE:0x464759b8] : Invoke done [waitForUpdatesEx] on [vmodl.query.PropertyCollector:ha-property-collector]

2016-08-21T09:53:48.317Z verbose hostd[47580B70] [Originator@6876 sub=Solo.Vmomi opID=9c303143 user=root] Arg version:

--> "71"

2016-08-21T09:53:48.317Z verbose hostd[47580B70] [Originator@6876 sub=Solo.Vmomi opID=9c303143 user=root] Arg options:

--> (vmodl.query.PropertyCollector.WaitOptions) {

-->    maxWaitSeconds = 600,

-->    maxObjectUpdates = 100

--> }

2016-08-21T09:53:48.317Z info hostd[47580B70] [Originator@6876 sub=Solo.Vmomi opID=9c303143 user=root] Throw vmodl.fault.RequestCanceled

2016-08-21T09:53:48.317Z info hostd[47580B70] [Originator@6876 sub=Solo.Vmomi opID=9c303143 user=root] Result:

--> (vmodl.fault.RequestCanceled) {

-->    faultCause = (vmodl.MethodFault) null,

-->    msg = ""

--> }

2016-08-21T09:53:48.317Z error hostd[FFCC1B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x46891fbc, h:31, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

2016-08-21T09:54:14.351Z error hostd[46940B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: <io_obj p:0x47218c74, h:-1, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>, error: N7Vmacore16TimeoutExceptionE(Operation timed out)

2016-08-21T09:54:24.177Z info hostd[469C2B70] [Originator@6876 sub=Hostsvc.VsanSystemProvider opID=9c303204] Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo) {

-->    accessGenNo = 0,

--> }

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

2016-08-21T09:55:55.368Z info hostd[469C2B70] [Originator@6876 sub=Hostsvc.VsanSystemProvider opID=9c30322b] Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo) {

-->    accessGenNo = 0,

--> }

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):

Those messages continue in the log.
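Since the same entries repeat, one way to get an overview is to count which faults hostd throws. The sketch below is a minimal illustration: the helper name count_thrown_faults is hypothetical, and the pattern only assumes the "Throw <fault name>" wording seen in the lines above. The sample lines are copied from this log.

```python
import re
from collections import Counter

# Matches the "Throw <fault name>" wording seen in the hostd lines above.
THROW_RE = re.compile(r"\bThrow (\S+)")


def count_thrown_faults(lines):
    """Count how often each fault type appears in 'Throw ...' log lines."""
    counts = Counter()
    for line in lines:
        match = THROW_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "2016-08-21T09:53:26.742Z info hostd[469C2B70] [Originator@6876 sub=Solo.Vmomi "
    "opID=1b37c58f-6785-11e6-6b-31ee user=:com.vmware.vsan.health] Throw vim.fault.NotAuthenticated",
    "[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):",
    "2016-08-21T09:53:48.317Z info hostd[47580B70] [Originator@6876 sub=Solo.Vmomi "
    "opID=9c303143 user=root] Throw vmodl.fault.RequestCanceled",
]

fault_counts = count_thrown_faults(SAMPLE)
for fault, count in fault_counts.most_common():
    print(count, fault)
```

To run it against a real file, pass an open file object instead of SAMPLE; the function only iterates over lines.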

I am quite lost here and not sure what to check next, so any pointers at all would be greatly appreciated.

TRottig
Enthusiast

Well, I finally managed to find the log file containing the 'ascii' error ...

It looks like there is a problem with localization ...

vmware-vsan-health-service.log:

2016-08-23T19:48:05.96Z DEBUG vsan-health[Thread-19] [VsanHealthServer::do_GET] In do_GET: ('127.0.0.1', 53153)

2016-08-23T19:48:05.96Z WARNING vsan-health[Thread-19] [VsanHealthServer::do_GET] do_GET: isStringResponse = True

2016-08-23T19:48:05.96Z INFO vsan-health[Thread-19] [VsanHealthServer::log_message] ('127.0.0.1', 53153) - - "GET /vsanHealth/health HTTP/1.1" 200 -

2016-08-23T19:48:05.98Z DEBUG vsan-health[Thread-19] [VsanHealthServer::do_GET] Done do_Get: ('127.0.0.1', 53153) (took 0.0)

2016-08-23T19:48:05.312Z DEBUG vsan-health[Thread-53] [VsanHealthServer::do_POST] In do_POST: ('127.0.0.1', 64474)

2016-08-23T19:48:05.312Z INFO vsan-health[Thread-53] [VsanHealthServer::InvokeHandler] request = <__main__._MessageBodyReader instance at 0x000000000635F3C8>

2016-08-23T19:48:05.313Z DEBUG vsan-health[Thread-53] [VsanHealthServer::DebugLogFilteredXML] <?xml version='1.0' encoding='UTF-8'?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Header><Cookie xsi:type="string">"593016c5eed3def9754774eab203c4b9558f150c"</Cookie></soapenv:Header><soapenv:Body><VsanQueryVcClusterHealthSummary xmlns="urn:internalvim25" xmlns:internalvim25="urn:internalvim25"><_this type="VsanVcClusterHealthSystem">vsan-cluster-health-system</_this><cluster type="ClusterComputeResource" serverGuid="f3b10871-a24e-44f6-9994-e20268e217c3">domain-c79</cluster><includeObjUuids>false</includeObjUuids><fields>objectHealth</fields><fetchFromCache>true</fetchFromCache></VsanQueryVcClusterHealthSummary></soapenv:Body></soapenv:Envelope>

2016-08-23T19:48:05.349Z WARNING vsan-health[8231f21e-696a-11e6] [VsanVcExtension::CheckCallerPriviledges] Privs: {'VirtualMachine.Inventory.Create': True, 'System.View': True, 'System.Read': True, 'VirtualMachine.Inventory.Delete': True}

2016-08-23T19:48:05.381Z WARNING vsan-health[8231f21e-696a-11e6] [VsanPyVmomiProfiler::InvokeAccessor] Invoke: mo=ServiceInstance, info=content

2016-08-23T19:48:05.391Z WARNING vsan-health[8231f21e-696a-11e6] [VsanPyVmomiProfiler::InvokeAccessor] Invoke: mo=host-21, info=configManager

2016-08-23T19:48:05.392Z ERROR vsan-health[8231f21e-696a-11e6] [VsanVcStretchedClusterSystemImpl::GetWitnessHosts] Failed to get witness host info for a stretched cluster: (vmodl.fault.ManagedObjectNotFound) {

   dynamicType = <unset>,

   dynamicProperty = (vmodl.DynamicProperty) [],

   msg = u'Das Objekt wurde bereits gel\xf6scht oder noch nicht vollst\xe4ndig erstellt',

   faultCause = <unset>,

   faultMessage = (vmodl.LocalizableMessage) [],

   obj = 'vim.HostSystem:host-21'

}

2016-08-23T19:48:05.392Z ERROR vsan-health[8231f21e-696a-11e6] [VsanVcStretchedClusterSystemImpl::GetWitnessHosts] (vmodl.fault.ManagedObjectNotFound) {

   dynamicType = <unset>,

   dynamicProperty = (vmodl.DynamicProperty) [],

   msg = u'Das Objekt wurde bereits gel\xf6scht oder noch nicht vollst\xe4ndig erstellt',

   faultCause = <unset>,

   faultMessage = (vmodl.LocalizableMessage) [],

   obj = 'vim.HostSystem:host-21'

}

Traceback (most recent call last):

  File "C:\Program Files\VMware\vCenter Server\vsan-health\pyMoVsan\VsanVcStretchedClusterSystemImpl.py", line 1398, in GetWitnessHosts

    vsanHostConfig = hostMo.configManager.vsanSystem.config

  File "C:\Program Files\VMware\vCenter Server\python-modules\pyVmomi\VmomiSupport.py", line 537, in __call__

    return self.f(*args, **kwargs)

  File "C:\Program Files\VMware\vCenter Server\python-modules\pyVmomi\VmomiSupport.py", line 360, in _InvokeAccessor

    return self._stub.InvokeAccessor(self, info)

  File "C:\Program Files\VMware\vCenter Server\vsan-health\pyMoVsan\VsanPyVmomiProfiler.py", line 105, in InvokeAccessor

    out = self._stub.InvokeAccessor(mo, info)

  File "C:\Program Files\VMware\vCenter Server\python-modules\pyVmomi\StubAdapterAccessorImpl.py", line 24, in InvokeAccessor

    return self.InvokeMethod(mo, info, (prop,))

  File "C:\Program Files\VMware\vCenter Server\python-modules\pyVmomi\SoapAdapter.py", line 1273, in InvokeMethod

    raise obj # pylint: disable-msg=E0702

vmodl.fault.ManagedObjectNotFound: (vmodl.fault.ManagedObjectNotFound) {

   dynamicType = <unset>,

   dynamicProperty = (vmodl.DynamicProperty) [],

   msg = u'Das Objekt wurde bereits gel\xf6scht oder noch nicht vollst\xe4ndig erstellt',

   faultCause = <unset>,

   faultMessage = (vmodl.LocalizableMessage) [],

   obj = 'vim.HostSystem:host-21'

}

2016-08-23T19:48:05.392Z ERROR vsan-health[Thread-53] [SoapHandler::_HandleRequest] ascii

2016-08-23T19:48:05.392Z ERROR vsan-health[Thread-53] [SoapHandler::_HandleRequest] Traceback (most recent call last):

  File "C:\Program Files\VMware\vCenter Server\vpxd\pyJack\SoapHandler.py", line 1289, in _HandleRequest

    return self._InvokeMethod(msg)

  File "C:\Program Files\VMware\vCenter Server\vpxd\pyJack\SoapHandler.py", line 1367, in _InvokeMethod

    message = ExceptionMsg(err)

  File "C:\Program Files\VMware\vCenter Server\vpxd\pyJack\SoapHandler.py", line 121, in ExceptionMsg

    return str(msg)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 28: ordinal not in range(128)

2016-08-23T19:48:05.394Z DEBUG vsan-health[Thread-53] [VsanHealthServer::DebugLogFilteredXML] <?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<soapenv:Body><soapenv:Fault><faultcode>ServerFaultCode</faultcode><faultstring>ascii</faultstring><detail><SystemErrorFault xsi:type="SystemError"><reason xmlns="urn:vim25">Runtime fault</reason></SystemErrorFault></detail></soapenv:Fault></soapenv:Body></soapenv:Envelope>
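The last traceback can be reproduced outside vCenter. The service evidently runs Python 2 (note the u'...' literals), where str() on a unicode object implicitly encodes with the ASCII codec. The sketch below uses Python 3, where the same step has to be written out as .encode("ascii"). The helper names are made up for illustration, and the two alternatives are only guesses at how such a message could be handled safely, not what SoapHandler.py actually does.

```python
# The localized fault message from the log above.
msg = u"Das Objekt wurde bereits gel\xf6scht oder noch nicht vollst\xe4ndig erstellt"


def to_ascii_strict(text):
    """Mimic Python 2's str(unicode): encode with the strict ASCII codec."""
    return text.encode("ascii")


def to_utf8(text):
    """Encode as UTF-8, which can represent any character in the message."""
    return text.encode("utf-8")


def to_ascii_escaped(text):
    """Stay within ASCII, but escape non-ASCII characters instead of failing."""
    return text.encode("ascii", "backslashreplace")


try:
    to_ascii_strict(msg)
    failure = None
except UnicodeEncodeError as err:
    failure = err

print("strict ASCII failed:", failure)
print("UTF-8 bytes:", to_utf8(msg))
print("escaped ASCII:", to_ascii_escaped(msg))
```

The strict variant fails on the "ö" at position 28, the same position reported in the traceback. That suggests the "ascii" fault comes from formatting the German error message, not from the ManagedObjectNotFound fault itself.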

TRottig
Enthusiast

OK,

since I didn't have easy access to the Microsoft language pack to change the server's language to English, I instead deployed a new vCenter instance, moved the relevant nodes over and tried again ... success.

I am not sure whether the language issue was cause or effect, or whether the changed language or the new vCenter instance was the ultimate fix, but the language issue sure looks like a bug.

Unfortunately I don't have a support contract to raise it, so if anybody is in the mood ...

Thanks.
