2-Node ROBO Cluster does not deploy correctly

  • 1.  2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 06:35 PM

    We are trying to roll out a 2-node VSAN cluster at a remote office. All networking is in place, and we can ping between VSAN nodes 1 and 2 and the VSAN witness node at our datacenter (over the VSAN networks defined at each site). The static routes have been entered correctly. When we enable VSAN, the step "Convert to stretched cluster" fails with the error "Failed to add witness host to a stretched cluster." VSAN then continues to deploy and finishes, but no witness appliance is present, so we cannot write to the VSAN datastore. What are we missing?



  • 2.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 07:28 PM

    Hello,

    Have you tested ping to and from the specific interface used for vSAN traffic, not just a general ping?

    Check whether any hosts are forming a cluster:

    From one data-node:

    # esxcli vsan cluster new

    # esxcli vsan cluster get

    From the other data-node:

    # esxcli vsan cluster join -u <Sub-Cluster UUID of Cluster>

    See if you can add the witness manually via the CLI in case there is an issue with the client/vCenter:

    Get the Sub-Cluster UUID from one of the other nodes using

    # esxcli vsan cluster get

    Then from Witness Appliance:

    # esxcli vsan cluster join -u <Sub-Cluster UUID of Cluster> -t -p <Preferred Fault Domain>

    Bob

    -o- If you found this comment useful please click the 'Helpful' button and/or select as 'Answer' if you consider it so, please ask follow-up questions if you have any -o-



  • 3.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 07:33 PM

    Yes, we can ping from/to each VSAN interface on both data nodes and the witness node.



  • 4.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 08:35 PM

    All of these commands ran without issue. The GUI indicates VSAN is disabled on the cluster.



  • 5.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 08:57 PM

    Hello,

    "All of these commands were run without issue. The GUI indicates VSAN is disabled on the cluster"

    Please ignore the GUI for a moment.

    After running the commands to add the Witness Appliance manually, what does the # esxcli vsan cluster get output show?

    Has it been added to the cluster as a witness, and does the cluster show 3 members?

    If not, check whether the two data-nodes see each other in the cluster using the same command.

    Bob




  • 6.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 09:10 PM

    Witness output:

    Cluster Information
       Enabled: true
       Current Local Time: 2017-05-06T21:09:20Z
       Local Node UUID: 590e1b0c-dd93-bc9b-193d-0050568c1a27
       Local Node Type: WITNESS
       Local Node State: STANDALONE
       Local Node Health State: HEALTHY
       Sub-Cluster Master UUID:
       Sub-Cluster Backup UUID:
       Sub-Cluster UUID: 522a5bb4-18a7-d56a-e361-71e68215e7ac
       Sub-Cluster Membership Entry Revision: 0
       Sub-Cluster Member Count: 1
       Sub-Cluster Member UUIDs: 590e1b0c-dd93-bc9b-193d-0050568c1a27
       Sub-Cluster Membership UUID: 00000000-0000-0000-0000-000000000000

    Data Node 1 Output:

    Cluster Information
       Enabled: true
       Current Local Time: 2017-05-06T21:09:51Z
       Local Node UUID: 58ecd598-9868-2fa2-5353-1402ec9233d8
       Local Node Type: NORMAL
       Local Node State: MASTER
       Local Node Health State: HEALTHY
       Sub-Cluster Master UUID: 58ecd598-9868-2fa2-5353-1402ec9233d8
       Sub-Cluster Backup UUID: 58ecd62f-08b0-3128-3533-1402ec923318
       Sub-Cluster UUID: 522a5bb4-18a7-d56a-e361-71e68215e7ac
       Sub-Cluster Membership Entry Revision: 1
       Sub-Cluster Member Count: 2
       Sub-Cluster Member UUIDs: 58ecd598-9868-2fa2-5353-1402ec9233d8, 58ecd62f-08b0-3128-3533-1402ec923318
       Sub-Cluster Membership UUID: 9a320e59-5282-8a7c-52bb-1402ec9233d8

    Data Node 2 Output:

    Cluster Information
       Enabled: true
       Current Local Time: 2017-05-06T21:10:14Z
       Local Node UUID: 58ecd62f-08b0-3128-3533-1402ec923318
       Local Node Type: NORMAL
       Local Node State: BACKUP
       Local Node Health State: HEALTHY
       Sub-Cluster Master UUID: 58ecd598-9868-2fa2-5353-1402ec9233d8
       Sub-Cluster Backup UUID: 58ecd62f-08b0-3128-3533-1402ec923318
       Sub-Cluster UUID: 522a5bb4-18a7-d56a-e361-71e68215e7ac
       Sub-Cluster Membership Entry Revision: 1
       Sub-Cluster Member Count: 2
       Sub-Cluster Member UUIDs: 58ecd598-9868-2fa2-5353-1402ec9233d8, 58ecd62f-08b0-3128-3533-1402ec923318
       Sub-Cluster Membership UUID: 9a320e59-5282-8a7c-52bb-1402ec9233d8
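
A rough way to read these outputs side by side: all three hosts report the same Sub-Cluster UUID, but the witness counts 1 member while the data nodes count 2. The sketch below automates that comparison on saved `esxcli vsan cluster get` output. The helper names `vsan_field` and `check_partition` are hypothetical, and treating "same Sub-Cluster UUID, different member count" as a partition is a heuristic assumed here, not an official health check.

```shell
#!/bin/sh
# Sketch only: compare saved "esxcli vsan cluster get" output from two hosts.
# vsan_field and check_partition are hypothetical helper names.

# Print the value of one "Name: value" field read from stdin.
vsan_field() {
    awk -F': ' -v key="$1" '
        { sub(/^[ \t]+/, "") }            # strip leading indentation
        $1 == key { print $2; exit }
    '
}

# $1 and $2 hold the saved output of two hosts.
check_partition() {
    uuid_a=$(printf '%s\n' "$1" | vsan_field "Sub-Cluster UUID")
    uuid_b=$(printf '%s\n' "$2" | vsan_field "Sub-Cluster UUID")
    count_a=$(printf '%s\n' "$1" | vsan_field "Sub-Cluster Member Count")
    count_b=$(printf '%s\n' "$2" | vsan_field "Sub-Cluster Member Count")

    if [ "$uuid_a" != "$uuid_b" ]; then
        echo "different-cluster"      # hosts are configured for different clusters
    elif [ "$count_a" != "$count_b" ]; then
        echo "partitioned"            # same cluster, but they disagree on membership
    else
        echo "consistent"
    fi
}

# Values copied from the witness and Data Node 1 output in this thread.
witness_out='   Sub-Cluster UUID: 522a5bb4-18a7-d56a-e361-71e68215e7ac
   Sub-Cluster Member Count: 1'
data1_out='   Sub-Cluster UUID: 522a5bb4-18a7-d56a-e361-71e68215e7ac
   Sub-Cluster Member Count: 2'

check_partition "$witness_out" "$data1_out"   # prints: partitioned
```

Run against the two data nodes instead, the same check prints "consistent", which matches their MASTER/BACKUP pairing.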



  • 7.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 11:04 PM

    Okay, this looks like a standard network partition.

    The Witness Appliance cannot join the cluster because of this.

    Regarding the Witness Appliance networking:

    Is it in the same subnet and VLAN as the data nodes?

    Are you using Jumbo frames (9000 MTU) anywhere here?

    Are there any configurations on the switch(es) that might be blocking traffic to the Witness Appliance?

    Test several packet sizes to check this, not just 8972 as per the KB; also check 1500 in case a header is being added at some point along the path:

    https://kb.vmware.com/kb/1003728

    Bob




  • 8.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 06, 2017 11:31 PM

    OK.

    The Witness VSAN network is on a different VLAN and subnet. Traffic between the VSAN witness and the data nodes is routed at Layer 3.

    I'll check on the Jumbo Frames.



  • 9.  RE: 2-Node ROBO Cluster does not deploy correctly

    Posted May 07, 2017 12:37 AM

    No Jumbo frames. vmkping -d -s 1500 xxx.xxx.xxx.xxx fails both ways with sendto() failed (Message too long). Would this be expected behavior?



  • 10.  RE: 2-Node ROBO Cluster does not deploy correctly

    Broadcom Employee
    Posted May 08, 2017 11:23 AM

    Are you sure the connection between the data nodes and the witness allows packets of 1500 bytes? I've seen this be the problem with other customers as well. Try a ping with 1472. It could be that you're using IPsec, for instance, in which case the packet size needs to be smaller. You will need to configure the VMkernel interfaces with the correct MTU.
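
To put numbers on the 1500 versus 1472 point: `-s` sets the ICMP payload size, and a 20-byte IPv4 header plus an 8-byte ICMP header are added on top. With `-d` (don't fragment), `-s 1500` therefore needs 1528 bytes and fails locally on a 1500 MTU interface with "Message too long", which is the expected result. A minimal sketch of the arithmetic, assuming IPv4 without options; `icmp_payload` is a hypothetical helper name, and `vmk1` and the address in the commented line are placeholders:

```shell
#!/bin/sh
# Sketch only: largest ICMP payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
icmp_payload() {
    mtu=$1
    echo $((mtu - 20 - 8))
}

icmp_payload 1500   # prints 1472 (standard MTU)
icmp_payload 9000   # prints 8972 (jumbo frames, the value used in the KB)

# Hypothetical usage on an ESXi host; interface and address are placeholders:
# vmkping -I vmk1 -d -s "$(icmp_payload 1500)" <witness vSAN IP>
```

If 1472 also fails across the WAN link, stepping the size down until the ping succeeds gives a rough idea of how much overhead (for example a tunnel header) the path is adding.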



  • 11.  RE: 2-Node ROBO Cluster does not deploy correctly
    Best Answer

    Posted May 08, 2017 12:24 PM

    OK, we got the issue resolved.

    As I mentioned, the "Convert to stretched cluster" step failed during the creation process. When I tried to manually add a witness afterward, it stated that the witness was already used by another cluster. I ran the esxcli vsan cluster leave command on the witness and then tried to add the witness again manually. It worked, and I finally saw the witness's disk group and could write to the VSAN datastore.
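
For anyone hitting the same "already used" message, a quick pre-check on the witness may help. This is only a sketch, and it assumes that a witness which is supposed to be unconfigured but still reports `Enabled: true` in `esxcli vsan cluster get` is carrying leftover membership. `witness_needs_leave` is a hypothetical helper name, and the script prints the leave command as a dry run rather than executing it.

```shell
#!/bin/sh
# Sketch only: decide whether a witness still carries cluster membership.
# witness_needs_leave is a hypothetical helper name.

# stdin: saved "esxcli vsan cluster get" output from the witness.
# Exit status 0 if the output reports "Enabled: true", 1 otherwise.
witness_needs_leave() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }                  # strip leading indentation
        $1 == "Enabled" { found = ($2 == "true") }
        END { exit(found ? 0 : 1) }
    '
}

# Fields copied from the witness output earlier in this thread.
stale='   Enabled: true
   Local Node Type: WITNESS
   Local Node State: STANDALONE'

if printf '%s\n' "$stale" | witness_needs_leave; then
    # Dry run: print the command from the post instead of running it.
    echo "esxcli vsan cluster leave"
fi
```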

    The big question is why the VSAN creation wizard fails on the "Convert to stretched cluster" step. Has this been seen before?

    Thanks for the responses.