vSAN1


VSAN Datastore ZERO

  • 1.  VSAN Datastore ZERO

    Posted Jun 15, 2019 03:03 AM

    VMware 6.5

    Just installed SSD drives on my ESXi hosts and configured vSAN.

    After claiming the drives on 4 hosts, the vsanDatastore was created on all the hosts but has zero space available.

    Any reason why?

    Thank you

    Tom



  • 2.  RE: VSAN Datastore ZERO

    Posted Jun 15, 2019 07:47 AM

    Hello Tom,

    By claimed SSDs do you mean that you have configured vSAN Disk-Groups?

    If these have been created then other causes can be hosts in Maintenance Mode or the cluster being network-partitioned - vSAN Health check should inform further:

    Cluster > Monitor > vSAN > Health

    Bob



  • 3.  RE: VSAN Datastore ZERO

    Posted Jun 15, 2019 01:05 PM

    Hi

    I did not create a disk group, although I see Group 1 on the display.

    The error above is not helpful. What is the cluster error?



  • 4.  RE: VSAN Datastore ZERO

    Broadcom Employee
    Posted Jun 17, 2019 07:33 AM

    Check the physical disks section to see if any disks were claimed, because it seems something has failed. I would also recommend using the H5 interface if you have vSphere 6.7.



  • 5.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 10:45 AM

    Hello

    I am running VMware 6.5, not 6.7.

    I believe all my claimed disks are H5 (see images).

    The error on the Data Center reflects that I do not have any capacity.

    The Disk Format Version column shows 5. That's the H5 you mentioned, correct?

    If the disks are claimed, which I did manually, I do not understand why I do not have any capacity.

    Is there any other way to check from the command line?

    Thank you.

    Tom



  • 6.  RE: VSAN Datastore ZERO
    Best Answer

    Posted Jun 17, 2019 01:03 PM

    Check whether the nodes have a non-zero vSAN decom state:

    # cmmds-tool find -t NODE_DECOM_STATE

    Per your other Communities post (assuming it is the same cluster, and if so, no idea why you started another thread), you are losing packets, which strongly indicates you are having network issues and/or the network is misconfigured. If the cluster is partitioned or flapping then this would explain 0 bytes storage.

    As an aside, "vmkping -I VMK1 10.10.10.10 -s 9000 -c 100" as MJMSRI advised is not a very good test because a) you have not indicated whether you are using jumbo frames (and thus whether you even need to check a large MTU), b) 9000-byte test pings WILL actually pass through a 1500 MTU network unless you set the 'don't fragment' flag (-d), and c) in a correctly configured environment using jumbo frames, -s 9000 with -d won't actually pass, as the IP and ICMP headers add another 28 bytes on top of the payload. The correct way to check it (if you DO have jumbo frames enabled) would be:

    # vmkping -I vmkX -s 8972 -d <DestIP>
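
    The 8972 figure comes from subtracting the 20-byte IPv4 header and the 8-byte ICMP header from the interface MTU. A minimal shell sketch of that arithmetic, assuming IPv4 with no IP options (the function name icmp_payload is made up for illustration):

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet.
# Assumes a 20-byte IPv4 header (no options) plus an 8-byte ICMP header.
icmp_payload() {
    mtu="$1"
    echo $(( mtu - 20 - 8 ))
}

icmp_payload 9000   # jumbo frames: prints 8972
icmp_payload 1500   # standard Ethernet: prints 1472
```

    The same arithmetic gives the -s 1472 value to use when everything is left at the default 1500 MTU.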

    Bob



  • 7.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 01:24 PM

    Not a very good test? But then you advised to use the same test and simply changed the command slightly from 9000 to 8972 and removed -c which is for count so the command runs for a set amount.



  • 8.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 01:47 PM

    "simply changed the command slightly from 9000 to 8972"

    Yes, because -s 9000 -d won't pass on a functional jumbo-frame-enabled network (unless the MTU has been raised to something higher than necessary on the switch) - test it if you do not believe me.

    And without -d it doesn't validate whether full Jumbo-frame can pass (which is what I assume you were advising to test as otherwise I don't know why you would specify a size value at all).

    Bob



  • 9.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 04:31 PM

    [root@TGCSESXI-4:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
    PING 10.2.8.62 (10.2.8.62): 8972 data bytes
    sendto() failed (Message too long)
    sendto() failed (Message too long)
    sendto() failed (Message too long)

    --- 10.2.8.62 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss
    [root@TGCSESXI-4:~]

    ???

    I tried on another host with same results

    [root@TGCSESXI-9:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
    PING 10.2.8.62 (10.2.8.62): 8972 data bytes
    sendto() failed (Message too long)
    sendto() failed (Message too long)
    sendto() failed (Message too long)

    --- 10.2.8.62 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss
    [root@TGCSESXI-9:~]



  • 10.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 04:34 PM

    These messages indicate you don't have jumbo frames configured properly on your external switching infrastructure.



  • 11.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 04:06 PM

    How do I check that?

    Where do I do that also?

    On the Network adapters on the ESXI host?



  • 12.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 04:09 PM

    You need to check everything end-to-end including the vswitch where these vmkernel ports are connected and your physical switches.



  • 13.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:13 PM

    Hello

    I changed the MTU to 9000 on the vSwitches and VMkernel adapters of all hosts.

    I saw Update vSAN configuration in recent tasks

    And then the vsandatastore had 6.2 GB

    Then I thought wow it is working

    After I updated the last host, the vsanDatastore went back to ZERO.

    What else can we check?

    Something is wrong here?



  • 14.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:15 PM

    Does your vmkping work now with the larger packet size? If no, you have not fixed the problem completely.



  • 15.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:20 PM

    [root@TGCSESXI-4:~] esxcfg-nics -l
    Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description
    vmnic0  0000:02:00.0 bnx2        Up   1000Mbps   Full   d4:ae:52:77:6c:b5 9000   QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
    vmnic1  0000:02:00.1 bnx2        Up   1000Mbps   Full   d4:ae:52:77:6c:b6 9000   QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
    vmnic2  0000:04:00.0 bnx2        Up   1000Mbps   Full   00:26:55:87:67:dc 9000   QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
    vmnic3  0000:04:00.1 bnx2        Up   1000Mbps   Full   00:26:55:87:67:de 9000   QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
    [root@TGCSESXI-4:~] esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
    vmk0       Management Network                      IPv4      10.2.8.75                               255.255.252.0   10.2.11.255     d4:ae:52:77:6c:b5 9000    65535     true    STATIC              defaultTcpipStack
    vmk0       Management Network                      IPv6      fe80::d6ae:52ff:fe77:6cb5               64                              d4:ae:52:77:6c:b5 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk1       VMkernel                                IPv4      10.2.8.73                               255.255.252.0   10.2.11.255     00:50:56:6a:33:1a 9000    65535     true    STATIC              defaultTcpipStack
    vmk1       VMkernel                                IPv6      fe80::250:56ff:fe6a:331a                64                              00:50:56:6a:33:1a 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk2       Vmotion                                 IPv4      10.2.8.92                               255.255.252.0   10.2.11.255     00:50:56:63:b4:8d 9000    65535     true    STATIC              defaultTcpipStack
    vmk2       Vmotion                                 IPv6      fe80::250:56ff:fe63:b48d                64                              00:50:56:63:b4:8d 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
    vmk3       Black Armor                             IPv4      10.2.8.84                               255.255.252.0   10.2.11.255     00:50:56:66:bb:ab 9000    65535     true    STATIC              defaultTcpipStack
    vmk3       Black Armor                             IPv6      fe80::250:56ff:fe66:bbab                64                              00:50:56:66:bb:ab 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
    [root@TGCSESXI-4:~] esxcfg-vswitch -l
    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
    vSwitch0         1792        12          128               9000    vmnic0,vmnic1

      PortGroup Name        VLAN ID  Used Ports  Uplinks
      VM Network            0        3           vmnic0,vmnic1
      Black Armor           0        1           vmnic0,vmnic1
      Vmotion               0        1           vmnic0,vmnic1
      VMkernel              0        1           vmnic0,vmnic1
      Management Network    0        1           vmnic0,vmnic1

    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
    vSwitch1         1792        4           128               9000    vmnic2

      PortGroup Name        VLAN ID  Used Ports  Uplinks
      VMkernal-iSCSI-1      0        1           vmnic2

    Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
    vSwitch2         1792        3           128               9000    vmnic3

      PortGroup Name        VLAN ID  Used Ports  Uplinks
      VMkernal-iSCSI-2      0        0           vmnic3

    [root@TGCSESXI-4:~]

    I just changed all vSwitches and VMkernel adapters to MTU 9000, as you can see.

    The ping:

    [root@TGCSESXI-9:~] vmkping -I vmk2 -s 1472 -d 10.2.8.62
    PING 10.2.8.62 (10.2.8.62): 1472 data bytes
    1480 bytes from 10.2.8.62: icmp_seq=0 ttl=64 time=0.312 ms
    1480 bytes from 10.2.8.62: icmp_seq=1 ttl=64 time=0.249 ms
    1480 bytes from 10.2.8.62: icmp_seq=2 ttl=64 time=0.344 ms

    --- 10.2.8.62 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.249/0.302/0.344 ms
    [root@TGCSESXI-9:~]

    Host TGCSESXI-4 still shows the error after changing the MTU to 9000, and I also changed the MTU to 9000 on all hosts in the cluster.



  • 16.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:29 PM

    You're still pinging with a small packet size, so you can't evaluate your MTU change. You need to raise the transmit size to something over 8,000.



  • 17.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:48 PM

    [root@TGCSESXI-9:~] vmkping -I vmk2 -s 8000 -d 10.2.8.62
    PING 10.2.8.62 (10.2.8.62): 8000 data bytes

    --- 10.2.8.62 ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

    Is that what you mean?

    this did not produce any results



  • 18.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 06:50 PM

    Yes, so you still don't have MTU set properly, most likely on your external networking infrastructure (which is outside the scope of vSphere and vSAN). So I'll say again:  You need to check and update the MTU in your switching/routing infra in order to have this work end-to-end.



  • 19.  RE: VSAN Datastore ZERO

    Posted Jun 19, 2019 07:18 PM

    insearchof

    Please read and understand what I said:

    "You still haven't indicated if you are even using Jumbo frames (9000 MTU) which I did ask twice, so the above isn't actually an indication of whether you have an issue with MTU mismatch.

    Check what MTU is actually configured on the problem node(s) with:

    # esxcfg-nics -l

    # esxcfg-vmknic -l

    # esxcfg-vswitch -l

    If it is 1500 throughout then you are not using Jumbo frames and should be testing connectivity (e.g. from node TGCSESXI-9) using:

    # vmkping -I vmk2 -s 1472 -d 10.2.8.62

    Bob"

    I did not tell you to make random changes to the MTU in your environment but merely to confirm what MTU you had intended to be configured and was supported by the environment - just an FYI, making the same changes you made in a functional 1500 MTU environment would likely cause a full vSAN network outage.

    This is why daphnissov said you need to check it "end-to-end": he meant you need to confirm (and understand the implications of) the MTU supported and configured in the environment.

    Think of it this way: with MTU set to 9000 on the vmk/vswitch but only 1500 supported/configured on the physical switch(es), large packets are going to be dropped and it won't work in any stable manner. Conversely, if you had 1500 set on the vmk/vswitch but 9000 on the physical switch(es), there would be no issue, as 1500-byte packets fit within that with room to spare. You look to have configured the former here (as you still can't send large packets). If you aren't aware/sure of what the switch settings are, then configure everything as 1500 and test whether the network is stable (e.g. vmkping -I vmkX -s 1472 -d -i 0.1 -c 100 <Dest-IP>).
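
    The mismatch described above can be sketched as a small shell check: the usable MTU of a path is the smallest MTU configured on any hop, and a don't-fragment ping only gets through if the payload plus 28 bytes of IPv4/ICMP headers fits in it. The function names and MTU values below are hypothetical, purely for illustration:

```shell
#!/bin/sh
# Usable MTU of a path = smallest MTU of any hop (vmk, vSwitch, physical switch...).
path_mtu() {
    min="$1"
    for mtu in "$@"; do
        if [ "$mtu" -lt "$min" ]; then
            min="$mtu"
        fi
    done
    echo "$min"
}

# Succeeds if a don't-fragment ping with the given payload fits the path.
# Usage: df_ping_fits <payload> <mtu-hop-1> <mtu-hop-2> ...
df_ping_fits() {
    payload="$1"
    shift
    limit=$(path_mtu "$@")
    # 28 = 20-byte IPv4 header (no options) + 8-byte ICMP header
    [ $(( payload + 28 )) -le "$limit" ]
}

# Hypothetical case 1: vmk 9000, vSwitch 9000, physical switch left at 1500
if df_ping_fits 8972 9000 9000 1500; then echo "8972 fits"; else echo "8972 dropped"; fi

# Hypothetical case 2: vmk 1500, vSwitch 1500, physical switch at 9000
if df_ping_fits 1472 1500 1500 9000; then echo "1472 fits"; else echo "1472 dropped"; fi
```

    The first case reports the large ping as dropped and the second reports the small ping as fitting, which is why leaving everything at 1500 is the safe fallback when the switch settings are unknown.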

    Also, please stop replying to the same answers for the same problem in 2 different threads - this is pointless and only adds to the confusion. I would strongly advise you read the links from daphnissov's signature; they should really help you help yourself (and not just when asking strangers for help on the internet, but also when approaching problems and working out how to scope and progress issues productively).

    Edit: corrected command syntax

    Bob



  • 20.  RE: VSAN Datastore ZERO

    Posted Jun 20, 2019 12:34 AM

    Guys

    This is crazy and upsetting,

    I changed the MTU to 9000 on my Cisco switch where all the ESXi hosts are connected.

    After reloading the switch, to my surprise the vsanDatastore had 6.8 TB of disk space.

    When I came home to check the ESXi hosts and their datastores, it was still there. Great news.

    I went out for dinner, and when I came back and logged on again it was at ZERO again.

    What is up with this?

    very frustrated

    Please help



  • 21.  RE: VSAN Datastore ZERO

    Posted Jun 20, 2019 12:54 AM

    Have you read one word of what TheBobkin​ has told you? You certainly don't seem to have despite his (and my) best efforts. You really need to back-track through this post (and your, for whatever reason, duplicate post) and validate the things that have been asked of you. If you're not going to take the advice and provide the information requested, we cannot help you here. So I'm out of this one...



  • 22.  RE: VSAN Datastore ZERO

    Posted Jun 20, 2019 12:57 AM

    It came back on

    So why would it disappear then come back?



  • 23.  RE: VSAN Datastore ZERO

    Posted Jun 20, 2019 08:22 AM

    So why would it disappear then come back?

    Probably because you have configured something wrong and/or because your hardware is not supported for ESXi 6.5 and vSAN 6.5/6.6... This can lead to unpredictable and weird behavior.

    But the others have already written this in several posts and you are either unwilling to understand or you lack the technical understanding to implement these suggestions. So, how should we help you? You can write as often as you want that it doesn't work, but that way you won't solve the problem and it's not clear if it can be solved at all (because the hardware isn't supported).



  • 24.  RE: VSAN Datastore ZERO

    Posted Jun 20, 2019 09:27 AM

    Looks like it is holding steady now.

    On the cluster I have this message:

    VSAN Health Alarm ESXI-VSAN SERVICE Installation

    I can reset it to green, but is there something I can check?

    Thank you,



  • 25.  RE: VSAN Datastore ZERO

    Posted Jun 22, 2019 03:06 AM

    Guys

    Very puzzled

    I came home after being away for a few days and found the VSAN Data store is back to Zero.

    What else can I check ?



  • 26.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 03:29 PM

    To All

    This host has this error warning, and I am wondering if this is the cause.

    Also, do I need to open the vSAN rule sets in the hosts' firewall settings?

    I have another issue opened with this error but thought it might be relevant.

    I ran this on one host that seems OK and on the one in the picture above.

    [root@TGCSESXI-9:~] cmmds-tool find -t NODE_DECOM_STATE

    owner=598cfc1b-c9e1-d832-d6a2-f04da207add3(Health: Healthy) uuid=598cfc1b-c9e1-d832-d6a2-f04da207add3 type=NODE_DECOM_STATE rev=2 minHostVer=0  [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

    owner=5c69a839-08c6-a636-c148-782bcb054c38(Health: Healthy) uuid=5c69a839-08c6-a636-c148-782bcb054c38 type=NODE_DECOM_STATE rev=2 minHostVer=0  [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

    owner=5a598968-4aa6-8b4a-0925-782bcb1399f7(Health: Healthy) uuid=5a598968-4aa6-8b4a-0925-782bcb1399f7 type=NODE_DECOM_STATE rev=5 minHostVer=0  [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

    owner=5710250b-cfcf-67fc-9dc0-a4badb376d3f(Health: Healthy) uuid=5710250b-cfcf-67fc-9dc0-a4badb376d3f type=NODE_DECOM_STATE rev=2 minHostVer=0  [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

    Host with Issue

    [root@TGCSESXI-4:~] cmmds-tool find -t NODE_DECOM_STATE

    [root@TGCSESXI-4:~]

    Thank you



  • 27.  RE: VSAN Datastore ZERO

    Posted Jun 17, 2019 12:59 PM

    Did you check the Firmware Version?

    You can look here in the compatibility list:

    VMware Compatibility Guide - vsan