insearchof
Expert

VSAN Datastore ZERO


VMware 6.5

I just installed SSD drives on my ESXi hosts and configured vSAN.

After claiming the drives on four hosts,

the vsanDatastore was created on all the hosts but has zero space available.

Any reason why?

Thank you

Tom

26 Replies
TheBobkin
VMware Employee

Hello Tom,

By claimed SSDs do you mean that you have configured vSAN Disk-Groups?

If these have been created, then other possible causes are hosts in Maintenance Mode or the cluster being network-partitioned - the vSAN Health check should tell you more:

Cluster > Monitor > vSAN > Health

Bob

insearchof
Expert

vsanhealth.PNG

Hi

I did not create a disk group, although I see Group 1 on the display.

The error above is not helpful - what is a "cluster error"?

vsancap.PNG

depping
Leadership

Check the physical disks section to see if any disks were claimed, because it seems something has failed. I would also recommend using the H5 (HTML5) interface if you have vSphere 6.7.

insearchof
Expert

Hello

I am running VMware 6.5, not 6.7.

I believe all my claimed disks are H5 - see images.

vsandisk.PNG

vsandiskhd.PNG

vsandcerror.PNG

The error on the Data Center reflects that I do not have any capacity.

The Disk Format Version column shows 5 - that's the H5 you mentioned, correct?

If the disks are claimed, which I did manually, I do not understand why I do not have any capacity.

Is there any other way to check from the command line?

Thank you.

Tom

matze007
Contributor

Did you check the Firmware Version?

You can look here in the compatibility list:

VMware Compatibility Guide - vsan

TheBobkin
VMware Employee

Check whether the nodes have a non-zero vSAN decom state:

# cmmds-tool find -t NODE_DECOM_STATE

Per your other Communities post (assuming it is the same cluster and, if so, no idea why you started another thread) you are losing packets, which strongly indicates you have network issues and/or the network is misconfigured - if the cluster is partitioned or flapping, then this would explain 0 bytes of storage.

As an aside, "vmkping -I VMK1 10.10.10.10 -s 9000 -c 100" as MJMSRI advised is not a very good test, because a) you have not indicated whether you are using jumbo frames (and thus would even need to test a large MTU), b) 9000-byte test pings WILL actually pass through a 1500 MTU network unless you set the 'don't fragment' flag (-d), and c) in a correctly configured VMware environment using jumbo frames, 9000 won't actually pass, as we have our own additional packet headers (28 bytes). The correct way to check it (if you DO have jumbo frames enabled) would be:

# vmkping -I vmkX -s 8972 -d <DestIP>
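As a quick sanity check of that 8972 figure, the largest ICMP payload that fits in a given MTU is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header (the 28 bytes of headers mentioned above) - a minimal sketch:

```shell
# Largest vmkping payload (-s value) that fits in a 9000-byte MTU:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
MTU=9000
PAYLOAD=$((MTU - 20 - 8))
echo "$PAYLOAD"   # 8972
```

The same arithmetic gives 1472 as the largest payload for a standard 1500 MTU.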

Bob

MJMSRI
Enthusiast

Not a very good test? But then you advised the same test and simply changed the command slightly from 9000 to 8972, and removed -c, which sets the count so the command runs for a set number of pings.

TheBobkin
VMware Employee

"simply changed the command slightly from 9000 to 8972"

Yes, because -s 9000 -d won't pass on a functional jumbo-frame-enabled network (unless the MTU has been raised to something higher than necessary on the switch) - test it if you do not believe me.

And without -d it doesn't validate whether a full jumbo frame can pass (which is what I assume you were advising to test, as otherwise I don't know why you would specify a size value at all).

Bob

insearchof
Expert

To All

This host has the error warning below, and I am wondering if this is the cause.

Also, do I need to open the vSAN rule sets in the hosts' firewall settings?

I have another issue open with this error, but thought it might be relevant.

vsanissue.PNG

I ran this on one host that seems ok and the one in the picture above

[root@TGCSESXI-9:~] cmmds-tool find -t NODE_DECOM_STATE

owner=598cfc1b-c9e1-d832-d6a2-f04da207add3(Health: Healthy) uuid=598cfc1b-c9e1-d832-d6a2-f04da207add3 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

owner=5c69a839-08c6-a636-c148-782bcb054c38(Health: Healthy) uuid=5c69a839-08c6-a636-c148-782bcb054c38 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

owner=5a598968-4aa6-8b4a-0925-782bcb1399f7(Health: Healthy) uuid=5a598968-4aa6-8b4a-0925-782bcb1399f7 type=NODE_DECOM_STATE rev=5 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

owner=5710250b-cfcf-67fc-9dc0-a4badb376d3f(Health: Healthy) uuid=5710250b-cfcf-67fc-9dc0-a4badb376d3f type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)

Host with Issue

[root@TGCSESXI-4:~] cmmds-tool find -t NODE_DECOM_STATE

[root@TGCSESXI-4:~]

Thank you

insearchof
Expert

[root@TGCSESXI-4:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)

--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-4:~]

???

I tried on another host with the same results.

[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)

--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-9:~]

daphnissov
Immortal

These messages indicate you don't have jumbo frames configured properly on your external switching infrastructure.

insearchof
Expert

How do I check that?

Where do I do that also?

On the network adapters on the ESXi host?

daphnissov
Immortal

You need to check everything end-to-end including the vswitch where these vmkernel ports are connected and your physical switches.
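For reference, on the ESXi side the configured MTU at each layer can be listed from the host shell with standard ESXi commands (a sketch; the physical-switch side has to be checked in your switch vendor's own interface):

```shell
# List the MTU configured at each layer on an ESXi host:
esxcfg-nics -l      # MTU column per physical NIC
esxcfg-vmknic -l    # MTU per VMkernel interface
esxcfg-vswitch -l   # MTU per vSwitch
# All three layers, plus every physical switch port in the path,
# must accept jumbo frames for a 9000-byte MTU to work end-to-end.
```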

insearchof
Expert

Hello

I changed the MTU to 9000 on all hosts' vSwitches and VMkernel adapters.

I saw "Update vSAN configuration" in Recent Tasks.

And then the vsanDatastore had 6.2 GB.

Then I thought, wow, it is working.

After I updated the last host, the vsanDatastore went back to ZERO.

What else can we check?

Something is wrong here.

daphnissov
Immortal

Does your vmkping work now with the larger packet size? If not, you have not fixed the problem completely.

insearchof
Expert

[root@TGCSESXI-4:~] esxcfg-nics -l
Name    PCI          Driver      Link Speed      Duplex MAC Address       MTU    Description
vmnic0  0000:02:00.0 bnx2        Up   1000Mbps   Full   d4:ae:52:77:6c:b5 9000   QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic1  0000:02:00.1 bnx2        Up   1000Mbps   Full   d4:ae:52:77:6c:b6 9000   QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic2  0000:04:00.0 bnx2        Up   1000Mbps   Full   00:26:55:87:67:dc 9000   QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
vmnic3  0000:04:00.1 bnx2        Up   1000Mbps   Full   00:26:55:87:67:de 9000   QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
[root@TGCSESXI-4:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
vmk0       Management Network                      IPv4      10.2.8.75                               255.255.252.0   10.2.11.255     d4:ae:52:77:6c:b5 9000    65535     true    STATIC              defaultTcpipStack
vmk0       Management Network                      IPv6      fe80::d6ae:52ff:fe77:6cb5               64                              d4:ae:52:77:6c:b5 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
vmk1       VMkernel                                IPv4      10.2.8.73                               255.255.252.0   10.2.11.255     00:50:56:6a:33:1a 9000    65535     true    STATIC              defaultTcpipStack
vmk1       VMkernel                                IPv6      fe80::250:56ff:fe6a:331a                64                              00:50:56:6a:33:1a 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
vmk2       Vmotion                                 IPv4      10.2.8.92                               255.255.252.0   10.2.11.255     00:50:56:63:b4:8d 9000    65535     true    STATIC              defaultTcpipStack
vmk2       Vmotion                                 IPv6      fe80::250:56ff:fe63:b48d                64                              00:50:56:63:b4:8d 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
vmk3       Black Armor                             IPv4      10.2.8.84                               255.255.252.0   10.2.11.255     00:50:56:66:bb:ab 9000    65535     true    STATIC              defaultTcpipStack
vmk3       Black Armor                             IPv6      fe80::250:56ff:fe66:bbab                64                              00:50:56:66:bb:ab 9000    65535     true    STATIC, PREFERRED   defaultTcpipStack
[root@TGCSESXI-4:~] esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         1792        12          128               9000    vmnic0,vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network            0        3           vmnic0,vmnic1
  Black Armor           0        1           vmnic0,vmnic1
  Vmotion               0        1           vmnic0,vmnic1
  VMkernel              0        1           vmnic0,vmnic1
  Management Network    0        1           vmnic0,vmnic1

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         1792        4           128               9000    vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernal-iSCSI-1      0        1           vmnic2

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch2         1792        3           128               9000    vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernal-iSCSI-2      0        0           vmnic3

[root@TGCSESXI-4:~]

I just changed all vSwitches and VMkernel adapters to MTU 9000, as you can see.

The ping

[root@TGCSESXI-9:~] vmkping -I vmk2 -s 1472 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 1472 data bytes
1480 bytes from 10.2.8.62: icmp_seq=0 ttl=64 time=0.312 ms
1480 bytes from 10.2.8.62: icmp_seq=1 ttl=64 time=0.249 ms
1480 bytes from 10.2.8.62: icmp_seq=2 ttl=64 time=0.344 ms

--- 10.2.8.62 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.249/0.302/0.344 ms
[root@TGCSESXI-9:~]

Host TGCSESXI-4 still shows the error after changing the MTU to 9000, and I also changed the MTU to 9000 on all hosts in the cluster.

daphnissov
Immortal

You're still pinging with a small packet size, so you can't evaluate your MTU change. You need to raise the transmit size to something over 8,000.

insearchof
Expert

[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8000 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8000 data bytes

--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

Is that what you mean?

This did not produce any replies.

daphnissov
Immortal

Yes, so you still don't have the MTU set properly, most likely on your external networking infrastructure (which is outside the scope of vSphere and vSAN). So I'll say again: you need to check and update the MTU in your switching/routing infrastructure in order to have this work end-to-end.
