VMware 6.5
Just installed SSD drives on my ESXi hosts and configured vSAN.
After claiming the drives on 4 hosts, the vsanDatastore was created on all the hosts but has zero space available.
Any reason why?
Thank you
Tom
Check whether the nodes have a non-zero vSAN decom state:
# cmmds-tool find -t NODE_DECOM_STATE
Per your other Communities post (assuming it is the same cluster and, if so, no idea why you started another thread) you are losing packets, which strongly indicates you have network issues and/or a misconfiguration - if the cluster is partitioned or flapping then this would explain 0 bytes of storage.
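A quick way to confirm whether the cluster is partitioned is to run the following on each host and compare - the Sub-Cluster Member Count should equal the number of hosts (4 in your case):
# esxcli vsan cluster get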
As an aside, "vmkping -I VMK1 10.10.10.10 -s 9000 -c 100" as MJMSRI advised is not a very good test because a) you have not indicated whether you are using jumbo frames (and thus whether you even need to test a large MTU), b) 9000-byte test pings WILL actually pass through a 1500 MTU network unless you set the 'don't fragment' flag (-d), and c) in a correctly configured VMware environment using jumbo frames, 9000 won't actually pass as we have our own additional packet headers (28 bytes). The correct way to check (if you DO have jumbo frames enabled) would be:
# vmkping -I vmkX -s 8972 -d <DestIP>
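For reference, 8972 is simply the 9000-byte MTU minus those 28 bytes of headers: 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes of ICMP payload, the largest -s value that fits in a single jumbo frame.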
Bob
Hello Tom,
By claimed SSDs do you mean that you have configured vSAN Disk-Groups?
If these have been created, then other possible causes are hosts in Maintenance Mode or the cluster being network-partitioned - the vSAN Health check should tell you more:
Cluster > Monitor > vSAN > Health
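If the Health UI is hard to reach, the Maintenance Mode part at least is quick to rule out from each host's shell - this should return 'Disabled' on all hosts:
# esxcli system maintenanceMode get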
Bob
Hi
I did not create a disk group, although I see GROUP 1 on the display.
The error above is not helpful - what is the cluster error?
Check the physical disks section to see if any disks were claimed, because it seems something has failed. I would also recommend using the H5 (HTML5) interface if you have vSphere 6.7.
Hello
I am running VMware 6.5, not 6.7.
I believe all my claimed disks are H5 - see images.
The error on the Data Center reflects that I do not have any capacity.
The Disk Format Version column shows 5 - that is the H5 you mentioned, correct?
If the disks are claimed, which I did manually, I do not understand why I do not have any capacity.
Is there any other way to check via the command line?
Thank you.
Tom
Did you check the firmware version?
You can look it up in the VMware Compatibility Guide.
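Regarding your command-line question: the claimed disks and the datastore capacity can also be checked from the ESXi shell - the filesystem list should show the vsanDatastore and its size:
# esxcli vsan storage list
# esxcli storage filesystem list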
Not a very good test? But then you advised the same test and simply changed the command slightly from 9000 to 8972, and removed -c, which is the count, so the command runs for a set number of pings.
"simply changed the command slightly from 9000 to 8972"
Yes, because -s 9000 -d won't pass on a functional jumbo-frame enabled network (unless the MTU has been raised to something higher than necessary on the switch) - test it if you do not believe me.
And without -d it doesn't validate whether a full jumbo frame can pass (which is what I assume you were advising to test, as otherwise I don't know why you would specify a size value at all).
Bob
To All
This host has the error/warning below and I am wondering if this is the cause.
Also, do I need to open the vSAN rule sets in the hosts' firewall settings?
I have another issue open with this error but thought it might be relevant.
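Side note on the firewall question: enabling vSAN normally opens the required rulesets automatically, but you can verify from the shell (assuming the standard ruleset names such as rdt and cmmds):
# esxcli network firewall ruleset list | grep -iE 'rdt|cmmds|vsan'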
I ran this on one host that seems OK and on the one in the picture above:
[root@TGCSESXI-9:~] cmmds-tool find -t NODE_DECOM_STATE
owner=598cfc1b-c9e1-d832-d6a2-f04da207add3(Health: Healthy) uuid=598cfc1b-c9e1-d832-d6a2-f04da207add3 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5c69a839-08c6-a636-c148-782bcb054c38(Health: Healthy) uuid=5c69a839-08c6-a636-c148-782bcb054c38 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5a598968-4aa6-8b4a-0925-782bcb1399f7(Health: Healthy) uuid=5a598968-4aa6-8b4a-0925-782bcb1399f7 type=NODE_DECOM_STATE rev=5 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5710250b-cfcf-67fc-9dc0-a4badb376d3f(Health: Healthy) uuid=5710250b-cfcf-67fc-9dc0-a4badb376d3f type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
Host with the issue:
[root@TGCSESXI-4:~] cmmds-tool find -t NODE_DECOM_STATE
[root@TGCSESXI-4:~]
Thank you
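The empty result on TGCSESXI-4 suggests that host is not seeing any NODE_DECOM_STATE entries in CMMDS at all, which fits the partition theory. It is also worth confirming which vmkernel port actually carries vSAN traffic on each host - a quick check:
# esxcli vsan network list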
[root@TGCSESXI-4:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-4:~]
???
I tried it on another host with the same results.
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-9:~]
These messages indicate you don't have jumbo frames configured properly. 'sendto() failed (Message too long)' specifically means the packet is being rejected locally, i.e. the vmkernel port or vSwitch MTU on the host is still below the packet size - so check the host networking first, then your external switching infrastructure.
How do I check that, and where do I do that?
On the network adapters on the ESXi host?
You need to check everything end-to-end including the vswitch where these vmkernel ports are connected and your physical switches.
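On the ESXi side, a minimal sketch of checking and, if needed, raising the MTU from the shell (substitute your own vSwitch and vmkernel names):
# esxcli network vswitch standard list
# esxcli network ip interface list
# esxcli network vswitch standard set -v vSwitch0 -m 9000
# esxcli network ip interface set -i vmk2 -m 9000
The physical switch ports in the path need a jumbo MTU as well; that part is vendor-specific and outside ESXi.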
Hello
I changed the MTU to 9000 on all hosts' vSwitches and vmkernel adapters.
I saw 'Update vSAN configuration' in recent tasks, and then the vsanDatastore had 6.2 GB.
Then I thought, wow, it is working.
But after I updated the last host, the vsanDatastore went back to ZERO.
What else can we check?
Something is wrong here.
Does your vmkping work now with the larger packet size? If not, you have not fixed the problem completely.
[root@TGCSESXI-4:~] esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.0 bnx2 Up 1000Mbps Full d4:ae:52:77:6c:b5 9000 QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic1 0000:02:00.1 bnx2 Up 1000Mbps Full d4:ae:52:77:6c:b6 9000 QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic2 0000:04:00.0 bnx2 Up 1000Mbps Full 00:26:55:87:67:dc 9000 QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
vmnic3 0000:04:00.1 bnx2 Up 1000Mbps Full 00:26:55:87:67:de 9000 QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
[root@TGCSESXI-4:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
vmk0 Management Network IPv4 10.2.8.75 255.255.252.0 10.2.11.255 d4:ae:52:77:6c:b5 9000 65535 true STATIC defaultTcpipStack
vmk0 Management Network IPv6 fe80::d6ae:52ff:fe77:6cb5 64 d4:ae:52:77:6c:b5 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk1 VMkernel IPv4 10.2.8.73 255.255.252.0 10.2.11.255 00:50:56:6a:33:1a 9000 65535 true STATIC defaultTcpipStack
vmk1 VMkernel IPv6 fe80::250:56ff:fe6a:331a 64 00:50:56:6a:33:1a 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk2 Vmotion IPv4 10.2.8.92 255.255.252.0 10.2.11.255 00:50:56:63:b4:8d 9000 65535 true STATIC defaultTcpipStack
vmk2 Vmotion IPv6 fe80::250:56ff:fe63:b48d 64 00:50:56:63:b4:8d 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk3 Black Armor IPv4 10.2.8.84 255.255.252.0 10.2.11.255 00:50:56:66:bb:ab 9000 65535 true STATIC defaultTcpipStack
vmk3 Black Armor IPv6 fe80::250:56ff:fe66:bbab 64 00:50:56:66:bb:ab 9000 65535 true STATIC, PREFERRED defaultTcpipStack
[root@TGCSESXI-4:~] esxcfg-vswitch -l
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch0 1792 12 128 9000 vmnic0,vmnic1
PortGroup Name VLAN ID Used Ports Uplinks
VM Network 0 3 vmnic0,vmnic1
Black Armor 0 1 vmnic0,vmnic1
Vmotion 0 1 vmnic0,vmnic1
VMkernel 0 1 vmnic0,vmnic1
Management Network 0 1 vmnic0,vmnic1
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch1 1792 4 128 9000 vmnic2
PortGroup Name VLAN ID Used Ports Uplinks
VMkernal-iSCSI-1 0 1 vmnic2
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch2 1792 3 128 9000 vmnic3
PortGroup Name VLAN ID Used Ports Uplinks
VMkernal-iSCSI-2 0 0 vmnic3
[root@TGCSESXI-4:~]
I just changed all vSwitches and vmkernel adapters to MTU 9000, as you can see.
The ping:
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 1472 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 1472 data bytes
1480 bytes from 10.2.8.62: icmp_seq=0 ttl=64 time=0.312 ms
1480 bytes from 10.2.8.62: icmp_seq=1 ttl=64 time=0.249 ms
1480 bytes from 10.2.8.62: icmp_seq=2 ttl=64 time=0.344 ms
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.249/0.302/0.344 ms
[root@TGCSESXI-9:~]
Host TGCSESXI-4 still shows the error after changing the MTU to 9000, and I changed the MTU to 9000 on all hosts in the cluster.
You're still pinging with a small packet size, so you can't evaluate your MTU change. You need to raise the transmit size to something over 8,000.
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8000 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8000 data bytes
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Is that what you mean?
This did not produce any results.
Yes, so you still don't have MTU set properly, most likely on your external networking infrastructure (which is outside the scope of vSphere and vSAN). So I'll say again: You need to check and update the MTU in your switching/routing infra in order to have this work end-to-end.
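Once the physical switch MTU is raised, one way to verify end-to-end from each host (the placeholder addresses stand in for the other hosts' vSAN vmkernel IPs):
# for IP in <hostB-IP> <hostC-IP> <hostD-IP>; do vmkping -I vmk2 -s 8972 -d $IP; done
All pings should complete with 0% loss before you can expect the vsanDatastore to report its capacity.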