VMware 6.5
Just installed SSD drives on my ESXi hosts and configured vSAN.
After claiming the drives on 4 hosts, the vsanDatastore was created on all the hosts but has zero space available.
Any reason why?
Thank you
Tom
Check whether the nodes have a non-zero vSAN decom state:
# cmmds-tool find -t NODE_DECOM_STATE
Per your other Communities post (assuming it is the same cluster and, if so, no idea why you started another thread) you are losing packets, which strongly indicates you have network issues and/or a misconfiguration - if the cluster is partitioned or flapping then this would explain 0 bytes of storage.
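A quick way to confirm whether the cluster is partitioned is to run the following on each host and compare - the Sub-Cluster Member Count should equal the number of hosts (4 in your case):
# esxcli vsan cluster get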
As an aside, "vmkping -I VMK1 10.10.10.10 -s 9000 -c 100" as MJMSRI advised is not a very good test because a) you have not indicated whether you are using jumbo frames (and thus whether you even need to test a large MTU), b) 9000-byte test pings WILL actually pass through a 1500 MTU network unless you set the 'don't fragment' flag (-d), and c) in a correctly configured VMware environment using jumbo frames, 9000 won't actually pass as we have our own additional packet headers (28 bytes). The correct way to check (if you DO have jumbo frames enabled) would be:
# vmkping -I vmkX -s 8972 -d <DestIP>
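For reference, 8972 is simply the 9000-byte MTU minus those 28 bytes of headers: 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes of ICMP payload, the largest -s value that fits in a single jumbo frame.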
Bob
Hello Tom,
By claimed SSDs do you mean that you have configured vSAN Disk-Groups?
If these have been created, then other possible causes are hosts in Maintenance Mode or the cluster being network-partitioned - the vSAN Health check should tell you more:
Cluster > Monitor > vSAN > Health
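If the Health UI is hard to reach, the Maintenance Mode part at least is quick to rule out from each host's shell - this should return 'Disabled' on all hosts:
# esxcli system maintenanceMode get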
Bob
Hi
I did not create a disk group, although I see GROUP 1 on the display.
The error above is not helpful - what is the cluster error?
Check the physical disks section to see if any disks were claimed, because it seems something has failed. I would also recommend using the H5 (HTML5) interface if you have vSphere 6.7.
Hello
I am running VMware 6.5, not 6.7.
I believe all my claimed disks are H5 - see images.
The error on the Data Center reflects that I do not have any capacity.
The Disk Format Version column shows 5 - that is the H5 you mentioned, correct?
If the disks are claimed, which I did manually, I do not understand why I do not have any capacity.
Is there any other way to check via the command line?
Thank you.
Tom
Did you check the firmware version?
You can look it up in the VMware Compatibility Guide.
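Regarding your command-line question: the claimed disks and the datastore capacity can also be checked from the ESXi shell - the filesystem list should show the vsanDatastore and its size:
# esxcli vsan storage list
# esxcli storage filesystem list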
Not a very good test? But then you advised the same test and simply changed the command slightly from 9000 to 8972, and removed -c, which is the count, so the command runs for a set number of pings.
"simply changed the command slightly from 9000 to 8972"
Yes, because -s 9000 -d won't pass on a functional jumbo-frame enabled network (unless the MTU has been raised to something higher than necessary on the switch) - test it if you do not believe me.
And without -d it doesn't validate whether a full jumbo frame can pass (which is what I assume you were advising to test, as otherwise I don't know why you would specify a size value at all).
Bob
To All
This host has the error/warning below and I am wondering if this is the cause.
Also, do I need to open the vSAN rule sets in the hosts' firewall settings?
I have another issue open with this error but thought it might be relevant.
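Side note on the firewall question: enabling vSAN normally opens the required rulesets automatically, but you can verify from the shell (assuming the standard ruleset names such as rdt and cmmds):
# esxcli network firewall ruleset list | grep -iE 'rdt|cmmds|vsan'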
I ran this on one host that seems OK and on the one in the picture above:
[root@TGCSESXI-9:~] cmmds-tool find -t NODE_DECOM_STATE
owner=598cfc1b-c9e1-d832-d6a2-f04da207add3(Health: Healthy) uuid=598cfc1b-c9e1-d832-d6a2-f04da207add3 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5c69a839-08c6-a636-c148-782bcb054c38(Health: Healthy) uuid=5c69a839-08c6-a636-c148-782bcb054c38 type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5a598968-4aa6-8b4a-0925-782bcb1399f7(Health: Healthy) uuid=5a598968-4aa6-8b4a-0925-782bcb1399f7 type=NODE_DECOM_STATE rev=5 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
owner=5710250b-cfcf-67fc-9dc0-a4badb376d3f(Health: Healthy) uuid=5710250b-cfcf-67fc-9dc0-a4badb376d3f type=NODE_DECOM_STATE rev=2 minHostVer=0 [content = (i0 i0 UUID_NULL i0 [ ] i0 i0 i0)], errorStr=(null)
Host with the issue:
[root@TGCSESXI-4:~] cmmds-tool find -t NODE_DECOM_STATE
[root@TGCSESXI-4:~]
Thank you
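The empty result on TGCSESXI-4 suggests that host is not seeing any NODE_DECOM_STATE entries in CMMDS at all, which fits the partition theory. It is also worth confirming which vmkernel port actually carries vSAN traffic on each host - a quick check:
# esxcli vsan network list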
[root@TGCSESXI-4:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-4:~]
???
I tried it on another host with the same results.
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8972 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)
sendto() failed (Message too long)
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@TGCSESXI-9:~]
These messages indicate you don't have jumbo frames configured properly. 'sendto() failed (Message too long)' specifically means the packet is being rejected locally, i.e. the vmkernel port or vSwitch MTU on the host is still below the packet size - so check the host networking first, then your external switching infrastructure.
How do I check that, and where do I do that?
On the network adapters on the ESXi host?
You need to check everything end-to-end including the vswitch where these vmkernel ports are connected and your physical switches.
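On the ESXi side, a minimal sketch of checking and, if needed, raising the MTU from the shell (substitute your own vSwitch and vmkernel names):
# esxcli network vswitch standard list
# esxcli network ip interface list
# esxcli network vswitch standard set -v vSwitch0 -m 9000
# esxcli network ip interface set -i vmk2 -m 9000
The physical switch ports in the path need a jumbo MTU as well; that part is vendor-specific and outside ESXi.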
Hello
I changed the MTU to 9000 on all hosts' vSwitches and vmkernel adapters.
I saw 'Update vSAN configuration' in recent tasks, and then the vsanDatastore had 6.2 GB.
Then I thought, wow, it is working.
But after I updated the last host, the vsanDatastore went back to ZERO.
What else can we check?
Something is wrong here.
Does your vmkping work now with the larger packet size? If not, you have not fixed the problem completely.
[root@TGCSESXI-4:~] esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.0 bnx2 Up 1000Mbps Full d4:ae:52:77:6c:b5 9000 QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic1 0000:02:00.1 bnx2 Up 1000Mbps Full d4:ae:52:77:6c:b6 9000 QLogic Corporation QLogic NetXtreme II BCM5716 1000Base-T
vmnic2 0000:04:00.0 bnx2 Up 1000Mbps Full 00:26:55:87:67:dc 9000 QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
vmnic3 0000:04:00.1 bnx2 Up 1000Mbps Full 00:26:55:87:67:de 9000 QLogic Corporation NC382T PCI Express Dual Port Multifunction Gigabit Server Adapter
[root@TGCSESXI-4:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
vmk0 Management Network IPv4 10.2.8.75 255.255.252.0 10.2.11.255 d4:ae:52:77:6c:b5 9000 65535 true STATIC defaultTcpipStack
vmk0 Management Network IPv6 fe80::d6ae:52ff:fe77:6cb5 64 d4:ae:52:77:6c:b5 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk1 VMkernel IPv4 10.2.8.73 255.255.252.0 10.2.11.255 00:50:56:6a:33:1a 9000 65535 true STATIC defaultTcpipStack
vmk1 VMkernel IPv6 fe80::250:56ff:fe6a:331a 64 00:50:56:6a:33:1a 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk2 Vmotion IPv4 10.2.8.92 255.255.252.0 10.2.11.255 00:50:56:63:b4:8d 9000 65535 true STATIC defaultTcpipStack
vmk2 Vmotion IPv6 fe80::250:56ff:fe63:b48d 64 00:50:56:63:b4:8d 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk3 Black Armor IPv4 10.2.8.84 255.255.252.0 10.2.11.255 00:50:56:66:bb:ab 9000 65535 true STATIC defaultTcpipStack
vmk3 Black Armor IPv6 fe80::250:56ff:fe66:bbab 64 00:50:56:66:bb:ab 9000 65535 true STATIC, PREFERRED defaultTcpipStack
[root@TGCSESXI-4:~] esxcfg-vswitch -l
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch0 1792 12 128 9000 vmnic0,vmnic1
PortGroup Name VLAN ID Used Ports Uplinks
VM Network 0 3 vmnic0,vmnic1
Black Armor 0 1 vmnic0,vmnic1
Vmotion 0 1 vmnic0,vmnic1
VMkernel 0 1 vmnic0,vmnic1
Management Network 0 1 vmnic0,vmnic1
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch1 1792 4 128 9000 vmnic2
PortGroup Name VLAN ID Used Ports Uplinks
VMkernal-iSCSI-1 0 1 vmnic2
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch2 1792 3 128 9000 vmnic3
PortGroup Name VLAN ID Used Ports Uplinks
VMkernal-iSCSI-2 0 0 vmnic3
[root@TGCSESXI-4:~]
I just changed all vSwitches and vmkernel adapters to MTU 9000, as you can see.
The ping:
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 1472 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 1472 data bytes
1480 bytes from 10.2.8.62: icmp_seq=0 ttl=64 time=0.312 ms
1480 bytes from 10.2.8.62: icmp_seq=1 ttl=64 time=0.249 ms
1480 bytes from 10.2.8.62: icmp_seq=2 ttl=64 time=0.344 ms
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.249/0.302/0.344 ms
[root@TGCSESXI-9:~]
Host TGCSESXI-4 still shows the error after changing the MTU to 9000, and I changed the MTU to 9000 on all hosts in the cluster.
You're still pinging with a small packet size, so you can't evaluate your MTU change. You need to raise the transmit size to something over 8,000.
[root@TGCSESXI-9:~] vmkping -I vmk2 -s 8000 -d 10.2.8.62
PING 10.2.8.62 (10.2.8.62): 8000 data bytes
--- 10.2.8.62 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Is that what you mean?
This did not produce any results.
Yes, so you still don't have MTU set properly, most likely on your external networking infrastructure (which is outside the scope of vSphere and vSAN). So I'll say again: You need to check and update the MTU in your switching/routing infra in order to have this work end-to-end.
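Once the physical switch MTU is raised, one way to verify end-to-end from each host (the placeholder addresses stand in for the other hosts' vSAN vmkernel IPs):
# for IP in <hostB-IP> <hostC-IP> <hostD-IP>; do vmkping -I vmk2 -s 8972 -d $IP; done
All pings should complete with 0% loss before you can expect the vsanDatastore to report its capacity.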