I am trying to get this going. I have a cluster right now with 1 host. I know you need 3, but I am just testing. I have one 2TB HDD that I tagged as flash (via the web client) and one 7.9TB HDD. If I turn on vSAN and set "Add disks to storage" to Manual, shouldn't I see the host as eligible with those 2 disks under Disk Management > Claim Disks?
This is on a Dell PowerEdge R710. The 2TB drive is RAID 0 and is the one I tagged as flash. The remaining 5 drives are RAID 5.
Good afternoon, not sure why you're having these problems. For testing I would probably use nested ESXi. There is some really good, already-prepared material here: Updated VSAN 6.0 Nested ESXi OVF Templates for 64 Nodes, All-Flash Array & Fault Domain Testing | vi...
Thank you, Zach.
Thanks for the reply.
I have 3 hosts. One is a nested ESXi. I have updated the nested host (Host 3) and one other (my HP server, Host 1) to 6.0 Update 2. My Dell server (Host 2) is on 6.0 Update 1. I turned on vSAN again but only claimed drives from the nested server and the Dell server. Now Host 1 (HP) and the nested host tell me they can't communicate with all other nodes in the Virtual SAN enabled cluster. Host 2 (Dell) doesn't have this warning.
The nested ESXi is on the Dell PowerEdge R710 server.
I believe this is because Host 3 is nested. I have turned on promiscuous mode, set to Accept, for the vSAN VMkernel port. I am not sure why I am getting this on Hosts 1 and 3.
When I go to each host and look at datastores, the vsan datastore shows a green check mark and the status is Normal.
Any suggestions?
First of all, it is recommended to have all the vSAN hosts on the same version, so you should consider upgrading your vSphere environment to the 6.0 Update 2 release.
There was a bug in vSphere 6.0 Update 1b that could produce false warnings like the one you are getting.
Please have a look at the links below:
VSAN issue “Host cannot communicate with all other….” | Virtual Allan
6.0 U1b - Hosts cannot communicate
_________________________
Was your question answered correctly? If so, please remember to mark your question as answered when you get the correct answer and award points to the person providing the answer. This helps others searching for a similar issue.
Cheers!
-Shivam
All hosts are now 6.0 Update 2 and I am still getting the error that hosts cannot communicate with all other nodes in the Virtual SAN enabled cluster.
Could you please try restarting the vCenter Server services?
For Windows based vCenter - Stopping, starting, or restarting VMware vCenter Server 6.0 services (2109881) | VMware KB
For vCenter Server Appliance - Stopping, starting, or restarting VMware vCenter Server Appliance 6.0 services (2109887) | VMware KB
Also, check your network configuration. This error generally appears when there is a network configuration issue.
For more details, please refer to VMware ® Virtual SAN™ 6.2 Network Design Guide
Cheers!
-Shivam
I have restarted the server and get the same thing. What is weird is that if I go to Storage on the hosts, each one sees the vSAN datastore. It shows 24.4GB free, then on a refresh it shows 5.8TB free, which is correct. It keeps flipping back and forth between 24.4GB free and 5.8TB free. As I said, I have a vSAN VMkernel port on each host, each set for vSAN traffic. This is a lab environment; I do not have separate VLANs. Everything is on the 192.168.1.x network. If the hosts couldn't communicate, then how is each host seeing the vSAN datastore?
Thanks for the replies; I do appreciate it. The link for the design guide does not work for me.
I believe the issue is the nested ESXi host. If I go to the other 2 hosts and refresh, it stays at 5.8TB; it doesn't flip between 24.4GB and 5.8TB. It seems to happen only when I go to the nested one and refresh. What do I need to do to get this working?
It seems like intermittent loss of connectivity between the hosts.
Could you please try to ping between the nested ESXi host and the other node (HP)? Also, do the non-nested ESXi hosts (the Dell and HP machines) communicate with each other?
>> Use this command to list the vmk interface used for vSAN traffic.
esxcli vsan network list
>> Now use the format below to ping each node from the others.
vmkping -I vmkX IP_address_destination_vmk
If the MTU is set to 9000, then use this instead (-d disables fragmentation; 8972 data bytes + 28 bytes of IP/ICMP headers = 9000):
vmkping -I vmkX IP_address_destination_vmk -d -s 8972
>> Also check if multicast traffic is working:
tcpdump-uw -i vmkX -n -s0 -t -c 20 udp port 12345 or udp port 23451
This command will let you know whether multicast traffic is being received from all three ESXi hosts. Use the document below to address the connectivity issues:
Virtual SAN Troubleshooting: Multicast - VMware vSphere Blog
Thanks for the reply.
NY-esxi3 is the nested host.
[root@ny-esxi3:~] esxcli vsan network list
Interface
VmkNic Name: vmk2
IP Protocol: IP
Interface UUID: a8a70258-786d-87a7-fe61-005056bb724a
Agent Group Multicast Address: 224.2.3.4
Agent Group IPv6 Multicast Address: ff19::2:3:4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group IPv6 Multicast Address: ff19::1:2:3
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Multicast TTL: 5
For the ping, I am doing this from esxi3 (nested), trying to ping vmk2, which is the vSAN VMkernel port at 192.168.1.101 (ny-esxi2). I hope that is what you mean.
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101
PING 192.168.1.101 (192.168.1.101): 56 data bytes
--- 192.168.1.101 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.100
PING 192.168.1.100 (192.168.1.100): 56 data bytes
--- 192.168.1.100 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@ny-esxi3:~]
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101
PING 192.168.1.101 (192.168.1.101): 56 data bytes
--- 192.168.1.101 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Not sure if this is what you meant.
This is a screenshot of NY-ESXi2 network
NY-ESXi3 network:
I can't get to the other 2 hosts until I get home.
If I just do a plain vmkping, I can ping 192.168.1.45 and 192.168.1.46 from the nested ESXi host:
[root@ny-esxi3:~] vmkping 192.168.1.46
PING 192.168.1.46 (192.168.1.46): 56 data bytes
64 bytes from 192.168.1.46: icmp_seq=0 ttl=64 time=0.299 ms
64 bytes from 192.168.1.46: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 192.168.1.46: icmp_seq=2 ttl=64 time=0.216 ms
--- 192.168.1.46 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.216/0.247/0.299 ms
[root@ny-esxi3:~] vmkping 192.168.1.45
PING 192.168.1.45 (192.168.1.45): 56 data bytes
64 bytes from 192.168.1.45: icmp_seq=0 ttl=64 time=0.347 ms
64 bytes from 192.168.1.45: icmp_seq=1 ttl=64 time=0.340 ms
64 bytes from 192.168.1.45: icmp_seq=2 ttl=64 time=0.350 ms
[root@ny-esxi3:~] tcpdump-uw -i vmk2 -n -s0 -t -c 20 udp port 23451
tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmk2, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
So, I am a little concerned that all of your virtual networks are sitting in the 192.168.1.x...
Can you move your vSAN connections to another subnet and segregate them? If you can't do that, I would recommend removing the extra vmk ports and just enabling vSAN on your management interface.
Your primary issue is networking, hands down. Isolate your vSAN traffic and try again. If you need to leave it connected as is, please go into the failover order for your vSAN vmk and make one NIC active and the other NICs standby.
The capacity flipping like that is a sign that your vSAN vmk ports are losing their connections to each other.
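If it helps, the failover order can also be checked and set from the host CLI instead of the client (a sketch; vSwitch0, vmnic0, and vmnic1 are placeholders for your actual vSwitch and uplink names):

```shell
# Show the current failover policy of the vSwitch carrying vSAN traffic
esxcli network vswitch standard policy failover get -v vSwitch0

# Make one NIC active and the other standby (names are examples)
esxcli network vswitch standard policy failover set -v vSwitch0 \
    --active-uplinks=vmnic0 --standby-uplinks=vmnic1
```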
I can't move them to another subnet. I have removed the VMK port on ESXi3 and enabled it on the management interface. For the management interface on ESXi3 I have one NIC active the other in standby. I am still getting the error.
I have removed the VSAN vmk from all hosts and have VSAN on the management port.
Thanks for the help. This has been driving me nuts.
Yes, you should do this on all of your hosts. vmk ports do not like to be moved around between different physical NICs.
OK, done on all hosts, and I still have the error. Again, ESXi1 and ESXi2 show the correct capacity no matter how many times I refresh. It is only ESXi3, which is nested.
ESXi1 - HP ML310
ESXi2- Dell PowerEdge R710
ESXi3 - Nested VM on Dell PowerEdge
Just noticed this message: "Virtual SAN cluster NY has one or more hosts that need disk format upgrade: ny-esxi2, ny-esxi3. For more detailed information on the Virtual SAN upgrade, please see the 'Virtual SAN upgrade procedure' section in the documentation."
Can you include screenshots of all of your virtual switch and vmk settings? Also the VM port group on Host 2 that is hosting Host 3?
I just want to see how the networking is configured, top to bottom.
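In case it is useful, the on-disk format version of each claimed disk can be checked from the host CLI (a sketch; the exact output fields vary by build):

```shell
# Lists each disk claimed by vSAN; the "Version" field is the on-disk format
esxcli vsan storage list | grep -E "Device|Version"
```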
If you need anything else just let me know.
ESXi1:
ESXi2:
ESXi3:
On first look, I noticed a difference in the speed of the uplinks (vmnics). The speed shows 10G (10000) on ESXi3, and it shows 1G (1000) on ESXi1 and ESXi2.
Also, check the MTU size on the vSwitch and vmk ports. The MTU size should be the same across the environment.
Cheers!
-Shivam
It won't let me change that. I noticed that too, but when I try to change it to 1000 I get an error.
The reason Host 3 shows as 10Gb is that you are using a vmxnet3 network adapter on the nested ESXi host. So what I want you to show me is the actual configuration of the vSwitch the nested ESXi host resides on, on Host 2, like here:
General
Name: DMZ
Port binding: Static binding
Port allocation: Elastic
Number of ports: 8
Network resource pool: (default)
Advanced
Configure reset at disconnect: Enabled
Override port policies
Block ports: Allowed
Traffic shaping: Disabled
Vendor configuration: Disabled
VLAN: Disabled
Uplink teaming: Disabled
Security policy: Disabled
NetFlow: Disabled
Traffic filtering and marking: Disabled
Security
Promiscuous mode: Reject
MAC address changes: Reject
Forged transmits: Reject
Ingress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
Egress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
VLAN
Type: VLAN
VLAN ID: xx
Teaming and failover
Load balancing: Route based on originating virtual port
Network failure detection: Link status only
Notify switches: Yes
Failback: Yes
Active uplinks: Uplink 1
Standby uplinks: Uplink 2
Unused uplinks: Uplink 3, Uplink 4, lag1
Monitoring
NetFlow: Disabled
Traffic filtering and marking
Status: Disabled
Miscellaneous
Block all ports: No
Do the same with your vmk ports:
Port properties
Network label VSAN-DPG
TCP/IP stack Default
Enabled services Virtual SAN traffic
IPv4 settings
IPv4 address xxxxx (static)
Subnet mask xxxxx
Default gateway for IPv4 xxxxx
DNS server addresses xxxxx
xxxxxxx
NIC settings
MAC address xxxxxxxxxx
MTU 9000
Security
Promiscuous mode: Reject
MAC address changes: Reject
Forged transmits: Reject
Ingress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
Egress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
VLAN
Type: None
Teaming and failover
Load balancing: Route based on physical NIC load
Network failure detection: Link status only
Notify switches: Yes
Failback: Yes
Active uplinks: Uplink 1
Standby uplinks: Uplink 2
Unused uplinks:
Monitoring
NetFlow: Disabled
Traffic filtering and marking
Status: Disabled
Miscellaneous
Block all ports: No
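One more thing worth checking for the nested setup: the port group on Host 2 that carries the nested host's traffic generally needs Promiscuous mode and Forged transmits set to Accept, or the nested host's packets get dropped. On a standard vSwitch this can also be done from the CLI (a sketch; vSwitch0 is a placeholder for whichever vSwitch backs that port group):

```shell
# Show the current security policy of the outer vSwitch
esxcli network vswitch standard policy security get -v vSwitch0

# Allow promiscuous mode and forged transmits for nested ESXi traffic
esxcli network vswitch standard policy security set -v vSwitch0 \
    --allow-promiscuous=true --allow-forged-transmits=true
```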
Doug Arcidino is right. The reason you are getting this error is that your ESXi3 VM is using vmxnet3. You need to change the network adapter type on the ESXi3 VM.
Here are the steps you should follow:
1) Go to ESXi2 and power-off ESXi3 VM.
2) Go to Edit Settings on ESXi3 VM.
3) Remove vmnic0 and vmnic1 (Network adapter 1 and Network adapter 2, respectively).
4) Add two new Ethernet adapters with the type set to e1000e or e1000 (e1000e preferred).
5) Power on ESXi3.
6) Add the newly created vmnics (they will show as vmnic0 and vmnic1) to vSwitch0 on ESXi3 (vSwitch0 Properties > Network Adapters > Add).
Hope this helps resolve the issue. Let me know how it goes.
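After the swap, the adapter type can be confirmed from the nested host itself (a sketch; the driver column should no longer show vmxnet3):

```shell
# List physical NICs as seen by the nested host; check the Driver column
esxcli network nic list
```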
Cheers!
-Shivam
OK, sorry... I had to redo ESXi3 and now it looks OK (used E1000E NICs). I have one error stating: Virtual SAN cluster has one or more hosts that need disk format upgrade: esxi2, esxi3.
I don't see how to do that.
This is not working. I tried to move a VM to it and got an error.
Trying to add a folder via the web client, I get:
The last operation failed for the entity with the following error message.
Cannot complete file creation operation.
There are currently 2 usable disks for the operation. This operation requires 1 more usable disks.
Remaining 2 disks not usable because:
0 - Insufficient space for data/cache reservation.
0 - Maintenance mode or unhealthy disks.
0 - Disk-version or storage-type mismatch.
0 - Max component count reached.
2 - In unusable fault-domains due to policy constraints.
0 - In witness node.
Failed to create object.
Drives from esxi2 (one 2TB, one 5.8TB, and one 2TB SSD) and esxi3 (two 40GB HDDs and one 40GB SSD) are participating. Do I need to add drives from esxi1 for this to work?