VMware Cloud Community
Intel233
Enthusiast

VSAN issue

I am trying to get this going. I have a cluster right now with 1 host. I know you need 3, but I am just testing. I have one 2TB HDD that I converted to Flash (via the web client) and one 7.9TB HDD. If I turn on VSAN and set "Add disks to storage" to Manual, then under Disk Management > Claim Disks, shouldn't I see the host with those 2 disks as eligible?

This is on a Dell PowerEdge R710. The 2TB drive is set up as RAID 0, and I then converted it to Flash. The remaining 5 drives are in RAID 5.

20 Replies
zdickinson
Expert

Good afternoon, not sure why you're having the problems.  For testing I would probably do nested ESXi.  Some really good, already prepared stuff here:  Updated VSAN 6.0 Nested ESXi OVF Templates for 64 Nodes, All-Flash Array & Fault Domain Testing | vi...

Thank you, Zach.

Intel233
Enthusiast

Thanks for the reply.

I have 3 hosts. One is a nested ESXi. I have updated the nested host (Host 3) and one other (my HP server, Host 1) to 6.0 Update 2. My Dell server (Host 2) is on 6.0 Update 1. I turned on VSAN again but only took drives from the nested host and the Dell server. Now Host 1 (HP) and the nested host tell me they can't communicate with all other nodes in the Virtual SAN enabled cluster. Host 2 (Dell server) doesn't have this warning.

The nested ESXi is on the Dell PowerEdge R710 server.

I believe this is because Host 3 is nested. I have turned on Promiscuous mode; it is set to Accept for the VSAN VMkernel port. I am not sure why I am getting this on Hosts 1 and 3.

When I go to each host and look at the datastores, the vsan datastore shows a green check mark and the status is Normal.

Any suggestions?

admin
Immortal

First of all, it is recommended to have all the VSAN hosts on the same version, so you should think about upgrading your whole vSphere environment to the 6.0 U2 release.

There was a bug in vSphere 6.0 Update 1b that could give false warnings like the one you are getting.

Please have a look at the below links:

"Host cannot communicate with all other nodes in virtual SAN enabled cluster" error (2143214) | VMwa...

VSAN issue “Host cannot communicate with all other….” | Virtual Allan

6.0 U1b - Hosts cannot communicate

_________________________

Was your question answered correctly? If so, please remember to mark your question as answered when you get the correct answer and award points to the person providing the answer. This helps others searching for a similar issue.

Cheers!

-Shivam

Intel233
Enthusiast

All hosts are now on 6.0 Update 2, and I am still getting the error that hosts cannot communicate with other nodes in the Virtual SAN enabled cluster.

admin
Immortal

Could you please try restarting vCenter server services?

For Windows based vCenter - Stopping, starting, or restarting VMware vCenter Server 6.0 services (2109881) | VMware KB

For vCenter Server Appliance - Stopping, starting, or restarting VMware vCenter Server Appliance 6.0 services (2109887) | VMware KB

Also, check your network configuration. This error generally appears when there is a network configuration issue.

For more details, please refer to VMware ® Virtual SAN™ 6.2 Network Design Guide


Cheers!

-Shivam

Intel233
Enthusiast

I have restarted the server and see the same thing. What is weird is that if I go to Storage on the hosts, it sees the vsan datastore. It shows 24.4GB free, then on a refresh it shows 5.8TB free, which is correct. It keeps going back and forth between 24.4GB free and 5.8TB free. Like I said, I have a VSAN VMkernel port on each host, and each is set for VSAN traffic. This is a lab environment, so I do not have separate VLANs; everything is on the 192.168.1.x network. If the hosts couldn't communicate, then how is each host seeing the VSAN datastore?

Thanks for the replies, I do appreciate it. The link for the design guide does not work for me.

I believe the issue is the nested ESXi. If I go to the other 2 hosts and refresh, it stays at 5.8TB; it doesn't seem to change between 24.4GB and 5.8TB. It seems to be only when I go to the nested one and refresh that this happens. What do I need to do to get this working?

vpradeep01
VMware Employee

This looks like an intermittent loss of connectivity between the hosts.

Could you please try pinging between the nested ESXi host and the other node (HP)? Also, do the non-nested ESXi hosts (the Dell and HP ones) communicate with each other?

>> Use this command to list the vmk interface used for vSAN traffic:

esxcli vsan network list

>> Now use the format below to ping the nodes from each other:

vmkping -I vmkX IP_address_destination_vmk

If the MTU is set to 9000, then use this:


vmkping -I vmkX IP_address_destination_vmk -d -s 8972
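
For reference, the 8972 in the command above is the largest ICMP payload that fits in a 9000-byte MTU without fragmentation: the 20-byte IPv4 header (no options) and the 8-byte ICMP header are subtracted from the MTU. A minimal Python sketch of that arithmetic follows; the function name max_ping_payload is made up for illustration and is not part of any VMware tool.

```python
# Sketch: largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# The function name is hypothetical, chosen only for this illustration.
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Return the payload size to use with `vmkping -d -s` for a given MTU."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is too small to carry an IPv4 ICMP packet")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, max_ping_payload(mtu))
```

With the default MTU of 1500 the same calculation gives 1472, so a test with -d -s 1472 should pass on a standard-MTU network.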


>> Also check whether multicast traffic is working:


tcpdump-uw -i vmkX -n -s0 -t -c 20 udp port 12345 or udp port 23451


This command will let you know if multicast traffic is being received from all the three ESXi hosts. Use below documents to address the connectivity issues


Virtual SAN Troubleshooting: Multicast - VMware vSphere Blog


http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vsan-troubleshooting-...
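
As a rough aid for reading the tcpdump-uw output from the multicast check above, the sketch below counts packets per source IP and reports which expected hosts never show up as senders. It is only an illustration: the function names, the sample text, and the list of expected vSAN VMkernel IPs are all assumptions, and whether a host's own packets appear in its capture may vary.

```python
import re
from collections import Counter
from typing import List

# Matches capture lines of the form:
#   IP <src-ip>.<src-port> > <group-ip>.<group-port>: UDP, length N
LINE_RE = re.compile(
    r"^IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)


def count_sources(tcpdump_text: str) -> Counter:
    """Count captured UDP packets per source IP address."""
    counts: Counter = Counter()
    for line in tcpdump_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_hosts(tcpdump_text: str, expected_ips: List[str]) -> List[str]:
    """Return the expected IPs that never appear as a multicast sender."""
    seen = count_sources(tcpdump_text)
    return [ip for ip in expected_ips if ip not in seen]


if __name__ == "__main__":
    # Hypothetical capture text and host list; replace with your own.
    sample = (
        "IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240\n"
        "IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240\n"
    )
    expected = ["192.168.1.100", "192.168.1.101", "192.168.1.102"]
    print(dict(count_sources(sample)))
    print("no packets seen from:", missing_hosts(sample, expected))
```

If the second list is not empty, multicast from those hosts is probably not reaching the interface you captured on.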

Intel233
Enthusiast

Thanks for the reply.

NY-esxi3 is the nested host.

[root@ny-esxi3:~] esxcli vsan network list

Interface

   VmkNic Name: vmk2

   IP Protocol: IP

   Interface UUID: a8a70258-786d-87a7-fe61-005056bb724a

   Agent Group Multicast Address: 224.2.3.4

   Agent Group IPv6 Multicast Address: ff19::2:3:4

   Agent Group Multicast Port: 23451

   Master Group Multicast Address: 224.1.2.3

   Master Group IPv6 Multicast Address: ff19::1:2:3

   Master Group Multicast Port: 12345

   Host Unicast Channel Bound Port: 12321

   Multicast TTL: 5

I am running the ping from esxi3 (nested). I am trying to ping the VSAN VMkernel port (vmk2) at 192.168.1.101 (ny-esxi2). I hope that is what you mean.

[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101

PING 192.168.1.101 (192.168.1.101): 56 data bytes

--- 192.168.1.101 ping statistics ---

3 packets transmitted, 0 packets received, 100% packet loss

[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.100

PING 192.168.1.100 (192.168.1.100): 56 data bytes

--- 192.168.1.100 ping statistics ---

3 packets transmitted, 0 packets received, 100% packet loss

[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101

PING 192.168.1.101 (192.168.1.101): 56 data bytes

--- 192.168.1.101 ping statistics ---

3 packets transmitted, 0 packets received, 100% packet loss

Not sure if this is what you meant.

This is a screenshot of NY-ESXi2 network

pastedImage_1.png

NY-ESXi3 network:

pastedImage_2.png

I can't get to the other 2 hosts until I get home.

If I just do a plain vmkping, I can ping 192.168.1.45 and 192.168.1.46 from the nested ESXi host:

[root@ny-esxi3:~] vmkping 192.168.1.46

PING 192.168.1.46 (192.168.1.46): 56 data bytes

64 bytes from 192.168.1.46: icmp_seq=0 ttl=64 time=0.299 ms

64 bytes from 192.168.1.46: icmp_seq=1 ttl=64 time=0.227 ms

64 bytes from 192.168.1.46: icmp_seq=2 ttl=64 time=0.216 ms

--- 192.168.1.46 ping statistics ---

3 packets transmitted, 3 packets received, 0% packet loss

round-trip min/avg/max = 0.216/0.247/0.299 ms

[root@ny-esxi3:~] vmkping 192.168.1.45

PING 192.168.1.45 (192.168.1.45): 56 data bytes

64 bytes from 192.168.1.45: icmp_seq=0 ttl=64 time=0.347 ms

64 bytes from 192.168.1.45: icmp_seq=1 ttl=64 time=0.340 ms

64 bytes from 192.168.1.45: icmp_seq=2 ttl=64 time=0.350 ms

[root@ny-esxi3:~] tcpdump-uw -i vmk2 -n -s0 -t -c 20 udp port 23451

tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode

listening on vmk2, link-type EN10MB (Ethernet), capture size 65535 bytes

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240

darcidinovmw
VMware Employee

So, I am a little concerned that all of your virtual networks are sitting in the 192.168.1.x...

Can you move your VSAN connections to another subnet and segregate those connections?  If you can't do that, I would recommend removing the extra VMK ports and just enabling VSAN on your management interface.

Your primary issue is networking, hands down. Isolate your VSAN traffic and try again. If you need to leave it connected like it is, please go into the failover order for your VSAN VMK and make one NIC active and the other NICs standby.

The capacity changing like that is a sign that your VSAN VMKs are disconnecting from each other.

Doug Arcidino VCP-DCV 4/5/6, VCP-DTM 5/6/7, VCAP-DCV Deploy/Design 6 If this answer was helpful, please mark it as answer I work for VMware Disclaimer: Any views or opinions expressed here are strictly my own. I am solely responsible for all content published here. Content published here is not read, reviewed or approved in advance by VMware and does not necessarily represent or reflect the views or opinions of VMware.

Intel233
Enthusiast

I can't move them to another subnet.  I have removed the VMK port on ESXi3 and enabled it on the management interface.  For the management interface on ESXi3 I have one NIC active the other in standby.   I am still getting the error.

I have removed the VSAN vmk from all hosts and have VSAN on the management port.

Thanks for the help.  This has been driving me nuts.

darcidinovmw
VMware Employee

Yes, you should do this on all of your hosts. vmk ports do not like to be jumped around on different physical NICs.

Intel233
Enthusiast

OK, done on all hosts, and I still have the error. Again, ESXi1 and ESXi2 show the correct capacity no matter how many times I refresh. It is only ESXi3, which is nested, that has the problem.

ESXi1 - HP ML310

ESXi2- Dell PowerEdge R710

ESXi3 - Nested VM on Dell PowerEdge

I just noticed this message: "Virtual SAN cluster NY has one or more hosts that need disk format upgrade: ny-esxi2, ny-esxi3. For more detailed information of Virtual SAN upgrade, please see the 'Virtual SAN upgrade procedure' section in the documentation."

darcidinovmw
VMware Employee

Can you include screenshots of all of your virtual switch and VMK settings? Also of the VM port group on Host 2, which is hosting Host 3?

I just want to see how the networking is configured top to bottom.

Intel233
Enthusiast

If you need anything else just let me know.

ESXi1:

esxi1-1.PNG

esxi1-2.PNG

esxi1-3.PNG

esxi1-4.PNG

ESXi2:

esxi2-1.PNG

esxi2-2.PNG

esxi2-3.PNG

esxi2-4.PNG

ESXi3:

esxi3-1.PNG

esxi3-2.PNG

esxi3-3.PNG

esxi3-4.PNG

admin
Immortal

At first look, I noticed a difference in speed on the uplinks (vmnics). The speed shows 10G (10000) on ESXi3, and it shows 1G (1000) on ESXi1 and ESXi2.

Also, check the MTU size on the vSwitch and vmk ports. The MTU size should be the same across the environment.

Cheers!

-Shivam

Intel233
Enthusiast

It won't let me change that. I noticed that too, but when I try to change it to 1000 I get an error.

nic.PNG

darcidinovmw
VMware Employee

The reason Host 3 shows as 10Gb is that you are using a vmxnet3 network adapter on the nested ESXi host. So what I want you to show me is the actual configuration of the vSwitch that the nested ESXi host resides in on Host 2, like this:

General

    Name:    DMZ

    Port binding:    Static binding

    Port allocation:    Elastic

    Number of ports:    8

    Network resource pool:    (default)

Advanced

    Configure reset at disconnect:    Enabled

Override port policies

    Block ports:    Allowed

    Traffic shaping:    Disabled

    Vendor configuration:    Disabled

    VLAN:    Disabled

    Uplink teaming:    Disabled

    Security policy:    Disabled

    NetFlow:    Disabled

    Traffic filtering and marking:    Disabled

Security

    Promiscuous mode:    Reject

    MAC address changes:    Reject

    Forged transmits:    Reject

Ingress traffic shaping

    Status:    Disabled

    Average bandwidth:    --

    Peak bandwidth:    --

    Burst size:    --

Egress traffic shaping

    Status:    Disabled

    Average bandwidth:    --

    Peak bandwidth:    --

    Burst size:    --

VLAN

    Type:    VLAN

    VLAN ID:    xx

Teaming and failover

    Load balancing:    Route based on originating virtual port

    Network failure detection:    Link status only

    Notify switches:    Yes

    Failback:    Yes

    Active uplinks:    Uplink 1

    Standby uplinks:    Uplink 2

    Unused uplinks:    Uplink 3, Uplink 4, lag1

Monitoring

    NetFlow:    Disabled

Traffic filtering and marking

    Status:    Disabled

Miscellaneous

    Block all ports:    No

Do the same with your vmk ports:

Port properties

    Network label    VSAN-DPG

    TCP/IP stack    Default

    Enabled services    Virtual SAN traffic

IPv4 settings

    IPv4 address    xxxxx (static)

    Subnet mask  xxxxx

    Default gateway for IPv4    xxxxx

    DNS server addresses  xxxxx

               xxxxxxx

NIC settings

    MAC address    xxxxxxxxxx

    MTU    9000

Security

    Promiscuous mode:    Reject

    MAC address changes:    Reject

    Forged transmits:    Reject

Ingress traffic shaping

    Status:    Disabled

    Average bandwidth:    --

    Peak bandwidth:    --

    Burst size:    --

Egress traffic shaping

    Status:    Disabled

    Average bandwidth:    --

    Peak bandwidth:    --

    Burst size:    --

VLAN

    Type:    None

Teaming and failover

    Load balancing:    Route based on physical NIC load

    Network failure detection:    Link status only

    Notify switches:    Yes

    Failback:    Yes

    Active uplinks:    Uplink 1

    Standby uplinks:    Uplink 2

    Unused uplinks:   

Monitoring

    NetFlow:    Disabled

Traffic filtering and marking

    Status:    Disabled

Miscellaneous

    Block all ports:    No

admin
Immortal

Doug Arcidino is right. The reason you are getting this error is that your ESXi3 VM is using vmxnet3. You need to change the network adapter type on the ESXi3 VM.

Here are the steps you should follow:

1) Go to ESXi2 and power off the ESXi3 VM.

2) Go to Edit Settings on the ESXi3 VM.

3) Remove vmnic0 and vmnic1 (Network adapter 1 and Network adapter 2, respectively).

4) Add two new Ethernet adapters of type e1000e or e1000 (e1000e preferred).

5) Power on the ESXi3 VM.

6) Add the newly created vmnics (they should show up as vmnic0 and vmnic1) to vSwitch0 on ESXi3 (vSwitch0 Properties > Network Adapters > Add).

Hope this would help to resolve the issue. Let me know how it goes.

Cheers!

-Shivam

Intel233
Enthusiast

OK, sorry... I had to redo ESXi3, and now it looks OK (I used E1000E NICs). I have 1 error stating that the Virtual SAN cluster has one or more hosts that need a disk format upgrade: esxi2, esxi3.

I don't see how to do that.

This is not working.  I tried to move a VM to this and get an error.

movevm.PNG

Trying to add a folder via web client I get:

The last operation failed for the entity with the following error message.

Cannot complete file creation operation.

There are currently 2 usable disks for the operation. This operation requires 1 more usable disks.

Remaining 2 disks not usable because:

0 - Insufficient space for data/cache reservation.

0 - Maintenance mode or unhealthy disks.

0 - Disk-version or storage-type mismatch.

0 - Max component count reached.

2 - In unusable fault-domains due to policy constraints.

0 - In witness node.

Failed to create object.

Drives from ESXi2 (1x 2TB, 1x 5.8TB, and 1x 2TB SSD) and ESXi3 (2x 40GB and 1x 40GB SSD) are participating. Do I need to add disks from ESXi1 for this to work?
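
A hedged sketch of the arithmetic behind that question, assuming the default storage policy is in effect (number of failures to tolerate = 1, with mirroring): tolerating n failures with mirrored objects takes 2n + 1 hosts contributing storage, which would be 3 hosts here while only 2 currently contribute disks. The function names below are made up for illustration, and the policy value is an assumption, not something confirmed in this thread.

```python
# Sketch only: host-count arithmetic for a mirrored (RAID-1) vSAN policy.
# Function names are hypothetical; FTT=1 is assumed to be the policy in use.


def hosts_required(failures_to_tolerate: int) -> int:
    """Minimum hosts contributing storage for mirroring: 2n + 1."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be zero or greater")
    return 2 * failures_to_tolerate + 1


def hosts_short(contributing_hosts: int, failures_to_tolerate: int = 1) -> int:
    """How many more contributing hosts are needed (0 if already enough)."""
    return max(0, hosts_required(failures_to_tolerate) - contributing_hosts)


if __name__ == "__main__":
    # Two hosts (ESXi2 and ESXi3) currently contribute disks.
    print("hosts required:", hosts_required(1))
    print("hosts short:", hosts_short(2))
```

Under that assumption, the "requires 1 more usable disks" line and the "2 - In unusable fault-domains due to policy constraints" line would both be consistent with a third host needing to contribute disks.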
