I am trying to get this going. I have a cluster right now with 1 host. I know you need 3, but I am just testing. I have one 2TB HDD that I tagged as flash (via the web client) and one 7.9TB HDD. If I turn on vSAN and set "Add disks to storage" to Manual, shouldn't I see the host as eligible with those 2 disks under Disk Management > Claim Disks?
This is on a Dell PowerEdge R710. The 2TB drive is RAID 0 and is the one I tagged as flash. The remaining 5 drives are RAID 5.
Good afternoon, not sure why you're having these problems. For testing I would probably use nested ESXi. There is some really good, already-prepared material here: Updated VSAN 6.0 Nested ESXi OVF Templates for 64 Nodes, All-Flash Array & Fault Domain Testing | vi...
Thank you, Zach.
Thanks for the reply.
I have 3 hosts. One is a nested ESXi. I have updated the nested host (Host 3) and one other (my HP server, Host 1) to 6.0 Update 2. My Dell server (Host 2) is on 6.0 Update 1. I turned on vSAN again but only claimed drives from the nested server and the Dell server. Now Host 1 (HP) and the nested host tell me they can't communicate with all other nodes in the Virtual SAN enabled cluster. Host 2 (Dell) doesn't have this warning.
The nested ESXi is on the Dell PowerEdge R710 server.
I believe this is because Host 3 is nested. I have turned on promiscuous mode, set to Accept, for the vSAN VMkernel port. I am not sure why I am getting this on Hosts 1 and 3.
When I go to each host and look at datastores, the vsan datastore shows a green check mark and the status is Normal.
Any suggestions?
First of all, it is recommended to have all the vSAN hosts on the same version, so you should consider upgrading your vSphere environment to the 6.0 Update 2 release.
There was a bug in vSphere 6.0 Update 1b that could produce false warnings like the one you are getting.
Please have a look at the links below:
VSAN issue “Host cannot communicate with all other….” | Virtual Allan
6.0 U1b - Hosts cannot communicate
_________________________
Was your question answered correctly? If so, please remember to mark your question as answered when you get the correct answer and award points to the person providing the answer. This helps others searching for a similar issue.
Cheers!
-Shivam
All hosts are now 6.0 Update 2 and I am still getting the error that hosts cannot communicate with all other nodes in the Virtual SAN enabled cluster.
Could you please try restarting the vCenter Server services?
For Windows based vCenter - Stopping, starting, or restarting VMware vCenter Server 6.0 services (2109881) | VMware KB
For vCenter Server Appliance - Stopping, starting, or restarting VMware vCenter Server Appliance 6.0 services (2109887) | VMware KB
Also, check your network configuration. This error generally appears when there is a network configuration issue.
For more details, please refer to VMware ® Virtual SAN™ 6.2 Network Design Guide
Cheers!
-Shivam
I have restarted the server and get the same thing. What is weird is that if I go to Storage on the hosts, each one sees the vSAN datastore. It shows 24.4GB free, then on a refresh it shows 5.8TB free, which is correct. It keeps flipping back and forth between 24.4GB free and 5.8TB free. As I said, I have a vSAN VMkernel port on each host, each set for vSAN traffic. This is a lab environment; I do not have separate VLANs. Everything is on the 192.168.1.x network. If the hosts couldn't communicate, then how is each host seeing the vSAN datastore?
Thanks for the replies; I do appreciate it. The link for the design guide does not work for me.
I believe the issue is the nested ESXi host. If I go to the other 2 hosts and refresh, it stays at 5.8TB; it doesn't flip between 24.4GB and 5.8TB. It seems to happen only when I go to the nested one and refresh. What do I need to do to get this working?
It seems like intermittent loss of connectivity between the hosts.
Could you please try to ping between the nested ESXi host and the other node (HP)? Also, do the non-nested ESXi hosts (the Dell and HP machines) communicate with each other?
>> Use this command to list the vmk interface used for vSAN traffic.
esxcli vsan network list
>> Now use the format below to ping each node from the others.
vmkping -I vmkX IP_address_destination_vmk
If the MTU is set to 9000, then use this instead (-d disables fragmentation; 8972 data bytes + 28 bytes of IP/ICMP headers = 9000):
vmkping -I vmkX IP_address_destination_vmk -d -s 8972
>> Also check if multicast traffic is working:
tcpdump-uw -i vmkX -n -s0 -t -c 20 udp port 12345 or udp port 23451
This command will let you know whether multicast traffic is being received from all three ESXi hosts. Use the document below to address the connectivity issues:
Virtual SAN Troubleshooting: Multicast - VMware vSphere Blog
Thanks for the reply.
NY-esxi3 is the nested host.
[root@ny-esxi3:~] esxcli vsan network list
Interface
VmkNic Name: vmk2
IP Protocol: IP
Interface UUID: a8a70258-786d-87a7-fe61-005056bb724a
Agent Group Multicast Address: 224.2.3.4
Agent Group IPv6 Multicast Address: ff19::2:3:4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group IPv6 Multicast Address: ff19::1:2:3
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Multicast TTL: 5
For the ping, I am doing this from esxi3 (nested), trying to ping vmk2, which is the vSAN VMkernel port at 192.168.1.101 (ny-esxi2). I hope that is what you mean.
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101
PING 192.168.1.101 (192.168.1.101): 56 data bytes
--- 192.168.1.101 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.100
PING 192.168.1.100 (192.168.1.100): 56 data bytes
--- 192.168.1.100 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
[root@ny-esxi3:~]
[root@ny-esxi3:~] vmkping -I vmk2 192.168.1.101
PING 192.168.1.101 (192.168.1.101): 56 data bytes
--- 192.168.1.101 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Not sure if this is what you meant.
This is a screenshot of NY-ESXi2 network
NY-ESXi3 network:
I can't get to the other 2 hosts until I get home.
If I just do a plain vmkping, I can ping 192.168.1.45 and 192.168.1.46 from the nested ESXi host:
[root@ny-esxi3:~] vmkping 192.168.1.46
PING 192.168.1.46 (192.168.1.46): 56 data bytes
64 bytes from 192.168.1.46: icmp_seq=0 ttl=64 time=0.299 ms
64 bytes from 192.168.1.46: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 192.168.1.46: icmp_seq=2 ttl=64 time=0.216 ms
--- 192.168.1.46 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.216/0.247/0.299 ms
[root@ny-esxi3:~] vmkping 192.168.1.45
PING 192.168.1.45 (192.168.1.45): 56 data bytes
64 bytes from 192.168.1.45: icmp_seq=0 ttl=64 time=0.347 ms
64 bytes from 192.168.1.45: icmp_seq=1 ttl=64 time=0.340 ms
64 bytes from 192.168.1.45: icmp_seq=2 ttl=64 time=0.350 ms
[root@ny-esxi3:~] tcpdump-uw -i vmk2 -n -s0 -t -c 20 udp port 23451
tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmk2, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
IP 192.168.1.102.52597 > 224.2.3.4.23451: UDP, length 240
So, I am a little concerned that all of your virtual networks are sitting in the 192.168.1.x...
Can you move your vSAN connections to another subnet and segregate them? If you can't do that, I would recommend removing the extra vmk ports and just enabling vSAN on your management interface.
Your primary issue is networking, hands down. Isolate your vSAN traffic and try again. If you need to leave it connected as is, please go into the failover order for your vSAN vmk and make one NIC active and the other NICs standby.
The capacity flipping like that is a sign that your vSAN vmk ports are losing their connections to each other.
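If it helps, the failover order can also be checked and set from the host CLI instead of the client (a sketch; vSwitch0, vmnic0, and vmnic1 are placeholders for your actual vSwitch and uplink names):

```shell
# Show the current failover policy of the vSwitch carrying vSAN traffic
esxcli network vswitch standard policy failover get -v vSwitch0

# Make one NIC active and the other standby (names are examples)
esxcli network vswitch standard policy failover set -v vSwitch0 \
    --active-uplinks=vmnic0 --standby-uplinks=vmnic1
```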
I can't move them to another subnet. I have removed the VMK port on ESXi3 and enabled it on the management interface. For the management interface on ESXi3 I have one NIC active the other in standby. I am still getting the error.
I have removed the VSAN vmk from all hosts and have VSAN on the management port.
Thanks for the help. This has been driving me nuts.
Yes, you should do this on all of your hosts. vmk ports do not like to be moved around between different physical NICs.
OK, done on all hosts, and I still have the error. Again, ESXi1 and ESXi2 show the correct capacity no matter how many times I refresh. It is only ESXi3, which is nested.
ESXi1 - HP ML310
ESXi2- Dell PowerEdge R710
ESXi3 - Nested VM on Dell PowerEdge
Just noticed this message: "Virtual SAN cluster NY has one or more hosts that need disk format upgrade: ny-esxi2, ny-esxi3. For more detailed information on the Virtual SAN upgrade, please see the 'Virtual SAN upgrade procedure' section in the documentation."
Can you include screenshots of all of your virtual switch and vmk settings? Also the VM port group on Host 2 that is hosting Host 3?
I just want to see how the networking is configured, top to bottom.
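In case it is useful, the on-disk format version of each claimed disk can be checked from the host CLI (a sketch; the exact output fields vary by build):

```shell
# Lists each disk claimed by vSAN; the "Version" field is the on-disk format
esxcli vsan storage list | grep -E "Device|Version"
```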
If you need anything else just let me know.
ESXi1:
ESXi2:
ESXi3:
On first look, I noticed a difference in the speed of the uplinks (vmnics). The speed shows 10G (10000) on ESXi3, and it shows 1G (1000) on ESXi1 and ESXi2.
Also, check the MTU size on the vSwitch and vmk ports. The MTU size should be the same across the environment.
Cheers!
-Shivam
It won't let me change that. I noticed that too, but when I try to change it to 1000 I get an error.
The reason Host 3 shows as 10Gb is that you are using a vmxnet3 network adapter on the nested ESXi host. So what I want you to show me is the actual configuration of the vSwitch the nested ESXi host resides on, on Host 2, like here:
General
Name: DMZ
Port binding: Static binding
Port allocation: Elastic
Number of ports: 8
Network resource pool: (default)
Advanced
Configure reset at disconnect: Enabled
Override port policies
Block ports: Allowed
Traffic shaping: Disabled
Vendor configuration: Disabled
VLAN: Disabled
Uplink teaming: Disabled
Security policy: Disabled
NetFlow: Disabled
Traffic filtering and marking: Disabled
Security
Promiscuous mode: Reject
MAC address changes: Reject
Forged transmits: Reject
Ingress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
Egress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
VLAN
Type: VLAN
VLAN ID: xx
Teaming and failover
Load balancing: Route based on originating virtual port
Network failure detection: Link status only
Notify switches: Yes
Failback: Yes
Active uplinks: Uplink 1
Standby uplinks: Uplink 2
Unused uplinks: Uplink 3, Uplink 4, lag1
Monitoring
NetFlow: Disabled
Traffic filtering and marking
Status: Disabled
Miscellaneous
Block all ports: No
Do the same with your vmk ports:
Port properties
Network label VSAN-DPG
TCP/IP stack Default
Enabled services Virtual SAN traffic
IPv4 settings
IPv4 address xxxxx (static)
Subnet mask xxxxx
Default gateway for IPv4 xxxxx
DNS server addresses xxxxx
xxxxxxx
NIC settings
MAC address xxxxxxxxxx
MTU 9000
Security
Promiscuous mode: Reject
MAC address changes: Reject
Forged transmits: Reject
Ingress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
Egress traffic shaping
Status: Disabled
Average bandwidth: --
Peak bandwidth: --
Burst size: --
VLAN
Type: None
Teaming and failover
Load balancing: Route based on physical NIC load
Network failure detection: Link status only
Notify switches: Yes
Failback: Yes
Active uplinks: Uplink 1
Standby uplinks: Uplink 2
Unused uplinks:
Monitoring
NetFlow: Disabled
Traffic filtering and marking
Status: Disabled
Miscellaneous
Block all ports: No
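One more thing worth checking for the nested setup: the port group on Host 2 that carries the nested host's traffic generally needs Promiscuous mode and Forged transmits set to Accept, or the nested host's packets get dropped. On a standard vSwitch this can also be done from the CLI (a sketch; vSwitch0 is a placeholder for whichever vSwitch backs that port group):

```shell
# Show the current security policy of the outer vSwitch
esxcli network vswitch standard policy security get -v vSwitch0

# Allow promiscuous mode and forged transmits for nested ESXi traffic
esxcli network vswitch standard policy security set -v vSwitch0 \
    --allow-promiscuous=true --allow-forged-transmits=true
```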
Doug Arcidino is right. The reason you are getting this error is that your ESXi3 VM is using vmxnet3. You need to change the network adapter type on the ESXi3 VM.
Here are the steps you should follow:
1) Go to ESXi2 and power-off ESXi3 VM.
2) Go to Edit Settings on ESXi3 VM.
3) Remove vmnic0 and vmnic1 (Network adapter 1 and Network adapter 2, respectively).
4) Add two new Ethernet adapters with the type set to e1000e or e1000 (e1000e preferred).
5) Power on ESXi3.
6) Add the newly created vmnics (they will show as vmnic0 and vmnic1) to vSwitch0 on ESXi3 (vSwitch0 Properties > Network Adapters > Add).
Hope this helps resolve the issue. Let me know how it goes.
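After the swap, the adapter type can be confirmed from the nested host itself (a sketch; the driver column should no longer show vmxnet3):

```shell
# List physical NICs as seen by the nested host; check the Driver column
esxcli network nic list
```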
Cheers!
-Shivam
OK, sorry... I had to redo ESXi3 and now it looks OK (used E1000E NICs). I have one error stating: Virtual SAN cluster has one or more hosts that need disk format upgrade: esxi2, esxi3.
I don't see how to do that.
This is not working. I tried to move a VM to it and got an error.
Trying to add a folder via the web client, I get:
The last operation failed for the entity with the following error message.
Cannot complete file creation operation.
There are currently 2 usable disks for the operation. This operation requires 1 more usable disks.
Remaining 2 disks not usable because:
0 - Insufficient space for data/cache reservation.
0 - Maintenance mode or unhealthy disks.
0 - Disk-version or storage-type mismatch.
0 - Max component count reached.
2 - In unusable fault-domains due to policy constraints.
0 - In witness node.
Failed to create object.
Drives from esxi2 (one 2TB, one 5.8TB, and one 2TB SSD) and esxi3 (two 40GB HDDs and one 40GB SSD) are participating. Do I need to add drives from esxi1 for this to work?