brugh2,
It certainly looks like a network partition has occurred. We have one node in our cluster instead of the three you indicated should be there. If this is the case on all three nodes, we won't be able to form a quorum and get production online.
Do you have a VMware Support Request filed? If you do, will you please PM me the SR number?
Also, we can try a few things.
1) On each host, make sure the network tagging is intact and we are associated with a vmknic:
# esxcli vsan network list*
2) If the network is still tagged properly (it should be), try to ping each VSAN node from each VSAN node (e.g., ping your partner machines).
3) If the ping works, determine if we are using jumbo frames. If we are, ensure that jumbo frames are configured end to end (vmknic, vswitch, physical NIC, physical switch).**
--> If jumbo frames are in use, send a large frame ping without permitting fragmentation:
# vmkping -s 8500 -d <destination address>
4) If the jumbo frames (if applicable) do NOT work, fix the MTU in the physical switch or drop your vmknics back down to 1500 MTU.
5) If everything at the transport level checks out, we very likely have a multicast problem. Validate your IGMP groups/snooping/queriers/etc. on the physical switch to ensure that multicast is being handled properly.
Please let me know how things go!
* The output should look something like this (from my infrastructure):
Interface
VmkNic Name: vmk1
IP Protocol: IPv4
Interface UUID: 9ebf0854-3a78-734f-b15e-90b11c2b6604
Agent Group Multicast Address: 224.2.3.4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group Multicast Port: 12345
Multicast TTL: 5
** You can use the following commands to check the jumbo frame configurations (I don't use them, so my MTUs are all 1500):
~ # esxcfg-vmknic -l |grep vmk1 <== I am examining vmk1 because that is the interface we got from esxcli.
vmk1 VSAN IPv4 172.200.200.207 255.255.255.0 172.200.200.255 00:50:56:68:00:fb 1500 65535 true STATIC
~ # esxcfg-vswitch -l
[ ... ]
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         2352        6           128               1500    vmnic2,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VSAN                  0        1           vmnic2,vmnic3
^^ the vmknic is called "VSAN," as is the port group name. If you are using a distributed vSwitch, you will need to look for the port number instead of a port group name (the number will still be in the esxcfg-vmknic -l output).
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
[ ... ]
vmnic2 0000:41:00.00 bnx2x Up 10000Mbps Full 00:10:18:f1:b8:40 1500 Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
vmnic3 0000:41:00.01 bnx2x Up 10000Mbps Full 00:10:18:f1:b8:42 1500 Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
^^ these are the two physical NICs used as uplinks by the vmknic port group.
All of the MTUs above are 1500 bytes. If you are using jumbo frames, they should all be 9000 bytes.
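As a side note, the payload sizes in the vmkping step follow from simple header arithmetic. A quick sketch of the rule of thumb, assuming IPv4 with no options (the VLAN tag, if any, lives outside the MTU):

```python
# Largest ICMP echo payload that fits in one frame without fragmentation:
# the MTU must hold a 20-byte IPv4 header plus an 8-byte ICMP header.
def max_icmp_payload(mtu: int) -> int:
    IPV4_HEADER = 20
    ICMP_HEADER = 8
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_icmp_payload(1500))  # 1472 -- the largest -s value at a standard MTU
print(max_icmp_payload(9000))  # 8972 -- why -s 8500 comfortably fits inside a jumbo frame
```

So at MTU 1500, `vmkping -s 1472 -d` is the largest non-fragmenting ping, and at MTU 9000 anything up to 8972 bytes of payload should pass.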
ok, i just realized a little more information on my setup might help.
i have 3 servers, each with 5 disks and 2 ssds. i can vmkping the other vsan kernel ports on all servers.
it seems that the vsan configuration still exists:
~ # vdq -i -H
Mappings:
DiskMapping[0]:
SSD: eui.a31ab98418a7460700247110375fb631
MD: naa.600605b008bcb1201b3529cdfc5b309e
MD: naa.600605b008bcb1201b352b3411b960ee
DiskMapping[2]:
SSD: naa.600605b008bcb1201b352b3411ba71fe
MD: naa.600605b008bcb1201b352b3411b87db4
MD: naa.600605b008bcb1201b352b3411b8a008
MD: naa.600605b008bcb1201b352b3411b9a715
but it does look like the vsan membership isn't what it should be:
~ # esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2014-11-13T21:14:37Z
Local Node UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 54985fab-d608-465f-b5eb-bf9896677959
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Sub-Cluster Membership UUID: dc136554-cf4b-4d8b-29e6-1005ca9e2fa8
where it shows only one member in Sub-Cluster Member UUIDs. i think there should be 3.
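one way to sanity-check this on all three hosts is to count the UUIDs on that line programmatically. A rough sketch, assuming the `esxcli vsan cluster get` output format shown above (comma-separated UUIDs after "Sub-Cluster Member UUIDs:"):

```python
# Count the members listed on the "Sub-Cluster Member UUIDs:" line of
# `esxcli vsan cluster get` output. A healthy 3-node cluster should show 3.
def count_members(cluster_get_output: str) -> int:
    for line in cluster_get_output.splitlines():
        line = line.strip()
        if line.startswith("Sub-Cluster Member UUIDs:"):
            uuids = line.split(":", 1)[1].strip()
            return len([u for u in uuids.split(",") if u.strip()])
    return 0  # line not found

# The partitioned host's output above yields just one member:
sample = """\
Sub-Cluster UUID: 54985fab-d608-465f-b5eb-bf9896677959
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
"""
print(count_members(sample))  # 1
```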
still no clue how to proceed. i still have a blank VSAN...
Hi jetaylor
i'm not using jumbo frames. i tried vmkping on all hosts with the max frame size of 1472 and they all ping each other perfectly.
as for other network settings, the switch settings are the same as before and vsan worked fine so i'm assuming that switch config is still good.
i did a reboot of all hosts 15 minutes apart in the meantime. on 2 of the 3 hosts i got part of the vsan cluster back. 2 now say that the cluster consists of 2 hosts and 1 still thinks it's alone.
that also means some of the VMs are back, just the objects that i think were owned by the one host are empty. weird thing is, if i do a 'du -shm' of the empty directories it says there's data in it but 'osfs-ls' comes up empty.
i'm hesitant to remove the 1 host from the cluster from the commandline and put it back in.
If they are pinging normally, we are probably in a weird state with regard to multicast. Did you explicitly configure multicast/IGMP on your physical switch, or did it just "find a way?"
If we are back up (even if degraded), that is a great thing, but we definitely want to get host 1 back into the mix.
I am disinclined to do an esxcli vsan cluster leave / esxcli vsan cluster join right now, too. That wouldn't necessarily resolve the problem anyway, since it appears to be a communication issue.
All VSAN data movement occurs as unicast traffic, but the clustering communication and quorum maintenance are multicast. If multicast isn't working, we end up partitioned even if unicast is working (which it clearly is).
If no physical-switch configuration was done and it all "just worked" when it was spun up, then the switch is probably either handling multicast automatically or converting the traffic to broadcast.
If we didn't do any switch config, we can try powering down (not rebooting) the node for 15 minutes or so, so that all information about the host decays out of the switch (MAC tables clear up, etc.). When we power it back on, it will hopefully repopulate everything as expected and come back up.
I am loath to reboot the physical switch, etc., since we do have our VMs back online and we don't want to risk taking everything down again.
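For reference, on many Cisco IOS switches the relevant knobs are IGMP snooping and an IGMP snooping querier; without a querier on a flat VLAN that has no multicast router, group memberships can age out of the snooping table. This is a generic illustration, not your switch's actual config, and the syntax varies by platform and version:

```
! IGMP snooping is typically enabled globally by default on IOS switches
ip igmp snooping
! Enable a snooping querier so hosts keep refreshing their group memberships
! on a VLAN with no multicast router present
ip igmp snooping querier
```

Your networking person will know the equivalent for whatever hardware is in play; the point is simply that both snooping and a querier (or an equivalent) need to be present and healthy.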
here's the output from the configuration of the 1 server that thinks it's alone:
~ # esxcli vsan network list
Interface
VmkNic Name: vmk4
IP Protocol: IPv4
Interface UUID: 02b9e053-e4e8-0216-9412-1005ca9e2fa8
Agent Group Multicast Address: 224.2.3.4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group Multicast Port: 12345
Multicast TTL: 5
~ # esxcfg-vmknic -l |grep vmk4
vmk4 12 IPv4 192.168.1.11 255.255.255.0 192.168.1.255 00:50:56:7d:a1:db 1500 65535 true STATIC
vmk4 12 IPv6 fe80::250:56ff:fe7d:a1db 64 00:50:56:7d:a1:db 1500 65535 true STATIC, PREFERRED
DVS Name Num Ports Used Ports Configured Ports MTU Uplinks
DSwitch 5632 8 512 1500 vmnic5,vmnic4
DVPort ID In Use Client
576 1 vmnic4
577 1 vmnic5
16 1 vmk0
0 1 vmk1
12 1 vmk4
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.00 igb Up 1000Mbps Full 10:05:ca:9e:2f:a8 1500 Intel Corporation I350 Gigabit Network Connection
vmnic1 0000:02:00.01 igb Down 0Mbps Half 10:05:ca:9e:2f:a9 1500 Intel Corporation I350 Gigabit Network Connection
vmnic2 0000:02:00.02 igb Down 0Mbps Half 10:05:ca:9e:2f:aa 1500 Intel Corporation I350 Gigabit Network Connection
vmnic3 0000:02:00.03 igb Down 0Mbps Half 10:05:ca:9e:2f:ab 1500 Intel Corporation I350 Gigabit Network Connection
vmnic4 0000:88:00.00 enic Up 10000Mbps Full 10:05:ca:a8:c7:b8 1500 Cisco Systems Inc Cisco VIC Ethernet NIC
vmnic5 0000:89:00.00 enic Up 10000Mbps Full 10:05:ca:a8:c7:b9 1500 Cisco Systems Inc Cisco VIC Ethernet NIC
Are you possibly using LACP or a static etherchannel to handle the traffic?
i'm shutting down the 1 host and will boot it back up in 15 min to see if that helps. the switches are configured to work with vsan btw. i have a networking guy that put the right config in (and saved it). will have him check tomorrow just to be sure.
i wonder though, if the 1 host was just alone i think i would see some files on the vsan datastore, even if it thinks it's isolated. but the vsanDatastore directory is completely empty on that one host. the other 2 have some files but not all of them. some VM directories are missing and some VM directories that are there, are empty. hopefully the vms will come back if the cluster config gets sorted out. would hate to lose some of the VMs on there...
not to my knowledge, the network was kept as flat as possible.
btw, i haven't been able to boot any VMs yet. when i try, i get an error and the log shows:
VSAN: VsanIoctlCtrlNode:1746: aec35854-5321-3c01-1eea-1005ca9e2fa8: RPC to DOM returned: No connection
sounds pretty bad. the 2 nodes are not happy yet either.
The NO_CONNECTION is likely due to us not having (or having lost) quorum on a per-object basis.
Are the two nodes that converged still converged (e.g., do they show two members when you run the esxcli vsan cluster get)?
In addition, can you please run the following command and paste up the output?
# cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT
brugh2,
Please also check your PM.
the 2 nodes show 2 members when i type esxcli vsan cluster get and they both show the same 2 UUIDs, so that looks ok.
as for the cmmds output, there's a lot of it; no idea if that's complete or not. the owner is one of the 2 hosts.
~ # cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT
{
"entries":
[
{
"uuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8",
"owner": "539a94ed-a020-3482-534b-1005ca9d8466",
"health": "Healthy",
"revision": "15",
"type": "DOM_OBJECT",
"flag": "2",
"md5sum": "075b35a62336396933cfd22fc2a18470",
"valueLen": "2616",
"content": {"type": "Configuration", "attributes": {"CSN": 41, "addressSpace": 274877906944, "compositeUuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8"}, "child-1": {"type": "RAID_1", "attributes": {}, "child-1": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-9e6c-ffdb-4ec6-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-b83e-02dc-173f-1005ca9e2fa8", "diskUuid": "52c54a3f-9c2c-8c6d-f46d-610a7998993a"}}, "child-2": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 5, "componentStateTS": 1415373588, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-1807-88d5-a8d1-1005ca9e2fa8", "diskUuid": "52afb34e-d1e6-d142-1c71-7098718817e6"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415906710, "staleLsn": 9162486, "staleCsn": 40, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": "9bda5c54-bb12-8bd5-1a8a-1005ca9e2fa8", "diskUuid": "521b3674-ae89-6355-70c7-cd5ef9d9d014"}}}, "child-2": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415915399, "staleLsn": 0, "staleCsn": 0, "isWitness": 1, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": 
"9bda5c54-2be7-8ed5-edb2-1005ca9e2fa8", "diskUuid": "5242203b-d4c3-e0e0-b840-ae94aa12707c"}, "child-3": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415371419, "isWitness": 1, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-2e29-90d5-8402-1005ca9e2fa8", "diskUuid": "526674f8-c398-1ffc-b0f3-eea569b05ab2"}, "child-4": {"type": "Witness", "attributes": {"componentState": 6, "componentStateTS": 1415910583, "isWitness": 1, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "9bda5c54-af88-91d5-1630-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}},
"errorStr": "(null)"
}
]
}
Okay, so that output indicates that we have missing components on the following:
Host 539a9749-dfe0-e17f-c529-1005ca9e2fa8 --> The host from your initial post.
Disk: 5268d4b5-d67b-4962-f3a8-26542a6c0558
Disk: 52c54a3f-9c2c-8c6d-f46d-610a7998993a
Host: 539a9751-676c-b12c-db63-1005ca9decca
Disk: 521b3674-ae89-6355-70c7-cd5ef9d9d014
(at least for this one object). We don't have a complete mirror and thus this object (among others, presumably) is offline.
It is possible that during the blackout one host went down slightly later and thus has more recent data, so other things are being held offline. We will need to look more closely to find out more.
As we discussed offline, I will try to follow up with you tomorrow.
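For anyone following along, the components above can be pulled out of the cmmds-tool JSON mechanically rather than by eye. A rough sketch: the state numbers are an assumption inferred from the dump (5 appears on the reachable hosts' components, 6 on the partitioned host's), and the UUIDs below are truncated stand-ins, not real values:

```python
import json

# Assumed meanings of componentState, inferred from the dump above:
# 5 on the reachable hosts' components, 6 on the partitioned host's.
ACTIVE, ABSENT = 5, 6

def walk(node, path="root"):
    """Yield (path, type, componentState, faultDomainId) for every Component/Witness."""
    attrs = node.get("attributes", {})
    if node.get("type") in ("Component", "Witness"):
        yield path, node["type"], attrs.get("componentState"), attrs.get("faultDomainId")
    for key, child in node.items():
        if key.startswith("child-"):
            yield from walk(child, f"{path}/{key}")

# Minimal stand-in for the "content" tree of one DOM_OBJECT entry:
sample = json.loads("""
{"type": "Configuration", "attributes": {},
 "child-1": {"type": "RAID_1", "attributes": {},
   "child-1": {"type": "Component",
     "attributes": {"componentState": 6, "faultDomainId": "539a9749-trunc"}},
   "child-2": {"type": "Component",
     "attributes": {"componentState": 5, "faultDomainId": "539a94ed-trunc"}}},
 "child-2": {"type": "Witness",
   "attributes": {"componentState": 5, "isWitness": 1, "faultDomainId": "539a9751-trunc"}}}
""")

absent = [(p, fd) for p, t, s, fd in walk(sample) if s == ABSENT]
print(absent)  # the partitioned host's components show up here
```

Grouping the absent components by faultDomainId is what points at the problem host (and its disks) in the real output.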
'held offline' sounds like 'not lost,' which would be wonderful. http://kb.vmware.com/kb/2059091 and http://kb.vmware.com/kb/1012864 suggest to me that when i add it back to the cluster (probably after removing it from its own cluster remnant first), it will start syncing and eventually all data will be back online. i will do that as a last resort but hopefully get some troubleshooting done first to make sure that such an action doesn't actually destroy any data that may still be there.
i get an access denied error trying to send PMs. but i'll try the commands to verify multicast connectivity. won't be available before 14:30 ET but will keep you posted on updates!
turns out that the switch port where the nic of one of the hosts was connected got dropped from the igmp group. that meant multicast didn't get through and the host wouldn't rejoin the cluster. unfortunately it was the last one to go down and had the newest writes, so only after it joined the cluster (by disabling the faulty uplink) did things start to sync, and after some reboots the whole cluster was back up and running. seems vsan can handle this kind of abuse but sometimes you need to help it a bit
thank you jeffrey for your time. it was very nice to see how you went about fixing it and that it was indeed fixable. not only the cluster but also my confidence in vsan is fully restored now. thanks!