brugh2,
It certainly looks like a network partition has occurred. We have one node in our cluster instead of the three you indicated should be there. If this is the case on all three nodes, we won't be able to form a quorum and get production online.
Do you have a VMware Support Request filed? If you do, will you please PM me the SR number?
Also, we can try a few things.
1) On each host, make sure the network tagging is intact and we are associated with a vmknic:
# esxcli vsan network list*
2) If the network is still tagged properly (it should be), try to ping each VSAN node from each VSAN node (e.g., ping your partner machines).
3) If the ping works, determine if we are using jumbo frames. If we are, ensure that jumbo frames are configured end to end (vmknic, vswitch, physical NIC, physical switch).**
--> If jumbo frames are in use, send a large frame ping without permitting fragmentation:
# vmkping -s 8500 -d <destination address>
4) If the jumbo frames (if applicable) do NOT work, fix the MTU in the physical switch or drop your vmknics back down to 1500 MTU.
5) If everything at the transport level checks out, we very likely have a multicast problem. Validate your IGMP groups/snooping/queriers/etc. on the physical switch to ensure that multicast is being handled properly.
Please let me know how things go!
* The output should look something like this (from my infrastructure):
Interface
VmkNic Name: vmk1
IP Protocol: IPv4
Interface UUID: 9ebf0854-3a78-734f-b15e-90b11c2b6604
Agent Group Multicast Address: 224.2.3.4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group Multicast Port: 12345
Multicast TTL: 5
** You can use the following commands to check the jumbo frame configurations (I don't use them, so my MTUs are all 1500):
~ # esxcfg-vmknic -l |grep vmk1 <== I am examining vmk1 because that is the interface we got from esxcli.
vmk1 VSAN IPv4 172.200.200.207 255.255.255.0 172.200.200.255 00:50:56:68:00:fb 1500 65535 true STATIC
~ # esxcfg-vswitch -l
[ ... ]
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         2352        6           128               1500    vmnic2,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VSAN                  0        1           vmnic2,vmnic3
^^ the vmknic is called "VSAN," as is the port group name. If you are using a distributed vSwitch, you will need to look for the port number instead of a port group name (the number will still be in the esxcfg-vmknic -l output).
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
[ ... ]
vmnic2 0000:41:00.00 bnx2x Up 10000Mbps Full 00:10:18:f1:b8:40 1500 Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
vmnic3 0000:41:00.01 bnx2x Up 10000Mbps Full 00:10:18:f1:b8:42 1500 Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet
^^ these are the two physical NICs used as uplinks by the vmknic port group.
All of the MTUs above are 1500 bytes. If you are using jumbo frames, they should all be 9000 bytes.
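As a side note, the payload sizes in the vmkping step follow from simple header arithmetic. A quick sketch of the rule of thumb, assuming IPv4 with no options (the VLAN tag, if any, lives outside the MTU):

```python
# Largest ICMP echo payload that fits in one frame without fragmentation:
# the MTU must hold a 20-byte IPv4 header plus an 8-byte ICMP header.
def max_icmp_payload(mtu: int) -> int:
    IPV4_HEADER = 20
    ICMP_HEADER = 8
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_icmp_payload(1500))  # 1472 -- the largest -s value at a standard MTU
print(max_icmp_payload(9000))  # 8972 -- why -s 8500 comfortably fits inside a jumbo frame
```

So at MTU 1500, `vmkping -s 1472 -d` is the largest non-fragmenting ping, and at MTU 9000 anything up to 8972 bytes of payload should pass.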
ok, i just realized a little more information on my setup might help.
i have 3 servers, each with 5 disks and 2 ssds. i can vmkping the other vsan kernel ports on all servers.
it seems that the vsan configuration still exists:
~ # vdq -i -H
Mappings:
DiskMapping[0]:
SSD: eui.a31ab98418a7460700247110375fb631
MD: naa.600605b008bcb1201b3529cdfc5b309e
MD: naa.600605b008bcb1201b352b3411b960ee
DiskMapping[2]:
SSD: naa.600605b008bcb1201b352b3411ba71fe
MD: naa.600605b008bcb1201b352b3411b87db4
MD: naa.600605b008bcb1201b352b3411b8a008
MD: naa.600605b008bcb1201b352b3411b9a715
but it does look like the vsan membership isn't what it should be:
~ # esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2014-11-13T21:14:37Z
Local Node UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 54985fab-d608-465f-b5eb-bf9896677959
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
Sub-Cluster Membership UUID: dc136554-cf4b-4d8b-29e6-1005ca9e2fa8
where it shows only one member in Sub-Cluster Member UUIDs. i think there should be 3.
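one way to sanity-check this on all three hosts is to count the UUIDs on that line programmatically. A rough sketch, assuming the `esxcli vsan cluster get` output format shown above (comma-separated UUIDs after "Sub-Cluster Member UUIDs:"):

```python
# Count the members listed on the "Sub-Cluster Member UUIDs:" line of
# `esxcli vsan cluster get` output. A healthy 3-node cluster should show 3.
def count_members(cluster_get_output: str) -> int:
    for line in cluster_get_output.splitlines():
        line = line.strip()
        if line.startswith("Sub-Cluster Member UUIDs:"):
            uuids = line.split(":", 1)[1].strip()
            return len([u for u in uuids.split(",") if u.strip()])
    return 0  # line not found

# The partitioned host's output above yields just one member:
sample = """\
Sub-Cluster UUID: 54985fab-d608-465f-b5eb-bf9896677959
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
"""
print(count_members(sample))  # 1
```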
still no clue how to proceed. i still have a blank VSAN...
Hi jetaylor
i'm not using jumbo frames. i tried vmkping on all hosts with the max frame size of 1472 and they all ping each other perfectly.
as for other network settings, the switch settings are the same as before and vsan worked fine so i'm assuming that switch config is still good.
i did a reboot of all hosts 15 minutes apart in the meantime. on 2 of the 3 hosts i got part of the vsan cluster back. 2 now say that the cluster consists of 2 hosts and 1 still thinks it's alone.
that also means some of the VMs are back, just the objects that i think were owned by the one host are empty. weird thing is, if i do a 'du -shm' of the empty directories it says there's data in it but 'osfs-ls' comes up empty.
i'm hesitant to remove the 1 host from the cluster from the commandline and put it back in.
If they are pinging normally, we are probably in a weird state with regard to multicast. Did you explicitly configure multicast/IGMP on your physical switch, or did it just "find a way?"
If we are back up (even if degraded), that is a great thing, but we definitely want to get host 1 back into the mix.
I am disinclined to do an esxcli vsan cluster leave / esxcli vsan cluster join right now, too. That wouldn't necessarily resolve the problem anyway, since it appears to be a communication issue.
All VSAN data movement occurs as unicast traffic, but the clustering communication and quorum maintenance are multicast. If multicast isn't working, we end up partitioned even if unicast is working (which it clearly is).
If no physical-switch configuration was done and it all "just worked" when it was spun up, then the switch is probably either handling multicast automatically or converting the traffic to broadcast.
If we didn't do any switch config, we can try powering down (not rebooting) the node for 15 minutes or so, so that all information about the host decays out of the switch (MAC tables clear up, etc.). When we power it back on, it will hopefully repopulate everything as expected and come back up.
I am loath to reboot the physical switch, etc., since we do have our VMs back online and we don't want to risk taking everything down again.
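For reference, on many Cisco IOS switches the relevant knobs are IGMP snooping and an IGMP snooping querier; without a querier on a flat VLAN that has no multicast router, group memberships can age out of the snooping table. This is a generic illustration, not your switch's actual config, and the syntax varies by platform and version:

```
! IGMP snooping is typically enabled globally by default on IOS switches
ip igmp snooping
! Enable a snooping querier so hosts keep refreshing their group memberships
! on a VLAN with no multicast router present
ip igmp snooping querier
```

Your networking person will know the equivalent for whatever hardware is in play; the point is simply that both snooping and a querier (or an equivalent) need to be present and healthy.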
here's the output from the configuration of the 1 server that thinks it's alone:
~ # esxcli vsan network list
Interface
VmkNic Name: vmk4
IP Protocol: IPv4
Interface UUID: 02b9e053-e4e8-0216-9412-1005ca9e2fa8
Agent Group Multicast Address: 224.2.3.4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group Multicast Port: 12345
Multicast TTL: 5
~ # esxcfg-vmknic -l |grep vmk4
vmk4 12 IPv4 192.168.1.11 255.255.255.0 192.168.1.255 00:50:56:7d:a1:db 1500 65535 true STATIC
vmk4 12 IPv6 fe80::250:56ff:fe7d:a1db 64 00:50:56:7d:a1:db 1500 65535 true STATIC, PREFERRED
DVS Name Num Ports Used Ports Configured Ports MTU Uplinks
DSwitch 5632 8 512 1500 vmnic5,vmnic4
DVPort ID In Use Client
576 1 vmnic4
577 1 vmnic5
16 1 vmk0
0 1 vmk1
12 1 vmk4
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.00 igb Up 1000Mbps Full 10:05:ca:9e:2f:a8 1500 Intel Corporation I350 Gigabit Network Connection
vmnic1 0000:02:00.01 igb Down 0Mbps Half 10:05:ca:9e:2f:a9 1500 Intel Corporation I350 Gigabit Network Connection
vmnic2 0000:02:00.02 igb Down 0Mbps Half 10:05:ca:9e:2f:aa 1500 Intel Corporation I350 Gigabit Network Connection
vmnic3 0000:02:00.03 igb Down 0Mbps Half 10:05:ca:9e:2f:ab 1500 Intel Corporation I350 Gigabit Network Connection
vmnic4 0000:88:00.00 enic Up 10000Mbps Full 10:05:ca:a8:c7:b8 1500 Cisco Systems Inc Cisco VIC Ethernet NIC
vmnic5 0000:89:00.00 enic Up 10000Mbps Full 10:05:ca:a8:c7:b9 1500 Cisco Systems Inc Cisco VIC Ethernet NIC
Are you possibly using LACP or a static etherchannel to handle the traffic?
i'm shutting down the 1 host and will boot it back up in 15 min to see if that helps. the switches are configured to work with vsan btw. i have a networking guy that put the right config in (and saved it). will have him check tomorrow just to be sure.
i wonder though, if the 1 host was just alone i think i would see some files on the vsan datastore, even if it thinks it's isolated. but the vsanDatastore directory is completely empty on that one host. the other 2 have some files but not all of them. some VM directories are missing and some VM directories that are there, are empty. hopefully the vms will come back if the cluster config gets sorted out. would hate to lose some of the VMs on there...
not to my knowledge, the network was kept as flat as possible.
btw, i haven't been able to boot any VMs yet. when i try, i get an error and the log shows:
VSAN: VsanIoctlCtrlNode:1746: aec35854-5321-3c01-1eea-1005ca9e2fa8: RPC to DOM returned: No connection
sounds pretty bad. the 2 nodes are not happy yet either.
The NO_CONNECTION is likely due to us not having (or having lost) quorum on a per-object basis.
Are the two nodes that converged still converged (e.g., do they show two members when you run the esxcli vsan cluster get)?
In addition, can you please run the following command and paste up the output?
# cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT
brugh2,
Please also check your PM.
the 2 nodes show 2 members when i type esxcli vsan cluster get and they both show the same 2 UUIDs, so that looks ok.
as for the cmmds output, there's a lot of it; no idea if that's complete or not. the owner is one of the 2 hosts.
~ # cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT
{
"entries":
[
{
"uuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8",
"owner": "539a94ed-a020-3482-534b-1005ca9d8466",
"health": "Healthy",
"revision": "15",
"type": "DOM_OBJECT",
"flag": "2",
"md5sum": "075b35a62336396933cfd22fc2a18470",
"valueLen": "2616",
"content": {"type": "Configuration", "attributes": {"CSN": 41, "addressSpace": 274877906944, "compositeUuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8"}, "child-1": {"type": "RAID_1", "attributes": {}, "child-1": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-9e6c-ffdb-4ec6-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-b83e-02dc-173f-1005ca9e2fa8", "diskUuid": "52c54a3f-9c2c-8c6d-f46d-610a7998993a"}}, "child-2": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 5, "componentStateTS": 1415373588, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-1807-88d5-a8d1-1005ca9e2fa8", "diskUuid": "52afb34e-d1e6-d142-1c71-7098718817e6"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415906710, "staleLsn": 9162486, "staleCsn": 40, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": "9bda5c54-bb12-8bd5-1a8a-1005ca9e2fa8", "diskUuid": "521b3674-ae89-6355-70c7-cd5ef9d9d014"}}}, "child-2": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415915399, "staleLsn": 0, "staleCsn": 0, "isWitness": 1, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": 
"9bda5c54-2be7-8ed5-edb2-1005ca9e2fa8", "diskUuid": "5242203b-d4c3-e0e0-b840-ae94aa12707c"}, "child-3": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415371419, "isWitness": 1, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-2e29-90d5-8402-1005ca9e2fa8", "diskUuid": "526674f8-c398-1ffc-b0f3-eea569b05ab2"}, "child-4": {"type": "Witness", "attributes": {"componentState": 6, "componentStateTS": 1415910583, "isWitness": 1, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "9bda5c54-af88-91d5-1630-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}},
"errorStr": "(null)"
}
]
}
Okay, so that output indicates that we have missing components on the following:
Host 539a9749-dfe0-e17f-c529-1005ca9e2fa8 --> The host from your initial post.
Disk: 5268d4b5-d67b-4962-f3a8-26542a6c0558
Disk: 52c54a3f-9c2c-8c6d-f46d-610a7998993a
Host: 539a9751-676c-b12c-db63-1005ca9decca
Disk: 521b3674-ae89-6355-70c7-cd5ef9d9d014
(at least for this one object). We don't have a complete mirror and thus this object (among others, presumably) is offline.
It is possible that during the blackout one host went down slightly later and thus has more recent data, so other things are being held offline. We will need to look more closely to find out more.
As we discussed offline, I will try to follow up with you tomorrow.
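For anyone following along, the components above can be pulled out of the cmmds-tool JSON mechanically rather than by eye. A rough sketch: the state numbers are an assumption inferred from the dump (5 appears on the reachable hosts' components, 6 on the partitioned host's), and the UUIDs below are truncated stand-ins, not real values:

```python
import json

# Assumed meanings of componentState, inferred from the dump above:
# 5 on the reachable hosts' components, 6 on the partitioned host's.
ACTIVE, ABSENT = 5, 6

def walk(node, path="root"):
    """Yield (path, type, componentState, faultDomainId) for every Component/Witness."""
    attrs = node.get("attributes", {})
    if node.get("type") in ("Component", "Witness"):
        yield path, node["type"], attrs.get("componentState"), attrs.get("faultDomainId")
    for key, child in node.items():
        if key.startswith("child-"):
            yield from walk(child, f"{path}/{key}")

# Minimal stand-in for the "content" tree of one DOM_OBJECT entry:
sample = json.loads("""
{"type": "Configuration", "attributes": {},
 "child-1": {"type": "RAID_1", "attributes": {},
   "child-1": {"type": "Component",
     "attributes": {"componentState": 6, "faultDomainId": "539a9749-trunc"}},
   "child-2": {"type": "Component",
     "attributes": {"componentState": 5, "faultDomainId": "539a94ed-trunc"}}},
 "child-2": {"type": "Witness",
   "attributes": {"componentState": 5, "isWitness": 1, "faultDomainId": "539a9751-trunc"}}}
""")

absent = [(p, fd) for p, t, s, fd in walk(sample) if s == ABSENT]
print(absent)  # the partitioned host's components show up here
```

Grouping the absent components by faultDomainId is what points at the problem host (and its disks) in the real output.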
'held offline' sounds like 'not lost,' which would be wonderful. http://kb.vmware.com/kb/2059091 and http://kb.vmware.com/kb/1012864 suggest to me that when i add it back to the cluster (probably after removing it from its own cluster remnant first), it will start syncing and eventually all data will be back online. i will do that as a last resort but hopefully get some troubleshooting done first to make sure that such an action doesn't actually destroy any data that may still be there.
i get an access denied error trying to send PMs. but i'll try the commands to verify multicast connectivity. won't be available before 14:30 ET but will keep you posted on updates!
turns out that the switch port where the nic of one of the hosts was connected got dropped from the igmp group. that meant multicast didn't get through and the host wouldn't rejoin the cluster. unfortunately it was the last one to go down and had the newest writes, so only after it joined the cluster (by disabling the faulty uplink) did things start to sync, and after some reboots the whole cluster was back up and running. seems vsan can handle this kind of abuse but sometimes you need to help it a bit
thank you jeffrey for your time. it was very nice to see how you went about fixing it and that it was indeed fixable. not only the cluster but also my confidence in vsan is fully restored now. thanks!