VMware Cloud Community
brugh2
Contributor

HELP! VSAN is empty after black out!

we had an outage and all ESX servers rebooted. after they came back the /vmfs/volumes/vsanDatastore was empty. all our VMs were on VSAN, including vcenter.

how can i recover my VMs?!? My entire infrastructure is down.

brugh2
Contributor

ok, i just realized a little more information on my setup might help.

i have 3 servers, each with 5 disks and 2 ssd's. i can vmkping the other vsan kernelports on all servers.

it seems that the vsan configuration still exists:

~ # vdq -i -H

Mappings:

   DiskMapping[0]:

           SSD:  eui.a31ab98418a7460700247110375fb631

            MD:  naa.600605b008bcb1201b3529cdfc5b309e

            MD:  naa.600605b008bcb1201b352b3411b960ee

   DiskMapping[2]:

           SSD:  naa.600605b008bcb1201b352b3411ba71fe

            MD:  naa.600605b008bcb1201b352b3411b87db4

            MD:  naa.600605b008bcb1201b352b3411b8a008

            MD:  naa.600605b008bcb1201b352b3411b9a715

but it does look like the vsan membership isn't what it should be;

~ # esxcli vsan  cluster  get

Cluster Information

   Enabled: true

   Current Local Time: 2014-11-13T21:14:37Z

   Local Node UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 54985fab-d608-465f-b5eb-bf9896677959

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8

   Sub-Cluster Membership UUID: dc136554-cf4b-4d8b-29e6-1005ca9e2fa8

where it shows only one member in Sub-Cluster Member UUIDs. i think there should be 3.
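
For anyone who wants to script this check across hosts, here is a minimal Python sketch that counts the entries in Sub-Cluster Member UUIDs. It only parses text that has already been captured from `esxcli vsan cluster get`; the sample is trimmed from the output above, and the assumption that multiple members appear separated by commas or spaces on that one line is mine, not something shown in this thread.

```python
def parse_cluster_get(text):
    """Turn the 'Key: value' lines of `esxcli vsan cluster get` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so timestamps keep their colons.
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def member_uuids(info):
    """Return the member UUIDs as a list.

    Assumption: several members are separated by commas and/or spaces.
    """
    raw = info.get("Sub-Cluster Member UUIDs", "")
    return [uuid for uuid in raw.replace(",", " ").split() if uuid]


# Trimmed copy of the output pasted above.
SAMPLE = """\
Cluster Information
   Enabled: true
   Current Local Time: 2014-11-13T21:14:37Z
   Local Node UUID: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
   Local Node State: MASTER
   Sub-Cluster Backup UUID:
   Sub-Cluster Member UUIDs: 539a9749-dfe0-e17f-c529-1005ca9e2fa8
"""

EXPECTED_HOSTS = 3  # the cluster in this thread has three hosts

info = parse_cluster_get(SAMPLE)
members = member_uuids(info)
if len(members) < EXPECTED_HOSTS:
    print(f"only {len(members)} of {EXPECTED_HOSTS} hosts in this sub-cluster")
```

Running the same check against the output of every host shows quickly which hosts agree on the membership and which one is on its own.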

still no clue how to proceed. i still have a blank VSAN...

jetaylor
VMware Employee

brugh2,

It certainly looks like a network partition has occurred. We have one node in our cluster instead of the three you indicated should be there. If this is the case on all three nodes, we won't be able to form a quorum and get production online.

Do you have a VMware Support Request filed? If you do, will you please PM me the SR number?

Also, we can try a couple of things.

1) On each host, make sure the network tagging is intact and we are associated with a vmknic:
# esxcli vsan network list*

2) If the network is still tagged properly (it should be), try to ping each VSAN node from each VSAN node (e.g., ping your partner machines).

3) If the ping works, determine if we are using jumbo frames. If we are, ensure that jumbo frames are configured end to end (vmknic, vswitch, physical NIC, physical switch).**

--> If jumbo frames are in use, send a large frame ping without permitting fragmentation:

     # vmkping -s 8500 -d <destination address>

4) If the jumbo frames (if applicable) do NOT work, fix the MTU in the physical switch or drop your vmknics back down to 1500 MTU.

5) If everything at the transport level checks out, we very likely have a multicast problem. Validate your IGMP groups/snooping/queriers/etc. on the physical switch to ensure that multicast is being handled properly.

Please let me know how things go!

* The output should look something like this (from my infrastructure):

Interface

   VmkNic Name: vmk1

   IP Protocol: IPv4

   Interface UUID: 9ebf0854-3a78-734f-b15e-90b11c2b6604

   Agent Group Multicast Address: 224.2.3.4

   Agent Group Multicast Port: 23451

   Master Group Multicast Address: 224.1.2.3

   Master Group Multicast Port: 12345

   Multicast TTL: 5

** You can use the following commands to check the jumbo frame configurations (I don't use them, so my MTUs are all 1500):

~ # esxcfg-vmknic -l |grep vmk1 <== I am examining vmk1 because that is the interface we got from esxcli.

vmk1       VSAN                IPv4      172.200.200.207                         255.255.255.0   172.200.200.255 00:50:56:68:00:fb 1500    65535     true    STATIC

~ # esxcfg-vswitch -l

[ ... ]

Switch Name  Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1     2352        6           128               1500    vmnic2,vmnic3

  PortGroup Name    VLAN ID  Used Ports  Uplinks
  VSAN              0        1           vmnic2,vmnic3

^^ the vmknic is called "VSAN," as is the port-group name. If you use a distributed vSwitch, you will need to look for the port number instead of a portgroup name (the number will still be in the esxcfg-vmknic -l output).

~ # esxcfg-nics -l

Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description

[ ... ]

vmnic2  0000:41:00.00 bnx2x       Up   10000Mbps Full   00:10:18:f1:b8:40 1500   Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet

vmnic3  0000:41:00.01 bnx2x       Up   10000Mbps Full   00:10:18:f1:b8:42 1500   Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet

^^ these are the two physical NICs used as uplinks by the vmknic port group.

All of the MTUs above are 1500 bytes. If you are using jumbo frames, they should all be 9000 bytes.
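
If you would rather not compare the MTU columns by eye, the following is a rough Python sketch that pulls the MTU out of captured `esxcfg-vmknic -l` and `esxcfg-nics -l` lines. It relies on one observation from the listings above, namely that the MTU column comes directly after the MAC address column; that layout is an assumption about these particular outputs, and the helper names are made up for illustration. The vSwitch MTU and the physical switch still need to be checked separately.

```python
import re

# A MAC address followed by a number. In the listings above the MTU column
# sits right after the MAC address column (assumed layout).
MAC_THEN_MTU = re.compile(r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+(\d+)", re.I)


def mtu_by_interface(listing):
    """Map interface name (first column) to the MTU found after its MAC."""
    result = {}
    for line in listing.splitlines():
        match = MAC_THEN_MTU.search(line)
        if match:
            result[line.split()[0]] = int(match.group(2))
    return result


def mismatches(mtus, expected):
    """Return only the interfaces whose MTU differs from the expected value."""
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}


# Lines copied from the example output above (description column shortened).
SAMPLE = """\
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmk1       VSAN    IPv4      172.200.200.207    255.255.255.0   172.200.200.255 00:50:56:68:00:fb 1500    65535     true    STATIC
vmnic2  0000:41:00.00 bnx2x       Up   10000Mbps Full   00:10:18:f1:b8:40 1500   Broadcom
vmnic3  0000:41:00.01 bnx2x       Up   10000Mbps Full   00:10:18:f1:b8:42 1500   Broadcom
"""

mtus = mtu_by_interface(SAMPLE)
print(mismatches(mtus, 1500))  # nothing to report for a standard-frame setup
print(mismatches(mtus, 9000))  # every interface would need changing for jumbo frames
```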

brugh2
Contributor

Hi jetaylor

i'm not using jumbo frames. i tried vmkping on all hosts with the max payload size of 1472 and they all ping each other perfectly.
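
For reference, the 1472 figure is the interface MTU minus the IPv4 and ICMP headers. A small Python sketch of the arithmetic, assuming a plain 20-byte IPv4 header without options:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(1500))  # 1472
print(max_ping_payload(9000))  # 8972
```

So a `vmkping -s 8500 -d` test only passes when jumbo frames work along the whole path, while `-s 1472 -d` is the largest test that fits a standard 1500-byte MTU.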

as for other network settings, the switch settings are the same as before and vsan worked fine so i'm assuming that switch config is still good.

i did a reboot of all hosts 15 minutes apart in the meantime. on 2 of the 3 hosts i got part of the vsan cluster back. 2 now say that the cluster consists of 2 hosts and 1 still thinks it's alone.

that also means some of the VMs are back, just the objects that i think were owned by the one host are empty. weird thing is, if i do a 'du -shm' of the empty directories it says there's data in it but 'osfs-ls' comes up empty.

i'm hesitant to remove the 1 host from the cluster from the commandline and put it back in.

jetaylor
VMware Employee

If they are pinging normally, we are probably in a weird state with regard to multicast. Did you explicitly configure multicast/IGMP on your physical switch, or did it just "find a way?"

If we are back up (even if degraded), that is a great thing, but we definitely want to get host 1 back into the mix.

I am also disinclined to do an esxcli vsan cluster leave/esxcli vsan cluster join right now. That wouldn't necessarily resolve the problem anyway, since it appears to be a communication issue.

All VSAN data movement occurs as unicast traffic but the clustering communication and quorum-maintenance are multicast. If multicast isn't working, we end up partitioned even if unicast is working (which it clearly is).
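
To make the partition picture concrete, here is a small Python sketch that groups hosts by the membership list each one reports. The host names are hypothetical placeholders, and the views mirror the situation described in this thread (two hosts that see each other, one that sees only itself); in practice each view would come from the Sub-Cluster Member UUIDs line of `esxcli vsan cluster get` on that host.

```python
from collections import defaultdict


def partitions(views):
    """Group hosts that report exactly the same set of cluster members."""
    groups = defaultdict(list)
    for host, members in views.items():
        groups[frozenset(members)].append(host)
    # Sort for stable, readable output.
    return sorted(sorted(hosts) for hosts in groups.values())


# Hypothetical host names; each value is what that host believes the
# cluster membership to be.
VIEWS = {
    "host-a": {"host-a", "host-b"},
    "host-b": {"host-a", "host-b"},
    "host-c": {"host-c"},
}

for group in partitions(VIEWS):
    print(group)
```

More than one group in the result means the cluster is partitioned, even when unicast pings between all hosts succeed.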

If no physical-switch configuration was done and it all "just worked" when it was spun up, then it is probably either automatically handling multicast or it is converting the traffic to broadcast.

If we didn't do any switch config, we can try powering down (not rebooting) the node for 15 minutes or so, so that all information about the host ages out of the switch (MAC tables clear, etc.). When we power it back on, it will hopefully repopulate everything as expected and come back up.

I am loath to reboot the physical switch, etc., since we do have our VMs back online and we don't want to risk taking everything down again right now.

brugh2
Contributor

here's the output from the configuration of the 1 server that thinks it's alone:

~ # esxcli vsan network list

Interface

   VmkNic Name: vmk4

   IP Protocol: IPv4

   Interface UUID: 02b9e053-e4e8-0216-9412-1005ca9e2fa8

   Agent Group Multicast Address: 224.2.3.4

   Agent Group Multicast Port: 23451

   Master Group Multicast Address: 224.1.2.3

   Master Group Multicast Port: 12345

   Multicast TTL: 5

~ # esxcfg-vmknic -l |grep vmk4

vmk4       12                                      IPv4      192.168.1.11                            255.255.255.0   192.168.1.255   00:50:56:7d:a1:db 1500    65535     true    STATIC

vmk4       12                                      IPv6      fe80::250:56ff:fe7d:a1db                64                              00:50:56:7d:a1:db 1500    65535     true    STATIC, PREFERRED

~ # esxcfg-vswitch -l

DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks

DSwitch          5632        8           512               1500    vmnic5,vmnic4

  DVPort ID           In Use      Client

  576                 1           vmnic4

  577                 1           vmnic5

  16                  1           vmk0

  0                   1           vmk1

  12                  1           vmk4

~ # esxcfg-nics -l

Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description

vmnic0  0000:02:00.00 igb         Up   1000Mbps  Full   10:05:ca:9e:2f:a8 1500   Intel Corporation I350 Gigabit Network Connection

vmnic1  0000:02:00.01 igb         Down 0Mbps     Half   10:05:ca:9e:2f:a9 1500   Intel Corporation I350 Gigabit Network Connection

vmnic2  0000:02:00.02 igb         Down 0Mbps     Half   10:05:ca:9e:2f:aa 1500   Intel Corporation I350 Gigabit Network Connection

vmnic3  0000:02:00.03 igb         Down 0Mbps     Half   10:05:ca:9e:2f:ab 1500   Intel Corporation I350 Gigabit Network Connection

vmnic4  0000:88:00.00 enic        Up   10000Mbps Full   10:05:ca:a8:c7:b8 1500   Cisco Systems Inc Cisco VIC Ethernet NIC

vmnic5  0000:89:00.00 enic        Up   10000Mbps Full   10:05:ca:a8:c7:b9 1500   Cisco Systems Inc Cisco VIC Ethernet NIC

jetaylor
VMware Employee

Are you possibly using LACP or a static etherchannel to handle the traffic?

brugh2
Contributor

i'm shutting down the 1 host and will boot it back up in 15min, see if that helps. the switches are configured to work with vsan btw. i have a networking guy that put the right config in (and saved it :) ). will have him check tomorrow just to be sure.

i wonder though, if the 1 host was just alone i think i would see some files on the vsan datastore, even if it thinks it's isolated. but the vsanDatastore directory is completely empty on that one host. the other 2 have some files but not all of them. some VM directories are missing and some VM directories that are there, are empty. hopefully the vms will come back if the cluster config gets sorted out. would hate to lose some of the VMs on there..

brugh2
Contributor

not to my knowledge, the network was kept as flat as possible.

brugh2
Contributor

btw, i havent been able to boot any VMs yet. when i try, i get an error and the log shows:

VSAN: VsanIoctlCtrlNode:1746: aec35854-5321-3c01-1eea-1005ca9e2fa8: RPC to DOM returned: No connection

sounds pretty bad. the 2 nodes are not happy yet either.

jetaylor
VMware Employee

The NO_CONNECTION is likely due to us not having (or having lost) quorum on a per-object basis.

Are the two nodes that converged still converged (e.g., do they show two members when you run the esxcli vsan cluster get)?

In addition, can you please run the following command and paste up the output?

# cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT

jetaylor
VMware Employee

brugh2,

Please also check your PM.

brugh2
Contributor

the 2 nodes show 2 members when i type esxcli vsan cluster get and they both show the same 2 UUIDs so that looks ok.

as for the cmmds, there's a lot of output, no idea if that's complete or not. the owner is one of the 2 hosts.

~ #  cmmds-tool find -u aec35854-5321-3c01-1eea-1005ca9e2fa8 -f json -t DOM_OBJECT

{

"entries":

[

{

   "uuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8",

   "owner": "539a94ed-a020-3482-534b-1005ca9d8466",

   "health": "Healthy",

   "revision": "15",

   "type": "DOM_OBJECT",

   "flag": "2",

   "md5sum": "075b35a62336396933cfd22fc2a18470",

   "valueLen": "2616",

   "content": {"type": "Configuration", "attributes": {"CSN": 41, "addressSpace": 274877906944, "compositeUuid": "aec35854-5321-3c01-1eea-1005ca9e2fa8"}, "child-1": {"type": "RAID_1", "attributes": {}, "child-1": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-9e6c-ffdb-4ec6-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415910583, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "d1c25c54-b83e-02dc-173f-1005ca9e2fa8", "diskUuid": "52c54a3f-9c2c-8c6d-f46d-610a7998993a"}}, "child-2": {"type": "RAID_0", "attributes": {"stripeBlockSize": 1048576}, "child-1": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 5, "componentStateTS": 1415373588, "staleLsn": 0, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-1807-88d5-a8d1-1005ca9e2fa8", "diskUuid": "52afb34e-d1e6-d142-1c71-7098718817e6"}, "child-2": {"type": "Component", "attributes": {"addressSpace": 137438953472, "componentState": 6, "componentStateTS": 1415906710, "staleLsn": 9162486, "staleCsn": 40, "bytesToSync": 0, "recoveryETA": 0, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": "9bda5c54-bb12-8bd5-1a8a-1005ca9e2fa8", "diskUuid": "521b3674-ae89-6355-70c7-cd5ef9d9d014"}}}, "child-2": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415915399, "staleLsn": 0, "staleCsn": 0, "isWitness": 1, "faultDomainId": "539a9751-676c-b12c-db63-1005ca9decca"}, "componentUuid": 
"9bda5c54-2be7-8ed5-edb2-1005ca9e2fa8", "diskUuid": "5242203b-d4c3-e0e0-b840-ae94aa12707c"}, "child-3": {"type": "Witness", "attributes": {"componentState": 5, "componentStateTS": 1415371419, "isWitness": 1, "faultDomainId": "539a94ed-a020-3482-534b-1005ca9d8466"}, "componentUuid": "9bda5c54-2e29-90d5-8402-1005ca9e2fa8", "diskUuid": "526674f8-c398-1ffc-b0f3-eea569b05ab2"}, "child-4": {"type": "Witness", "attributes": {"componentState": 6, "componentStateTS": 1415910583, "isWitness": 1, "faultDomainId": "539a9749-dfe0-e17f-c529-1005ca9e2fa8"}, "componentUuid": "9bda5c54-af88-91d5-1630-1005ca9e2fa8", "diskUuid": "5268d4b5-d67b-4962-f3a8-26542a6c0558"}},

   "errorStr": "(null)"

}

]

}

jetaylor
VMware Employee

Okay, so that output indicates that we have missing components on the following:

Host 539a9749-dfe0-e17f-c529-1005ca9e2fa8 --> The host from your initial post.

     Disk: 5268d4b5-d67b-4962-f3a8-26542a6c0558

     Disk: 52c54a3f-9c2c-8c6d-f46d-610a7998993a

Host: 539a9751-676c-b12c-db63-1005ca9decca

     Disk: 521b3674-ae89-6355-70c7-cd5ef9d9d014

(at least for this one object). We don't have a complete mirror and thus this object (among others, presumably) is offline.
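
For readers who want to repeat this reading on other objects, here is a minimal Python sketch that walks the "content" tree of a DOM_OBJECT entry and lists the disks holding non-active components per host. The tree below is a trimmed copy of the output in the previous post (only the fields used here). Treating componentState 5 as "present" is an assumption based on which components are listed as missing above; the helper names are made up for illustration.

```python
ACTIVE = 5  # assumption: state 5 means present; state 6 components are the missing ones


def leaves(node):
    """Yield every Component/Witness node below (and including) this node."""
    if node.get("type") in ("Component", "Witness"):
        yield node
    for key, child in node.items():
        if key.startswith("child-"):
            yield from leaves(child)


def missing_by_host(content):
    """Map faultDomainId (host UUID) to the disks with non-active components."""
    missing = {}
    for item in leaves(content):
        attrs = item["attributes"]
        if attrs["componentState"] != ACTIVE:
            missing.setdefault(attrs["faultDomainId"], set()).add(item["diskUuid"])
    return missing


def leaf(kind, state, host, disk):
    """Build a trimmed leaf node with only the fields this sketch reads."""
    return {"type": kind,
            "attributes": {"componentState": state, "faultDomainId": host},
            "diskUuid": disk}


H_9749 = "539a9749-dfe0-e17f-c529-1005ca9e2fa8"
H_94ED = "539a94ed-a020-3482-534b-1005ca9d8466"
H_9751 = "539a9751-676c-b12c-db63-1005ca9decca"

# Same shape and values as the cmmds-tool output in the previous post.
CONTENT = {
    "type": "Configuration", "attributes": {},
    "child-1": {"type": "RAID_1", "attributes": {},
        "child-1": {"type": "RAID_0", "attributes": {},
            "child-1": leaf("Component", 6, H_9749, "5268d4b5-d67b-4962-f3a8-26542a6c0558"),
            "child-2": leaf("Component", 6, H_9749, "52c54a3f-9c2c-8c6d-f46d-610a7998993a")},
        "child-2": {"type": "RAID_0", "attributes": {},
            "child-1": leaf("Component", 5, H_94ED, "52afb34e-d1e6-d142-1c71-7098718817e6"),
            "child-2": leaf("Component", 6, H_9751, "521b3674-ae89-6355-70c7-cd5ef9d9d014")}},
    "child-2": leaf("Witness", 5, H_9751, "5242203b-d4c3-e0e0-b840-ae94aa12707c"),
    "child-3": leaf("Witness", 5, H_94ED, "526674f8-c398-1ffc-b0f3-eea569b05ab2"),
    "child-4": leaf("Witness", 6, H_9749, "5268d4b5-d67b-4962-f3a8-26542a6c0558"),
}

for host, disks in missing_by_host(CONTENT).items():
    print(host, sorted(disks))
```

On real output, the same functions can be applied to `json.loads(text)["entries"][0]["content"]`. Both RAID_0 halves of the mirror contain at least one non-active component here, which is why neither copy of the data is complete.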

It is possible that during the blackout one host went down slightly later and thus has more-recent data, so other things are being held offline. We will need to look more closely to find out more.

As we discussed offline, I will try to follow up with you tomorrow.

brugh2
Contributor

'held offline' sounds like 'not lost' which would be wonderful. http://kb.vmware.com/kb/2059091 and http://kb.vmware.com/kb/1012864 to me suggest that when i add it back to the cluster (probably after removing it from its own cluster remnant first), it would start syncing things and eventually all data would be back online. i will do that as a last resort but hopefully get some troubleshooting done first to make sure that such an action doesn't actually destroy any data that may still be there.

brugh2
Contributor

i get an access denied trying to send PMs. but i'll try the commands to verify multicast connectivity. wont be available before 14:30 ET but will keep you posted on updates!

brugh2
Contributor

turns out that the switch port that the nic of one of the hosts was on had dropped out of the igmp group. that meant multicast didn't get through and the host wouldn't rejoin the cluster. unfortunately it was the last one to go down and had the newest writes. so only after it joined the cluster (by disabling the faulty uplink) did things start to sync, and after some reboots the whole cluster was back up and running. seems vsan can handle this kind of abuse but sometimes you need to help it a bit ;)

thank you jeffrey for your time. it was very nice to see how you went about it and fixed it, and that it was indeed fixable. not only the cluster but also my confidence in vsan is fully restored now :) thanks!
