VMware Cloud Community
atoerper
Enthusiast

VM Failure during Switch Reboot

We have a 2-node cluster set up at a remote facility. Each node has 2x10Gb ports used for all traffic (VMkernel, VM, management, vMotion). Port A on each host goes to Switch A, and Port B on each host goes to Switch B. The switches were rebooted one at a time for switch code upgrades. During the process, all of the VMs were powered off for no apparent reason.

The port group for vSAN uses two NICs, both set as active, with the "Route based on originating virtual port" load-balancing policy and Failback set to No.
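To make the expected behaviour of that teaming setup concrete, here is a rough Python model of two active uplinks with Failback set to No during a rolling switch reboot. It is only a sketch: the modulo-based uplink choice is a simplification I am assuming for illustration, not VMware's documented algorithm, and the names `Team`, `pick_uplink`, and `vmnic5` are hypothetical (only `vmnic4` appears in the events). It also assumes the host actually sees the link go down when a switch reboots.

```python
def pick_uplink(port_id, usable_uplinks):
    """Simplified stand-in for uplink selection: port ID modulo uplink count."""
    if not usable_uplinks:
        return None  # no uplink left: the port is isolated
    return usable_uplinks[port_id % len(usable_uplinks)]


class Team:
    """Toy model of an active/active team pinned by originating virtual port."""

    def __init__(self, uplinks, failback=False):
        self.uplinks = list(uplinks)
        self.up = set(uplinks)
        self.failback = failback
        self.pinned = {}  # virtual port ID -> uplink currently carrying it

    def _usable(self):
        return [u for u in self.uplinks if u in self.up]

    def attach(self, port_id):
        self.pinned[port_id] = pick_uplink(port_id, self._usable())

    def link_down(self, uplink):
        # Link-status failure detection: move only the ports on the dead uplink.
        self.up.discard(uplink)
        for port, current in self.pinned.items():
            if current == uplink:
                self.pinned[port] = pick_uplink(port, self._usable())

    def link_up(self, uplink):
        # With failback disabled, ports stay where they are when a link returns.
        self.up.add(uplink)
        if self.failback:
            for port in self.pinned:
                self.pinned[port] = pick_uplink(port, self._usable())


# Rolling reboot: Switch A (vmnic4) goes down and returns, then Switch B (vmnic5).
team = Team(["vmnic4", "vmnic5"], failback=False)
team.attach(0)
team.attach(1)
history = [dict(team.pinned)]
team.link_down("vmnic4")
history.append(dict(team.pinned))
team.link_up("vmnic4")
history.append(dict(team.pinned))
team.link_down("vmnic5")
history.append(dict(team.pinned))
```

In this model every port keeps an uplink at each step, so a one-at-a-time reboot should not isolate the vSAN VMkernel port on its own. That is why the behaviour we saw is confusing.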

Any ideas what happened here? 

4 Replies
SureshKumarMuth
Commander

Did you get a chance to check the status of the host and VMs after rebooting the first switch? Are you able to bring the VMs up, and are they running fine now? What do the logs show? Are you seeing any HA-related events recorded?

Regards,
Suresh
https://vconnectit.wordpress.com/
atoerper
Enthusiast

Here is a snapshot of some of the events before one of the VMs was powered off. The events are listed in newest to oldest order.

VMNAME is powered off

Configuration file for VMNAME cannot be found
Renamed VMNAME from VMNAME to /vmfs/volumes/vsan:525cb476734a77d1-9e6dd64dfe3083b0/afa4c559-5e11-d9fe-b8ff-1402ec953578/DKGMPTL1.vmx
Renamed VMNAME from VMNAME to /vmfs/volumes/vsan:525cb476734a77d1-9e6dd64dfe3083b0/a697c559-a8a4-1e7c-34b7-1402ec953578/DKGM0600.vmx
Configuration file for VMNAME cannot be found
Renamed VMNAME from VMNAME to /vmfs/volumes/vsan:525cb476734a77d1-9e6dd64dfe3083b0/008dd359-c045-560f-6afa-1402ec953578/MSDC-P-G01.vmx
User root@127.0.0.1 logged out (login time: Tue Oct 10 06:58:54 EDT 2017, number of API invocations: 0, user agent: )
Host cannot communicate with one or more other nodes in the vSAN enabled cluster
Alarm 'Network uplink redundancy lost' on hostname changed from Green to Red
Alarm 'Network uplink redundancy lost' on hostname triggered an action
Alarm 'Network uplink redundancy lost': an SNMP trap for entity hostname was sent
vSphere HA agent is healthy
The vSphere HA availability state of this host has changed to Master
Task: Update vSAN configuration
The vSphere HA availability state of this host has changed to Election
The vSphere HA availability state of this host has changed to Unreachable
User root@127.0.0.1 logged in as
Device or filesystem with identifier bbdaaffe-b043b2ff has entered the All Paths Down state.
Lost uplink redundancy on DVPorts: "23/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "15/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "7/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "31/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "24/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "28/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "27/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69", "29/f1 ee 0c 50 9b d5 db ab-15 5d 33 4d 5f fa ab 69". Physical NIC vmnic4 is down.
Lost access to volume 59aef088-2dd9c2d1-a533-1402ec953578 (87f0ae59-8b60-0a7c-b1a4-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Lost access to volume 59b2f295-f868c64a-f1aa-1402ec953578 (94f2b259-791b-f447-e916-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Lost access to volume 59c597a6-2d735090-25ce-1402ec953578 (a697c559-a8a4-1e7c-34b7-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Lost access to volume 59c597a6-509b8aa4-3133-1402ec953578 (a697c559-9262-3aa0-d0ba-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Lost access to volume 59c5a4af-addb738a-32c7-1402ec953578 (afa4c559-5e11-d9fe-b8ff-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Lost access to volume 59d38d00-b7a4d91c-c8a7-1402ec953578 (008dd359-c045-560f-6afa-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
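Because the list above is newest-first, it is easier to follow the failure if you flip it and tag the key events. Here is a minimal Python sketch of that, assuming the events are pasted in as plain text lines. The keyword-to-label table and the function names are my own illustrative choices, not a VMware classification or tool.

```python
# Hypothetical helper for reading a newest-first event list; not VMware tooling.
KEYWORDS = {
    "Physical NIC": "uplink down",
    "All Paths Down": "storage APD",
    "cannot communicate with one or more other nodes": "vSAN partition",
    "Lost access to volume": "volume access lost",
    "is powered off": "VM powered off",
}


def chronological(events_newest_first):
    """Return the events oldest-first, dropping blank lines."""
    return [e.strip() for e in reversed(events_newest_first) if e.strip()]


def tag(event):
    """Return the label of the first matching keyword, or None."""
    for needle, label in KEYWORDS.items():
        if needle in event:
            return label
    return None


def timeline(events_newest_first):
    """Keep only tagged events, oldest-first, as (label, event) pairs."""
    pairs = ((tag(e), e) for e in chronological(events_newest_first))
    return [(label, e) for label, e in pairs if label is not None]


# A few lines copied from the list above, still in newest-first order.
sample = [
    "VMNAME is powered off",
    "Host cannot communicate with one or more other nodes in the vSAN enabled cluster",
    "Device or filesystem with identifier bbdaaffe-b043b2ff has entered the All Paths Down state.",
    "Lost access to volume 59d38d00-b7a4d91c-c8a7-1402ec953578 (008dd359-c045-560f-6afa-1402ec953578) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.",
]

labels = [label for label, _ in timeline(sample)]
for label, event in timeline(sample):
    print(f"{label:>20}  {event[:70]}")
```

Reading the full list oldest-first, the sequence is: access to the volumes is lost, vmnic4 goes down, a device enters All Paths Down, HA goes through a re-election, the host loses contact with the other vSAN node, and only then are the VMs reported as powered off.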
SureshKumarMuth
Commander

Host cannot communicate with one or more other nodes in the vSAN enabled cluster

There were some connectivity issues during the switch reboot.

In the normal scenario, can you see traffic passing through both NICs? And did the issue happen on only a few VMs, or on all the VMs in the vSAN cluster?

Open an SSH session (e.g. with PuTTY) to the host, run esxtop, and press 'n' to see the network page. Check whether traffic passes through both vmnics.

Regards,
Suresh
https://vconnectit.wordpress.com/
atoerper
Enthusiast

Traffic is passing through both NICs

pastedImage_2.png
