Hi,
I'll detail the design below. I'd like to know whether the result is expected or whether I've designed something wrong!
ToR-1 and ToR-2 have a 20Gb LAG between them, and I'm running VRRP to move the routing engine back and forth. I can restart either switch and the SAN fails over, along with the ISP that's linked into both switches. There are no network outages at this level.
Leaf-1 and Leaf-2 sit in fabrics A1 and A2 of a blade chassis. They have a 20Gb LAG between them, plus a 20Gb LAG from Leaf-1 to ToR-1 and another from Leaf-2 to ToR-2. Reloading Leaf-2, ToR-2, or ToR-1 causes no outage, but reloading Leaf-1 causes an outage of about 1 minute, which I think is how long the switch takes to reload and become available again.
Each blade has a single dual-port 10Gb NDC: one port goes to Leaf-1 and the other to Leaf-2.
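To sanity-check the physical design, here is a rough Python sketch that treats the links described above as a graph and asks which ToR switches a blade can still reach when one switch is powered off. The node name "blade" and the function names are made up for illustration. It only checks that a path exists; it says nothing about how long LAG, VRRP, or NIC failover takes to converge.

```python
from collections import deque

# Links as described in the post; "blade" stands for one blade's dual-port NDC.
LINKS = [
    ("ToR-1", "ToR-2"),
    ("Leaf-1", "Leaf-2"),
    ("Leaf-1", "ToR-1"),
    ("Leaf-2", "ToR-2"),
    ("blade", "Leaf-1"),
    ("blade", "Leaf-2"),
]


def reachable(start, failed):
    """Return the set of nodes reachable from start while `failed` is down."""
    adjacency = {}
    for a, b in LINKS:
        if failed in (a, b):
            continue  # every link on the failed switch goes down with it
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:  # plain breadth-first search
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def surviving_tors(failed):
    """ToR switches the blade can still reach with one switch down."""
    return sorted(n for n in reachable("blade", failed) if n.startswith("ToR"))


if __name__ == "__main__":
    for switch in ("ToR-1", "ToR-2", "Leaf-1", "Leaf-2"):
        print(switch, "down ->", surviving_tors(switch))
```

On paper, every single-switch failure still leaves the blade a path to at least one ToR, so the cabling itself looks redundant and the 1 minute outage would have to come from how quickly something fails over.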
The dvSwitch port groups all use 'Route based on originating virtual port' with Failback set to Yes, and the following uplink order:
vSphere Management: Active (vmnic0) / Standby (vmnic1)
vMotion: Active (vmnic1) / Standby (vmnic0)
Storage_1: Active (vmnic0) / Unused (vmnic1)
Storage_2: Active (vmnic1) / Unused (vmnic0)
VM Network(s): Active (vmnic0) / Active (vmnic1)
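As a way of reasoning about the teaming order above, here is a small Python sketch that works out which uplink each port group would end up on when one leaf is down. It is a simplified model, not how ESXi actually implements teaming: it assumes the host sees the link drop immediately, and it assumes vmnic0 is cabled to Leaf-1 and vmnic1 to Leaf-2, which the post does not state. The names PORTGROUPS, UPLINK_TO_LEAF and uplinks_in_use are made up for illustration.

```python
# Teaming order per port group, copied from the list above.
# "Unused" uplinks are simply left out, since they never carry traffic.
PORTGROUPS = {
    "vSphere Management": {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "vMotion": {"active": ["vmnic1"], "standby": ["vmnic0"]},
    "Storage_1": {"active": ["vmnic0"], "standby": []},
    "Storage_2": {"active": ["vmnic1"], "standby": []},
    "VM Network(s)": {"active": ["vmnic0", "vmnic1"], "standby": []},
}

# ASSUMPTION: the post does not say which vmnic is cabled to which leaf.
UPLINK_TO_LEAF = {"vmnic0": "Leaf-1", "vmnic1": "Leaf-2"}


def uplinks_in_use(portgroup, failed_leaf):
    """Uplinks carrying traffic for a port group while failed_leaf is down.

    Simplified model: surviving active uplinks are used; if none survive,
    the first surviving standby uplink is promoted.
    """
    team = PORTGROUPS[portgroup]

    def is_up(nic):
        return UPLINK_TO_LEAF[nic] != failed_leaf

    active = [nic for nic in team["active"] if is_up(nic)]
    if active:
        return active
    standby = [nic for nic in team["standby"] if is_up(nic)]
    return standby[:1]


if __name__ == "__main__":
    for leaf in ("Leaf-1", "Leaf-2"):
        print(leaf, "down:")
        for name in PORTGROUPS:
            print("  ", name, "->", uplinks_in_use(name, leaf) or "no uplink")
```

Under that cabling assumption, a Leaf-1 failure moves vSphere Management to its standby vmnic1, leaves VM Network(s) on vmnic1 only, and leaves Storage_1 with no uplink at all, so storage traffic would depend entirely on Storage_2.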
This is on vSphere 6.7 with the latest SP and patch applied.
The hope is to see no outage at all, with everything just humming along, but I feel I've missed something at the dvSwitch level that stops the NICs from sending traffic out through the switches that remain online. If I power down Leaf-1, traffic does go out via Leaf-2, but not without an outage.