I have a problem creating resiliency for vMotion traffic on vSphere 5.5 installed on Hitachi blades.
The goal: when one of the blade switches is down for maintenance, or simply broken, vMotion should remain operational.
Multi-NIC vMotion has been configured, and both adapters on the source and target servers do carry vMotion traffic.
Yet when I disable either one of the vMotion NICs in ESXi, or shut down the switch port to which one of the two vMotion NICs is connected, vMotion no longer works. So the other adapter is not providing resiliency.
Configuration:
Multi-NIC vMotion is configured as described in the VMware KB here.
In my case:
vmnic6 is connected to portgroup vMotion-1 with IP address 10.20.120.51/24.
vmnic6 is the active adapter and is connected to switch0, a switch installed in the Hitachi blade enclosure.
vmnic1 is set as the standby adapter.
The VLAN ID is set to 1941.
vmnic1 is connected to portgroup vMotion-2 with IP address 10.20.120.52/24.
vmnic1 is the active adapter and is connected to switch1.
vmnic6 is set as the standby adapter.
The VLAN ID is set to 1941.
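For reference, the teaming and VLAN settings above can be applied from the ESXi shell roughly as follows. This is only a sketch; the portgroup and vmnic names are taken from this post, so verify them against your own environment:

```shell
# Sketch of the teaming policy described above (ESXi 5.x esxcli).
# Portgroup and vmnic names are taken from this post.

# vMotion-1: vmnic6 active, vmnic1 standby
esxcli network vswitch standard portgroup policy failover set \
  -p vMotion-1 --active-uplinks vmnic6 --standby-uplinks vmnic1

# vMotion-2: vmnic1 active, vmnic6 standby
esxcli network vswitch standard portgroup policy failover set \
  -p vMotion-2 --active-uplinks vmnic1 --standby-uplinks vmnic6

# Both portgroups tagged with VLAN 1941
esxcli network vswitch standard portgroup set -p vMotion-1 --vlan-id 1941
esxcli network vswitch standard portgroup set -p vMotion-2 --vlan-id 1941
```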
When performing a vMotion, both vmnic6 and vmnic1 are used. This is confirmed by the performance graphs in vCenter; vMotion with two NICs is 30% faster than with a single NIC.
When vmnic6 is put into the 'down' state in ESXi, vMotion is not possible. The process hangs at 14%, and the error says 10.20.120.51 cannot be reached.
Initially I tried another configuration: a single portgroup named vMotion, with vmnic6 as the active adapter and vmnic1 as the standby adapter. When vmnic6 was disabled, vMotion no longer worked, so there was no resiliency there either.
Any help is very much appreciated!
Do you remove the vmnic while vMotion is running?
No. The procedure was:
1. Perform a vMotion over multiple NICs. Worked perfectly; the performance graphs in vCenter show data transfer on all NICs. vMotion successful.
2. Disable a NIC in the source ESXi host using esxcli network nic down -n vmnicX.
3. Start a vMotion.
4. It hangs at 14% with the error 'The ESX hosts failed to connect over the vMotion network'.
The same happens when the switch port is disabled/shut down; this is done before the vMotion starts.
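Connectivity at each step can be checked per vMotion path with vmkping, which forces traffic out of a specific vmkernel interface. A sketch, assuming the two vMotion vmknics are vmk1 and vmk2 (check with esxcli network ip interface ipv4 get) and using placeholders for the peer host's addresses:

```shell
# Down one vMotion uplink, as in step 2
esxcli network nic down -n vmnic6

# Test each vMotion path separately; -I selects the outgoing vmk interface
vmkping -I vmk1 <peer vMotion-1 address>   # path whose active uplink is down
vmkping -I vmk2 <peer vMotion-2 address>   # path that should still be up
```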
Alright, now I understand.
What does your vSwitch0 NIC teaming configuration look like? Can you post a screenshot of it?
Problem solved.
I added the VLAN ID used by vMotion to the uplink switch ports of both blade enclosure switches. The VLAN ID was also added to the core switches.
After that, failover works.
I am not sure why this needed to be done.
Before the VLAN ID was added to the uplink trunk (blade switch -> core switch):
a ping from vmnic6/host1 to vmnic6/host2 was possible;
a ping from vmnic1/host1 to vmnic1/host2 was possible.
After disabling vmnic6:
a ping from vmnic1/host1 to vmnic1/host2 was still possible,
yet vMotion failed at 14%.
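A plausible explanation (an assumption on my part, not confirmed in this thread): when vmnic6 goes down, the vMotion-1 vmknic fails over to its standby uplink vmnic1, which sits on switch1, while the peer host's vMotion-1 vmknic is still active on switch0. The 10.20.120.x traffic then has to cross the uplinks between the blade switches via the core, which only works once VLAN 1941 is allowed on those trunks. On the switch side the fix would look roughly like this (Cisco IOS syntax assumed; the interface name is hypothetical):

```
! On each blade enclosure switch uplink (and the matching core ports)
interface TenGigabitEthernet1/0/1
 description uplink blade-switch -> core
 switchport mode trunk
 switchport trunk allowed vlan add 1941
```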