Hi All
Quick question about protecting virtual machines. Last week we had an issue where we lost access to the production network on just one of the ESXi 5.0 nodes, and a number of VMs were offline.
Quick rundown of the setup:
vCenter with a 4-node ESXi 5 cluster, with HA/DRS enabled
vSwitch0 with 2 NICs for management and vMotion
vSwitch1 with 2 NICs used for Production traffic
The question is, if you lose both NICs on vSwitch1 (I know this may be very rare), is there any way for vCenter or another tool to migrate the VMs off that node and onto a good node with Production traffic?
Many thanks, and sorry if that's a silly question
Hi Jonathan,
HA (VMware High Availability) only checks for host failures, so if the host is not responding on the network, HA kicks in. As for the virtual machine network, as long as the host is pingable the virtual machines will not be migrated off unless a DRS event occurs (a compute resource imbalance).
I know beacon probing can help on the management network, but even that will not help if the links go down on vSwitch1 in your case
http://kb.vmware.com/kb/1005577
What you can do is set up alarms for vSwitch1 that fire an action (an email or an alert) when link redundancy is lost or the link fails.
You could also set up a monitoring tool such as Nagios to check that the virtual machines respond to ping
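As a rough illustration of the ping-based check, here is a minimal Python sketch rather than a Nagios plugin. It assumes you keep your own mapping of ESXi host names to VM IP addresses; the host names and addresses in the usage comment are made up. It reports any host where every VM has stopped answering, which is the pattern you would see when a host loses its Production uplinks. The ping flags assume Linux iputils syntax.

```python
import subprocess
from typing import Callable, Dict, List


def ping_once(ip: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; True if a reply arrives (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def hosts_with_all_vms_down(
    vms_by_host: Dict[str, List[str]],
    ping: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Return the hosts for which none of the listed VM addresses answer a ping.

    Hosts with an empty VM list are skipped, since there is nothing to judge them by.
    """
    suspects = []
    for host, vm_ips in vms_by_host.items():
        if vm_ips and not any(ping(ip) for ip in vm_ips):
            suspects.append(host)
    return suspects


# Hypothetical usage (names and addresses are placeholders):
# inventory = {"esx01": ["192.0.2.10", "192.0.2.11"], "esx02": ["192.0.2.20"]}
# for host in hosts_with_all_vms_down(inventory):
#     print(f"All monitored VMs on {host} are unreachable; check its uplinks")
```

This only detects the condition and tells you which host to look at; moving the VMs off that host would still be a manual step.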
Apart from that, I don't really think there's much you can do. Say you have a host with VMs on it, and the VMs are connected to a VM network on a separate switch: even deleting the port group or the vSwitch itself will not cause any issue for the working host
Regards
a
The question is, if you lose both NICs on vSwitch1 (I know this may be very rare), is there any way for vCenter or another tool to migrate the VMs off that node and onto a good node with Production traffic?
No, it is not possible.
Are all 4 NICs onboard NICs?
Thanks guys
I thought I was being silly and missing something
I think the best thing to do is set up the alerts for the customer on the NICs and VMs, and maybe also log a call with VMware about the NICs going offline, as it was weird. Both NICs showed as up on the config screen, but you were unable to ping the default gateway from any of the VMs connected to the vSwitch. I forgot to test whether you could ping another VM on the same vSwitch. Also, clicking the CDP icon showed no info. Once the host was rebooted, you could ping the gateway and see CDP
Thanks again
Interesting.
Could you share the host hardware details as well as the NIC details?
The output of esxcfg-nics -l and esxcfg-vswitch -l?
It would also help if the vm-support bundle of the host could be shared.
Regards
a
Hi
Server: Dell PowerEdge R820
ESX: ESXi 5.0.0, 721882
~ # esxcfg-nics -l
Name    PCI            Driver  Link  Speed     Duplex  MAC Address        MTU   Description
vmnic0  0000:01:00.00  tg3     Up    1000Mbps  Full    d4:ae:52:a5:e4:fe  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1  0000:01:00.01  tg3     Up    1000Mbps  Full    d4:ae:52:a5:e4:ff  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2  0000:02:00.00  tg3     Up    100Mbps   Full    d4:ae:52:a5:e5:00  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3  0000:02:00.01  tg3     Down  0Mbps     Half    d4:ae:52:a5:e5:01  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic4  0000:44:00.00  tg3     Up    1000Mbps  Full    00:10:18:e5:a1:50  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic5  0000:44:00.01  tg3     Up    1000Mbps  Full    00:10:18:e5:a1:51  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic6  0000:44:00.02  tg3     Up    100Mbps   Full    00:10:18:e5:a1:52  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic7  0000:44:00.03  tg3     Down  0Mbps     Half    00:10:18:e5:a1:53  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
~ #
~ # esxcfg-vswitch -l
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0     128        5           128               1500  vmnic0,vmnic4
  PortGroup Name      VLAN ID  Used Ports  Uplinks
  VMotion             0        1           vmnic0,vmnic4
  Management Network  0        1           vmnic0,vmnic4
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     128        8           128               1500  vmnic1,vmnic5
  PortGroup Name      VLAN ID  Used Ports  Uplinks
  Production          0        5           vmnic1,vmnic5
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch2     128        4           128               1500  vmnic6,vmnic2
  PortGroup Name      VLAN ID  Used Ports  Uplinks
  DMZ                 0        1           vmnic6,vmnic2
~ #
~ # ethtool -i vmnic0
driver: tg3
version: 3.123b.v50.1
firmware-version: FFV7.2.14 bc 5720-v1.25
bus-info: 0000:01:00.0
Not much use, I know, but I can't upload the vm-support files;
I would need the OK from the customer
Thanks
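For anyone wanting to check NIC listings like the one above automatically, here is a small Python sketch that scans esxcfg-nics -l output for uplinks that are down or have negotiated below an expected speed. The function name, the 1000 Mbps default, and the shortened sample rows are assumptions for illustration. Note that in this thread the Production uplinks (vmnic1, vmnic5) reported Up at 1000Mbps, so a link-state check like this would complement a reachability check, not replace it.

```python
from typing import List, Tuple

# A few rows from the listing in this post, with descriptions shortened for brevity.
SAMPLE = """\
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:01:00.00 tg3 Up 1000Mbps Full d4:ae:52:a5:e4:fe 1500 Broadcom BCM5720
vmnic2 0000:02:00.00 tg3 Up 100Mbps Full d4:ae:52:a5:e5:00 1500 Broadcom BCM5720
vmnic3 0000:02:00.01 tg3 Down 0Mbps Half d4:ae:52:a5:e5:01 1500 Broadcom BCM5720
"""


def flag_nics(output: str, expected_mbps: int = 1000) -> List[Tuple[str, str]]:
    """Return (nic, reason) pairs for NICs that are down or slower than expected."""
    flagged = []
    for line in output.splitlines():
        fields = line.split(None, 8)
        # Only rows whose first column is a vmnic name are data rows;
        # the header, shell prompts, and blank lines are skipped.
        if len(fields) < 8 or not fields[0].startswith("vmnic"):
            continue
        name, link, speed = fields[0], fields[3], fields[4]
        if link != "Up":
            flagged.append((name, "link down"))
            continue
        # Speed looks like "1000Mbps"; treat anything unparseable as 0.
        digits = speed[: -len("Mbps")] if speed.endswith("Mbps") else ""
        mbps = int(digits) if digits.isdigit() else 0
        if mbps < expected_mbps:
            flagged.append((name, f"link at {mbps} Mbps, expected {expected_mbps}"))
    return flagged


for nic, reason in flag_nics(SAMPLE):
    print(f"{nic}: {reason}")
```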
You could try upgrading the driver to "tg3 version 3.129d.v50.1" and see if the issue is reproducible.
A firmware upgrade to 7.4.8 might help as well if this is a firmware/driver issue
The firmware details list the changes as "Bugfixes",
so it might help if this is one of the reported issues
Yeah, I think the best thing to do is update the driver and firmware and see if this fixes the issue. Random one!
Thanks for your help again
http://www.dell.com/support/drivers/us/en/04/Product/poweredge-r820
