VMware Cloud Community
Jonathan_Stroud
Contributor

Protecting Virtual Machines

Hi All

Quick question about protecting virtual machines. We had an issue last week where one of the ESXi 5.0 nodes lost access to the production network only, and a number of VMs were offline.

Quick rundown on the setup:

vCenter with a 4-node ESXi 5 cluster, with HA/DRS enabled

vSwitch0 with 2 NICs for management and vMotion

vSwitch1 with 2 NICs used for Production traffic

The question is: if you lose both NICs on vSwitch1 (I know this may be very rare), is there any way for vCenter or another tool to migrate the VMs off that node and onto a good node with Production traffic?

Many thanks, and sorry if that's a silly question.

8 Replies
a_nut_in
Expert

Hi Jonathan,

HA (VMware High Availability) only checks for host failures, so if the host stops responding on the network, HA kicks in. As for the virtual machine network, as long as the host is pingable, the virtual machines will not be migrated off unless a DRS event occurs (a compute resource imbalance).

I know Beacon Probing can help on the management network, but even that will not help if the link goes down on vSwitch1 in your case.

http://kb.vmware.com/kb/1005577

What you can do is set up alarms for vSwitch1 that fire off an event (email or alarm) if link redundancy is lost or the link fails.

You could also set up a monitoring tool such as Nagios to check for virtual machine ping responses.
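As a rough illustration of the ping-check idea, here is a minimal Python sketch, not a replacement for a real monitoring tool. It assumes a Linux monitoring box with the standard `ping` command, and the host labels and IP addresses are hypothetical. It groups silent VMs by ESXi host, so that a host where every VM has gone quiet stands out as a likely uplink problem rather than a problem with one guest.

```python
import subprocess

def ping_once(address, timeout_s=2):
    """Return True if a single ICMP echo succeeds (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def find_unreachable(vms_by_host, ping=ping_once):
    """Return {host: [silent VM addresses]} for hosts with at least one silent VM."""
    unreachable = {}
    for host, addresses in vms_by_host.items():
        silent = [addr for addr in addresses if not ping(addr)]
        if silent:
            unreachable[host] = silent
    return unreachable

def hosts_fully_dark(vms_by_host, unreachable):
    """Hosts where every monitored VM is silent, hinting at a host-side network fault."""
    return [host for host, silent in unreachable.items()
            if len(silent) == len(vms_by_host[host])]

if __name__ == "__main__":
    # Hypothetical inventory: ESXi host label -> production IPs of its VMs.
    inventory = {
        "esx01": ["192.0.2.11", "192.0.2.12"],
        "esx02": ["192.0.2.21", "192.0.2.22"],
    }
    down = find_unreachable(inventory)
    for host in hosts_fully_dark(inventory, down):
        print("All monitored VMs on %s are unreachable; check its uplinks" % host)
```

The inventory has to be kept in sync with where the VMs actually run, since DRS may move them between hosts.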

Apart from that, I don't really think there's much you can do. Say you have a host with VMs on it, and the VMs are connected to a VM network on a separate switch: even deleting the port group or the vSwitch itself will not cause any issue for the working host.

Regards

a

Do remember to mark my post as "helpful" or "correct" if I've helped resolve or answer your query!
aravinds3107
Virtuoso

"The question is: if you lose both NICs on vSwitch1 (I know this may be very rare), is there any way for vCenter or another tool to migrate the VMs off that node and onto a good node with Production traffic?"

No, it is not possible.

Are all 4 NICs onboard NICs?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful |Blog: http://aravindsivaraman.com/ | Twitter : ss_aravind
Jonathan_Stroud
Contributor

Thanks guys

Thought I was being silly and missing something.

I think the best thing to do is set up the alerts for the customer on the NICs and VMs, and maybe also log a call with VMware about the NICs going offline, as it was weird. Both NICs showed as up on the config screen, but you were unable to ping the default gateway from any of the VMs connected to the vSwitch. I forgot to test whether you could ping another VM on the same vSwitch. Also, clicking the CDP icon showed no info. Once the host was rebooted, you could ping the gateway and see CDP.

Thanks again

a_nut_in
Expert

Interesting.

Could you share the host hardware details as well as the NIC details?

The output of esxcfg-nics -l and esxcfg-vswitch -l?

It would also help if the vm-support bundle of the host could be shared.

Regards

a

Jonathan_Stroud
Contributor

Hi

Server: Dell PowerEdge R820

ESX: ESXi 5.0.0, 721882

~ # esxcfg-nics -l

Name    PCI            Driver  Link  Speed     Duplex  MAC Address        MTU   Description
vmnic0  0000:01:00.00  tg3     Up    1000Mbps  Full    d4:ae:52:a5:e4:fe  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic1  0000:01:00.01  tg3     Up    1000Mbps  Full    d4:ae:52:a5:e4:ff  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2  0000:02:00.00  tg3     Up    100Mbps   Full    d4:ae:52:a5:e5:00  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3  0000:02:00.01  tg3     Down  0Mbps     Half    d4:ae:52:a5:e5:01  1500  Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic4  0000:44:00.00  tg3     Up    1000Mbps  Full    00:10:18:e5:a1:50  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic5  0000:44:00.01  tg3     Up    1000Mbps  Full    00:10:18:e5:a1:51  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic6  0000:44:00.02  tg3     Up    100Mbps   Full    00:10:18:e5:a1:52  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet
vmnic7  0000:44:00.03  tg3     Down  0Mbps     Half    00:10:18:e5:a1:53  1500  Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet

~ #
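A listing like this can be checked mechanically. The following Python sketch assumes the whitespace-separated column layout shown above; the function names and the 1000 Mbps expectation are illustrative assumptions. It flags NICs that are down or that negotiated a lower speed than expected. Note that a link-state check alone would not have caught the failure described earlier in this thread, where both Production NICs showed as up.

```python
def parse_nics(text):
    """Parse `esxcfg-nics -l` style output into a list of dicts, one per vmnic."""
    nics = []
    for line in text.splitlines():
        # Columns: Name PCI Driver Link Speed Duplex MAC MTU Description
        fields = line.split(None, 8)
        # Skip the header row, shell prompts and blank lines.
        if len(fields) < 8 or not fields[0].startswith("vmnic"):
            continue
        nics.append({
            "name": fields[0],
            "driver": fields[2],
            "link": fields[3],
            "speed_mbps": int(fields[4].replace("Mbps", "")),
            "duplex": fields[5],
        })
    return nics

def flag_suspect(nics, expected_mbps=1000):
    """Return (name, reason) pairs for NICs that are down or slower than expected."""
    issues = []
    for nic in nics:
        if nic["link"] != "Up":
            issues.append((nic["name"], "link down"))
        elif nic["speed_mbps"] < expected_mbps:
            issues.append((nic["name"], "negotiated %dMbps" % nic["speed_mbps"]))
    return issues

if __name__ == "__main__":
    # Three rows copied from the listing above.
    sample = """\
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic1 0000:01:00.01 tg3 Up 1000Mbps Full d4:ae:52:a5:e4:ff 1500 Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic2 0000:02:00.00 tg3 Up 100Mbps Full d4:ae:52:a5:e5:00 1500 Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
vmnic3 0000:02:00.01 tg3 Down 0Mbps Half d4:ae:52:a5:e5:01 1500 Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet
"""
    for name, reason in flag_suspect(parse_nics(sample)):
        print(name, reason)
```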

~ # esxcfg-vswitch -l

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0     128        5           128               1500  vmnic0,vmnic4

  PortGroup Name      VLAN ID  Used Ports  Uplinks
  VMotion             0        1           vmnic0,vmnic4
  Management Network  0        1           vmnic0,vmnic4

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     128        8           128               1500  vmnic1,vmnic5

  PortGroup Name      VLAN ID  Used Ports  Uplinks
  Production          0        5           vmnic1,vmnic5

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch2     128        4           128               1500  vmnic6,vmnic2

  PortGroup Name      VLAN ID  Used Ports  Uplinks
  DMZ                 0        1           vmnic6,vmnic2

~ #
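To tie the uplink assignments above to the link-redundancy alarm idea discussed earlier, here is a small Python sketch. It assumes the `esxcfg-vswitch -l` layout shown above and switch names without spaces; the function names and the link-state dictionary are hypothetical. It reports any vSwitch left with fewer than two live uplinks.

```python
def parse_vswitch_uplinks(text):
    """Map each vSwitch name to its uplink list from `esxcfg-vswitch -l` style output."""
    switches = {}
    expect_switch_row = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Switch Name"):
            # The next non-blank line describes the switch itself.
            expect_switch_row = True
            continue
        if expect_switch_row and stripped:
            # Row layout: name, num ports, used ports, configured ports, MTU, uplinks.
            fields = stripped.split()
            switches[fields[0]] = fields[5].split(",") if len(fields) > 5 else []
            expect_switch_row = False
    return switches

def redundancy_report(switches, link_state):
    """Return {switch: [live uplinks]} for switches with fewer than two uplinks up."""
    at_risk = {}
    for name, uplinks in switches.items():
        live = [nic for nic in uplinks if link_state.get(nic) == "Up"]
        if len(live) < 2:
            at_risk[name] = live
    return at_risk

if __name__ == "__main__":
    sample = """\
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch1 128 8 128 1500 vmnic1,vmnic5
PortGroup Name VLAN ID Used Ports Uplinks
Production 0 5 vmnic1,vmnic5
"""
    # Hypothetical link states, e.g. taken from `esxcfg-nics -l`.
    states = {"vmnic1": "Down", "vmnic5": "Up"}
    print(redundancy_report(parse_vswitch_uplinks(sample), states))
```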

~ # ethtool -i vmnic0
driver: tg3
version: 3.123b.v50.1
firmware-version: FFV7.2.14 bc 5720-v1.25
bus-info: 0000:01:00.0

Not much use, but I can't upload the vm-support files :( I would need the OK from the customer.

a_nut_in
Expert

Thanks

You could try upgrading the driver to "tg3 version 3.129d.v50.1" and see if the issue is still reproducible.

A firmware upgrade to 7.4.8 might help as well, if this is a firmware/driver issue.

The firmware details for that release mention "Bugfixes".

Might help if this is one of the reported issues :)

Jonathan_Stroud
Contributor

Yeah, I think the best thing to do is update the driver and firmware and see if this fixes the issue. Random one!

Thanks for your help again

a_nut_in
Expert

http://www.dell.com/support/drivers/us/en/04/Product/poweredge-r820
