VMware Cloud Community
alankoh
Enthusiast

vSphere HA - restart VM when production network is down

Hi,

I have 2 vSwitches in my current vSphere HA cluster, with host monitoring turned on:

vswitch1 - vmk1 - ESXi network - management service turned on

vswitch2 - production network - no other services turned on

If the production network goes down due to a NIC failure, is there any way to restart the VM on another host?

I understand that vSphere HA uses the management traffic for its heartbeats, but I do not have the management service turned on for vswitch2.

Should I turn on the management service on vswitch2 and turn it off on vswitch1?

What will happen if both vswitch1 and vswitch2 have the management service turned on?

Thank you

11 Replies
vXav
Expert

The management service is what gives your host an IP to manage it, so you don't need to enable it anywhere else.

By the looks of your question, I assume you only have one NIC in each vSwitch? If that is the case and you can't add more, I suggest you keep a single vSwitch with both NICs, containing a management portgroup and a VM portgroup. On the management portgroup, set NIC2 as standby; on the VM portgroup, set NIC1 as standby. That way you get redundancy for both.

Other than that, there is no feature to restart the VMs in case of a NIC problem. The closest scenario I can think of is host isolation, but it doesn't apply in this context.
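
To make the crossed active/standby layout above a bit more concrete, here is a minimal Python sketch. It is only a model of the failover order, not anything that talks to ESXi. The names `vmnic0` and `vmnic1` are assumptions standing in for NIC1 and NIC2, and `PortGroup` / `carrying_uplink` are made-up helpers for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortGroup:
    """Toy model of a portgroup's explicit failover order."""
    name: str
    active: tuple[str, ...]
    standby: tuple[str, ...]

    def carrying_uplink(self, failed: set[str]) -> Optional[str]:
        """Return the first healthy uplink (active first, then standby)."""
        for nic in self.active + self.standby:
            if nic not in failed:
                return nic
        return None  # every uplink in the team is down


# Assumption: vmnic0 = "NIC1" and vmnic1 = "NIC2" from the post above.
MANAGEMENT = PortGroup("Management", active=("vmnic0",), standby=("vmnic1",))
VM_NETWORK = PortGroup("VM Network", active=("vmnic1",), standby=("vmnic0",))

if __name__ == "__main__":
    for failed in (set(), {"vmnic0"}, {"vmnic1"}, {"vmnic0", "vmnic1"}):
        for pg in (MANAGEMENT, VM_NETWORK):
            print(sorted(failed), pg.name, "->", pg.carrying_uplink(failed))
```

With both NICs healthy, each portgroup gets its own uplink. With one NIC down, both portgroups share the surviving one, and traffic only stops when both NICs are down.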

IRIX201110141
Champion

ESXi has built-in network redundancy as soon as you have 2 or more uplinks assigned to a vSwitch and portgroup. By the way, network redundancy is a requirement for vSphere HA: without it you will get a warning.

So how many NICs does your ESXi host have? In your current vNetwork setup you need at least 4 NICs. If you have only 2 in total, you should consolidate all portgroups on vSwitch0.

Regards,
Joerg

depping
Leadership

HA does not have the capability to restart VMs when the VM Network itself is down. Even if you added a second management interface on the same switch, it would probably still be on a different VLAN/network segment. So if only the VM Network failed, this would still not result in a restart.

alankoh
Enthusiast

Hi all,

Thanks for all your replies.

I actually do have 2 pNICs for the management vSwitch and 2 pNICs for the VM network vSwitch.

Today one of the pNICs on the VM network went down, and I was asked whether we would be able to restart the VM automatically on another host.

I read up on vSphere HA, and the closest thing to restarting a VM is host isolation, but that is based on the management network.

So I am wondering if I could enable the management service on the VM network, so that the VM is restarted on another host should all the pNICs in the VM network fail.

Thank you

depping
Leadership

If you add a management network on the other vSwitch then you have 2, and as long as either of the two works, HA will not see this as an isolation event. The only way you could do this is with a single management network that sits on the same network segment as your VM network, which I normally would not recommend, as that goes against most security best practices!
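
A rough way to picture the point above, as a small Python sketch. This is a deliberately simplified model and not how HA is implemented: real HA also looks at isolation addresses and datastore heartbeats, which are ignored here. The function names and the `vmk0` / `vmk1` labels are made up for illustration.

```python
from __future__ import annotations


def host_is_isolated(mgmt_networks_up: dict[str, bool]) -> bool:
    """Simplified: a host counts as isolated only if NO management network works."""
    return not any(mgmt_networks_up.values())


def ha_acts_on_vms(mgmt_networks_up: dict[str, bool],
                   vm_network_up: bool,
                   isolation_response_enabled: bool) -> bool:
    """Would the isolation response fire in this toy model?

    vm_network_up is accepted but deliberately unused: the state of the
    VM network plays no part in the isolation decision.
    """
    return isolation_response_enabled and host_is_isolated(mgmt_networks_up)


if __name__ == "__main__":
    # VM network down, management fine: nothing happens.
    print(ha_acts_on_vms({"vmk0": True}, vm_network_up=False,
                         isolation_response_enabled=True))
    # Two management networks, one still up: still not isolated.
    print(ha_acts_on_vms({"vmk0": False, "vmk1": True}, vm_network_up=False,
                         isolation_response_enabled=True))
```

So adding a second management interface makes an isolation event less likely, not more, which is the opposite of what the original question was hoping for.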

alankoh
Enthusiast

Thanks for the reply. So can I say that host isolation is not meant for this scenario, and that the right way is to have multiple NICs for the VM network and turn on alerts for when a pNIC goes down?

But again, why would anyone want host isolation to restart the VMs when the management network is down, as long as the production VM network is up?

IRIX201110141
Champion

Correct!

If you have multiple NICs for your VM Network as Active/Active or Active/Standby, and the cables go to different pSwitches, you have perfect network redundancy and will most likely never see downtime here. And yes, you can configure alarms (enabled by default) for reduced and lost network redundancy within vCenter.

The HA "management" network is only used to reach the witness and find out whether a host itself is separated from the cluster or not. Depending on the result, there is an option in the HA configuration to shut down the VMs or leave them powered on. If you only have 2 hosts in a cluster, the setting will most likely differ from a cluster with 3 or more.

Restarting a VM because of lost host networking is not a general "feature" of an ESXi host and vSphere HA. Maybe you should take a look at the Dell Integration for vCenter, because there is something like "Predictive HA" where you can specify an action when a particular host problem occurs. Similar to executing a script when an alarm is triggered.

Regards,
Joerg

depping
Leadership


@alankoh wrote:

Thanks for the reply. So can I say that host isolation is not meant for this scenario, and that the right way is to have multiple NICs for the VM network and turn on alerts for when a pNIC goes down?

But again, why would anyone want host isolation to restart the VMs when the management network is down, as long as the production VM network is up?


Host Isolation is used by customers where it is very likely that a failure on the management network also means that the VM or the storage is impacted. You have the ability to specify whether you want to take action during an isolation. If it is unlikely that the VM is impacted, and it is unlikely that storage is impacted, then you simply set the isolation response to disabled.

depping
Leadership


@IRIX201110141 wrote:

Restarting a VM because of lost host networking is not a general "feature" of an ESXi host and vSphere HA. Maybe you should take a look at the Dell Integration for vCenter, because there is something like "Predictive HA" where you can specify an action when a particular host problem occurs. Similar to executing a script when an alarm is triggered.

Regards,
Joerg


You mean "Proactive HA". This does indeed use a vendor plugin, and will typically allow you to respond to NIC down events etc. But note that it responds to hardware-based failures. If there's a config failure, or something happens on the switch, then Proactive HA doesn't help either.

IRIX201110141
Champion

Yeah... "Proactive". I forgot the name because we have disabled it.

Regards,
Joerg

vXav
Expert

This would be the right approach indeed.

Lots of great feedback to choose from in these replies. I'll just add the following.

The uplinks in a team should be on different cards (e.g. 1 on the mobo integrated NIC and 1 on a PCI NIC). That way, if a NIC dies you don't lose all the uplinks. As someone rightly said, you should also patch your NICs to different physical switches for added redundancy (best practice anyway).

Consistent driver/firmware across the cluster is recommended as well.

If you get a network event with all that, it's probably gonna be outside the scope of vSphere and affect more than one host so...

Note that, on top of pNIC status alerts, you should also have monitoring of your workloads (Nagios, Zabbix...) so you know when the guest is impacted, as it may not show in vCenter (a VLAN removed or missing from a switch port trunk, for instance).
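
As a very small illustration of the guest-level monitoring idea above, here is a sketch of the kind of TCP check a tool like Nagios or Zabbix performs for you. It is not a replacement for proper monitoring. The function name `guest_reachable` is made up, and any hostname and port you pass in are your own assumptions about which service the guest exposes.

```python
import socket


def guest_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A missing VLAN on a trunk would typically show up here as a timeout,
    even though the pNIC link still looks healthy from vCenter's side.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

You would call it as `guest_reachable("app01.example.com", 443)` (hypothetical hostname) from a machine outside the cluster, so the check follows the same path as real client traffic.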