Re: VMware HA Network Redundancy

xadamz23 · ‎04-23-2008

Let me first start off by saying that for the last 3 weeks all I have been doing is setting up a 4 node cluster, reading best practices, reading forum posts, etc., so my brain is a little fried right now.

My environment consists of a 4 node cluster. Each server has a 10 Gb nic with a port group for the service console and a port group for the vm traffic. I was tired of seeing the "Host hostname currently has no management network redundancy" message, so I decided to patch down the onboard 1 Gb nic and create a second service console. The redundancy message is now gone. So far so good.

I wanted to see what would happen if I pulled the network cable from the 10 Gb nic. I had one vm running on the host and it was powered back up on a different ESX host, which is what I want. I noticed in the VI client that the ESX host was no longer responding, which makes sense: when I initially added the host to the cluster I provided the IP address of the 10 Gb nic, and since that nic now has no network connectivity, VirtualCenter can't communicate with the host.

According to the documentation, the HA heartbeat is sent over all service console port groups. That being said, the other hosts in my cluster should still be able to communicate with this host. I hope I haven't lost anyone so far.

So I guess my question is, what do I really gain by having a second service console?

I've attached a screenshot of my network setup to help you visualize my config.

aguacero · ‎04-23-2008

Why don't you add the second physical NIC to vSwitch0? That would give you redundancy on the VM side of things as well as for connectivity to the VC. This is assuming you only have two physical NICs on the ESX hosts.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

mike_laspina · ‎04-23-2008

Hi,

The purpose of the secondary HA interface is to cover events such as a bound IP being dropped or a patch cable on the console network being disconnected.

If HA can still see the other hosts it will behave differently.

For example, if you allow the HA host to shut down VMs on isolation and a cable is unplugged, this would be nasty with only one console network.

Message was edited by: mike.laspina reworded failure to isolation

http://blog.laspina.ca/ vExpert 2009

xadamz23 · ‎04-23-2008

aguacero,

I decided not to go this route for a couple of reasons:

1. If the 10 Gb nic loses connectivity, the vm traffic goes from having 10 Gb bandwidth to 1 Gb.

2. Our network admin only set up the trunk on the 10 Gb switchport. Of course, I could have him set it up on the 1 Gb port as well.

xadamz23 · ‎04-23-2008

"For example if you allow the HA host to shutdown VM's on failure and a cable is unplugged this would be nasty with one console network."

Can you expand on that?

mike_laspina · ‎04-23-2008

Yes. There is an option called shutdown VMs on isolation. If left at the default, the isolated host will shut down all the VMs it is hosting on the basis that it cannot contact the other hosts within ~15 seconds.
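One rough way to picture why a second service console matters for this option is the toy model below. It is a sketch in Python, not VMware's actual HA agent logic. It assumes a host considers itself isolated only when every service console network has been silent for longer than the timeout; the name `is_isolated`, the mapping of network names to last-heartbeat timestamps, and the fixed 15-second value are illustrative assumptions.

```python
from typing import Mapping

# Assumption: the "~15 seconds" figure quoted in this thread.
ISOLATION_TIMEOUT_S = 15.0


def is_isolated(last_heartbeat: Mapping[str, float], now: float,
                timeout: float = ISOLATION_TIMEOUT_S) -> bool:
    """Return True only if every service console network has been silent
    for longer than `timeout` seconds.

    `last_heartbeat` maps a (hypothetical) network label to the time, in
    seconds, at which a heartbeat was last seen on that network.
    An empty mapping counts as isolated, since no network is reachable.
    """
    return all(now - seen > timeout for seen in last_heartbeat.values())
```

With a single console network last heard from 20 seconds ago, `is_isolated({"Service Console": 100.0}, now=120.0)` returns `True`. Add a second network that was heard from 2 seconds ago and the same call returns `False`, so under this model the VMs would be left running.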


xadamz23 · ‎04-23-2008

Ok, that is what I tested. In my first post I stated that I had one vm running and it was automatically powered off and then powered back up on another host. I have also tested this when I had just one service console port group on the 10 Gb nic and got the same result. So I guess I am not understanding your point.

Based on what you just said, shouldn't my vm have remained powered on? Since I only disconnected the 10 Gb nic, the service console on the other nic was still active, so the host should have been able to communicate with the other hosts in the cluster using the HA heartbeat.

mike_laspina · ‎04-23-2008

I apologize, I did not read your post very carefully.

This would indicate a configuration issue with the second adaptor or a name resolution problem.

Each host should be able to resolve every other host by FQDN and by short name. It is a good idea to create static host lookup entries if you cannot guarantee that a DNS server will be available.

Additionally, since the new network components were not present during the HA client configuration, you would need to make sure that you issue a reconfigure for HA from the GUI (right-click on the host).
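The name-resolution requirement can be checked with something like the sketch below. It is only an illustration: it assumes Python 3 is available on the machine where it runs, the function name `check_resolution` is made up for this example, and the `resolver` parameter exists purely so the lookup can be swapped out for testing. By default it uses the standard library's `socket.gethostbyname_ex`.

```python
import socket
from typing import Callable, Dict, Iterable, List


def check_resolution(names: Iterable[str],
                     resolver: Callable[[str], tuple] = socket.gethostbyname_ex
                     ) -> Dict[str, List[str]]:
    """Map each name to the list of IPv4 addresses it resolves to.

    A name that cannot be resolved maps to an empty list.
    """
    results: Dict[str, List[str]] = {}
    for name in names:
        try:
            # gethostbyname_ex returns (canonical_name, aliases, addresses).
            _canonical, _aliases, addresses = resolver(name)
            results[name] = list(addresses)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError.
            results[name] = []
    return results
```

Calling `check_resolution(["rmhesx03.rhsnet.org", "rmhesx03"])` would test both the FQDN and the short name for one host (the short name here is assumed to be the first label of the FQDN). Any name that maps to an empty list is one that needs a DNS or hosts-file entry.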

Message was edited by: mike.laspina - added reconfigure


xadamz23 · ‎04-23-2008

I had thought about DNS being an issue because I know that HA is dependent on name resolution. But I don't understand something. The service consoles have different IP addresses. You are not allowed to configure 2 IP addresses to resolve to the same name in DNS. The IP of the 10 Gb nic is 10.254.23.152 and it resolves to rmhesx03.rhsnet.org. The IP of the other service console is 10.254.23.158. I don't see a place in VirtualCenter to set another hostname. So how will my 2nd IP ever resolve to anything?

Also, if the 10 Gb nic loses connectivity, I actually want the vms to power down, because my vm traffic is going through this nic and they would lose connectivity. It does me no good to leave them powered up if they can't communicate on the network.

Edited - I did do the HA reconfigure after adding the 2nd service console.

aguacero · ‎04-23-2008

Add a second DNS entry (on the DNS server) for rmhesx03.rhsnet.org with the IP 10.254.23.158. Your ESX host will then have 2 DNS entries.


xadamz23 · ‎04-23-2008

As I stated in a previous post, DNS does not allow you to create two entries with the same name pointing to two different IP addresses.

mike_laspina · ‎04-23-2008

I frequently create host names with multiple IP addresses on my MS DNS servers. It will need a reverse lookup zone if you don't already have one.

But you will need to set the TTL to a very short time for that to work and possibly increase the HA time-out.

I would only do that if you have to deal with a lot of hosts. You only have 4, so the static method is your best route.

To use static entries you must edit the /etc/hosts file on the ESX servers and the %windir%\system32\drivers\etc\hosts file on the VC.

Message was edited by: mike.laspina added static file locations


xadamz23 · ‎04-23-2008

I tested this before reading your post and you are right, you can have multiple IPs assigned to the same host.....my mistake.

But if the 10 Gb nic loses connectivity, I actually want the vms to power down, because my vm traffic is going through this nic and they would lose connectivity. It does me no good to leave them powered up if they can't communicate on the network. Why would you want your vms to remain powered on if they can't communicate on the network?

mike_laspina · ‎04-23-2008

Yes, you do have a single point of failure, and if you are not going to add any redundancy there, the second service console will not be of great benefit. But it is a good learning process nonetheless.


xadamz23 · ‎04-23-2008

Thanks for your responses....I appreciate it. One last thing. I was doing a constant ping of 10.254.23.152 (10 Gb nic) and 10.254.23.158 (1 Gb nic). I unplugged the 10 Gb nic and got request timed out for both pings. I plugged it back in and unplugged the 1 Gb nic but still got a reply from both pings.

What is up with that?

kjb007 · ‎04-23-2008

Where were you pinging from? If you have 2 service console interfaces, only 1 can have a default gateway. If you were pinging from an address outside of the 10.254.23.x network, then you will time out on the interface which does not have a gateway.
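To check whether a given ping source depends on the gateway at all, a small sketch using Python's standard `ipaddress` module is shown here. The function name `needs_gateway` is made up for illustration, and the 255.255.255.0 mask is assumed from the "10.254.23.x" network described above.

```python
import ipaddress


def needs_gateway(source_ip: str, interface_ip: str, netmask: str) -> bool:
    """Return True if replies from the interface to `source_ip` must be
    routed through a gateway, i.e. the source is outside the interface's
    own subnet."""
    # strict=False lets us pass a host address and still get its network.
    network = ipaddress.ip_network(f"{interface_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(source_ip) not in network
```

For example, `needs_gateway("10.254.23.10", "10.254.23.158", "255.255.255.0")` is `False` because the source sits on the same subnet, while a source such as 192.168.1.50 (a hypothetical workstation address) gives `True`, which is the case where the interface without a default gateway cannot answer.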

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

xadamz23 · ‎04-23-2008

I was pinging from a workstation that is not on the 10.254.23.x network. Here is the output of ifconfig:

vmnic0    Link encap:Ethernet  HWaddr 00:0C:FC:00:34:60    # 10 Gb nic
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19907322 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2221573 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2630217612 (2508.3 Mb)  TX bytes:638973915 (609.3 Mb)
          Interrupt:193

vmnic1    Link encap:Ethernet  HWaddr 00:1C:C4:94:8D:C8    # 1 Gb nic
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:900 (900.0 b)
          Interrupt:121 Memory:f8000000-f8012100

vswif0    Link encap:Ethernet  HWaddr 00:50:56:4A:6A:05
          inet addr:10.254.23.152  Bcast:10.254.23.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3310082 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2210079 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:722313233 (688.8 Mb)  TX bytes:627418928 (598.3 Mb)

vswif1    Link encap:Ethernet  HWaddr 00:50:56:46:6D:49
          inet addr:10.254.23.158  Bcast:10.254.23.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:168 (168.0 b)

mike_laspina · ‎04-23-2008

You should really ping from the hosts directly.

vswif1    Link encap:Ethernet  HWaddr 00:50:56:46:6D:49
          inet addr:10.254.23.158  Bcast:10.254.23.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:168 (168.0 b)

This interface is not receiving packets.

Issue the following console commands and post the output so we can see what's up on the uplinks.

esxcfg-vswif -l

esxcfg-vswitch -l


xadamz23 · ‎04-23-2008

It is receiving packets:

vswif1    Link encap:Ethernet  HWaddr 00:50:56:46:6D:49
          inet addr:10.254.23.158  Bcast:10.254.23.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2587 (2.5 Kb)  TX bytes:336 (336.0 b)

Here is the output you requested:

# esxcfg-vswif -l
Name    Port Group         IP Address     Netmask        Broadcast      Enabled  DHCP
vswif0  Service Console    10.254.23.152  255.255.255.0  10.254.23.255  true     false
vswif1  Service Console 2  10.254.23.158  255.255.255.0  10.254.23.255  true     false

# esxcfg-vswitch -l
Switch Name    Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0       64         5           64                1500  vmnic0

  PortGroup Name     VLAN ID  Used Ports  Uplinks
  10.255.22.x        110      0           vmnic0
  10.254.23.x        116      0           vmnic0
  10.255.23.x        115      0           vmnic0
  10.200.23.x        3315     0           vmnic0
  10.201.23.x        3316     0           vmnic0
  10.253.23.x        117      0           vmnic0
  10.254.22.x        111      0           vmnic0
  10.100.x.x         2        0           vmnic0
  Service Console    116      1           vmnic0
  VMotion            116      1           vmnic0

Switch Name    Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1       64         4           64                1500  vmnic1

  PortGroup Name     VLAN ID  Used Ports  Uplinks
  Service Console 2  0        1           vmnic1

xadamz23 · ‎04-23-2008

So I decided to try my constant ping again, but this time from another host in the cluster. After I unplugged the 10 Gb nic I got "Destination Host Unreachable" from both pings.
