VMware Cloud Community
Justin2501
Contributor

Connectivity issues between hosts/vms

Hi there,

We recently upgraded our single-server mail environment to a fully redundant mail environment.

Sadly, it isn't redundant yet.

We are facing connectivity issues between some of the VMs, which makes it impossible for our environment to function correctly.

I saw a post about problems with Cisco and VMware/ESXi, but sadly I can't find it anymore.

The firewalls on the Windows machines are off, and to be sure I enabled File and Printer Sharing (ICMP).

They are all in the same IP range, of course, and all use the same VLAN ID in ESXi/vSphere.

Our current environment:

ESXi Host 151

DC02

HC02

ESXi Host 153

DB02

ESXi Host 162

DC01

HC01

DB01

DC = Domain controller

HC = HubCas

DB = Database

Issues:

HC02 and DC01 cannot communicate with each other (ping, discovery, etc.)

HC02 and DB01 cannot communicate with each other (ping, discovery, etc.)

DC02 and HC01 cannot communicate with each other (ping, discovery, etc.)

I've searched everywhere and checked (I hope) everything, but I can't figure it out.

So I was hoping someone could help me a bit further.

If there is any information I forgot, feel free to ask.

Kind regards,

Justin

5 Replies
jrmunday
Commander

Hi Justin,

We need more information about your setup before looking at any specifics.

- How is your physical network configured?

- How is your host networking configured (vSwitches, Port Groups, teaming and failover policies, etc.)?

Thanks,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
Justin2501
Contributor

Hi,

- How is your physical network configured?

The physical network is entirely connected to a Cisco switch, which is also the firewall for all the machines.

I must be honest: I lack Cisco experience. That part is taken care of by someone else over here.

- How is your host networking configured (vSwitches, Port Groups, teaming and failover policies, etc.)?

The virtual machines are connected to a virtual switch in vSphere (name: MAIL), which is a member of VLAN ID **.
From there, it can push its traffic through the 2 physical adapters.

As far as I can see, there are no other policies or failover settings configured.

EDIT: I noticed something. Host 151 does not have "Enable IPv6 support on this host system" enabled.

Kind regards,

Justin

unsichtbare
Expert

So what's often confusing about vSphere networking is that more is not always better.

What I mean is that more physical uplinks associated with a vSwitch can cause disconnections if the uplinks are not all connected to the same network (VLANs). vSphere uses two teaming policies by default:

  • Route based on the originating virtual port ID
  • Link status only

Combined, these two policies can cause the behavior you are describing if one or more of the uplinks (NICs) connected to a vSwitch is on a live port but not the correct network. Basically, "Route based on the originating virtual port ID" tells vSphere that each VM uses only one of the available uplinks (NICs), to prevent unintended loops in your network. "Link status only" tells vSphere to leave an uplink (NIC) active as long as it is connected to a live port, regardless of connectivity to the actual networks.
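If you have SSH access to the hosts, you can verify these two policies with esxcli. A rough sketch (the vSwitch name vSwitch0 is a placeholder; yours may differ, and the MAIL port group name is taken from this thread):

```shell
# Run on each ESXi host via SSH.

# Show the vSwitch-level teaming/failover policy: load balancing
# method, failure detection (link status vs. beacon), and the
# active/standby uplink lists.
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0

# Show the same policy at the port-group level, which overrides
# the vSwitch setting when configured.
esxcli network vswitch standard portgroup policy failover get --portgroup-name=MAIL
```

If the three hosts show different load-balancing or failure-detection settings, that alone is worth fixing before anything else.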

Ideas, while running a continuous ping from one of the VMs:

  1. Since we don't know how many uplinks you have in each ESXi host (hopefully two or more), try rotating through the NICs on each host by making all but one of them at a time "unused" in the vSphere client. If this restores/alters connectivity, fix your physical network.
  2. If you have three or more NICs, consider using Beacon Probing as the failure detection method.
  3. Make absolutely certain that every NIC in a team is uniformly connected to the same broadcast domain/VLANs.
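Step 1 above can also be done from the command line, which avoids clicking through the vSphere client on a live system. A sketch, assuming a standard vSwitch named vSwitch0 and uplinks vmnic0/vmnic1 (placeholder names; check yours first):

```shell
# List the physical uplinks and their link state/speed.
esxcli network nic list

# List the standard vSwitches and which uplinks each one uses.
esxcli network vswitch standard list

# Force all traffic onto a single uplink by making vmnic0 the only
# active uplink (vmnic1 is effectively taken out of service).
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --active-uplinks=vmnic0

# Restore both uplinks afterwards.
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --active-uplinks=vmnic0,vmnic1
```

Watching which uplink breaks the continuous ping tells you which physical switch port is misconfigured.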
+The Invisible Admin+ If you find me useful, follow my blog: http://johnborhek.com/
Justin2501
Contributor

Hi,

I just checked the hosts, and all 3 of them have 2 uplinks (physical NICs).

On hosts 153 and 162, both uplinks have an observed IP range, which is fine if I'm right.

But on host 151, only 1 of the 2 uplinks has an observed IP range; the other one shows "None".

Host 153 does not have any connectivity issues, and DB02, which is on this host, can contact all VMs.

So all I can think of is that host 151 or host 162 is the issue, but I'm still not sure what it is.

Sadly, I can't disable the network on the VMs because it's a live environment.

All (teamed) NICs are in the same VLAN.

unsichtbare
Expert

So it is obviously not the best idea to make changes to a production system that is not in maintenance mode.

That being said, the NIC without an observed IP range probably has roughly 50% of the virtual NICs assigned to it. If you disable that NIC, connectivity for the remaining VMs may be restored.
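Disabling that uplink can be done administratively without touching the cabling. A sketch, assuming the suspect NIC on host 151 is vmnic1 (a placeholder name; confirm which uplink shows no observed IP range first):

```shell
# Bring the suspect uplink down administratively on host 151.
# VMs pinned to it will fail over to the remaining active uplink.
esxcli network nic down -n vmnic1

# Bring it back up once the physical switch port has been fixed.
esxcli network nic up -n vmnic1
```

Because "Route based on the originating virtual port ID" pins each VM to one uplink, taking the bad uplink down forces those VMs onto the healthy one, which should restore connectivity while the switch-side configuration is corrected.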

+The Invisible Admin+ If you find me useful, follow my blog: http://johnborhek.com/