VMware Cloud Community
mbenson84
Contributor

Unable to ping/access older ESXi hosts between VLANs

Hey all,

I've run into an interesting issue with some of my ESXi 5.5 U2 hosts. It doesn't really affect critical operations of the VMs running on this network, but it's a pain from the admin side. We recently had a major network hardware upgrade/reconfiguration in this office, and while that was being done the hosts were also upgraded and reconfigured.

This office has some dated hardware; the main hosts here are 2 x IBM System x3650 M3 and 3 x IBM System x3650 machines.

The hosts are connected to a SAN: the M3 boxes via SAS, the older boxes via iSCSI.

Physical and virtual servers including ESXi hosts in this office run on the Server VLAN, workstations etc run on the Workstation VLAN.

I started fresh and reloaded ESXi 5.5.0 (build 2068190) onto all of the hosts and created a new vCenter server.

This all went smoothly: all hosts connected up fine, the VMs are running with no problems, and the web interface works fine.

Installed the vSphere Client on my local PC and connected fine, so I thought it was all done and dusted.

I discovered shortly afterwards that I was unable to connect to the console of any of the VMs running on the older hosts when using the client on my local PC (Workstation VLAN): "Unable to connect to the MKS: Failed to connect to server <hostname>:902". For any VMs on the newer M3 hosts I can open a console just fine. When running the client on our network ops server (Server VLAN) I can open a console to any VM on any host. The web client suffers from the same issue: VMs on M3 hosts connect, VMs on the older hosts give the error. Directly accessing the running VMs is no problem between VLANs, however they are running on different Intel I340 NICs on a 4 x NIC PCI card added to each server.

After some troubleshooting I noted that from the Workstation VLAN you cannot ping any of the older hosts, while the M3 hosts reply as expected.

On the Server VLAN you can ping all 5 hosts no problem.
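For anyone who wants to reproduce the checks, something like this quick Python 3 sketch, run from a machine on the Workstation VLAN, covers both tests (the IP is a placeholder for one of the older hosts' management addresses); TCP 902 is the port the console error complains about:

import socket
import subprocess

HOST = "192.168.131.22"   # placeholder: management IP of one of the older hosts

# ICMP check (Windows ping syntax; use "-c" instead of "-n" on Linux)
ping = subprocess.run(["ping", "-n", "2", HOST], capture_output=True)
print("ping:", "OK" if ping.returncode == 0 else "FAIL")

# TCP 902 check: the port the vSphere Client console (MKS) connects to
try:
    with socket.create_connection((HOST, 902), timeout=5):
        print("tcp/902: open")
except OSError as err:
    print("tcp/902: blocked or unreachable:", err)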

The NICs being used for the management network are Broadcom BCM5708C NetXtreme II GigE, and I thought perhaps the NIC drivers might need updating. After some searching I found this: https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI55-BROADCOM-BNX2X-271070V557&product...

The drivers included in that package appeared to be newer than the ones that were installed originally.

I attempted to update these via the VUM plugin, which seemed to work; however, the connectivity problem persists after a host reboot.

This is what currently shows post-patching under Hardware Status > Software on the older host:

[Screenshot attached: bnx2.JPG]
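If anyone wants to double-check the same thing from the command line rather than the Hardware Status page, something along these lines should work over SSH. This is only a rough Python/paramiko sketch; it assumes SSH is enabled on the host, and the IP and credentials are placeholders:

import paramiko

HOST = "192.168.131.22"              # placeholder: management IP of an older host
USER, PASSWORD = "root", "********"  # placeholder credentials

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, password=PASSWORD)

# Show driver name/version of the management NIC, plus any installed bnx2 VIBs
for cmd in ("esxcli network nic get -n vmnic0",
            "esxcli software vib list | grep -i bnx2"):
    stdin, stdout, stderr = client.exec_command(cmd)
    print("#", cmd)
    print(stdout.read().decode())

client.close()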

In the previous environment, accessing the console worked fine across VLANs on ESXi 5.0 for all hosts, but I did not set that up, so I'm not sure whether any customization was needed. I'm not super experienced with the deeper inner workings of VMware, so I'm not sure where to go from here. Or is this ancient hardware just no longer properly supported in 5.5?

At this stage I was going to try running the management network on one of the Intel-based NICs to see if that made any difference.

Any suggestions would be most welcome.

Cheers,

Mark.

11 Replies
NuggetGTR
VMware Employee

Considering it all works fine when you're sitting in the Server VLAN, which NICs the management is run on doesn't matter, as it's working. The issue sounds like a network one between VLANs. It sounds more like a firewall between the Workstation VLAN and the ESXi hosts is blocking port 902.

Cheers

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
mbenson84
Contributor

How do you explain it working perfectly fine for the newer M3 hosts regardless of which VLAN you have the client on? Why is it not being blocked for them?

To show it in a simpler way -

Server VLAN:

  • Guests on x3650 M3 hosts = console accessible
  • Guests on x3650 hosts = console accessible
  • Ping to x3650 M3 host IPs = successful reply
  • Ping to x3650 host IPs = successful reply

Workstation VLAN:

  • Guests on x3650 M3 hosts = console accessible
  • Guests on x3650 hosts = console error
  • Ping to x3650 M3 host IPs = successful reply
  • Ping to x3650 host IPs = no reply

All 5 of the hosts are connected to the same switch and are behind the same firewall.

All 5 hosts have the same version of ESXi 5.5 U2 installed on them.

The only thing different between the M3 and older hosts is the physical hardware they are built with.

I'm not saying it definitely isn't the firewall (which is managed by an external company; I don't have access to it or the ability to make changes to it), it's just very strange that it works fine for the newer machines.

NuggetGTR
VMware Employee

Everything in the Server VLAN is fine, which tells me that moving management onto a different NIC shouldn't matter. Everything works within the Server VLAN, which I'm guessing is where the management network of the ESXi hosts is connected. Because this part works fine, it tells me something between your workstation and the Server VLAN is the issue.

Next, firewall rules are generally IP-to-IP arrangements, which would explain why some hosts work and some don't. I would guess there is a rule saying the workstation subnet is allowed to reach the ESXi host IP addresses on port 902 etc. If not all the IPs are in this rule, then you would get something similar to what you are seeing. Either that, or the local guest firewall is blocking the connection.

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
mbenson84
Contributor

But that's just the thing... ALL guests are accessible from BOTH VLANs regardless of which host they reside on, old or new. Connectivity is not blocked to any of them in any way.

The host IPs are all close together numerically, so I find it hard to believe the firewall rules are allowing the first two and not the next three sequential IPs.

It's only the management traffic that seems to have an issue crossing VLANs for the older servers. As I said previously, the difference between management and guest traffic is the physical NICs: management traffic runs on the onboard Broadcom dual NICs, guest traffic on the Intel 4 x NIC PCI card.

NuggetGTR
VMware Employee

I may be confused, and apologies if I read it incorrectly, but we were not talking about guest connectivity; we are talking about connecting to the console of the guests that reside on the old ESXi hosts from the Workstation VLAN, correct?

In that case it requires port 902 to be open from your desktop (vSphere Client) in the Workstation VLAN to the management IP of the ESXi host that the VM is residing on. Because all of this works completely fine from one VLAN (Server VLAN) and not from the other VLAN (Workstation VLAN), it appears there is something blocking it.

The only reason I think this is the case is that you have said everything works fine when running the vSphere Client in the Server VLAN. That points to a network issue: it could be a routing problem with an incorrect default gateway on the hosts, or a firewall problem.

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
mbenson84
Contributor

Correct, we are talking about connectivity in general to the IP addresses of the hosts, which includes connectivity to the consoles of the VM guests.

The only reason I mentioned guests was the fact you said this:

Either that or the local guest firewall is blocking the connection.

Did you mean the Windows firewall on my PC? Otherwise I don't understand how it's possible that any of the guest VMs' firewalls could block connectivity from an external workstation PC to the physical host that the VM is running on.

I can see your point about why the console might not be able to connect, but the VMs are almost all Windows-based and most of the firewalls are fully disabled. The Windows firewall is disabled on my workstation by policy as well.

Just to add a bit more info, here is how the IPs are configured (the office also has a standalone x3650 M4 host, which has no connectivity issues across VLANs):

vCenter hosts:

  • x3650 M3 #1     192.168.131.20     Ping from Workstation VLAN = OK
  • x3650 M3 #2     192.168.131.21     Ping from Workstation VLAN = OK
  • x3650 #1        192.168.131.22     Ping from Workstation VLAN = FAIL
  • x3650 #2        192.168.131.23     Ping from Workstation VLAN = FAIL
  • x3650 #3        192.168.131.24     Ping from Workstation VLAN = FAIL

Standalone host:

  • x3650 M4        192.168.131.25     Ping from Workstation VLAN = OK


The gateway set for all of the hosts is the same, and the Test Management Network ping tests on the ESXi hosts themselves all come back OK.
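A quick way to re-run the whole comparison in one go from the workstation side is something like this Python 3 sketch (Windows ping syntax), using the management IPs listed above:

import socket
import subprocess

# Management IPs from the list above
HOSTS = {
    "x3650 M3 #1": "192.168.131.20",
    "x3650 M3 #2": "192.168.131.21",
    "x3650 #1":    "192.168.131.22",
    "x3650 #2":    "192.168.131.23",
    "x3650 #3":    "192.168.131.24",
    "x3650 M4":    "192.168.131.25",
}

def tcp_open(ip, port=902, timeout=3):
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, ip in HOSTS.items():
    ping_ok = subprocess.run(["ping", "-n", "1", ip],
                             capture_output=True).returncode == 0
    print(f"{name:12s} {ip:16s} ping={'OK' if ping_ok else 'FAIL'} "
          f"tcp/902={'open' if tcp_open(ip) else 'blocked'}")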

a_p_
Leadership

Out of curiosity, does the arp cache on the workstation (arp -a) show the ESXi host's MAC address after trying to ping or tracert to the host?
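Something like this (assuming a Windows workstation; the host IP is just an example from your list) would show whether a MAC gets learned at all after a ping attempt:

import subprocess

HOST = "192.168.131.22"   # example: one of the ESXi hosts that fails to ping

# Try to ping the host first (ignore the result), then dump the ARP cache
subprocess.run(["ping", "-n", "1", HOST], capture_output=True)
arp = subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout

entry = [line for line in arp.splitlines() if HOST in line]
print(entry[0] if entry else f"{HOST} not in ARP cache - no MAC was learned")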

André

mbenson84
Contributor

No, the IP doesn't even appear in the list.

So I did make some progress... I have a fourth first-generation x3650 server that isn't in use.

I fired it up and installed ESXi 5.5 U2 from an image that has been customized with the very latest Broadcom drivers added to it.

Assigned it 192.168.131.26 and it's accessible across both VLANs...

Makes no sense to me at all; the same procedure was done with the three other hosts that aren't working properly, just with the standard 5.5 U2 ISO.

The only difference I can come up with is that I didn't get a chance to touch the firmware/BIOS on the other three hosts, and USB was not booting, so I installed them all via CD.

On this fourth host that just worked fine, I updated the firmware/BIOS first, then installed via USB as it would now boot successfully.

I did pull one of the problem hosts out of service, updated the firmware/BIOS, and manually tried updating the drivers through vCenter as I mentioned earlier, but it made no difference.

At this point I think I'm just going to reload ESXi on this host from the USB drive I just had success with and see if that fixes the problem and be done with it.

mbenson84
Contributor

So an interesting update on this.

I haven't had an opportunity to shuffle the VMs and reload ESXi 5.5 on the production hosts yet, but I was just messing around with the spare host and ESXi/vCenter 6.0.

Running as a standalone host, I can ping the host's management IP fine between VLANs.

I connected to the host via my locally installed vSphere Client.

Opened a console to a VM and created a new VM from a Server 2012 R2 template.

Installed vCenter, which is running fine; it is accessible via the local vSphere Client and also via the web client.

At some point during this process I accepted a security certificate for the host.


Everything is working fine at this point, I can open console no problems using the vSphere client.

I log into vCenter and create a new data center and add the host into my environment.

This has to be done via the web client, which works fine... the local vSphere Client gives me this: Call "Datacenter.QueryConnectionInfo" for object "TEST" on vCenter Server "192.168.198.30" failed.

So that was curious... something doesn't seem right already.

At some point during this I'm presented with another certificate for the host saying that the cert has changed, so I accept it and the host joins. (I don't recall this happening in 5.5, but I may be mistaken.)

Now console to the VM ceases to connect: Unable to connect to the MKS: Internal error

Fails to connect via the web client as well: The console has been disconnected. Close this window and re-launch the console to reconnect.

I can still RDP to the VM fine though, so it is still technically accessible, just not via console.

Logging into the host directly with the vSphere Client and attempting a console now yields the same result: Unable to connect to the MKS: Internal error

I tried removing the host from vCenter but the issue persists.

Interestingly if I create a brand new VM I can connect to console just fine.

I thought maybe I'd somehow broken something, so I blew the VM away and started fresh again; as soon as the host is joined to vCenter, exact same symptoms as above.

So what am I missing here?

Anyone got any ideas?

EDIT: Managed to fix this by disabling the VMXNET3 vNIC for the vCenter Windows VM and enabling it again.

Now connectivity is working as expected via both vSphere and web clients.
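For anyone who prefers to script the same disconnect/reconnect rather than doing it through the GUI, this is only a rough sketch using pyVmomi; the user, password and VM name are placeholders, and the vCenter address is the one mentioned above:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# vCenter address from above; user, password and VM name are placeholders
si = SmartConnect(host="192.168.198.30", user="administrator@vsphere.local",
                  pwd="********", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the vCenter Windows VM by name
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "vcenter-vm")

# Locate the VMXNET3 adapter and flip its connected state off, then back on
nic = next(d for d in vm.config.hardware.device
           if isinstance(d, vim.vm.device.VirtualVmxnet3))

for connected in (False, True):
    nic.connectable.connected = connected
    spec = vim.vm.device.VirtualDeviceSpec()
    spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
    spec.device = nic
    WaitForTask(vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[spec])))

Disconnect(si)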

warkaj
Contributor

I've had repeated problems with VMXNET3 NICs inside Windows VMs. It will randomly show a warning on the network icon in the system tray and require a DHCP flip to enable it, which I don't want to do. I've updated VMware Tools and updated the drivers, but it still happens.

Edit: Also, I don't know if you checked the hosts files on your ESXi hosts... might be a dumb question, but I just thought of it.

-ajw

--- If you found this or any other answer helpful, please consider the use of the Helpful or Correct buttons to award points.
DrWhy
Enthusiast

Powering the virtual machines off and on fixed this problem for me.
