tj6
Contributor

VM network not talking to another VM network of the ESXi on the same LAN

Re-posting to the correct thread:

I have two ESXi hosts, each running several VMs. The VMs on one of the hosts (call it ESXi host #2) were able to talk to an application server on ESXi host #1. Networking is very simple: the default VM network. The application server has a real subnet IP (10.x.x.x) statically assigned to it. After about 40-45 days in operation, this communication suddenly stopped working, and as far as I can tell absolutely no changes were made.

Here are some more details:

- ESXi host #1 and the VMs on it can reach the Internet (8.8.8.8) fine.

- ESXi host #2 and the VMs on it can reach the Internet (8.8.8.8) fine.

- ESXi host #1 runs an application server, which is assigned to the 10.x.x.x network. It is not reachable by the clients (VMs) on ESXi host #2.

If I SSH to the ESXi hosts, they can reach each other using the VM network real subnets.

Attaching a simple drawing.

Can someone help me troubleshoot this issue?

Please let me know if this is not the correct forum.

fabio1975
Expert

Hi

In summary:
Currently, the VMs indicated as Services Clients are unable to communicate with the VM indicated as Application Server.
Correct?
From what I see, the client VMs are on a different subnet (192.168.120.x) from the Application Server VM (10.8.70.x).
Correct?

Fabio
BLOG:https://vmvirtual.blog

tj6
Contributor

That's correct. The client VM can no longer reach the 10.8.70.x network, although it was working just fine before. The client VMs can, however, reach Google DNS etc. on the Internet. So north-bound traffic is working, but east-west traffic is not. By east-west I mean traffic between the ESXi hosts sitting on the same LAN.

tj6
Contributor

Additionally, here is where the traceroute ends on the client VM. I am not sure where the 192.168.153.254 hop comes into play.

 

a_p_
Leadership

Where's the routing done? To me this looks like an issue with the router configuration.

André

tj6
Contributor

For the client VMs to get out, that NIC should be responsible for the routing, but I am not sure what to check. Also, to reiterate, both hosts are on the same LAN/VLAN on the physical switch.

I am using ESXi 7.0.3, FYI.

a_p_
Leadership

ESXi does not do any routing, so there has to be some instance that does it; otherwise VMs in two different subnets will not be able to talk to each other.

Maybe it will help if you post the IP configuration of the server VM and of one of the clients (IP address, subnet mask, default gateway).

André

tj6
Contributor

Here is the interface config from the server:

network:
  ethernets:
    ens160:
      addresses:
      - 10.8.70.233/24
      gateway4: 10.8.70.1
      nameservers:
        addresses: [10.101.2.10, 10.101.2.11]
  version: 2

 

And attached is the client VM's configuration.

 

The ESXi management IP and the application server VM's IP are on the same 10.8.70.x network.

a_p_
Leadership

Ok, so what you need to find is the system that has the two .1 gateway addresses. That's most likely responsible for routing the traffic.
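
To illustrate why some router has to sit between the two networks, here is a minimal Python sketch using the standard ipaddress module. The server address and mask are taken from the config posted above; the client address 192.168.120.10 and its /24 mask are assumptions (only 192.168.120.x is known from this thread), and needs_router is just a made-up helper name.

```python
import ipaddress


def needs_router(src_cidr: str, dst_ip: str) -> bool:
    """Return True when dst_ip lies outside the subnet of the source interface."""
    src_net = ipaddress.ip_interface(src_cidr).network
    return ipaddress.ip_address(dst_ip) not in src_net


# Hypothetical client address and mask: only 192.168.120.x is known.
client = "192.168.120.10/24"
# Server address from the posted netplan config.
server = "10.8.70.233"

if needs_router(client, server):
    print(f"{server} is off-link for {client}: traffic goes to the default gateway")
else:
    print(f"{server} is on-link for {client}: no routing involved")
```

If the check says the destination is off-link, the client hands the packet to its default gateway, and whatever device owns that gateway address decides where it goes next.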

André

tj6
Contributor

That .1 is the gateway on the VLAN interface, but even a span/pcap doesn't show any traffic destined for 10.8.70.233 making it there.

For discussion, let's assume the traffic was making it to the gateway: it would simply get forwarded to the application server, as it's part of the same subnet (same LAN). There is no routing involved.

I think the issue is on the client VM side. If you look at the traceroute I provided, the traffic goes to 192.168.120.1 (this is the correct VM_Network gateway IP), and then the next hop is 192.168.153.254, which I cannot place. Could it be that the VM_Network is messed up, and if so, how do I fix it? I have already rebooted the client VMs as well as the entire ESXi host a couple of times, but the behavior is the same. Note that this was all working for 40-some days and then suddenly stopped.
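
To make the traceroute observation concrete, here is a rough Python sketch that checks each hop against the subnets mentioned in this thread. The /24 mask on the 192.168.120.x side is an assumption, and KNOWN_NETWORKS and classify_hop are illustrative names only.

```python
import ipaddress

# Subnets mentioned in this thread; the /24 on the client side is assumed.
KNOWN_NETWORKS = {
    "VM_Network (client side)": ipaddress.ip_network("192.168.120.0/24"),
    "server/mgmt LAN": ipaddress.ip_network("10.8.70.0/24"),
}


def classify_hop(hop: str):
    """Return the name of the known network containing hop, or None."""
    addr = ipaddress.ip_address(hop)
    for name, net in KNOWN_NETWORKS.items():
        if addr in net:
            return name
    return None


# The two hops reported from the client VM's traceroute.
for hop in ["192.168.120.1", "192.168.153.254"]:
    print(hop, "->", classify_hop(hop) or "not in any known subnet")
```

A hop that matches none of the known subnets, like 192.168.153.254 here, means the device answering as 192.168.120.1 is forwarding the traffic somewhere other than the 10.8.70.x LAN, so its routing table is the thing to look at.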

a_p_
Leadership

You are saying "on the VLAN interface". Are the hosts connected to the network on tagged network ports? Could a misconfigured VLAN ID on the client VMs' port group be causing the issue?

André

tj6
Contributor

The port group for the client VM is using VLAN 0. I tried changing it, but it made no difference.
