VMware Cloud Community
Seonix
Contributor

vCloud Networking Issue

Hi all,

We've recently encountered an odd problem with our vCloud/vSphere networking environment that I'm hoping someone could shed some light on.

Our environment:

- Two host clusters with 4 hosts in each.

- Both clusters have vApps deployed through vCloud and have been working fine for months.

- All vApp networking is through a vApp network with NAT routing configured to an external IP for each VM. Firewall is currently disabled (for testing).

- Each vApp network has a vShield Edge appliance.

The issue:

In one of the two environments, none of the VMs within any vApp can ping each other, nor can they ping the gateway for the vApp network.

Any workstation coming in from the physical network can ping the entire virtual environment up to and including the vShield Edge appliance for the vApp network, but can't ping any VMs inside the vApp.

Short story: for whatever reason, networking between the vShield Edge appliance and the vApp VMs isn't functioning correctly. I've tried resetting the vApp networks but haven't got a clue where to go from here.

Any help would be greatly appreciated and let me know if more info is required.

Simon

22 Replies
pscheri
Enthusiast

Hi! Are you working with VLAN-backed or VCDNI network pools?

Be sure to correctly trunk all VLANs in every uplink.

What happens if you ping between VMs that reside on the same host?
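
The trunking check can be sketched in a few lines of Python. This is only an illustration: the uplink names, the VLAN IDs and the `missing_vlans` helper are made up, and the allowed-VLAN lists would have to be copied by hand from the physical switch configuration.

```python
def missing_vlans(required, uplink_trunks):
    """Return {uplink: sorted required VLAN IDs that are not trunked on it}."""
    required = set(required)
    gaps = {}
    for uplink, allowed in uplink_trunks.items():
        missing = required - set(allowed)
        if missing:
            gaps[uplink] = sorted(missing)
    return gaps


# Hypothetical data: VLANs the network pool relies on, and the VLANs
# each physical switch port actually trunks to the host uplinks.
required_vlans = [100, 101, 102]
uplink_trunks = {
    "esx01-vmnic2": [100, 101, 102],
    "esx01-vmnic3": [100, 101],  # VLAN 102 is not trunked here
    "esx02-vmnic2": [100, 101, 102],
}

for uplink, vlans in missing_vlans(required_vlans, uplink_trunks).items():
    print(f"{uplink}: missing VLANs {vlans}")
```

Any uplink that shows up in the result is a candidate for traffic that works on one host but is dropped between hosts.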

Pablo.

Seonix
Contributor

Hi Pablo.

We are using VCDNI network pools, one for each cluster. The VLANs all look fine (although I'm by no means an expert).

When attempting to ping VMs on the same host, they cannot get a response from each other. I tried pinging internal and external VM addresses with no luck.

Any ideas?

Seonix
Contributor

Not sure if this is helpful, but from a high level the whole issue seems to be that no VMs on the vShield Edge port groups can communicate with each other or with anything else. Anything that sits on other port groups (even if it is connected to both) can contact the entire environment.

_morpheus_
Expert

So you're saying that two VMs on the same host, using the same VCDNI-backed network, can't ping each other? What version of VCD/VC/ESX?

Seonix
Contributor

Yes, that's correct. They are both using the same distributed port group generated for the vShield Edge appliance as well.

VCD V1.5.1

VC V5.0

ESXi V5.0

This is doing my head in. We have another cluster that is configured exactly the same (unless I've missed something) and both have worked for a long time.

charliejllewell
Enthusiast

To keep it simple and remove as many factors as possible, I would create a new vApp with two virtual machines and a single vApp network, with all firewalls disabled. vMotion the two virtual machines onto the same ESXi host. Manually check that the networking has been configured correctly on each VM. Set up a continuous ping from one VM to the other. Install and set up Wireshark/tcpdump on both VMs and check whether any traffic is sent or received. Can you post back the results?
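
For the continuous ping, a small logger like the one below makes it easier to line replies up with a packet capture. It is a rough sketch, not a finished tool: the function names are made up, it simply shells out to the operating system's `ping` command, and the exit code is only a coarse indicator (on Windows, some "unreachable" responses can still return success).

```python
import platform
import subprocess
import sys
import time


def ping_once(target, timeout_s=2):
    """Send one ICMP echo with the OS ping tool; True if it reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def watch(target, count, probe=ping_once, interval_s=1.0):
    """Probe `target` `count` times, print a timestamped line per probe,
    and return the list of True/False results."""
    results = []
    for i in range(count):
        ok = probe(target)
        results.append(ok)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {target} {'reply' if ok else 'NO REPLY'}")
        if i < count - 1:
            time.sleep(interval_s)
    return results


def loss_percent(results):
    """Percentage of probes without a reply (0.0 for an empty list)."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pingwatch.py <ip-of-the-other-vm>
    outcome = watch(sys.argv[1], count=20)
    print(f"loss: {loss_percent(outcome):.0f}%")
```

Run it on one VM against the other VM's internal address while the capture is running; the timestamps show exactly when replies stop or resume.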

_morpheus_
Expert

Try unpreparing, rebooting, and re-preparing all the hosts in the cluster, then stop and start the vApps that are having issues.

Seonix
Contributor

Hi all,

I've managed to resolve the issue. It would appear the vShield appliances on each of the hosts had failed (presumably because of a vShield Manager or general cluster hiccup). We rebooted these, and the vShield Edge appliances on the vApp networks were able to traverse the networks again.

Thanks for the assistance ;)

_morpheus_
Expert

That doesn't make any sense. You don't need a working vShield Edge for VMs within the same vApp to ping each other.

pscheri
Enthusiast

Did you restart the vShield Manager as well? I think it would make a little more sense if the vApp was restarted after that too.

The distributed port groups get regenerated when you restart the vApp, so if the connection to the vShield Manager was re-established after the restart, this task could finally have completed correctly.

_morpheus_
Expert

That doesn't make sense either. Once the port group is created, whatever happens to vShield Manager is irrelevant. If the port group didn't get created correctly, then the vApp deploy operation should have failed and rolled back.


sevenp
Enthusiast

Hi all,

We have exactly the same issues. Multiple vApps/orgs cannot reach their gateway (vShield Edge) and cannot reach the other machines in the vApp.

The difference is that a reboot of the Edge does not solve the issue. We are running the latest versions of all products: vCD, vCenter and the ESXi hosts are all on 5.1. Also, the issue does not affect all vApps/VMs.

Situation:

On this vCD we have one vCD machine, one Provider vDC, one vCenter, one cluster and four hosts within this cluster.

We have multiple small organizations, each with one vApp containing one or more VMs.

When I deploy an organization vDC, I choose the "Cloud Network Isolation" network pool I created previously.

IPs on the VMs are assigned from a static IP pool (from the org vDC network, with a private address range).

Affected machines display "No network access" as the status of the NIC. The IP on the NIC is automatically configured as specified in vCD.

On the EDGE we have NAT configured so the VMs can reach the internet.

As far as I can verify in vSphere and on the ESXi hosts, everything is configured correctly; the vShield Edge machines have two IPs: one in the public network, the other in the private network that the connected VMs are in.

The VLAN used for Cloud Network Isolation is available on the switches and on all hosts. All deployed networks in the same vCloud Network Isolation network pool (to which multiple organizations belong) have this VLAN.

When the Edge and the VMs are on the same host, it doesn't work either...

I saw one strange thing on the (automatically deployed) vShield Edge machines: besides the public network and the private network, there is also a third network that is not on the dvSwitch, has no VLAN, is not configured on any host, and is named 'none'. All Edge machines have this third network attached (according to the vSphere Client). Maybe this has something to do with the issues? In the vShield web GUI, the Edges only show the normal two interfaces and IPs.

Does anyone have any advice or suggestions?

sevenp
Enthusiast

Can anyone help me with this vCD networking issue?

I'm running out of options...

_morpheus_
Expert

Did you try to unprepare and re-prepare the hosts?

sevenp
Enthusiast

Thanks for replying.

Only on one host (without results). Do I have to unprepare and re-prepare all the hosts?

Some orgs/vApps are working fine... I don't want to break these...

Today one org/vApp started working after I powered off and started the vShield Edge (and deployed a second VM in the vApp). When I tried the same thing (power off and start the Edge) on another org/vApp, it didn't solve the issue there...

_morpheus_
Expert

What if you use a VLAN-backed network pool? Does it work fine then?

sevenp
Enthusiast

Just tested it with a VLAN-backed network pool instead of VCDNI and got the same issues... Can't ping the gateway (Edge) or the other VMs. The IPs are configured correctly (static IP pool) by vCD on the Edge and on the VMs in the vApp.

In the org, I created a new vDC with a VLAN-backed network pool, a new Edge and a new internal network. I moved the vApp to the newly created org vDC and changed the NICs to the new internal network (which is VLAN-backed at the org vDC level).

The VLAN in use is tagged on the connected switches. In vSphere, the port group created by vCD on the dvSwitch has the right VLAN ID (one of those specified in the VLAN-backed pool). The Edge and the vApp machines each have a NIC in this port group.

With normal VLANs/port groups on the same dvSwitch, all networks work fine (without the use of vCD, but on the same cluster and ESXi hosts).

So what's next?

sevenp
Enthusiast

Found solutions for the issues I had...

On some Edges the firewall was configured to deny (the default setting).

On other Edges a re-deploy did the trick (maybe needed after the earlier host reconfigurations for vCD?).

On another Edge the NAT rule was configured wrongly; the wrong interface was selected.

This explains why some organizations in this vCD deployment were working fine...

Thanks for helping anyway!

Another question: I'm unable to delete an org network because the Edge has an IP in it (initially configured as the gateway). I created a new org network (also with the same Edge as gateway, in another private network) and want to delete the old org network. How can I achieve this?
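
The firewall and NAT checks above can be written down as a small checklist in Python. This is purely illustrative: the dictionary layout and the `edge_findings` helper are invented for this example and do not come from any vCD or vShield API, and it assumes that NAT rules for internet access should be applied on the Edge's external interface.

```python
def edge_findings(edge):
    """Return a list of likely misconfigurations for one Edge description."""
    findings = []

    fw = edge.get("firewall", {})
    # An enabled firewall with a default "deny" and no rules drops everything.
    if fw.get("enabled") and fw.get("default_action") == "deny" and not fw.get("rules"):
        findings.append("firewall: default deny with no allow rules")

    # Assumption: NAT rules for internet access belong on the external interface.
    expected = edge["external_interface"]
    for rule in edge.get("nat_rules", []):
        if rule["applied_on"] != expected:
            findings.append(
                f"NAT rule {rule['id']}: applied on {rule['applied_on']!r}, "
                f"expected {expected!r}"
            )
    return findings


# Hypothetical Edge, written down by hand from what the vCD UI shows.
edge = {
    "external_interface": "public-net",
    "firewall": {"enabled": True, "default_action": "deny", "rules": []},
    "nat_rules": [{"id": 1, "applied_on": "org-internal-net"}],
}

for line in edge_findings(edge):
    print(line)
```

An Edge that needed a re-deploy would not show up in a check like this, since its configuration looks correct on paper.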

mobinqasim786
Enthusiast

Hi guys,

I'm having a similar issue in a test environment. I'm using a VLAN-backed network pool. I created a vApp and added two VMs to it. Both VMs can ping their gateway as well as the org vDC network, but can't ping each other. I added another VM, and it could ping its gateway and one of the two VMs created earlier, but couldn't ping the other.

When I migrate all the VMs onto the same host as the vApp network, everything works fine. Any help please!
