VMware Cloud Community
OmniTech
Contributor

Service Console problem across subnets

Hi!

I have the weirdest thing happening here.

I have 3 ESX hosts: 1 host on my local subnet and 2 hosts on a remote subnet.

I can't contact or ping the local ESX host from my ESX host on the remote subnet.

But I can contact my remote host from my local subnet with no problems. I can also ping local servers from my remote ESX host.

I can contact the local ESX host only from the local subnet.

So I figured the problem was the gateway of the local ESX host, but it is set correctly in the /etc/sysconfig/network file.
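The gateway suspicion fits the symptom: a host only uses its default gateway for destinations outside its own subnet, and that includes the echo replies it sends back when a remote host pings it. So a gateway problem on this one host is enough to break pings from other subnets while everything on the local subnet keeps working. A minimal Python sketch of that next-hop decision for a single-interface host, using the standard `ipaddress` module; the function name `next_hop` and all addresses are hypothetical placeholders, not values from this host.

```python
import ipaddress


def next_hop(iface_cidr: str, gateway: str, dest: str) -> str:
    """Return "direct" or the gateway address a single-interface host would use to reach dest.

    iface_cidr: Service Console address with prefix length, e.g. "10.0.0.5/24" (hypothetical).
    gateway:    the GATEWAY= value from /etc/sysconfig/network (hypothetical here).
    dest:       destination address.
    """
    iface = ipaddress.ip_interface(iface_cidr)
    gw = ipaddress.ip_address(gateway)
    dst = ipaddress.ip_address(dest)
    if gw not in iface.network:
        # A default gateway outside the interface's own subnet cannot be reached directly.
        raise ValueError(f"gateway {gw} is not inside {iface.network}")
    if dst in iface.network:
        # Same subnet: delivered directly (ARP for dest), the gateway is never consulted.
        return "direct"
    # Other subnet: outgoing packets, including replies to remote pings, go via the gateway.
    return str(gw)


# Hypothetical addresses, not taken from the thread.
print(next_hop("10.0.0.5/24", "10.0.0.1", "10.0.0.20"))  # direct
print(next_hop("10.0.0.5/24", "10.0.0.1", "10.1.0.7"))   # 10.0.0.1
```

This ignores multiple interfaces and static routes on purpose; it only illustrates why a per-host gateway problem shows up as "local works, remote doesn't".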

The weirdest thing is that I deleted the vswif and recreated it because there was a broadcast address error. When I did so, I was able to contact the host from the remote subnet for 5 minutes, and then suddenly I couldn't anymore, without changing a thing...

I just don't get it.

Thanks for your help!

jcwuerfl
Hot Shot

Sounds like something is going on with your local ESX host's default gateway or routing. How many NICs do you have on your local ESX host?

*) Take a look at: route -n | grep UG. That will show you the default gateway for the Service Console.

Other network info that would be helpful to understand:

*) esxcfg-vswif -l: get the port group name the vswif is using (usually "Service Console"). This will also give you the IP address the SC is using.

*) esxcfg-vswitch -l: see which vmnics the "Service Console" port group is using.

*) esxcfg-nics -l: see whether those vmnics' link status is up.

*) esxcfg-vmknic -l: shows your VMkernel network interfaces, if any.

*) esxcfg-route -l: shows the VMkernel routes only.

*) ifconfig -a: look at the vmnic entries to see whether there have been any errors on them.

*) Another question: are the uplink(s) for the Service Console normal access ports, or are you doing any trunking/VLANs/EtherChannel with them?
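The `route -n | grep UG` check above can also be done programmatically, which makes it easy to spot more than one default route or a default route on an unexpected interface. A minimal Python sketch that parses the classic net-tools `route -n` table; the function name `default_gateways` and the sample table (addresses, `vswif0`) are hypothetical illustrations, not output from the host in this thread.

```python
# Hypothetical sample of classic net-tools `route -n` output.
SAMPLE_ROUTE_N = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 vswif0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 vswif0
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 vswif0
"""


def default_gateways(route_n_output: str) -> list[tuple[str, str]]:
    """Return (gateway, interface) pairs for every default route in `route -n` output."""
    routes = []
    for line in route_n_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns: Destination Gateway Genmask Flags Metric Ref Use Iface.
        # This skips the "Kernel IP routing table" title and the column header row.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, gateway, genmask, flags, *_, iface = fields
        # A default route has destination and mask 0.0.0.0 plus the U (up) and G (gateway)
        # flags; `grep UG` alone would also match host routes flagged UGH.
        if dest == "0.0.0.0" and genmask == "0.0.0.0" and "U" in flags and "G" in flags:
            routes.append((gateway, iface))
    return routes


print(default_gateways(SAMPLE_ROUTE_N))  # [('10.0.0.1', 'vswif0')]
```

On a healthy single-homed Service Console you would expect exactly one entry, pointing at the intended router through the vswif interface.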

OmniTech
Contributor

> Sounds like something is going on with your local ESX host's default gateway or routing. How many NICs do you have on your local ESX host?

> *) Take a look at: route -n | grep UG. That will show you the default gateway for the Service Console.

> Other network info that would be helpful to understand:

> *) esxcfg-vswif -l: get the port group name the vswif is using (usually "Service Console"). This will also give you the IP address the SC is using.

> *) esxcfg-vswitch -l: see which vmnics the "Service Console" port group is using.

vmnic0 and vmnic1.

> *) esxcfg-nics -l: see whether those vmnics' link status is up.

All up.

> *) esxcfg-vmknic -l: shows your VMkernel network interfaces, if any.

> *) esxcfg-route -l: shows the VMkernel routes only.

> *) ifconfig -a: look at the vmnic entries to see whether there have been any errors on them.

I got 18 drops but no errors.

> *) Another question: are the uplink(s) for the Service Console normal access ports, or are you doing any trunking/VLANs/EtherChannel with them?

Access ports on an unmanaged switch.

The route looks good; it points to the correct default gateway.
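The "18 drops but no errors" reading comes from the per-interface counters in `ifconfig -a`. A single snapshot says little on its own; comparing two snapshots taken a few minutes apart shows whether errors appear or drops keep climbing while the problem is happening. A minimal Python sketch that extracts those counters, assuming the classic net-tools `ifconfig` layout (`RX packets:N errors:N dropped:N ...`); the function name `error_counters` and the sample text are hypothetical.

```python
import re

# Matches the classic net-tools counter lines, e.g. "RX packets:1000 errors:0 dropped:18 ...".
COUNTERS = re.compile(r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")


def error_counters(ifconfig_output: str) -> dict[str, dict[str, tuple[int, int]]]:
    """Map interface -> {"RX"/"TX": (errors, dropped)} from classic `ifconfig -a` output."""
    result: dict[str, dict[str, tuple[int, int]]] = {}
    iface = None
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            # Each interface block starts in column 0 with the interface name.
            iface = line.split()[0]
            result[iface] = {}
            continue
        match = COUNTERS.search(line)
        if match and iface is not None:
            direction, _packets, errors, dropped = match.groups()
            result[iface][direction] = (int(errors), int(dropped))
    return result


# Hypothetical sample resembling one vmnic entry.
SAMPLE = """vmnic0    Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1000 errors:0 dropped:18 overruns:0 frame:0
          TX packets:900 errors:0 dropped:0 overruns:0 carrier:0
"""
print(error_counters(SAMPLE))  # {'vmnic0': {'RX': (0, 18), 'TX': (0, 0)}}
```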

OmniTech
Contributor

I think I found something, but I'm not sure why it's related.

On the same vSwitch I had a VMkernel port group; when I deleted this port group, the ping was routed correctly.

But then, when I tried to ping again, it failed. It only worked once after I deleted the port group; after that it stopped working...

Anyone have an explanation?

Thanks

jcwuerfl
Hot Shot

This could also be a routing issue if everything looks OK on the local host. Has anything changed on your local router, gateway, firewall or whatever you have there?

OmniTech
Contributor

It's not my router, because from any other server on this local network I can ping the remote network, and I can also ping from the remote network to the local one.

The problem is only on this particular local ESX host.

jcwuerfl
Hot Shot

Gotcha. Have you also tried: service network restart

Also check out this KB; we already looked at some of it: http://kb.vmware.com/kb/1003796

jcwuerfl
Hot Shot

Also, here is another KB I found that may be helpful:

Configuring networking from the ESX service console command line: http://kb.vmware.com/kb/1000258

It also links to this one, which may be worth following:

Configuring or restoring networking from the ESX service console using console-setup: http://kb.vmware.com/kb/1022078

OmniTech
Contributor

Yup. I did restart the network service, and even the host itself.

Maybe this is a clue: when I delete and recreate the vswif, I get this error:

"Cannot Update management NIC. no suitable network connection found."

OmniTech
Contributor

The tracert command does not give me the full path; it only shows that it has resolved the address:

# tracert 172.17.1.3 -A

traceroute to 192.168.1.3 (172.17.1.3), 30 hops max, 40 byte packets

1 gelfs1.groupegeloso.int (172.17.1.3) 0.197 ms 0.249 ms 0.239 ms

Is there a way to make that command show the full path to the destination, like the Windows pathping command?

jcwuerfl
Hot Shot

Did it add it, and is it enabled?

esxcfg-vswif -l

I wonder if it has anything to do with vmnic0 and vmnic1 both being active in your vSwitch0. Perhaps you should remove vmnic1 from it for now, just to make sure.

OmniTech
Contributor

Yeah, it added it OK.

I even tried removing the vswif and the vSwitch completely, rebooting the host, then recreating them. It did not work.

Now I'm trying the same thing, but with different vmnics.

OmniTech
Contributor

It did not work...

jcwuerfl
Hot Shot

At this point you may be better served by giving VMware support a call. Have them remote in and walk through reconfiguring it. I did search the KB for the warning you saw, but didn't find much on it.

OmniTech
Contributor

I'm still trying to solve this with VMware support, but I wanted to post some of my progress.

I think there is a conflict somewhere in the config files, because here's the thing: when I simply change the network file to set the gateway to another router (let's call it B), it works. I can ping anything on the remote subnet in both directions. But when I set it back to gateway A, the host can't be pinged from the remote subnet.

What I can't figure out is that all the other ESX hosts, servers and desktops use gateway A with absolutely no problem.

So I think that when I set this host to gateway A, it causes a conflict in the config and the SC won't use the gateway properly.
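One way to look for the suspected config conflict is to collect every `GATEWAY=` definition across the Service Console's network files and see whether they disagree. A minimal Python sketch, assuming the Red Hat style layout of the service console, where `/etc/sysconfig/network` and the per-interface files under `/etc/sysconfig/network-scripts/` (for example a hypothetical `ifcfg-vswif0`) can each carry a `GATEWAY=` line. The function name `gateway_definitions` and the sample file contents are illustrative assumptions; it works on file text so it can be tried without the host.

```python
def gateway_definitions(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each GATEWAY value to the files that set it, given {path: file_text}.

    More than one key in the result means the files disagree about the gateway.
    """
    found: dict[str, list[str]] = {}
    for path, text in files.items():
        for raw in text.splitlines():
            line = raw.strip()
            # Only exact GATEWAY= keys; this deliberately ignores GATEWAYDEV= and comments.
            if not line.startswith("GATEWAY="):
                continue
            # Drop optional shell-style quotes around the value.
            value = line.split("=", 1)[1].strip().strip("\"'")
            found.setdefault(value, []).append(path)
    return found


# Hypothetical file contents showing two different gateways.
FILES = {
    "/etc/sysconfig/network": "NETWORKING=yes\nGATEWAY=10.0.0.1\nGATEWAYDEV=vswif0\n",
    "/etc/sysconfig/network-scripts/ifcfg-vswif0": 'DEVICE=vswif0\nIPADDR=10.0.0.5\nGATEWAY="10.0.0.254"\n',
}
print(gateway_definitions(FILES))
```

On the real host, the dictionary would be filled by reading those files; a single key means the files at least agree on which gateway to use.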

The other clue I have: when the host is set to gateway A and I make some change on the vSwitch (like adding a port group), pinging the remote subnet sometimes works, then after about 30 seconds, without my making any changes, it stops working.
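To pin down exactly when reachability flips (for example, whether it always dies about 30 seconds after a vSwitch change), it can help to log only the state changes from a machine on the remote subnet. A minimal Python sketch, assuming a Linux machine with the iputils `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds); the function names, the target address and the 5 second interval are hypothetical placeholders.

```python
import subprocess
import time
from typing import Iterable, Iterator


def ping_once(host: str) -> bool:
    """Send one ICMP echo with iputils ping; True if a reply arrived within about 1 s."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transitions(samples: Iterable[tuple[float, bool]]) -> Iterator[tuple[float, bool]]:
    """Yield (timestamp, reachable) only when reachability differs from the previous sample."""
    previous = None
    for ts, ok in samples:
        if ok != previous:
            yield ts, ok
            previous = ok


def monitor(host: str, interval: float = 5.0) -> None:
    """Ping host forever and print a timestamped line each time it goes UP or DOWN."""
    def samples() -> Iterator[tuple[float, bool]]:
        while True:
            yield time.time(), ping_once(host)
            time.sleep(interval)

    for ts, ok in transitions(samples()):
        print(time.strftime("%H:%M:%S", time.localtime(ts)), host, "UP" if ok else "DOWN")
```

For example, `monitor("10.0.0.5")` (a placeholder for the Service Console address) prints one line per change, so the gap between a vSwitch edit and the next DOWN line can be read off directly. Separating `transitions` from the actual pinging keeps the change detection testable without a network.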

Weirdest thing I've seen so far with ESX...

jcwuerfl
Hot Shot

I guess if all else fails you can try a fresh reinstall of ESX. Maybe that would clear it up. Have you thought about moving to ESXi yet?

OmniTech
Contributor

No, but it's a wonderful idea for testing my gateway before doing a fresh ESX install.

Thanks

OmniTech
Contributor

I installed ESXi on a USB key and ran it on the same server as the one with the issue, using the same network parameters, and it works just fine with the gateway. It can be pinged across subnets with no problem at all.

So I guess a fresh install of ESX on this machine would probably be the only fix for that issue.

I thought Windows was the only product that needed drastic measures like this to resolve an issue. I'm really disappointed in VMware over this...

jcwuerfl
Hot Shot

Typically it's fine; I think something funky was going on with this one, though. Glad it seems OK, at least with the USB install.
