VMware Cloud Community
RootWyrm
Contributor

ESXi 4 doesn't want to talk to just one box

This one has me completely stumped, and I am at my wit's end. I resolved (sort of) my issue with the system losing the NICs completely, and now I've run into an incredibly odd issue.

ESXi 4 refuses to talk TCP/UDP with one system on my network, which is, of course, the one I use for managing it. Here's where things get odd: that system cannot ping, ssh to, or manage the ESXi 4 system. HOWEVER, the ESXi 4 system will happily ping the affected box with no problems.

Every other system on the network works fine for ping, ssh, the works. Absolutely no problems whatsoever, but none of them are management capable, so I'm stuck.

Any ideas? Like I said, I am completely stumped. Why is ESXi 4 refusing to talk to just the ONE system?

9 Replies
DSTAVERT
Immortal

Network troubleshooting doesn't change in a virtual environment.

Some things that come to mind:

Firewall

Duplicate IP

Misconfigured NIC, gateway, subnet mask, etc.

Switch issues (VLANs, etc.)

-- David -- VMware Communities Moderator
RootWyrm
Contributor

Except for the fact that this isn't network troubleshooting at this point.

If any of those were at issue, there would be one of two problems which are absolutely not present.

1) ESXi 4 would not be able to ping or telnet TO the affected machine - it can communicate when it's the originator, but ONLY then.

2) Other machines on the network would NOT be able to communicate normally with the ESXi 4 box - but they can, I've verified it both directions including scp.

This is definitely an issue in the ESXi 4 management layer. Not the network.

continuum
Immortal

From the description, it sounds like your management machine is firewalled.

Why do you think this is an ESXi issue if other machines can ping or ssh to it fine?




___________________________________

VMX-parameters - Workstation FAQ - MOA-liveCD (http://sanbarrow.com/moa241.html) - VM-Sickbay

Do you need support with a VMFS recovery problem? Send a message via Skype "sanbarrow".
I do not support Workstation 16 at this time ...

RootWyrm
Contributor

Yup, exactly my thoughts, continuum. The problem is thus: it's a flat network right now. Literally, everything is on one switch, no VLANs implemented, nothing between the ESXi system and the rest of the network.

So, the only thing that can firewall ESXi 4 is itself. I considered that the management box might be firewalling, but I have the same symptoms with Windows Firewall turned fully off. Just in case, I combed through the registry and firewall settings, and I can't find any rules that would block traffic to the ESXi system yet allow it to ping only one way.

The only explanation I can come up with is that ESXi 4 itself is blocking or rejecting all responses to the affected machine. Meaning I can watch the pings cross the wire, but ESXi just refuses to answer them for that one machine. Same for management traffic.

PaulSvirin
Expert

Well, please have a look at these problem identification hints:

Maybe you will find something similar there.

---

iSCSI SAN software

http://www.starwindsoftware.com

RootWyrm
Contributor

Here goes nothing. I mean, a LOT of troubleshooting output, so we can all be on the same page here. :)

ESXi host - shuhalo, 192.168.0.250

Problem host - terra, 192.168.0.100

Working host - alloy, 192.168.0.1

terra:

C:\Users\prj>ping 192.168.0.250

Pinging 192.168.0.250 with 32 bytes of data:
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.

Ping statistics for 192.168.0.250:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

C:\Users\prj>ping 192.168.0.1

Pinging 192.168.0.1 with 32 bytes of data:
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.0.1:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 1ms, Average = 0ms

C:\Users\prj>

SSH (via PuTTY) - Timed Out

Telnet (via PuTTY) to 902 - Timed Out

alloy:

prj@alloy ~ $ ping -c 4 shuhalo
PING shuhalo.SANITIZED (192.168.0.250): 56 data bytes
64 bytes from 192.168.0.250: icmp_seq=0 ttl=64 time=0.705 ms
64 bytes from 192.168.0.250: icmp_seq=1 ttl=64 time=1.001 ms
64 bytes from 192.168.0.250: icmp_seq=2 ttl=64 time=0.998 ms
64 bytes from 192.168.0.250: icmp_seq=3 ttl=64 time=0.998 ms
--- shuhalo.achedra.org ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.705/0.925/1.001/0.127 ms

prj@alloy ~ $ telnet shuhalo 902
Trying 192.168.0.250...
Connected to shuhalo.SANITIZED
Escape character is '^]'.
220 VMware Authentication Daemon Version 1.10: SSL Required, ServerDaemonProtocol:SOAP, MKSDisplayProtocol:VNC , VMXARGS supported

prj@alloy ~ $ ssh shuhalo
~ # ps ef |grep hostd
5076 5076 hostd hostd
5106 5076 hostd hostd
(trimmed, total of 11 hostd processes)

shuhalo (ESXi 4 box):

~ # ping 192.168.0.100
PING 192.168.0.100 (192.168.0.100): 56 data bytes
64 bytes from 192.168.0.100: icmp_seq=0 ttl=128 time=0.538 ms
64 bytes from 192.168.0.100: icmp_seq=1 ttl=128 time=0.376 ms
64 bytes from 192.168.0.100: icmp_seq=2 ttl=128 time=22.977 ms
--- 192.168.0.100 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.376/7.964/22.977 ms

~ # nslookup terra
Name: terra
Address 1: 192.168.0.100 terra.SANITIZED

~ # ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.931 ms
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.387 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=1.175 ms
--- 192.168.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.387/0.831/1.175 ms

~ # nslookup 192.168.0.1
Name: 192.168.0.1
Address 1: 192.168.0.1 alloy.SANITIZED

So, as you can see, OTHER boxes have absolutely no trouble reaching ESXi. It's the one Windows box (terra) that can't. But ESXi has no problems at all reaching that same Windows box.
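
For anyone who wants to repeat the port checks above from several machines without PuTTY or telnet, here is a minimal sketch in Python. The helper name `tcp_port_open` and the two-second default timeout are my own choices for illustration, not anything from VMware. It only reports whether a TCP connection can be established and says nothing about why one fails.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

Calling `tcp_port_open("192.168.0.250", 902)` and `tcp_port_open("192.168.0.250", 22)` from terra and again from alloy should reproduce the difference shown in the output above, assuming Python is available on both.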

continuum
Immortal

Can you sniff the connection attempts with Wireshark?





kastlr
Expert

Hi,

Check the subnet mask in use on host terra; it looks to me like it's using 255.255.255.128.
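
To illustrate why that mask would matter, here is a small sketch using Python's standard `ipaddress` module. It assumes terra is 192.168.0.100 as posted above, and the function name `on_same_subnet` is made up for this example. With 255.255.255.128, terra would treat 192.168.0.1 as local but 192.168.0.250 as being on a different network, which would be consistent with working pings to alloy and failing pings to shuhalo.

```python
import ipaddress


def on_same_subnet(local_ip, netmask, remote_ip):
    """Return True if remote_ip falls inside local_ip's network for the given mask."""
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(remote_ip) in network


# Suspected mask: alloy is on-link, shuhalo is not.
print(on_same_subnet("192.168.0.100", "255.255.255.128", "192.168.0.1"))    # True
print(on_same_subnet("192.168.0.100", "255.255.255.128", "192.168.0.250"))  # False
# Mask one would expect on a flat /24 network: shuhalo is on-link.
print(on_same_subnet("192.168.0.100", "255.255.255.0", "192.168.0.250"))    # True
```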


Hope this helps a bit.

Greetings from Germany. (CET)


RootWyrm
Contributor

Ghost got me poking around, and I found a most interesting problem on the Windows 7 box, which was aggravating the ESXi system and creating a sort of feedback loop.

Windows 7, apparently, can silently corrupt its ARP tables. While double checking the subnet, I thought to poke at them to make sure it was even still seeing the system. Surprise, the problem interface has no ARP entry. Ping it, still no ARP entry. Add a static ARP entry from the command line on Windows 7, and things abruptly work again. This part of the problem is DEFINITELY Windows 7 being stupid somewhere.

I can duplicate with all OSes on the first port (ether ending in 3e:6e) while I have no problems at all with the second port (ether ending in 3e:6f). Any suggestions as to where I should be digging in Windows to see what's causing this?
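
For anyone who wants to watch for the missing entry rather than eyeballing `arp -a` each time, here is a rough sketch in Python. It assumes the usual English-locale Windows layout of `arp -a` output (address, dash-separated MAC, type); the names `ARP_LINE`, `parse_arp_table`, and `SAMPLE` are made up for this example, and the sample text with its placeholder MAC is illustrative, not captured from terra.

```python
import re

# Matches one entry line of Windows `arp -a` output:
#   <IPv4 address>  <MAC as xx-xx-xx-xx-xx-xx>  <type, e.g. dynamic or static>
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)


def parse_arp_table(text):
    """Map each IP address found in `arp -a` output text to (mac, entry_type)."""
    entries = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line)
        if match:
            ip, mac, entry_type = match.groups()
            # If the same IP appears under several interfaces, the last one wins.
            entries[ip] = (mac.lower(), entry_type)
    return entries


# Made-up sample in the Windows `arp -a` layout; the MAC is a placeholder.
SAMPLE = """\
Interface: 192.168.0.100 --- 0xb
  Internet Address      Physical Address      Type
  192.168.0.1           00-11-22-33-44-55     dynamic
"""

table = parse_arp_table(SAMPLE)
print("192.168.0.250" in table)  # False: no entry for the ESXi host in this sample
```

On the real machine, the text could come from `subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout` instead of the sample string.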
