This one has me completely stumped, and I am at my wit's end. I resolved (sort of) my issue with the system losing its NICs completely, and now I've run into a very odd new issue.
ESXi 4 refuses to talk TCP/UDP with one system on my network, which is, of course, the one I use for managing it. Here's where things get odd: that system cannot ping, ssh to, or manage the ESXi 4 system. HOWEVER, the ESXi 4 system will happily ping the affected box with no problems.
Every other system on the network works fine for ping, ssh, the works. Absolutely no problems whatsoever, but none of them are management capable, so I'm stuck.
Any ideas? Like I said, I am completely stumped. Why is ESXi 4 refusing to talk to just the ONE system?
Network troubleshooting doesn't change in a virtual environment.
Some things that come to mind:
Firewall
Duplicate IP
Misconfigured NIC settings (gateway, subnet mask, etc.)
Switch issues (VLANs, etc.)
Except that this isn't network troubleshooting at this point.
If any of those were the cause, I would see one of two symptoms, and neither is present.
1) ESXi 4 would not be able to ping or telnet TO the affected machine - it can communicate when it's the originator, but ONLY then.
2) Other machines on the network would NOT be able to communicate normally with the ESXi 4 box - but they can, I've verified it both directions including scp.
This is definitely an issue in the ESXi 4 management layer. Not the network.
From the description, it sounds like your management machine is firewalled.
Why do you think this is an ESXi issue if other machines can ping or ssh to it fine?
___________________________________
VMX-parameters- Workstation FAQ -[ MOA-liveCD|http://sanbarrow.com/moa241.html] - VM-Sickbay
Yup, exactly my thoughts, continuum. The problem is thus: it's a flat network right now. Literally, everything is on one switch, no VLANs implemented, nothing between the ESXi system and the rest of the network.
So, the only thing that can firewall ESXi 4 is itself. I considered that the management box might be firewalling, but I have the same symptoms with Windows Firewall turned fully off. Just in case, I combed through the registry and firewall settings, and I can't find any rules that would block traffic to the ESXi system yet allow it to ping only one way.
The only explanation I can come up with is that ESXi 4 itself is blocking or rejecting all responses to the affected machine. Meaning I can watch the pings cross the wire, but ESXi just refuses to answer them for that one machine. Same for management traffic.
Well, please have a look at these problem identification hints:
Maybe there will be something similar.
Here goes nothing. I mean, a LOT of troubleshooting output, so we can all be on the same page here.
ESXi host - shuhalo, 192.168.0.250
Problem host - terra, 192.168.0.100
Working host - alloy, 192.168.0.1
C:\Users\prj>ping 192.168.0.250
Pinging 192.168.0.250 with 32 bytes of data:
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Ping statistics for 192.168.0.250:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
C:\Users\prj>ping 192.168.0.1
Pinging 192.168.0.1 with 32 bytes of data:
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1ms TTL=64
Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
Ping statistics for 192.168.0.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms
C:\Users\prj>
SSH (via PuTTY) - Timed Out
Telnet (via PuTTY) to 902 - Timed Out
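One detail in terra's ping output is easy to misread: the summary says "Received = 4, Lost = 0", yet every reply comes from 192.168.0.100 (terra itself), not from the target. Windows counts those "Destination host unreachable" messages as received. A reply from the sender's own address usually suggests the local stack never got as far as putting the echo request on the wire (for example, because it could not resolve the target's MAC address), though that reading is an assumption here, not something the output proves. A minimal Python sketch for telling real echo replies apart from these; the function name `classify_ping` is made up for illustration:

```python
import re

# Matches Windows ping lines such as:
#   Reply from 192.168.0.1: bytes=32 time<1ms TTL=64
#   Reply from 192.168.0.100: Destination host unreachable.
REPLY_RE = re.compile(r"^Reply from (\d+\.\d+\.\d+\.\d+): (.*)$")


def classify_ping(target, output):
    """Return (real_replies, other_replies) for Windows ping output.

    A reply is "real" only if it comes from the target address and
    carries echo data (the text after the colon starts with "bytes=").
    """
    real = 0
    other = 0
    for line in output.splitlines():
        match = REPLY_RE.match(line.strip())
        if not match:
            continue  # header, statistics, or blank line
        source, rest = match.groups()
        if source == target and rest.startswith("bytes="):
            real += 1
        else:
            other += 1
    return real, other


# Output copied from terra's ping of the ESXi host above.
TERRA_TO_ESXI = """\
Pinging 192.168.0.250 with 32 bytes of data:
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
Reply from 192.168.0.100: Destination host unreachable.
"""

if __name__ == "__main__":
    print(classify_ping("192.168.0.250", TERRA_TO_ESXI))  # (0, 4)
```

So despite the "0% loss" line, terra received no actual echo reply from 192.168.0.250.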
prj@alloy ~ $ ping -c 4 shuhalo
PING shuhalo.SANITIZED (192.168.0.250): 56 data bytes
64 bytes from 192.168.0.250: icmp_seq=0 ttl=64 time=0.705 ms
64 bytes from 192.168.0.250: icmp_seq=1 ttl=64 time=1.001 ms
64 bytes from 192.168.0.250: icmp_seq=2 ttl=64 time=0.998 ms
64 bytes from 192.168.0.250: icmp_seq=3 ttl=64 time=0.998 ms
--- shuhalo.achedra.org ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.705/0.925/1.001/0.127 ms
prj@alloy ~ $ telnet shuhalo 902
Trying 192.168.0.250...
Connected to shuhalo.SANITIZED
Escape character is '^]'.
220 VMware Authentication Daemon Version 1.10: SSL Required, ServerDaemonProtocol:SOAP, MKSDisplayProtocol:VNC , VMXARGS supported
prj@alloy ~ $ ssh shuhalo
~ # ps ef |grep hostd
5076 5076 hostd hostd
5106 5076 hostd hostd
(trimmed, total of 11 hostd processes)
~ # ping 192.168.0.100
PING 192.168.0.100 (192.168.0.100): 56 data bytes
64 bytes from 192.168.0.100: icmp_seq=0 ttl=128 time=0.538 ms
64 bytes from 192.168.0.100: icmp_seq=1 ttl=128 time=0.376 ms
64 bytes from 192.168.0.100: icmp_seq=2 ttl=128 time=22.977 ms
--- 192.168.0.100 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.376/7.964/22.977 ms
~ # nslookup terra
Name: terra
Address 1: 192.168.0.100 terra.SANITIZED
~ # ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.931 ms
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.387 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=1.175 ms
--- 192.168.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.387/0.831/1.175 ms
~ # nslookup 192.168.0.1
Name: 192.168.0.1
Address 1: 192.168.0.1 alloy.SANITIZED
So, as you can see, OTHER boxes have absolutely no trouble reaching ESXi. It's the one Windows box (terra) that can't. But ESXi has no problems at all reaching that same Windows box.
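The manual PuTTY/telnet checks above can be repeated the same way from each host with a few lines of Python. This is only a sketch: the helper names `tcp_reachable` and `probe` are invented here, and ports 22 and 902 are simply the ones tried in this post.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


def probe(host, ports, timeout=3.0):
    """Map each port to whether a TCP connection could be opened."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

Calling `probe("192.168.0.250", (22, 902))` on terra and again on alloy would give a side-by-side view of which ports each machine can open on the ESXi host. It only shows whether the TCP handshake completes, not why it fails.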
Can you sniff the connection attempts with Wireshark?
Hi,
Check the subnet mask in use on host terra; it looks to me like it's using 255.255.255.128.
Hope this helps a bit.
Greetings from Germany. (CET)
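To see why a 255.255.255.128 mask would matter: with that mask, 192.168.0.100 falls in the 192.168.0.0-127 half of the range and 192.168.0.250 in the other half, so terra would treat the ESXi host as off-subnet. A quick check with Python's standard `ipaddress` module; the helper name `same_subnet` is made up for illustration, and the masks are the suspected one and the usual /24:

```python
import ipaddress


def same_subnet(local_ip, netmask, remote_ip):
    """Return True if remote_ip is on the subnet defined by local_ip/netmask."""
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(remote_ip) in network


if __name__ == "__main__":
    # terra -> ESXi host with the suspected mask, then with a /24 mask
    print(same_subnet("192.168.0.100", "255.255.255.128", "192.168.0.250"))  # False
    print(same_subnet("192.168.0.100", "255.255.255.0", "192.168.0.250"))    # True
    # terra -> alloy is on-subnet either way, matching the working ping
    print(same_subnet("192.168.0.100", "255.255.255.128", "192.168.0.1"))    # True
```

This only shows what the mask would imply; the mask actually configured on terra still has to be read from `ipconfig /all`.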
Ghost got me poking around, and I found a most interesting problem on the Windows 7 box, which was aggravating the ESXi system and creating a sort of feedback loop.
Windows 7, apparently, can silently corrupt its ARP tables. While double-checking the subnet, I thought to poke at them to make sure it was even still seeing the system. Surprise, the problem interface has no ARP entry. Ping it, still no ARP entry. Add a static ARP entry from the command line on Windows 7, and things abruptly work again. This part of the problem is DEFINITELY Windows 7 being stupid somewhere.
I can duplicate this with all OSes on the first port (ether ending in 3e:6e), while I have no problems at all with the second port (ether ending in 3e:6f). Any suggestions as to where I should be digging in Windows to see what's causing this?
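For anyone wanting to repeat the ARP check described above, here is a rough Python sketch that parses the text printed by `arp -a` on Windows and reports whether a given IP has an entry. The sample table and its MAC addresses are invented for illustration (they are not from terra), and `parse_arp_table` is a made-up helper name.

```python
import re

# Matches entry lines in Windows "arp -a" output, e.g.:
#   192.168.0.1           00-11-22-33-44-55     dynamic
ARP_LINE_RE = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\s+"
    r"([0-9a-fA-F]{2}(?:-[0-9a-fA-F]{2}){5})\s+(\w+)\s*$"
)


def parse_arp_table(text):
    """Return {ip: (mac, entry_type)} from Windows "arp -a" output.

    If the same IP is listed under several interfaces, the last one wins.
    """
    entries = {}
    for line in text.splitlines():
        match = ARP_LINE_RE.match(line)
        if match:
            ip, mac, entry_type = match.groups()
            entries[ip] = (mac.lower(), entry_type)
    return entries


# Hypothetical output; the MAC addresses are placeholders.
SAMPLE = """\
Interface: 192.168.0.100 --- 0xb
  Internet Address      Physical Address      Type
  192.168.0.1           00-11-22-33-44-55     dynamic
  192.168.0.255         ff-ff-ff-ff-ff-ff     static
"""

if __name__ == "__main__":
    table = parse_arp_table(SAMPLE)
    for ip in ("192.168.0.1", "192.168.0.250"):
        print(ip, table.get(ip, "no ARP entry"))
```

Run against real `arp -a` output captured right after a ping, a missing entry for 192.168.0.250 would match the symptom described here.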