narten
Contributor

Infamous black screen when opening console

Hi.

I ran into a problem today I can't figure out, and searching some of the fora, it seems others have had this problem with no obvious fix. So here's another attempt at flushing this out.

I am unable to open a console window properly for some VMs. I get a "black screen" that appears to be unresponsive to keyboard and mouse. However, investigation shows:

1) keystrokes and mouse are making it to the VM. (This was verified by someone else who is able to open the console normally and saw the keystrokes and mouse actions I was generating).

2) I'm able to open consoles fine on one vCenter, but not another.

3) I'm having the same problem whether using the Windows vSphere Client or the Web Client, so it's not just a browser issue.

4) I'm seeing the problem for all VMs in the one setup, whether running Windows or Linux.

5) It's just me (my particular setup), as two colleagues are able to open consoles to the same machines just fine.

6) I'm doing this from Windows 7 running under Linux KVM (RHEL), using Firefox 31.1.0.

7) If I run Chrome directly under Linux (on the same laptop) I can open a console successfully, but since a Chrome console doesn't pass control characters, that solution isn't really a usable option for me.

8) Running vCenter/ESXi 5.5

Poking around:

> VSphere 5.5 VM Console screen black | VMware Communities

> https://communities.vmware.com/message/2363368

The solution there involved IPv6, but we are not using IPv6 and none of the machines in question have IPv6 addresses.

Other related reports point to the DNS entries needing to resolve properly, which is not the case here (plus, you typically get errors in that case).

Any suggestions?

Thanks!

Thomas

7 Replies
dainok
Enthusiast

Any firewall, port filtering, NAT or something similar between vClient and vCenter/ESXi hosts?

-- Data Center Engineer - CCIE #38620
narten
Contributor

> Any firewall, port filtering, NAT or something similar between  vClient and vCenter/ESXi hosts?

None that I think are relevant.

There is a NAT running between the Windows-VM and the rest of the world. No known FW getting in the way.

Testing some more, just now I was able to get a proper working console using Chrome (under the Windows VM). At the same time, from the same VM, neither the console under FF nor the Windows client worked. I just see the black screen.

Also, I did some additional testing that showed that even though the screen is black, the TCP connection to the ESXi host where the VM/console resides is being opened correctly and data is flowing in both directions (per tcpdump) as I type or move the mouse. So I doubt it's a FW/NAT issue. I believe that running a console uses a single TCP connection to the host where the VM resides.

dainok
Enthusiast

It seems to be a one-way console.

I suggest you test the vSphere Client (not the Web Client) connected directly to the ESXi host and check whether the console works fine.

-- Data Center Engineer - CCIE #38620
narten
Contributor

I've narrowed down the problem I am having. The TCP connection for the console window is getting hosed in a weird way. I'm able to reliably reproduce the problem when using VPN software X, but the problem does not happen when using VPN software Y.

Wireshark has shown the following:

1) Machine A (vSphere Client) opens a TCP connection to machine B (ESXi host), they successfully exchange some data in both directions, and then B sends a big chunk of data (more than one segment - i.e., the initial console screen) to A that doesn't arrive (wireshark reports "previous segment not captured").  A then responds with a Selective ACK, which B apparently doesn't deal with properly. (Or the SACK gets mangled before B sees it. Or something -- I don't know what is really happening here, but from this point on the connection is wedged.)

2) Once A has returned a SACK, B continues to retransmit data, but it retransmits from the SACK point, not the ACK point. So A continues to respond with an ACK of X (since it is missing a chunk), whereas B retransmits at a sequence point greater than X, and the missing data never gets retransmitted.
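To make the ACK/SACK distinction in the two points above concrete, here is a minimal sketch of how a receiver derives its cumulative ACK and its SACK blocks from the segments that actually arrived. It is a simplified model, not how any real TCP stack is implemented; the function name `receiver_state` and the sequence numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def receiver_state(initial_ack, segments):
    """Return (cumulative_ack, sack_blocks) after the given segments arrive.

    Each segment is a (start, end) pair of sequence numbers, end exclusive,
    the same convention tcpdump uses for "seq start:end".
    """
    # Merge overlapping or adjacent ranges so each block is maximal.
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    ack = initial_ack
    sack_blocks = []
    for start, end in merged:
        if start <= ack:
            # Contiguous with data already received: the ACK point advances.
            ack = max(ack, end)
        else:
            # Data beyond a hole: reported via SACK, ACK point stays put.
            sack_blocks.append((start, end))
    return ack, sack_blocks


# Hypothetical numbers: the segment 1000:2000 is lost, 2000:2500 arrives.
print(receiver_state(1000, [(2000, 2500)]))                # (1000, [(2000, 2500)])
# Only a retransmission starting at the ACK point fills the hole.
print(receiver_state(1000, [(2000, 2500), (1000, 2000)]))  # (2500, [])
```

In this model the ACK stays at X until the bytes starting at X arrive, so retransmissions that start beyond X can never unwedge the connection.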

I would bet that the problem has to do with the VPN software, rather than with ESXi's ability to handle SACK. But that is just a guess.

I googled for stuff like "ESXi SACK problems" but nothing came up. Are there any known issues with ESXi and SACK?

raog
Expert

heh.. i liked your zeal to narrow down the problem. Based on the initial post, i thought the problem might be with the way networking is being handled inside KVM on RHEL.

Like you said, it might be a problem with the VPN software. Did you raise a support ticket for this?

Regards

Girish

To Virtualization and beyond! PS::If you felt the answer as helpful, please mark it as helpful/answered so that it helps other users as well! Blog:: www.virtualtipsntricks.com
dainok
Enthusiast

MTU problem? Try to force a lower MTU just to be sure.

-- Data Center Engineer - CCIE #38620
narten
Contributor

More narrowing down of the problem. I used pktcap-uw on the ESXi host to get a packet trace, and did the same on my laptop.

Looking at the traces, I see the following "interesting" behavior (courtesy of tcpdump):

> 06:18:14.490938 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [.], ack 1361, win 129, length 0

> 06:18:14.490963 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [P.], seq 3141:3178, ack 1361, win 129, length 37

> 06:18:14.534533 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [P.], seq 3178:3231, ack 1361, win 129, length 53

> 06:18:14.534556 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [P.], seq 3231:4772, ack 1361, win 129, length 1541

Here, ESXi sends out a segment carrying 1541 bytes of TCP payload, i.e., a packet > 1500 bytes, even though the MTU on vmknic0 is 1500.

On my laptop, I do receive:

> 11:23:06.957635 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [P.], seq 4553:4772, ack 1361, win 129, length 219

I.e., the packet that was sent was apparently resegmented into two TCP segments (i.e., two separate IP packets) but only the second one arrived. Presumably TCP offload is doing resegmentation on outbound.
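The split suggested by the two traces can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of resegmentation at a fixed payload size; the helper name `split_into_segments` is hypothetical, and the 1322-byte per-segment payload is an assumption inferred from where the received segment starts (4553 - 3231), not a value read from the host configuration.

```python
def split_into_segments(seq_start, seq_end, max_payload):
    """Split [seq_start, seq_end) into consecutive segments of at most
    max_payload bytes, returning (start, end, length) tuples."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    segments = []
    pos = seq_start
    while pos < seq_end:
        nxt = min(pos + max_payload, seq_end)
        segments.append((pos, nxt, nxt - pos))
        pos = nxt
    return segments


# The oversized send seen on the ESXi side: seq 3231:4772, length 1541.
# 1322 is an assumed per-segment payload size inferred from the trace.
for start, end, length in split_into_segments(3231, 4772, 1322):
    print(f"seq {start}:{end}, length {length}")
```

Under that assumption the second piece is seq 4553:4772 with length 219, which matches the one segment that reached the laptop.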

> 06:18:14.586327 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [P.], seq 1361:1398, ack 3178, win 16483, length 37

> 06:18:14.618077 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [.], ack 3231, win 16470, options [nop,nop,sack 1 {4553:4772}], length 0

The laptop returns a SACK saying it got sequence 4553:4772 and is missing 3231-4552 (1322 bytes).
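The size of the hole can be checked directly from the tcpdump line: the gap between the cumulative ACK (3231) and the left edge of the SACK block (4553) is 4553 - 3231 = 1322 bytes, the same length as the retransmitted segment shown later in the trace. The following sketch pulls those numbers out of a tcpdump text line with regular expressions. The function name `missing_ranges` is hypothetical, and the patterns only assume the output format visible in the lines quoted in this post.

```python
import re

# "\b" keeps the pattern from matching the "ack" inside "sack".
ACK_RE = re.compile(r"\back (\d+)")
SACK_RE = re.compile(r"\{(\d+):(\d+)\}")


def missing_ranges(line):
    """Return (start, end, length) holes implied by the ACK and SACK blocks
    in one tcpdump output line; end is exclusive."""
    ack_match = ACK_RE.search(line)
    if ack_match is None:
        return []
    cursor = int(ack_match.group(1))
    blocks = sorted((int(a), int(b)) for a, b in SACK_RE.findall(line))
    holes = []
    for left, right in blocks:
        if left > cursor:
            holes.append((cursor, left, left - cursor))
        cursor = max(cursor, right)
    return holes


line = ("06:18:14.618077 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [.], "
        "ack 3231, win 16470, options [nop,nop,sack 1 {4553:4772}], length 0")
print(missing_ranges(line))  # [(3231, 4553, 1322)]
```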

> 06:18:14.620168 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [P.], seq 1398:1435, ack 3231, win 16470, length 37

> 06:18:14.620184 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [.], ack 1435, win 129, length 0

> 06:18:15.141979 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [.], seq 3231:4553, ack 1435, win 129, length 1322

> 06:18:15.991972 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [.], seq 3231:4553, ack 1435, win 129, length 1322

ESXi is retransmitting the missing bytes, but the packet is not getting delivered to the laptop. The above two packets do not show up in the trace on the laptop.

> 06:18:16.447390 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [P.], seq 1435:1488, ack 3231, win 16470, length 53

> 06:18:16.448857 IP laptop.ipfltbcst > esxi.ideafarm-door: Flags [P.], seq 1488:1541, ack 3231, win 16470, length 53

Laptop has more data to send, does so (but still ACKing 3231)...

> 06:18:16.448884 IP esxi.ideafarm-door > laptop.ipfltbcst: Flags [.], ack 1541, win 128, length 0

Esxi ACKs the received data...

The remainder of the trace shows ESXi continuing to retransmit the missing segment, but I never receive it at my laptop. Somewhere between the ESXi outbound interface and my laptop the packet is lost.

At this point, I'm not sure I have the ability to debug/trace further. There could be an issue with the ESXi TCP offload, or there could be some random issue along the path between the ESXi host and my laptop.

A couple of other things I tried:

1) Drop the MTU on the ESXi host to something like 1200. When I did that, the problem went away. :-)

2) From the ESXi host, I ran ping to my laptop with various packet sizes around the size of the packet that was getting lost in the above trace. No packets were lost, and no fragmentation was observed.
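When comparing ping sizes against the lost TCP segment, it helps to compare total IP packet sizes rather than payload sizes, since the size argument of most ping implementations counts ICMP payload bytes only. The sketch below does that arithmetic under stated assumptions: a 20-byte IPv4 header without options, a 20-byte TCP header without options (the ESXi-to-laptop lines in the trace show no options field), and an 8-byte ICMP echo header. Any VPN encapsulation overhead on the path is unknown and not modeled. Both function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options on the data segments
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ip_packet_size_for_tcp(payload, tcp_options=0):
    """Total IPv4 packet size for a TCP segment with the given payload."""
    return IPV4_HEADER + TCP_HEADER + tcp_options + payload


def ping_payload_for_ip_size(ip_size):
    """ICMP payload size that yields an IPv4 packet of ip_size bytes."""
    return ip_size - IPV4_HEADER - ICMP_HEADER


lost_packet = ip_packet_size_for_tcp(1322)    # the segment seq 3231:4553
print(lost_packet)                            # 1362
print(ping_payload_for_ip_size(lost_packet))  # 1334
```

Under these assumptions, a ping with a 1334-byte payload produces an IP packet the same size as the one that goes missing, so sizes just below and above that value are the ones worth testing.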

So, I'm now stuck at trying to understand where the missing packet is getting lost, and why.

BTW, ESXi setup is:

~ # vmware -v

VMware ESXi 5.5.0 build-1623387

Host: IBM System x3650M3

Network Adaptors: Broadcom NetXtreme II BCM5709