VMware Cloud Community
gss4w
Contributor

ESX Networking question

I am trying to troubleshoot a problem with installing Windows 2000 using an unattended install from a network share hosted on a Windows 2000 Server VM. The first step of the install process copies all of the files from the network share to the target machine. This copy hangs repeatedly, on a different file each time, which prevents the unattended install from completing successfully.

What is perplexing about this problem is that it only occurs when we install from a host VM that is on a different physical server from the target machine. We can successfully deploy Windows 2000 from the host VM to a client VM if both VMs are on the same ESX server and use the same virtual switch. However, when I try to deploy an OS from a host VM on one ESX server to a VM on a different physical server, the copy eventually hangs with the error message "Setup was unable to copy the following file:" (the file named depends on where the copy hangs).

I ran a packet capture in the host VM and noticed that when the copy hangs there is a series of TCP retransmissions, followed by the host pinging the client. The client responds successfully to the ping. The host then sends an SNMP get-request, and the client replies with an ICMP destination unreachable message.
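For anyone wanting to quantify the retransmissions from an exported capture, here is a minimal Python sketch. It assumes you have already extracted `(sequence number, payload length)` pairs for one direction of one TCP connection, in capture order; the function name and the input format are hypothetical, and the check (a data segment whose sequence number and length were already seen) is a simplification, not the heuristic a real protocol analyzer uses.

```python
def count_retransmissions(segments):
    """Count data segments that repeat an earlier (seq, length) pair.

    segments: iterable of (seq, payload_len) tuples for ONE direction of
    ONE TCP connection, in capture order (assumed input format).
    """
    seen = set()
    retransmits = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data, so they cannot be data retransmissions
        key = (seq, length)
        if key in seen:
            retransmits += 1
        else:
            seen.add(key)
    return retransmits


if __name__ == "__main__":
    # Made-up sample: the segment starting at seq 2461 is sent three times.
    sample = [(1, 1460), (1461, 1000), (2461, 1460), (2461, 1460), (0, 0), (2461, 1460)]
    print(count_retransmissions(sample))
```

A burst of repeats of the same segment right before the hang would suggest the sender's data is not arriving (or not arriving intact) at the other end.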

When the copy hangs the client eventually provides an error message that offers the option of pressing ENTER to retry the copy operation. Usually if I retry the copy the file that hung will complete successfully, but the system will hang again on a different file.

We used this same process in the past, with GSX running on a Windows 2003 server hosting the VM that deployed the OS, and did not experience this problem when deploying to VMs running in ESX 2.5. We are currently running all the VMs on ESX 3.01.

I have searched various sites for ideas but have not found a solution yet. HP had a knowledge base article that described our problem, and it suggested turning off "Offload Transmit TCP Checksum" on the host OS. However, I have not seen any way to do that within the VM or on the ESX server. Any ideas for possible solutions are appreciated.

3 Replies
Tibmeister
Expert

We had a similar problem when we tried to copy a large file from VM to VM on different hosts. It actually caused a network storm which shut all of our Cisco switches down. We have not tried it again since: we can replicate the issue, but cannot determine a cause for it. VM-to-VM copy on the same host has no issue, as it doesn't go across the physical network at all.

When we do a large file copy from a VM to a physical box, or vice versa, there are no problems at all, just VM to VM on different hosts.

kharbin
Commander

Sounds like when the VM boots and starts the install process, it is pulling a duplicate IP or MAC (probably IP though). It works fine until you start the bulk transfer of data. That's why it works on the same host: the network traffic never leaves the host.

See what IP gets used when you do the install. Then do an arping using duplicate address detection mode. That will tell you if a dup exists and if it's a VM or physical.
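If you collect the ARP replies (from arping output or a capture), a rough Python sketch like the one below can flag any IP answered by more than one MAC and guess whether each MAC belongs to a VM. The function names and the `(ip, mac)` input format are made up for illustration, MACs are assumed to be in colon-separated notation, and the prefix list covers only a few VMware OUIs I am sure of, so a MAC that doesn't match is not proof of a physical box.

```python
from collections import defaultdict

# A few OUI prefixes registered to VMware (not an exhaustive list).
VMWARE_OUIS = {"00:50:56", "00:0c:29", "00:05:69"}


def find_duplicate_ips(arp_replies):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    arp_replies: iterable of (ip, mac) pairs (assumed input format).
    """
    claims = defaultdict(set)
    for ip, mac in arp_replies:
        claims[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


def is_vmware_mac(mac):
    """True if the first three octets match one of the listed VMware OUIs."""
    return mac.lower()[:8] in VMWARE_OUIS


if __name__ == "__main__":
    # Made-up replies: 10.0.0.5 is answered by two different MACs.
    sample = [
        ("10.0.0.5", "00:0C:29:AA:BB:CC"),
        ("10.0.0.5", "00:11:22:33:44:55"),
        ("10.0.0.6", "00:50:56:01:02:03"),
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        for mac in macs:
            kind = "VMware" if is_vmware_mac(mac) else "other/physical?"
            print(ip, mac, kind)
```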

my 2 cents

gss4w
Contributor

Kharbin, thanks for the ideas. Unfortunately, I don't think this is the problem that I am experiencing. I am running a separate DHCP server alongside the server that I use to deploy the OS images, so I can see that the IP addresses are being assigned correctly. Also, when I said the copy hangs, what I mean is that everything works correctly and then at a certain point the system stops transferring data. Thanks for the suggestion though, it is useful for me to consider all possibilities.
