VMware Cloud Community
DavidinCT
Enthusiast

VM loses connectivity after a reboot....help !!!!

I'm lost on this one, I have NEVER seen this problem before.

The server is an HP DL380 G5 with dual 1 Gb NICs, running ESXi 5.0 (with all current updates). One NIC comes in from the ISP and one goes out to the local LAN (the server handles mail, NAT, and much more).

After a reboot, the NIC that comes from the ISP loses connectivity, well, mostly. It will still serve websites (I assume because that port is open in the firewall), but nothing else: no Internet, no DNS, and I can't even ping an external site by its IP. The only thing that works is the websites that are running. We use this connection for websites and for Internet access for the site, shared out to our PCs through NAT, so it's important that it works.

The ONLY way to get it back is to delete BOTH NIC cards from Windows and let Windows find them again (reinstalling the drivers). I have taken this server and copied it back down to older hardware, and there it never has an issue after a reboot. On ESXi 5, even after fixing it by deleting the NIC cards, once it gets rebooted I have to do it all over again.

I need to move to the VM but, so far it's been nothing but a nightmare.

Is it hardware, or is it ESX that is causing this problem? I do know it's not the Windows 2003 server that it is running...

Any ideas ??? Please help if you can.

1 Solution

Accepted Solutions
rickardnobel
Champion

The problem seems to be what I had been suspecting: an internal VM routing issue. I thought you were losing the 0.0.0.0 route to the internet, but instead you actually get another 0.0.0.0 route that points to the local interface. From the route print output:

Working:

0.0.0.0          0.0.0.0       < ISP gateway >      <   ISP  IP >     10

Not Working:

0.0.0.0          0.0.0.0       < ISP gateway >      <   ISP  IP >     10

0.0.0.0          0.0.0.0      192.168.0.1      192.168.0.1     10

192.168.0.1 is the local IP address of the internal NIC of your VM. This means that, for some reason, the VM considers itself the default gateway and therefore will not try to reach the internet through your ISP router.

If you look at the ipconfig /all output in the not-working text file, you will see that "Ethernet adapter Local Area Connection 4" with IP 192.168.0.1 now has itself as default gateway.

The configuration of LAN4 is static, so it is quite strange where the new default gateway information comes from. A quick fix after a reboot is likely to be opening the TCP/IP settings and removing the new default gateway from the internal NIC.

You could also possibly enter a new persistent route to your ISP router with a lower metric, but I think we should first try to find out why your Windows 2003 is setting a new default gateway on the internal interface.
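To make the comparison above repeatable after each reboot, here is a minimal Python sketch that scans saved route print text for default routes and flags any whose gateway equals its own interface address. The function names are made up for illustration, and the 203.0.113.x addresses are placeholders standing in for the ISP gateway and ISP IP that are masked in the output above.

```python
def default_routes(route_print_text):
    """Return (gateway, interface, metric) for every 0.0.0.0 / 0.0.0.0 row."""
    routes = []
    for line in route_print_text.splitlines():
        fields = line.split()
        # An IPv4 route row has five columns:
        # destination, netmask, gateway, interface, metric.
        if len(fields) == 5 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            routes.append((fields[2], fields[3], int(fields[4])))
    return routes


def suspicious_defaults(routes):
    """Default routes whose gateway is the interface's own address."""
    return [r for r in routes if r[0] == r[1]]


# Placeholder data modelled on the "Not Working" case (ISP values are assumed).
sample = """
0.0.0.0          0.0.0.0      203.0.113.1      203.0.113.25     10
0.0.0.0          0.0.0.0      192.168.0.1      192.168.0.1      10
192.168.0.0    255.255.255.0  192.168.0.1      192.168.0.1      10
"""
found = default_routes(sample)
print(len(found), "default route(s) found")
for gateway, interface, metric in suspicious_defaults(found):
    print("self-referencing default route on", interface, "metric", metric)
```

More than one default route with the same metric, or any route reported by suspicious_defaults, would match the broken state described in this post.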

My VMware blog: www.rickardnobel.se

50 Replies
vGuy
Expert

David - can you please provide more information on how you have set up your vSwitches, or a screenshot of the network configuration?

Also, is this issue specific to a single VM or multiple VMs? Do you see any errors on the ESXi host for the corresponding vmnic?
DavidinCT
Enthusiast

Hi, Thanks for attempting to help...

I am only using this on one VM. The way the system is set up, only one VM can take advantage of this setup, and that is the one having this issue, where I have to delete the NIC cards every time I reboot it (note: ISP). Is this what you want to see?

The VLAN setup is pretty basic. The one that I am having an issue with goes right from the port to the VM. The other port is connected to the internal network.

The Network cards in this system are Broadcom NetXtreme II BCM5708 1000base-T

This is the config page for the NIC card with the issue, do you need something else ?

vlan.jpg

vGuy
Expert

Thanks for the information, David. I believe vSwitch0 is your local LAN linked to vmnic0.

You mentioned, "The ONLY way to get it back is to delete BOTH nic cards from Windows and let Windows find them again (reinstall the drivers)"

Do both NICs lose network connectivity, or only the one connected to the ISP portgroup? Can you try the vmxnet adapters and see if the behaviour changes?


DavidinCT
Enthusiast

I tried both adapters (I have been troubleshooting this issue since Monday), with the same result. After a reboot, JUST the ISP side (vSw1) stops working, as far as I have noticed (again, it partly works: it only serves websites on ports that are opened in the ESXi firewall). I have totally disabled the ESXi 5.0 firewall (--enabled false and unload).

The reason I say "both" is that if I delete just the ISP NIC from Windows and let Windows find it again, it still shows the same problem, but if I delete them both, it seems to come back.

Over vSwitch0 (my LAN connection), I can connect with the vSphere Client and ping things on the local LAN fine. It could be having issues too, but I can't be 100% sure; everything I used seemed to work fine.

To note: nothing shows up in the logs for VMware or Windows (well, Windows is showing "can't connect to xxx server", but nothing that explains the connection problem).

rickardnobel
Champion

DavidinCT wrote:

After a reboot, JUST the ISP side (vSw1) stops working, as far as I have noticed (again, it partly works: it only serves websites on ports that are opened in the ESXi firewall). I have totally disabled the ESXi 5.0 firewall (--enabled false and unload).

The firewall in ESXi only applies to network traffic to and from the VMkernel itself. It will not get in the way of traffic in or out of the virtual machines, so it should not be part of the problem.

Could you give some more information about the setup and show the first view in the networking configuration? That is, which vSwitches do you have, which portgroups, and which vmnics are connected to these vSwitches?

Are both physical cards (vmnics) connected to the same vSwitch? If so, I believe the problem is the Port ID load balancing option, which could alter the virtual machine connections at VM reboot and cause this.

vGuy
Expert

That's correct. No need to bother about the ESXi firewall: it only covers the ESXi host's management network and will not affect any other traffic (VM, vMotion, IP storage, etc.). The vmxnet adapter is for the virtual machines; it's an advanced, virtualization-aware driver with a nice feature set. You can enable it by removing your existing adapters and then selecting the enhanced vmxnet type while re-adding them. You can get some more info on the different adapter types here. It's worth a shot and will rule out any driver issues!

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100180...

Also, I am assuming you have two vmnics (vmnic0 and vmnic1) linked to two different vSwitches (vSwitch0 and vSwitch1 respectively) without NIC teaming? It would still be good if you could post a screenshot of your network configuration and of your VM properties; that will certainly clear things up. HTH!

(On a side note, in case you have P2V'd this server, you may want to clean up any unneeded drivers and management software that may interfere with your virtual drivers.)

UPDATE: edited by vGuy.

DavidinCT
Enthusiast

Map.jpg

A little history on this server: this was not a P2V, it was a V2V. It came from an old ESX 2.5.4 server (no direct path, so I needed to run VMware's converter tool to move it), and this issue never happened before. Now on to the NICs...

Internal goes to vmnic0 (internal), which goes out to an old switch (new one on order, note: Half) with desktops, laptops, and Wi-Fi.

ISP, aka vmnic1, comes from a cable modem (my ISP) to the one server for NAT, web, etc.

The VLAN ID on both is "None (0)" (should this be set to something different? I think I tried that without a result).

Both cards have the same settings in the network settings, as you can see in my post a few up (for NIC teaming, traffic shaping, etc.). Both NIC cards in the VM that is having this issue are using the Flexible adapter type.

What else do you need to see ? I believe this is setup correctly...

Edit: I tried giving the internal and ISP networks different VLAN IDs, and it resulted in total failure. I could not get an IP from my ISP (VLAN 1), and no machines could talk to each other (internal VLAN 10). I set them both back to "None (0)" and everything works again. Am I doing something wrong here?

Edit2: I have tried everything tonight that I could think of, even changing the NIC cards to VMXNET 2 (that and Flexible are my only options). Whenever I reboot the server, I have NO CHOICE but to delete BOTH NIC cards from Windows, let Windows find them, confirm the internet is working, reconfigure the private NIC card, and reconfigure NAT with the firewall and the exceptions to the firewall rules. I have to do that EVERY TIME I reboot this server, and my APC only lasts about 15 minutes, so I need to fix this or find another VM option. It has already happened once that we lost power for half an hour while I was away at another office; let's just say that was not a good day.

BTW, when the server is working and the ESXi firewall is enabled, it DOES cause problems (and directly affects the VM); see this post:

http://communities.vmware.com/message/2001709#2001709

When the firewall is disabled, these errors go away. (I'm no ESX pro or anything, but that shows it does affect the VM, not just the management parts of the host.)

DavidinCT
Enthusiast

OK, I still need help here, Here is what I have done this weekend so far on this....

1. Swapped all cables (confirmed with a known-working, tested one for my ISP connection, which is the one having the issue).

2. Swapped vSwitches and vmnics (changed ISP from vmnic1 to vmnic0).

Neither of these changed anything: after a reboot, I have to go through the whole 20-minute process to get it back up and running.

This is becoming a major issue, so I took a major step...

I made backups of updated items (like Exchange mailboxes and home shares), booted up the old server that this came from, and nuked my VM. Then I re-migrated the old server (where this image came from). I confirmed there were no issues like this while it was on the old server: a reboot gave everything back exactly like it was before the reboot.

After it came down, I set up the NIC cards, etc., and gave it a reboot to test. SAME F'N problem. This is clearly a VMware/ESXi problem, not my VM/server. About 3 hours wasted just to find out it does the same damn thing.

I've done everything I can think of here. It should be working. Does anyone have an idea why I have to blow away the NIC cards every time I reboot a VM just to get connections? I am pretty sure everything is set up correctly.

Does anyone have an idea here ? I'm almost at the end of my rope here....

rickardnobel
Champion

DavidinCT wrote:

The VLAN ID on both is "None (0)" (should this be set to something different? I think I tried that without a result).

The VLAN settings should only be set if you have VLANs configured on your physical switches. If not, you should leave all VLAN configuration blank on the ESXi host.

Edit2: I have tried everything tonight that I could think of, even changing the NIC cards to VMXNET 2 (that and Flexible are my only options).

If you only have VMXNET2 and Flexible, then your VM most likely has an older virtual hardware version. Check the summary field; it is likely version 4, which means compatible with ESX 3.x. One thing that could be done is to shut down the VM, right-click it, and upgrade the virtual hardware to at least version 7. This will give you more modern virtual hardware, including the VMXNET3 adapter.

Given the very strange issues you are having, I think an upgrade to hardware version 7 and replacing the virtual NICs with VMXNET3 (or e1000) would be a good next step.

BTW, when the server is working and the ESXi firewall is enabled, it DOES cause problems (and directly affects the VM); see this post...

The ESXi firewall should really only affect traffic in and out of the ESXi operating system itself, the VMkernel, while all traffic in and out of the virtual machines is plain layer 2 traffic, with no firewall checks even possible.

DavidinCT
Enthusiast

All Vlans are set to None (0) as they should be.

I upgraded the hardware to version 8 on my VM, and both NIC cards are now VMXNET 3 cards. This resulted in the exact same issue. I get everything working, reboot the server, and I lose all connectivity. The only way to get it back is to delete BOTH NIC cards and let Windows find them. Then, once the internet works, configure private, NAT, etc...

This is one of the most annoying issues I have seen on any system, and I have been working on computers for over 20 years now...

I really wish someone had a fix for this. My next step is to back up the VMs and format the drive to reinstall ESXi (with no updates, as those could have caused this). Is that really my only option? Has no one else seen this issue?

vGuy
Expert

Hello David - a few more suggestions here:

How are your physical switch ports configured (access mode or trunk mode)?

What make and model of physical NICs are you using? Kindly check if there are any updates available.

For the logs, have you checked the virtual machine logs in the VM directory (vmware.log) for any issues?

I personally doubt that reinstalling ESXi would help. Instead, I would suggest doing a fresh install of Win2k3 on the ESXi host and seeing if you are able to reproduce the issue.

DavidinCT
Enthusiast

How are your physical switch ports configured (access mode or trunk mode)?

Are you talking about the VM or the server itself? (When creating the network in ESXi, I don't change any settings except for the network name, so it's all defaults.) I'll double-check the BIOS settings for the network cards on the server when I can, but I already tried swapping them (vmnic0 to vmnic1) with the same results. I haven't noticed any issues on the internal LAN, so I'm not sure that will help. The cable modem/ISP is plugged directly into the server, not coming from any switch or anything, nor is the cable modem a router.

What make and model of physical NICs are you using? Kindly check if there are any updates available.

The server is an HP DL380 G5 with dual Broadcom NetXtreme II BCM5708 1000Base-T NICs. I noticed drivers on HP's site, but I installed the server after those drivers came out and updated ESXi to the current build. How can you tell in ESXi which driver version it's using for the NIC cards (just to confirm versions)?

For the logs, have you checked the virtual machine logs in the VM directory (vmware.log) for any issues?

I've been looking over logs till my eyes are bleeding. I haven't noticed any issues there, but I will double-check the time frames around boot to see if anything shows up. Which logs are the best ones to look at?

I personally doubt that reinstalling ESXi would help. Instead, I would suggest doing a fresh install of Win2k3 on the ESXi host and seeing if you are able to reproduce the issue.

This server is going to be decommissioned in the next few months, as I will be upgrading to Exchange 2010 and all 2008 R2 servers. But if I can't get one 2003 server (AD/Exchange 2003/NAT/DHCP/DNS) working perfectly, I don't want to move forward with that until I know ESXi is going to work here. Building a temp 2003 server for testing is an option I might try before nuking the whole system, but I can confirm that this issue did not happen when it was on ESX 2.5.4, as I tested this over the weekend before re-VMing the server (the server I am now using is a brand-new VM migrated with VMware's tool).

Thoughts? (Sorry about not knowing how to quote on this forum.)

rickardnobel
Champion

DavidinCT wrote:

I upgraded the hardware to version 8 on my VM, and both NIC cards are now VMXNET 3 cards. This resulted in the exact same issue. I get everything working, reboot the server, and I lose all connectivity. The only way to get it back is to delete BOTH NIC cards and let Windows find them. Then, once the internet works, configure private, NAT, etc...

You really seem to have a very, very strange and unusual problem.

When you lose the connections after a reboot of the VM, how do you notice this in the guest VM? Does it lose the IP address, or does it just not work? Can you ping anything? If not, do you get an ARP reply (arp -a)?

Does the inside network work, for example pinging something internal?

Could it be some issue with the DHCP-assigned address on the ISP side and a NAT configuration that gets logically corrupted after a reboot?

DavidinCT
Enthusiast

Yea, I know it's a strange problem and yet I am losing sleep because of it.

As I have stated, vmnic0 comes from my ISP and vmnic1 goes to my local lan.

When I reboot the server, I can no longer ping any external sites. I've tried Yahoo, MSN, and Google. I've also tried to ping them via their IP to see if it was DNS, and that fails too. I can ping my ISP's gateway tho. The funny thing is that the websites hosted on the VM server still work (so ports 443/80) from external locations. Anything else that runs on different ports DOES NOT WORK. The only thing I can see is that on the VMware firewall ports 443/80 are opened, and all other firewalls are disabled for testing. So everything should be wide open, but I still lose connectivity to the internet.

Pinging local machines on my network works without any issue: mapped drives map, and manually accessing the VM and back works fine.

NAT is disabled (and therefore reset in 2003 Server and needs to be reconfigured) every time I have to delete the NIC cards. If it were damaged in some way, reconfiguring it should take care of that, but I have nuked the NAT settings and rebooted with the same results, so I don't think it's that. DHCP hands out IP addresses fine, and there are no errors in the event log for either NAT or DHCP.

After deleting the NIC cards and letting Windows re-find them, everything works perfectly, until a reboot happens...

rickardnobel
Champion

I actually believe this is not a VMware or ESXi problem. :) Still very strange and unclear of course, but I do think it is some logic inside the VM that is causing the problem.

DavidinCT wrote:

When I reboot the server. I can no longer ping any external sites. I've tried Yahoo, MSN, and google. I've also tried to ping them via their IP to see if it was dns, and that fails too. I can ping my ISP's gateway tho.

So after a reboot of your VM the internal network works fine, but not the external. And you cannot ping any remote addresses on the internet, but the default gateway on the ISP network is reachable. What I suspect is that this is firewall related, although not on the ESXi host: that firewall has nothing to do with the packets going to the VMs, so it will not affect this.

Are there no firewalls on the Windows 2003 server? Not the default one or any third-party one?

Are you running Windows RRAS for NAT and other services? It has a packet filtering function too, which might be blocking some traffic, however strange that would be in this case. When RRAS is configured, some typical access lists are attached to the outside NIC.

Do you get a DHCP-assigned address on the ISP side of the VM? Is it typically the same each time, that is, between reboots?

Can you do a tracert against a few internet addresses and collect the output into a text file, then reboot and try again against the same addresses, to see exactly how far the traffic gets?

Also do an ipconfig /all >> ip.txt before and after the reboot.
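Once you have the two captures, comparing them by eye is error-prone, so here is a small Python sketch that lists only the lines that differ. It uses the standard difflib module; the function name and the simplified sample lines are assumptions for illustration, not real ipconfig output.

```python
import difflib


def changed_lines(before_lines, after_lines):
    """Return only the removed ('- ') and added ('+ ') lines between two captures."""
    return [
        line
        for line in difflib.ndiff(before_lines, after_lines)
        if line.startswith(("- ", "+ "))
    ]


# Simplified, made-up capture lines for one adapter.
before = [
    "IP Address: 192.168.0.1",
    "Subnet Mask: 255.255.255.0",
    "Default Gateway:",
]
after = [
    "IP Address: 192.168.0.1",
    "Subnet Mask: 255.255.255.0",
    "Default Gateway: 192.168.0.1",
]

changes = changed_lines(before, after)
for line in changes:
    print(line)
```

With real captures you would read each text file with splitlines() and pass the two lists in; any adapter setting that changes across the reboot then shows up as a removed/added pair.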

DavidinCT
Enthusiast

I get an IP fine from my ISP. Even when I have no connection. Yes, I do use RRAS but, when I boot the server with RRAS completely disabled it does the same thing (so starting with RRAS configured or not makes no difference with this issue).

My server gets an IP fine when it comes back up after being shut down. Getting an IP is not the issue; that is one of the first things I check on boot when this happens.

All firewalls are disabled besides the ESXi one, so that made me think that was it, as only those ports were open (HTTPS/HTTP, 80/443, was checked).

A TraceRT will not help, I have tried this back and forth long before my first post about this here. After the reboot, a ping to google.com will result in No host found. Even with an IP of google, MSN.com or Yahoo.com will give the same result.

rickardnobel
Champion

DavidinCT wrote:

A TraceRT will not help, I have tried this back and forth long before my first post about this here. After the reboot, a ping to google.com will result in No host found. Even with an IP of google, MSN.com or Yahoo.com will give the same result.

Is the error message "No host found"? It could be important what it exactly says.

Are you sure that you have used the IP address of an internet host that answered pings by IP before the reboot and stopped afterwards? If you cannot reach anything remote and tracert does not work (at all?), this could possibly be:

A. the default gateway of the VM is missing or incorrect, which would give these effects.

B. some kind of firewall/packet filtering is not allowing the specific ICMP type/code combinations used by ping and tracert.

If there really is no kind of FW then I guess this could be ruled out. For the internal routing/gateway - could you do a route print for the VM while working?
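As a rough illustration of what "missing or incorrect" in option A could mean for the outward-facing interface, here is a Python sketch using the standard ipaddress module. The function name and the 203.0.113.x values are placeholders, not data from this thread.

```python
import ipaddress


def gateway_problem(ip, netmask, gateway):
    """Return a short description of the problem, or None if the gateway looks sane."""
    if not gateway:
        return "no default gateway set"
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    if gw == iface.ip:
        return "gateway is the interface's own address"
    if gw not in iface.network:
        return "gateway is outside the interface's subnet"
    return None


# Placeholder values for an ISP-facing interface.
print(gateway_problem("203.0.113.25", "255.255.255.0", "203.0.113.1"))
print(gateway_problem("203.0.113.25", "255.255.255.0", ""))
```

This only checks that the configured value is plausible; it says nothing about whether the gateway actually forwards traffic, which is what the tracert and route print output would show.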

vGuy
Expert

David - I am of the same opinion as Rick: it does not seem to be an ESXi issue, but rather something within the guest OS (routing, IPsec, TCP/IP stack corruption, etc.). I would suggest doing a clean install of Win2k3 (and if possible Win2k8 too) with the latest patches and updates and seeing if you are able to replicate the issue.

You can think of ESX vSwitch as a passthrough device with some Layer 2 functionalities. There is no filtering of traffic types done at the vSwitch layer.

Also, try to take a screenshot or backup of the network configuration while it's working, then compare that with the state after a reboot and look for any anomalies. If possible, run a network monitoring tool such as Wireshark and look for any errors or packet drops.

rickardnobel
Champion

Obaid wrote:

You can think of ESX vSwitch as a passthrough device with some Layer 2 functionalities. There is no filtering of traffic types done at the vSwitch layer.

I agree, and this is important. Since the vSwitch is just a switch, it will by design "work" as long as the guests get link on their virtual NICs and the vSwitch is able to pass Ethernet frames from the outside network into the guest.

Other symptoms, like certain guest VM TCP ports not being available or a remote network not being reachable, come from something else. It is certainly still very strange and unusual what is happening on this server. I believe that the process of removing the virtual NICs and re-adding them does something else inside the guest that clears the incorrect state, but it might not address the root cause.

As for Windows 2003 Server, I seem to remember some issues with IP addresses being saved inside the registry when switching network cards. Perhaps something like this happened while the machine was V2V converted into ESXi.

EDIT: Here is a VMware KB article about this. It does not really seem to apply, but it could be worth checking whether any old "ghost" NICs are left behind and could be removed.

http://kb.vmware.com/kb/1179
