VMware Cloud Community
Mikeluff
Contributor

Virtual Machine Network Fails to Connect?

I've got the following issue on two VMs which were converted from physical to virtual. For some reason, when they first boot up the network doesn't kick into life. The NIC is set to connect at startup, and as far as the guest is concerned it is connected: ipconfig shows the adapter as connected with an IP address set, etc., yet you can't ping anything. To get it to come to life I have to open VMware Tools from the task bar, select Devices, untick NIC 1, click Apply, then re-tick NIC 1 and Apply again. Once I have done this it all works...

Anyone have any idea why it's doing this?

10 Replies
Mikeluff
Contributor

OK, figured out what was going on - I needed to delete the old non-present devices from the VM.

This HAS to be done from a cmd prompt - the environment variable only takes effect for Device Manager when it is launched from that same prompt:

set devmgr_show_nonpresent_devices=1
cd /d %systemroot%\System32
start devmgmt.msc

You might also have to select "Show hidden devices" from the View menu.

Delete the old/missing network cards (and anything else left over from the old physical host system), then reboot.

Digitus
Contributor

I have exactly the same problem. When I reboot a virtual guest it very rarely comes back onto the network properly. The VM isn't pingable from anywhere, and from within the VM it is unable to ping its default gateway. When this happens I have to go through the same steps you mentioned, and needless to say that isn't very workable.

I first saw this on ESX 3i, and we then upgraded to ESX 4i to see if that would solve the issue, but it didn't. I tried the removal of non-present devices, but there weren't any, so that made no difference to the guest VM. I've had our networks team investigate both the switching side and the server, but they can't find anything wrong either.

This is driving me nuts, so I'm desperate for any help. If people would like, I can elaborate on the setup.

Best regards

Digitus

Mikeluff
Contributor

I thought I had fixed the issue, but I was wrong. What I have found is that when I remove a hidden device and reboot, the network comes back to life at boot (which is what I want), but after the next reboot it fails to connect again. It's like something is being detected which doesn't exist...

I have a call open with VMware about this, but nothing much has come back beyond what I have already tried.

Digitus
Contributor

Thanks for your reply, Mike. If you get a solution then let us know. This is driving me up the wall.

Digitus

Mikeluff
Contributor

I have added a link to this thread to my support call for them to reference, so maybe they will contact you as well.

Digitus
Contributor

Mikeluff

I just had a thought. Do you use iSCSI on your system?

Digitus

Mikeluff
Contributor

No - all FC.

Digitus
Contributor

Hi Mikeluff

Together with our networks team, I think we've managed to tie this down. First a bit about our setup:

We have two virtual hosts, each with a number of network cards, but both have a quad-port card to which all the VMs are connected. Two of the ports go to one switch and the other two go to a second switch, using EtherChannel. On the ESX host side all four of these vmnics are connected to one vSwitch, providing both resilience and throughput.

Initially the vSwitch was configured to use "Route Based on Originating Virtual Port ID" in the vSwitch teaming section, and the switch setup was fairly standard. At this point VMs were "losing" their network on both hosts whenever they rebooted.

We tried running the connections to only one switch, but that made no difference.

In the end our networks team did some deep investigation and found the following:

Cisco Switch Config

You need to set up the 802.3ad link aggregation with channel-group mode "on" on the interfaces, and both the port-channel interface and the Ethernet member interfaces require an exactly matching list of allowed VLANs. It's definitely best to ensure you're trunking only the VLANs you need to the server, and indeed to use, where possible, dedicated VLANs for the virtual machines, to reduce broadcast/multicast traffic heading into the VMware host.
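If you want to sanity-check the "exactly matching allowed VLAN lists" requirement across the port-channel and its member interfaces, a small script can compare the lists mechanically. This is a hypothetical helper of my own, not a Cisco tool - it just parses "switchport trunk allowed vlan ..." lines out of config snippets:

```python
import re

def allowed_vlans(interface_config):
    """Collect the VLAN IDs from 'switchport trunk allowed vlan ...' lines
    in a Cisco-style config snippet. Handles comma lists and ranges
    (e.g. '10,20-22')."""
    vlans = set()
    for match in re.finditer(r"switchport trunk allowed vlan ([\d,\-]+)",
                             interface_config):
        for part in match.group(1).split(","):
            if "-" in part:
                low, high = (int(x) for x in part.split("-"))
                vlans.update(range(low, high + 1))
            else:
                vlans.add(int(part))
    return vlans

# The port-channel and every member interface must agree exactly.
port_channel = "switchport trunk allowed vlan 100"
member_port = "switchport trunk allowed vlan 100"
assert allowed_vlans(port_channel) == allowed_vlans(member_port)
```

A mismatch here (say, one member port still trunking an extra VLAN) is exactly the kind of inconsistency that makes an EtherChannel bundle misbehave intermittently.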

Alongside them setting switchport nonegotiate and spanning-tree portfast trunk, the vSwitch was also changed to "Route based on IP hash" in the vSwitch teaming section.

This solved the problems on one of the hosts but not the other one, which was very strange.

In the end, after a lot of frustration, I found that on the non-working host each port group had at some point gone into an "override vSwitch" state, so any changes made on the vSwitch were effectively ignored. When all the overrides were removed, the second host started working correctly as well, and VMs now all appear on the network correctly following a reboot.

Here is the Cisco switch configuration:

interface Port-channel1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 100
 switchport mode trunk
 switchport nonegotiate
 spanning-tree portfast trunk
!
interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 100
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode on
 spanning-tree portfast trunk
!
interface GigabitEthernet0/2
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 100
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode on
 spanning-tree portfast trunk

VMware vSwitch Config

Edit the vSwitch properties and set "Load Balancing" to "Route based on IP hash". (It is essential to do this when the uplinks are in an EtherChannel.)
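For anyone wondering why IP hash is the required policy with a static EtherChannel: the uplink choice is commonly described as an XOR of the source and destination IP addresses taken modulo the number of active uplinks. Here's a rough Python model of that idea - a simplified sketch under that assumption, not VMware's exact implementation:

```python
import ipaddress

def iphash_uplink(src_ip, dst_ip, active_uplinks):
    """Rough model of 'Route based on IP hash': XOR the 32-bit source and
    destination addresses, then take the result modulo the number of
    active uplinks. (Simplified sketch, not VMware's actual code.)"""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % active_uplinks

# Each source/destination pair pins to one uplink, so the physical switch
# must bundle all member ports into one channel ('channel-group ... mode on');
# otherwise return traffic can arrive on a port the host doesn't expect.
print(iphash_uplink("10.0.0.5", "192.168.1.7", 4))
```

This also illustrates why the originating-virtual-port-ID policy breaks against an EtherChannel: the switch hashes flows across all bundled ports, while the host would pin each VM to a single uplink, so the two ends disagree about where a given MAC lives.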

I hope this helps you as well Mikeluff...

Best regards,

Digitus

Mikeluff
Contributor

Thanks for the info - we don't use the EtherChannel method here yet, but we do plan to move to it and will be testing it in our test environment in the next week or so.

I did some testing with VMware today: did a repair on VMware Tools, rebooted, and the network connected. Rebooted again and it disconnected. Rebooted several times; sometimes it came up, other times it didn't. I'm thinking it must be related to a driver issue or something.

Out of interest, what hardware did you convert from?

Thanks

Mike

Digitus
Contributor

I was seeing the issue on fresh builds as well as on VMs that were converted in from Windows Virtual Server 2005, so for us the drivers would have been very clean; that wasn't the issue here. The fact that it sometimes works and other times doesn't "feels" like the VM is switching between working and non-working network interfaces on the host machine.

Digitus
