dventet
Contributor

Attempted vSwitch config change, and now ESXi host and VMs offline

So I have no doubt I have myself to blame for this one. I was attempting to change my vSwitch from 100 Mbps to 1000 Mbps when an "operation timed out" error popped up and my vCenter Server lost its connection with my ESXi host. I tried rebooting the host manually, but that didn't help. The host isn't pingable and all of the VMs on that host are offline; none of them vMotioned off to my other ESX server.

I tried reconnecting the host in vCenter, but it's obviously not connecting since vCenter can't reach it. I logged on to the ESXi console and all of the network settings appear correct. I'm used to logging in to the ESX console where you can run commands, but this is my first ESXi host, so I'm not sure what else I can do.

Any help is greatly appreciated!

-D.

Dave_Mishchenko
Immortal

If you're running ESXi 4.1, log in to the DCUI, go down to Troubleshooting Options, and enable Local Tech Support Mode. Then press ALT+F1 and log in with the root account. You can then run esxcfg-nics to set the speed for the NIC.

On any prior version of ESXi you can just press ALT+F1 to log in.
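
For example, a minimal sketch (vmnic0 is a placeholder; use whatever esxcfg-nics -l reports for your host):

    esxcfg-nics -l                       # list NICs with their current speed/duplex
    esxcfg-nics -s 1000 -d full vmnic0   # force 1000 Mbps, full duplex
    esxcfg-nics -a vmnic0                # or return the NIC to auto-negotiate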

Dave

VMware Communities User Moderator

Now available - vSphere Quick Start Guide

Do you have a system or PCI card working with VMDirectPath? Submit your specs to the Unofficial VMDirectPath HCL.

dventet
Contributor

Thanks Dave.

I'm actually back up and running. I took a look at the back of the host and none of the lights on the NIC were on. So I swapped the cable to a different port on the NIC - still nothing. Then I tried a different port on the switch - yet again nothing. Then I tried another cable on the original NIC/switch port combo and it immediately worked. Not believing that it was the cable, I reconnected the original cable and it lost connection again.

So long story short, it appears to have something to do with the cable. I tested the cable on another computer and it seems to be working fine!? Plus, when I reconnected the host I noticed the vSwitch config change to 1000 Mbps did succeed.

So now I'm scratching my head...

a) Why doesn't ESXi like that cable, and why was it working for so long until I changed the speed of the virtual switch? (I checked the switch and there weren't any unusual errors/warnings.)

b) Why didn't any of the VMs vMotion off? I have HA and DRS set to fully automated.

-D.

Dave_Mishchenko
Immortal

I'm not sure about the cable. Is it a quality Cat 5 or better cable?

vMotion requires communication with the ESXi host's management port and over the port designated for vMotion. vMotion is coordinated by vCenter, so it has to be able to communicate with both hosts during the transfer. DRS relies on vMotion. HA doesn't require communication with vCenter (once you have it set up). Check the settings for the cluster and see what the isolation response is; it may be set to leave VMs powered on.
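
To check whether the vMotion network itself is reachable, you can ping the other host's vMotion address from the console over the vmkernel stack (the address below is just a placeholder for your environment):

    vmkping 192.168.10.12   # pings via the vmkernel interface rather than the management network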

Dave

VMware Communities User Moderator

Now available - vSphere Quick Start Guide

Do you have a system or PCI card working with VMDirectPath? Submit your specs to the Unofficial VMDirectPath HCL.

dventet
Contributor

Hi Dave,

The original cable was a Cat 6; the new one I found is a Cat 5.

The host isolation response for VMs is set to Leave Powered On. Wouldn't that be the setting I want?

I think the reason vMotion failed is because the IP for my ESXi host's Management Network was the one that was not pingable. I think I have vMotion configured incorrectly for the event of a failure. I remember running into difficulties with this because I have one ESXi host and one ESX host. I was getting errors on the ESXi host because there is no Service Console port to enable. I can't remember the exact error, but it was complaining because there was a mismatch between hosts. I think I need to set the Management Network IP to a non-routable address much like the Service Console port is set, so if the actual host IP is unavailable vMotion will still occur.

I've attached screen shots of the network config between my ESX and ESXi hosts for reference.

Thanks,

-D.

Dave_Mishchenko
Immortal

Between your hosts and vCenter you need a management port (a vmkernel port for ESXi, the Service Console for ESX). On the vMotion network you just need a vmkernel port.
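
You can verify what's configured from the ESXi console; a quick sketch (the port group names will be whatever you chose when setting up networking):

    esxcfg-vswitch -l   # show vSwitches, their port groups, and uplink NICs
    esxcfg-vmknic -l    # list vmkernel ports (management / vMotion) with their IP settings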

With the Leave Powered On option, should a host become isolated (which in this case it did), the VMs stay running. If you want them to be restarted on another host, you have to pick a power-down option.

Dave

VMware Communities User Moderator

Now available - vSphere Quick Start Guide

Do you have a system or PCI card working with VMDirectPath? Submit your specs to the Unofficial VMDirectPath HCL.

dventet
Contributor

OK, so aside from the actual reason for the network failure, the setting that "did me in" was the cluster host isolation setting leaving the VMs powered on. I changed this to Powered Off so that HA will restart them on another host in the future.

Looking at my network settings, it appears I have everything configured correctly.

When would it be beneficial to have the host isolation response set to Leave Powered On? Maybe if you had redundant physical switches with multipathing? Otherwise, I would think that when a host becomes isolated, the network connection would be one of the prime culprits, and without teamed NICs your VMs are offline as well.

Thanks,

D.

Dave_Mishchenko
Immortal

If you had your management port and virtual machines on different vSwitches / NICs, the Leave Powered On setting would eliminate a false positive. As in your case, the management port would have gone down due to the cable/speed issue, but the virtual machine network would have been untouched (and I'm assuming your storage as well). In that case the VMs would have kept working even though the management port was not.

Dave

VMware Communities User Moderator

Now available - vSphere Quick Start Guide

Do you have a system or PCI card working with VMDirectPath? Submit your specs to the Unofficial VMDirectPath HCL.

dventet
Contributor

Thanks for clarifying, Dave!
