flyerguybham
Contributor

Urgent Help Needed - Lost all external connectivity to ESX host

I've just lost all external connectivity to one of my ESX hosts.

The last thing I was doing was trying VMotion. It complained that I could not do VMotion because this origin ESX host had no valid network adapter for the VMKernel. I looked at its Network Adapters from VI3 client, and saw that indeed, it had one virtual switch to which VMKernel was connected but no adapter.

I figured that what it was trying to do was to have the VMKernel on the origin ESX server talk to the VMKernel on the destination, so what I tried to do was to add the adapter that leads to the LAN to which both ESX servers are connected. The moment I did this, BAM, VI3 could no longer communicate with the ESX host.

I'm sitting at the ESX host console, and I can see that the vswif0 interface is not receiving any packets.

How do I undo this mapping from VMKernel to vswif0, given that I only have command line access?

16 Replies
conradsia
Hot Shot

You can remove the VMkernel NIC from the console using the esxcfg-vmknic command.

You can use the esxcfg-vswitch command to remove the VMkernel port group from the vSwitch.

flyerguybham
Contributor

I -think- what I am trying to ask is how can I move interface vswif0 from one virtual switch to another from the command line. Thanks.

conradsia
Hot Shot

You can remove the service console interface using esxcfg-vswif.

conradsia
Hot Shot

what you need to do is:

remove the current vswif

remove the portgroup from the vswitch

create a new portgroup on the other vswitch using esxcfg-vswitch

add a new vswif onto that portgroup using esxcfg-vswif

if you need exact commands let me know
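
For anyone following along, the four steps above might look roughly like the sketch below at the ESX console. The function name, port group names, vSwitch names, IP address and netmask are placeholders, not values taken from this thread, and the flags are the ones commonly documented for ESX 3.x, so check them against `esxcfg-vswif -h` and `esxcfg-vswitch -h` on the host first. Because deleting the vswif cuts the management connection until the new one is up, the `run` wrapper only prints each command unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
# The printed form drops the quotes around names that contain spaces.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# move_vswif IFACE OLD_PG OLD_VSWITCH NEW_PG NEW_VSWITCH IP NETMASK
# All seven arguments are placeholders for the host at hand.
move_vswif() {
    run esxcfg-vswif -d "$1"                          # remove the current vswif
    run esxcfg-vswitch -D "$2" "$3"                   # remove its port group from the old vSwitch
    run esxcfg-vswitch -A "$4" "$5"                   # create a port group on the other vSwitch
    run esxcfg-vswif -a "$1" -p "$4" -i "$6" -n "$7"  # add a new vswif on that port group
}
```

A dry run such as `move_vswif vswif0 "Service Console" vSwitch3 "Service Console" vSwitch0 192.0.2.10 255.255.255.0` prints the four commands for review; setting APPLY=1 would execute them, which is only sensible from the physical console.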

flyerguybham
Contributor

This has been immensely helpful. I managed to re-associate vSwitch0 (where the Service Console is) with vmnic0 (the nic connected to the LAN). I can now ping out from the Service Console. I have not yet re-established full contact with VC. It seems to see the ESX host now, but can't quite reconnect to it. Let me keep debugging. Thanks so much for the help thus far.

flyerguybham
Contributor

I'm rebooting the ESX server - I've hacked on it pretty hard, I want it to come up clean before I continue debugging.

I think what happened is this:

I tried to associate VMKernel's virtual switch, vSwitch3, that wasn't associated with any adapter, to vmnic0. This "stole" vmnic0 from vSwitch0, where the Service Console and some VMs were. vSwitch0 had no redundant path out, that was the only NIC.

So now I've essentially had vSwitch0 re-"steal" vmnic0 back from vSwitch3. And I'm rebooting. Let's see what happens when it comes back up.
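
The thread does not show the exact commands used for this, so the following is only a guess at what handing the NIC back could look like from the console. It assumes the usual ESX 3.x `esxcfg-vswitch` options (`-U` to unlink an uplink, `-L` to link one, `-l` to list); the function name is made up for illustration, and the `run` wrapper prints the commands instead of executing them unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# return_uplink NIC FROM_VSWITCH TO_VSWITCH
return_uplink() {
    run esxcfg-vswitch -U "$1" "$2"   # unlink the NIC from the vSwitch that took it
    run esxcfg-vswitch -L "$1" "$3"   # link it back to the vSwitch that lost it
    run esxcfg-vswitch -l             # list vSwitches to confirm the uplinks
}
```

With the names from this post, the call would be `return_uplink vmnic0 vSwitch3 vSwitch0`.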

flyerguybham
Contributor

Ok, after reboot I was able to reconnect to the host from VI client. I did get this error on the reboot of the ESX host:

Jun 19 17:18:40 lion esxcfg-route: E

Jun 19 17:18:40 lion esxcfg-route: rror: Unable to set VMkernel gateway address.

Please verify your IP settings and try again

Jun 19 17:18:40 lion vmware: Setting VMkernel gateway failed

Ignoring that for the moment, I restarted my VMs and they appear to be coming back online fine (save for the one that is still scanning its disks, since a check hadn't been performed in over 150 days). :)

I am a much happier camper now. Any thoughts on the remaining error above and what I have missed?

conradsia
Hot Shot

Using the VI client, you need to set the VMkernel gateway on the ESX server. Click Configuration | DNS and Routing and edit the gateway entry for your VMkernel.

If it is greyed out, edit the networking properties and then edit the VMkernel port; you should be able to add the default gateway there.

Faustina
Enthusiast

Are you able to ping your gateway from your ESX server's service console?

The error only suggests that there is no route to the configured gateway from the service console, and so it is unable to set the gateway.

Try reconfiguring the gateway on the ESX host:

Choose Configuration > in the Software block select "DNS and Routing" > now select Properties in the top right-hand corner > select the Routing tab and enter the correct gateway address.

Let me know how it goes.

Also note that only one default gateway can be configured per TCP/IP stack.
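
A console-side version of these checks could look like the sketch below. Since the error message names the VMkernel gateway, it also tests reachability from the VMkernel stack, which is an assumption on top of what is suggested here. The function name and the gateway address are placeholders, `vmkping` and the no-argument form of `esxcfg-route` are used as they are commonly documented for ESX 3.x, and the `run` wrapper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# check_gateway GATEWAY_IP
check_gateway() {
    run ping -c 3 "$1"      # reachability from the service console
    run vmkping "$1"        # reachability from the VMkernel stack
    run esxcfg-vmknic -l    # VMkernel NICs with their IP and netmask
    run esxcfg-route        # current VMkernel default gateway
}
```

For example, `check_gateway 192.0.2.1` prints the four checks in order. If no VMkernel NIC sits on the gateway's subnet, that alone may be why the gateway is refused.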

flyerguybham
Contributor

It errored out when I tried to set it to the same value I have on my other ESX host.

There is an odd config difference between the two in Configuration -> Networking.

My servers each have 4 NICs. One (vmnic0) goes to the "main" LAN, one (vmnic1) goes to a Windows subnet, and the other two (vmnic2, vmnic3) are teamed for a separate iSCSI LAN.

On the "broken" ESX Host
----

vSwitch0 --> vmnic0
    Virtual Machine Port Group "VM Network" (with bunch of VMs)
    Service Console Port "Service Console"

vSwitch1 --> (vmnic2, vmnic3)
    Service Console Port "Service Console for iSCSI"
    VMKernel Port "iSCSI"

*** I think the problem is here ***
vSwitch2 --> (no adapter)
    Virtual Machine Port Group "VMKernel" (yes, it's named VMKernel, but seems to be a VM port group???)

vSwitch3 --> (vmnic1)
    Virtual Machine Port Group "VM Windows Network" (with a few VMs)

On the "good" ESX Host
----

vSwitch0 --> vmnic0
    Virtual Machine Port Group "VM Network" (with bunch of VMs)
    Service Console Port "Service Console"

vSwitch1 --> (vmnic1)
    Virtual Machine Port Group "VM Windows Network" (with a few VMs)

vSwitch3 --> (vmnic2, vmnic3)
    Service Console Port "Service Console for iSCSI"
    VMKernel Port "iSCSI"

So I see these differences:

- The virtual switch for the Windows VMs has a different number (vSwitch1 vs. vSwitch3) - I don't think that matters much

- The good server does not have a vSwitch2, or any Virtual Machine Port Group called "VMKernel"

When I try to remove the portgroup "VMKernel" from the adapter-less vSwitch2 on the broken server, I get "Unable to remove portgroup VMKernel for the following reasons: VM Kernel NIC". However, if it is essential, why doesn't the good server have one of these?

On the good server, the default gateway for the VMKernel is correctly set to our main LAN's gateway. However, trying that same value on the bad server, I get "Unable to set VMKernel gateway address."

The host seems to be working ok, the VMs are up. However, I suspect that VMotion will not work until I resolve this issue, eh?

flyerguybham
Contributor

> Are you able to ping to your gateway from your esx server service console.

Yes, I can ping the gateway from the service console. The service console is attached to vSwitch0, which is bound to the vmnic0 that is connected to that LAN.

> Try reconfiguring the gateway in the esx : Choose Configuration > in the software block select "DNS and routing" > Now select properties on right hand top corner > select the routing tab and enter the correct gateway address.

This all seems to be correct -except- for the vmKernel route. When I try to set it to the same as that of the Service Console, it errors. This implies that the vmKernel has no path to the default gateway, which is true looking at the networking config I just posted - BUT, on the good server, there also is no path, and yet it took the setting fine there.

flyerguybham
Contributor

By the way, if you are wondering why my config is a little screwy to begin with, I had a lot of trouble configuring iSCSI back in the winter. I had a Dell AX150i that I was trying to get working so I had a lot of different configs going. I think the weird vSwitch2 on the broken host might be a remnant from that time. However, now that I actually -did- get the iSCSI working, I want to start using vMotion, which is how I hit this problem in the first place today. When I tried to vMotion from broken host to good host, I get:

"Unable to migrate: the vMotion interface is not configured or is misconfigured on the source host."

Since I know it's the VMKernel TCP/IP stack that handles vMotion, I imagine this is all related.

flyerguybham
Contributor

I need to step out for a couple of hours, but let me just express my deep gratitude for the very quick response to my initial post. Six minutes! Fantastic community-based support network. Sincerely appreciate it!

I want to keep working on this vmKernel/vMotion problem and look forward to your ideas, I will try them out ASAP either this evening or in the morning.

conradsia
Hot Shot

In order to remove the VMkernel port group from the "bad server", you first need to delete the vmknic that is associated with it using esxcfg-vmknic. Once it is removed, you can remove the port group. Use esxcfg-vmknic -l to check whether you have more than one VMkernel NIC.

After thinking more about it, the entry that is showing up as VMKernel is probably the actual VMkernel port and not a VM port group. Delete it using the VI client.

VMotion will not work until you have a working VMkernel port on a vSwitch with a network adapter, you enable that port for VMotion in its properties in the VI client, and you set the gateway. I would put the VMkernel port on the same vSwitch as your non-iSCSI service console, and then you should be able to set your gateway.

Since you have VI client access again, you can probably do all of this from the VI client now. Just remove the VMkernel port and port group from vSwitch2, add one back on vSwitch0, and be sure to enable it for VMotion. After creating the VMkernel port, you should be prompted to add a default gateway for it.
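
If the VI client were not available, a console equivalent might look like the sketch below. The function name, the new port group name, the IP address, netmask and gateway are placeholders rather than values from this thread, and the flags follow the commonly documented ESX 3.x forms, so verify them with `esxcfg-vmknic -h` and `esxcfg-route -h`. Enabling the port for VMotion is not covered here and is assumed to be done in the VI client as described above. The `run` wrapper prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# rebuild_vmkernel OLD_PG OLD_VSWITCH NEW_PG NEW_VSWITCH IP NETMASK GATEWAY
rebuild_vmkernel() {
    run esxcfg-vmknic -d "$1"                  # delete the VMkernel NIC on the old port group
    run esxcfg-vswitch -D "$1" "$2"            # the port group can be removed once the NIC is gone
    run esxcfg-vswitch -A "$3" "$4"            # new port group on a vSwitch that has an uplink
    run esxcfg-vmknic -a -i "$5" -n "$6" "$3"  # VMkernel NIC on the new port group
    run esxcfg-route "$7"                      # VMkernel default gateway
}
```

A dry run for this host could be `rebuild_vmkernel VMKernel vSwitch2 VMotion vSwitch0 192.0.2.20 255.255.255.0 192.0.2.1`, with the addresses replaced by ones on the main LAN.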

If the above doesn't work, it would also help if you posted the output of the following:

esxcfg-vswitch -l

esxcfg-vmknic -l

esxcfg-vswif -l

Message was edited by:

conradsia

flyerguybham
Contributor

I just got finished trying your suggestions. I am very happy to report that everything you suggested worked exactly as you suspected it would. I also can now report that I have successfully vMotioned a VM back and forth a few times.

Thank you so much for your expert guidance this afternoon and evening. You saved my @ss earlier when I had the outage, and helped me solve this vMotion item that has been a longstanding TODO as well. I owe you a few beers.

conradsia
Hot Shot

sweet,

I love beer but these guys need it more than I do, if you want to treat some beer send it to these guys....

HEADQUARTERS U S MARINE CORPS

PERSONNEL MANAGEMENT SUPPORT BRANCH (MMSB-17)

2008 ELLIOT ROAD

QUANTICO, VA 22134-5030
