VMware Cloud Community
darkweaver87
Contributor

VMkernel doesn't work anymore after rebooting

Hello,

I have some troubles using VMotion.

Here is our configuration:

- 2 ESX Infrastructure 3.5 servers;

- on each server we have 2 redundant interfaces (A and B) for the default network connection (iSCSI + VMs; it will be split later onto another network card);

- on each server we have 2 redundant interfaces (C and D) for VMotion, which are directly connected to the other ESX;

- the 2 servers are added to a Datacenter in vCenter (HA and DRS enabled);

- consequently, on each server we have one virtual switch with a VMkernel port and Service Console for the default network (vSwitch0: A+B) and another for VMotion (vSwitch1: C+D).

Our problem is the following: when we reboot one or both servers, the VMkernel doesn't work any more. Both servers can ping each other on the VMotion interface, but vmkping doesn't work. Finally, when we delete vSwitch1 and recreate it with the same parameters on one of the servers (chosen at random), that seems to solve the problem.

We tried the same scenario a few times and hit the same problem each time.

We looked at /var/log/boot.log and /var/log/vmkernel and, as far as we can tell, there is no problem there.

Any idea?
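For reference, this is roughly the test we run from the Service Console after each reboot (interface names and IPs are from our setup above; vmkping sends the echo from the vmkernel stack rather than from the Service Console):

```shell
# On ESX1; 192.168.1.2 is ESX2's VMotion vmkernel IP.
esxcfg-vmknic -l       # list vmkernel ports: the VMotion port and its IP are present
ping 192.168.1.2       # Service Console ping: works
vmkping 192.168.1.2    # vmkernel ping: no reply after a reboot
```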

1 Solution — accepted: see rlee's reply marked as the solution below.
14 Replies
rlee
Enthusiast

Not sure what you mean by "Both servers ping each other on the VMotion interface but vmkping doesn't work."

It sounds like you have not enabled VMotion on the VMotion port groups. If it is set up correctly, you should be able to use vmkping (not ping) to test VMotion.

When you believed things were working correctly, were you able to use VMotion?

If everything is configured correctly, try removing one of the NICs from the VMotion team on both hosts. Start with a simpler configuration and work up from there.

kjb007
Immortal

Are your vmotion/default/vm networks all on the same network, or on different networks?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
weinstein5
Immortal

What are the IP addresses of the vmkernel ports enabled for VMotion? How are the vmkernel ports connected to each other?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
Texiwill
Leadership

Hello,

In order to use VMotion you must be able to vmkping the other hosts using their VMotion-assigned IP addresses. In addition, the vSwitch portgroup you are using for VMotion must be marked as 'VMotion'.

I would once more go through and set up VMotion again, verifying that you have configured the portgroup on the vSwitch to be used for VMotion and that the networks are properly connected to each other. Note: if you are using VLANs, they need to match on each side.



--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
darkweaver87
Contributor

That's my problem. VMotion is enabled, and when I try:

on ESX1: ifconfig vswif2 --> 192.168.1.1/24

on ESX2: ifconfig vswif2 --> 192.168.1.2/24

on ESX1:

ping 192.168.1.2 --> it works

vmkping 192.168.1.2 --> it doesn't work

When vmkping works, yes, I can migrate VMs without problems. I tried with both running and stopped VMs.

I also tested with one NIC, but it still has the same behaviour.

darkweaver87
Contributor

Yes.

vmotion: 192.168.1.0/24 no VLAN

default: 139.1.1.0/8 VLAN 1

vm: 192.168.2.0/24 VLAN 10

darkweaver87
Contributor

on ESX1: 192.168.1.1/24

on ESX2: 192.168.1.2/24

ESX1 and ESX2 are directly connected with two redundant crossover cables.

darkweaver87
Contributor

Hello,

I checked a hundred times and VMotion is marked as "Enabled" on both servers.

I think it's correctly configured, and I have no problem migrating VMs.

Then, after I reboot, I'm not able to migrate VMs anymore. Very curious.

kjb007
Immortal

Can you post the output of 'esxcfg-vswitch -l' and 'esxcfg-vmknic -l' ?

-KjB

rlee
Enthusiast

I'm getting confused when I see you using ping and a vswif interface. For checking VMotion, I wouldn't use either of those. You should be using the tools and interfaces that belong to the vmkernel.

Can you please post your output from vimsh for vmotion?

1. At command prompt enter vimsh

2. hostsvc/vmotion/netconfig_get

You should get output like this:

(vim.host.VMotionSystem.NetConfig) {
   dynamicType = <unset>,
   candidateVnic = (vim.host.VirtualNic) [
      (vim.host.VirtualNic) {
         dynamicType = <unset>,
         device = "vmk0",
         key = "key-vim.host.VirtualNic-vmk0",
         portgroup = "vmotion",
         spec = (vim.host.VirtualNic.Specification) {
            dynamicType = <unset>,
            ip = (vim.host.IpConfig) {
               dynamicType = <unset>,
               dhcp = false,
               ipAddress = "192.168.1.1",
               subnetMask = "255.255.255.0",
            },
            mac = "00:50:56:xx:xx:xx",
         },
         port = <unset>,
      }
   ],
   selectedVnic = <vim.host.VirtualNic:key-vim.host.VirtualNic-vmk0>,
}

3. type quit to exit.

In this case the vmotion is going through the vmk0 device. This is the first vmkernel interface.

Verify your vmkernel interfaces with esxcfg-vmknic -l.

Is everything as it should be?

An output of your vswitch setup may also be helpful: esxcfg-vswitch -l; esxcfg-nics -l.

darkweaver87
Contributor

Hello,

In reply to both of you, I've attached some text files containing the outputs of the commands you asked me to run.

Everything seems to be as I configured with vCenter.

I executed the same commands after rebooting one of the two ESX servers and got exactly the same outputs.

kjb007
Immortal

Since you're using a direct-connect cable between the two hosts, can you force your NICs to the same speed/duplex?
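On ESX 3.5 that can be done from the Service Console; a sketch, assuming the VMotion uplink is vmnic2 (adjust the NIC name to your hosts):

```shell
esxcfg-nics -l                      # show NICs with their current speed/duplex
esxcfg-nics -s 1000 -d full vmnic2  # hard-code 1000/Full; repeat on both hosts
```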

-KjB

rlee
Enthusiast

Test 1 = Test VMotion with a clean vswitch and 1 NIC (both hosts).

I would start by deleting vSwitch1 and all associated port groups. Create a new vswitch. Under that vswitch, create a vmkernel port group. Call it VMotion (just to identify it uniquely). Enable VMotion on it. Add only vmnic2 to the vswitch. Hard-code that NIC to 1000/Full. If this setup works, you're laughing. Then add the second port group for "Service Console 2" and test again. If adding vmnic1 breaks it, it is the cause. I would follow these recommendations rather than just removing the NIC.

I am guessing that "Service Console vMotion" is your second service console for HA redundancy... Take it out for now. Simplify your setup for troubleshooting. Use only one NIC. I personally don't like to create bonds with dissimilar cards.

Test 2 = Test the vmkernel routing/networking question.

The vmkernel default gateway is allowing NetApp communications in and out of the ESX host, while VMotion is using crossover cables. If these two items are in conflict, that may be the cause of the problem. Can you connect your VMotion to the network instead of using direct crossover cables? The network VMotion connects to should be able to use the same vmkernel default gateway as NetApp.

Start with a simpler config. If you can use a simpler config for VM disks, such as local storage, and remove the NetApp, that would help rule out the question of vmkernel routing. Today I don't use NetApp: the vmkernel that is configured is used for VMotion, and the default gateway is the same IP as the VMotion port group. So vmotion=10.0.0.23, vmkernel def gw = 10.0.0.23.
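Test 1 can be sketched from the Service Console as follows, assuming the VMotion vswitch is vSwitch1, the single NIC is vmnic2, and the 192.168.1.x addressing from earlier in the thread (enabling VMotion on the port group itself is done through the VI client or vimsh, not esxcfg):

```shell
esxcfg-vmknic -d VMotion                    # remove the old vmkernel port first
esxcfg-vswitch -d vSwitch1                  # delete the old vswitch and its port groups
esxcfg-vswitch -a vSwitch1                  # recreate it clean
esxcfg-vswitch -A VMotion vSwitch1          # add a port group named "VMotion"
esxcfg-vswitch -L vmnic2 vSwitch1           # uplink only vmnic2
esxcfg-vmknic -a -i 192.168.1.1 -n 255.255.255.0 VMotion  # vmkernel port on ESX1
esxcfg-nics -s 1000 -d full vmnic2          # hard-code the NIC to 1000/Full
vmkping 192.168.1.2                         # verify against the other host
```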

darkweaver87
Contributor

Like you said... I'm laughing... I forced both interfaces on both servers to 1000/Full and it works.

Moreover, this has made me think the direct crossover cable isn't really justified any more...

Thanks to both of you!

PS: sorry, but I can't mark "correct answer" for both of you.
