VMware Cloud Community
gogogo5
Hot Shot

Strange ESXi 4.1 IP Addressing Issue

Hello

We have experienced a strange ESXi 4.1 IP addressing anomaly where the ESXi host's management IP address is changed to the vMotion address.

In our lab we have 2 ESXi hosts with 6 pNICs. All pNICs are configured on a single standard vSwitch. All pNICs are Active (no Standby or Unused) and configured for Route Based on IP hash. The 6 pNICs are uplinked to a Cisco 3120 stack where EtherChannel is configured. All works OK so far.

We have found that if the ESXi host has vmk0 assigned to vMotion and vmk1 assigned to Management, then after rebooting the host the vMotion IP address is shown on the DCUI as the host's IP address (i.e. the management address)! If we use the DCUI to change the IP address back to the correct one, then as soon as we exit the IP configuration screen and restart the management network, the IP address changes back to the vMotion address.

As mentioned earlier, we have narrowed this down to the vmk numbering. If vmk0 is assigned to vMotion and vmk1 assigned to Management then this issue occurs. For ESXi hosts that have assigned vmk0 to Management and vmk1 to vMotion this issue does not occur. To fix the issue we have to remove the vMotion and Management portgroups, then re-create the Management port group which then claims vmk0 and all is ok.

So the question we are trying to answer is why/how ESXi changes the Management vmkernel port from vmk0 to vmk1. We built the ESXi hosts manually, so it's safe to assume that vmk0 would have been allocated to the Management vmkernel port, as this is the first vmkernel port to be created. Is it expected behaviour for vmk0 to take higher priority? Should it matter which vmk number is assigned to Management anyway?

I have seen other users post similar observations and it seems Host Profiles is the culprit. I am surprised VMware have not chimed in on these previous posts considering the impact. See:

http://communities.vmware.com/message/1521966#1521966

and

http://communities.vmware.com/message/1597396#1597396

I will be logging a call with VMware, but would like to hear from anyone who has ESXi 4.1 running with vmk0 assigned to vMotion and vmk1 assigned to Management: please reboot the host and report the IP address shown on the DCUI.

Cheers

gogogo5

29 Replies
bulletprooffool
Champion

Is this on all hosts, or just one host?

It could be that the bootbank files on the host are corrupt: you make the change, which continues to work until reboot, and the reboot then reverts to the previous settings because the changes are effectively not saved.

Have a look at the size of your bootbank volume and the amount of space available on it.

It may well be that the filesystem has a problem and so your hourly backups are unable to write to bootbank, so configs are lost at reboot.

Have a look at this SUPER useful doc . .

solved my near identical issue.

One day I will virtualise myself . . .
AndreTheGiant
Immortal

If you add / remove / move some PCI cards this can generate some changes in NIC enumeration.

But I suppose that is not your case, so it's a strange problem.

Should it matter which vmk number is assigned to Management anyway?

No, but it's nice to have a homogeneous configuration (it is simpler to document and manage).

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
gogogo5
Hot Shot

Hello bulletprooffool

The problem is related to the ordering of vmk0 and vmk1. Whichever vmkernel port is assigned to vmk0 takes precedence over vmk1, i.e. if the vMotion vmkernel port is assigned vmk0 then the vMotion IP address becomes your Management IP address, as verified on the DCUI logon screen.

It is easy to reproduce on a test ESXi 4.1 host:

1. Using the vSphere Client, delete the vMotion vmkernel port.

2. Using Local TSM delete the Management vmkernel port.

3. Using Local TSM, create a new vMotion vmkernel port and assign the original IP address and enable it for vMotion. This step will assign vmk0 as this is the first vmkernel port on the host.

4. Using Local TSM, create a new Management vmkernel port, assign the original IP address and enable it for Management. This step will assign vmk1.

5. Reboot your host.

6. Look at the DCUI and see what IP address shows.
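
The steps above can be sketched as a toy model. This is not ESXi code: the `Host` class and its methods are hypothetical names, the IP addresses are made up, and the two rules it encodes (a new vmkernel port takes the lowest free vmk number, and the DCUI reports whatever sits on vmk0) are only the behaviour observed in this thread.

```python
# Toy model of the behaviour described in this thread; not VMware code.
# Assumption 1: a new vmkernel port gets the lowest unused vmk number.
# Assumption 2: the DCUI shows the address of vmk0, whatever its role.

class Host:
    def __init__(self):
        self.vmknics = {}  # device name -> (port group, ip)

    def add_vmknic(self, portgroup, ip):
        """Create a vmkernel port and return the vmk name it was given."""
        n = 0
        while f"vmk{n}" in self.vmknics:
            n += 1
        name = f"vmk{n}"
        self.vmknics[name] = (portgroup, ip)
        return name

    def remove_vmknic(self, portgroup):
        """Delete the vmkernel port that belongs to the given port group."""
        for name, (pg, _ip) in list(self.vmknics.items()):
            if pg == portgroup:
                del self.vmknics[name]

    def dcui_address(self):
        """Address the DCUI would display under assumption 2."""
        return self.vmknics["vmk0"][1]


# Steps 1-2: start from a normal host, then delete both ports.
host = Host()
host.add_vmknic("Management Network", "192.0.2.10")
host.add_vmknic("vMotion", "198.51.100.10")
host.remove_vmknic("vMotion")
host.remove_vmknic("Management Network")

# Steps 3-4: recreate vMotion first, then Management.
vmotion = host.add_vmknic("vMotion", "198.51.100.10")
mgmt = host.add_vmknic("Management Network", "192.0.2.10")

# Step 6: under these assumptions the DCUI reports the vMotion address.
print(vmotion, mgmt, host.dcui_address())
```

In this model the only thing that decides what the DCUI shows is creation order, which matches what we see on the real hosts.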

Can anyone help verify this?

hharold
Enthusiast

We are facing the same issue on ESXi 4.0 hosts.

We do the configuration through Host Profiles, and are also getting vmk0 bound to the VMotion port group and vmk1 to the Management interface.

The DCUI is showing the VMotion address and not the management address.

If we go into the Configure Management Network option, (in the DCUI) we see the addresses and settings from the VMotion port group.

Perhaps more important, the host registers its VMotion IP address in the /etc/hosts file.

We suspect that this is causing some strange HA errors.

Does anyone have any idea how to solve this?

Regards,

Harold

gogogo5
Hot Shot

hharold - we are coming to the same conclusion, that it's Host Profiles that cause this. We did a few tests in our lab and found that if you make a change to both the vMotion and Management ports in the Host Profile (admittedly not a regular occurrence, but that's beside the point) and apply the profile, the vMotion and Management ports get destroyed and re-created. It's this re-creation process that binds vMotion to vmk0 and causes the ensuing IP addressing anomaly.

Want to see something even weirder? When you have ESXi hosts exhibiting this behaviour, go to the DCUI and set the Management IP to DHCP. This clears the IP settings and assuming you don't actually have DHCP on your management network, you'll end up with an address of 0.0.0.0. Then, in the vSphere Client (either connected to VC or host directly) you'll magically see the vMotion IP address go to 0.0.0.0. But, didn't we just change the Management IP address??

I have also seen HA issues. Come into the office and all your VMs are dimmed with "not responding" or "disconnected" status.....sigh....(we have HA set to Leave VMs Powered On to give us some hope).

I have logged a SR with VMware so I'll keep you informed....

-gogogo5

gogogo5
Hot Shot

I logged a SR with VMware and received this reply:

-


Thank you for your Support Request.

As we discussed in the call, this is a known issue with the product ESXi 4.1, ie DCUI shows the IP of vmk0 as Management IP, though vmk0 is not configure for Management traffic.

Vmware product engineering team identified the root cause and fixed the issue in the upcoming major release for ESXi. As of now we don't have ETA on release for this product.

The details on product releases are available in http://www.vmware.com.

Based on our last communication, it appears that this case is ready to be closed. I will proceed with closing your Support Request today.

-


I then asked if there was a workaround and received the following (but it doesn't work, for me anyway):

-


Here is the workaround to get correct IP reflected as Management IP in DCUI.

By design DCUI looks for the "ManagementIface" from esx.conf file. To fix the issue "DCUI always shows vmk0 IP address as Management IP"

We have to go for the following workaround.

1. Keep the ESXi host with vmk0 assigned to vMotion and vmk1 assigned to the Management Network.

2. Execute the following command.

esxcfg-advcfg -s vmk1 /Net/ManagementIface

3. Reboot the host.

4. Ensure that the management IP in DCUI is the IP of vmk1.

-


Have fun!

gogogo5

mjamal
Contributor

gogogo5, so VMware support acknowledges that there is an issue in ESXi 4.1 but gives no word on when it will be fixed! The workaround you provided didn't work for us either. It would be good to hear more on this from everyone else.

Mo

Mo
kresimir_pirkl
Contributor

Hi everyone.

I had the same issue today.

After applying the host profile, vmk0 becomes the VMkernel port and vmk1 becomes the Management port.

I've managed to solve the problem by manually editing the Host Profile and reapplying it.

The problem is the order in which the vmk ports are created and applied.

I renamed the VMkernel port to Management and the Management port to VMkernel, and edited the IP address setting in the "new" VMkernel port.

The important thing is that the Management port is the first port in the order (so that it gets vmk0); VMkernel should be second (so that it gets vmk1).

After that, I put the host in maintenance mode and reapplied the host profile, and the problem was solved.
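
A rough sketch of the reordering idea, assuming (as observed in this thread) that Host Profiles creates vmkernel ports in the order they are listed and that each new port takes the next vmk number. The plain list is only a stand-in for the profile's port group entries and the function names are hypothetical; nothing here reads or writes a real Host Profile.

```python
# Illustrative only: a plain list stands in for the host profile's
# vmkernel port entries. Names and order are made up for the example.

def management_first(portgroups, management="Management Network"):
    """Return the entries with the management port moved to the front,
    keeping the relative order of everything else."""
    if management not in portgroups:
        raise ValueError(f"no {management!r} entry in profile")
    rest = [pg for pg in portgroups if pg != management]
    return [management] + rest


def expected_vmk_names(portgroups):
    """Predict vmk numbering if ports are created in list order on a
    host that has no vmkernel ports yet (an assumption)."""
    return {pg: f"vmk{i}" for i, pg in enumerate(portgroups)}


profile_order = ["VMkernel", "Management Network"]
fixed = management_first(profile_order)

print(expected_vmk_names(profile_order))  # Management ends up on vmk1
print(expected_vmk_names(fixed))          # Management ends up on vmk0
```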

Find some screenshots attached.

Regards, Kresimir.

mjamal
Contributor

Excellent work around. Well done. We fixed it by assigning vmk nics manually to mgmt and vmk ports.

Mo
RParker
Immortal

The problem is the order in which the vmk ports are created and applied.

This looks more like a Host Profiles bug than a bug in ESXi itself. We have a few ESXi servers and this has never been a problem, so it must be a result of setting / configuring host profiles for those servers.

gogogo5
Hot Shot

kresimir.pirkl - this sounds promising and thanks for your contribution. Can you do me a favour and update your profile from your reference host and let us know if this changes or resets anything in the now, updated, host profile.

Cheers

gogogo5

gogogo5
Hot Shot

RParker - this is actually a bug in the way ESXi treats the assignment of vmk ports and their purpose. It shouldn't matter whether vmk0, vmk5 or vmk12 etc. is assigned to "management", but we have found that if "management" is not assigned to vmk0 then problems ensue. As detailed in my previous replies, when vMotion is assigned to vmk0 the vMotion IP address is shown at the DCUI. This clearly shouldn't happen and in my eyes is a bug.

As for the ESXi hosts you refer to that have never had this problem: check the vmk assignment, and I bet all your Management vmkernel ports have claimed vmk0?

Depending on what your host profiles will do to bring your host(s) into compliance, if the profile needs to destroy and recreate both the "management" and "vMotion" port groups at the same time then Host Profiles will create the vMotion port first which, since no vmks have been assigned at this stage, will be assigned vmk0, followed by Management on vmk1. This causes the DCUI to show the vMotion IP address and causes HA havoc ;)

This is easy to replicate even without using Host Profiles - if you have a test ESXi 4.1 host, use the CLI to remove the Management and vMotion vmkernel ports, then create the vMotion vmkernel port first followed by Management, so that when you run esxcfg-vmknic -l you see vMotion on vmk0 and Management on vmk1. Once this is done, flick to your DCUI screen and what do you see?

So, whichever vmk is assigned to Management, the fact that it is "management" should be the dictating/priority thing here not what vmk number is allocated.

Cheers

gogogo5

kresimir_pirkl
Contributor

Hi gogogo5.

Here is my "edited" host profile. I've changed the port groups so that Management Network is created first, followed by the other two port groups.

During the "apply host profile" wizard you will be prompted to manually enter IP addresses (this can also be edited in the host profile template).

Verify all "configuration changes that will be applied on the host" before you click the "finish" button.

I've reapplied the host profile on 8 production hosts without any problem; only the order of the vmk ports was changed.

Please be advised to change NTP server to your custom IP (in host profile template).

This host profile template is made for Test environment only, and I take no responsibility for any misuse.

Some screenshots are attached.

I hope this will help you to properly edit your own host profile.

Regards, Kresimir

SuperSpike
Contributor

Wow. Great job on this, everyone. VMware needs to write a KB article on this issue ASAP if they don't plan on fixing it until the next major release. Given that they have the audacity to charge extra for host profiles, it's completely unacceptable that they don't plan to fix it right away. Just another VMware product rushed to market.

@Virtual_EZ
mtnbiker5
Contributor

This works great! I had 13 hosts that had vmk0 and vmk1 flipped thus setting the management network to my vMotion network. I placed the hosts in Maintenance mode and applied the "manipulated" Host Profile and everything is back to normal. Thanks to everyone!

davidw_davis
Contributor

I'm also having issues with this but don't have Host Profiles licensing.

I've got two ESXi hosts with 8 network adapters each. I assign vmnic0 and vmnic1 to vSwitch0 with the VM Network role, vmnic2 and vmnic3 to vSwitch1 for iSCSI traffic, vmnic4 and vmnic5 to vSwitch2 for vMotion traffic, and vmnic6 and vmnic7 to vSwitch3 for VM Management traffic. The DCUI is showing the vSwitch1 IP info as the management interface on both hosts. I haven't been able to manually change this. When I edit the host via the DCUI and change the management network adapters from vmnic2 and vmnic3 to vmnic6 and vmnic7, in vSphere vmnic6 and vmnic7 get assigned to vSwitch1 while vSwitch3 has no adapters.

So it seems the best workaround is to assign vSwitch1 containing vmnic2 and vmnic3 to the VM Management network. Hopefully this will clear up my HA issues!

SuperSpike
Contributor

vSwitch order and vmnic uplink order really shouldn't have anything to do with it. The key is the order in which the vmkernel ports are created. The "Management Network" vmkernel port for your ESXi host has to be vmk0 (check the Config tab, Networking, Virtual Switch). If your iSCSI IP is vmk0, that's what's going to show up in the DCUI. If it's not in that order, blow all of the vmkernel ports away and configure networking from scratch in the DCUI. This assumes that the ESXi hosts aren't in production of course :)

@Virtual_EZ
davidw_davis
Contributor

vSwitch0 is vmk0, correct?

My iSCSI IP is configured on vSwitch1 but the DCUI shows the iSCSI IP as the management IP and also shows the management network as vmnic2 and vmnic3 which are assigned to vSwitch1.

SuperSpike
Contributor

No, the vSwitch number has nothing to do with vmkernel (vmk) port number. NICs are uplinks for virtual switches. Think of vmkernel ports as ports on a physical switch, except in this case they are virtual switches.

Use the vSphere Client to connect to your ESXi host. Then click the Configuration tab, then the Networking link under "Hardware". You should see all of your vSwitches listed here. For each vSwitch, you should see different port groups or vmkernel ports. Each vmkernel port will have a name. Beneath the name you should see vmk0, vmk1, etc. depending on how many vmkernel ports you have.

I suspect that your iSCSI vmkernel port is assigned to vmk0, which is why it's showing up as the host's IP in the DCUI. Typically the first vmkernel port that you add is set as vmk0.

Unfortunately the only way to correct this is to blow the vmkernel ports away and start from scratch, making sure you create the "Management Network" vmkernel port first.
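
To decide which hosts need rebuilding, the check boils down to "is the Management Network port on vmk0?". Here is a small sketch, assuming you have already noted each host's vmk-to-port-group mapping by hand (for example from the Networking view) into a dictionary. The host names, the data and the function name are invented for illustration.

```python
# Illustrative check only; the inventory is typed in by hand and the
# host names are made up. Nothing here talks to a real ESXi host.

def hosts_needing_fix(inventory, management="Management Network"):
    """Return the names of hosts whose vmk0 is not the management port."""
    return sorted(
        host for host, vmks in inventory.items()
        if vmks.get("vmk0") != management
    )


inventory = {
    "esx01": {"vmk0": "Management Network", "vmk1": "vMotion"},
    "esx02": {"vmk0": "iSCSI", "vmk1": "Management Network"},
    "esx03": {"vmk0": "vMotion", "vmk1": "Management Network"},
}

print(hosts_needing_fix(inventory))
```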

@Virtual_EZ