HendersonD
Hot Shot

Host reboot and IP addresses are lost

New HPE DL380 Gen10 servers running the HPE customized ESXi 6.7 software

Management is set up on a standard switch; a DVswitch is used for VM, iSCSI, and vMotion traffic

VMkernel ports are all set up for iSCSI and vMotion, with iSCSI binding showing all storage volumes

Reboot a host and the IP address information for the iSCSI and vMotion VMkernel ports is gone! The IP address for the management network survives just fine

Not sure if the problem lies with the setup of these new servers (BIOS or NIC settings) or with the HPE customized version of ESXi

Any ideas?

19 Replies
Vijay2027
Expert

Is the host booting from SAN?

Can you verify that bootbank and altbootbank on the ESXi host are fine?

HendersonD
Hot Shot

The hosts are not booting from SAN; they are booting from the internal SD card that came with the servers. I am not sure how to verify the bootbank on an ESXi host

SureshKumarMuth
Commander

Can you check whether the host is still properly connected to the DVswitch? Also check the events related to the DVswitch when the host reconnects to it after the reboot. This issue is specific to the DVswitch port groups, while the standard switch works fine.

How many hosts are facing this issue, or is this the only host with issues?

Regards, Suresh https://vconnectit.wordpress.com/

Vijay2027
Expert

SSH into the ESXi host, run ls -ltrh, and share the output please.

HendersonD
Hot Shot

I am working with a consultant on this. Last week we opened tickets with HPE support and VMware support. He spent about two hours on Friday with a VMware support engineer. Come to find out, this is a known issue with ESXi 6.7 on DL series servers. VMware is working on a patch to fix it; until then I cannot put these servers into production.

HendersonD
Hot Shot

Here is the output you are looking for. I did notice the second line below (bootbank -> /tmp), not sure if that is an issue

[Two screenshots of the ls -ltrh output; images not preserved]

Vijay2027
Expert

So any config changes will not persist across a reboot when bootbank points to /tmp.
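
For anyone who wants to check for this condition without reading the listing by eye, here is a minimal Python sketch. It is only an illustration: the helper names are made up, and it assumes the symptom is exactly what was reported above, namely that /bootbank (or /altbootbank) is a symlink pointing at /tmp rather than at a real volume.

```python
import os


def link_target(path):
    """Return the raw symlink target of path, or None if it is not a symlink."""
    if os.path.islink(path):
        return os.readlink(path)
    return None


def points_to_tmp(path, tmp_dir="/tmp"):
    """True if path is a symlink whose target is tmp_dir or lies under it."""
    target = link_target(path)
    if target is None:
        return False
    # Resolve a relative target against the directory that holds the link.
    if not os.path.isabs(target):
        target = os.path.join(os.path.dirname(os.path.abspath(path)), target)
    target = os.path.normpath(target)
    tmp_dir = os.path.normpath(tmp_dir)
    return target == tmp_dir or target.startswith(tmp_dir + os.sep)


if __name__ == "__main__":
    # Paths as named in this thread; adjust if your layout differs.
    for name in ("/bootbank", "/altbootbank"):
        state = "points to /tmp (changes will be lost)" if points_to_tmp(name) else "ok or not a symlink"
        print("{0}: {1} -> {2}".format(name, state, link_target(name)))
```

If either path reports /tmp, that matches the situation described here and config changes made in that state should be expected to vanish at the next reboot.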

We can work around this issue by adding config parameters to local.sh (in /etc/rc.local.d)

How is the host booting (SD card or local HDD) and any status update from VMware Support?

HendersonD
Hot Shot

How does a fresh install of ESXi 6.7 on a brand new HPE server end up with bootbank -> /tmp?

This server is booting from an 8GB dual microSD USB Drive

HendersonD
Hot Shot

Since these servers are not in production yet, I could do a reinstall of ESXi 6.7. How do I do this install so that bootbank does not point to /tmp, which appears to be the root cause of my problems?

msripada
Virtuoso

I don't think reinstalling ESXi is the answer. If ESXi is installed on an SD card, then the controller or the card itself might be bad, which can lead to this issue. I would suggest upgrading your BIOS firmware to the latest available and reseating the SD card. If you are still facing issues, I would check whether the card can be reinstalled, or get the hardware vendor to check the health of the controller or card.

Thanks,

MS

Vijay2027
Expert

I would say try powering off the ESXi host, reseating the USB drive, and checking bootbank again.

HendersonD
Hot Shot

We purchased 7 of these HPE DL380 Gen10 servers. They are identical except that three of them have NVIDIA Tesla M10 cards in them. We have rebooted two of these hosts and both lost their IP address information. I am guessing (have not tested) that if I reboot any of the others, they will show the same behavior.

HendersonD
Hot Shot

The HPE servers boot ESXi from an 8GB Dual MicroSD Flash USB Drive

https://www.hpe.com/uk/en/product-catalog/servers/flash-media-devices/pip.hp-dual-8gb-microsd-enterp...

We use Nimble storage and Nimble provides a plugin that gets loaded directly on each host. This Nimble Connection Manager (NCM) plugin manages pathing from a host to the Nimble array. When NCM is installed on a host with one of these USB sticks, there is a problem. Nimble storage has isolated the problem and this is what I got from them:

"This isn't a issue with the model or ESX.  Due to the setting which is required for FC implementations and the way the USB device is enumerated.  It causes the device to be removed since it appears to  be presented via multiple different lun_IDs when it should be the same."

They are working with VMware on a fix, but it sounds like it could take some time to come up with a solution. In the meantime we are not using the NCM plugin and all is working fine

marod
Contributor

HendersonD, thank you very much for posting this, apparently this is still an ongoing issue (as your post was from August of 2018). I e-mailed Nimble Support and provided this information to them and they came back with the following info:

I traced that case and looked into the escalation of that case, we are still working on this issue currently.

Please do not install NCM at the moment.

We do not have a public document mentioning this problem.

I too had two installations for my customers; both had HPE DL360 Gen10 servers and these 8GB Dual MicroSD Flash USB Drives. Both installations also involved new Nimble Storage arrays. It didn't make sense why the bootbank and altbootbank properties were being lost upon reboots, but thanks to your post, I now know why this is happening. Upon rebooting, whether manually or while applying updates via Update Manager, the USB-Embedded device (shown below) would disappear, both from the iLO device inventory and subsequently from VMware (i.e. configure - storage adapters).

[Two screenshots of the USB-Embedded device; images not preserved]

The first of the two installations was done a few months ago and I didn't see this post then so I opted to convert the servers to use HPE Smart Array controllers and SAS disks and I reinstalled ESXi and all is well.  For my second installation, which I'm working on now and after having found your post, I've had to reload each ESXi server and not install the NCM.  Removing the NCM wasn't the fix either, at least for me, as upon removal and reboot, the USB device was unavailable.

After reloading ESXi on these servers and not installing the NCM, they seem to be holding steady.

treb94
Contributor

Marod, I share your grief. Nimble has confirmed that the issue has been identified and that they will release NCM 6.1 soon (in 1 or 2 months).

For the time being, I just removed the USB drive. Luckily there is an SD card slot near that USB area on the DL360 Gen10, so I put the SD card in that slot instead.

It has been working with NCM since then.

serveradminist2
Contributor

Please apply the latest firmware, excluding the NIC firmware.

treb94
Contributor

Hi serveradministrator,

Which firmware level are you advising? My DL360 is on the latest.

This is just a bug in NCM. Can you share more about your findings?

Thank you

HendersonD
Hot Shot

NCM 6.0 was released in the last two days and I was hoping it would solve this issue. The release notes still list it under Known Issues, and Nimble support verified that NCM 6.0 does not fix this long-standing issue: HPE servers that boot from a USB drive will not work with Nimble Connection Manager (NCM). The fix is scheduled to be part of NCM 6.1, which is many months off. I may change my boot method from USB-based to SD or microSD-based.
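
For anyone scripting a rollout, the rule of thumb so far could be sketched like this in Python. This is not an official compatibility matrix: the function names and the "usb"/"sd" labels are made up for illustration, and the 6.1 cutoff is only what Nimble support has said is scheduled, not a confirmed fix.

```python
# Assumption from this thread: the fix is *scheduled* for NCM 6.1.
FIRST_FIXED_NCM = (6, 1)


def parse_version(text):
    """Turn '6.0.0' or '6.0.0-650005' into a tuple of ints, e.g. (6, 0, 0)."""
    dotted = text.split("-")[0]  # drop any build suffix after a dash
    return tuple(int(part) for part in dotted.split("."))


def ncm_install_advisable(boot_media, ncm_version):
    """Rule of thumb only.

    boot_media: "usb" for the dual microSD USB boot drive, anything else
    (for example "sd") for other boot devices. Both labels are hypothetical.
    """
    if boot_media.lower() != "usb":
        # SD card boot has been reported in this thread to work with NCM.
        return True
    return parse_version(ncm_version) >= FIRST_FIXED_NCM


if __name__ == "__main__":
    for media, version in (("usb", "6.0.0"), ("sd", "6.0.0"), ("usb", "6.1")):
        print(media, version, ncm_install_advisable(media, version))
```

The point of the sketch is just the decision rule: USB boot plus any NCM older than 6.1 means hold off on installing NCM.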

treb94
Contributor

Agree with you...

microSD card works fine with my setup.

https://infosight.hpe.com/InfoSight/media/software/active/2/236/nimble-ncm-for-esx6.5-6.0.0-650005.z...

That's the link for NCM 6.0 if anyone wants it
