aj800
Enthusiast

Connection failed between ESXi 6.7 and iLO 5 for firmware updates

Setup:

Gen10 ProLiant DL series
ESXi 6.7U3
SPP version 2022.03
SUM 8.9.5
iSUT/SUT 2.9.1
iLO5 2.65

Problem:

(I've reached out to HPE for this but have not gotten a solid answer yet so maybe VMware can assist with this)

Short summary of the issue: ESXi 6.7 can't connect to the iLO of the host it runs on (and vice versa), which is required to complete ESXi-specific firmware updates on an HPE ProLiant Gen10 server using the installation software.

Long version:

We have several HPE Gen10 servers running ESXi 6.7U3.  We're trying to run firmware updates for those servers, and HPE has drastically changed their firmware update process (using SPP packages) so that it now uses iSUT 2.9.1 (Integrated Smart Update Tools) and SUM (Smart Update Manager) 8.9.5.  After fighting with it and learning the hard way that this new system is very different from before, I have yet to complete all firmware updates available for our ESXi 6.7 hosts.

First, in the past (with 6.5 and 6.7 Gen8 systems), we attached virtual media (an ISO file via URL) as a CD/DVD to the iLO console of the server (while in maintenance mode) and booted from the SPP (Service Pack for ProLiant) ISO into the SUM software it ships with, which loads into memory (a lightweight SUSE Linux install with a GUI that allows "Automatic" or "Interactive" updates).  SUM runs an inventory of the firmware the server is currently using against the versions included in the package, builds a list of updates to choose from, and deploys them once you've selected what you want.  After a few reboots of iLO and the server, the server firmware and iLO are updated.

Now, HPE offers SUM software that does this remotely using its own UI or via OneView.  Here are our problems thus far:

1. SUM loaded when using the ISO virtual media (this means the ISO file we host on a web server is available and the iLO can connect to it and load it with no network or firewall issues).  However, what's new this time is that the software asks for iLO credentials before it starts running an inventory - that works.... but then it fails when it starts and tries to do the second part of the inventory which is on "Localhost" - itself, and says "Cannot connect to localhost".  It (iLO) did connect to a remotely hosted ISO file, but cannot connect to itself?  This is SUM running from memory on the same server - there aren't any firewalls or networking between iLO, the server's memory and the server firmware...right?

2. Frustratingly, HPE sent us instructions for installing and running SUM in Windows as a workaround (others have reported this issue, as well), which we did not have since we're a Red Hat based program.  Luckily, I have a Windows VM on my laptop that I was able to test-run it from.  We have several servers to update and a team to do it with, yet this limits the task to me exclusively, which is not a viable solution but is good enough to test, at least.

3. After installing it and figuring out how to operate it, I was able to update firmware and iLOs on a few hosts.  However, even though SUM showed each host meeting the new baseline and the installs as "done", several items had not completed and were still displayed in the Installation Queue in iLO, even after running inventory again on each host.  It took a while to figure out that these were VMware-specific firmware items (Broadcom, QLogic, etc.) stuck as "Pending" in the installation queue, but I couldn't figure out why they weren't being installed, since SUM includes iSUT (Integrated Smart Update Tools), which is supposed to run from SUM while deploying the firmware.  SUM showed a warning that "iSUT was not running on the OS".

More research led me to download and install SUT (you can download it as an installable VIB zip package file separately from the SPP ISO package) to each ESXi host via ESXCLI, then set the operating mode for it and reboot to start the service.  It's a gigantic pain if this has to be done on each ESXi server.

4.  After doing all that and getting SUT working on the ESXi host, it managed to finally start and install 3 of the ~12 items in the queue that were "Pending", but not all.  When running commands in the ESXCLI to configure SUT (setting username/password for iLO, setting the operating mode to AutoDeploy or OnDemand, and checking the status of SUT to see those configs), each time, it shows the following error:

"Communication to iLO failed. If iLO is configured in any of the higher security modes, then use sut -set ilousername=<username> ilopassword=<password> to set the iLO credentials. If iLO is in CAC mode, then use sut -addcertificate <path_to_certificate_file> to set the certificate details"

The iLO credentials are good, but SUT still can't communicate with iLO from the ESXi CLI.  I suspect this is the same problem SUM hits when it runs from the ISO and tries to connect to localhost (the ISO runs a version of SUT called iSUT, which is "integrated" into the SPP/SUM package loaded into memory to perform updates), and I suspect it is also why the VMware-specific firmware stays stuck as "Pending" in the installation queue in iLO.
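The SUT setup steps described above (install the VIB bundle, set the iLO credentials and operating mode, check status) have to be repeated on every host, so it may help to generate the command sequence once and review it before pasting it into each ESXi shell. The following is only a minimal sketch: the function name, the datastore path and the list of modes are hypothetical, the `ilousername=`/`ilopassword=` syntax is taken from the SUT error message quoted above, and the `mode=` and `-status` forms are assumptions that should be checked against HPE's iSUT documentation. It builds the commands as strings and does not run anything; the reboot mentioned above is left as a manual step.

```python
import shlex
from typing import List

# Operating modes named in this thread; SUT may support others.
KNOWN_MODES = ("AutoDeploy", "OnDemand")


def sut_setup_commands(depot_zip: str, ilo_user: str, mode: str = "AutoDeploy") -> List[str]:
    """Build (but do not run) the per-host command sequence for review."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unexpected SUT mode: {mode!r}")
    return [
        # Install the SUT offline bundle; esxcli expects an absolute path.
        "esxcli software vib install -d " + shlex.quote(depot_zip),
        # Credential syntax as quoted in SUT's own error message. The password
        # stays a placeholder so it is never embedded in a generated script.
        "sut -set ilousername=" + shlex.quote(ilo_user) + " ilopassword=<password>",
        # ASSUMPTION: mode= syntax as recalled from HPE's iSUT guide; verify it.
        "sut -set mode=" + mode,
        # ASSUMPTION: status check; run it again after the reboot.
        "sut -status",
    ]


if __name__ == "__main__":
    # Hypothetical datastore path, for illustration only.
    for line in sut_setup_commands("/vmfs/volumes/datastore1/sut-esxi.zip", "Administrator"):
        print(line)
```

Keeping the password as a literal `<password>` placeholder is deliberate: the generated text can be shared with the rest of the team without leaking the iLO credential.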

 

What is between ESXi and iLO that prevents communication between them?  Please assist, anyone who has done this before or can help.


Accepted Solutions
aj800
Enthusiast

UPDATE:

So I figured out what the issue was - perhaps VMware folks can make a note of this in documentation somewhere:

Short answer: the iLO password (we are using the local 'Administrator' account for this SPP deployment task) cannot have a leading hyphen in it.  ESXi interfaces with iLO at some point during the update process (and vice versa) to deploy VMware-specific updates and requires iLO credentials, but doesn't like passwords starting with a hyphen.  Those credentials are stored in iSUT or SUT, the OS-side program used to update the OS-specific firmware.

Long answer:

HPE includes iSUT with the SPP package (full version or a customized build package).  When you boot the server from the ISO attached to the iLO console (either as a virtual media URL or direct ISO file), a lightweight version of Linux is loaded into the system's memory (I believe it is SUSE Linux), which also loads and runs a lightweight version of Firefox to run its GUI of SUM (Smart Update Manager 8.9.5) in the server console.  After you select 'Automatic' or 'Interactive' updates and select Firmware to begin, it will prompt you for the server's iLO credentials before it begins running the inventory checks of the server's current firmware against its baseline in the SPP package (base/full or the custom one you build on HPE's site).  If that password is incorrect, it will tell you immediately and re-prompt you to enter the credentials.  We were using the correct password, so it would continue on to "Discover" the host and then perform the first inventory checks, but it would fail at the second step of the inventory process.

The error message received was shown in a screenshot previously posted in this thread, but it implies that there is a network or a firewall blocking a connection to the server.  This sent us on a "wild goose chase" trying to figure out why iLO can't communicate with the server it is running on, though it CAN communicate with a remote server serving the ISO file.

I installed SUM on a remote host (Windows 10 VM) and added this "Node" (iLO)... and it "Discovered" the server iLO interface successfully and even managed to update the HPE server firmware remotely after running the inventory.  HOWEVER, after running the updater several times, it sometimes came back as meeting the baseline, but oddly, running it again later showed upwards of 11 or 12 updates still "Pending" in the iLO's Installation Queue, and there were warnings in SUM for that host showing that iSUT (Integrated Smart Update Tools) was not running on the OS.  My understanding of iSUT is that it deploys from the SPP attached to SUM, then installs to the server and runs as a service to allow the firmware updates... but for some reason, it was not running in the OS (ESXi), though SUM completed several other firmware updates.

I tried to install SUT directly on the ESXi server via ESXCLI, and had success in that the service was registered and showed as running.  But changing the config to "AutoDeploy" and checking the SUT status kept showing the following error, even though it then went on to show the results and that the service was running:

"Communication to iLO failed. If iLO is configured in any of the higher security modes, then use sut -set ilousername=<username> ilopassword=<password> to set the iLO credentials. If iLO is in CAC mode, then use sut -addcertificate <path_to_certificate_file> to set the certificate details
The configuration changes for the command will be saved once the details are provided"

So iSUT wasn't running per SUM's warning, and now SUT running directly on the host wasn't "communicating to iLO", even though I had set the iLO Administrator password correctly in both environments (you set the Node/iLO credentials in the remote SUM UI, and set iLO creds in the ESXi SUT installation via CLI as shown above).  I had also noticed in the iLO UI that all the "Pending" firmware left in the Installation Queue was VMware-specific firmware/drivers.  So I figured that neither SUT (running on ESXi) nor iSUT (running from SUM) could communicate with iLO for the VMware-specific dependencies only, which meant something was funky with the communication between ESXi and iLO.  The only thing I could think of to address was the password, since there's no networking between them:

Our last few iLO account password rotations all began with a hyphen, and I had even previously checked VMware's and HPE's password requirements, which ours met; it also worked when logging in and completing some of the updates.  The SPP process was allowed to continue, and it would say the password was incorrect when it was entered incorrectly... so I had no idea, initially, that a leading hyphen in the password would prevent the communication between ESXi and iLO (apparently, ESXi interfaces with iLO and vice versa, but the credentials are stored in ESXi's SUT/iSUT configs and applied only when updating VMware-specific firmware).

VMware even recommends using special characters - specifically noting that a hyphen was allowed, but there's no mention of having a hyphen as the first character of the password, and this is iLO's Administrator password anyway, not ESXi's root password.

Once I changed it and ran everything again, the Pending items were finally detected by SUM and installed or cleared.  I was even able to run the whole process directly from the ISO loaded to the iLO console, as we normally would have, but SUM worked without the iSUT warnings, as well.  After rebooting, SUM showed the server to meet the SPP baseline.
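Since the leading hyphen passed every login and was only rejected deep in the SUT/iSUT path, it may be worth screening candidate iLO passwords before the next rotation. Below is a minimal sketch of such a pre-check; the function names are hypothetical, and it only tests the one condition confirmed in this thread (a leading ASCII hyphen), not HPE's or VMware's full password policies.

```python
from typing import List


def starts_with_hyphen(password: str) -> bool:
    """True if the first character is an ASCII hyphen."""
    return password.startswith("-")


def check_ilo_password(password: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not password:
        problems.append("password is empty")
    elif starts_with_hyphen(password):
        # The only failure mode reported in this thread: iLO itself accepts
        # the password, but SUT/iSUT on ESXi fails to communicate with iLO.
        problems.append("password starts with a hyphen")
    return problems


if __name__ == "__main__":
    # Example candidates, made up for illustration.
    for candidate in ("-Winter2022!", "Winter-2022!"):
        print(repr(candidate), check_ilo_password(candidate) or "ok")
```

A hyphen elsewhere in the password is not flagged, which matches what was observed here: only the leading position caused trouble.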


5 Replies
compdigit44
Enthusiast

Nice write up of your issue, and it is odd. Have you seen this article yet? https://support.hpe.com/hpesc/public/docDisplay?docId=a00065962en_us&docLocale=en_US

scott28tt
VMware Employee

If you want help from VMware, you should open a support request; this is primarily a user forum.

 


-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
aj800
Enthusiast

Thanks for that link.  I was hoping for a miracle, but that didn't work either.  I even set iLO to allow third-party firmware and got the same thing.

aj800
Enthusiast

Unfortunately, our VMware support is through our HPE contract, so we have to go through them and escalate, which I may try to do anyway since this seems to involve VMware software configuration as well, or at least joint efforts with the SPP, iLO and ESXi software all working together.  I would assume it's more of an HPE SPP thing, since this worked fine previously, at least with Gen8 and ESXi 6.5.  HPE hasn't been much help so far.  It's been a while since I've done SPP updates on ESXi systems.
