aj800
Enthusiast

ESXi clean install then join cluster?

I had a few issues upgrading a 5.1.0 ESXi host in a cluster, and I haven't gotten much support here or from HPE, since they see no hardware issues in the logs I sent them to analyze. So I think I want to do a clean install of 5.5.0 on that host, which is what I was trying to upgrade to, so that we can continue to progressively upgrade our entire vSphere environment toward our goal of 6.5 (the highest our hardware allows). The host had two issues, which I detailed here:

1. Booting up, it gets stuck indefinitely on the message: 'Loading ipmi_si_drv'

2. Following the exact same procedure used to upgrade the other 3 hosts in the cluster, this 4th host fails with the error: "Cannot create a ramdisk of size 384MB to store the upgrade image".

So, I have a few questions about this process: Can I use the same custom HPE 5.5.0 ISO I downloaded (which I imported into VMware Update Manager) to perform the fresh install? How do I rejoin the cluster it was in? Do I need to configure the network and HA/DRS settings manually, or is there a way to automatically configure the host to match the cluster settings when it joins? Is there a step-by-step guide for doing this?

Current System Specs:

vCenter Server: Windows Server

version 5.5.0 8803859

4 x ESXi Hosts in a cluster:

HPE ProLiant DL380p, 12 CPUs x 2.892GHz E5-2667

394GB RAM

Local storage: 274GB HP serial attached SCSI Disks

Shared storage: 7 x 700GB disks in a cluster (fiber)

~50+ Running VMs, balanced

3 of the 4 hosts: ESXi 5.5.0 3568722

Remaining Host: ESXi 5.1.0 3872664

11 Replies
a_p_
Leadership

Are all hosts configured the same way, e.g. BIOS/UEFI, VT-x, NX/XD, Power Profile, ...?

Did you already upgrade the hosts to the latest firmware using the Gen8 SPP (do this before trying to upgrade ESXi)?

Did you run a hardware check to ensure it's healthy?

What you may try is to perform the upgrade from the command line.

Assuming that you don't have any individual drivers installed which are not included in the HPE image, you may upgrade the host as follows:

  1. download the HPE ESXi Offline Bundle (.zip file), and upload it to a folder on one of the host's datastores
  2. enable SSH on the host, and connect to it using e.g. PuTTY
  3. run esxcli software sources profile list -d /vmfs/volumes/datastore/folder/offline-bundle.zip to determine the profile name
  4. run esxcli software profile install -d /vmfs/volumes/datastore/folder/offline-bundle.zip -p profile-name --ok-to-remove --dry-run
  5. if the command succeeds, run it again without the --dry-run option to perform the upgrade
  6. after verifying that the command reports that the upgrade completed successfully, run reboot to restart the host
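Putting those steps together, the whole sequence from an SSH session on the host looks roughly like this (the datastore path, folder, and profile name below are placeholders for your own values):

```shell
# 1. List the image profiles contained in the staged HPE Offline Bundle
esxcli software sources profile list \
    -d /vmfs/volumes/datastore1/upgrade/offline-bundle.zip

# 2. Dry-run the upgrade using the profile name reported above
esxcli software profile install \
    -d /vmfs/volumes/datastore1/upgrade/offline-bundle.zip \
    -p HPE-ESXi-5.5.0-Update3-550.9.6.5.9 \
    --ok-to-remove --dry-run

# 3. If the dry run reports no errors, repeat the install command
#    without --dry-run, then reboot the host
reboot
```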

André

As a side note: Please note that DL380p Gen8 hosts are only supported up to ESXi 6.5 Update1 (i.e. no Update 2), and that HPE provides a separate PreGen9 ESXi image for this generation.

0 Kudos
sk84
Expert

A fresh installation is always an option, but sometimes it is the longest route to your destination.

If you are planning a reinstallation, you need to put the host in maintenance mode and migrate all VMs from that host to the other hosts in the cluster. After that, I would recommend unmounting the Fibre Channel LUNs and temporarily forbidding SAN access for the host's WWPNs, because during the reinstallation the installer will see those LUNs, and you don't want to risk accidentally formatting one. This is best practice.

Once the host is in maintenance mode, remove it from the cluster.

You can then mount the installation ISO via iLO and install ESXi from the console, not via Update Manager. Once the installation is successful, you can restart the host and configure the basic network settings (Management Network).

Once the host has its management IP, I recommend connecting to the host via the host client (Web GUI) and putting it back into maintenance mode. You can then reconnect the host to the cluster in the vSphere (Web) client.

Regarding the HA/DRS settings: These settings are applied at the cluster level. So they are applied to each host during the cluster join.

And for the network settings and all other settings (DNS, NTP, authentication, advanced settings, and datastores), you can use host profiles if your license allows it. Or look at these two links, which describe how to back up and restore ESXi configurations. But I've never tried it.

VMware Knowledge Base

https://vbrownbag.com/2013/03/quickly-backup-esxi-host-configuration-with-powercli/
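Along the same lines, ESXi itself can dump and restore its configuration from the host shell; a rough sketch (these vim-cmd calls exist on ESXi 5.x, but test them before relying on them, and the /tmp path is just an example):

```shell
# On the old host: flush pending changes, then create a backup bundle;
# the second command prints a URL to a configBundle .tgz to download
vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config

# After reinstalling (same build!): copy the bundle to the new host as
# /tmp/configBundle.tgz, enter maintenance mode, then restore it
# (the host reboots itself as part of the restore)
vim-cmd hostsvc/maintenance_mode_enter
vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz
```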

And don't forget to allow the host access to your SAN again after the installation. 😉

--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.
0 Kudos
aj800
Enthusiast

I uploaded the Offline bundle zip file to the local datastore, then connected via SSH and ran the command you recommended below:

esxcli software sources profile list -d /vmfs/volumes/datastore/folder/offline-bundle.zip (with the folder name I uploaded it to)

But it did not return any values (it ran for about 5-10 seconds and then gave me a new prompt).

Should this have shown a profile name to use in the next step?

0 Kudos
aj800
Enthusiast

OK, so for some reason the zip file I first uploaded was only about 10MB. I went back to where I got the ISO from and downloaded the Offline Bundle zip from there, which was ~370MB, and this time the profile check returned a profile name. So I ran the dry-run install from the CLI, and it returned a "dependency error", but I'm not sure what to make of it:

~ # esxcli software profile install -d /vmfs/volumes/51c41346-e5a2b2ca-61d5-6c3be5b5d7c0/OfflineBundleInstallZip/VMware-ESXi-5.5.0-Update3-3568722-HPE-550.9.6.5.9-Dec2016-depot.zip -p HPE-ESXi-5.5.0-Update3-550.9.6.5.9 --ok-to-remove --dry-run

[DependencyError]

VIB Mellanox_bootbank_nmst_4.3.0.29-1OEM.550.0.0.1391871 violates extensibility rule checks: [u'(line 39: col 0) Error validating value boolean', u'(line 39: col 0) Element vib failed to validate content']

Please refer to the log file for more details.

~ #

Any assistance would help.  Thanks

0 Kudos
a_p_
Leadership

In this case you may want to append the --force option to the command line.
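For example, reusing the path and profile name from your earlier post:

```shell
# Same dry run as before, with --force appended to override the
# extensibility check failing on the Mellanox nmst VIB; if it passes,
# repeat without --dry-run to perform the actual upgrade
esxcli software profile install \
    -d /vmfs/volumes/51c41346-e5a2b2ca-61d5-6c3be5b5d7c0/OfflineBundleInstallZip/VMware-ESXi-5.5.0-Update3-3568722-HPE-550.9.6.5.9-Dec2016-depot.zip \
    -p HPE-ESXi-5.5.0-Update3-550.9.6.5.9 \
    --ok-to-remove --force --dry-run
```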

André

aj800
Enthusiast

Thanks.  I ran the upgrade from the CLI and it appears to have worked. However, the host disconnected in the client while it was rebooting (it was in maintenance mode), and once it came back up, it did not reconnect automatically. Is it now just a matter of reconnecting the host manually to the vSphere environment (in the client: right-click host --> Connect)? Will anything be lost, or was the connection not re-established because the upgrade wasn't initiated from the client? When I tried to connect, it warned that changes made outside of the client would be lost once reconnected (I assume because the upgrade was done via the CLI and not from the client). Since the host was just upgraded, I'm not sure whether that's affected, especially since SSH was disabled again when it came back up. Just trying to be cautious.

Also, I noted that it seemed to complete rather quickly and only required the one reboot. It usually takes about 15-20 minutes, whereas this took only a few minutes beyond the server boot process itself. Is there some way to verify that it completed entirely?  Thanks.

0 Kudos
a_p_
Leadership

it did not reconnect automatically.

That's not unusual. Trying to reconnect should work without the need to enter root credentials. The install command removed some binaries, which will be pushed from vCenter to the host again once you exit Maintenance Mode for the first time.

especially since SSH was disabled again when it came back up

That's expected, because we only enabled SSH, but didn't change the startup policy.

It usually takes about 15-20 minutes where this took only a few minutes

If ESXi is installed on HDD/SSD the install command usually takes less than 30 seconds. Maybe a minute, or two if installed on USB/SD.

Is there some way to verify that it was completed entirely?

If the install command succeeded with something like "The installation succeeded but the system needs a reboot", and ESXi comes up with the new build number, then you're fine.
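For example, the running build can be confirmed from the host's shell:

```shell
# Show the running ESXi version and build
# (should report 5.5.0 build 3568722 after the upgrade)
vmware -vl

# More detailed version information
esxcli system version get

# Optionally confirm the active image profile matches the HPE bundle
esxcli software profile get
```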

André

0 Kudos
aj800
Enthusiast

Thank you!  I was able to reconnect and patch as well, and so far it seems to be good.

However, the last problem I'm back to trying to resolve is the IPMI issue. I thought the upgrade and patching would have corrected it, since it might have been a driver issue, so I re-enabled the IPMI option under Advanced Settings (VMkernel -> Boot) to see if the host would boot completely with the option enabled, then initiated a reboot from the client (while still in maintenance mode). When it came back up, it got stuck again at 'Loading ipmi_si_drv...'.

So I reset the machine and entered the ESXi boot option 'noipmiEnabled' with Shift+O to let it complete the boot, went back into the client and disabled the option in Advanced Settings again, then rebooted once more to make sure the change stuck. It did not. I had to enter the 'noipmiEnabled' boot option again, since the host kept trying to load the IPMI driver and would not get past it.

How do I get the option to stick? I keep unchecking it in the client, but after a reboot it still tries to load the IPMI driver. Is there a way to view the configuration via the CLI to make sure the option change took effect before a reboot? The notification panel even shows that a configuration change was made after I uncheck the box and click OK. I want to make sure that if the server goes down, it comes back up immediately without intervention. Thanks, you've been very helpful thus far.
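For reference, these are the shell commands I'm looking at for checking and persisting the option (assuming the VMkernel boot setting is named ipmiEnabled; please correct me if that's wrong):

```shell
# List the VMkernel boot options and their configured/runtime values,
# filtered to the IPMI-related setting
esxcli system settings kernel list | grep -i ipmi

# Persistently disable the IPMI driver at boot
# (takes effect on the next reboot)
esxcli system settings kernel set -s ipmiEnabled -v FALSE
```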

0 Kudos
a_p_
Leadership

In a previous reply I asked:

Did you already upgrade the hosts to the latest firmware using the Gen8 SPP (do this before trying to upgrade ESXI)?

and you haven't answered this question yet.

HPE issued an advisory regarding a similar issue (https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c05165100). Unless you are running the latest iLO firmware, please consider an upgrade to see whether it helps.

André

0 Kudos
aj800
Enthusiast

That was it!  I flashed the iLO to 2.61 (from 2.60) via the ESXi CLI. I had skipped this earlier because the other hosts are still on 2.60 with no issues, so I'm still a bit perplexed as to why this one specifically needed the upgrade, but at least it works, and ESXi is now upgraded, patched, and the server reboots cleanly.  THANK YOU.

0 Kudos
a_p_
Leadership

Glad to hear that it's working now.

There have been a couple of issues fixed lately in iLO 4, which is why HPE recommends upgrading iLO to v2.61 as soon as possible.

These issues may not affect all servers, which may explain why you had an issue on just one of your servers.


André

0 Kudos