wolfwolf
Contributor

ESXi lost configuration on reboot

Hi,

Recently we've installed ESXi 4.1 Update 1 embedded version on the internal SD card (SanDisk 2GB) for a few Dell PowerEdge M610 blade servers using the ESXi recovery CD from Dell, also available for download from the VMware site:

http://downloads.vmware.com/d/details/dell_esxi_recoverycd/ZHcqYip3ZWJkKmV3

Yesterday we rebooted the servers to make sure everything would come back up.

To our surprise, when ESXi came back up on each server, the whole configuration was gone: IP address, server name, password, everything. The configuration looked as if we had just installed ESXi.

Any ideas why that happened?

We did a proper reboot for ESXi: one of the servers was rebooted using vCenter, the others were rebooted using the ESXi DCUI using the F11 key.

Before the reboot, one of the ESXi servers had been up with its configuration for a few days and was managed through vCenter. The others were installed an hour or so before the reboot and weren't connected to vCenter yet. No matter how long they had been up, all of them lost the configuration after the reboot.

We've been unable to reproduce the issue so far. We reconfigured ESXi on a couple of those servers today and rebooted them from the ESXi DCUI: each of them came back online just fine this time, and the configuration is still there.

We didn't reconfigure all of them, so let me know if there is anything we can check on the ones still untouched since the reboot to tell why they lost the configuration.

Thanks.

45 Replies
krishnaprasad
Hot Shot

I think it's okay to still use the Dell ISO image with an additional reboot after the first boot. The configuration loss will happen only on the first boot.

To find out what is extra in the Dell Customized images, I think you can refer to the Dell support website. They have documented what is additionally available in the Dell Customized image. This is how I found the information; it might be helpful to you.

http://support.dell.com/support/index.aspx?c=us&l=en&s=biz --> select the "Drivers & Downloads" section --> choose your system model on the "Select Model" tab --> select the "Servers, Storage, networking" tab in the "Select your product family" section --> click "PowerEdge Server" --> select your product model (e.g. R910) --> Confirm

See the attached screenshots, which show this information.

To get the issue in the Dell ISO fixed, you can call Dell tech support. You may have received the tech support numbers / mail ID when you purchased the servers from them.

Dave_Mishchenko
Immortal

I'd go with the Dell ISO as well. The issue you've had with the ISO will be a one-time thing.

krishnaprasad
Hot Shot

Forgot to mention that the "Additional Information" section carries the information on what's extra in the Dell Customized images. See the attached screenshot as well for 4.1 Update 1.

wolfwolf
Contributor

It looks like the only driver in the Dell ISO that is missing from the VMware ISO is the Intel 10G NIC ixgbe driver, and you can download version 3.x of it from the VMware site, while the Dell ISO contains the older version 2.x. I'm not sure what the practical advantages of the newer version from the VMware site are, if any.

Is it still better to go with the Dell ISO?

In case we decide to go with the VMware ISO and add the ixgbe driver to it using the vihostupdate utility, will the driver remain in place when we apply future ESXi updates to the hosts?

Thanks.

krishnaprasad
Hot Shot

I don't think it's just one driver update in the Dell image. If you look at the screenshots I attached, the 'Additional Information' section lists several driver updates/inclusions. For example, the bnx2 driver version in the Dell Customized image is later than the one in the VMware ISO, and the 'qlge' driver included in the Dell image is not part of the VMware image by default. We'd really need to see what happens on an upgrade.

wolfwolf
Contributor

Yes, the Dell ISO contains other customizations, but since we're only going to use the 10G Intel NIC with ESXi, the other drivers shouldn't matter.

And for the 10G Intel NIC the driver available as a separate download on the VMware site (version 3.x) is actually newer than the one available from the Dell ISO (version 2.x).

But I'm still not sure installing that driver manually on a stock VMware installation is a good choice, e.g. whether future ESXi updates will retain it.

And I'm not sure how to test that either, since there are no updates available for ESXi 4.1 Update 1 yet.

Anyone tested this in the past?

Thanks.

DSTAVERT
Immortal

I would go with the Dell ISO. A later driver version may not be the most appropriate version, or the tested and supported version, from Dell's perspective. Before considering the driver downloaded from VMware, I would check with Dell support. I would also suspect that the Dell version of the ISO includes Dell-specific CIM providers.

-- David -- VMware Communities Moderator

krishnaprasad
Hot Shot

The OM providers are not included in the Dell Customized image. They can be downloaded separately and installed on ESX/ESXi. You can see this in the 'Additional Information' section screenshot that I posted earlier (from the Dell support website).

I think you can even use the drivers posted on the VMware website, since they have presumably been tested by VMware or the driver vendor before being posted there. I second Dave and DSTAVERT on continuing to use the Dell ISO, since the issue will be present only on the first boot.

krishnaprasad
Hot Shot

This issue can even be reproduced with the VMware ISO.

1. Let's say we have the VMware ESXi 4.1 Update 1 ISO installed on a system.

2. Boot from the USB key/LUN where the image is installed.

3. Make some changes to the configuration (e.g. change the password of the ESXi host).

4. Install an offline bundle/VIB (e.g. any driver package available at VMware.com) using the vihostupdate/esxupdate commands.

5. Make further configuration changes on the system (e.g. create a vSwitch, register a virtual machine with the host, etc.).

6. Reboot the server

7. Boot into ESXi.

You can see that changes such as the vSwitch creation and the VM registration are lost, whereas the password change persists.

i.e. whatever changes are made AFTER running the esxupdate command are lost.

krishnaprasad
Hot Shot

The failure happens because when a package is installed or uninstalled on ESXi, the changes are made to the alternate bootbank (where updated=2 is set in boot.cfg, so that it becomes /bootbank on the next boot). Configuration changes made between the package install/uninstall and the reboot are written to the current /bootbank only, so they are not available after the next boot.
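
To make the boot.cfg point concrete, here is a minimal Python sketch (not a VMware tool). It parses key=value lines like the updated=2 entry mentioned above and picks the bank with the higher counter. The sample file contents, the helper names parse_boot_cfg and pick_next_bootbank, and the "highest updated value wins" rule are assumptions for illustration only; check the real boot.cfg files on your own host.

```python
def parse_boot_cfg(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def pick_next_bootbank(cfg_by_bank):
    """Return the bank whose 'updated' counter is highest (assumed rule)."""
    return max(cfg_by_bank, key=lambda bank: int(cfg_by_bank[bank].get("updated", 0)))


# Illustrative contents only; not copied from a real host.
SAMPLE_BOOTBANK_CFG = "updated=1\n"
SAMPLE_ALTBOOTBANK_CFG = "updated=2\n"

banks = {
    "/bootbank": parse_boot_cfg(SAMPLE_BOOTBANK_CFG),
    "/altbootbank": parse_boot_cfg(SAMPLE_ALTBOOTBANK_CFG),
}
next_bank = pick_next_bootbank(banks)
print("Bank expected to boot next:", next_bank)
```

With these sample values the alternate bank is the one picked, which matches the behaviour described in this post.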

krishnaprasad
Hot Shot

It looks like Dell has released an updated ISO image which carries the fix for the issue discussed in this thread. See the attached screenshot.

devilz666
Contributor

This is seriously %$$^$^^'ed up

Who do we send the bill to at Dell for the time lost in this stuffup?

Take a few VMware admins' hourly charges and the time lost, and I'm sure this will cost them a wee bit

I had this server fully configured, joined to LUNs/SANs, multipathed, networking etc. etc.

Then a nice reboot a few hours later, and all gone! Later I found this stretch of comments on the problem...

If they are going to release preconfigured mission-critical software, can they at least do some testing of it first?

Doesn't VMware have any say in the testing of their software that's been modified and released?

Imagine the hassle if a power off/on wasn't tried until a later date! (Of course you always do a power on/off test, but I'm sure most admins wouldn't imagine the entire config gone on a single reboot!)

krishnaprasad
Hot Shot

You can see the problem in VMware native image as well. Here are the steps.

1. Install VMware ESXi 4.1 Update1 image

2. Boot into ESXi for the first time.

3. Change a configuration setting (let's say, set the password).

4. Install a package in ESXi (for example, an async driver package available on the VMware website).

5. After successful installation of the package, change some other setting (e.g. set a static IP of 192.168.1.5).

6. Reboot the server

After the reboot, you can see that the IP is back to the old one, whereas the password change is retained.

This is actually by design in VMware ESXi: whenever a package gets installed, it updates /altbootbank. So whatever configuration changes you make after installing a package and before rebooting WILL BE WRITTEN ONLY to /bootbank and NOT to /altbootbank. The next boot takes /altbootbank as the current bootbank, and hence you see that the configuration is lost.

But actually the configuration is not LOST. It's still there in the old /bootbank.

This scenario shows up with the Dell Customized ISO images because they install a package during first boot to support their storage array, and hence you see the failure only with the Dell image.
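
The sequence described above can be modelled with a toy Python sketch. This is a simplified model of the behaviour reported in this thread, not VMware code: the ToyHost class, its methods, and the bank names bank1/bank2 are hypothetical, and it assumes that a package install copies the current state into the alternate bank and flags that bank for the next boot.

```python
class ToyHost:
    """Toy model of a host with two boot banks (hypothetical, for illustration)."""

    def __init__(self):
        self.banks = {"bank1": {}, "bank2": {}}
        self.packages = {"bank1": [], "bank2": []}
        self.active = "bank1"
        self.switch_on_reboot = False

    def _inactive(self):
        return "bank2" if self.active == "bank1" else "bank1"

    def set_config(self, key, value):
        # Configuration changes are saved to the currently booted bank only.
        self.banks[self.active][key] = value

    def install_package(self, name):
        # Assumption: the install snapshots the current state into the
        # alternate bank, adds the package there, and flags it for next boot.
        alt = self._inactive()
        self.banks[alt] = dict(self.banks[self.active])
        self.packages[alt] = self.packages[self.active] + [name]
        self.switch_on_reboot = True

    def reboot(self):
        # The flagged alternate bank becomes the active one.
        if self.switch_on_reboot:
            self.active = self._inactive()
            self.switch_on_reboot = False

    def active_config(self):
        return self.banks[self.active]


host = ToyHost()
host.set_config("password", "secret")   # before the package install
host.install_package("example-driver")  # hypothetical package name
host.set_config("ip", "192.168.1.5")    # after the install, before reboot
host.reboot()

print("active bank config:  ", host.active_config())
print("previous bank config:", host.banks["bank1"])
```

In this model the active bank after the reboot has the password but not the static IP, while the previous bank still holds both, which is the "not lost, just in the other bank" situation.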

Hope it's clear.

Thanks,

Krishnaprasad

jjgunn
Enthusiast

Great. So until this is fixed, I've been going to console, configuring networking, password, etc. rebooting, (lose everything) then set it up again, reboot again and it's ok.

Thanks, I think I've got the installation down pat now.

jbarbee
Contributor

I just got this error on my install on a Dell 2950 server, using the Dell ISO of ESXi v4.1. I assumed it had been rebooted before now (it is a new installation), as I have been working on this system for over a week. But perhaps it hadn't. When I did reboot it, I too came up with a blank configuration.

So I used "Shift-R" at the Hypervisor boot screen. It apparently overwrote the latest "blank" config with the only other configuration I had, located in "Hypervisor2". This restored everything back to pre-reboot status.

I did a quick listing of every "state.tgz" in /vmfs/volumes/Hypervisor?/ and only those in 1 and 2 have been updated. 1 is newer than 2, and 2 is the pre-reboot config.

Now that I know I need to reboot after the initial config on a fresh installation, I will. But for those who, like me, got bitten, "Shift-R" could help you out as it did me.
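
For anyone wanting to repeat that check, here is a small Python sketch that lists the state.tgz files by modification time, newest first. It assumes a Python interpreter is available wherever the volumes are visible and that the glob pattern below matches your layout; list_state_archives is a made-up helper name, not an ESXi command.

```python
import glob
import os
import time


def list_state_archives(pattern="/vmfs/volumes/Hypervisor?/state.tgz"):
    """Return (mtime, path) pairs for matching files, newest first."""
    paths = glob.glob(pattern)
    return sorted(((os.path.getmtime(p), p) for p in paths), reverse=True)


# Prints nothing if no file matches the pattern on this machine.
for mtime, path in list_state_archives():
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
    print(stamp, path)
```

The first line printed is the most recently written archive, so a freshly reset config would show up at the top and the pre-reboot one below it.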

Thanks for the thread.

vonsch
Contributor

Has anyone validated that this issue does not persist after the first time that the server is rebooted?

Dave_Mishchenko
Immortal

When I tested this with the original Dell ISO image it was a one time thing.

Dave
VMware Communities User Moderator

Seemannger
Contributor

If passthrough is configured for the USB device, disable it, because SD cards are attached via USB.

jjgunn
Enthusiast

The latest Dell ISO I have (version A03) didn't show the one-time lost config bug.

http://support.dell.com/support/downloads/download.aspx?c=us&cs=555&l=en&s=biz&releaseid=R301608&Sys...

I noticed the download linked above is version A04. I'll assume A04 works too, since the A03 version I'm using no longer has the bug.

Thanks for fixing this Dell.

SDBinSF
Contributor

ESXi 4.1u1 (Dell customized ISO) lost configuration on reboot.

I experienced something very much like this, but using much simpler components:

I downloaded the Dell customized ESXi 4.1u1  VMware-VMvisor-Installer-4.1.0.update1-348481.x86_64-Dell_Customized.iso

I downloaded that iso image on April 29th from the VMware site.  (After the update from Dell?)

I installed ESXi on a Dell 2950, configured the basics, including the free license, and proceeded to install a CentOS 5.6 server, with VMware Tools etc.  A colleague and I put about 3 days worth of effort into this.  Then I shut down the CentOS server; put the host into maintenance mode, shut it down, unplugged it and moved it to a different room.  When I booted the host (physical server) in its new location, it came up with no IP address (0.0.0.0), no root password, and no virtual servers in its inventory.  As another user said: that is seriously #!$@d up. 

Luckily, no reformatting had been done and the server image was still on the disk, so all we had to do was add the virtual server image back into the inventory.  (In the host configuration tab, click storage in the hardware navigation pane, and then right click the storage device to "browse datastore" and add to inventory.)  Our datacenter manager bailed me out of this situation, though he had never seen or heard of this problem. 

My situation had nothing to do with USB cloning or anything else fancy; just simple installation from a DVD, direct attached storage, and plain vanilla configuration.

Having done nothing but add the lost configurations, the machine can now reboot and keep its configuration.   In 10 plus years of light sysadmin work (not my main job), I have never seen anything like this.
