I've got many ESXi 4.1 machines installed via kickstart to a local SAN-booted volume. After upgrading the firmware of the underlying hardware today, each physical machine had to be rebooted, which meant migrating VMs between hosts to idle each one before doing so. The problem is that after rebooting, many of the machines (not deterministic at all, around 50%) had stale state, primarily in that the VMs that had just been migrated off them were back in the host inventory (/etc/vmware/hostd/vmInventory.xml). After poking around a bit, it looks like vmInventory.xml is captured in /bootbank within state.tgz/local.tgz and also within /local.tgz. I see that the bootbank version is written by the autobackup.sh scripts and should be getting updated on shutdown, but I cannot figure out where /local.tgz comes from. This file appears to contain the old, stale vmInventory file, which is restored on boot and has to be manually deleted to reconnect the host properly to vCenter. It also looks like SSL certificates are not restored, or are stale, since vCenter makes me re-accept the certificates to re-add these same hosts.
Thanks for any insight,
Matt
I helped someone with a similar problem the prior week. It seems to be a bug in ESXi. local.tgz is used with Embedded and state.tgz with Installable. Have you ever restored a config backup on these hosts?
No, I haven't restored any configuration. Today has been my first, quite disappointing, introduction to the state issues with ESXi vs. vCenter...
I see where ESXi's init scripts check for the /local.tgz file and restore it if present, but I can't figure out what actually creates this file:
./vmware/init/init.d/16.rootfs-scan:
On Embedded the system config is stored in /bootbank/local.tgz; with Installable it's /bootbank/state.tgz (state.tgz contains local.tgz, as you've found). What's in /bootbank/boot.cfg? You shouldn't have a /local.tgz.
The way VMware operates is that three hypervisor partitions are created and used for normal operation. They are mounted as /bootbank, /altbootbank and /store.
/store is simply used to ‘store’ data (e.g. VMware Tools ISOs, the VI Client, etc.) as well as information for the vCenter Server agent and the HA agent.
Once you configure the ScratchConfig.ConfiguredScratchLocation parameter for an ESX host (swap file), it will mount a fourth partition for this purpose.
Anyway, the first two partitions mentioned above are used for the ‘running’ config and the ‘saved’ config, very loosely similar to the way a Cisco router stores two different configs.
What happens with VMware though is that the ‘running’ config
/bootbank and /altbootbank are effectively the running copy of the ESX firmware / config and the last saved version.
VMware backs up its running config every hour (at one minute past the hour).
To see whether you have been getting these from backups, you can check the date on the state.tgz file.
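If you'd rather compare the dates of every copy at once, here is a minimal Python sketch that sorts whichever of /bootbank/state.tgz, /altbootbank/state.tgz and /local.tgz exist by modification time, newest first. The candidate list comes from this thread and the function name archive_times is purely illustrative. If the hourly backup is running, the bootbank copy should be recent, and a much older /local.tgz would fit the stale-inventory symptom. I haven't confirmed that ESXi 4.1 ships a usable Python interpreter, so you may need to run this against copies pulled off the host with scp -p (which preserves modification times).

```python
import os
import sys
import time

# Candidate archive locations mentioned in this thread (assumed; adjust as needed).
CANDIDATES = ["/bootbank/state.tgz", "/altbootbank/state.tgz", "/local.tgz"]


def archive_times(paths):
    """Return (path, mtime) pairs for the paths that exist, newest first."""
    found = [(p, os.path.getmtime(p)) for p in paths if os.path.exists(p)]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Pass local copies as arguments, or fall back to the on-host paths.
    paths = sys.argv[1:] or CANDIDATES
    for path, mtime in archive_times(paths):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```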
I appreciate the responses, but I'm not sure the last one is related to my question. I'm still not seeing the source of /local.tgz (the copy in the root directory, not the ones in /bootbank or /altbootbank). I'm also troubled by the comment that I shouldn't even have this file in ESXi; why is that?
I rebooted all machines again today for another change, and while I didn't have the issue with vmInventory getting restored from a stale copy this time, I think there are still state issues, because a seemingly random subset of the machines came up and immediately exited maintenance mode on their own while others did not...
Thanks,
Matt
> After poking around a bit, it looks like vmInventory.xml is captured in /bootbank within state.tgz/local.tgz and also within /local.tgz. I see that the bootbank version is written by the autobackup.sh scripts and should be getting updated on shutdown, but cannot figure out where /local.tgz comes from? This file appears to have the old stale vmInventory file which is restored on boot and has to be manually deleted to reconnect the host properly to vcenter.
On a problem system, if you extract state.tgz completely (i.e. state.tgz, then local.tgz, into /tmp), is the vmInventory.xml file stale as well?
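For anyone wanting to do that comparison without unpacking by hand, here is a rough Python sketch that reads either a nested state.tgz or a bare local.tgz in memory and lists the registered .vmx paths, so running it on both files shows whether they disagree. It assumes the archives have been copied off the host, that the inventory sits at etc/vmware/hostd/vmInventory.xml inside local.tgz, and that each registered VM appears as a vmxCfgPath element. Those member names and that XML layout are my assumptions, not documented behavior, and list_registered_vms is a hypothetical helper name.

```python
import io
import sys
import tarfile
import xml.etree.ElementTree as ET

# Assumed member names inside the archives; adjust if yours differ.
LOCAL_MEMBER = "local.tgz"
INVENTORY_MEMBER = "etc/vmware/hostd/vmInventory.xml"


def _normalize(name):
    """Strip a leading './' so './local.tgz' and 'local.tgz' compare equal."""
    return name[2:] if name.startswith("./") else name


def _inventory_from_local(fileobj):
    """Read vmInventory.xml from a local.tgz stream and return its vmxCfgPath values."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as local:
        members = {_normalize(m.name): m for m in local.getmembers()}
        data = local.extractfile(members[INVENTORY_MEMBER]).read()
    root = ET.fromstring(data)
    return [(e.text or "").strip() for e in root.iter("vmxCfgPath")]


def list_registered_vms(archive_path):
    """Hypothetical helper: handle state.tgz (with nested local.tgz) or a bare local.tgz."""
    with tarfile.open(archive_path, "r:gz") as outer:
        members = {_normalize(m.name): m for m in outer.getmembers()}
        if LOCAL_MEMBER in members:
            nested = outer.extractfile(members[LOCAL_MEMBER]).read()
            return _inventory_from_local(io.BytesIO(nested))
    # No nested local.tgz, so treat the file itself as local.tgz.
    with open(archive_path, "rb") as f:
        return _inventory_from_local(f)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path)
        for vmx in list_registered_vms(path):
            print("   ", vmx)
```

Running it as `python inventory.py state.tgz local.tgz` prints each archive's VM list, so a VM that appears under /local.tgz but not under state.tgz would point at the root copy as the stale one.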
Dave Mishchenko wrote:
> I helped someone with a similar problem the prior week. It seems to be a bug in ESXi. local.tgz is used with Embedded and state.tgz with Installable. Have you ever restored a config backup on these hosts?
Could you elaborate a bit more on this ESXi bug?
We're experiencing an issue with ESXi losing config on reboot:
http://communities.vmware.com/message/1713515
Do you think it's related to the bug you hit?
Thanks.