VMware Cloud Community
ThomasMc
Enthusiast

VUM Scan Host - The host returns esxupdate error codes: 10

Morning everyone, I've been struggling to nail down this error code 10 that I'm getting on all of my hosts (3 ESXi in total). Everywhere I look it tells me that I'm running low on space, but when I check the output on each of the hosts there seems to be more than enough space available.

~ # df -h
Filesystem                Size      Used Available Use% Mounted on
visorfs                   1.5G    324.2M      1.1G  22% /
vmfs3                   499.8G    107.1G    392.6G  21% /vmfs/volumes/4ce1a8ee-814eb77e-1766-68b599e3df73
vfat                    285.9M    140.7M    145.2M  49% /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a
vfat                    249.7M    102.5M    147.3M  41% /vmfs/volumes/65092bef-de8a06b5-22db-2bbbc32dc3d2
vfat                    249.7M    103.7M    146.1M  42% /vmfs/volumes/ff060de6-cecc88e5-4d14-8726d7ed0132
vmfs3                   499.8G    100.2G    399.6G  20% /vmfs/volumes/4ce1a92e-6c624d34-2cf3-68b599e3df73
vmfs3                   499.8G      6.1G    493.6G   1% /vmfs/volumes/4ce1a909-d6b31dea-450f-68b599e3df73
vmfs3                   499.8G    281.6G    218.2G  56% /vmfs/volumes/4ce1a8cf-9914f00c-7975-68b599e3df73
vmfs3                   409.8G    204.8G    205.0G  50% /vmfs/volumes/4d5006ba-5fdcef08-6003-68b599e3df73
~ # vdf -h
Tardisk                  Space      Used
SYS1                      201M      201M
SYS2                       55M       55M
SYS3                        1M        1M
SYS4                       12K       12K
SYS5                       12K       12K
SYS6                       42M       42M
SYS7                       12M       12M
-----
Ramdisk                   Size      Used Available Use% Mounted on
MAINSYS                    32M        4M       27M  14% --
tmp                       192M        4K      191M   0% --
updatestg                 750M       64K      749M   0% --
hostdstats                 78M        3M       74M   5% --
AAMconfig                 128M        3M      124M   2% --
~ #

When I was looking over the VUM logs, the only thing I can see that is interesting is:

* The host certificate chain is not complete.
[2011-02-23 09:38:59.652 02884 warning 'Libs'] SSLVerifyIsEnabled: failed to read registry value. Falling back to default behavior: verification off. LastError = 0
[2011-02-23 09:38:59.652 02884 warning 'Libs'] SSLVerifyCertAgainstSystemStore: Certificate verification is disabled, so connection will proceed despite the error
[2011-02-23 09:38:59.652 02884 warning 'Libs'] SSLVerifyCertAgainstSystemStore: The remote host certificate has these problems:

vCenter and the ESXi hosts are all up to date with 4.1 U1, and I've also updated VUM; I wasn't getting any problems before these updates.

Thanks

Thomas McConnell vPadawan

ThomasMc
Enthusiast

After editing esxupdate.conf and changing the log file, it turns out that the error was:

FileIOError: ('/var/tmp/cache/metadata875978848', "Cannot create dir /var/tmp/cache/metadata875978848: [Errno 17] File exists: '/var/tmp'")

So I SCPed over to the box, renamed /var/tmp to /var/tmp.bak, and scanned the host again, and it's now working. I'm off to do the rest of them now.
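
For anyone else hitting this, the fix from the host's shell was roughly the following (a sketch; the .bak name is just what I picked, and the mkdir is optional since the next scan recreates the directory anyway):

~ # ls -ld /var/tmp            # a symlink pointing at a /scratch path that no longer exists
~ # mv /var/tmp /var/tmp.bak   # move the broken link out of the way
~ # mkdir /var/tmp             # optional: give esxupdate a real directory to work in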

Thomas McConnell vPadawan

ThomasMc
Enthusiast

I've been digging a little further into this issue and found out that the above was in fact a symlink instead of an actual directory. I decided to check whether there were other links that were now broken (the one-liner below prints any symlink whose target no longer exists) and found the following:

~ # find . -type l | (while read FN ; do test -e "$FN" || ls -ld "$FN"; done)
lrwxrwxrwx    1 root     root                 19 Jan 13 01:36 ./usr/lib/vmware/hostd/docroot/downloads -> /scratch/downloads/
lrwxrwxrwx    1 root     root                 17 Jan 13 01:36 ./var/tmp.bak -> /scratch/var/tmp/
lrwxrwxrwx    1 root     root                 18 Jan 13 01:36 ./vmupgrade -> /locker/vmupgrade/
lrwxrwxrwx    1 root     root                 12 Feb 21 10:16 ./scratch -> /tmp/scratch
~ #

All 3 hosts are the same (HP ML110 G6) and were all updated from 4.1 to 4.1 U1 via VUM. Is this a possible bug, or was I just having a bad day?

Thomas McConnell vPadawan

jonb157
Enthusiast

I'm thinking this is a bug, because I recently upgraded to 4.1 U1 via VUM and am now getting the same error on some of the hosts I patched. What's funny is that it patched and rebooted the host, but then failed to do the "post-scan" to confirm it was patched successfully. Now when I scan for updates, I get this error. It would be nice if someone from VMware could comment on this.


CRKochan
Contributor

I'm seeing the same thing on all the hosts that were updated to 4.1u1 via VUM. Running "mkdir -p /tmp/scratch/var/tmp" seems to clear it up as well.
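
Spelled out from the host's shell (a sketch; note this only holds until the next tmp cleanup or reboot, as later posts explain):

~ # mkdir -p /tmp/scratch/var/tmp   # recreate the tree that /var/tmp resolves to via the /scratch symlink
~ # test -e /var/tmp && echo OK     # the symlink chain should resolve again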


bramvermeulen
Contributor

I have the same issue on my 10 ESXi hosts as well; thanks for posting the solution. Did anybody create an SR for this?


dlund
Enthusiast

I have the same issue on 5 ESXi hosts. The servers were installed with 4.1 and updated to U1 with VUM, and then the problem started.

The solution worked fine, but this issue can be annoying in bigger deployments, so an official response would be nice.


COVSupport
Contributor (Accepted Solution)

I'm having the same issue on 4 ESXi hosts. I'm wondering if this is the same issue on both embedded and installable ESXi? Mine are embedded, and I used Update Manager.

Problem resolved. I spoke with VMware and it looks like a bug. They had me reboot my hosts a second time and the errors went away. So after running 4.1 Update 1 on my hosts and letting them reboot, I ran a scan and received the error; I rebooted them again and now they are working fine.


ahinterl
Contributor

Just to let you know: See here as well: http://communities.vmware.com/thread/302686?tstart=0

I filed a support ticket when I saw that the problems are caused by /tmp/scratch becoming unavailable (Update Manager scans, log file bundles, and host config backups all need the scratch partition to be accessible). The workaround is to reboot the host; then things are fine for a while...

I have ESXi embedded on two servers as well (booting from internal flash drives).

Andreas


Gabriel_Chapman
Enthusiast

That's odd, because my SR is still open and I have not been told to attempt this.

Ex Gladio Equitas

geekinabox
Contributor

Currently experiencing this issue on 100+ fresh ESXi 4.1 installs on Cisco UCS.

A reboot temporarily resolves the problem (i.e., VUM works thereafter), but it generally recurs within several days.

I've opened an SR but haven't seen much positive action. Is anyone else getting traction on this?


filbo
Enthusiast

Rebooting "fixes" the problem for a bit over 10 days; then it will return.

What's going on here?  /tmp was added in 4.1U1 to the list of directories "cleaned" by /sbin/tmpwatch.sh.  (tmpwatch.sh is run by root's crontab /var/spool/cron/crontabs/root).  tmpwatch is not aware that /tmp/scratch is special and needs to be left alone.
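
You can check this on your own host (a sketch; paths are as shipped in 4.1U1):

~ # cat /var/spool/cron/crontabs/root   # the tmpwatch.sh entry is scheduled here
~ # grep tmp /sbin/tmpwatch.sh          # /tmp now appears in the cleanup list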

Why does this affect some hosts and not others?  On ESXi Installable, /scratch is a symlink to some /vmfs/volumes/... place, not assaulted by tmpwatch.

Even on ESXi Embedded, I believe /scratch gets set up as a link to permanent storage if any exists at boot time.

So only Embedded is affected, and then only a subset (diskless machines), and then a subset of those (without suitable scratch space on the boot USB key).

Why does /scratch even matter?  The stuff in /tmp/scratch is mostly, in fact, "scratch" which could be deleted at will as long as it isn't in use at the moment.  Also, most or all users of scratch properly do the equivalent of `mkdir -p /scratch/my/little/fiefdom`.  Where this goes wrong is when /scratch is a symlink to /tmp/scratch, and /tmp/scratch is a directory that gets deleted by tmpwatch.  `mkdir -p` knows to make missing subdirectories along the path; but it is not equipped to deal with a broken symlink.
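
Here is the failure mode in miniature, if you want to see it for yourself (a sketch using throwaway paths, safe to run anywhere):

~ # mkdir -p /tmp/demo/scratch && ln -s /tmp/demo/scratch /tmp/demo/link
~ # rm -rf /tmp/demo/scratch          # what tmpwatch effectively does to /tmp/scratch
~ # mkdir -p /tmp/demo/link/var/tmp   # fails, because the link now dangles
mkdir: can't create directory '/tmp/demo/link': File exists

(The exact message varies by tool; Python's os.makedirs reports it as the '[Errno 17] File exists' in Thomas' FileIOError above.)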

How to fix it?  Unfortunately an in-field repair of this is not very easy.  /sbin/tmpwatch.sh is not "sticky" and can't easily be edited in a way which will persist across reboots.  Even if you did, say, edit the file and rebuild /bootbank/s.z that contains it, you would be vulnerable to that being replaced by any VMware or OEM update.  (... months later when you can't remember what you did to fix it in the first place.)

Looking at /sbin/configLocker, I see that it will choose a FAT partition from the boot USB stick, but only if it's at least 4000000000 bytes (4GB).  So it appears -- by analysis, not by testing -- that if you use a large enough USB stick (8GB or more) and partition it to have a >4GB empty FAT32 partition, /scratch should end up pointing to it.

Therefore, proposed workaround for diskless ESXi Embedded hosts:

1. Create an ESXi 4.1U1 image on an 8GB or larger USB stick
2. Using tools such as Linux `parted` or -- I don't know what on Windows -- repartition it to add a large empty FAT32 partition (see the parted sketch after this list). You should keep all existing partitions with their existing sizes, types, contents, and partition numbers. It should be possible to add an "extended" partition at the end, since the standard ESXi Embedded `dd` image is < 1GB
3. Boot the host with this modified stick
4. Get to a shell and `ls -ld /scratch`: if this now points to /vmfs/volumes/[some UUID gibberish], you win.  If it still points to /tmp then talk to me, we'll figure it out some more...

5. Ideally you want to do this with an existing image, including its already configured local.tgz, oem.tgz and whatever else.  You have more practical experience at that than I do, you figure it out.  Steps 1-4 can be done with a completely pristine 4.1U1 dd image just to verify the basic procedure.
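
For step 2, this is the kind of thing I have in mind from a Linux box (a sketch, untested, in keeping with the "by analysis" caveat above; /dev/sdX and the partition number are placeholders -- print the table first and double-check before writing anything):

parted /dev/sdX -- unit MiB print                     # inspect the existing layout
parted /dev/sdX -- mkpart primary fat32 1024MiB 100%  # or a logical partition inside an extended one if the primary slots are taken
mkfs.vfat -F 32 /dev/sdX5                             # the new partition's number depends on your table

Remember configLocker wants >= 4000000000 bytes, so the stick needs enough free space for the new partition to come out over ~4GB.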

>Bela<

ThomasMc
Enthusiast

Thanks for the update on this problem, filbo.

Thomas McConnell vPadawan

mdippold
Enthusiast

Unfortunately this "workaround" doesn't work on IBM ESXi embedded systems because the USB stick is only 2GB.


ahinterl
Contributor

Just for your information: My efforts were successful, VMware has added the disappearing scratch directory problem to their bug list (PR: 697348).

To resolve my problems until a patch comes out, I've created directories on a new VMFS volume on my storage and configured the hosts to put their scratch partitions there. It has worked ever since.
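
Roughly what that looks like (a sketch; the datastore and directory names here are made up):

~ # mkdir /vmfs/volumes/ScratchLUN/.locker-esx01   # one directory per host
~ # mkdir /vmfs/volumes/ScratchLUN/.locker-esx02

Then point each host's ScratchConfig.ConfiguredScratchLocation advanced setting at its own directory and reboot.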

Andreas


filbo
Enthusiast

Martin Dippold wrote:

Unfortunately this "workaround" doesn't work on IBM ESXi embedded systems because the USB stick is only 2GB.

That seems likely to be the case with any vendor-supplied embedded ESXi.  They would have no reason to use a bigger stick.

I think the physical stick is just a normal USB device plugged into a dedicated internal port.

So there seem to be two points of attack: violate the hardware OEM by replacing the stick, or violate VMware by patching the ESXi image.

It should not be difficult to patch the image so that configLocker chooses the USB stick over /tmp on smaller sticks.

The standard 4.1 image is 900MiB, leaving ~1100M available for an extra FAT filesystem on a 2G stick.  That's smaller than the 4G target size configLocker's looking for.  I don't know what truly needs >4G /scratch -- possibly some VUM operations, possibly it's a function of the number of VMs and datastores being operated on.

There's also the whole question of stressing flash by writing to it too often.  But since the script is willing to do this on larger sticks, it must consider that a non-issue.  (Could be an assumption about newer write-mitigation technologies not being present on older / original "small" 2G sticks.)


HannaL
Enthusiast

A solution that worked for me with an IBM embedded ESXi 4.1 host having this same issue was to set a valid scratch location on a /vmfs/volumes datastore with plenty of free space. It is pretty easy to do: go to the Configuration tab of the host, click Advanced Settings under the Software section, select ScratchConfig, fill in the /vmfs/volumes/datastoreuuid you want to use in the box for ScratchConfig.ConfiguredScratchLocation, then reboot. Before I did this, while I was having the problem, ScratchConfig.CurrentScratchLocation was set to /tmp/scratch, which did not exist and would not have had enough space anyway.
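
If you prefer the host's shell to the vSphere Client, the equivalent should be something like this (a sketch; I believe vim-cmd exposes the same advanced option, but verify on your build, and the datastore path is a placeholder):

~ # vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/<datastore-uuid>/.locker
~ # reboot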

I was then able to scan with no errors.

Hope that Helps

Hanna

Hope that helps, Hanna --- BSCS, VCP2, VCP VI3, VCP vSphere, VCP 5 https://www.ibm.com/developerworks/mydeveloperworks/blogs/vmware-support-ibm

filbo
Enthusiast

Thanks, Hanna.  You have hit on what is now the officially recommended solution, as seen at http://kb.vmware.com/kb/1033696/.

A second article is in the works which more specifically addresses the symptoms seen in VUM and a few other places.  Its purpose is to help people recognize these symptoms as equivalent to "/scratch went AWOL", then redirect to KB 1033696 to repair that.

>Bela<


bverm
Enthusiast

How much space can be written in such a directory? Is there a way to limit it with another advanced setting?

If I select a datastore that contains VMs, I wouldn't want the scratch data on it to grow without limit, of course. :)


HannaL
Enthusiast

The scratch partition is always only 4 GB.

See pages 28 and 29 here: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_esxi_e_vc_setup_guide.pdf

There is no way to limit the growth of anything in a VMFS volume as far as I know.  Use datastore usage alarms to monitor free space.

Hope that Helps

Hanna

Hope that helps, Hanna --- BSCS, VCP2, VCP VI3, VCP vSphere, VCP 5 https://www.ibm.com/developerworks/mydeveloperworks/blogs/vmware-support-ibm