I'm a developer on a content-inspecting web gateway product. One of its features is that it recursively unpacks downloaded files in order to perform in-depth content analysis, such as malware scanning and binary file type detection.
Our product is currently having problems with the ESXi 4.0.0 ISO image. More specifically, the files "cim.vgz" and "sys.vgz" cause errors when we try to unpack them. In most cases, this prevents users of our software from downloading the ISO at all. This is a situation that we'd prefer to avoid, especially since the download is a legitimate VMware ISO image (verified via MD5).
As far as our code can tell, both of the problem files are standard GZip archives. We use 7-Zip to extract the contents of these files, and there are no problems in doing so. The extracted files then look like TAR archives, so we run 7-Zip on them again to extract their contents. It is at this point that 7-Zip returns an error, which our code picks up, marking the file as "bad".
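For anyone who wants to reproduce the "looks like TAR" check, here is a minimal Python sketch (not our product code). It decompresses only the first 512-byte block of a GZip stream and reports the bytes at the usual tar magic offset (257), plus whether the standard tar header checksum at offset 148 validates. The function names `tar_header_checksum_ok` and `describe_first_block` are made up for illustration.

```python
import gzip
import io

BLOCK = 512  # tar headers are 512-byte blocks


def tar_header_checksum_ok(header: bytes) -> bool:
    """Return True if a 512-byte block carries a valid tar header checksum."""
    if len(header) != BLOCK:
        return False
    # The checksum field is an octal number padded with NULs and/or spaces.
    stored = header[148:156].strip(b"\0 ")
    try:
        expected = int(stored, 8)
    except ValueError:
        return False
    # The checksum is the byte sum of the header with the checksum
    # field itself counted as eight ASCII spaces.
    actual = sum(header[:148]) + 8 * ord(" ") + sum(header[156:])
    return actual == expected


def describe_first_block(gz_bytes: bytes) -> dict:
    """Gunzip just the first block and report what the header looks like."""
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as f:
        header = f.read(BLOCK)
    return {
        "magic": header[257:263],  # b"ustar\x00" or b"ustar " for common tar variants
        "checksum_ok": tar_header_checksum_ok(header),
    }
```

Calling `describe_first_block(open("sys.vgz", "rb").read())` should show whether the first header is an ordinary tar header or something that only resembles one.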
We have tried a number of tools in an attempt to extract these files. 7-Zip simply says it can't open the TAR file. WinRAR reports corruption errors, but is sometimes able to list a few files. GNU tar does best and is able to extract several files, but still reports errors.
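To compare how far a given parser gets, a small sketch like the following can help. It uses Python's standard `tarfile` module (not one of the tools listed above) to collect member names until parsing stops. The helper name `list_readable_members` is hypothetical. Note that, as far as I can tell, `tarfile` treats an invalid header after the first member as the end of the archive rather than raising, so a short list with no error does not prove the archive is intact.

```python
import io
import tarfile


def list_readable_members(tar_bytes: bytes):
    """Return (names, error): member names read so far and any error message."""
    names = []
    error = None
    try:
        # mode "r:" means plain, uncompressed tar only (no auto-detection).
        with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
            for member in tf:
                names.append(member.name)
    except tarfile.TarError as exc:
        # Raised, for example, when the very first header is invalid.
        error = str(exc)
    return names, error
```

Running this on the gunzipped contents of "cim.vgz" or "sys.vgz" and comparing the result against what GNU tar lists gives a rough idea of where the headers stop being parseable.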
Is there something special about these files that prevents them from being unpacked by normal tools? If so, is there any way to extract the contents of these files, in order for our software to inspect them? If not (for legal, technical or any other reason), is there something we can look for that differentiates these files from normal TAR+GZ archives?
Thanks in advance.