VMware Cloud Community
brucekconvergen
Enthusiast

VMFS Issue, FB Inconsistency found, (xxx,xxx) allocated in bitmap but never used

So we had a storm last night, and a server running ESXi 5.1 on local storage (an LSI RAID array) went down in a less than clean manner due to a UPS issue. All the VMs on the machine look OK; I ran chkdsk on them, and only a few misallocated free space corrections were made in the NTFS guest filesystems. This got me curious, and after a little digging I found out about the voma tool, so I ran a check on the VMFS volume and got tons (217,588 to be exact) of these errors (the numbers in parentheses change, but often run in sequential blocks):

ON-DISK ERROR: FB inconsistency found: (321,70) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,71) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,72) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,73) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,74) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,75) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,76) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,77) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,78) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,79) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,80) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,81) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,82) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,83) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,84) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,85) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,86) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,87) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,88) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,89) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,90) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,91) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,92) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,93) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,94) allocated in bitmap, but never used
ON-DISK ERROR: FB inconsistency found: (321,95) allocated in bitmap, but never used
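
With a couple of hundred thousand lines like these, it can help to collapse the sequential runs before reading the report. The following is only a rough sketch in Python, assuming the voma output has been saved to a text file and that every line follows exactly the format pasted above. The names `summarize` and `format_summary` are made up for this example and are not part of voma, and the script makes no claim about what the two numbers in parentheses mean internally.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ON-DISK ERROR: FB inconsistency found: (321,70) allocated in bitmap, but never used
# The format is taken from the log pasted in this thread; other voma
# versions may word it differently (assumption).
ERROR_RE = re.compile(
    r"ON-DISK ERROR: (\w+) inconsistency found: \((\d+),(\d+)\) "
    r"allocated in bitmap, but never used"
)


def summarize(lines):
    """Group errors by (type, first number) and collapse runs of consecutive
    second numbers into inclusive (start, end) ranges."""
    seen = defaultdict(set)
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            seen[(m.group(1), int(m.group(2)))].add(int(m.group(3)))

    summary = {}
    for key, values in seen.items():
        ranges = []
        for v in sorted(values):
            if ranges and v == ranges[-1][1] + 1:
                ranges[-1][1] = v          # extend the current run
            else:
                ranges.append([v, v])      # start a new run
        summary[key] = [tuple(r) for r in ranges]
    return summary


def format_summary(summary):
    """Render one line per (type, first number) group."""
    out = []
    for (kind, group), ranges in sorted(summary.items()):
        total = sum(end - start + 1 for start, end in ranges)
        spans = ", ".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)
        out.append(f"{kind} {group}: {total} entries ({spans})")
    return out


if __name__ == "__main__":
    # Rebuild the 26 lines quoted above instead of reading a real report.
    sample = [
        f"ON-DISK ERROR: FB inconsistency found: (321,{n}) "
        "allocated in bitmap, but never used"
        for n in range(70, 96)
    ]
    for row in format_summary(summarize(sample)):
        print(row)  # FB 321: 26 entries (70-95)
```

To run it against a real report, pass `open("voma-output.txt")` (a hypothetical file name) to `summarize` instead of the sample list.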

Is there a tool that can resolve these inconsistencies?  This system does not have support on it.

Accepted Solutions
continuum
Immortal

Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in check mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
   Detected file system (labeled:'Storage_RAID5') with UUID:5113b08f-1e56cd8c-6550-001517d418f4, Version 5:58
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
ON-DISK ERROR: <FD c297 r18> : Invalid block count 59803 should be 0.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
ON-DISK ERROR: PB inconsistency found: (3547,1) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,2) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,3) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,4) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,5) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,6) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3547,7) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3549,12) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3549,13) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3549,14) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3549,15) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3550,0) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (3550,1) allocated in bitmap, but never used
...

and then 60,000 more entries like that.


VMware support said to regard those messages as harmless.

> Having a basic maintenance tool is necessary for any filesystem.

Interesting that you mention that in this context - do you really want to say that VOMA is this "basic maintenance tool"?

VMFS has existed for ten years or more. It is billed as an enterprise high-performance filesystem.
Now, in version 5.*, it comes with a tool that only runs during system downtime.
It has no fix-it function, and VMware engineers dismiss frightening results from the tool as false alarms.

In my experience the only check that can tell if a flat.vmdk is healthy is to clone it with vmkfstools -i and check if that proceeds without errors.
There is no defragmentation tool and no tool that can actively fix errors.
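
For anyone who wants to script that clone check, here is a minimal sketch in Python. Only `vmkfstools -i <source> <destination> -d <format>` comes from the tool itself; the function names and the datastore paths are hypothetical, and the sketch assumes that vmkfstools exits with a non-zero status when the clone hits an error. It also assumes a Python 3 interpreter is available wherever the command is run.

```python
import subprocess


def build_clone_cmd(src_vmdk, dst_vmdk, disk_format="thin"):
    """Return the argument list for cloning src_vmdk to dst_vmdk."""
    return ["vmkfstools", "-i", src_vmdk, dst_vmdk, "-d", disk_format]


def clone_reads_cleanly(src_vmdk, dst_vmdk):
    """Run the clone and report whether vmkfstools exited with status 0.

    Assumption: an error while reading the source disk makes vmkfstools
    return a non-zero exit status.
    """
    result = subprocess.run(build_clone_cmd(src_vmdk, dst_vmdk))
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the command that would run.
    print(" ".join(build_clone_cmd(
        "/vmfs/volumes/datastore1/vm1/vm1.vmdk",
        "/vmfs/volumes/datastore2/check/vm1.vmdk",
    )))
```

The destination needs enough free space for the copy, and the clone can be deleted again once it has completed without errors.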

A single minor error can effectively destroy a 2 TB flat.vmdk without any warning.
A single small error in the grain table of a vmfssparse file, and there is no way to recover with VMware tools.

There is no function that would warn users before a VMFS problem occurs.
No - if a VMFS volume gets damaged, it happens without any early warning.

You are right - a filesystem should have a good repair tool - especially if very large files are normal.

I really wonder why users let VMware get away with no repair tool at all - I would expect a decent repair and test tool with version 1.0 of a filesystem, or at the very least with version 2.0.
If VMFS has the reputation of a reliable filesystem, that is owed to the fact that it usually runs on SAN systems.

Honestly, I do not know any other filesystem that is as fragile as VMFS.

And I don't know any other filesystem where small errors tend to result in such large catastrophes as with VMFS.

A small error in filesystems like NTFS or EXT4 and a single file is lost - a VMFS error typically kills a whole VM, if not a whole datastore.

IMHO a tool like voma comes years too late.

If I compare Hyper-V with vSphere for small environments, I would say that the VMFS filesystem is one of the strongest arguments against vSphere.

And by the way - if you want to hear feature requests for voma, I can give you a very long list ;)

Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

brucekconvergen
Enthusiast

Forgot to add... when I checked the other host I have, there are a few errors that say "PB inconsistency found" at the top of the analysis.

I also noted that at the top of the analysis there is an error under Phase 3: ON-DISK ERROR: <FD c486 r127> : Invalid block count 271834 should be 0. That number is close to the total number of errors.
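
To check that hunch against a saved report, something like the following Python sketch could put the Phase 3 block count next to the number of "allocated in bitmap" lines. The line formats are taken from the logs in this thread, and `compare_counts` is a made-up name. Whether the two figures really describe the same blocks is not established here, so treat the comparison as a hint only.

```python
import re

# Phase 3 line, e.g.:
#   ON-DISK ERROR: <FD c486 r127> : Invalid block count 271834 should be 0.
FD_RE = re.compile(
    r"ON-DISK ERROR: <FD (\S+) (\S+)> : Invalid block count (\d+) should be (\d+)"
)
# Phase 5 line, e.g.:
#   ON-DISK ERROR: PB inconsistency found: (3547,1) allocated in bitmap, but never used
BITMAP_RE = re.compile(
    r"ON-DISK ERROR: (\w+) inconsistency found: \(\d+,\d+\) "
    r"allocated in bitmap, but never used"
)


def compare_counts(lines):
    """Return (sum of reported-minus-expected block counts from the FD lines,
    number of bitmap inconsistency lines per type)."""
    excess_blocks = 0
    per_type = {}
    for line in lines:
        fd = FD_RE.search(line)
        if fd:
            excess_blocks += int(fd.group(3)) - int(fd.group(4))
            continue
        bm = BITMAP_RE.search(line)
        if bm:
            per_type[bm.group(1)] = per_type.get(bm.group(1), 0) + 1
    return excess_blocks, per_type


if __name__ == "__main__":
    report = [
        "Phase 3: Checking all file descriptors.",
        "ON-DISK ERROR: <FD c486 r127> : Invalid block count 271834 should be 0.",
        "ON-DISK ERROR: PB inconsistency found: (3547,1) allocated in bitmap, but never used",
        "ON-DISK ERROR: FB inconsistency found: (321,70) allocated in bitmap, but never used",
    ]
    blocks, counts = compare_counts(report)
    print(blocks, counts)
```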

ramakrishnak
VMware Employee

I would suggest filing an SR on this one to get it triaged further.

Thanks,

brucekconvergen
Enthusiast

I do not have support on these systems, these are lab boxes.  I find it relatively ridiculous that VMware does not have basic filesystem maintenance tools available -- other platforms do.  I guess the only answer for me is to move the VMs off and remove / recreate the datastore.

ramakrishnak
VMware Employee

Having a basic maintenance tool is necessary for any filesystem.

As a result, the "VOMA" tool has now become a mainstream debugging tool for support teams; it essentially does these checks and much more. As it develops, more options and knobs get added.

Some of the errors you see may be false positives, and some may be genuine issues that need further triage to fix.

I would strongly suggest that you file an SR and ask for this type of functionality in tools like voma.

Thanks,

brucekconvergen
Enthusiast

I can't file an SR without support, though, right?

brucekconvergen
Enthusiast

Ulli, thanks for the insight regarding the "allocated in bitmap" messages... I suspected they were innocuous.

And... I absolutely agree with your assessment regarding VMware and how they treat VMFS -- we should have some real tools to use.  I understand the whole "you might blow your data away" concern, as I've seen it happen once or twice with other file systems... but that's why we have backups :)  -- warn the admin in flashing red letters, make them type "I understand the risks" 3 times to initiate the tool, whatever ... but we need basic tools.

ramakrishnak
VMware Employee

Thanks for the well-compiled list. Suggestions/comments well taken :)

The fs checker has existed for VMFS for a long time. It's just that not all the bells and whistles are exposed to end users, due to the nature of the checker. In the wrong hands (without enough knowledge of fs internals) it can do more harm than good, hence it is not offered as a general-purpose tool for fixing those issues.

AFAICT we may be proactively blocking this so that the user files an SR or involves the support teams, so that the actual root cause is determined and the product improved before checker tools are used to fix those issues (minor or non-minor).

A lot of the time the issues lie outside of any FS (hardware / SAN / storage-side issues, etc.).

Thanks,