VMware Cloud Community
conyards
Expert

VMFS file system growth?

Whilst working on issues with VMotion and slow 'll' listings of /vmfs/volumes, I have noticed some discrepancies on what should be static VMFS volumes. When I say static, I mean that the volume contains only metadata, a heartbeat region and a virtual machine disk file. Basically, the VMFS volume is using more space than I can account for; the volume below lists as 50.750GB in VC, and working back from a vdf operation this size matches up on the host.

When I run 'ls -lah' on the volume I get the following for the root of the volume and the VM folder:

total 562M

drwxrwxrwt 1 root root 1.1K Oct 12 23:56 .

drwxrwxrwx 1 root root 512 Oct 29 10:52 ..

-r-------- 1 root root 896K Oct 12 15:16 .fbb.sf

-r-------- 1 root root 62M Oct 12 15:16 .fdc.sf

-r-------- 1 root root 244M Oct 12 15:16 .pbc.sf

drwxr-xr-x 1 root root 560 Oct 12 23:56 RESVPWMINT01

-r-------- 1 root root 248M Oct 12 15:16 .sbc.sf

-r-------- 1 root root 4.0M Oct 12 15:16 .vh.sf

From the VM folder also:

total 51G

drwxr-xr-x 1 root root 560 Oct 12 23:56 .

drwxrwxrwt 1 root root 1.1K Oct 12 23:56 ..

-rw------- 1 root root 50G Oct 18 04:57 RESVPWMINT01-flat.vmdk

-rw------- 1 root root 345 Oct 16 20:26 RESVPWMINT01.vmdk

Rounding all the files listed below 1M up to 1M and adding the values together gives 50.565GB, which by my reckoning means the volume should have 0.185GB of free space available to it.

The problem for me is that the volume is reporting only 1M of free space, which is unfortunate, as VMFS needs between 4M and 5M for journalling operations.

What is using this space?

The heartbeat region is represented above, I assume, by the file .vh.sf.

The metadata by the files .fbb.sf, .fdc.sf, .pbc.sf & .sbc.sf.

The virtual machine hard disk by the files *-flat.vmdk and *.vmdk.

There is nothing else listed on the volume, so again, what is utilising the space?

The fact that this volume is reporting as full is undoubtedly what is causing the issues with VMotion and the slow processing of vdf and an 'll' from /vmfs/volumes.

Another concern for me is that I can see this discrepancy on all of the volumes presented from the SAN to the hosts: less volume space reported as usable than there should be.

I'd really appreciate it if somebody could take the time to compare the free space listed for a SAN-attached LUN by a vdf operation with the files that make up the volume. For clarification, I'd like to see a LUN that only contains VMDK files and does not house virtual machine VMX or VSWP files etc.
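For anyone kind enough to oblige, this is roughly the output I'm after, run from the service console (substitute your own volume and VM folder names for the placeholders):

[root@]# vdf /vmfs/volumes/<volume_name>

[root@]# ls -lah /vmfs/volumes/<volume_name>

[root@]# ls -lah /vmfs/volumes/<volume_name>/<vm_folder>

If vdf won't take the path directly, running it with no arguments and picking the volume out of the list works just as well.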

The first two posters to attach/copy the requested output will be awarded helpful points.

Simon

https://virtual-simon.co.uk/
6 Replies
conyards
Expert

OK, I've been able to examine a new volume presented to a test environment.

The LUN presented is 5GB, which after formatting with VMFS3 equates to 4.75G of capacity with 4.14G stated as usable.

I've examined the LUN and the following can be found on it:

. 980 or 0.000934M

.. 512 or 0.000488M

.fbb.sf 163840 or 0.15625M

.fdc.sf 64946176 or 61.9375M

.pbc.sf 255655936 or 243.8125M

.sbc.sf 260366336 or 248.3046M

.vh.sf 4194304 or 4M

so...

0.000934 + 0.000488 + 0.15625 + 61.9375 + 243.8125 + 248.3046 + 4 = 558.212M

4.75 - 4.14 = 0.61 or 624.64M

I therefore have a discrepancy of 66.428M between what the files are consuming and what is being reported as consumed on the volume.

I've then created a new virtual disk on the VMFS of 4096M

The above files stay the same size, with the exception of the entry labelled '.', which grows to 1260 bytes or 0.001202M; this brings the total for the above files to 558.213M.

the -flat.vmdk is listed at 4294967296 or 4096M

the *.vmdk lists as 308 or 0.000293M

giving a total used of 4654.213M

a vdf operation lists 145408KB free, or 0.138G

VC lists the free space at 0.142G

a vdf operation lists 4835328KB as used

adding up all the files and rounding each up to whole KB gives 4765914KB

a discrepancy still of 69414KB, or 67.787M

So with the addition of the VMDK the gap seems to have grown...

What is using this space?
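If anyone wants to repeat the same check on one of their own volumes without doing the sums by hand, something along these lines should do it from the service console; the 1048576 divisor mirrors the 1M rounding I used above, and <volume_name> is a placeholder for your own datastore:

[root@]# cd /vmfs/volumes/<volume_name>

[root@]# find . -type f -exec ls -la {} \; | awk '{ bytes += $5; mb += int(($5 + 1048575) / 1048576) } END { printf "%d bytes raw, %d MB rounded up to 1MB\n", bytes, mb }'

[root@]# vdf /vmfs/volumes/<volume_name>

Comparing the rounded total with the used figure from vdf (which reports in KB) should show whether the same gap is there.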

https://virtual-simon.co.uk/
conyards
Expert

After much discussion, it would seem that what has been encountered are leaked blocks. Simply put, the command to clear certain blocks on the file system has been sent but not carried out.

VMware are working on a resolution utilising disk dumps from our live systems; it is effectively being treated as a bug.

Something else that came to light as a result of these discussions is that the published equation for working out the cost of metadata on a LUN is not correct. The published equation is 500MB + (x - 1) × 0.016KB.

I'll post the correct equation on this thread assuming it doesn't breach NDA.

Simon

https://virtual-simon.co.uk/
conyards
Expert

Out of interest, the points are still available if someone wants to post their disk information; it would be interesting to see if this is being encountered elsewhere.

Simon

https://virtual-simon.co.uk/
charlie88
Contributor

Digging up an old thread here, but did you ever get to the bottom of this? We have a similar issue - intermittent VMotion failures with "MigrateWaitForData: timed out. Migration has failed" logged in vmware.log. This was caused by VMFS datastores filling up; a symptom was slow ls / vdf of /vmfs/volumes. We provision 1 LUN per VM and size it for the disk required plus RAM burst - typically we provision a 20GB LUN for an 18GB vmdk plus 1GB vswp, with the remainder as overhead. This worked well for us initially, but over time log files built up - I can fix the VMotion problems by deleting old vmware-*.log files to reclaim some space. We never saw any impact to the VMs, but it caused a headache for us operationally.
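In case it saves anyone else the manual hunt, something like this from the service console lists the rotated log files so they can be checked before anything is removed; <datastore_name> is a placeholder for the datastore in question, and the current vmware.log is left alone since the pattern only matches the numbered logs:

[root@]# find /vmfs/volumes/<datastore_name> -name 'vmware-*.log' -exec ls -lh {} \;

Once I'm happy with the list, deleting the older ones is what recovers the space.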

However, in investigating this I also found some datastores "leaking" space. I found that some VMs I'd powered off couldn't be powered back on with the same memory reservation, as there was no longer enough space to create the vswp file. These datastores seem to have less free space available than should be the case given the total space required by the various VM files. A 20GB LUN formats with a 1MB block size to create a datastore with a "capacity" of 19.75GB and "free" of 19.14GB as reported by the VI client. However, I now have a datastore where I have deleted all the VM files and it only has 18.74GB free (when it should be back to 19.14GB). The files on both this and a freshly created datastore report as the same size, so I don't see where the missing space has gone. Our environment is currently ESX 3.0.2; most hosts were originally running 3.0.1 and were upgraded later.

[root@]# ls -lah

total 561M

drwxrwxrwt 1 root root 980 May 27 07:53 .

drwxr-xr-x 1 root root 512 May 27 10:04 ..

-r-------- 1 root root 384K Oct 9 2007 .fbb.sf

-r-------- 1 root root 62M Oct 9 2007 .fdc.sf

-r-------- 1 root root 244M Oct 9 2007 .pbc.sf

-r-------- 1 root root 248M Oct 9 2007 .sbc.sf

-r-------- 1 root root 4.0M Oct 9 2007 .vh.sf

Cheers,

Charlie

jameseager
Contributor

OK, I'm not 100% sure if this is directly related, but we've had some interesting issues here at my work with VMFS file system growth on a 100GB RAID 10 volume available to ESX via our SAN. We had two machines on that disk, but moved them over to a larger 400GB volume. At one point we too got a timeout error while attempting a migration, and also one when trying to add memory to a VM that was shut down for maintenance. After those issues, we noticed that despite being supposedly empty, a whopping 50GB of the volume was still showing as used.

From the Virtual Infrastructure Client, the storage shows as 100.75GB capacity but only 50.14GB free. Using WinSCP I connected to one of the ESX servers for a look, and found the following under the /vmfs/volumes/<appropriate volume name> directory:

Name / Size

.fbb.sf 1,736,704

.fdc.sf 64,946,176

.pbc.sf 255,655,936

.sbc.sf 260,366,336

.vh.sf 4,194,304

Obviously, the .pbc.sf and .sbc.sf are the two files we are most concerned about since they appear to eat up the most space.

Any information that you guys can provide on what those files are, whether or not we can somehow wipe them out and start fresh, or any client features to re-format and recover the space would be greatly appreciated!

Thx in advance,

James.

NEVERMIND - I didn't realize that you have to go in and "refresh" the datastore (right-clicking on it to find that option). Now it shows the correct available size. Oops!

kernelphr34k
Contributor

I'm seeing the same thing, but when I refresh the datastore nothing changes. I've rescanned for new storage devices and new VMFS volumes, and there's no change.

The drive is 838GB and shows 157.27GB remaining. I have 3 other VMs that are using most of the storage. I should be able to make one more VM (like I had before), and that would still leave a few GB free on the disk.

-r-------- 1 root root 4489216 Jul 9 03:12 .fbb.sf

-r-------- 1 root root 63143936 Jul 9 03:12 .fdc.sf

-r-------- 1 root root 255655936 Jul 9 03:12 .pbc.sf

-r-------- 1 root root 260374528 Jul 9 03:12 .sbc.sf

-r-------- 1 root root 4194304 Jul 9 03:12 .vh.sf

These files are HUGE and are taking up too much space. What can I do to resolve this?

ESX 3.5, vCenter 2.5.

Thanks!!
