We are running ESXi 7.0 Update 2 on a Dell server. We have a hardware RAID10 datastore which ESXi says has a reported capacity of 14.43 TB, of which 9.79 TB is provisioned and 4.64 TB is free.
This datastore contains a single VM (used for making backups) with two virtual disks. Disk #1 is 16GB thick provisioned and disk #2 is 14TB thin provisioned. The VM reports under "resource consumption" that 9.56TB of the provisioned 14.02TB is used.
So plenty of space on the datastore, right? Wrong. Every day, within a pretty specific time frame when this VM makes new backups, the VM locks up completely with this error message/question:
"There is no more space for virtual disk 'xxxxx_1.vmdk'. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session."
If you click "Retry" the VM comes back up again for a while, and the question will reappear - sometimes immediately, sometimes later. If you do nothing or are not around to click anything, the VM will come back online and unfreeze itself after a few seconds or minutes as well. Once the backups are finished (and the VM is no longer writing anything to its virtual disk) the problem stops.
I have read other posts that suggest the problem might have to do with thin provisioning, and that the virtual disk commitment might in theory be too big for the datastore. But it still seems very buggy that I would be allowed to overcommit with a thin provisioned disk, and that this error is reported with an actual 4.64TB free on the datastore. Also, I don't think it's really overcommitted: the datastore should be 14.43TB, while the provisioned space of the VM is 14.02TB.
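For what it's worth, the back-of-envelope math (using the UI numbers above, whatever flavor of "TB" the ESXi UI means by them) agrees that it should not be overcommitted:

```shell
# Worst case: every thin disk fully inflates to its provisioned size.
# Numbers are the ones the ESXi UI reports for this datastore/VM.
capacity_tb="14.43"
provisioned_tb="14.02"
awk -v cap="$capacity_tb" -v prov="$provisioned_tb" 'BEGIN {
    headroom = cap - prov
    printf "headroom if all disks fully inflate: %.2f TB\n", headroom
    exit (headroom < 0)   # non-zero exit would mean genuine overcommit
}'
```

So even fully inflated, the provisioned disks leave about 0.41 TB of headroom (ignoring VMFS metadata and swap files, which are much smaller than that).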
This started happening immediately after the setup of this VM and ESXi host which had brand new disks. So I'm kind of doubting it's a hardware issue.
Since this VM is too large to move, my only option is to start over with a thick provisioned disk and see what happens, unless on the off chance somebody can solve this issue for us.
Also posting this in case it may help somebody else dealing with this issue in the future, because I have a suspicion this is actually a bug in ESXi itself.
It may be interesting to see whether the VM's vmware.log contains more details that help explain what's causing this issue.
André
Be very careful when answering that message box.
With thin provisioned or sesparse vmdks, either answer can end in a corrupted vmdk.
Please show the vmware.log and the vmkernel.log for more details.
Ulli
As requested, I attached the vmware.log for the VM and the vmkernel.log.
I concentrated on the period from May 11th ~ 20:00 UTC until about May 12th 02:30 UTC. This is the time frame in which the errors occur. After that it's mostly crickets in the logs because the VM and ESX host are pretty much idling.
Thanks, replied with the logs! They clearly show the VM is offline for about 4 minutes when nobody answers the question, and that the dialog times out after that.
I don't know if it shows much else. The VMkernel seems to think there really is no space, but "df -h" shows:
# df -h
Filesystem Size Used Available Use% Mounted on
VMFS-6 14.4T 9.9T 4.5T 69% /vmfs/volumes/datastore1-ssd-raid10
VMFS-6 1.8T 1.7T 96.4G 95% /vmfs/volumes/datastore3-nvme-bare
VMFS-6 1.8T 1.8T 37.6G 98% /vmfs/volumes/datastore2-nvme-bare
VMFS-L 119.8G 4.6G 115.2G 4% /vmfs/volumes/OSDATA-606e0d61-c556fc30-c3a2-ecf4bbf16b84
vfat 4.0G 201.6M 3.8G 5% /vmfs/volumes/BOOTBANK1
vfat 4.0G 199.2M 3.8G 5% /vmfs/volumes/BOOTBANK2
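One thing worth noting is that this `df` was taken after the fact. Something I may try: sampling free space during the backup window itself, in case it transiently dips to zero while the thin disk grows. A rough sketch (the path is this datastore; sample count and interval are toy values for illustration - for the real window you'd want something like SAMPLES=480 INTERVAL=60):

```shell
# Log the datastore's available space once per interval; a transient dip
# to zero during the backups would point at real (if short-lived) exhaustion.
DATASTORE="${DATASTORE:-/vmfs/volumes/datastore1-ssd-raid10}"
SAMPLES="${SAMPLES:-3}"
INTERVAL="${INTERVAL:-1}"
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    # $4 is the "Available" column of `df -h <mountpoint>`
    free=$(df -h "$DATASTORE" 2>/dev/null | awk 'NR==2 {print $4}')
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') free=${free:-unknown}"
    i=$((i+1))
    sleep "$INTERVAL"
done
```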
Hi,
Have you tried the command from VMware KB 1007638 to check whether you are running out of free inodes there?
" stat -f /vmfs/volumes/datastore1-ssd-raid10 "
You should see output similar to:
File: "/"
ID: 0 Namelen: 255 Type: ext2/ext3
Blocks: Total: 1259079 Free: 898253 Available: 834295 Size: 4096
Inodes: Total: 640000 Free: 580065
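If you want to script that check, a rough helper along these lines could work (the 5% threshold is just a suggestion, and the output field positions assume the `stat -f` layout shown above):

```shell
# Pull the inode totals out of `stat -f` and warn when few inodes are free.
check_inodes() {
    stat -f "$1" | awk -v path="$1" '/Inodes/ {
        total = $3; free = $5            # "Inodes: Total: N Free: M"
        pct = (total > 0) ? free * 100 / total : 0
        printf "%s: %.0f of %.0f inodes free (%.1f%%)\n", path, free, total, pct
        if (pct < 5) print "WARNING: inode exhaustion likely"
    }'
}
check_inodes /
```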
Your log files show that the datastore is running out of space!
Please show a file listing of the affected VM's directory so that we can think about your options.
Ulli
Good tip, didn't even think of it! Here is the output:
# stat -f /
File: "/"
ID: 100000000 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 1127155 Free: 917028 Available: 917028
Inodes: Total: 655360 Free: 647438
# stat -f /vmfs/volumes/datastore1-ssd-raid10/
File: "/vmfs/volumes/datastore1-ssd-raid10/"
ID: ff5b3d5beecf7490 Namelen: 127 Type: vmfs
Block size: 1048576
Blocks: Total: 15128320 Free: 4754339 Available: 4754339
Inodes: Total: 2147483647 Free: 2147483647
Not quite sure why it says that every single inode is free on the datastore, but yeah at least they are not exhausted.
If a VMFS 6 volume runs out of free inodes, it will expand the metafiles - in particular the file .sbc.sf.
It does this without any mercy, which results in a fragmented .sbc.sf - and if it happens too often, sooner or later the datastore browser can no longer keep up and will fail to enumerate the files in a directory.
Try
vmkfstools -P -v10 /vmfs/volumes/datastore-name
to see the current state.
There you go! I replaced the VM name with 'xxxxx' in the output because it contains an FQDN identifying the company.
# ls -la /vmfs/volumes/datastore1-ssd-raid10/
total 2216064
drwxr-xr-t 1 root root 73728 Nov 5 2021 .
drwxr-xr-x 1 root root 512 May 12 14:55 ..
-r-------- 1 root root 257261568 Apr 7 2021 .fbb.sf
-r-------- 1 root root 134807552 Apr 7 2021 .fdc.sf
-r-------- 1 root root 268632064 Apr 7 2021 .jbc.sf
-r-------- 1 root root 16908288 Apr 7 2021 .pb2.sf
-r-------- 1 root root 65536 Apr 7 2021 .pbc.sf
-r-------- 1 root root 1577910272 Apr 7 2021 .sbc.sf
drwx------ 1 root root 69632 Apr 7 2021 .sdd.sf
-r-------- 1 root root 7340032 Apr 7 2021 .vh.sf
drwxr-xr-x 1 root root 77824 Jan 11 23:36 xxxxx
# ls -la /vmfs/volumes/datastore1-ssd-raid10/xxxxx/
total 10620741952
drwxr-xr-x 1 root root 77824 Jan 11 23:36 .
drwxr-xr-t 1 root root 73728 Nov 5 2021 ..
-rw------- 1 root root 0 Dec 20 22:37 xxxxx-0fe6471e.vswp
-rw------- 1 root root 17179869184 May 12 14:55 xxxxx-flat.vmdk
-rw------- 1 root root 8684 May 11 02:34 xxxxx.nvram
-rw------- 1 root root 535 Jan 11 23:36 xxxxx.vmdk
-rw-r--r-- 1 root root 0 Apr 12 2021 xxxxx.vmsd
-rwxr-xr-x 1 root root 4556 Jan 31 08:42 xxxxx.vmx
-rw------- 1 root root 0 Dec 20 22:37 xxxxx.vmx.lck
-rw------- 1 root root 150 May 11 01:46 xxxxx.vmxf
-rwxr-xr-x 1 root root 4556 Jan 31 08:42 xxxxx.vmx~
-rw------- 1 root root 15393162788864 May 12 14:30 xxxxx_1-flat.vmdk
-rw------- 1 root root 543 Jan 11 23:36 xxxxx_1.vmdk
-rw-r--r-- 1 root root 186929 Nov 5 2021 vmware-34.log
-rw-r--r-- 1 root root 186219 Nov 5 2021 vmware-35.log
-rw-r--r-- 1 root root 187332 Nov 5 2021 vmware-36.log
-rw-r--r-- 1 root root 187081 Nov 5 2021 vmware-37.log
-rw-r--r-- 1 root root 238229 Nov 5 2021 vmware-38.log
-rw-r--r-- 1 root root 489908 Dec 20 22:15 vmware-39.log
-rw-r--r-- 1 root root 1972760 May 12 14:25 vmware.log
-rw------- 1 root root 121634816 Dec 20 22:37 vmx-xxxxx-635e6b16f8f79f51adce7b899475c1ca2d52ba0e-1.vswp
# ls -lah /vmfs/volumes/datastore1-ssd-raid10/xxxxx/
total 10620741952
drwxr-xr-x 1 root root 76.0K Jan 11 23:36 .
drwxr-xr-t 1 root root 72.0K Nov 5 2021 ..
-rw------- 1 root root 0 Dec 20 22:37 xxxxx-0fe6471e.vswp
-rw------- 1 root root 16.0G May 12 14:55 xxxxx-flat.vmdk
-rw------- 1 root root 8.5K May 11 02:34 xxxxx.nvram
-rw------- 1 root root 535 Jan 11 23:36 xxxxx.vmdk
-rw-r--r-- 1 root root 0 Apr 12 2021 xxxxx.vmsd
-rwxr-xr-x 1 root root 4.4K Jan 31 08:42 xxxxx.vmx
-rw------- 1 root root 0 Dec 20 22:37 xxxxx.vmx.lck
-rw------- 1 root root 150 May 11 01:46 xxxxx.vmxf
-rwxr-xr-x 1 root root 4.4K Jan 31 08:42 xxxxx.vmx~
-rw------- 1 root root 14.0T May 12 14:30 xxxxx_1-flat.vmdk
-rw------- 1 root root 543 Jan 11 23:36 xxxxx_1.vmdk
-rw-r--r-- 1 root root 182.5K Nov 5 2021 vmware-34.log
-rw-r--r-- 1 root root 181.9K Nov 5 2021 vmware-35.log
-rw-r--r-- 1 root root 182.9K Nov 5 2021 vmware-36.log
-rw-r--r-- 1 root root 182.7K Nov 5 2021 vmware-37.log
-rw-r--r-- 1 root root 232.6K Nov 5 2021 vmware-38.log
-rw-r--r-- 1 root root 478.4K Dec 20 22:15 vmware-39.log
-rw-r--r-- 1 root root 1.9M May 12 14:25 vmware.log
-rw------- 1 root root 116.0M Dec 20 22:37 vmx-xxxxx-635e6b16f8f79f51adce7b899475c1ca2d52ba0e-1.vswp
# df -h
Filesystem Size Used Available Use% Mounted on
VMFS-6 14.4T 9.9T 4.5T 69% /vmfs/volumes/datastore1-ssd-raid10
VMFS-6 1.8T 1.7T 96.4G 95% /vmfs/volumes/datastore3-nvme-bare
VMFS-6 1.8T 1.8T 37.6G 98% /vmfs/volumes/datastore2-nvme-bare
VMFS-L 119.8G 4.6G 115.2G 4% /vmfs/volumes/OSDATA-606e0d61-c556fc30-c3a2-ecf4bbf16b84
vfat 4.0G 201.6M 3.8G 5% /vmfs/volumes/BOOTBANK1
vfat 4.0G 199.2M 3.8G 5% /vmfs/volumes/BOOTBANK2
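Side note on reading those listings: for a thin disk, the size `ls -l` shows for the -flat.vmdk (14.0T here) is the provisioned size; the actual allocation only shows up via `du` or the datastore's used space. Same idea as an ordinary sparse file - a quick demo on a throwaway temp file, not the real datastore:

```shell
# Create a sparse file with a 1 GiB apparent size but ~0 allocated blocks.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1 count=0 seek=1G 2>/dev/null
ls -l "$f" | awk '{print "apparent size: " $5 " bytes"}'
du -k "$f"  | awk '{print "allocated:     " $1 " KiB"}'
rm -f "$f"
```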
Output from vmkfstools command. Unsure how to interpret this at first glance:
# vmkfstools -P -v10 /vmfs/volumes/datastore1-ssd-raid10/
VMFS-6.82 (Raw Major Version: 24) file system spanning 1 partitions.
File system label (if any): datastore1-ssd-raid10
Mode: public
Capacity 15863193272320 (15128320 file blocks * 1048576), 4985285771264 (4754339 blocks) avail, max supported file size 70368744177664
Volume Creation Time: Wed Apr 7 19:52:01 2021
Files (max/free): 16384/16355
Ptr Blocks (max/free): 0/0
Sub Blocks (max/free): 24064/22264
Secondary Ptr Blocks (max/free): 256/255
File Blocks (overcommit/used/overcommit %): 0/10373981/0
Ptr Blocks (overcommit/used/overcommit %): 0/0/0
Sub Blocks (overcommit/used/overcommit %): 0/1800/0
Large File Blocks (total/used/file block clusters): 29548/0/29548
Volume Metadata size: 2262925312
Disk Block Size: 512/512/0
UUID: 606e0d61-d4dfee10-c4c5-ecf4bbf16b84
Logical device: 606e0d61-cfbf898e-4eff-ecf4bbf16b84
Partitions spanned (on "lvm"):
naa.6b8ca3a0f9e39a002800c59e22cf946f:8
Unable to connect to vaai-nasd socket [No such file or directory]
Is Native Snapshot Capable: NO
OBJLIB-LIB: ObjLib cleanup done.
WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0
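One sanity check on those numbers (using the 1048576-byte VMFS-6 file block size shown in the output above) - they line up with what `df` reports:

```shell
# Convert the vmkfstools block counts to TiB.
awk 'BEGIN {
    bs = 1048576                             # VMFS-6 file block size in bytes
    printf "capacity: %.2f TiB\n", 15128320 * bs / 1024^4
    printf "free:     %.2f TiB\n", 4754339 * bs / 1024^4
}'
```

So ~14.43 TiB capacity and ~4.53 TiB free, and the inode/sub-block/pointer-block counters all look far from exhausted.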
Do you really need the full 14 TB for xxxxx_1-flat.vmdk?
If reducing the partition size is an option and you could manage to free up some space at the end of the disk, we could consider taking a scissor and cutting the vmdk off at the end.
Thanks for the help by the way, appreciate it!
You are correct that we don't need the full 14TB, and at some earlier point I wanted to shrink the disk to 13TB to maybe solve this issue, in case it was due to overcommitment of virtual disk space. I managed to shrink the ext4 filesystem in the VM to 13TB (see output below), but then it came down to using a hack (editing the vmdk by hand) to shrink the disk, and I chickened out. I can't risk breaking this VM if I don't have to, because we regularly use the data on it, and I don't really know what I'm doing when editing the vmdk.
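For context on what that hand-edit would involve (shown only to make the discussion concrete - do not actually do this to a disk you care about): the vmdk descriptor stores the disk size as a sector count on its extent line, e.g. `RW 30064771072 VMFS "xxxxx_1-flat.vmdk"` for the current 14 TiB. Shrinking to just cover the 13 TiB partition would mean recalculating that count, and since the disk is GPT, leaving room for the backup GPT at the end (34 sectors is the usual reserved size - an assumption here):

```shell
# Minimum sector count for a shrunken disk that still covers /dev/sdb1
# and a relocated backup GPT. Arithmetic only - nothing is modified.
awk 'BEGIN {
    last_used  = 27917289471   # last sector of /dev/sdb1 (fdisk output)
    gpt_backup = 34            # sectors typically reserved for the backup GPT
    printf "minimum new sector count: %.0f\n", last_used + 1 + gpt_backup
}'
```

Even with the right number, the backup GPT would still have to be rewritten (e.g. with gdisk) after truncating the flat file, which is exactly the unsupported, easy-to-get-wrong part.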
By the way, I'm already putting together a plan to stop using this VM, delete it, and start over. This post was a last effort to maybe save myself that trouble and, of course, to understand what the issue is.
Output fdisk within VM:
Disk /dev/sdb: 14 TiB, 15393162788864 bytes, 30064771072 sectors
Disk model: Virtual disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DDF7CC2F-8DD8-4911-8602-58BAD05A5389
Device Start End Sectors Size Type
/dev/sdb1 2048 27917289471 27917287424 13T Linux filesystem
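The sector count from that fdisk output confirms the new partition size:

```shell
# /dev/sdb1 spans 27917287424 sectors of 512 bytes.
awk 'BEGIN {
    sectors = 27917287424
    bytes = sectors * 512
    printf "partition: %.0f bytes = %.0f GiB = %.2f TiB\n",
           bytes, bytes / 1024^3, bytes / 1024^4
}'
```

So exactly 13 TiB (13312 GiB), down from the disk's 14 TiB.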
Don't use fdisk for GPT disks - use gdisk.
Can you run gparted inside the VM to resize the partition?
Currently the partition has a size of 13312 GB.
Let me add that, at the very least, I will NOT use thin provisioning ever again; I assume the issue is related to that. I also regret making a 14TB VM like this, because its size means I can't migrate it anywhere, which limits my options. I might as well have made this system bare metal (skipping ESX) or used smaller virtual disks and multiple VMs. But that is a side note and doesn't help the issue! 🙂
Yes, the partition and ext4 filesystem are already 13TB. But the VMDK remains at 14TB, and shrinking a virtual disk is, as far as I know, not officially supported in ESXi.
I can cut the 14 TB vmdk - but I would like to see the disk in gparted first. According to the fdisk output the partition now has a size of 13312 GB.
I cut vmdks with dd, and that is quite destructive - which is why I want to verify the layout in gparted before touching anything.
Use a live CD if your VM has no X.
I strongly believe that during the backup jobs you are running out of available inodes. Follow this KB step by step; it may fix your issue: https://kb.vmware.com/s/article/1007638
Try to find out whether there have recently been more small files than usual, such as files ending in .txt or .dmp, as well as other files on this datastore. If they are present in large numbers, try to clean up or move old files, including any unregistered VM files, and then check the available inodes again.
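A quick way to run that hunt (hypothetical helper; adjust the path and the size threshold to taste):

```shell
# Count files under a size threshold in a directory tree.
count_small_files() {    # usage: count_small_files <dir> <find -size arg, e.g. 1024k>
    find "$1" -type f -size "-$2" 2>/dev/null | wc -l
}
# Demo on a throwaway directory standing in for the datastore path:
d=$(mktemp -d)
printf 'x' > "$d/a.txt"
printf 'x' > "$d/b.dmp"
count_small_files "$d" 1024k
rm -rf "$d"
```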
Thanks for the offer, but I can't risk it - I would have a hard time justifying the move if it somehow went wrong. I'd rather just set up a new backup system and rotate this one out until I can delete it. Also, I guess it wouldn't really resolve the underlying issue, namely that this is perhaps actually a bug in ESXi?!