VMware Cloud Community
71rellimcm39
Contributor

Can't consolidate disks for VM that has disks spread across multiple datastores that are too small

So I have an issue that is spanning different things.

Working with vCenter 6.5 with ESXi 6.5 hosts.

I have a Windows Server 2008 VM that is around 2.8 TB, however the datastores it sits on are only allocated 1.9 TB of space each. It has 5 disks, 4 of them on one datastore and another disk on another datastore. There are a total of 25 to 30 datastores that vCenter can see. Each datastore has around 1.9 TB of space allocated.

Before I arrived, folks were creating nested snapshots and not deleting them.  I've cleaned up all of the snapshots, however it's still having issues and the disks need to be consolidated, which I am not able to do.

I've tried to migrate VMs from one datastore to another, and I'm not able to.  I thought this was an option, but it's not working.


There is a datastore with no VMs or data on it.  Am I able to delete that datastore without causing any issues, and then reallocate the space to the datastore where the Windows 2008 VM lives, so I can consolidate the disks?

Will there be an issue with having datastores over 1.9 TB?  I'm still not sure what the end storage is.  I can't tell if it is a SAN or local disks in the servers where the ESXi hosts live.

Tags (2)
0 Kudos
21 Replies
vFouad
Leadership

ESXi and vCenter will not have an issue with a datastore over 2TB... that was last an issue in the 5.x days...

Your array on the other hand may vary... or may need a firmware upgrade... depends on what it is and how old etc...

As far as the disk consolidation is concerned..

Can you try consolidate with the VM powered off?

This process will be faster and you will not need additional space.

If the VM is on, essentially what happens is you create and run on a new snapshot while you clean up the old ones... then you take and run on another new snapshot and clean up the one you took at the start of consolidation... and so on until the VM can be stunned and the final snapshot consolidated within a VM stun...

Depending on how much data change there is and how fast the array is... this may never happen...

The other option is to clone the disks one by one from CLI... see the article: VMware Knowledge Base
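For reference, the CLI clone route usually goes through vmkfstools -i, which writes a consolidated copy of a disk to a new location. A minimal sketch follows; it only prints the commands for review instead of running them, and the datastore paths and disk names are hypothetical:

```shell
SRC_DIR="/vmfs/volumes/datastore1/winvm"   # hypothetical source folder
DST_DIR="/vmfs/volumes/datastore2/winvm"   # hypothetical destination with enough free space

clone_cmd() {
    # vmkfstools -i clones a disk; pointing it at the top-most snapshot
    # descriptor produces a copy with the chain collapsed into one disk
    echo "vmkfstools -i ${SRC_DIR}/$1 -d thin ${DST_DIR}/$1"
}

for disk in winvm.vmdk winvm_1.vmdk; do    # hypothetical disk names
    clone_cmd "$disk"
done
```

Once the copies are verified, the VM can be repointed at the cloned disks and the old folder cleaned up.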

There may also be some orphaned/stale snapshots... if you need help support will happily help you investigate the snapshot chain... also see this different article: VMware Knowledge Base

Thanks,

Fouad

71rellimcm39
Contributor

Yes, I have tried to power off this VM and consolidate the disks, and it fails.

I've tried to vMotion to a different ESXi host and try to consolidate disks and it continues to fail.

I'm new to the ESXi CLI, however I've worked as a Red Hat Linux Admin, so I'm no stranger to getting around on the CLI.  I've noticed a few of the ESXi hosts have a .lck file tied to this VM, which means a locked file.  I want to be careful as I don't want to do any more damage; we aren't able to back up all of our VMs currently either, and the last good backups were made quite some time ago.

And no, I haven't tried to create a new snapshot for this VM and then delete the snapshot right away, or tried to consolidate the disks that way either, as everything I try continues to fail.  I didn't know the top command for ESXi, as I wanted to watch the processes from the CLI while I tried different things.  I found an ESXCLI cheat sheet and will follow it today as I continue to troubleshoot.

I will report back.

thanks

vFouad
Leadership

If you are thinking it is a lock on the file from your backup process, the vmfsfilelockinfo tool is your friend:

VMware Knowledge Base

Good luck and let us know when you succeed!

Thanks,

Fouad

71rellimcm39
Contributor

Does it matter that 4 of the disks for this VM live on one data store and 1 other disk lives on another data store?

vFouad
Leadership

That shouldn't make any difference.

The consolidation is on a per VMDK basis...

Kind regards,

Fouad

71rellimcm39
Contributor

Using SSH, I remoted into the ESXi host where that VM lives, and I noticed there are a ton of the following files:

ctk.vmdk

delta.vmdk

flat.vmdk

I found this thread on the VMware Community Forums.

Cannot consolidate disk

I'm not sure what a descriptor file is off the top of my head.

It also stated to use the vmkfstools command against the .vmdk files.  I'm not even sure what I'm looking for to get these disks to consolidate.

vFouad
Leadership

So virtual disks typically consist of 2 parts...

The data part VM-name-flat_1.vmdk or VM-name-delta-00001_1.vmdk

and a descriptor VM-name_1.vmdk or VM-name-00001_1.vmdk

The descriptor file is a plain text file that tells you about it

so it will have things like the data file it points to, the CID and parentCID (content ID and parent content ID), a hint to its previous file in the chain....
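A descriptor is small enough to read directly over SSH. Here's a sketch that pulls those chain fields out of a sample descriptor; the file content below is entirely made up for illustration:

```shell
# Write a made-up snapshot descriptor, shaped like a VM-name-000001.vmdk
cat > /tmp/sample-descriptor.vmdk <<'EOF'
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=fffffffd
parentFileNameHint="vmfoo.vmdk"
# Extent description
RW 2147483648 VMFSSPARSE "vmfoo-000001-delta.vmdk"
EOF

# Pull out the fields that tie the snapshot chain together
cid=$(sed -n 's/^CID=//p' /tmp/sample-descriptor.vmdk)
parent=$(sed -n 's/^parentCID=//p' /tmp/sample-descriptor.vmdk)
data=$(sed -n 's/.*"\(.*-delta\.vmdk\)".*/\1/p' /tmp/sample-descriptor.vmdk)
echo "CID=$cid parentCID=$parent data=$data"
```

On a real datastore you'd cat the small VM-name-0000NN.vmdk files and follow parentFileNameHint/parentCID from the newest snapshot back to the base disk.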

with vmkfstools -p you are basically looking to see if any of the files are locked, i.e. something else is using the file, thus blocking the consolidation :)

71rellimcm39
Contributor

Ok, now that I've really dug into this, I don't think it is a locked file.  It's a lack-of-space issue on the datastore.

I tried to run consolidate disks from vSphere and watch the process from the ESXi hosts with the following command:

tail -f /var/log/hostd.log

When I look at all of the .vmdk and -delta.vmdk files, whose sizes I believe are in KB:

vmfoo.vmdk - 732

vmfoo1.delta.vmdk - 9967767552

vmfoo2.delta.vmdk - 35655680

vmfoo3.delta.vmdk - 18878464

vmfoo4.delta.vmdk - 18878464

vmfoo5.delta.vmdk - 670000222208

vmfoo6.delta.vmdk - 18878464

vmfoo7.delta.vmdk - 2101248

vmfoo8.delta.vmdk - 18878648

vmfoo9.delta.vmdk - 2101248

vmfoo10.delta.vmdk - 2101248

vmfoo11.delta.vmdk - 2101248

vmfoo12.delta.vmdk - 2101248

vmfoo13.delta.vmdk - 2101248

vmfoo14.delta.vmdk - 2101248

vmfoo15.delta.vmdk - 2101248

vmfoo16.delta.vmdk - 2101248

The datastore only has 2.8 TB allocated, with 1.2 TB free.  I have two other datastores with 1.9 TB free each, however I don't know if it would be worth adding that free space into this datastore, or whether it would make a difference if I were to try and consolidate the disks.  Won't it need 2x more space in order to complete the consolidation successfully?
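For scale, the deltas above can simply be summed (assuming the listed sizes are bytes; if they were KB, the 670000222208 entry alone would be several hundred TB, which can't fit anywhere here):

```shell
# Sum the delta file sizes from the listing above (sizes assumed to be bytes)
total=$(awk '{sum += $2} END {printf "%.0f\n", sum}' <<'EOF'
vmfoo1.delta.vmdk 9967767552
vmfoo2.delta.vmdk 35655680
vmfoo3.delta.vmdk 18878464
vmfoo4.delta.vmdk 18878464
vmfoo5.delta.vmdk 670000222208
vmfoo6.delta.vmdk 18878464
vmfoo7.delta.vmdk 2101248
vmfoo8.delta.vmdk 18878648
vmfoo9.delta.vmdk 2101248
vmfoo10.delta.vmdk 2101248
vmfoo11.delta.vmdk 2101248
vmfoo12.delta.vmdk 2101248
vmfoo13.delta.vmdk 2101248
vmfoo14.delta.vmdk 2101248
vmfoo15.delta.vmdk 2101248
vmfoo16.delta.vmdk 2101248
EOF
)
echo "total delta bytes: $total"
```

That comes to roughly 0.68 TB of delta data in total; doubling it as a worst case would be about 1.36 TB, slightly more than the 1.2 TB currently free.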

EDIT:  Also, all of the .vmdk and -delta.vmdk files have a lock entry at the top of the file, however from the info listed, I'm not sure what to make of it.  From some of the examples I've seen online, I don't see a reference to a MAC address for another ESXi host.

https://www.vmadmin.co.uk/resources/35-esxserver/411-disk-consolidation-needed-unable-to-access-file...     

thanks

vFouad
Leadership

so can the VM be powered off for a period of time?

if yes... then consolidation should not be a space problem...

If no, then you will need to make sure there is some free space... x2 is the maximum, assuming every block on the disk will change during the time of the snapshot consolidation... So you need to think about how much data will change on the VM... if you can't take an outage on the VM, can you limit the data change or the running services?

Most of those files are pretty small so they should be quick... the 670000222208 file is the biggest one... and that may take some time depending on other datastore I/O

Given that the 16th disk is very small, I'm wondering about your snapshot chain and its order...

If you want to post all of the descriptor vmdks (the small files) we can check them and see if they are all in the snapshot chain...
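If the descriptors get posted, the chain check itself is mechanical: each snapshot's parentCID must match its parent's CID. A sketch with made-up file names and IDs (ffffffff marking the base disk's parentCID, as in real descriptors):

```shell
# Order made-up (file, CID, parentCID) triples into a chain
chain() {
    awk '
    { cid[$1] = $2; parent[$1] = $3 }
    END {
        # the base disk is the one whose parentCID is ffffffff
        for (f in parent) if (parent[f] == "ffffffff") base = f
        printf "%s", base
        cur = cid[base]; found = 1
        while (found) {
            found = 0
            # find the file whose parentCID equals the current CID
            for (f in cid)
                if (parent[f] == cur) { printf " -> %s", f; cur = cid[f]; found = 1; break }
        }
        print ""
    }'
}

chain <<'EOF'
vm.vmdk aaaa0001 ffffffff
vm-000001.vmdk aaaa0002 aaaa0001
vm-000002.vmdk aaaa0003 aaaa0002
EOF
```

Any descriptor whose parentCID matches nothing in the set is a leftover that consolidation will never touch.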

Or if you have a VMware SR, message me the SR number and I'll give you a call and we can check the information over zoom or something....

The 18878464 files are just broken snapshots... no data... 18MB, but they may be in the chain... consolidate should clean them away...

if the consolidation starts you can monitor the task with vim-cmd vimsvc/task_list and then looking at the task...

VMware Knowledge Base

and you should be able to monitor progress with:

  • Log in as root to the ESX/ESXi host using SSH.
  • Run the esxtop command.

    Note: This command works only if the virtual machine is powered on. 

  • Press V to see only running virtual machines.

    Note: This is not the same as using the v option.

  • Find the virtual machine running the consolidation.
  • Type e to expand.
  • Enter the Group World ID (value from GID column).
  • Press Enter.
  • Make a note of the World ID (ID column) of the snapshot consolidation process:

    • The process is called something like vmx-SnapshotVMX.

  • Type u to display the disk device statistics.
  • Type e to expand and enter the device where the snapshot consolidation process is writing to.

    For example:

    naa.xxx value

    Notes:
    • For a regular vmdk file, the device is the datastore on which the flat file is located.
    • For a flat vmdk, identifying the datastore device ID can be done by running esxcfg-scsidevs -m.
    • For RDM, the vmkfstools -q against the pointer file reveals the vml ID, which needs to be correlated with the output of ls -l /vmfs/devices/disks/ to get the device ID.

  • Identify the Group World ID you noted earlier (GID column).

    Note: You may need to sort by MBREAD/s ( press R) or MBWRTN/s (press T) to see the process at the top of the screen.

  • Review the number IOPS and throughput for the Consolidation process (WRITES/s and MBWRTN/s columns) to ensure that there is activity and the process is working.

The KB with the above info appears to be broken; I'll see if I can fix it sometime tomorrow...

How to monitor snapshot deletion using esxtop command (2146232)

Kind regards,

Fouad

71rellimcm39
Contributor

The VM is currently powered off as it has trouble operating correctly when powered on.

However, even if this VM is powered off and I try to consolidate the disks, it still continues to fail.  When I watched the logs from the ESXi host, they stated that there wasn't enough space.

At this time, there are no snapshots tied to this VM, as we deleted all of them, and then we started to have issues with disk consolidation.  I tried to consolidate the disks and it won't do it.  I don't get a specific numeric error number, just that there is not enough space.

vFouad
Leadership

Sounds like you need a little more help, do you have a VMware SR? can you open one?

I'll happily jump on a zoom session with you, if you direct message me your SR number.

Kind regards,

Fouad

a_p_
Leadership

I'd suggest that you either follow vFouad​'s offer for a live session, or provide more information.

At this time, there are no snapshots tied to this VM, as we deleted all of them and then we started to have issues with consolidate disks

You may not see the snapshots anymore in the Snapshot Manager, but if a Consolidation message shows up, there are snapshots involved.

As a first step, please run ls -lisa in the VM's folders on both involved datastores, and post the command's output along with the free disk space on each of these two datastores.


André

71rellimcm39
Contributor

So, I was hoping to solve this issue with the info gleaned from this thread, however we aren't there yet.

So I need to say this: this VM is in an air-gapped environment, and I won't be able to show the output of some of the commands, so I'll type them out here as best I can.

On host esxi2, here is the output for

ls -lisa

/vmfs/volumes/565ca4f3-d83367b3-44e8-8cdcd414d178/foo

88 files total

On host esxi4, here is the output for

ls -lisa

/vmfs/volumes/565e5dd5-c7fcea28-f6c2-8cdcd414d178/foo

9 files total

This is the output of df -h for host esxi2

/vmfs/volumes/565ca4f3-d83367b3-44e8-8cdcd414d178/foo

Filesystem  Size  Used  Avail  Use%

VMFS-00      1.9T  1.7T  2397G  88%

/vmfs/volumes/VMFS00

However, for this datastore vSphere shows the capacity as 2.88 TB with 1.21 TB free, which is not reflected in this command.

For host esxi4, this is the output with df -h

Filesystem   Size    Used     Avail       Use%

VMFS-01       2.9T    1.4T      1.5T       47%

/vmfs/volumes/VMFS01

If I look at this data store in vSphere, this is accurate. 

This VM used to live on the host esxi2, however I vMotioned it to host esxi4 to see if that would fix it. 

And again, this VM has been off during this process. 

a_p_
Leadership

This only helps a bit; from the number of files in the two folders, one can assume that the VM has multiple snapshots.

The point why I was asking for the file listing was to find out the size of each delta file, and the disk space consumption of each file on disk. Is it possible for you to post the file listing with renamed file names? In this case, only rename the VM's name, but not any suffixes, e.g. with a file like "VMName_1-000001-delta.vmdk" only rename the "VMName" part.

André

vFouad
Leadership

So we can verify whether the snapshot chain is gone:

If you go to the VM via vCenter and edit its settings,

then look to see if the option to grow the disks is available or if the disks are greyed out.

If you have the option to grow the disks then there is no snapshot chain,

and you can safely say that the snapshots are orphaned...

If the disks cannot be changed... then more information is needed.

It may be that only some disks are impacted.... it may be all disks...

Please let us know how you want to continue, but the absence of information is making this very very difficult.

71rellimcm39
Contributor

Here are all of the delta.vmdk files.  I had to type these by hand, as again, this is coming from an air-gapped environment.

-rw------- 1 root root 2101248 Apr 1 01:12 vm2_3_0000014-delta.vmdk

-rw------- 1 root root 2101248 Apr 1 01:35 vm2_3_0000015-delta.vmdk

-rw------- 1 root root 2101248 Apr 1 06:23 vm2_3_0000016-delta.vmdk

-rw------- 1 root root 2101248 Apr 1 06:47 vm2_3_0000017-delta.vmdk

-rw------- 1 root root 2101248 Mar 18 00:41 vm2_3_000008-delta.vmdk

-rw------- 1 root root 2101248 Mar 18 05:31 vm2_3_000009-delta.vmdk

-rw------- 1 root root 2101248 Mar 31 14:48 vm2_3_0000011-delta.vmdk

-rw------- 1 root root 2101248 Mar 31 15:37 vm2_3_0000012-delta.vmdk

-rw------- 1 root root 2101248 Mar 31 20:25 vm2_3_0000013-delta.vmdk

-rw------- 1 root root 18878464 Apr 7 18:15 vm2_3_0000018-delta.vmdk

-rw------- 1 root root 18878464 Apr 8 2019 vm2_3_000003-delta.vmdk

-rw------- 1 root root 18878464 Aug 9 2019 vm2_3_000004-delta.vmdk

-rw------- 1 root root 18878464 Mar 18 00:17 vm2_3_000007-delta.vmdk

-rw------- 1 root root 18878464 Mar 31 14:48 vm2_3_0000010-delta.vmdk

-rw------- 1 root root 35655860 Aug 8 2019 vm2_3_0000002-delta.vmdk

-rw------- 1 root root 52432896 Aug 13 2019 vm2_3_0000005-delta.vmdk

-rw------- 1 root root 670000222208 Mar 17 17:07 vm2_3_0000006-delta.vmdk

-rw------- 1 root root 9967767552 Mar 17 19:29 vm2_3_000001-delta.vmdk

I've also increased the datastore to 4.3 TB and it's still probably not enough space to consolidate the disks, as it failed.  I'm thinking it may need 8 TB in order to successfully consolidate the disks.

71rellimcm39
Contributor

Again, this is a Windows 2008 Server, here is the current disk breakdown:

HD1 64 GB allocated, thin provision - not greyed out, able to add space

HD2 200 GB allocated, thin provision - not greyed out, able to add space

HD3 300 GB allocated, thin provision - not greyed out, able to add space

HD4 1 TB allocated, thin provision - greyed out, not able to add space

HD5 1 TB allocated, thin provision - greyed out, not able to add space

a_p_
Leadership

Ok, I see. It would certainly be a pain to type all files by hand. So let's see whether we can reduce this.

Please run ls -lisa *-flat.vmdk on both datastores, so that you have the details of all 5 virtual base disks.

Then check whether the flat files are thin or thick provisioned by comparing the provisioned size (in bytes) with the used disk space (the second column, in kB). If the used disk space in kB matches the provisioned size (you need to divide the displayed size by 1024), then the disks are thick provisioned, and - if all virtual disks are thick provisioned - you will need no additional free disk space if you delete the snapshots from the Snapshot Manager using the "Delete All" option while the VM is powered off.

If the used disk space is less than the provisioned size, then the flat.vmdk file may grow up to its provisioned size, which means that you will need temporary disk space. This can be anything between zero and the difference between the provisioned size and the used disk space.

If all of the VM's virtual disks are thick provisioned, and you don't see a snapshot in the Snapshot Manager, you may simply create one just to enable the "Delete All" option.
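That comparison can be scripted against a single ls -lisa line. The sample line and its numbers below are made up, and the column layout (inode first, used kB second, byte size seventh) is an assumption based on the listing format discussed here:

```shell
# One made-up 'ls -lisa' line for a 2 GiB disk with 512 MiB actually used
line='1234567 524288 -rw------- 1 root root 2147483648 Apr 10 12:00 vm-example-flat.vmdk'

used_kb=$(echo "$line" | awk '{print $2}')      # column 2: used space in kB
size_bytes=$(echo "$line" | awk '{print $7}')   # column 7: provisioned size in bytes

if [ $((used_kb * 1024)) -eq "$size_bytes" ]; then
    echo "thick provisioned (fully allocated)"
else
    # worst-case growth during consolidation for a thin disk
    echo "thin: may grow by up to $((size_bytes - used_kb * 1024)) bytes"
fi
```

Run against each of the 5 flat files, the "may grow by" figures added together give the worst-case temporary space needed.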


André

71rellimcm39
Contributor

Thanks for the response.  Here is the output:

from esxi host2


ls -lisa *-flat.vmdk

(provisioned size)     (used disk size)

432046212 785289520 -rw------- 1 root root 1099511027776 Mar 30 19:19 vmfoo_2_flat.vmdk (used space is greater than provisioned space)

from esxi host 4


ls -lisa *-flat.vmdk

16810116 51030016 -rw------- 1 root root 68719476736 Apr 7 21:56 vmfoo_flat.vmdk (used space is greater than provisioned space)

25198724 209693696 -rw------- 1 root root 214748364800 Apr 7 19:52 vmfoo_1_flat.vmdk (used space is greater than provisioned space)

33587332 257932288 -rw------- 1 root root 32212254700 Apr 7 19:55 vmfoo_2_flat.vmdk (used space is greater than provisioned space)

41975940 506316800 -rw------- 1 root root 1099511627776 Aug 27 2019 vmfoo_3_flat.vmdk (used space is greater than provisioned space)

121667716 32768 -rw------- 1 root root 1073741824 Dec 4 2015 vmfoo_4_flat.vmdk (provisioned space is greater than used space)

To me, if I'm reading this correctly, the provisioned size is the number on the left and the used disk space is the number to the right.  For all of the flat.vmdk files the used disk space is greater than the provisioned size, with the exception of the last disk, vmfoo_4_flat.vmdk.

Also, comparing the provisioned size and used disk size should prove that all disks are thin provisioned, not thick.

So if I'm understanding correctly, I subtract the used disk space from the provisioned space, and this will tell me how much I need to grow the datastore in order to consolidate the disks, correct?
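One caution before doing that arithmetic: in typical ls -lisa output the left-hand number is the inode, the second number is the used space in kB, and the provisioned size in bytes sits further right (seventh column). Under that assumption, the math for the vmfoo_3 line above would be:

```shell
size_bytes=1099511627776   # vmfoo_3_flat.vmdk provisioned size, in bytes
used_kb=506316800          # second column: used space, in kB

# worst-case growth for this thin disk during consolidation
growth=$((size_bytes - used_kb * 1024))
echo "vmfoo_3 could grow by up to $growth bytes during consolidation"
```

Summing that difference over all thin disks gives the upper bound on the temporary space needed; actual growth is usually much smaller.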

thanks
