VMware Cloud Community
parthmaniar
Enthusiast

Disk consolidation does not finish

I am running ESXi version 7.0 Update 2. I have a VM that has its own dedicated physical SSD mounted as a partition in the VM (Ubuntu 20.04.4). The VM had only one snapshot, which I have deleted. However, I am unable to consolidate the disks. There is no error at the end of the process, but both VMDKs are still present on the storage SSD.

I only have 200 GB left on the SSD. How do I consolidate and reclaim the disk space?

15 Replies
a_p_
Leadership

Please run ls -lisa from the command line in the VM's folder to determine the used disk space for the different files, especially the sesparse file.

If its size is less than the free disk space (df -h), then it's usually safe to create another temporary snapshot, and run "Delete All" right after this.
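For example, from an SSH session on the host (the datastore and folder names below are just placeholders for your own):

    cd /vmfs/volumes/2TB-SATA-SSD/MyVM   # the VM's folder
    ls -lisa                             # file sizes in bytes, incl. the -sesparse.vmdk delta
    df -h                                # free disk space per volume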

Also please ensure that no other application (e.g. backup) still locks one of the VM's files.

André

parthmaniar
Enthusiast

Hello and thank you very much for your reply.

 

1. Disk space free on the volume is ~275 GB.

2. Size of the snapshot + current disk = ~1.6 TB (758 GB [disk in use] + 850 GB [snapshot]).

Given that the volume will overflow in some time, what is the best course of action? I am a self-funded student and do not have additional disks to expand onto or upgrade to. Is there a way to consolidate disks while the VM is shut down? (I will try it now, although I feel I have tried it earlier.) Update: Consolidation failed 😞

 

Kindly help and thank you.

a_p_
Leadership

If it is possible to free up the 2TB-SATA-SSD-2, so that it has at least 1.5 TB free disk space, you could then manually clone the virtual disk in a way that would consolidate the snapshot. If that works, you could then delete the .vmdk files on the 2TB-SATA-SSD datastore, and clone the consolidated virtual disk back to it, or simply modify the VM's settings, and use the cloned virtual disk.
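A minimal sketch of such a manual clone with vmkfstools, assuming the VM is powered off (all paths and names below are placeholders; the source must be the VM's current snapshot descriptor, e.g. the -000001.vmdk, so that the whole chain gets consolidated into the target):

    mkdir /vmfs/volumes/2TB-SATA-SSD-2/MyVM
    vmkfstools -i /vmfs/volumes/2TB-SATA-SSD/MyVM/MyVM-000001.vmdk /vmfs/volumes/2TB-SATA-SSD-2/MyVM/MyVM.vmdk -d thin

The source files are only read, never modified, so nothing is lost if the clone fails.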

André

parthmaniar
Enthusiast

Hello and thank you once again.

Given the situation wherein I will start losing 4 years of research if I can't store the incoming data, I can delete the files on 2-TB-SSD-SATA-2. It in fact holds a copy of the data from 2-TB-SSD on a parallel VM.

However, I feel I need handholding. What should the steps be?

1. Delete all files on disk 2 in order to have full 1.5 TB.

What is the next step? I'm not sure how to keep the "in-use" disk on SSD-1 and the copy on SSD-2. Here is a screenshot of the total storage on the ESXi box.

 

Thank you very much once again.

a_p_
Leadership

In order to find out what options we have, please run

for file in /vmfs/volumes/?TB* ; do ls -lisaR "$file"/* ; done >> /tmp/filelist.txt

and attach the filelist.txt to your next reply. This will provide an overview of all files, and help to find possible options.

Do you have an up-to-date backup, just in case something unexpected happens?

André

parthmaniar
Enthusiast

Thank you very much for the handholding. Please find the file attached.

Second, I have a data analytics cluster that captures live data for my MSc research.

Further, I've got a redundant architecture (attached) wherein three VMs share a single SSD holding the OS (which is backed up). The data that is being ingested is stored on two separate SSDs.

I can delete it to carry out the activity you've asked for, and resync the nodes, which will rewrite the data. However, my worry is that I am deleting the second copy of the data to repair the first one, hence I need to make a copy of the VMDKs first. Is there an easy way to copy the VMDKs? Can I attach a USB disk to copy the data off the ESXi host?

Lastly, ALL VMs except those starting with SSS are disposable.

 

Thank you very much once again.

a_p_
Leadership

Since you say that this is a cluster, please allow me just one more quick question.

The snapshot was created on April 21st.
What I'm thinking of is to make the second node the active one, then "reset" the first VM's data disk (i.e. drop the sesparse file), and resynchronize the data from the second VM. Just asking to see whether this could be an option.

André

parthmaniar
Enthusiast

Hello André, unfortunately, I don't understand the question. The snapshot was for the primary disk (the OS disk); the data disk (causing the low-space issue) is not required. Hence, if I may ask (and pardon my naivety): can I delete the older disk? I will never need to revert to it (in fact, I've deleted the snapshot already). So pardon me for not getting your question. 😞 -- I'm not an ESXi or VMware expert 😞


There is a minor caveat in this being a cluster - the host in question is the "primary" host, and while I have restored the secondary host's data disk, I've never done it for the primary host.

 

I will tell you what my stupid idea was:

 

1. Delete the data on SSD-2

2. Delete the volume on SSD-2

3. Expand disk space for SSD-1

4. Consolidate the disks

5. I don't know this part, hence didn't try: remove disk 2 from the expanded volume, i.e. revert disk 1 to a 2 TB volume

6. Recreate volume on SSD-2

7. Sync the cluster

 

 

a_p_
Leadership

My question was basically whether it is possible to synchronize the primary cluster node's data from the secondary node?

>>> the data disk (causing the low space issue) is not required
I'm not sure if I understand this correctly. Why would you have a 1.5TB virtual disk (the one with the active snapshot), if the data on it is not required?

>>> Can i delete the older disk?
Assuming you are referring to the disk with the snapshot, then no, you cannot do this.
Snapshots in VMware products only contain modified data blocks. A snapshot without its parent(s) is useless. What "Delete Snapshot" does is merge the modified data blocks from the sesparse file into its parent (the flat file in this case). With a thin-provisioned virtual disk, the flat file may grow up to its provisioned size, depending on the data in the sesparse file.
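To illustrate with VMware's usual file naming (the names are generic examples), a thin-provisioned disk with one snapshot consists of:

    MyVM.vmdk                   # descriptor of the base disk
    MyVM-flat.vmdk              # the parent, i.e. the actual base data
    MyVM-000001.vmdk            # descriptor of the snapshot
    MyVM-000001-sesparse.vmdk   # only the blocks modified since the snapshot

Deleting the -000001.vmdk/-sesparse files without merging them would lose every change made since the snapshot was taken.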

Regarding step 5: It's unfortunately not possible to remove an extent from a VMFS volume without destroying the whole VMFS datastore.

If synchronization is only possible from the primary node to the secondary node, and not vice-versa, then the following steps may be something we should discuss:

  1. shutdown the secondary cluster node, and remove (delete) its 1.5TB data disk from the VM's configuration (don't delete the files directly from the datastore!)
  2. shutdown the primary cluster node, and clone its 1.5TB data disk to SSD-2, consolidating the snapshot in the process
  3. modify the primary node's settings, so that it points to the cloned virtual disk on SSD-2
  4. check whether the VM works as expected
  5. delete the VM's folder with the data disk on SSD-1
  6. optional: migrate/clone the data disk back to SSD-1 (requires additional downtime)
  7. add a new virtual disk to the secondary node, and synchronize the data

Instead of migrating/cloning the virtual disk back to SSD-1 (step 6) you could also place the secondary node's new data disk on SSD-1, and simply rename the datastores if wanted.

If the above steps make sense to you, we can proceed with putting together the detailed steps, and commands.
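As a rough sketch for step 2 (paths and names are placeholders based on this thread), with a sanity check of the clone before powering on in step 4:

    # step 2: clone the snapshot chain into one consolidated disk on SSD-2
    vmkfstools -i /vmfs/volumes/2TB-SATA-SSD/SSS-primary/SSS-primary_1-000001.vmdk /vmfs/volumes/2TB-SATA-SSD-2/SSS-primary/SSS-primary_1.vmdk -d thin
    # before step 4: verify the cloned disk's descriptor chain
    vmkfstools -e /vmfs/volumes/2TB-SATA-SSD-2/SSS-primary/SSS-primary_1.vmdk

Step 3 itself is done in the host client: Edit Settings -> remove the old disk (without deleting files from the datastore), then "Add hard disk" -> "Existing hard disk" -> select the clone.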

André

parthmaniar
Enthusiast

Ah, apologies for the poor communication from my end.

 

>>> My question was basically whether it is possible to synchronize the primary cluster node's data from the secondary node?

This is theoretically possible, but since the primary node "may" hold additional security (authentication/RBAC) data, I am not sure. The application I am using is the Elastic Stack. I have a parallel thread open on their forum to understand whether secondary-to-primary can be done without error, or whether there are pre-checks to help me mitigate impending issues.

 

>>> Deleting the disk was a bad idea. I did know it would be changed blocks, but I am not sure if it was desperation or exhaustion. Thank you for highlighting it again.

 

>>> New idea (only because what you've listed is something I'm still trying to grasp).

 

1. I delete the VM disk (from the configuration) for node-2. Hence SSD-2 has 1.8 TB free.

2. Delete the datastore on SSD-2.

3. Expand datastore for VM-1 (primary node) on SSD-1 with SSD-2.

4. Hopefully - Total space is ~3.6 TB with ~1.7 TB free space.

5. Consolidate the disk.

6. The new disk should be around ~700-800 GB, which I can transfer to another datastore (the one on the NAS has 927.55 GB free space)

7. Delete the newly created (expanded) datastore in step 3.

8. Create two independent datastores of 1.8 TB (each SSD).

9. Transfer the file back to SSD-1

10. Sync the hosts?

 

Does this sound logical, or do I need more sleep? 😄

a_p_
Leadership

No need to apologize. I'm asking all these questions to try and understand your setup, and to make sure that we find a working solution.

The steps you mention may work, but I have some doubts:

  • I have never tried to extend a local SSD datastore with another one. This may or may not work.
  • The backup node's virtual disk is ~1.2 TB, so the disk space on the NAS may not be sufficient.

Now please don't get me wrong, I'm not insisting on the steps I mentioned, but cloning the virtual disk from SSD to SSD is definitely way faster than cloning to the NAS (assuming that you don't have a 10 Gbps network and a fast NAS). I also took a possible failback scenario into consideration, i.e. the source for the clone will only be deleted after verifying that the target works as expected.

André

parthmaniar
Enthusiast

Hello, and thank you very much for the extended handholding. I sincerely appreciate it.

I know your answer is the same, and I am afraid of the "cloning" part, since I've never done it and my specialisation in hypervisors is academic. I have, however, come up with a new idea (I feel like Arturo Román from Money Heist, who kept coming up with new destructive ideas). My reason is that I know the primary node holds one good copy of all the data, so I don't want to touch its disks.

Here is my final proposal before trying precisely what you proposed: 

1. Shutdown the cluster (all 3 VMs)

2. Reconfigure the primary host and add a "new" disk on a different volume (thus keeping the original file [vmdk] and volume [on SSD-1] as is).

3. Check if the sync works from the secondary host to the primary host.

4. If it works, I will have a new disk of ~700 GB that I can copy & paste to SSD-1 (after deleting other files on it)

5. Done!

IF IT FAILS

5. I can reconfigure the primary host to point to the files on SSD-1 (is such a reconfiguration possible?) 

Given that this is my last year in college, with a few months to my dissertation, I am contemplating buying a 4 TB disk to ensure I don't lose 4 years' worth of data. I do have a data backup on the NAS, but it is not tested, and an untested backup is as good as no backup - hence I am not venturing there.

 

Additional question

>>> Could I use this thread to inquire about upgrading my ESXi, which keeps failing? I have a Dell Precision workstation running ESXi 7.0.2 ((Updated) DEL-ESXi-702_17630552-A00 (Dell Inc.)). Since I am reading for a master's in software and systems, I may be asked to clarify why I did not use the most recent version of the hypervisor. Instead of documenting known CVEs, I hoped to upgrade it. I use one custom VIB, since I have a consumer-grade second NIC.

 

My NAS has 2 x 1 Gbps ports running in bond mode; hence the total throughput is 2 Gbps (however, it uses HDDs instead of SSDs, so the actual throughput will be much less than the NIC capacity).

a_p_
Leadership

Do I understand correctly that the data usage on the second node's data disk (from within the guest OS) is ~700 GB?
Is it about the same on the primary node?
If so, I have another idea which should work without any risk.

André

parthmaniar
Enthusiast

It's two different sizes: ~893 GB according to the OS and ~993 GB according to ESXi.

I've attached screenshots from within the OS as well as from ESXi.

a_p_
Leadership

Since there seems to be more data on the disk than free disk space on any of the datastores, I still think the previously mentioned steps will solve the issue. What you could do is copy/replicate the data on the secondary node to your local PC, or to an attached USB disk.
Don't worry about the cloning steps. Cloning is safe, because the source will not be modified, so you can always revert to it if the cloned disk does not work as expected. That is unlikely anyway, because a clone contains the exact same data, including all permissions etc., since it's done by the hypervisor and not from within the guest OS.
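If you want that extra copy first, something like this from inside the secondary node's Ubuntu guest would do (the host name and paths are placeholders):

    # copy the ingested data to another machine over SSH;
    # -a preserves permissions/ownership, -P shows progress and can resume
    rsync -aP /path/to/data/ user@your-pc:/backup/sss-data/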

André
