TDSAdmin
Contributor

Snapshot files not compacting after disk consolidation

A user reported slow network I/O on a Windows server hosted on esxi 7.0u3c. While investigating, the VM locked up completely and prompted me to answer a question:

"There is no more space for virtual disk '<name of disk>.vmdk'. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session."

I was able to determine the source of the issue was a lack of available disk space in one of the datastores.

This server was set up on version 6.5 using thin provisioning for all datastores. Several upgrades later, a bug in esxi 7.0 was causing purple screens when thin provisioning was in use. Downgrading wasn't an option and there were no fixes available at the time. (I believe this has been fixed in 7.0u3d.) The only option was to convert all thinly provisioned disk images to thick provisioning, with the intention of reversing this later.

Fast forward to today. My thick provisioned images are now causing issues due to storage constraints. Something has caused a large snapshot to be created, and I cannot consolidate the disks, presumably due to the lack of free space remaining on the datastore.

I've tried:

  1. Consolidating disks through the esxi UI. This completes successfully (or so it says), but the *-000001.vmdk files remain.
  2. Creating a new snapshot and then using Delete All. Again, this completes successfully according to the UI, but the snapshot files remain.
  3. Attaching external storage via USB with hopes to create a datastore extent to allow the consolidation to complete, however esxi doesn't play well with USB and I cannot get access to the drive. (The drive is visible using lsusb from the CLI, but it doesn't appear as a storage device in the UI.)

 

2022-08-12T08:08:33.533Z cpu6:1049018)vmkusb: umass_attach:1123: umass_attach: Attach device cached_name NULL, cached data ff
2022-08-12T08:08:34.535Z cpu6:1049002)vmkusb: umass_watchdog:1015: umass_watchdog: Register SIM for New Device with 0 sec(s) delay
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1284: umass_detach: Device umass0 is detaching
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1300: umass_detach: Detaching umass0 with cached_name NULL, adapter name Invalid, is_reserved 0
2022-08-12T08:08:34.536Z cpu0:1049007)WARNING: ScsiPath: 9487: Adapter Invalid does not exist
2022-08-12T08:08:34.536Z cpu0:1049009)DMA: 687: DMA Engine 'vmhba35' created using mapper 'DMANull'.
2022-08-12T08:08:34.558Z cpu4:1049011)ScsiAdapter: 3418: Unregistering adapter vmhba35
2022-08-12T08:08:34.558Z cpu4:1049011)DMA: 732: DMA Engine 'vmhba35' destroyed.

 

Without this I'm at a loss. I don't have any more internal disk adapters, so there's no option for adding more internal storage at this time. I'm aware of the 2TB USB disk size limitation within esxi, but that doesn't seem to be the issue here.

Of course, this is a production machine so downtime counts. How can I go about getting these disks consolidated (there's plenty of room inside the base disk images)? Is there a way to get the USB storage option working long enough to expand the datastore? It's my understanding that I only need to add about 1GB to get the consolidation to complete.

19 Replies
TDSAdmin
Contributor

Here's what I see with lsusb.

 

[root@esxi:~] lsusb -d 0781:5575
Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
[root@esxi:~] lsusb -d 0781:5575 -v

Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0781 SanDisk Corp.
  idProduct          0x5575 Cruzer Glide
  bcdDevice            1.00
  iManufacturer           1 SanDisk
  iProduct                2 Cruzer Glide
  iSerial                 3 4C530000070415215490
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              200mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               1
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)

The disk is not listed when using esxcli storage core device list.

 

a_p_
Leadership

To understand the current state, please run ls -lisa > filelist.txt in the VM's folder, and attach the filelist.txt along with the output of df -h to your next reply.
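
For reference, both outputs can be gathered in one pass along these lines. This is only a sketch: `collect_vm_state` is a made-up helper name, and writing to /tmp instead of the VM's folder is an assumption, chosen because a datastore that is completely full may not be able to take even a small text file.

```shell
# Sketch: capture the VM folder listing and datastore usage for a reply.
# collect_vm_state is a hypothetical helper name, not an ESXi command.
# Usage: collect_vm_state <vm-folder> [output-dir]
collect_vm_state() {
    vm_dir="${1:-}"
    out_dir="${2:-/tmp}"

    [ -d "$vm_dir" ] || { echo "no such folder: $vm_dir" >&2; return 1; }

    # Inode, allocated blocks, and size for every file, hidden ones included.
    ls -lisa "$vm_dir" > "$out_dir/filelist.txt" || return 1

    # Free space per mounted volume.
    df -h > "$out_dir/df.txt" || return 1

    echo "wrote $out_dir/filelist.txt and $out_dir/df.txt"
}
```

It would be called with the VM's folder as the argument, and the two text files can then be copied off the host and attached.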

Do you have other VMs on the same datastore, which are not as important as this one, that can be shut down for some time? This will free up disk space that's in use for their swap files, which may be sufficient to successfully consolidate the snapshot.

André

TDSAdmin
Contributor

Thanks, André.

I'll have to post back later with the output you have requested. I won't have access to the host again until later today.

To give you an idea what's happening, this server has 2 SSDs and 4 HDDs along with 1 M.2 SSD.

Esxi lives on the M.2 SSD.

The two SATA SSDs are in RAID 1 and they house datastore1 and datastore2. Datastore1 is used for general files (esxi patches, ISOs, etc). Datastore2 houses VMs (one small Linux install for network diagnostics, one Windows 10 workstation, and one Windows 2019 Domain Controller).

The 4 HDDs are each configured with their own individual datastore. Each one is attached to the domain controller as a separate drive which is in software RAID using Windows Storage Spaces.

SSD1/SSD2
        Datastore1
        Datastore2
                DC.mydomain.local/
                        DC.mydomain.vmdk
                        DC.mydomain-000001.vmdk
HDD1
        DC.mydomain.local/
                Disk1.vmdk
HDD2
        ...
HDD3
        DC.mydomain.local/
                Disk3.vmdk
                Disk3-000001.vmdk
HDD4
        ...

 

This issue is with disk 3. The volume takes up virtually the entire physical disk. The disk image is thick provisioned. The snapshot image has reached around 6.4GB and there's no longer enough space on the disk to perform the consolidation.

TDSAdmin
Contributor

I should also add that I was finally able to get USB drives recognized. I must have made a typo the first few times, but I finally got usbarbitrator disabled and they now appear in storage disks. My thought was to format one with vmfs and use it to create an extent for Disk3's datastore.
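
For anyone following along, disabling the arbitrator is usually done along the lines below. This is a sketch of the commonly used commands, not a record of exactly what was typed on this host. The `run` wrapper and the `DRY_RUN` variable are illustrative additions so the commands can be previewed before anything touches a production machine.

```shell
# Sketch: stop the USB arbitrator so ESXi itself can claim USB storage.
# run and DRY_RUN are illustrative helpers, not part of ESXi.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_usb_arbitrator() {
    # Stop the service for the current boot.
    run /etc/init.d/usbarbitrator stop || return 1
    # Keep it from starting again after a reboot.
    run chkconfig usbarbitrator off || return 1
    # The USB stick should now be listed as a storage device.
    run esxcli storage core device list
}

# Preview first:  DRY_RUN=1 disable_usb_arbitrator
```

While the arbitrator is stopped, USB devices are not handed through to VMs, so it would need to be re-enabled afterwards if passthrough is in use.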

a_p_
Leadership

>>> My thought was to format one with vmfs and use it to create an extent for Disk3's datastore.
I strongly recommend against this. Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!

>>> ... the snapshot image has reached around 6.4GB
Is it really GB, or is it TB?

If you cannot consolidate the snapshot online, you may want to bite the bullet and schedule some downtime to try to delete the snapshot.

André

TDSAdmin
Contributor

>>> Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!

I've taken this into consideration. This disk only houses an "attached" disk image. If I can get the disk image consolidated, I can move it to another disk, recreate the datastore, and then move it back. (It's also a data disk that's part of a Storage Spaces disk array. It *should* be recreated by the Windows Server if it gets completely destroyed. But not knowing precisely what is staged for consolidation, I'm not immediately comfortable just nuking and rebuilding the disk, despite the fact that it should be okay to do so.)

>>> Is it really GB, or is it TB?

Yep! It's really GB. Sad, right?

>>> If you cannot consolidate the snapshot online, you may want to bite the bullet and schedule some downtime to try to delete the snapshot.

The VM is powered off. The delete indicates that it completes successfully, but the snapshot's delta files are not removed and the chain indicates they are still in use.
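
For reference, the chain can be inspected from the shell roughly as follows. It is only a sketch: `show_chain` is a made-up name, and it assumes the usual layout in which each small text descriptor (.vmdk) carries CID and parentCID entries plus, for a snapshot, a parentFileNameHint, while the .vmx names the disk the VM actually opens.

```shell
# Sketch: print the parent links recorded in each disk descriptor, plus the
# disk entries in the .vmx. show_chain is a hypothetical helper name.
# Usage: show_chain <vm-folder>
show_chain() {
    vm_dir="${1:-.}"
    for f in "$vm_dir"/*.vmdk; do
        [ -e "$f" ] || continue
        case "$f" in
            # Skip the data extents; only the small text descriptors matter.
            *-flat.vmdk|*-delta.vmdk|*-sesparse.vmdk|*-ctk.vmdk) continue ;;
        esac
        echo "== ${f##*/}"
        grep -E '^(CID|parentCID|parentFileNameHint)=' "$f"
    done
    # Which descriptor does the VM actually open?
    grep -i 'fileName' "$vm_dir"/*.vmx 2>/dev/null
}
```

If the .vmx still points at a -000001.vmdk descriptor, or a descriptor still names a parent, the delta is part of the active chain regardless of what the UI reported.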

TDSAdmin
Contributor

Here's the requested information.

[root@esxi:~] df -h
Filesystem   Size   Used Available Use% Mounted on
VMFS-6     111.8G  12.5G     99.2G  11% /vmfs/volumes/datastore1
VMFS-6     893.0G 893.0G      8.0M 100% /vmfs/volumes/datastore2
VMFS-6       5.5T   5.5T      6.7G 100% /vmfs/volumes/Disk1_T8WEQDLP
VMFS-6       5.5T   5.5T      6.7G 100% /vmfs/volumes/Disk2_T8WEQ26A
VMFS-6       5.5T   5.5T      6.7G 100% /vmfs/volumes/Disk4a_T6NEMH9R
VMFS-6       5.5T   5.5T      0.0B 100% /vmfs/volumes/Disk3a_T9WEKUZ8
VFFS         6.2G   3.4G      2.8G  54% /vmfs/volumes/OSDATA-6193f710-05a1744c-4a23-7c8ae1c668da
vfat       499.7M 173.7M    326.1M  35% /vmfs/volumes/BOOTBANK1
vfat       499.7M 203.4M    296.4M  41% /vmfs/volumes/BOOTBANK2

Disks 1-4 are part of a striped array. They *should* all be roughly the same size. For whatever reason, Disk 3a is the only one with a delta file and it's enough to fill the physical disk space.
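
The nearly full volumes can be picked out of output like this with a small filter. A sketch only: `full_volumes` is a made-up name, and it assumes the six-column layout shown above with no spaces in the mount point.

```shell
# Sketch: print mount point and free space for volumes whose Use% is at or
# above a threshold (default 95). full_volumes is a hypothetical helper.
# Reads df-style text on stdin, so it works on live or saved output.
full_volumes() {
    threshold="${1:-95}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # "100%" -> "100"
        if (pct + 0 >= t) print $6, $4
    }'
}

# On the host:  df -h | full_volumes 95
```

Reading from stdin keeps the filter independent of the host, so a saved copy of the output can be checked the same way.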

TDSAdmin
Contributor

Can I consolidate and convert this to thin provisioning while migrating with vMotion? This is a standalone host, but it does have the Essentials license. I'm thinking I could set up another host, migrate this VM to it with vMotion, and then migrate it back. Seems like overkill, but it could be a means to an end. I've never used vMotion before so I'm not completely sure of its capabilities.

a_p_
Leadership

I could be wrong, but to me it looks like an issue with datastore2 rather than datastore3.

According to the files' time stamps and sizes, it seems that the consolidation starts on the thin provisioned virtual disk on datastore2 (same time stamp "Aug 12 03:41" for the flat and sesparse files), but fails due to the lack of free disk space, which stops the consolidation process. The virtual disk on datastore3 does not even seem to be touched.

André

TDSAdmin
Contributor

I'm not sure where to go from here. That datastore has free space available. I even deleted all the other VMs. Is there any kind of logging? I couldn't really find anything.

a_p_
Leadership

According to the df command, it's almost full (8.0MB available).

Is there a chance to temporarily add an additional SSD/HDD (>=1TB) that could be used to manually clone the virtual disk on datastore 2?
What I'm thinking of is to evacuate datastore1 + 2 (backing up the required files), then delete both datastores and create a single, larger datastore on the SSD RAID, to which the cloned virtual disk and the backed-up files could be migrated back.

André

TDSAdmin
Contributor

I'm following along. I don't have any way to add another SSD/HDD. There are no additional SATA ports available. My only current option for expansion is USB.

I'm looking at the possibility of adding some PCIe SATA expansion cards, but I'm having a difficult time narrowing down compatible hardware.

Datastore1 is mainly used for oddball storage. Mostly esxi patches and ISOs. I could probably offload those files and delete datastore1. Datastore2 could be expanded to take up the returned space.

I'm also looking at the option of adding some NAS storage, but I can't find any good information about support within esxi and looking through menus, I didn't see any obvious way to mount it.
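
For reference, ESXi can mount an NFS export as a datastore from the shell. A minimal sketch, assuming the NAS exports over NFS v3: `mount_nas` is a made-up wrapper, and the host name, export path, and datastore name are placeholders to be supplied, not values from this setup.

```shell
# Sketch: mount an NFS v3 export as a datastore and confirm it is listed.
# mount_nas is a hypothetical wrapper; all three arguments are placeholders.
# Usage: mount_nas <nas-host> <export-path> <datastore-name>
mount_nas() {
    nas_host="${1:-}"
    export_path="${2:-}"
    ds_name="${3:-}"

    [ -n "$nas_host" ] && [ -n "$export_path" ] && [ -n "$ds_name" ] || {
        echo "usage: mount_nas <nas-host> <export-path> <datastore-name>" >&2
        return 2
    }

    esxcli storage nfs add --host="$nas_host" --share="$export_path" \
        --volume-name="$ds_name" || return 1

    # List NFS datastores to confirm the mount.
    esxcli storage nfs list
}
```

The export has to allow the host's VMkernel IP address, and the mount should then appear alongside the VMFS volumes under /vmfs/volumes.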

TDSAdmin
Contributor

>>> According to the df command, it's almost full (8.0MB available).

The datastores are all sized to fill all available physical disk space. The df command would show that even if the datastores were empty, wouldn't it?

TDSAdmin
Contributor

Sorry for the delay, but getting parts proved to be rather interesting and I had a backorder. I've added 3 12TB disks to the system.

Will I be able to move the VMs before they are consolidated?

I'm thinking I could create another datastore and move the affected VMs to that one. I believe the disks will consolidate during the move if I'm understanding the process.

The other option involves using the extra disks as extents, performing the disk consolidation, and then trying to get the disk images converted from thick provisioning back to thin provisioning.

a_p_
Leadership

I'd create another datastore, and use the Migration Wizard to migrate the VM to the larger datastore. Since you have an Essentials license (i.e. no vMotion license), this will require some downtime. However, it's likely the most secure way to resolve the current situation.

André

TDSAdmin
Contributor

Silly question. How do I use the Migration Wizard? I don't see it in the web interface. Will it start if I "move" the VM folder in the Datastore Browser?

a_p_
Leadership

Sorry, my bad. I somehow missed that you run the ESXi host as a standalone host.
The Migration Wizard is available in vCenter Server only. Any chance that you could deploy vCenter Server?

André

TDSAdmin
Contributor

I don't have a vCenter Server, but I do have a license. Is that a direction I need to go?

a_p_
Leadership

If you do have the required resources (RAM, CPU, storage), deploying a vCSA might be the easiest way to resolve the situation.
There are of course other alternatives, which however require running CLI commands, and editing the configuration file.
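
For reference, the CLI route usually comes down to cloning the disk from its newest snapshot descriptor with vmkfstools, which writes a single consolidated copy, and then pointing the VM's configuration file at the clone. A minimal sketch with the VM powered off: `clone_consolidated` is a made-up helper, and both paths are placeholders to be supplied, not paths from this host.

```shell
# Sketch: clone a disk from its latest snapshot descriptor into a new,
# thin provisioned disk on another datastore. The VM must be powered off.
# clone_consolidated is a hypothetical helper name.
# Usage: clone_consolidated <source-descriptor.vmdk> <target.vmdk>
clone_consolidated() {
    src="${1:-}"
    dst="${2:-}"

    [ -n "$src" ] && [ -n "$dst" ] || {
        echo "usage: clone_consolidated <source.vmdk> <target.vmdk>" >&2
        return 2
    }
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }

    # Refuse to overwrite an existing disk.
    [ ! -e "$dst" ] || { echo "target exists: $dst" >&2; return 1; }

    # Create the target folder on the new datastore if needed.
    mkdir -p "$(dirname "$dst")" || return 1

    # -i clones from the given descriptor, following its parent chain;
    # -d thin makes the copy thin provisioned.
    vmkfstools -i "$src" "$dst" -d thin
}
```

Afterwards the matching fileName entry in the .vmx would need to point at the clone (back up the .vmx first), and the old files should only be removed once the VM has booted cleanly from the copy.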

André
