A user reported slow network I/O on a Windows server hosted on ESXi 7.0 U3c. While investigating, the VM locked up completely and raised a VM question that required an answer:
"There is no more space for virtual disk '<name of disk>.vmdk'. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry. Click Cancel to terminate this session."
I was able to determine the source of the issue was a lack of available disk space in one of the datastores.
This server was set up using thin provisioning for all datastores on version 6.5. Several upgrades later, a bug in ESXi 7.0 was causing purple screens while using thin provisioning. Downgrading wasn't an option, and there were no fixes available at the time. (I believe this has since been fixed in 7.0 U3d.) The only option was to convert all thinly provisioned disk images to thick provisioning, with the intention of reversing this later.
Fast forward to today. My thick-provisioned images are now causing issues due to storage constraints. Something has caused a large snapshot to be created, and I cannot consolidate the disks, presumably due to the lack of free space remaining on the datastore.
2022-08-12T08:08:33.533Z cpu6:1049018)vmkusb: umass_attach:1123: umass_attach: Attach device cached_name NULL, cached data ff
2022-08-12T08:08:34.535Z cpu6:1049002)vmkusb: umass_watchdog:1015: umass_watchdog: Register SIM for New Device with 0 sec(s) delay
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1284: umass_detach: Device umass0 is detaching
2022-08-12T08:08:34.536Z cpu7:1049005)vmkusb: umass_detach:1300: umass_detach: Detaching umass0 with cached_name NULL, adapter name Invalid, is_reserved 0
2022-08-12T08:08:34.536Z cpu0:1049007)WARNING: ScsiPath: 9487: Adapter Invalid does not exist
2022-08-12T08:08:34.536Z cpu0:1049009)DMA: 687: DMA Engine 'vmhba35' created using mapper 'DMANull'.
2022-08-12T08:08:34.558Z cpu4:1049011)ScsiAdapter: 3418: Unregistering adapter vmhba35
2022-08-12T08:08:34.558Z cpu4:1049011)DMA: 732: DMA Engine 'vmhba35' destroyed.
Without this I'm at a loss. I don't have any more internal disk adapters, so there's no option for adding more internal storage at this time. I'm aware of the 2TB USB disk size limitation within ESXi, but that doesn't seem to be the issue here.
Of course, this is a production machine so downtime counts. How can I go about getting these disks consolidated (there's plenty of room inside the base disk images)? Is there a way to get the USB storage option working long enough to expand the datastore? It's my understanding that I only need to add about 1GB to get the consolidation to complete.
Here's what I see with lsusb.
[root@esxi:~] lsusb -d 0781:5575
Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
[root@esxi:~] lsusb -d 0781:5575 -v
Bus 001 Device 005: ID 0781:5575 SanDisk Corp. Cruzer Glide
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0781 SanDisk Corp.
  idProduct          0x5575 Cruzer Glide
  bcdDevice            1.00
  iManufacturer           1 SanDisk
  iProduct                2 Cruzer Glide
  iSerial                 3 4C530000070415215490
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80 (Bus Powered)
    MaxPower            200mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81 EP 1 IN
        bmAttributes            2
          Transfer Type          Bulk
          Synch Type             None
          Usage Type             Data
        wMaxPacketSize     0x0200 1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02 EP 2 OUT
        bmAttributes            2
          Transfer Type          Bulk
          Synch Type             None
          Usage Type             Data
        wMaxPacketSize     0x0200 1x 512 bytes
        bInterval               1
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)
The disk is not listed when using esxcli storage core device list.
To understand the current state, please run ls -lisa > filelist.txt in the VM's folder, and attach the filelist.txt along with the output of df -h to your next reply.
Do you have other VMs on the same datastore, which are not as important as this one, that can be shut down for some time? This will free up disk space that's in use for their swap files, which may be sufficient to successfully consolidate the snapshot.
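To estimate how much space the swap files are actually holding, you could list them from the host shell; something along these lines (datastore name taken from this setup, adjust as needed):

```shell
# List each VM's .vswp swap file on the datastore to see
# how much space shutting the VMs down would free up.
ls -lh /vmfs/volumes/datastore2/*/*.vswp
```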
I'll have to post back later with the output you've requested; I won't have access to the host again until later today.
To give you an idea of what's happening: this server has 2 SSDs and 4 HDDs, along with 1 M.2 SSD.
ESXi lives on the M.2 SSD.
The two SATA SSDs are in RAID 1, and they house datastore1 and datastore2. Datastore1 is used for general files (ESXi patches, ISOs, etc.). Datastore2 houses VMs (one small Linux install for network diagnostics, one Windows 10 workstation, and one Windows Server 2019 domain controller).
The 4 HDDs are each configured with their own individual datastore. Each one is attached to the domain controller as a separate drive which is in software RAID using Windows Storage Spaces.
This issue is with Disk 3. The volume takes up virtually the entire physical disk. The disk image is thick provisioned. The snapshot image has reached around 6.4GB, and there's no longer enough space on the disk to perform the consolidation.
I should also add that I was finally able to get USB drives recognized. I must have made a typo the first few times, but I finally got usbarbitrator disabled, and they now appear under storage devices. My thought was to format one with VMFS and use it to create an extent for Disk3's datastore.
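For anyone hitting the same wall: the usual way to release USB devices to the host (so they show up as storage rather than being reserved for VM passthrough) is to stop the USB arbitrator service. These are the standard ESXi commands, but use them at your own risk:

```shell
# Stop the USB arbitrator so the host handles USB devices directly.
/etc/init.d/usbarbitrator stop
# Keep the service disabled across reboots.
chkconfig usbarbitrator off
```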
>>> My thought was to format one with VMFS and use it to create an extent for Disk3's datastore.
I strongly recommend against this. Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!
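If you ever need to verify which devices back a datastore before touching anything, the extent mapping can be listed from the host shell:

```shell
# Show every VMFS datastore and the device/partition backing each extent.
esxcli storage vmfs extent list
```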
>>> ... the snapshot image has reached around 6.4GB
Is it really GB, or is it TB?
If you cannot consolidate the snapshot online, you may want to bite the bullet and schedule some downtime to try deleting the snapshot.
>>> Please note that USB devices are not supported as VMFS datastores (even if you get them to work), and that there's no way to remove an extent anymore without reformatting the datastore!
I've taken this into consideration. This disk only houses an "attached" disk image. If I can get the disk image consolidated, I can move it to another disk, recreate the datastore, and then move it back. (It's also a data disk that's part of a Storage Spaces disk array. It *should* be rebuilt by the Windows Server if it gets completely destroyed. But not knowing precisely what is staged for consolidation, I'm not immediately comfortable just nuking and rebuilding the disk, despite the fact that it should be okay to do so.)
>>> Is it really GB, or is it TB?
Yep! It's really GB. Sad, right?
>>> If you cannot consolidate the snapshot online, you may consider to bite the bullet, and schedule some downtime to try, and delete the snapshot.
The VM is powered off. The delete indicates that it completes successfully, but the snapshot's delta files are not removed and the chain indicates they are still in use.
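When the UI's delete reports success but the delta files stay behind, it can be worth retrying from the host shell. The vim-cmd calls below are standard, but run them only with the VM powered off and a backup in hand:

```shell
# Find the VM's numeric ID (first column of the output).
vim-cmd vmsvc/getallvms
# Remove all snapshots for that VM, which also consolidates the chain.
# Replace 42 with the actual VM ID from the previous command.
vim-cmd vmsvc/snapshot.removeall 42
```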
Here's the requested information.
[root@esxi:~] df -h
Filesystem   Size     Used  Available Use% Mounted on
VMFS-6     111.8G    12.5G      99.2G  11% /vmfs/volumes/datastore1
VMFS-6     893.0G   893.0G       8.0M 100% /vmfs/volumes/datastore2
VMFS-6       5.5T     5.5T       6.7G 100% /vmfs/volumes/Disk1_T8WEQDLP
VMFS-6       5.5T     5.5T       6.7G 100% /vmfs/volumes/Disk2_T8WEQ26A
VMFS-6       5.5T     5.5T       6.7G 100% /vmfs/volumes/Disk4a_T6NEMH9R
VMFS-6       5.5T     5.5T       0.0B 100% /vmfs/volumes/Disk3a_T9WEKUZ8
VFFS         6.2G     3.4G       2.8G  54% /vmfs/volumes/OSDATA-6193f710-05a1744c-4a23-7c8ae1c668da
vfat       499.7M   173.7M     326.1M  35% /vmfs/volumes/BOOTBANK1
vfat       499.7M   203.4M     296.4M  41% /vmfs/volumes/BOOTBANK2
Disks 1-4 are part of a striped array. They *should* all be roughly the same size. For whatever reason, Disk 3a is the only one with a delta file, and it's enough to fill the physical disk space.
Can I consolidate and convert this to thin provisioning while migrating with vMotion? This is a standalone host, but it does have the Essentials license. I'm thinking I could set up another host, migrate this VM to it with vMotion, and then migrate it back. Seems like overkill, but it could be a means to an end. I've never used vMotion before, so I'm not completely sure of its capabilities.
I could be wrong, but to me it looks like an issue with datastore2 rather than datastore3.
According to the files' timestamps and sizes, it seems that the consolidation starts on the thin provisioned virtual disk on datastore2 (same timestamp "Aug 12 03:41" for the flat and sesparse files), but does not succeed due to the lack of free disk space, and subsequently stops the consolidation process. The virtual disk on datastore3 does not even seem to be touched.
According to the df command, it's almost full (8.8MB available).
Is there a chance to temporarily add an additional SSD/HDD (>=1TB) that could be used to manually clone the virtual disk on datastore 2?
What I'm thinking of is to evacuate datastore1 + 2 (back up required files), then delete both datastores, and create a single, larger datastore on the SSD RAID, to which the cloned virtual disk and the backed-up files could be migrated back.
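Assuming downtime is available, the manual clone could be done with vmkfstools. The paths below are placeholders for this setup; note that `-d thin` also converts the copy back to thin provisioning in one step:

```shell
# Clone the virtual disk to the new datastore as a thin-provisioned copy.
# Source and destination paths are hypothetical examples.
vmkfstools -i /vmfs/volumes/datastore2/DC/DC.vmdk \
    -d thin /vmfs/volumes/newdatastore/DC/DC.vmdk
```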
I'm following along. I don't have any way to add another SSD/HDD. There are no additional SATA ports available. My only current option for expansion is USB.
I'm looking at the possibility of adding some PCIe SATA expansion cards, but I'm having a difficult time narrowing down compatible hardware.
Datastore1 is mainly used for oddball storage. Mostly esxi patches and ISOs. I could probably offload those files and delete datastore1. Datastore2 could be expanded to take up the returned space.
I'm also looking at the option of adding some NAS storage, but I can't find any good information about support within ESXi, and looking through the menus, I didn't see any obvious way to mount it.
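From what I've been able to gather since posting, ESXi does support NAS storage natively over NFS, and a share can be mounted as a datastore from the CLI; something like this, with the host and share names made up:

```shell
# Mount an NFS export as a datastore; host, share path,
# and datastore name here are example values.
esxcli storage nfs add --host nas.example.local \
    --share /export/esxi --volume-name nfsDatastore1
```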
>>> According to the df command, it's almost full (8.8MB available).
The datastores are all sized to fill all available physical disk space. The df command would show that even if the datastores were empty, wouldn't it?
Sorry for the delay, but getting parts proved to be rather interesting, and I had a backorder. I've added three 12TB disks to the system.
Will I be able to move the VMs before they are consolidated?
I'm thinking I could create another datastore and move the affected VMs to that one. I believe the disks will consolidate during the move if I'm understanding the process.
The other option involves using the extra disks as extents, performing the disk consolidation, and then trying to get the disk images converted from thick provisioning back to thin provisioning.
I'd create another datastore and use the Migration Wizard to migrate the VM to the larger datastore. Since you have an Essentials license (i.e. no vMotion license), this will require some downtime. However, it's likely the safest way to resolve the current situation.
Sorry, my bad. I somehow missed that you run ESXi as a standalone host.
The Migration Wizard is available in vCenter Server only. Any chance that you could deploy vCenter Server?
If you do have the required resources (RAM, CPU, storage), deploying a vCSA might be the easiest way to resolve the situation.
There are of course other alternatives, which however require running CLI commands, and editing the configuration file.