I'm trying to solve an issue where certain VMs hang on power on. They progress to 66% and stall, and how long they stall depends on how much RAM the VM has. It seems the ESXi host is reading the vswap file on the datastore at ~100MB/sec for the entire time the VM is hung. If the VM has 512MB of RAM, the delay is 10 seconds; if it has 32GB, it can be up to 10 minutes.
I don't think it's normal for these reads to be performed before power on, else we'd see it on every single VM.
Running version 6.7U3 on Dell M630 hosts with all firmware up to date, connected via FC to a Unity SAN.
Below is a power on delay with 16GB RAM.
Power on - Start 05/26/2021 12:52:23 Finish 05/26/2021 12:55:02
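As a rough sanity check on those timestamps, dividing the configured RAM by the power-on duration gives the implied read rate. This is only a sketch: it assumes the full 16GB (no reservation) is read exactly once during the stall, and the function name is made up for illustration.

```python
from datetime import datetime

def implied_read_rate_mb_s(start: str, finish: str, ram_gb: float) -> float:
    """Implied read rate in MB/s, assuming all of the VM's RAM is read once."""
    fmt = "%m/%d/%Y %H:%M:%S"  # timestamp format used in the post above
    elapsed = (datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return ram_gb * 1024 / elapsed

# 12:52:23 to 12:55:02 is 159 seconds for a 16GB VM
rate = implied_read_rate_mb_s("05/26/2021 12:52:23", "05/26/2021 12:55:02", 16)
print(f"{rate:.0f} MB/s")  # prints "103 MB/s"
```

That lines up with the ~100MB/sec read rate seen on the host during the hang.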
Hello.
Just so we are on the same page:
Did you update VMware Tools on the VMs with problems?
Did you install a patch level higher than Update 3 (14320388)? If not, you could upgrade to Build 17167734 (ESXi 670-202011002). It is not the latest available, but it is the one I have tested with some customers without problems so far.
The patches can be obtained from the following link
https://my.vmware.com/group/vmware/patch#search
I'd rather not apply any further upgrades unless this issue is specifically listed as resolved in the release notes. The issue is only occurring in 1 datacentre out of 8, and only on certain VMs.
Yes, VMware Tools has been updated, with no change.
Reserving the VM memory completely eliminates the delay.
1. The swap file is created at power on, in the VM home directory alongside the VMX file, and is (memory size - memory reserved)
2. A disk performance issue? LUN thrashing?
3. What have VMware support said?
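To put numbers on point 1, here is a minimal sketch of the swap file size and how long it would take to read it back at the ~100MB/sec reported earlier in the thread. It assumes the stall is purely a sequential read of the whole swap file, which is the observed behaviour described in this thread, not a confirmed mechanism. The function names are hypothetical.

```python
def swap_file_mb(memory_mb: int, reserved_mb: int = 0) -> int:
    """Swap file size as described above: memory size minus memory reserved."""
    if not 0 <= reserved_mb <= memory_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return memory_mb - reserved_mb

def estimated_stall_s(memory_mb: int, reserved_mb: int = 0, read_rate_mb_s: float = 100.0) -> float:
    """Seconds needed to read the whole swap file once at the given rate (assumption)."""
    return swap_file_mb(memory_mb, reserved_mb) / read_rate_mb_s

for memory_mb in (512, 16 * 1024, 32 * 1024):
    print(memory_mb, "MB, no reservation:", estimated_stall_s(memory_mb), "s")

# A full reservation leaves a zero-size swap file, so there is nothing to read
print("16GB fully reserved:", estimated_stall_s(16 * 1024, reserved_mb=16 * 1024), "s")
```

With no reservation, a 16GB VM comes out at roughly 164 seconds, and with a full reservation the estimate drops to zero, which would be consistent with the delay disappearing when the memory is reserved.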
1. The swap file is created at power on, in the VM home directory alongside the VMX file, and is (memory size - memory reserved)
My main question is: why does the VM need to read the entire swap file before booting? On a host with 768GB of RAM and only 1 VM with 16GB allocated, it still reads this swap file, and I don't understand why. The delay is not the creation of the swap file. The delay occurs after the swap file is created, when the ESXi host reads the entire amount (16GB) back at 100MB/sec.
2. A disk performance issue? LUN thrashing?
No, the storage is under no stress during this time, something else is limiting it to 100MB/sec.
3. What have VMware support said?
It's been escalated to the Storage team, I'm awaiting their response, Case #21220866705 if you're interested.
That is very strange. A swap file gets newly created, and I have no idea why it would need to be read during boot, as the VM itself is normally not even aware of the swap file. It is almost as if the swap file is being zeroed out during boot, but I have never heard of that before, to be honest.
Thank you for understanding and acknowledging that this is not normal behaviour. It's quite frustrating to waste time with support being told the issue is a 2-year-old BIOS and other completely unrelated things. I feel half sane now.
You could probably validate swap access through "esxtop". Also, is this a power-on or a restart / reset?
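One way to do that is to capture esxtop in batch mode while powering on the VM (for example "esxtop -b -d 2 -n 60 > poweron.csv") and then pick out the swap-related columns from the CSV. Below is a minimal sketch of the filtering step. The column names in the sample are made up for illustration, and the "swap" keyword is an assumption, so check the header row of your own capture for the real counter names.

```python
import csv
import io

def columns_matching(csv_file, keyword="swap"):
    """Return {column name: [numeric samples]} for columns whose name contains keyword."""
    reader = csv.reader(csv_file)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if keyword.lower() in name.lower()]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i >= len(row):
                continue  # short row, nothing recorded for this column
            try:
                series[header[i]].append(float(row[i]))
            except ValueError:
                pass  # skip non-numeric cells
    return series

# Hypothetical sample data standing in for a real capture
sample = io.StringIO(
    "timestamp,vm1 swap read MB/s,vm1 cpu %\n"
    "12:52:25,98.5,3.0\n"
    "12:52:27,101.2,2.5\n"
)
result = columns_matching(sample)
for name, values in result.items():
    print(name, "peak:", max(values))
```

For a real capture, pass an open file instead of the sample, e.g. columns_matching(open("poweron.csv", newline="")).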
Hello.
All operating systems have a maintenance policy, i.e. the application of patches periodically. Patching improves features, performance and prevents bugs.
If your ESXi hosts are on version 6.7 Update 3, you need to read the following article:
https://kb.vmware.com/s/article/76159
............what? No offense, but your responses are the equivalent of "SFC /scannow" responses on Microsoft forums that have nothing to do with the actual issue. The article you linked refers to ESXi Boot issues, not VM boot issues.
Only during power on operations from powered off state.
Thanks, just spoke with an engineer who also spoke to you via Reddit I think, he is investigating it, as this is very odd.
Hello.
Of course, your problem is with the power on of some VMs.
I pointed you to KB76159 because you are precisely at that level ESXi 6.7 Update 3 (Build 14320388), not as a solution to your problem.
Update:
vMotioning to any other datastore solved the issue. When vMotioning back to the original datastore, the issue reoccurred, so I thought it could be isolated to a single datastore.
However, after migrating all VMs except one off the datastore in question, I could no longer reproduce the issue. That leaves me thinking it's something related to queue depths.
Your posts degrade the quality of the forum. If I wanted generic instructions totally unrelated to my issue, I'd speak with VMware support.
Yes that is very strange indeed that an SvMotion back and forth resolves this issue... I have not come across this ever.
Wondering if this ever got a full solution and root cause. I have the same thing, however it is only 1 VM. All VMs use the same datastore but just the one hangs. It has 8GB of RAM assigned but took over 16 hours to power on.