In my lab, with a vSphere 7.0.1 environment, I installed NetApp VSC 9.7.1, which brings the VASA Provider for VVols, to test them on my NetApp storage connected via NFS.
The installation worked without issues, as did the migration from the NFS 4.1 datastores to the VVol datastore.
Then I noticed a problem with the VCSA: VCSA on a VVol? Is this supported?
After a bit more testing I noticed that this is a general vMotion problem whenever the VM is on a VVol datastore. When the same VM is on an NFS 4.1 datastore, I have no issue.
I first opened a case with NetApp, but they didn't find anything in the logs and told me to open a case with VMware as well. I opened one there a week ago and provided all the logs, but they took a very long time to decide which department should work on the case. Yesterday I talked to support for the first time, and today we finished the tests to confirm it: yes, the problem occurs only during a normal vMotion and only when the VM is on a VVol. He will now forward the case to the storage team.
While I'm waiting for support, I thought I'd write down my issue here; perhaps someone else has had a similar problem...
The vMotion from one host to another gets stuck at 85%, and the VM freezes for about 30 seconds; then the vMotion continues and finishes, and the VM is running again.
Here are some log excerpts:
Log of the VM on the source host:
2020-10-14T15:52:14.200Z| vmx| W003: VMX has left the building: 0.
VMKernel log from the source host:
2020-10-14T15:52:14.251Z cpu4:2105329)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_VMDK) VVol device rfc4122.80207299-548e-459c-bc0c-4d45318cfae2
2020-10-14T15:52:14.332Z cpu18:2099869)VVol: VVolRemoveDev:7163: Unlinking (VVOL_OBJTYPE_CONFIG) VVol device rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0
The VM left the source host at 15:52:14 (17:52:14 local time), so it must have started on the destination within the same second...
Log of the VM on the destination host:
2020-10-14T15:52:14.190Z| vcpu-0| I005: Transitioned vmx/execState/val to poweredOn
2020-10-14T15:52:14.191Z| vcpu-0| I005: MigrateSetState: Transitioning from state 12 to 0.
2020-10-14T15:52:54.205Z| vmx| I005: DiskUpgradeMultiwriter: Upgraded open disk 'scsi0:0' from multiwriter.
There is a large gap in the log between second 14 and second 54; there is not a single message in between.
VMKernel log from the destination host:
2020-10-14T15:52:12.956Z cpu3:2103898)VVol: VVolMakeDev:6740: Creating a device for rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0 (Type VVOL_OBJTYPE_CONFIG)
2020-10-14T15:52:13.264Z cpu16:2103911)VVol: VVolMakeDev:6740: Creating a device for rfc4122.80207299-548e-459c-bc0c-4d45318cfae2 (Type VVOL_OBJTYPE_UNKNOWN)
2020-10-14T15:52:14.190Z cpu25:2103920)Hbr: 3731: Migration end received (worldID=2103906) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
2020-10-14T15:52:14.191Z cpu8:2103915)VMotion: 3230: 8288837917254555216 D: VMotion bandwidth in last 1s: 27 MB/s,
2020-10-14T15:52:14.194Z cpu3:2103923)Swap: vm 2103906: 5135: Finish swapping in migration swap file. (faulted 0 pages). Success.
2020-10-14T15:52:44.200Z cpu25:2103905)NFSLock: 3302: lock .lck-1c7bdce900000000 expired: counter prev 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 : curr 584 3fc5805f-1e9c2009-3763-ac1f6bc58788 (loop count 3)
This is the message I'm wondering about...
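If I read that NFSLock message right, the host polls the heartbeat counter in the .lck file and only declares the lock expired once the counter stays unchanged for several polling loops ("loop count 3" above). Here is a minimal sketch of that mechanism in Python; the poll interval is purely my assumption, only the loop count and the counter value 584 come from the log line:

```python
# Illustrative sketch, NOT ESXi's actual code: a lock is declared expired
# only after its heartbeat counter stays unchanged for several polls,
# which adds a fixed wait before the new host may touch the file.
POLL_INTERVAL_S = 10   # assumed poll interval, for illustration only
MAX_LOOPS = 3          # "loop count 3" from the vmkernel message

def loops_until_expired(counter_samples, max_loops=MAX_LOOPS):
    """Return how many polls it takes to declare the lock expired,
    or None if the counter keeps changing (owner still heartbeating)."""
    unchanged = 0
    prev = counter_samples[0]
    for i, cur in enumerate(counter_samples[1:], start=1):
        if cur == prev:
            unchanged += 1
            if unchanged >= max_loops:
                return i
        else:
            unchanged = 0
        prev = cur
    return None

# The owner stopped heartbeating at counter 584, as in the log line above:
polls = loops_until_expired([584, 584, 584, 584])
print(polls, "polls =>", polls * POLL_INTERVAL_S, "seconds of waiting")
```

With a 10-second poll interval that would make roughly 30 seconds of waiting, which matches the length of the freeze suspiciously well.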
Hostd log from the destination host:
2020-10-14T15:52:13.138Z verbose hostd [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] VMotion destination started; powering on
2020-10-14T15:52:13.213Z info hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] VigorMigrateNotifyCb:: hostlog state changed from emigrating to none
2020-10-14T15:52:54.219Z verbose hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] VMotionStatusCb : Succeeded
2020-10-14T15:52:54.219Z verbose hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] VMotionStatusCb: Firing ResolveCb
2020-10-14T15:52:54.219Z info hostd [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: VMX reports needsUnregister = false for migrateType MIGRATE_TYPE_VMOTION
2020-10-14T15:52:54.219Z info hostd [Originator@6876 sub=Vcsvc.VMotionDst.8288837917254555216] ResolveCb: Succeeded
2020-10-14T15:52:54.220Z info hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] Disk access enabled.
2020-10-14T15:52:54.221Z info hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] State Transition (VM_STATE_IMMIGRATING -> VM_STATE_ON)
2020-10-14T15:52:54.225Z info hostd [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vvol:fb1e3913ec4448e4-bf4e00000098990c/rfc4122.1edaed3d-4db9-44d6-a945-79567334ffa0/srv15 - Web-Server.vmx] Send config update invoked
Here is the same gap. And I'm wondering about the "Disk access enabled" message at second 54: why so late?
The main question: what happens between second 14 and second 54, and how can it be fixed?
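To spot such silent gaps quickly across the different logs, you can scan for timestamp jumps. A small Python sketch; the helper name and the threshold are my own, only the timestamp format is the standard ESXi log prefix:

```python
import re
from datetime import datetime, timedelta

# Matches the standard ESXi log timestamp prefix, e.g. 2020-10-14T15:52:14.190Z
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")

def find_gaps(lines, threshold=timedelta(seconds=10)):
    """Return (gap, previous_line, next_line) for every silent period > threshold."""
    gaps, prev_ts, prev_line = [], None, None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip continuation lines without a timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps

# Example with the vmware.log lines from the destination host above:
sample = [
    "2020-10-14T15:52:14.190Z| vcpu-0| I005: Transitioned vmx/execState/val to poweredOn",
    "2020-10-14T15:52:14.191Z| vcpu-0| I005: MigrateSetState: Transitioning from state 12 to 0.",
    "2020-10-14T15:52:54.205Z| vmx| I005: DiskUpgradeMultiwriter: Upgraded open disk 'scsi0:0' from multiwriter.",
]
for gap, before, after in find_gaps(sample):
    print(f"{gap.total_seconds():.3f}s gap before: {after.split('|')[0]}")
    # → 40.014s gap before: 2020-10-14T15:52:54.205Z
```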
Still no solution...
NetApp closed the case some weeks ago because VMware was too slow; they didn't want to wait any longer. And now VMware has closed the case too, because they didn't find anything in my logs, and to escalate the case I first have to bring the versions I'm using up to the latest versions listed in the VMware HCL.
O.k., NetApp missed refreshing the HCL information at VMware with the information from their IMT. They will do that now; I hope quickly, so that I can reopen the case...
A short update: according to the NetApp IMT, VSC 9.7.1P1 now supports vSphere 7.0.1, but the information is still missing from the VMware HCL...
And I still have the problem: when a VM is on a VVol datastore, the VM freezes for 30 to 40 seconds during vMotion.
So no change and no solution regarding the vMotion problem...
With these types of situations, unfortunately, only support can really help you. You would indeed need to have all solutions on a supported version and then file an SR. I doubt there's anything anyone here can help with, considering the level at which your problem occurs.
Only NetApp can help; they must first send updates to VMware so that the VMware HCL shows the same information as the NetApp IMT.
Or someone tells support to ignore the VMware HCL and trust the NetApp IMT.
This was the SR: 20164311910
But it was closed by VMware in November...
I got a note in the NetApp community that this is caused by bug 2668244, and a fix is targeted for ESXi 7.0 U2. There is a problem with NFS file locking that slows down the migration of VMs on VVol datastores.
So I tested the December beta:
host1 = ESXi 7.0.1 (U1c)
host2 = ESXi 7.0.2 (december beta)
When I do a vMotion from host1 to host2, the vMotion takes only 5 seconds.
But when going back from host2 to host1, it takes 40 seconds and the VM freezes for most of that time.
So I was indeed hitting the bug, and the bug is fixed in the next update version...
Somehow my reply wasn't posted last week. I checked the PR, and it is indeed supposed to be included in U2. I also see that a hotpatch was requested by another customer for this problem. I don't know what the requirements are for being able to request a hotpatch, but it may be something you could do as well...