I'm seeing strange behavior with our storage configuration on ESX 6.7U3 (Build 17713310).
We have a QNAP 1677XP 16-bay multi-tier storage, which is configured as follows:
We have two Dell ESX servers (both 6.7, Dell image 17167734), configured identically:
In the attachment you can find an image of the setup.
The storage contains two LUNs of 22.5 TB each. The first LUN is in use and contains VMs. The second LUN is empty and currently used only for migrations/updates.
So the problem is: whenever I copy VM files or migrate VMs from LUN0 to LUN1 or vice versa, the file system on the destination gets corrupted. Every time!
I can reproduce it every time in this configuration. I delete the LUN1 datastore in ESX and create it again, run a file system check -> file system is OK -> I copy 300 GB -> file system is corrupt.
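To narrow down whether the copied data itself is damaged (and not only the VMFS metadata), the reproduction above could be extended with a checksum comparison after each copy. The following is only a minimal sketch: the function names `sha256_of` and `compare_trees` and the datastore paths in the usage note are my own placeholders, not anything from the setup described here, and it assumes the VM files are not locked by a running VM.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_dir, dest_dir):
    """Return the relative paths of files that are missing on the
    destination or whose content differs from the source."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

For example, `compare_trees("/vmfs/volumes/LUN0/testvm", "/vmfs/volumes/LUN1/testvm")` (hypothetical paths) would return an empty list if every copied file matches its source. A non-empty list with only one path active versus several would at least show whether file contents are affected too.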
What I checked:
What I tried this week:
So the problem occurs when more than one path to the iSCSI target is active. After all my checks and tests, I think the hardware is OK. Now I'm really perplexed and don't know what is wrong or how to solve it. Maybe you have a tip for me.
The problem still exists. We have done several things over the last months:
There are no errors on the QNAP, on the RAIDs, or on the hard disks (I ran a long-term test), and no errors on the switch (HP 6600).
I have two LUNs on the storage, so I can migrate the data to a LUN where the file system is OK. It then runs for about two weeks before it crashes again, the last time with around 400 errors.
I know that this problem has nothing to do with the QNAP SSD cache (which we deactivated).
We have a second QNAP storage with 4x14TB HDDs in a RAID-5 configuration for backing up our VMs. This NAS is also attached to both ESX hosts via iSCSI (only one network cable, no trunking, no multipathing, etc.). There is no multi-tiering on that storage. We have copied around 90 TB of data over the last weeks and have not had any problems so far. OK, this storage is only for backups and is used two days a week, but it uses the same file system format (VMFS-6).
I can imagine that the multi-tiering might be the problem, but I cannot say definitively that it is the root cause.
Last Monday we removed our second LUN and created an NFS storage (NFS 3.0 with the VAAI plugin; NFS 4.1 with the VAAI plugin doesn't run successfully, and plugin version 3.2-001 seems to be buggy). I hope this solves the problem, because the file system handling will then be done by the QNAP directly.
I opened a ticket with VMware and with QNAP, but I don't think they can help me. In my opinion our hardware is OK (including the hardware that we removed).