First of all: what is your current state?
Do you need to fix any damaged VMs first?
For the moment, stop all Veeam jobs for the misbehaving VMs.
Make sure that the Veeam VM unmounts all disks it may still have in use. CHECK THIS! Do not skip this step.
Provide the vmware.log files of the affected VMs, and find the Veeam logs for those VMs.
Tell us what you need first - so that we can tell you what else we need to know.
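One way to do the "unmounts all disks" check is to look at the .vmx of the Veeam VM for disks that live in another VM's folder. This is only a rough sketch: it assumes the Veeam VM hot-adds the disks of the VMs it backs up, that its own disks sit in its own folder, and that the .vmx uses the usual `scsiX:Y.fileName = "..."` entries. The folder name `veeam-proxy` and the datastore `ds1` are made up for the example.

```python
import re

# Matches e.g.  scsi0:1.fileName = "/vmfs/volumes/ds1/othervm/othervm.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|sata|ide|nvme)\d+:\d+)\.fileName\s*=\s*"([^"]*\.vmdk)"\s*$',
    re.IGNORECASE,
)


def foreign_disks(vmx_text, own_folder):
    """Return (device, path) pairs for .vmdk files outside own_folder.

    Heuristic only: a file name without a path is assumed to sit next to
    the .vmx, so it is treated as the VM's own disk.
    """
    found = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if not match:
            continue
        device, path = match.group(1), match.group(2)
        if "/" not in path:
            continue  # relative name: same folder as the .vmx
        folder = path.rsplit("/", 2)[-2]  # directory that holds the .vmdk
        if folder != own_folder:
            found.append((device, path))
    return found


if __name__ == "__main__":
    sample = "\n".join([
        'scsi0:0.fileName = "veeam-proxy.vmdk"',
        'scsi0:1.fileName = "/vmfs/volumes/ds1/jmfxexch2016/jmfxexch2016-000001.vmdk"',
    ])
    for device, path in foreign_disks(sample, "veeam-proxy"):
        print(f"{device} still points at {path}")
```

Anything this reports should be removed from the Veeam VM (without deleting the files) before the affected VM is touched again.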
At this moment all the servers are up and stable. Veeam jobs have all been disabled for now and any snapshots have been manually consolidated. I will pull logs now and get them posted.
I found hints that automated backup tools may have issues when a VM mixes dependent and independent disk modes across its vmdks.
This may have the unexpected result that the independent flag is ignored and that removing snapshots after the backup takes much longer than expected.
That would explain some of your issues - however your VMware logs do not cover any snapshot automation.
Can you please check whether you have vmware.log files that were active while Veeam was running a backup job against the VM?
While searching for the message "
Please read that and compare your symptoms.
A vmkernel.log would be helpful as well.
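To see quickly whether a VM has vmdks in mixed modes, the .vmx can be scanned for the `.mode` entries. A minimal sketch, assuming the usual `scsiX:Y.mode = "..."` lines and assuming that a disk without a `.mode` line is a normal dependent (persistent) disk; the sample .vmx content is invented.

```python
import re

FILE_LINE = re.compile(
    r'^\s*((?:scsi|sata|ide|nvme)\d+:\d+)\.fileName\s*=\s*"[^"]*\.vmdk"',
    re.IGNORECASE,
)
MODE_LINE = re.compile(
    r'^\s*((?:scsi|sata|ide|nvme)\d+:\d+)\.mode\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)


def disk_modes(vmx_text):
    """Map each virtual disk device to its mode string."""
    disks, modes = [], {}
    for line in vmx_text.splitlines():
        match = FILE_LINE.match(line)
        if match:
            disks.append(match.group(1))
            continue
        match = MODE_LINE.match(line)
        if match:
            modes[match.group(1)] = match.group(2).lower()
    # Assumption: no .mode entry means a dependent (persistent) disk.
    return {disk: modes.get(disk, "persistent") for disk in disks}


def has_mixed_modes(vmx_text):
    """True if the VM has both independent and dependent disks."""
    kinds = {mode.startswith("independent") for mode in disk_modes(vmx_text).values()}
    return len(kinds) > 1


if __name__ == "__main__":
    sample = "\n".join([
        'scsi0:0.fileName = "sqlsvr.vmdk"',
        'scsi0:1.fileName = "sqlsvr_1.vmdk"',
        'scsi0:1.mode = "independent-persistent"',
    ])
    print(disk_modes(sample))
    print("mixed modes:", has_mixed_modes(sample))
```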
Let me look for some more logs, and one more note. The servers in question were on my other hosts for years and never had this issue. I migrated them over to this new server because it's latest gen and much faster than the other ones. Since that migration to this brand new server we are seeing these replication shutdowns. All disks were always dependent. I only switched them as a test, thinking maybe the SQL data drive and Veeam didn't play well, but then it happened to the mail server, so that theory went out the door.
The mailsvr log in the zip should be the one that, going by the modified date, reflects this error on the replication job in Veeam:
5/14/2019 12:20:51 PM :: Removing VM snapshot... (0% done) Details: The operation cannot be allowed at the current time because the virtual machine has a question pending:
'msg.hbacommon.corruptredo:The redo log of 'jmfxexch2016-000001.vmdk' is corrupted. If the problem persists, discard the redo log.
This is the same error I've been seeing on each of my servers on this new host, with the exception of my small ESET virtual appliance.
Desktop.zip 336.2 K
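To find out which logs actually contain the corrupt redo log message, the log text can be searched for the message id. A small sketch: it only relies on the `msg.hbacommon.corruptredo` id and the quoted .vmdk name as they appear in the Veeam message above, and it assumes the same text shows up on one line in the log being searched.

```python
import re

MARKER = "msg.hbacommon.corruptredo"
VMDK_NAME = re.compile(r"'([^']+\.vmdk)'")


def corrupt_redo_disks(log_text):
    """Return the .vmdk names reported as corrupted redo logs, in order seen."""
    hits = []
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        match = VMDK_NAME.search(line)
        if match and match.group(1) not in hits:
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    sample = (
        "'msg.hbacommon.corruptredo:The redo log of "
        "'jmfxexch2016-000001.vmdk' is corrupted. "
        "If the problem persists, discard the redo log."
    )
    print(corrupt_redo_disks(sample))
```

Running it over the log text of each affected VM gives a quick list of the snapshot disks involved, which can then be compared with the times of the Veeam jobs.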