Since upgrading to vCenter 5.5 we have been experiencing an issue with Veeam backups: when the snapshot is removed, disk consolidation sometimes fails for VMs on a specific host (see below).
Currently this is localised to one host and affects virtual machines running on both local storage and an iSCSI SAN.
Our vCenter instance is on version 5.5.0 Build 1476327 and our ESX hosts are on 5.5.0 Build 1331820 (Dell Customized).
If anyone has any insight it would be much appreciated!
Make sure your current Veeam version is compatible with vSphere 5.5. If you find no error or panic logs in vSphere, it would be more appropriate to raise a support request with Veeam.
We are on the latest version of Veeam, so it is compatible with vSphere 5.5.
I have opened a case with Veeam but they have told me to investigate with VMware as well.
Where would be best to look for logs in vCenter/vSphere?
Let me guess ... the VM is running Win 2008 R2 or 2012 and runs a database ?
To prevent this from happening again - check VSS functionality inside the VM. Does the VM have the latest Windows updates ? In case you have not rebooted the VM in months - reboot it occasionally.
Do you have this VM in a backup job that also runs other VMs ?
Do you have enough performance to run the job outside business hours ?
Do you monitor the jobs daily so that you notice a snapshot pile up early enough ?
Do you have a valid vmsd-file ?
Can you consolidate via vmkfstools or do you get errors because of locked files ?
Do you still have to consolidate the snapshots or are you looking for the reason why this happens ?
What are the errors reported by Veeam ?
I have about one case of this per week so I would have a bunch of further questions - for now it would help if you post a file listing of the VM's directory - a screenshot of WinSCP displaying full filenames would also help.
Before you ask - a screenshot of Datastorebrowser is not good enough
skype = sanbarrow
Partially true: the majority are running databases of some kind, though the OSes affected by this issue are 2003, 2008 and 2012.
The VMs are not 100% up to date with updates. I will check the VSS settings. These servers are rebooted only irregularly due to uptime concerns.
We have one backup job that backs up every server.
We should do: the servers have plenty of RAM, disk and network resources, and no one is usually in the office after business hours.
There is a VMSD file present in the folder of one of the affected VMs. What do you mean by valid?
I have not tried vmkfstools so far. We are still consolidating and are in dialogue with Veeam about why we are getting consolidation issues. As I said, the only thing that has changed is vCenter.
Veeam has not been reporting errors even with the snapshot commits failing.
> Veeam has not been reporting errors even with the snapshot commits failing.
That is unexpected - you should have found errors in the Veeam reports.
A vmsd file is invalid when Snapshotmanager displays no snapshots at all - or not all of them.
If the vmx-file says
scsi0:0.filename = "name-00000*.vmdk"
you definitely use snapshots - when none are displayed in Snapshotmanager the vmsd is invalid.
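For anyone who wants to script this check, here is a rough Python sketch of the rule above. It assumes you have already copied the text of the .vmx and .vmsd files off the datastore. The function names are made up for illustration, and the `snapshot.numSnapshots` key is an assumption about how the vmsd records its snapshot count, so verify it against your own files. It only covers the "no snapshots listed at all" case, not a vmsd that lists some but not all of them.

```python
import re

# Matches disk entries such as: scsi0:0.fileName = "name-000001.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|sata|ide)\d+:\d+)\.fileName\s*=\s*"([^"]+)"',
    re.IGNORECASE | re.MULTILINE,
)
# Snapshot delta descriptors end in -NNNNNN.vmdk (six digits).
DELTA_NAME = re.compile(r"-\d{6}\.vmdk$", re.IGNORECASE)


def disks_on_snapshots(vmx_text):
    """Return {device: filename} for disks that point at a snapshot delta."""
    return {
        dev: name
        for dev, name in DISK_LINE.findall(vmx_text)
        if DELTA_NAME.search(name)
    }


def vmsd_snapshot_count(vmsd_text):
    """Return the snapshot count recorded in the vmsd, or 0 if none is found.

    Assumption: the count is stored as snapshot.numSnapshots = "N".
    """
    m = re.search(
        r'^\s*snapshot\.numSnapshots\s*=\s*"(\d+)"', vmsd_text, re.MULTILINE
    )
    return int(m.group(1)) if m else 0


def vmsd_looks_invalid(vmx_text, vmsd_text):
    """True when the vmx runs on a delta disk but the vmsd lists no snapshots."""
    return bool(disks_on_snapshots(vmx_text)) and vmsd_snapshot_count(vmsd_text) == 0
```

Feed it the contents of the two files; a `True` result means the VM is running on a delta disk while the vmsd knows of no snapshot.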
Consolidation can be prevented if the directory has old vmsn-files that are no longer in use. But again this type of error would be displayed in Veeam logs.
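A quick way to spot such leftovers from a directory listing is sketched below in Python. It assumes the vmsd mentions each vmsn file it still uses as a quoted value (for example in a `snapshotN.filename` entry); that is an assumption to check against your own vmsd, and the function name is hypothetical. It only reports candidates and deletes nothing.

```python
import re


def unreferenced_vmsn(filenames, vmsd_text):
    """Return .vmsn files from the listing that the vmsd does not mention."""
    # Collect every quoted value in the vmsd that ends in .vmsn.
    referenced = {
        name.lower()
        for name in re.findall(r'"([^"]+\.vmsn)"', vmsd_text, re.IGNORECASE)
    }
    return sorted(
        f for f in filenames
        if f.lower().endswith(".vmsn") and f.lower() not in referenced
    )
```

Anything it returns is only a candidate for an old vmsn file; confirm that the VM really no longer uses it before moving it out of the directory.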
Can you list the content of the directory of one of the problem VMs ?
Looks like the vmsd-file is invalid.
Delete the vmsd and try a consolidation again
Checking if virtual machine consolidation is required
To check if virtual machine consolidation is required:
Select a vCenter Server host or a cluster and click the Virtual Machines tab.
Right-click the menu bar for any virtual machine column and click Needs Consolidation. The Needs Consolidation column appears.
A Yes status indicates that the snapshot files for the virtual machine should be consolidated and that the virtual machine's Tasks and Events tab shows a configuration problem.
A No status indicates that the files are OK.
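If you would rather script this check than click through the client, the same flag is exposed in the vSphere API as `consolidationNeeded` on a VM's runtime info. The Python sketch below only shows the filtering step. How you obtain the VM objects (for example through pyVmomi and a container view) is left out and is an assumption, and the helper name is made up for illustration.

```python
def vms_needing_consolidation(vms):
    """Return the sorted names of VMs whose runtime info flags consolidation.

    Each item in `vms` is expected to expose `.name` and
    `.runtime.consolidationNeeded`, as vSphere API VM objects do.
    """
    return sorted(
        vm.name
        for vm in vms
        # Treat a missing attribute (very old hosts) as "not needed".
        if getattr(vm.runtime, "consolidationNeeded", False)
    )
```

Running this after each backup window gives an early warning before snapshots pile up.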
Consolidating snapshots for a virtual machine
The remove snapshot process can take a long time to complete if the snapshots are large.
If the consolidation process is stopped before completing, it may result in data corruption.
The virtual machine performance may be degraded during the snapshot consolidation process.
It is recommended to take the VM down before consolidating snapshot(s). This will speed the consolidation.
To consolidate snapshots:
Right-click the virtual machine and click Snapshot > Consolidate.
Check the Needs Consolidation column to verify that the task succeeded.
If the task succeeded, the Configuration Issues message clears and the Needs Consolidation value is No.
There is one pattern on the subject we found so far.
Only machines having RDM LUNs show this problem.
Which machines are affected after backup varies, but they are always machines with RDMs. VMs that only have true vmdks assigned are not affected.
This has definitely been broken since SP3 (BE 2010).
We had SP2 before and never had any problem with that (but a lot of other problems instead).
Maybe vStorage API library files have changed with that SP.
We noticed similar behavior on some SharePoint machines (without RDMs). We use custom pre-freeze and post-thaw scripts. After the backup job, the VMs report that consolidation is needed.
When I try to consolidate the VM, an error says that it isn't possible. When I look in the VM folder on the datastore, I see a lot of disks.
I found out that at that moment a VMDK is still mounted on one Veeam proxy. When I remove the disk manually from the Veeam proxy, consolidation can be done and the message disappears. This keeps happening every day. After the weekend we noticed some errors in Veeam saying that quiescing wasn't successful (scripts), but the jobs complete and I don't see any consolidation errors. So it probably has something to do with VSS not working properly. This might be some extra information for your research.
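In case it helps others check for the same thing, here is a small Python sketch that flags disks attached to a proxy that live in another VM's folder. It assumes you already have the proxy's disk backing paths in the usual "[datastore] folder/file.vmdk" form (for example read from the proxy VM's settings). The function name and the sample names are hypothetical.

```python
import re

# Datastore paths look like: "[datastore1] folder/disk.vmdk"
DS_PATH = re.compile(r"^\[(?P<ds>[^\]]+)\]\s*(?P<folder>[^/]+)/(?P<file>.+)$")


def foreign_disks(proxy_folder, disk_paths):
    """Return disk paths attached to the proxy that sit in another VM's folder."""
    leftovers = []
    for path in disk_paths:
        m = DS_PATH.match(path)
        # Paths that do not parse are skipped rather than guessed at.
        if m and m.group("folder").lower() != proxy_folder.lower():
            leftovers.append(path)
    return leftovers
```

While a hot-add backup is running, such disks are expected. They are only suspicious if they are still attached after the job has finished.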