VMware Cloud Community
cdomanski
Enthusiast

VDP Appliance corrupted.. again!

This has happened to me a couple of times now, so I'm not sure if it's me or what?

I'm running the latest version of the VDP appliance (6.1), and backups have been running along great for a couple of months. Then all of a sudden one morning I get to the office, try to log in to the appliance through the vCenter Web Client, and it's not available. Logging into the vdp-configure page of the appliance comes up with this message:

The VDP Appliance has experienced an unsystematic shutdown and will likely require a checkpoint rollback to restore data protection functionality.  Initiate this process from the Rollback tab.

No services are able to start, and there are no checkpoint rollback points to go back to!

We have decided to invest in VDP and SRM as a backup/DR strategy, but we can't keep starting from scratch! Is this product really this unstable??

Help please!

3 Replies
MattiasN81
Hot Shot

I've seen this a couple of times, and every time I needed to redeploy the appliance.

But the culprit has always been the target storage and not VDP, so I suggest you take a deep dive into your backup storage.

If you are using iSCSI, take a look in the host's vmkernel log for iSCSI deterioration entries.
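
If you want a quick first pass (assuming you have SSH access to the ESXi host), something like this will surface the usual suspects; treat the patterns as a starting point, since the exact message wording varies by ESXi build:

# Run in the ESXi host shell; scan for iSCSI errors and latency warnings.
grep -i iscsi /var/log/vmkernel.log | grep -iE 'fail|timeout|abort'
# High latency typically shows up as "performance has deteriorated" entries:
grep -i 'performance has deteriorated' /var/log/vmkernel.log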

These are the things I have discovered that make VDP appliances unusable:

* Misconfigured iSCSI networks and high latency (see the vmkping check after this list)

* Port faults in FC switches and on storage arrays

* Storage controller reboots (depends on the storage array); I've seen this with HP MSA and whitebox solutions, but never with a more fault-tolerant array.

* Using deduplication at the storage level
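
For the first item, a quick way to sanity-check the iSCSI network path (assuming an ESXi shell, and that vmk1 is your iSCSI vmkernel port; substitute your own interface and target IP) is vmkping with a full-size frame:

# -I selects the vmkernel interface, -d disables fragmentation,
# -s 8972 tests a full 9000-byte jumbo frame (8972 bytes payload + 28 bytes headers).
vmkping -I vmk1 -d -s 8972 192.168.50.10
# If you are not using jumbo frames, test the standard 1500 MTU instead:
vmkping -I vmk1 -d -s 1472 192.168.50.10

If these drop packets or show high round-trip times, fix the network before blaming VDP.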

VMware Certified Professional 6 - DCV
VMware VTSP Software Defined Storage
Dell Blade Server Solutions - EMEA Certified
Dell PowerEdge Server Solutions - EMEA Certified
Dell Certified Storage Deployment Professional
Dell EMC Proven Professional
If you found my answers useful please consider marking them as Helpful or Correct
cdomanski
Enthusiast

Thanks for taking the time to give me those suggestions, I appreciate it.

I will dig a little deeper into my storage, and see if I can find anything, and post back here.

ctmsohio
Contributor

I've had this happen a few times as well. What typically works for me is to edit the VDP appliance VM settings and remove any hot-added virtual disks that are not associated with the appliance's XFS filesystems; you should be able to identify them by datastore location. Once you remove these, unmount all the filesystems and run a full xfs_repair check, e.g.:

# Unmount all the VDP data disks (sdb through sdz):
for i in {b..z}; do umount /dev/sd${i}1; done

# Verify no XFS volumes are still mounted:
mount

# Dry-run check each filesystem (-n reports problems without modifying anything):
for i in {b..z}; do xfs_repair -n /dev/sd${i}1; done

Make sure those all complete cleanly. 
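
If the -n (no-modify) check flags problems on any volume, my understanding is you then need a real repair pass on just those volumes; note that this actually writes to the disk, so I'd take a snapshot of the appliance VM first:

# xfs_repair without -n modifies the filesystem; snapshot the appliance before running it.
# Repeat for each /dev/sd${i}1 that the check flagged:
xfs_repair /dev/sdb1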

Reboot and it should recover. It may perform a gsan rollback; check top or ps aux | grep san. You can tail the log, e.g. tail -F /tmp/dpnctl-gsan-restart-output-6500 (you'll see the exact file name in the previous ps aux output).

Wait till that completes, then run status.dpn or dpnctl status all; you should get everything back up and running. I think this happens from HA events in our cluster: sometimes a host goes down, the VDP appliance gets HA'd to another host, the hot-added disks get stuck on the VM, and it doesn't recover cleanly. Definitely not VDP's fault, but it sucks when it happens.
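
For reference, the post-reboot sequence I follow looks roughly like this (the number in the gsan log file name changes per run, so pull the exact path from ps aux rather than copying mine):

# Find the gsan restart/rollback process and its log file:
ps aux | grep san
# Follow that log until the rollback completes (use the real file name from ps aux):
tail -F /tmp/dpnctl-gsan-restart-output-6500
# Then confirm all VDP services are back:
dpnctl status all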
