You know, I have used a lot of different software products over my 20-year career in IT. Some were good, some not so good, and I am sure you have all been there. I have always found that what makes a good product is the support and staff behind it. I have used VMware since the early 2.5 days and have never looked back. If you have read all of my blog posts, then you know that I have been at 4 different financial institutions in the past year after being with the first for over 18 years. One thing they all had in common was that they were VMware shops (with the exception of Colonial Bank - I was trying to get them there). I have also been a fan of Vizioncore for some time. I have had success with vRanger, was an early beta tester of vReplicator, and my latest deployment from them was vFoglight.
Our System Administrator already had tickets open with Vizioncore, VMware, and DataDomain and was not getting anywhere, but he was not taking an active approach to solving our backup failures. I dug deep and saw NFS errors and warnings in the vmkernel logs. Finally a breakthrough - perhaps. Our Cisco Engineer noticed something that he had not seen before: a Pause Request coming from the interface connected to the DataDomain appliance. Were we perhaps sending too much data at one time? Perhaps, but we had seen failures even when only one backup job was running. I wondered whether the NFS mounts were simply timing out when the Pause Request occurred. It sounded logical, and it matched up with the warnings in the vmkernel logs.
At the same time, I had been anticipating the release of vRanger 4.0 DPP, which has a totally new architecture and some features that may help us out, like the ability to restart a failed job. So I was reading Jason Mattox's blog about the upcoming features, and I left him a comment. I had also typed up, from start to finish, everything we had done and seen, along with what I thought the issue could be. I posted this in several places, including Vizioncore's forums. Here is the link: http://supportforums.vizioncore.com/forums/thread/12245.aspx
Jason responded by having us try some NFS settings in the Advanced Settings of the ESX host. At this point we were ready to try anything. Well, after making those changes, we have had 4 nights in a row of 100% successful backups. Now, I am not holding my breath; I would like to see about a month of this before I call it the fix, but it sure looks good for now. Also, we are no longer seeing the warnings in the logs. That has to be a good sign, so could this be it? Stay tuned.