VMware

Jamie Orth's Blog

VMware related blog of Jamie Orth. Systems Engineer, Bartow FL


Could this be it??

Posted by jamieorth May 9, 2009


You know, I have used a lot of different software products over my 20-year career in IT. Some good, some not so good, and I am sure you have all been there. I have always found that what makes a good product is the support and staff behind it. I have used VMware since the early 2.5 days and have never looked back. If you have read all of my blog posts, then you know that I have been at 4 different financial institutions in the past year after being with the first for over 18 years. One common theme is that they have all been VMware shops (with the exception of Colonial Bank - I was trying to get them there). I have also been a fan of Vizioncore for some time. I have had success with vRanger, was an early beta tester of vReplicator, and my latest deployment from them was vFoglight.

When I arrived at Publix Credit Union, they were having a horrible time with vRanger - close to a 50% failure rate. My past experience tells me that if the infrastructure is sound, then the product just works. So I checked everything, and then checked it again. We changed things to isolate different parts of the infrastructure. Some of the changes made the issue less prevalent, but I was not going to be satisfied until we saw 100% success each and every night.

Our System Administrator already had tickets open with Vizioncore, VMware, and DataDomain and was not getting anywhere, but he was not taking an active approach to solving this problem. I dug deep and saw NFS errors and warnings in the vmkernel logs. Finally a breakthrough - perhaps. Our Cisco Engineer noticed something that he hadn't seen before: a Pause Request coming from the interface connected to the DataDomain appliance. Were we perhaps sending too much data at one time? Perhaps, but we had seen failures even when only one backup job was running. I wondered if the NFS mounts were simply timing out when the Pause Request occurred. It sounded logical, and it matched up with the warnings in the vmkernel logs.
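For anyone who wants to do the same kind of digging, here is a minimal sketch of how the NFS-related lines in a log could be tallied. This is not the process we actually used, just an illustration. The function names, the keyword list, and the idea that any line containing "nfs" is relevant are all my assumptions, not a documented vmkernel log format.

```python
import re
from collections import Counter

# Assumption: any line mentioning "nfs" (any case) is worth a look.
NFS_PATTERN = re.compile(r"nfs", re.IGNORECASE)
# Assumption: these keywords are guesses at how trouble shows up in the log.
SEVERITY_PATTERN = re.compile(r"(warning|error|timeout|timed out)", re.IGNORECASE)


def summarize_nfs_lines(lines):
    """Count NFS-related lines, grouped by the first keyword found in each."""
    counts = Counter()
    for line in lines:
        if not NFS_PATTERN.search(line):
            continue  # not an NFS line, skip it
        match = SEVERITY_PATTERN.search(line)
        key = match.group(1).lower() if match else "other"
        counts[key] += 1
    return counts


def summarize_log_file(path):
    """Run the same tally over a log file on disk (path supplied by the caller)."""
    with open(path, errors="replace") as handle:
        return summarize_nfs_lines(handle)
```

Running something like `summarize_log_file(...)` against a copy of the vmkernel log each morning would give a quick count to compare from night to night, which is easier than eyeballing the raw log.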

At the same time, I had been anticipating the release of vRanger 4.0 DPP - a totally new architecture and some features that may help us out, like the ability to restart a failed job. So I was reading Jason Mattox's blog about the upcoming features and left him a comment. I had also typed up, from start to finish, everything we had done and seen, along with what I thought could be the issue. I posted this in several places, including Vizioncore's forums. Here is the link to that post: http://supportforums.vizioncore.com/forums/thread/12245.aspx

Jason responded by having us try some NFS settings in the Advanced Settings of the ESX host. At this point we were ready to try anything. Well, after making those changes we have had 4 nights in a row of 100% successful backups. Now, I am not holding my breath; I would like to see about a month of this before I call it the fix, but it sure looks good for now. Also, we are no longer seeing the warnings in the logs. That has to be a good sign, so could this be it?? Stay tuned.



