Storage vMotion during sustained high I/O advisable?

taylorb · 04-04-2011

We have a huge batch process running. It generates hundreds of thousands of individual documents, takes 5-6 days to finish, and is pretty I/O intensive. Unfortunately, it is saturating the iSCSI SATA array it currently runs on, and at times I am seeing disk latency in seconds rather than milliseconds. I want to move the VM running this process to an FC SAN array in the hope that this will relieve the problem, but the job cannot be stopped at this point without making a mess. I am concerned that the additional overhead of a Storage vMotion could cause problems, so I was wondering if anyone has experience migrating storage live under an I/O-bound workload.

Thanks for any feedback.

Troy_Clavell · 04-04-2011

That is a tough one... What I might be inclined to do is move the other VMs off that LUN until your batch processing is complete. Once it is complete, move the other VMs back and move the high-I/O VM to FC storage. This may be a less disruptive approach.

taylorb · 04-04-2011

I had thought about that, but won't that create just as much additional overhead? It still has to read from the same LUN to move the other VMs. The VM running the batch is only 225GB, and there are 700GB worth of other VMs on that LUN, so that would mean moving roughly three times as much data.

Also, I don't know if it makes a big difference, but the I/O is lots of tiny writes: the week-long process creates hundreds of thousands of folders, each with 2 or 3 small files in it, but only about 100GB of data in total.
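To put rough numbers on the two options, here is a minimal sketch comparing how long each copy might take. The sizes (225GB for the batch VM, 700GB for the other VMs) come from this thread; the 50 MB/s effective copy rate is a purely hypothetical placeholder, and the real rate on a saturated iSCSI SATA array could be much lower. The point is only that copy time scales with the amount of data, so evacuating the other VMs means roughly three times the copying.

```python
# Rough copy-time comparison for the two options discussed in this thread.
# ASSUMPTION: ASSUMED_RATE_MB_S is a hypothetical placeholder, not a measured
# figure; on a busy array the effective rate could be far lower.


def copy_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy size_gb at a sustained rate_mb_per_s (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_per_s / 3600


ASSUMED_RATE_MB_S = 50.0  # hypothetical effective throughput while the array is busy

BATCH_VM_GB = 225   # the VM running the batch job (from the thread)
OTHER_VMS_GB = 700  # everything else on the same LUN (from the thread)

for label, size in [("move batch VM", BATCH_VM_GB), ("move other VMs", OTHER_VMS_GB)]:
    print(f"{label}: {size} GB ~ {copy_hours(size, ASSUMED_RATE_MB_S):.1f} h")
```

With the placeholder rate this prints about 1.3 h for the batch VM versus 4.0 h for the other VMs; whatever the real rate is, the ratio stays at roughly 700/225, about 3.1x.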

Troy_Clavell · 04-04-2011

If latency is, as you stated, already in the seconds range, how much worse can it get? I'm surprised the guests are even functional. At this point it may be worth either waiting it out or canceling the batch process. How are the other guests doing? I would guess not so well.

taylorb · 04-04-2011

From VMware's point of view, the latency is in the mid-100ms range, but the Windows VM running the batch was measuring seconds earlier; it's not too bad right now. Performance is on and off, depending on what the other VMs are doing.

I'm just going to try moving it. I have a feeling things are going to get really bad tomorrow morning when all the users log in at the same time again, since a lot of user data is on the iSCSI box.

taylorb · 04-05-2011

Well, the Storage vMotion actually went very well without making things noticeably worse, and once it finished, the batch job ran 2-3x faster on the nice, almost empty FC SAN array. For a job that takes a week, that should bring it down to 2-3 days. Another job well done by VMware, but of course I will take the credit!

Thanks for your replies, Troy.
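The 2-3 day figure above is back-of-the-envelope arithmetic: a week divided by a 2-3x speedup. Here is a minimal sketch of that estimate. The `fraction_done` parameter (how much of the job had already run on the slow array before the migration) is hypothetical, since the thread doesn't say how far along the job was.

```python
def estimated_total_days(old_total_days: float, fraction_done: float, speedup: float) -> float:
    """Estimated end-to-end job length in days.

    old_total_days: how long the whole job would take on the original array.
    fraction_done:  share of the work completed before the migration
                    (HYPOTHETICAL input; not reported in the thread).
    speedup:        how many times faster the remaining work runs after the move.
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    elapsed = old_total_days * fraction_done                     # time already spent on the slow array
    remaining = old_total_days * (1 - fraction_done) / speedup   # rest of the job at the new speed
    return elapsed + remaining


WEEK = 7.0  # "this job that takes a week"

for speedup in (2.0, 3.0):  # "2-3x faster"
    fresh = estimated_total_days(WEEK, 0.0, speedup)
    partial = estimated_total_days(WEEK, 0.25, speedup)  # hypothetical: 25% done before the move
    print(f"{speedup:.0f}x: {fresh:.1f} days from scratch, {partial:.1f} days if 25% was already done")
```

Starting from scratch, a week at 2-3x faster works out to roughly 2.3-3.5 days, about matching the 2-3 day estimate. Any time already spent on the slow array pushes the total up (4.4 and 3.5 days in the hypothetical 25% case).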
