BHagenSPI
Enthusiast

Storage vMotion vRDM to vSAN

I'm trying to move 2 large (6TB) "drives" to vSAN.

I have 6 hosts with local drives and the correct ratio of SSD to HDD. I've migrated our old 5.5 environment to this new stack and have 2 servers left. Both of them are now VMs on a single host (we're not running DRS or HA at the moment). I have iSCSI connections to an Openfiler datastore, and I present the 6TB drives to the Windows VMs via ESXi. So, no in-guest iSCSI connections.

I started a Storage vMotion and our environment fell to its knees. We killed a (new) HDD and kept filling up the cache SSDs to the point of having congestion errors... up to the 241 range. After days of pain (the vMotion was only 68% complete!), the task died, and we're back to the beginning.

I have an open ticket with VMware and they are working with me on this, but my tech isn't available right now, so I thought I'd ask this next question here.

As I've been poking around and researching online, I've noticed that everyone says (to add a vRDM) to add the iSCSI software adapter "to the host". I did that, but I only have an iSCSI software adapter added on ONE of those hosts.

So the question is: since vSAN spans *all* 6 hosts, do I need to add an iSCSI software adapter to the other 5 hosts and add the iSCSI connections to those two 6TB drives on each host? Would that increase throughput and relieve the congestion problems we've been seeing?
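For clarity, here is what I mean by adding the iSCSI software adapter "to the host": a minimal PowerCLI sketch of what I did on the one host, and what I would presumably repeat on the other five. The vCenter name, cluster name, and Openfiler address below are placeholders, not my real values.

    # Enable the software iSCSI initiator and add the Openfiler portal on every
    # host in the cluster, then rescan so the LUNs backing the vRDMs show up.
    Connect-VIServer -Server vcenter.example.local

    foreach ($esx in (Get-Cluster "Cluster01" | Get-VMHost)) {
        # Turn on the software iSCSI adapter (no-op if it is already enabled)
        Get-VMHostStorage -VMHost $esx | Set-VMHostStorage -SoftwareIScsiEnabled $true

        # Point the software adapter at the Openfiler target (dynamic discovery, default port 3260)
        $hba = Get-VMHostHba -VMHost $esx -Type iScsi | Where-Object { $_.Model -like "*Software*" }
        New-IScsiHbaTarget -IScsiHba $hba -Address "10.0.0.50" -Type Send

        # Rescan all HBAs so this host sees the same LUNs as the first one
        Get-VMHostStorage -VMHost $esx -RescanAllHba | Out-Null
    }

Whether presenting the same LUNs to all six hosts actually helps the Storage vMotion is exactly what I'm asking above.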


2 Replies
TechMassey
Hot Shot

We really need additional details.

1. Hybrid vSAN? Cache and Data disk sizes?

2. RDM Host - Gigabit or 10G?

3. Do you have any environment metrics? Any kind of resource metric (vSAN, CPU, memory, network throughput, etc.) is extremely helpful.

4. Single NIC? Dual NIC? How are the NICs separated: one for vMotion, vSAN, data, etc.?

My educated guess is you ran into a network and/or storage IOPS crunch.
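For point 3, even a quick per-host snapshot of network throughput and disk latency during the migration would tell us a lot. A rough PowerCLI sketch of what I mean (the cluster name is a placeholder):

    # Grab recent realtime samples of network throughput (KBps) and worst-case
    # disk latency (ms) for each host in the cluster.
    $vmHosts = Get-Cluster "Cluster01" | Get-VMHost
    Get-Stat -Entity $vmHosts -Realtime -MaxSamples 30 -Stat "net.usage.average","disk.maxTotalLatency.latest" |
        Group-Object { $_.Entity.Name } |
        ForEach-Object {
            $net = ($_.Group | Where-Object MetricId -eq "net.usage.average" | Measure-Object Value -Average).Average
            $lat = ($_.Group | Where-Object MetricId -eq "disk.maxTotalLatency.latest" | Measure-Object Value -Maximum).Maximum
            "{0}: avg net {1:N0} KBps, max disk latency {2:N0} ms" -f $_.Name, $net, $lat
        }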


Please help out! If you find this post helpful and/or the correct answer, mark it! It helps recognize contributions to the VMTN community (and, well, me too 🙂).
BHagenSPI
Enthusiast

I really just need to know if adding the iSCSI connections to all hosts in the cluster is the way to go. I've not tried adding the connections to more than one host, and can't find any info on whether this is even doable, let alone advisable.

In the meantime, I've figured out a cure for my congestion issues and have now successfully moved one of the 6TB data sets from vRDM to vSAN.

But first: a very good VMware tech explained to me that vRDMs are moved (using Storage vMotion) in 256GB chunks. 12TB / 256GB is a *lot* of chunks (about 48 of them). On top of that, my FTT policy is 1... so effectively I'm doubling the amount of data being moved, to 24TB. Six hosts with 2 cache drives and 4 storage drives each just can't handle that amount of motion, even though everything is on 10Gb networking.

So when I started the Storage vMotion, I saturated the cluster. As the SAS storage drives filled up, the smaller SSD cache drives started trying to take the load... and eventually filled up too. That made things slow. The activity exposed a flaw in one of the storage drives, and that drive "failed"... but not completely, so vSphere/ESXi/vSAN didn't know it had failed and kept trying to use it. Apparently this is a known issue with 6.0, with no cure, but fixed in 6.5. Too bad for me, because that failure brought my cluster to its knees. It took many hours to figure out what happened, with users down the entire time.

The answer? Upgrade to 6.5. Um, no; I will not. Next? Move less data at a time. Case closed.

Wow...helpful.

Actually, it was, in a way. With that info, I got it figured out. Here's the solution:

I created a vSAN storage policy with FTT=0. Then, with the iSCSI still connected to only one host, I started another Storage vMotion, but this time I chose the FTT=0 policy on the destination vSAN datastore, and chose only one of the 6TB data sets to move, not both. It took a long time, but the data set migrated just fine to my vSAN. At that point I changed the policy on that 6TB drive to FTT=1. It took about 25 minutes and was done. Then I removed the iSCSI connection from the host, and voilà: my 6TB vRDM to vSAN migration is complete! (And Veeam is now happily backing it up with no Windows agent.)
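For anyone who would rather script it than click through the migration wizard, here is roughly what that sequence looks like in PowerCLI. Treat it as a sketch only: the VM, disk, datastore, host, and target names are placeholders, and I did my migration through the wizard, not with this exact code.

    # 1. Create a vSAN policy that tolerates zero failures, so the migration
    #    writes one copy of the data instead of two.
    $rule    = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 0
    $ruleSet = New-SpbmRuleSet -AllOfRules $rule
    $ftt0    = New-SpbmStoragePolicy -Name "FTT0-migration" -AnyOfRuleSets $ruleSet

    # 2. Storage vMotion just ONE of the 6TB vRDMs to vSAN. In the wizard I picked
    #    the FTT=0 policy for the destination; if you script the move, make sure the
    #    disk lands under the FTT=0 policy (for example by making it the vSAN
    #    datastore default first), since I'm not aware of a policy switch on Move-HardDisk.
    $disk = Get-HardDisk -VM "FileServer01" | Where-Object Name -eq "Hard disk 2"
    Move-HardDisk -HardDisk $disk -Datastore (Get-Datastore "vsanDatastore") -Confirm:$false

    # 3. Once the data is on vSAN, flip the disk to FTT=1 and let vSAN build the
    #    second copy in the background (about 25 minutes for my 6TB disk).
    $ftt1 = Get-SpbmStoragePolicy -Name "Virtual SAN Default Storage Policy"
    Set-SpbmEntityConfiguration -Configuration (Get-SpbmEntityConfiguration -HardDisk $disk) -StoragePolicy $ftt1

    # 4. Finally, drop the Openfiler target from the host's software iSCSI adapter.
    $hba = Get-VMHostHba -VMHost "esx01.example.local" -Type iScsi | Where-Object { $_.Model -like "*Software*" }
    Get-IScsiHbaTarget -IScsiHba $hba -Type Send | Where-Object Address -eq "10.0.0.50" | Remove-IScsiHbaTarget -Confirm:$false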

2 more 6TB drives to go, and I'll be done.