VMware Cloud Community
virtualDD
Enthusiast
Jump to solution

Migrating FusionIO to vSAN in a View Environment

Hi Guys,

Recently I was called to a customer to do a vSAN migration.

They had 4 hosts running with local FusionIO cards and SAN storage for their View environment with linked clones, where the VMs were provisioned on the local FusionIO (as configured in the View pools).

They had bought new SSDs to serve as the capacity tier and wanted to use the FusionIO cards for the cache tier in an All-Flash configuration.

We provisioned the SSDs per the best practices for the array controller in use and shut down all the pools and VMs running in that environment.

However, when we wanted to create the vSAN datastore, the FusionIO drive did not show up; only the new SSDs did.

At first we thought it had something to do with the firmware/driver combination in use, but since the hosts could already use the FusionIO for VMware View, that could not have been the issue.

My question to the community is this:

Do I have to remove all the View pools (or at least the pool configuration pointing to the FusionIO drives) to be able to create a vSAN datastore?

A little more background to that question: we tried to unmount the FusionIO datastore on the console and always got a "busy" error from the ESXi host, even though everything in the environment was shut down.
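For reference, the unmount attempt looked roughly like this on the ESXi shell (the volume label FusionIO-DS is just a placeholder for our real datastore name):

# list all mounted filesystems with their labels and UUIDs
esxcli storage filesystem list

# show the device and partition backing the FusionIO datastore
vmkfstools -P /vmfs/volumes/FusionIO-DS

# attempt the unmount by label -- this is the step that kept returning "busy"
esxcli storage filesystem unmount -l FusionIO-DS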


4 Replies
depping
Leadership
Jump to solution

Did you clean out the device? It needs to be empty before vSAN can claim it. Note that if you have active workloads on it, you will need to move them off first. vSAN will reformat it, so that means a fresh start.
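To be clear, "clean out" means the device should have no partitions left on it. A minimal sketch with partedUtil, assuming the FusionIO shows up as a device like naa.xxxxx (placeholder) and that everything on it has already been moved off, since this is destructive:

# find the device name of the FusionIO card
esxcli storage core device list

# show the current partition table on that device
partedUtil getptbl /vmfs/devices/disks/naa.xxxxx

# delete partition 1 so the device is empty -- destructive, data is gone afterwards
partedUtil delete /vmfs/devices/disks/naa.xxxxx 1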

Reply
0 Kudos
virtualDD
Enthusiast
Jump to solution

Thank you Duncan for your input. Last night we tried again. Here is what we did:

- We reconfigured the linked-clone pool to use SAN datastores and had View Composer do a rebalance to move the replicas to the SAN datastores (according to KB 1028754)

- We manually deleted zombie files on the FusionIO datastores, so with a capacity of 1.10 TB it showed 1.09 TB of free space

- We then tried to delete the datastore so vSAN could claim it in the next step. One host out of four succeeded in deleting it; the other three reported the filesystem as damaged or busy

- After restarting the management agents on the host where the delete was successful, the device showed up in the vSAN wizard (so we were partially successful)

- However, on the remaining three hosts we were facing a very stubborn VMFS datastore

Here's what we tried to delete this datastore:

- We found that a VMkernel coredump file had previously been configured on that datastore (esxcli system coredump file list). We tried to remove it with esxcli system coredump file remove -f (path to file) --force, but ESXi was not able to unlink the file (see the sketch after this list)

- We tried to find the lock with "vmfsfilelockinfo" and found out that the ESXi host itself holds a lock on the VMFS

- We tried to find the responsible process with "lsof" but could not find any process referencing that datastore

- We tried to use gparted and vmkfstools to delete that datastore, error: busy

- We tried to break the lock on the datastore with vmkfstools, error: invalid parameter/busy

- We rebooted one host and tried to delete the datastore right after the host became available again, to no avail; same error as before

- We reinstalled a host from scratch with the HP custom image for ESXi 6.0 U2 and added the FusionIO driver with a VUM baseline, but still could not delete that VMFS datastore on the FusionIO
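For completeness, the coredump and lock checks mentioned above looked roughly like this (the dump file path is a placeholder, not the exact path on our hosts):

# list the VMkernel coredump files configured on the host
esxcli system coredump file list

# deactivate the configured coredump file before trying to remove it
esxcli system coredump file set --unconfigure

# remove the dump file -- this is the step where ESXi could not unlink the file
esxcli system coredump file remove --file=/vmfs/volumes/FusionIO-DS/vmkdump/host.dumpfile --force

# report which host holds the lock on the file (by MAC address)
vmfsfilelockinfo -p /vmfs/volumes/FusionIO-DS/vmkdump/host.dumpfile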

Because it was already late and I had run out of ideas on how to remove that datastore, we rolled everything back. I am now trying to find a way to delete that datastore and would be glad for any suggestions.

My next approach might be something like booting a live Linux with the FusionIO driver and then trying to delete the datastore from there, but I think there should be an easier way to achieve that.

One host remains in maintenance mode, so any suggestions can be tried in a timely manner.

Reply
0 Kudos
depping
Leadership
Jump to solution

I would suggest reaching out to support. What you tried is what I would have tried as well. Not sure what else you can do at this point, to be honest.

Reply
0 Kudos
virtualDD
Enthusiast
Jump to solution

So today I was at the office and had time to do some additional research. We found the issue!

The ESXi hosts were installed on SD cards, so they had to pick a scratch location somewhere. I don't know how they ended up choosing the FusionIO datastore, since the driver for it was only added later (especially in the case of the host we freshly reinstalled yesterday), but that was the case on 3 of the 4 hosts.

On the one host we still had in maintenance mode today, we were able to reconfigure the scratch location, reboot, and then delete the FusionIO datastore.

Lesson learned: if a VMware error message tells you a filesystem/datastore is in use, it really is in use somehow.

Steps taken to solve the issue:

Find the scratch location and global log directory using PowerCLI:

Connect-VIServer [vCenter Servername]

Get-VMHost | Get-AdvancedSetting -Name "Syslog.Global.LogDir"

Get-VMHost | Get-AdvancedSetting -Name "ScratchConfig.ConfiguredScratchLocation"

Change the scratch location using SSH to the host:

ls /vmfs/volumes

-> Choose a datastore to your liking

mkdir /vmfs/volumes/[DatastoreName]/.locker-hostname

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/[DatastoreName]/.locker-hostname

reboot
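To double-check after the reboot, you can view the current scratch location on the host; and the same change could presumably also be pushed from PowerCLI instead of SSH. A rough sketch (we did it over SSH, so take the PowerCLI part as an untested alternative):

# on the host, after the reboot: confirm the scratch location really moved
vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation

# alternative: set the scratch location from PowerCLI instead of SSH
Get-VMHost [ESXi Hostname] |
    Get-AdvancedSetting -Name "ScratchConfig.ConfiguredScratchLocation" |
    Set-AdvancedSetting -Value "/vmfs/volumes/[DatastoreName]/.locker-hostname" -Confirm:$false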



We'll now do the migration this coming Friday, and I don't expect any more issues.