VMware Cloud Community
godbucket
Enthusiast

Reclaim storage space on SAN caused by virtual disk migrations

I had a serious problem last year: my Compellent SANs were running out of free space, and FAST. It turns out this affects arrays from lots of storage vendors and is caused by virtual disk (vmdk) migrations. Any time a virtual disk moves, the SAN sees the move as new writes and doesn't relinquish the original space. So if you run lots of Storage vMotion (SVMotion) jobs, or disk-based backups that create snapshots and then delete them, this can become a serious problem!

So, here is the extremely condensed version of what I did to resolve the issue and ultimately reclaim 30TB, yes, THIRTY TERABYTES on my production arrays.

I ended up installing vCenter Orchestrator and using scheduled workflows to automate the process.

(I owe credit for this idea to my good friend Sean Howard at VMware)

There are a few catches, though. Here are the things you have to do before creating workflows in Orchestrator:

- you have to enable SSH on a host (or a few; depending on how many hosts you want running the workflows)

- you have to/should suppress the SSH security warnings on those hosts

- you have to enable SSH password authentication on those same hosts

- you have to enable use of the UNMAP commands for space reclamation (VAAI primitives)

- then you can issue the UNMAP commands against the datastores/LUNs

Enable SSH

SSH is required for command-line access to ESXi hosts.

  • Using the vSphere Client, select the ESXi host, then the “Configuration” tab, then under “Software” click “Security Profile”.
  • In the top-right corner, click “Properties”, then select “SSH”, then click “Options”
  • From here you can start or stop the SSH service and set the startup type to manual or automatic.

Disable SSH security warnings

If SSH is enabled, it will generate warnings on the summary tab of each host. These warnings can be suppressed.

  • Using the vSphere Client, select the ESXi host, then the “Configuration” tab, then under “Software” click “Advanced Settings”.
  • Locate “UserVars” in the left-hand pane, then change “UserVars.SuppressShellWarning” to 1.
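
The same setting can also be changed from the ESXi command line. This is a minimal sketch, assuming shell access to the host; the helper name suppress_shell_warning and the DRY_RUN switch are made up for illustration. By default it only prints the command.

```shell
# Dry run by default: prints the command instead of running it.
# Set DRY_RUN=0 on the ESXi host to actually apply the change.
suppress_shell_warning() {
  cmd="esxcli system settings advanced set --option /UserVars/SuppressShellWarning --int-value 1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

suppress_shell_warning
```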

SSH Password Authentication

To allow for applications such as vCenter Orchestrator to have root or shell access to an ESXi host, Password Authentication must be enabled.

  • Log into the console with SSH (putty).
  • Navigate to /etc/ssh and edit sshd_config using vi
  • Type vi sshd_config and press Enter.
  • Now you are in the vi text editor. You can move around using page up, page down, and the arrow keys.
  • Enter insert mode. You can press i to start editing where the cursor is, you can also press shift+o to start editing on a new line above the cursor, or o to start editing on a new line below the cursor.
  • Change the line "PasswordAuthentication no" to "PasswordAuthentication yes".
  • To save and quit, press Esc, type :wq, then press Enter.
  • To quit without saving, press Esc, type :q!, then press Enter.
  • Restart the SSH service on the ESXi host from the “Configuration” tab, under “Security Profile”.
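
If you would rather not edit the file by hand in vi, the same change can be scripted. This is only a sketch: enable_password_auth is a hypothetical helper name, and it assumes the file contains an uncommented "PasswordAuthentication no" line, as described above.

```shell
# Switch "PasswordAuthentication no" to "yes" in an sshd_config file.
# A copy of the original is kept next to it with a .bak suffix.
enable_password_auth() {
  cfg="$1"
  cp "$cfg" "$cfg.bak" || return 1   # back up the original first
  # Rewrite the file from the backup, changing only the matching line.
  sed 's/^PasswordAuthentication[[:space:]][[:space:]]*no[[:space:]]*$/PasswordAuthentication yes/' \
    "$cfg.bak" > "$cfg"
}

# On the ESXi host you would call:
#   enable_password_auth /etc/ssh/sshd_config
# and then restart the SSH service.
```

The function only touches the file you pass in, so you can try it on a copy of sshd_config first.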

Enable use of UNMAP commands for space reclamation

vSphere 5.0 introduced the VAAI Thin Provisioning Block Space Reclamation (UNMAP) primitive. This feature was designed to efficiently reclaim deleted space to meet continuing storage needs. ESXi 5.0 issues UNMAP commands for space reclamation during several operations.

  • Log into the console with SSH (putty) and issue the command below
  • esxcli system settings advanced set --int-value 1 --option /VMFS3/EnableBlockDelete
  • This is a per-host setting and must be issued on each ESXi 5.0 host in your cluster.
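
To confirm the setting took effect, you can list it back and check the value. A rough sketch follows; block_delete_enabled is a hypothetical helper, and it assumes the listing contains a line of the form "Int Value: <n>", so check the output on your own host before relying on it.

```shell
# Reads the output of "esxcli system settings advanced list" on stdin
# and succeeds only if the current integer value is 1.
# Assumption: the listing has a line that starts with "Int Value:".
block_delete_enabled() {
  value=$(awk '/^ *Int Value:/ { print $NF }')
  [ "$value" = "1" ]
}

# On the ESXi host:
#   esxcli system settings advanced list --option /VMFS3/EnableBlockDelete \
#     | block_delete_enabled && echo "EnableBlockDelete is on"
```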

Issue UNMAP command against VMFS datastore (manually)

The actual space reclamation command is as follows. It writes a temporary “balloon file” into the free blocks of the datastore to persuade the SAN into relinquishing that space.

  • Log into the console with SSH (putty)
  • Change directories into the VMFS datastore you wish to reclaim space from
  • cd /vmfs/volumes/datastorename (where datastorename is the actual name of the datastore)
  • Issue the UNMAP command
  • vmkfstools -y 60 (where 60 is the percentage of the datastore's free space that the balloon file will attempt to reclaim)
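
Because the percentage applies to free space, it is worth working out roughly how big the balloon file will get before you run this on a busy datastore. A small arithmetic sketch, assuming the file grows to about free space times the percentage; balloon_gb is a made-up helper name and the 2000 GB figure is just an example.

```shell
# balloon_gb FREE_GB PERCENT
# Prints the approximate size, in GB, of the temporary balloon file.
balloon_gb() {
  echo $(( $1 * $2 / 100 ))
}

balloon_gb 2000 60   # 2000 GB free at 60 percent: prints 1200
```

While the balloon file exists, that much free space is unavailable to the VMs on the datastore, which is a good reason not to push the percentage too high.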

So to automate this, what you can do is create custom scheduled workflows in vCenter Orchestrator and run something like this:

cd /vmfs/volumes/LUN-NAME;vmkfstools -y 60

Where "LUN-NAME" is the VMFS datastore name.
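
If you have several datastores, the command can be wrapped in a loop. This is a sketch under a few assumptions: the datastore names LUN01 to LUN03 are placeholders, names must not contain spaces, and the DRY_RUN switch is invented here so the script only prints what it would do until you set DRY_RUN=0 on the host.

```shell
#!/bin/sh
DATASTORES="LUN01 LUN02 LUN03"   # placeholder names, replace with your own
PERCENT=60                       # percentage of free space to reclaim
DRY_RUN="${DRY_RUN:-1}"          # 1 = print only, 0 = really run

reclaim() {
  ds="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "cd /vmfs/volumes/$ds && vmkfstools -y $PERCENT"
  else
    # Subshell so the cd does not leak into the next iteration.
    ( cd "/vmfs/volumes/$ds" && vmkfstools -y "$PERCENT" )
  fi
}

# One datastore at a time, so only one balloon file exists at once.
for ds in $DATASTORES; do
  reclaim "$ds" || echo "reclaim failed on $ds" >&2
done
```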

You will see tremendous spikes in front-end I/O while this runs. Don't be surprised if you see 25,000,000 KBps. This is normal, however, and it's not truly using that much. The physical host is just writing the balloon file to the datastore as fast as it can, then deleting it immediately.

That's pretty much it!!! I hope this helps someone.

12 Replies
RParker
Immortal

Good post, however you didn't consult Dell before you had the problem.

Dell Compellent actually has a "reclaim" tool to get the space back, which would have made ALL this hard work unnecessary...

So it's a good idea to at least consult the techs that support the product first..

Also, you said MOST storage vendors have a problem with reclaiming space. That's not true. I worked on NetApp, and for as many complaints as I had with NetApp, I didn't find it lacking in space or having configuration problems...

I don't see how you can MOVE a file to a different place on the SAN and NOT have the space reclaimed. You must be doing something weird or didn't set up permissions correctly, which could also have been remedied by a tech support session with Dell.

I think you did all of this on your own, which is fine, good learning opportunity, but if you had involved Dell you would have found out very quickly where your problem was.

I can guarantee it was NOT a compellent issue, this was a USER configuration issue.

godbucket
Enthusiast

Ugh... was only a matter of time before I encountered an abrasive know-it-all on here... GOD I hate arguing, especially in forums... anyway, here we go!...

Check the brakes buddy, I DID consult Dell/Compellent. Who do you think I worked with over the course of 2 months to accomplish this? Copilot Support Case# 133535. Look it up. Call 'em. They've written KB articles that outline what I discovered. I'm sure they'd be happy to provide you with the documents.

From Edward Sandberg, one of THE TOP senior escalation engineers at Copilot Support:


"You are correct, the ESX box does not inform us that the data has been moved. It rewrites it to the destination volume and then deletes its pointers to the data on the source volume without actually zeroing the blocks.
You have series 40s so if you are using ESX5 you could update your controllers to 6.0.5 and the new VAAI support would free space up when you do vmotions, but you'd still have to reclaim that space using UNMAP commands.
Otherwise the only way to get the space back is to delete the source volume you vmotioned the datastores from."
Edward Sandberg
Enterprise Engineer, Compellent Storage
Dell | Support Services – Copilot Support

They even thanked me for my discoveries. From Wade Stahlberg, in DEVELOPER support:

"Thanks so much for the info you sent in. I have been spreading the word with other Copilots. I don't have an ETA but we are planning changes in our documentation as well."

Wade Stahlberg | Copilot Solutions Engineer
Office 952.562.3021|
wade.stahlberg@compellent.com
Copilot Support | 866 EZSTORE | support@compellent.com

NOT TO MENTION, the nifty little "tool" you're referring to is called the "Windows Server Agent" and GUESS WHAT?!?! It only runs on WINDOWS! VMware ESXi, however, IS NOT Windows! :) If you've worked with VMware, you'd know this.

So you see, by throwing out unsolicited advice and opinions and NOT really knowing all the facts that surround a topic or issue, you're really doing more harm than good in these forums :)

souperstar
Contributor

Does anyone know if this is still necessary in ESXi 5.5 with Compellent Storage Center 6.3.10 and SC8000 controllers?

godbucket
Enthusiast

Yes, the process is still required with 6.3.10.106 on the SC8000s (that's exactly what I have); however, the process has completely changed.

With vSphere 5.5, they removed all the previous UNMAP commands and replaced them with much better ones.

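The new command, called from PowerCLI through the Get-EsxCli object, is:

$myEsxCli.storage.vmfs.unmap($null,"datastore-name")

(The first argument is the reclaim unit; passing $null leaves it at the default.)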

So you can run it from a PowerCLI script and schedule it as a task. Connect to the ESXi host first, then get the EsxCli object:

Connect-VIServer -Server servername -user username -password password

$myEsxCli = Get-EsxCli

$myEsxCli.storage.vmfs.unmap($null,"datastore-name")

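And if you don't want the username and password hanging out there in a text file, you can run this once interactively to save them to the credential store, then leave -User and -Password out of the script: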

Connect-VIServer "Server" -User user -Password pass -SaveCredentials

This will of course generate a ton of I/O against each datastore/LUN it runs against, so it may be best to schedule it after hours. The next day, after the LUN Replays expire, you'll see your space back.
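
For reference, the PowerCLI call wraps an ESXi 5.5 shell command that can also be run directly on the host over SSH. A minimal sketch: unmap_datastore and the DRY_RUN switch are made up for illustration, and "datastore-name" is a placeholder. By default it only prints the command.

```shell
# Print (default) or run the ESXi 5.5 reclaim command for one datastore.
# -l is the volume label; -n is the number of VMFS blocks to unmap per
# iteration (200 if you leave it out).
unmap_datastore() {
  label="$1"
  unit="${2:-200}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "esxcli storage vmfs unmap -l $label -n $unit"
  else
    esxcli storage vmfs unmap -l "$label" -n "$unit"
  fi
}

unmap_datastore "datastore-name"
```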

Hope this helps.

souperstar
Contributor

Thanks very much for the reply. I am going to look into this, and I know just enough PowerShell to be dangerous. If I make a working script, which I assume I'll be doing so that I can schedule it to run somewhat frequently, I'll be sure to post it here.

You've already done 95+% of the heavy lifting. Thanks so much! I can't believe there aren't more posts on this topic. Surely people migrate VMs on occasion?

Kahonu84
Hot Shot

Must be your lucky day. I haven't seen Mr. Parker in quite a while. He must have run out of refills.

godbucket
Enthusiast

Yeah, so here's what I have in-place and working. Hope it works for you.

Pick the ESXi hosts you want to run the reclamation processes from. Ideally, you'll want to spread the load across multiple hosts so they can run the jobs against multiple LUNs. It may be more efficient that way. Or you can run all the jobs from one host; it's up to you.

Once you decide on the host(s), connect to the console/desktop of your vSphere server running vCenter, pull up a PowerCLI prompt, and save the credentials for the ESXi hosts you've selected into the credential store. This way you won't have the usernames/passwords saved in text files. Do this for each host you want to run the jobs from:

Connect-VIServer esx01.companyname.com -User username -Password password -SaveCredentials

Then create a directory on the vCenter Server to host the space reclamation scripts themselves. Each of these '.ps1' scripts will look like this:

Connect-VIServer esx01.companyname.com

$myEsxCli = Get-EsxCli

$myEsxCli.storage.vmfs.unmap($null,"LUN01")

$myEsxCli.storage.vmfs.unmap($null,"LUN02")

$myEsxCli.storage.vmfs.unmap($null,"LUN03")

$myEsxCli.storage.vmfs.unmap($null,"LUN04")

$myEsxCli.storage.vmfs.unmap($null,"LUN05")

Then open the Windows Task Scheduler and create the scheduled tasks to run the scripts. Each scheduled task should call the PowerShell command and the appropriate 'arguments':

C:\WINDOWS\system32\windowspowershell\v1.0\powershell.exe

-PSConsoleFile "C:\Program Files (x86)\VMware\Infrastructure\vSphere PowerCLI\vim.psc1" "& 'C:\Space-reclamation-scripts\LUN01-05.ps1'"

Again, this works for me, and may not work for some. If it seems ghetto, I apologize in advance. I'm not much of a scripting guy. I just do what I can to get by.

Hope this helps you. Good luck!

souperstar
Contributor

Great feedback, thanks godbucket!  I'm no powershell guru, but I find myself using it somewhat frequently because I won't do these tasks manually.  Who has the time?  I don't get fancy with passing parameters or writing too many functions when a simple loop will do.  Others may need to comprehend my scripts some day, so I try to write them pretty plain with a ton of comments, because that someone may be me in two years and I won't recall a darn thing then.  Plus, I'm a network guy not blessed with innate programming ability.

This script should grab every Datastore presented to the ESXi host you specify and run the unmap command against a single datastore, then wait 5 minutes before running unmap on the next datastore.  If you have more than my 10 datastores, you might want to decrease the time in between.

##################################################

#Chris Redel | MOSERS | 02/11/2014

Clear-Host

#Notes:

#This script should grab all datastores presented to a specific ESXi host, and run the unmap

#    command on them sequentially with 5 minute breaks in between.

#

#Tested with PowerCLI 5.5

#KB Article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=205751...

#Forum Post: https://communities.vmware.com/thread/436628

#PowerCLI Reference: https://www.vmware.com/support/developer/PowerCLI/index.html

#

## STUFF YOU MAY EDIT ############################

$pServer = "esx1"                            #An ESXi servername or IP address in your environment

# You can use -SaveCredentials first to avoid having user and password info in this script, read below

# Connect-VIServer "Server" -User user -Password pass -SaveCredentials

$pUser = "username"                            #user with sufficient privileges on $pServer, use "root" if you want

$pPwd = "password"                            #password for $pUser

$pSleep = "300"                                #time to sleep, in seconds, between unmap commands on Datastores. Default "300" (5 minutes)

$pCertAction = "Ignore"                        #Action to take with Set-PowerCLIConfiguration -InvalidCertificateAction, default "Ignore"

$pSnapInName = "VMware.VimAutomation.Core"    #Name of PowerCLI snap-in, default "VMware.VimAutomation.Core"

##################################################

## DO NOT EDIT BELOW THIS LINE ###################

#Check PowerCLI snap-in, load if not already loaded

If ((Get-PSSnapin -Name $pSnapInName -ErrorAction SilentlyContinue) -eq $null ) {

    Add-PSSnapin $pSnapInName

}

#Set PowerCLI to ignore self-signed or invalid certificates

Set-PowerCLIConfiguration -InvalidCertificateAction $pCertAction -Confirm:$false    #-Confirm:$false avoids a prompt that would stall a scheduled run

#Connect to an ESXi host

Connect-VIServer -Server $pServer -user $pUser -password $pPwd

#Store Get-EsxCli output

$pEsxCli = Get-EsxCli

#Get all datastore names

$pDSs = Get-Datastore #-Name "DSiso's"        #use -Name to specify a specific datastore, great for testing

#Loop all the datastores, run unmap on each, sleep between runs

ForEach ($pDS in $pDSs) {

    $pEsxCli.storage.vmfs.unmap($null,$pDS.Name)    #Run unmap command ($null = default reclaim unit, then the datastore name)

    Start-Sleep -s $pSleep                        #Sleep before next run

    }

I ran it against a single datastore last night, with replays and Data Progression running later on.  When I checked this morning I'd saved ~3TB on my first tier (RAID 10) storage.  Can't wait to schedule this and see what happens...

godbucket
Enthusiast

Ha! Great script! Like I said, I suck at scripting, and you're obviously worlds beyond me. This looks great, thanks for sharing!

One noob question: what is "DSiso's"?

I'll probably trash what I have setup and use this. Thanks again!

souperstar
Contributor

Haha, that was a test datastore name. I forgot to take it out and leave just the "-Name" part. I knew I would do something like that. The script is a lot more readable if you put it into a PowerShell editor or download the .ps1 zip file; either would make things much clearer for anyone who is looking at this.

dwagner_xpert_t
Contributor

Thank you for this!

I know this is a bit old. But do you happen to know if this is still needed in 6.6.5 and also if the commands are the same for ESX 6.x?

Viewaskew
Enthusiast

Great thread. It prompted me to look up the What's New in 6.5 document, where I found this little gem that I had missed when it was first released.

From the What's New PDF for 6.5   https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/whitepaper/vsphere/vmw-white-paper...

Automated UNMAP

UNMAP is a VMware vSphere Storage APIs – Array Integration primitive that enables reclamation of dead or stranded space on thinly provisioned VMFS volumes. In vSphere 6.0, this can be initiated by running a simple ESXCLI command that can free up deleted blocks from storage. vSphere 6.5 automates the UNMAP process by which VMFS tracks the deleted blocks and reclaims deleted space from the backend array in background. This background operation ensures a minimal storage I/O impact due to UNMAP operations. UNMAP works at a guest OS level with newer versions of Windows and Linux.
