VMware Cloud Community
Viktor_2018
Enthusiast

Snapshots on storage

Good day!

We have a Dell SC3020 storage array. A LUN was created on it to store virtual machines. The array took snapshots, after which the LUN filled up and the virtual machine stopped with the error: "msg.hbacommon.outofspace: "/vmfs/volumes/5c4b18ea-39e14e84-39e7-f4e9d4cf4810/AD/AD-000002.vmdk". Click Cancel to terminate this session". vCenter did not allow me to increase the LUN; I had to do it through ESXi. Is there any advice on how to use snapshots on the Dell SC3020?

34 Replies
Viktor_2018
Enthusiast

pastedImage_0.png

StephenMoll
Expert

The snapshots on the Dell SC are used as a mechanism to allow data to progress from being stored at RAID-10 to RAID-5 or RAID-6 (depending on whether you are using single or dual disk redundancy). If you are not using Dell SC snapshots to restore your systems to a previous point in time, you could consider creating your own storage profile, so you only have one snapshot in existence at any one time. If your application shows no noticeable benefit from having tier-1 RAID-10, you might also consider using a profile that writes directly to RAID-5/6.

Your screenshot is not the whole picture. It shows that your LUN (700GB) has very little space used (only about 500MB). This suggests that the problem is not with your LUN, but with the underlying array. You have to remember the Dell SC is ALWAYS thin provisioned, so it is very easy to define LUNs (Dell SC Volumes) that add up to more than the amount of physical storage available.

Take a look at the "Disks" section of the "Storage" tab in the DSM; I suspect you'll see the disks are nearly full. You should be careful that the SAN doesn't go into emergency mode, as this will impact all other LUNs on the SAN and prevent hosts and VMs from writing new data to them too.
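
To illustrate the thin-provisioning point above, here is a minimal Python sketch that compares the total size of all defined volumes with the physical capacity of the array. The function name and every number in it are hypothetical and are not read from the array in this thread; the real figures would come from DSM.

```python
def oversubscription(volume_sizes_gb, physical_capacity_gb):
    """Return (total provisioned GB, ratio of provisioned to physical capacity)."""
    if physical_capacity_gb <= 0:
        raise ValueError("physical capacity must be positive")
    provisioned = sum(volume_sizes_gb)
    return provisioned, provisioned / physical_capacity_gb


# Hypothetical example values, not taken from this thread.
volumes_gb = [700, 1500, 3000, 3000, 2000]
provisioned, ratio = oversubscription(volumes_gb, physical_capacity_gb=8000)
print(f"provisioned {provisioned} GB, ratio {ratio:.2f}")
if ratio > 1:
    # Thin volumes promise more space than the disks can hold.
    print("overcommitted: the array can fill up before any single volume looks full")
```

A ratio above 1 is not wrong in itself on a thin-provisioned array, but it means the disk usage has to be watched on the array, not only on the datastore.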

Viktor_2018
Enthusiast

Thank you so much!
And what is the advantage if the data goes from Raid 10 to Raid5? I use single-disk redundancy.
I took that screenshot only as an example. But in this screenshot the snapshot takes up 700GB, even though I have already moved the virtual machine off this LUN.
Storage 3.jpg
As I understand it, the disks are 50% full. But the Storage Types tab shows 90.79% full, and I do not understand why.

Storage.jpg
Storage2.jpg


Do you suggest not making snapshots but using another backup tool?
Have you managed to restore a virtual machine from a snapshot?
How do I remove a snapshot correctly?

Storage 4.jpg

IRIX201110141
Champion

And what is the advantage if the data goes from Raid 10 to Raid5?

Because "old" data takes less space on a RAID5 compared to RAID10. 

Do you suggest not making snapshots but using another backup tool?

A storage snapshot is not a backup. It can of course be part of your DR strategy, or your backup product can read the data from the SAN snapshot for performance reasons.

The SC needs a snapshot because

- For Data Progression (only when you have 2 or more tiers) and on-demand auto-tiering. Otherwise it can take up to 12 days before data changes tiers

- You want to be prepared for a mass restore. Think about what happens if a crypto trojan gets into your environment and you need to restore a huge VM

- Some people use SAN snapshots during the day and VM backups at night, and place the backup data on a different storage system.

When we deliver and set up an SC for a customer

- During the VM migration we create a custom Storage Profile, or use import to the lowest tier, to be sure that the bulk of our data goes into RAID5/6

- We enable the Snapshots AFTER the migration

The SC "allocates" space because the system needs to be prepared for data that may need to be written. I have sometimes seen the SC allocate in the wrong place, most likely on Tier 1 RAID10, because you have imported a huge amount of data through your Tier 1. Dell support can fix that.

About the % allocated for the disks: I think it is a high watermark, and as long as you don't run space reclamation (SCSI UNMAP) on the ESXi datastore, this value never decreases again. No block storage knows which data has been deleted by a host, because the host needs to tell the storage.
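
The high-watermark behaviour can be pictured with a toy model. The Python sketch below is purely illustrative: the class and its methods are made up, and it does not reflect how the SC or VMFS are implemented. It only shows why allocated space on the array stays high after files are deleted on the host, until the host runs a reclaim.

```python
class ThinVolume:
    """Toy model of a thin-provisioned volume (hypothetical, not Dell SC code)."""

    def __init__(self):
        self.host_blocks = set()   # blocks the host file system considers in use
        self.array_blocks = set()  # blocks the array has allocated on disk

    def write(self, block):
        self.host_blocks.add(block)
        self.array_blocks.add(block)

    def delete(self, block):
        # A delete only changes host-side metadata; the array is not informed.
        self.host_blocks.discard(block)

    def unmap(self):
        # Space reclamation: the host tells the array which blocks are free.
        self.array_blocks &= self.host_blocks


vol = ThinVolume()
for block in range(100):
    vol.write(block)
for block in range(60):
    vol.delete(block)
print(len(vol.host_blocks), len(vol.array_blocks))  # host uses 40, array still holds 100
vol.unmap()
print(len(vol.array_blocks))  # 40 after reclamation
```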

Btw: I can see a yellow flag on your SC. There must be a notice, or maybe you haven't changed the state from setup/maintenance back to production.

Regards

Joerg

StephenMoll
Expert

We've seen stuck snapshots before. The only easy way we've found of getting rid of one was to create a new Volume, create new vSphere datastores in the new Volume, copy everything we wanted from the old Volume (datastore) to the new one, remove the datastore associated with the old Volume, and delete the old Volume. This got rid of the snapshot too.

I can highly recommend reading this, if you haven't already : Understanding RAID with Dell SC Series Storage - Dell Engineering - September 2016

Basically that snapshot is MASSIVE, far bigger than it would normally be. A good set-up is going to depend largely on what your workloads are like. We have found that frequent snapshots (every 2 hours) with short expiry times (2 hours) maintain a high rate of data progression from RAID-10 to RAID-6, whilst only keeping one snapshot at any time. This is ok for us as we do not need or intend to revert the storage to a previous point in history. So for us the snapshot mechanism is purely there to aid data progression.

Your system is a single-redundancy system, so it uses RAID-10 and RAID-5-9.

Writes will normally be done to RAID-10, which is 50% space efficient, i.e. it uses twice as much disk space to write a given amount of data.

Data progression will move cold data to a lower tier and/or RAID-5, which is 80% efficient for RAID-5-5 or 88% for RAID-5-9 (as you have it), so yes there is an advantage for data being allowed to progress from RAID-10 to RAID-5.
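
The efficiency figures above follow from the stripe layout. Here is a minimal Python sketch, assuming RAID-10 keeps one mirror copy per data block, RAID-5-5 uses 4 data + 1 parity and RAID-5-9 uses 8 data + 1 parity (which works out to 88.9%, quoted above as 88%). The helper names are made up for illustration.

```python
def raid_efficiency(data_disks, redundancy_disks):
    """Fraction of raw disk space that holds user data in one stripe."""
    return data_disks / (data_disks + redundancy_disks)


# Stripe layouts assumed for the single-redundancy levels discussed above.
LEVELS = {
    "RAID-10": raid_efficiency(1, 1),
    "RAID-5-5": raid_efficiency(4, 1),
    "RAID-5-9": raid_efficiency(8, 1),
}


def raw_space_needed(data_gb, level):
    """Raw disk space consumed by data_gb of user data at the given RAID level."""
    return data_gb / LEVELS[level]


for level, efficiency in LEVELS.items():
    raw = raw_space_needed(100, level)
    print(f"{level}: {efficiency:.1%} efficient, 100 GB of data uses {raw:.1f} GB on disk")
```

So 100 GB of data that progresses from RAID-10 to RAID-5-9 drops from 200 GB to 112.5 GB of raw disk space.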

One key thing to remember (I think I said this before): Dell SC storage is only ever "thin provisioned". So when you create a 1.5TB Volume with a 1.5TB VMFS datastore in it, it won't be allocated and reserved 1.5TB (or more) of actual disk space. It will only ever get the amount of disk space required to hold the actual live data in whatever RAID format it is in, plus any snapshot overheads.

Viktor_2018
Enthusiast

From all the above, I realized that I want to use snapshots on the SC with a lifetime of no more than 1 day.


The SC needs a snapshot because
- I do not know this technology!
- Is a snapshot needed for this? If I get a crypto trojan, I can attach the snapshot to ESXi and restore the virtual machine files, since ESXi is connected via SAS.
- There is only one storage system, so there is nowhere else to put the data.

- We enable the Snapshots AFTER the migration
And what if I have created a LUN that stores 5 virtual machines and they migrate between ESXi hosts?

- Which software product do you use to back up virtual machines?


- Do you store all the virtual machines in one LUN?


Btw: I can see a yellow flag on your SC. There must be a notice, or maybe you haven't changed the state from setup/maintenance back to production.

At first there was a red error: the connection to the APC was lost.

Viktor_2018
Enthusiast

Thank you, I will delete this volume.

My English is poor, so it is difficult to give information.

So I need to set up a snapshot every 2 hours and have it deleted after 2 hours, and I will still get the performance? I plan to back up the virtual machines with Backup Exec. For now I do this by cloning, since the software has not been purchased yet.

And what software do you use for creating backup copies of virtual machines?

So if I create a 1.5 TB LUN with a 1.5 TB VMFS datastore on it, the snapshots will be stored in additional space, and the total might be 2 TB. How should I create a LUN for VMFS? How large should the LUN be for a 3 TB VMFS datastore?

StephenMoll
Expert

Dell support do all the calculations for us. I have asked about how the allocations to RAID-10 and RAID-5 are done, but I haven't seen an answer. I think the calculations are proprietary and not divulged.

Basic working assumptions are that 20 - 30% of actual data on a VMFS datastore and hence a SC Volume will be at RAID-10, and therefore will occupy 2x the space on the SAN disks. This is hot data.

70 - 80% will therefore be at RAID-5, for RAID-5-9 the actual data will be roughly 1.13x when on disk. This is cold data.

Then add snapshot overheads on top. I reckon a single snapshot overhead should be about 20 - 30% of your actual data, occupying 1.13x space on disk. This is effectively a cold copy of your hot data at a previous point in time.

These are very rough numbers, as I say we had to ask Dell for the recommended SAN size based on expected performance numbers we gave them.
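
The working assumptions above can be turned into a rough estimate. The Python sketch below is only that: it uses the 2x and 1.13x factors and the 20 - 30% ranges quoted above, the function name and defaults (the midpoint, 25%) are made up for illustration, and it is not Dell's sizing calculation.

```python
def estimate_disk_usage(data_gb, hot_fraction=0.25, snapshot_fraction=0.25,
                        raid10_factor=2.0, raid5_factor=1.13):
    """Rough raw disk usage for data_gb of live data, per the assumptions above."""
    hot = data_gb * hot_fraction * raid10_factor         # hot data at RAID-10
    cold = data_gb * (1 - hot_fraction) * raid5_factor   # cold data at RAID-5-9
    snapshot = data_gb * snapshot_fraction * raid5_factor  # one snapshot's overhead
    return hot + cold + snapshot


# Low, middle and high end of the 20 - 30% ranges for 1000 GB of live data.
for fraction in (0.20, 0.25, 0.30):
    usage = estimate_disk_usage(1000, hot_fraction=fraction, snapshot_fraction=fraction)
    print(f"{fraction:.0%} hot / {fraction:.0%} snapshot: about {usage:.0f} GB on disk")
```

With these assumptions, 1000 GB of live data lands somewhere around 1.5 to 1.7 TB of raw disk space when one snapshot is kept.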

We are not using a backup solution. None is required for the workloads we support. We do have very high redundancy on storage instead: dual Dell Compellent SANs, each running in dual redundancy, with dual controllers, multi-pathing and each SAN mirroring the other, with an independent tie-breaker node.

Viktor_2018
Enthusiast

Then it is not clear what size of LUN should be created for optimal VMFS operation.

And the document you linked says that dual redundancy is better for disks larger than 900 GB. So I need to rebuild the storage for dual redundancy and get RAID 10-DM and RAID 6-10?

We have just purchased this storage system, and I am still learning! 3 ESXi servers are connected to this storage via SAS.

IRIX201110141
Champion

We always have multiple datastores (LUNs/Volumes) because

- Never put all your eggs in one basket ;-)

- An SC Volume is always assigned to one of the two SC controllers. That controller does all the work for this Volume. If you have only one Volume, then only 1 of the 2 controllers does the work; the second idles until a failover occurs. If you have multiple Volumes, the load is spread across the 2 controllers

- You always assign a Storage Profile to a Volume. Maybe you need a volume that holds your SQL/Terminal Server VMs, and they require RAID10-like storage for reads and writes, so you assign the "Performance" Storage Profile. For other application VMs you can assign the "Balanced" profile, which means RAID10 for writes and then auto-tiering into RAID5. Your archive or low-performance VMs may use the "Capacity" profile, which stores the data as RAID5 for reads and writes.

- Maybe you want to enable "compression" for one datastore/Volume

- Maybe you want to use different snapshot schedules for one datastore/Volume

You can "convert" a snapshot into a new Volume with a mouse click, assign it to the hosts or cluster, perform a rescan at ESXi level and mount this new volume (after a new signature is written) as a new datastore, and you will see all VMs there. If you know how it works, it takes less than 3 minutes.

Our customers and we ourselves use Veeam B&R, Quest vRanger and other software. The backups are always stored on a different system, which means a server with local HDDs, a NAS like QNAP/Synology, or a Data Domain.

If you now create a new 1.5TB Volume on the SCv, present it to the ESXi cluster and format it with VMFS 6, the SC only needs a few megabytes for this because of thin provisioning

- Assign the "Capacity" Storage Profile (you can change it later to the one which fits your needs)

- Don't assign a Snapshot profile yet

- Use svMotion to migrate the VMs and please select "Thin Provision" during the process

- Watch your storage consumption on the SC during the migration

- It only takes a few clicks to expand an SC volume and ESXi datastore if you need a larger one

When you are finished with the first datastore

- Delete the datastore

- Delete the Volume and watch whether you get storage back

- Change the Storage Profile back to "Balanced" or whatever you need

- Add a Snapshot profile

Our Snapshot profile is "Company Name"

- Snap at 11:45 and hold for 4 days

- Snap at 14:45 and hold for 4 days

Our workload is typically 80% read, so the space for the changes (== snapshot size) is rather low.
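
To see how many snapshots such a profile keeps alive per volume, here is a small Python sketch. It is only an illustration of the schedule arithmetic, not of how the SC expires snapshots internally; the function name and the chosen point in time are hypothetical.

```python
from datetime import datetime, timedelta


def live_snapshots(times_of_day, retention, now):
    """Count snapshots, taken daily at fixed (hour, minute) times, still unexpired at `now`."""
    count = 0
    # Look back far enough to cover every snapshot that could still be alive.
    for day_offset in range(retention.days + 2):
        day = now.date() - timedelta(days=day_offset)
        for hour, minute in times_of_day:
            taken = datetime(day.year, day.month, day.day, hour, minute)
            if taken <= now < taken + retention:
                count += 1
    return count


# Hypothetical point in time; the schedule is the one described above.
now = datetime(2019, 3, 1, 18, 0)
print(live_snapshots([(11, 45), (14, 45)], timedelta(days=4), now))
```

Two snapshots a day held for 4 days means eight snapshots exist per volume at any time, whereas a snapshot every 2 hours with a 2-hour expiry leaves only one.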

Regards

Joerg

IRIX201110141
Champion

Viktor,

right now you store your backups on the one and only SC you have, right? I just noticed your "Backup" folder.

If you go this way...

- Assign the "Capacity" Profile to that Volume

- Don't use a Snapshot profile for that Volume

I am pretty sure that you have specified the number of retention points within your backup software and that it makes multiple incremental backups, right? If so, there is no sense in using snapshots on the storage as well. If you only use 1 retention point, then it looks different.

You can always switch from "Redundancy" to "Double Redundancy", which means RAID10 Dual Mirror and RAID6, BUT YOU HAVE TO WATCH YOUR STORAGE CONSUMPTION. The SC will take every block (512 bytes) and write it again, and this write now goes to R10DM, which needs 4x the capacity for a period of time.

Regards

Joerg

Viktor_2018
Enthusiast

Yes, right now I’m using one of the LUNs to store the backup virtual machine. Backup Exec is installed in this virtual machine and backs up the file storage.
2.jpg
Will the storage be faster and more reliable if I switch to dual redundancy? Do I have enough space for dual redundancy now?
Storage of 18 disks of 1.8 TB = RAID 10 / RAID 5-9, total volume 27TB
Storage of 18 disks of 1.8 TB = RAID 10-DM / RAID 6-10, total volume?
3.jpg
Do you propose to create NEW Storage Types that would work 2 controllers and distribute the virtual machines included on the other backups to one virtual machine?

1.jpg

IRIX201110141
Champion

Do you propose to create NEW Storage Types that would work 2 controllers and distribute the virtual machines included on the other backups to one virtual machine?

No, because you only need to create multiple Volumes. How many volumes in total have you already created? You have at least 2 volumes, right?

About Double Redundancy: yes, the storage vendors suggest choosing RAID6 because of today's large drives and the time it takes to rebuild a large drive after a failure. During the rebuild there is a chance of a second disk failure, and when this happens you will lose data.

From the pictures I can see that you run SCOS <= 7.2.x. I suggest that you call Dell and ask them to schedule an upgrade to > 7.3.x, because this release gives up to 40% more performance and, more importantly, it introduces the "distributed spare" feature instead of a single dedicated spare. Rebuild times are dramatically reduced!

How much space is required for all of your VMs? If there is enough space available, go for Double Redundancy; if not, stay at Single Redundancy.

We often choose Single for our Tier 1 (SSD) and Double for Tier 3 (10K SAS), because we don't like to see the waste of storage for RAID10 Dual Mirror (again, 4x the amount of storage is needed). But similar to the other poster, we also often have 2 SCs and use synchronous replication.

Normally the question of single or double needs to be answered when ordering the system, because the Dell Midrange Sizer returns the usable capacity and the number of drives needed, which has to be specified before buying the array.

How many backup points does your Backup Exec create, and what are your requirements for keeping the restore points on disk over time? How likely/often do you need to restore single items like files, SQL tables or Windows AD objects?

Regards

Joerg

Viktor_2018
Enthusiast

I just noticed that when I create multiple Volumes, one is placed on the Top Controller and the next on the Bottom Controller.

There are only 5 LUNs: 2 on the Top Controller, 3 on the Bottom Controller.

I will call Dell and schedule an update.

For security, do I need to create another Storage Type for the Disk Folder, or is one shared Storage Type enough?

Backup Exec creates an incremental copy at 13:00 and 23:00, as well as a full one once a week.

At the moment, only the SQL database and the file storage are backed up. Recovery is rarely needed, only when a file is deleted.

Copies of the virtual machines are stored on another LUN using the vCenter Clone tool.

Viktor_2018
Enthusiast

Am I setting up the snapshot correctly? Create it at 14:00 in the afternoon and keep it for 24 hours.

This LUN is intended for running virtual machines.

And which method is better to choose for snapshots?

pastedImage_0.png

StephenMoll
Expert

I'll leave you in the hands of IRIX201110141. I think his use-case is more aligned with yours. Ours is a very strange setup and I cannot advise on anything involving Dell SupportAssist, because all our systems are isolated from external networks, and not doing backups and the like. Good luck.

Viktor_2018
Enthusiast

Thank you for the help. You really helped. I hope I can contact you again in the future.

IRIX201110141
Champion

For security, do I need to create another Storage Type for the Disk Folder, or is one shared Storage Type enough?

Please don't create another Storage Type. Think about when you add 4 more enclosures with 100 disks. Then you can separate these, or some of them, into new Disk Folders to solve the problem that losing one enclosure brings down your entire system. If you have more than one Disk Folder, then you can think about creating another Storage Type and assigning it to the new Disk Folder.

With your 18 disks I can't see a reason to separate them.

Does your Backup Exec back up the entire VM, or is it just the agent inside, so that you save "just" the guest OS files?

Copies of the virtual machines are stored on another LUN using the vCenter Clone tool.

Storing a full VM clone as a backup on the same storage doesn't sound that "smart" to me. But I am unsure which problem you have to solve.

Yes, we want to be prepared for the following

- Deletion of an entire VM or parts of it

- A VM becomes unusable for various vSphere-related reasons

- The entire VMFS goes away

As a solution we use Volume snapshots on the SC for this

Pro:

- Catches all VMs, including newly created ones, and no manual work is needed

- Very space efficient, because a SAN snapshot works like an incremental and the SC uses a small block size of 512KB

Con:

- VM state is only crash consistent because we don't have SC Replay Manager licensed

- "Restore" takes a couple of minutes and needs some work by the admin

Regards

Joerg

IRIX201110141
Champion

  • Standard – When selected, takes snapshots in series for all volumes associated with the snapshot.
  • Parallel – When selected, takes snapshots simultaneously for all volumes associated with the snapshot.
  • Consistent – When selected, halts I/O and takes snapshots for all volumes associated with the snapshot.

We use "Standard" or "Consistent", but otherwise your settings are fine and correct.

Hint: If you right-click on the SC in the upper left tree, you can change the "defaults" for volume creation. So you can assign your Snapshot profile and Storage Profile as defaults for all newly created Volumes.

Regards,

Joerg
