Arkady
Contributor

De-duplication in vSphere

Hello,

We are exploring an option to use "in-place" de-duplication for virtual Windows and Linux servers.

I would like to hear the pros and cons from people who use this technology.

My main concern is performance. I am also concerned about the following potential situation.

For example, say we use de-duplication on 100 Windows servers. Take one OS-related file: we would have 1 real file and 99 markers.

If this file gets corrupted or infected by a virus, I assume all 100 servers are potentially corrupted or infected, which would be a disaster. Is this a true statement?

Please share your opinion.

Thank you in advance!

6 Replies
weinstein5
Immortal

What product or method are you looking at?

Arkady
Contributor

NetApp or EMC. Both vendors offer a de-duplication option for NFS.

a_p_
Leadership

I would also be interested in knowing which product you are talking about. And what do you mean by "in-place" de-duplication?

Most dedup systems do delayed (post-process) de-duplication to provide sufficient write speed. I'm aware of only a few dedup systems that do online de-duplication (like Data Domain); however, these are usually used for backup, and it's not recommended to use them for virtualization.

If this file gets corrupted or infected by a virus, I assume all 100 servers are potentially corrupted or infected, which would be a disaster. Is this a true statement?

No! De-duplication just saves storage space by storing identical blocks of data only once, while transparently presenting the data to each system. If one of the 100 systems modifies a file, this will not affect the other systems.

André

PS: Saw your latest reply after posting the message. So ignore my question about the products!

Arkady
Contributor

NetApp uses de-duplication in its FAS storage systems, and EMC uses it in VNX devices.

By "in-place" I mean inline deduplication: removal of redundancies from the data before the backup.

mcowger
Immortal

There's always a performance hit with this, regardless of the technology used (FAS, VNX, etc.). The question is whether it actually gains you more than it costs you. In my experience, on systems that have been around for a while (i.e. not a batch of nearly identical systems that were just built), dedupe gets you very little (under 20%) compared to the performance cost. Now, if your VMs don't do much work, the savings might be worth a minimal performance hit. However, if you have medium-to-heavy VMs, the performance hit might not be worth it.
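To make the "freshly built vs. been around for a while" point concrete, here is a rough sketch of how a block-level savings figure can be estimated. It assumes fixed 4 KB blocks and whole-block SHA-256 matching; the function name `dedup_savings` and the synthetic images are made up for illustration, and the numbers it prints are not measurements from FAS, VNX, or any other array.

```python
import hashlib

BLOCK = 4096  # assumed block size; real arrays use their own granularity


def dedup_savings(images, block_size=BLOCK):
    """Return the fraction of blocks that block-level dedup could eliminate."""
    total = 0
    unique = set()
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            total += 1
            unique.add(hashlib.sha256(block).digest())
    return 0.0 if total == 0 else 1 - len(unique) / total


def make_block(tag):
    """Build one synthetic block filled with a single byte value."""
    return bytes([tag]) * BLOCK


# Synthetic 10-block "golden image".
base = b"".join(make_block(j) for j in range(10))

# Four VMs cloned from the same image and never changed.
fresh = [base] * 4

# Four VMs that each rewrote 8 of their 10 blocks with their own data.
aged = [
    base[:2 * BLOCK] + b"".join(make_block(16 * (k + 1) + j) for j in range(2, 10))
    for k in range(4)
]

print(f"fresh clones: {dedup_savings(fresh):.0%}")  # 75%
print(f"aged VMs:     {dedup_savings(aged):.0%}")   # 15%
```

The more the VMs diverge from their common starting point, the fewer identical blocks remain to be shared, which is why the savings shrink over time while the processing cost stays.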

(disclaimer: I work for EMC, the maker of VNX).

--Matt VCDX #52 blog.cowger.us
EdWilts
Expert

For example, say we use de-duplication on 100 Windows servers. Take one OS-related file: we would have 1 real file and 99 markers.

If this file gets corrupted or infected by a virus, I assume all 100 servers are potentially corrupted or infected, which would be a disaster. Is this a true statement?

In general, this is a false statement. De-duplication typically works at the block level. If one server changes that block, it gets its own unique copy of the data. The other 99 servers still see the original de-duped data block.
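The behaviour described above can be sketched as a toy block store with reference counting. This is only an illustration under simple assumptions (fixed 4 KB blocks, SHA-256 as the block fingerprint); the class `DedupStore` and its methods are hypothetical and do not represent how NetApp or EMC implement it. It models writes issued by a guest, not a media failure underneath the shared block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration


class DedupStore:
    """Toy block store: identical blocks are stored once and reference-counted."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes (physical storage)
        self.refcount = {}  # fingerprint -> number of logical references
        self.volumes = {}   # volume name -> list of fingerprints per logical block

    def _put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def _release(self, digest):
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]

    def write(self, volume, index, data):
        """Point one logical block at new content; shared data is never edited in place."""
        table = self.volumes.setdefault(volume, [])
        while len(table) <= index:
            table.append(None)
        old = table[index]
        table[index] = self._put(data)
        if old is not None:
            self._release(old)

    def read(self, volume, index):
        return self.blocks[self.volumes[volume][index]]


store = DedupStore()
os_block = b"A" * BLOCK_SIZE

# 100 VMs all hold the same OS block: one physical copy, 100 references.
for i in range(100):
    store.write(f"vm{i}", 0, os_block)
print(len(store.blocks))  # 1

# vm0 overwrites its copy (for example, a virus modifies the file).
store.write("vm0", 0, b"X" * BLOCK_SIZE)
print(len(store.blocks))                 # 2
print(store.read("vm1", 0) == os_block)  # True
```

The write from vm0 only repoints vm0's own block table at a new physical block; the other 99 VMs keep referencing the original one.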

.../Ed (VCP4, VCP5)