Traditionally, ZFS has had problems with de-duplication at scale, as well as performance degradation as pools begin to run out of space. Tegile forked the ZFS code when developing their product, and in doing so made some significant architectural changes to the way data is written to the file system, specifically how metadata is handled.
Tegile have a patented method of stripping metadata away from application/user data as it is ingested into the system. The metadata is then placed on a high-performance SSD cache, allowing it to be referenced very quickly. In addition to the metadata being housed on SSD, the de-duplication table is also housed on SSD (on a standard ZFS appliance this would normally live on disk). This means that as blocks enter the system they can be hashed and checked against the de-dupe table at very high speed; if a block is a duplicate, only a metadata entry and a de-dupe table update are made, with no need to write data to disk.
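To make that flow concrete, here is a minimal Python sketch of inline de-duplication as described above. All names and data structures here (InlineDedupStore, dictionaries standing in for the SSD-resident tables and for disk) are my own illustrative assumptions, not Tegile's actual implementation:

```python
import hashlib

class InlineDedupStore:
    """Toy sketch of inline de-dupe; structures are illustrative assumptions."""

    def __init__(self):
        self.dedupe_table = {}  # block hash -> (block id, refcount); on SSD in the article
        self.data_blocks = {}   # block id -> raw data; "disk" in the article
        self.metadata = []      # per-write metadata entries; on SSD in the article
        self.next_id = 0

    def write(self, block: bytes) -> bool:
        """Ingest one block. Returns True if data hit 'disk', False if de-duplicated."""
        digest = hashlib.sha256(block).hexdigest()
        if digest in self.dedupe_table:
            # Duplicate: bump the refcount and record metadata; no data write.
            block_id, refs = self.dedupe_table[digest]
            self.dedupe_table[digest] = (block_id, refs + 1)
            self.metadata.append(block_id)
            return False
        # New block: write the data once and create a de-dupe table entry.
        block_id = self.next_id
        self.next_id += 1
        self.data_blocks[block_id] = block
        self.dedupe_table[digest] = (block_id, 1)
        self.metadata.append(block_id)
        return True

store = InlineDedupStore()
store.write(b"A" * 4096)       # unique block: written to "disk"
store.write(b"A" * 4096)       # duplicate: metadata entry only
print(len(store.data_blocks))  # one physical block backs two logical writes
```

Because both the hash lookup and the metadata entry live on fast media, the duplicate-write path never touches spinning disk at all.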
The reason ZFS runs into performance issues as pools fill up is that it does not keep a single 'block bitmap' showing which blocks in the file system are in use and which are free. Instead, ZFS breaks the pool down into 'metaslabs', each of which stores its own free/used block information. As the file system fills up, the system may have to read through metaslab after metaslab to find free space for each write, which takes time; and since metadata writes/overwrites happen on average 6-10x more often than data block writes, performance starts to drop off quite significantly, especially when using features such as de-dupe, snapshots, clones etc.
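A toy simulation (my own sketch, not ZFS code) illustrates why that search gets slower as the pool fills: with per-metaslab accounting, a writer may have to scan past every exhausted metaslab before it finds free space, so the cost of each allocation grows with utilisation:

```python
# Illustrative only: counts how many metaslabs must be scanned to find
# free space, given a pool at a certain fill level.

def find_free_via_metaslabs(metaslabs):
    """Scan metaslabs in order until one with free space is found."""
    scanned = 0
    for ms in metaslabs:
        scanned += 1
        if ms["free"] > 0:
            return scanned
    return scanned

def metaslab_scans(fill_fraction, total=200):
    """Model a pool of `total` metaslabs where the first N are exhausted."""
    full = int(total * fill_fraction)
    metaslabs = [{"free": 0}] * full + [{"free": 1}] * (total - full)
    return find_free_via_metaslabs(metaslabs)

for fill in (0.1, 0.5, 0.9):
    print(fill, metaslab_scans(fill))  # scans required grow as the pool fills
```

Multiply that growing per-allocation cost by the 6-10x write amplification of metadata and the drop-off at high utilisation becomes easy to see.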
With Tegile's IntelliFlash OS this is mitigated by housing all metadata in L2 cache (SSD) and never on disk, allowing the system to be more heavily utilised without adversely affecting the overall performance of the array. In addition, having the de-duplication and compression features enabled allows more 'hot blocks' to be cached in either DRAM or the SSD read cache, accelerating things even further.
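The cache amplification point is just arithmetic. Assuming (my assumption, for illustration) that blocks sit in cache in their reduced form, a given amount of physical cache effectively holds 1/(1 - reduction) times as much logical data:

```python
def effective_cache_gb(physical_cache_gb, data_reduction):
    """Each cached block is stored reduced, so the cache effectively
    covers physical / (1 - reduction) worth of logical hot data."""
    return physical_cache_gb / (1 - data_reduction)

# 400 GB of SSD read cache at 50% data reduction behaves like 800 GB
# of logical working set coverage (illustrative numbers).
print(effective_cache_gb(400, 0.5))
```

So the same reduction features that save disk capacity also stretch the read cache, which is why the two effects compound.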
Tegile's de-duplication technology is in-line, and can be enabled or disabled on a per-LUN or per-share basis. Typical virtualised environments using Hyper-V or VMware see around 45%-75% data reduction.
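As a back-of-the-envelope check on what those figures mean on disk (the 45%-75% range is from the text above; the 1 TB workload size is an assumption picked for illustration):

```python
def physical_after_reduction(logical_gb, reduction):
    """Physical capacity consumed after the given data reduction ratio."""
    return logical_gb * (1 - reduction)

logical = 1000  # 1 TB of logical VM data (illustrative)
for r in (0.45, 0.60, 0.75):
    gb = physical_after_reduction(logical, r)
    print(f"{int(r * 100)}% reduction -> {gb:.0f} GB on disk")
```

In other words, 1 TB of VM images could land anywhere from roughly 550 GB down to 250 GB of physical capacity across that range.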
Hope that makes sense.