Tuesday, February 3, 2009

Consider data de-duplication plus compression

Data de-duplication gets a lot of ink and engenders plenty of debate, but the option of combining de-dupe with real-time compression on primary storage is rarely discussed, perhaps because few vendors target compression for primary storage.

That’s what Storwize specializes in, and the company recently released test results suggesting that combining de-duplication with data compression merits a serious look as a way to further enhance overall data reduction.

Storwize claims that adding real-time compression can improve data reduction by more than 200%, based on its internal testing as well as customer deployments. (In the internal tests, Storwize used data de-duplication solutions from both Data Domain and NetApp.)

Storwize positions data de-duplication as being particularly advantageous for highly redundant backup data sets, but points out that using real-time data compression on the front end (primary storage devices) can significantly reduce de-duplication processing time and enhance throughput, in addition to reducing overall capacity requirements.

To assess the validity of the test results, check them out here.

And if you’re among the early adopters of data compression on primary storage, respond below or email me at daves@pennwell.com and let me know what types of real-life data compression ratios you’re getting.

1 comment:

permabit said...

Dave,

Compression and deduplication are definitely "two great tastes that taste great together", but chaining them like this is a bit like eating a peanut butter sandwich and THEN a jelly sandwich -- they both taste good, but could be put together better. For file-level deduplication (also called single-instance storage), this is a fine strategy. With more advanced variable deduplication, however, you can lose a lot of your deduplication opportunities.

That's because of the nature of stream compression. A single byte change of input in a large file will make the compressed output past that point very different, which means that deduplication can't eliminate redundancies later in the file. The right place for compression is actually after deduplication has occurred, or at least the segmentation for deduplication.
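A small, self-contained Python sketch (the file contents and the 4 KB fixed-size chunking are my own illustrative assumptions, not anything from Storwize or Permabit) shows the effect: if two files differ in only their first byte, chunking the raw data still leaves every later chunk identical, while chunking the compressed streams leaves essentially nothing for de-duplication to find.

```python
import zlib

# Two ~1 MB "files" that differ only in their first byte.
block = bytes(range(256)) * 16          # 4 KB of repetitive, compressible data
file_a = b"A" + block * 256
file_b = b"B" + block * 256

CHUNK = 4096

def chunks(data, size=CHUNK):
    """Split data into fixed-size chunks (a stand-in for dedupe segmentation)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Deduplicate-then-compress order: chunk the raw files first.
raw_shared = sum(a == b for a, b in zip(chunks(file_a), chunks(file_b)))

# Compress-then-deduplicate order: chunk the compressed streams.
compressed_shared = sum(
    a == b for a, b in zip(chunks(zlib.compress(file_a)),
                           chunks(zlib.compress(file_b)))
)

# Only the first raw chunk differs; every chunk of the compressed
# streams differs, because the one-byte change perturbs the entire
# deflate output that follows it.
```

Running this, all raw chunks after the first are identical between the two files, while no chunks of the two compressed streams match.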

Our Permabit Enterprise Archive (http://www.permabit.com) incorporates both technologies in this order for maximum benefit. As data is being written, an in-line process breaks files up into variable-sized segments for optimal deduplication. Then these segments are compressed, deduplicated, and written to disk. This provides the best of both worlds.

Regards,
Jered Floyd
CTO, Permabit