Thursday, April 30, 2009

Welcome to the data dedupalooza

May 1, 2009 – I’ve been looking into data reduction (or capacity optimization) technologies for primary data storage recently, including data de-duplication, compression and others. Of course, compression and de-duplication are well-proven and now common on secondary storage devices such as VTLs and other backup/archiving systems.

However, even though vendors such as NetApp and Storwize have for some time been selling (or, in NetApp’s case, giving away for free) data reduction software tuned for primary storage, end-user adoption has been relatively slow.

It looks like that’s about to change.

For one, EMC’s entry into the market with (free) data de-duplication and compression for its Celerra file servers lends legitimacy to data reduction on primary storage devices. See “EMC adds de-dupe, SSDs to NAS.”

Two, NetApp is seeing some real traction in this space. The company claims to have shipped more than 30,000 systems with its ‘NetApp deduplication’ technology. The question is how many of those customers have actually fired up the data de-duplication functionality. The answer is “at least 15,000,” according to Larry Freeman (aka Dr. Dedupe), senior marketing manager for storage efficiency at NetApp. And it’s probably a lot more, because the 15,000 are only the systems that NetApp can track via its autosupport feature, which not every customer uses.

Since NetApp can de-dupe across all of its storage platforms (including FAS, V-Series, and VTLs), the next question is: Of those 15,000+ systems that are actually running de-duplication, what percentage are primary systems? That one’s difficult to track, but Freeman estimates that “about 60% of those systems are de-duping at least one primary application.”
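To put those numbers side by side, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the variable names are mine, and because the inputs are vendor-reported floors and estimates ("more than," "at least," "about"), the results are rough indications rather than hard counts.

```python
# Figures quoted in the post (NetApp's own numbers, not independently verified).
shipped_systems = 30_000   # "more than 30,000" systems shipped with de-dupe
active_systems = 15_000    # "at least 15,000" seen running it via autosupport
primary_share = 0.60       # Freeman's estimate: "about 60%" de-dupe primary data

# Share of shipped systems known to have de-duplication switched on.
activation_rate = active_systems / shipped_systems

# Rough number of tracked systems de-duping at least one primary application.
primary_systems = round(active_systems * primary_share)

print(f"Known activation rate: {activation_rate:.0%}")
print(f"Tracked systems de-duping primary data: roughly {primary_systems:,}")
```

In other words, roughly half of the shipped systems are known to be running de-duplication, and something on the order of 9,000 of those are using it on primary data.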

Interesting stats, but any way you slice it, EMC’s entry and NetApp’s success in data reduction for primary storage mean that this market is poised for takeoff, despite end users’ lingering concerns about performance, data availability and reliability, which the vendors have for the most part addressed.

And a rising tide lifts all boats (maybe), which is good news for the smaller players and startups in this space, including vendors such as greenBytes, Hifn, Storwize and Ocarina Networks (which recently inked about six partnership agreements).

All of the vendors go about data reduction in very different ways, but the end goal is the same: a reduction in the primary storage capacity required, which results in cost savings and/or purchase deferments, as well as power, cooling and space savings. In this economic climate, who can argue with that?

More on this topic as I delve deeper, but for now . . .

The best article that InfoStor has posted on this subject is “Primary storage optimization moves forward,” by the Taneja Group’s Eric Burgener.

Also check out the Data De-duplication and Space Reduction Special Interest Group, part of SNIA’s Data Management Forum.

And sign up to attend an upcoming InfoStor Webcast, titled "Leveraging Capacity Optimization to Reduce Datacenter Footprint and Storage Costs," Tuesday, May 12 at 10:00 PST, 1:00 EST. Presenters will include Noemi Greyzdorf, research manager, storage software, at IDC, and Peter Smails, senior vice president of worldwide marketing at Storwize. The Webcast will cover data reduction for all types of data, including primary, nearline and secondary data. To register, click here.
