Friday, December 10, 2010

Musings on the future of data dedupe

December 10, 2010 – I recently chatted with a few vendors in the data deduplication space. As conversations often do at this time of year, the talk turned toward the future of data deduplication. Here are a few snippets.

“Tier 1 storage vendors will move past point solutions for deduplication next year,” says Tom Cook, CEO at Permabit. “They’re working toward end-to-end deduplication, across SAN, NAS, unified [block and file], nearline and backup.”

“When that happens, once their customers ingest data and get it into a deduplicated state they’ll never have to re-hydrate that data throughout its lifecycle. The data will stay deduplicated through processes such as replication and backup. That’s a huge savings in workflow, footprint and bandwidth,” says Cook.

“Today, the big vendors use a variety of point solutions, but they’d like to use a single data optimization product across all their platforms, whether it’s block or file, primary or secondary. End-to-end deduplication will creep into the market in 2011 and 2012,” Cook predicts. (Permabit sells deduplication software – dubbed Albireo – to OEMs.)

Personally, I don’t think that single-solution, end-to-end deduplication will happen that quickly, in part because of the huge investments that the Tier 1 vendors have made in their “point solutions,” but we’ll see.

Dennis Rolland, director of advanced technology at Sepaton, has some predictions that are similar to Cook’s, as well as some differing opinions regarding trends in the data deduplication market.

“Dedupe will be required in more places going forward, including primary storage in addition to nearline storage, and end users will have to cut down on how many dedupe solutions they have because of the complexity in managing many disparate solutions,” says Rolland, “but we’ll probably still have distinct solutions for primary and nearline storage deduplication.”

Rolland thinks that the emphasis on deduplication benefits such as capacity, footprint and cost savings is shifting. “Dedupe enables low-bandwidth replication, which in turn enables companies to economically deploy DR [disaster recovery] sites,” he says.

Rolland also links two technologies that will no doubt make my list of The Hottest Storage Technologies for 2011 (assuming I get around to making such a list): data deduplication and cloud storage.

“Dedupe is an enabler for cloud storage,” says Rolland. “It makes it practical to deploy cloud storage because you’re sending, say, 10x less data over the WAN. That has significant implications for deploying cloud-based DR.”
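To make Rolland's point about WAN traffic concrete, here is a minimal sketch of dedupe-aware replication. It is not Sepaton's (or any vendor's) implementation: the fixed 4 KB chunk size, the in-memory dictionary standing in for the remote site, and the function names `chunk_hashes`, `replicate` and `restore` are all assumptions made for illustration. The idea is simply that the sender hashes each chunk and ships only the chunks the remote side has not already stored.

```python
import hashlib

# Hypothetical fixed chunk size; real products often use variable-size chunking.
CHUNK_SIZE = 4096


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return (sha256 digest, chunk) pairs."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


def replicate(data, remote_store):
    """Send only the chunks the remote store lacks.

    remote_store is a dict (digest -> chunk) standing in for the DR site.
    Returns (recipe, bytes_sent), where recipe is the ordered list of digests
    needed to rebuild the data.
    """
    recipe = []
    bytes_sent = 0
    for digest, chunk in chunk_hashes(data):
        if digest not in remote_store:
            remote_store[digest] = chunk  # this chunk crosses the WAN
            bytes_sent += len(chunk)
        recipe.append(digest)             # only the digest is needed for repeats
    return recipe, bytes_sent


def restore(recipe, remote_store):
    """Rebuild the original data at the remote site from its recipe."""
    return b"".join(remote_store[digest] for digest in recipe)


# Contrived demo: ten identical 4 KB blocks, so only one block is actually sent.
demo_store = {}
demo_data = (b"A" * CHUNK_SIZE) * 10
demo_recipe, demo_sent = replicate(demo_data, demo_store)
print(f"logical size: {len(demo_data)} bytes, sent over WAN: {demo_sent} bytes")
```

The demo data is deliberately artificial so that it lands on a 10x reduction. Real ratios depend entirely on how much redundancy the data contains, and the sketch ignores the (small) cost of sending the digests themselves.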

(Sepaton bundles data deduplication software with its virtual tape libraries, or VTLs.)

Meanwhile, Quantum released the results of an end-user survey this week that suggests U.S. companies could save $6 billion annually in file restore costs by adopting deduplication.

According to the survey of 300 IT professionals, respondents spend an average of 131 hours annually on file restore activities, with 65% restoring files at least once a week. Based on the average wage for IT professionals in the US ($31.55 per hour), that equates to $9.5 billion. However, Quantum’s survey also found that those companies that are most efficient at file restoration predominantly use deduplication and can complete restores in approximately one-third the average time of all respondents. So, according to Quantum’s press release: “If the broader US market was to achieve similar data restore efficiencies, the potential annual savings for US businesses would be approximately $6 billion.”
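For readers who want to check the math, the figures above can be roughly reconstructed as follows. This is my own back-of-the-envelope sketch, not Quantum's methodology: it assumes restore cost scales linearly with restore time, and the headcount it prints is derived by dividing the quoted total by the per-person cost, so it is an implied number rather than one taken from the survey.

```python
# Figures quoted in the post (from Quantum's survey and press release).
HOURS_PER_YEAR = 131        # average hours per respondent spent on file restores
HOURLY_WAGE = 31.55         # average US IT wage, USD per hour
TOTAL_COST = 9.5e9          # quoted national restore cost, USD per year
EFFICIENT_FRACTION = 1 / 3  # efficient shops take about one-third the average time


def restore_savings(total_cost, time_fraction):
    """Annual savings if restores took `time_fraction` of the current time.

    Assumes cost is proportional to time spent (an assumption of this sketch).
    """
    return total_cost * (1 - time_fraction)


cost_per_person = HOURS_PER_YEAR * HOURLY_WAGE
# Derived here, not stated in the survey: the headcount implied by the total.
implied_headcount = TOTAL_COST / cost_per_person
savings = restore_savings(TOTAL_COST, EFFICIENT_FRACTION)

print(f"Cost per IT professional: ${cost_per_person:,.2f} per year")
print(f"Implied headcount: {implied_headcount / 1e6:.1f} million")
print(f"Potential savings: ${savings / 1e9:.1f} billion per year")
```

Cutting restore time to one-third removes two-thirds of the $9.5 billion, which comes out a little above $6 billion and is consistent with the "approximately $6 billion" in the press release.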

This survey seems a bit misleading to me because it’s not really focused on the advantages of data deduplication per se in a file restore context but, rather, the advantages of disk-based backup/recovery.

Steve Whitner, Quantum’s product marketing manager for DXi, explains: “If you back up to regular [non-deduplicated] disk and you have a need for DR, you have to get that data to another site and you can’t keep data on conventional disk for very long – maybe a few days or a week. So the real issue is not the speed of restore; it’s the fact that companies can now store a month or two of deduplicated backup data on disk.”

You be the judge. Here’s Quantum’s press release and here are some supporting slides from the survey results.

One thing is clear: In 2011, the focus will shift from deduplication for nearline/secondary storage to deduplication for primary storage. Witness two of this year’s biggest storage acquisitions: Dell buying Ocarina Networks and IBM acquiring Storwize. (Storwize’s technology is now in the IBM Real-time Compression business unit.)

Related blog posts:

What is progressive deduplication?

Data deduplication: Permabit finds success with OEM model


DrDedupe said...

Nice post Dave. In addition to points mentioned, I've posted my musings here - I am a little more bullish in the area of pervasive deduplication

Happy Holidays


josephmartins said...

Dave, people need to get out of the backup/DR/primary mindset when discussing data dedupe. It's not about one type versus the other. To have that discussion is to miss the point of what intelligence such as dedupe can accomplish if we embrace it across the board.

In my opinion the key is in embedding such functionality into storage devices rather than relying upon separate external devices as we have up to now.

Our storage devices can be designed from the ground up to more intelligently and efficiently store, organize and manage our data on their own in a way that enables free exchange/flow/movement of information from one storage device to another without the hindrance (and information access "lock-in") of standalone intelligence.

Much the same way that RAID, audio and video processing intelligence were eventually integrated directly onto motherboards, we will see features such as dedupe integrated directly into storage devices. And just as RAID, audio and video are still available as plug-in cards, standalone dedupe options will continue to be available, but a majority of consumers will simply buy storage with dedupe embedded.