Data deduplication is the process of examining data to identify and eliminate redundancy. In many backup environments, the same data gets backed up over and over again, consuming more storage space and driving up cost. Rather than storing another copy of the same file, a deduplicating system can store a pointer to the previously stored version, reducing the physical capacity needed for a backup job.
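As a rough illustration of the pointer idea, here is a minimal Python sketch of file-level deduplication. It is not how any product mentioned in this chat is implemented; the `DedupStore` class, its method names, and the sample file names are all hypothetical, and SHA-256 is simply assumed as the content hash.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}     # content hash -> bytes, each unique content kept once
        self.catalog = {}   # backup name -> content hash (the "pointer")

    def backup(self, name, data):
        """Record a backup; return True only if new bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[name] = digest   # always record the pointer
        return is_new

    def restore(self, name):
        """Follow the pointer back to the stored content."""
        return self.blobs[self.catalog[name]]

    def physical_bytes(self):
        """Bytes actually held on 'disk' after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
report = b"quarterly report contents" * 1000

stored_monday = store.backup("monday/report.doc", report)
stored_tuesday = store.backup("tuesday/report.doc", report)  # same content

logical_bytes = 2 * len(report)          # what the two backup jobs "wrote"
physical_bytes = store.physical_bytes()  # what is actually kept
```

In this sketch the second backup adds only a catalog entry, so the physical footprint stays at one copy of the file while both backups remain restorable.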

Our featured guest speakers for this chat include Greg White, Dell / EMC storage marketing manager for Global Commercial Marketing; Sanjeet Singh and Paul Davis from Dell’s Product Group; and Darin Camp from CommVault.

Technical Community - Background Reading

You may be considering adding deduplication to your environment, or you may have just heard all the buzz about it and want to know more about the specifics. This chat discusses deduplication in a backup setting and how it helps make the cost of backup to disk approach the cost of backup to tape. Join us for a chat with experts who can help you understand:

  • The capacity savings versus processing resources and time required for file, block, and sub-block–level deduplication methods
  • The potential benefits of deduplication, such as:

    • Shortening the backup window
    • Reducing storage capacity, power, cooling and space requirements
    • Eliminating redundant data transfers that are needlessly consuming network resources
    • Centralizing data protection and archive, reducing the burden on staff, and eliminating tape requirements at remote offices
    • Enabling cost-effective disaster recovery (DR)
  • How deduplication may move from backup to disk to archive and other areas with static or inactive content as the technology evolves
  • The pros and cons of source-based deduplication, such as reduced network congestion, common management, ease of use, and lower costs versus its limited availability
  • The pros and cons of target-based deduplication, such as ease of integration into complex environments and ease of implementation versus the cost and management considerations and the lack of any reduction in network congestion
  • The time and resource trade-offs of in-line versus post-processing deduplication
  • Other considerations, such as whether your data would be a good candidate for deduplication, deduplication ratios, deduplicating across multiple sites, protecting remote offices, and more
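To make the file-level versus block-level trade-off and the idea of a deduplication ratio more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not a description of any vendor's method: the 4 KB fixed block size, the synthetic "nightly backup" data, and the helper names `unique_bytes`, `split_blocks`, and `dedup_ratio` are all hypothetical.

```python
import hashlib


def unique_bytes(items):
    """Bytes that must be physically stored after deduplicating `items`."""
    seen = {}
    for item in items:
        # Keep the size of each distinct piece of content exactly once.
        seen.setdefault(hashlib.sha256(item).digest(), len(item))
    return sum(seen.values())


def split_blocks(data, block_size):
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedup_ratio(logical, physical):
    """Logical bytes backed up divided by physical bytes stored."""
    return logical / physical


BLOCK = 4096  # assumed block size, chosen only for the example

# Two nightly full backups of one file; only the last block changed.
night1 = b"".join(bytes([i]) * BLOCK for i in range(8))
night2 = night1[:-BLOCK] + b"\xff" * BLOCK
backups = [night1, night2]
logical = sum(len(b) for b in backups)

# File-level: the two files differ, so both are stored in full.
file_level = unique_bytes(backups)

# Block-level: more hashing work, but unchanged blocks are stored once.
block_level = unique_bytes(
    [blk for b in backups for blk in split_blocks(b, BLOCK)]
)

file_ratio = dedup_ratio(logical, file_level)
block_ratio = dedup_ratio(logical, block_level)
```

With this synthetic data, file-level deduplication saves nothing because the two files are not identical, while block-level deduplication stores 9 blocks instead of 16. The cost is that 16 hashes are computed and looked up instead of 2, which is the processing-versus-capacity trade-off the first bullet refers to.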

Deduplication White Papers

Chat Transcript

Good afternoon folks! Thanks for joining us for today's special Thursday edition Dell TechCenter chat on Deduplication. My name is Jeff Sullivan, the *newest* member of the Dell TechCenter team, focusing on Storage and Linux-related topics.
cgreenoh Nice, a Linux dude
A few housekeeping items: today's chat is recorded, and you can view the transcript the next day at the same link that got you to this page. Speaking of links: *** If you click a link, use a right-click; otherwise you'll likely be bumped out of the chat. *** Should you get disconnected, rejoin and select Action, Recent Room History to see the previous dialogue.
CommVault-DarinC Ward, we like compression at the client to minimize data transmitted between client and media agent
So, let's get to it! Deduplication—Today we have a number of guest speakers to answer all your dedupe questions: Greg White, Dell / EMC storage marketing manager for Global Commercial Marketing; Sanjeet Singh and Paul Davis from Dell's Product Group; as well as Darin Camp from CommVault; and more! Please feel free to throw out your question at any time.
Dell_gregorydwhite Anyone hear anything about this dedupe stuff before?
wwolfram What is the value add of data compression between clients? Easy to implement or low overhead, or?
JasonPowell I do some Datadomain resales on the side but that's probably going away now that NetApp grabbed 'em
JasonPowell Would love some Dedupe in EqualLogic :-)
Dell_gregorydwhite You never know; DD may end up at EMC
Dell_gregorydwhite For high-level information, check out
JasonPowell I also recall seeing a Dell hardware product bundled with CommVault for doing dedupe and backup GS
Dell_gregorydwhite We've got a good informative white paper coming out in the next few days that goes in depth as well
CommVault-DarinC Ward, data compression actually exists between the IDA on the client and the Media Agent.
JasonPowell We’re running CommVault here for backups, so I'm hoping to hear today about dedupe in CV :-)
Dell-JeffS There’s a new page on Dell TechCenter as well that might help a noob get started too:
CommVault-DarinC Ward, saves bandwidth for the network segments between data agents and media agents
cgreenoh How's the DL2000 compare to EMC's Avamar?
CommVault-DarinC Jason, you’re referring to the DL2000
JasonPowell Darin, yeah. I think that was what I'd seen
JasonPowell We also have an enormous amount of A/V stuff so I'd love to know what kind of dedupe compression we might see
CommVault-DarinC Jason, check out
Dell_PaulD Cgreenoh, the DL2000 is similar to the Avamar platform in some regards, such as where dedupe happens. But DL2000 has a broader offering in terms of supported client and features such as archive, search, etc.
cgreenoh Interesting, Does Dell offer the software to back up to existing disk chassis/PowerVaults we already have?
Cvitale Dedupe and compression are actually different but complement each other nicely
JasonPowell Darin, our editors all use Final Cut, so we end up with large (several Gigs) files. I'll go look at the file type
JasonPowell We have five 9 GB Quicktime movie files each weekend
Cvitale From a dedupe standpoint, you will only store each movie once, regardless of how many of the same copies you have saved
JasonPowell A quick scan shows a ton of Quicktime files
Cvitale Every deduped full will wind up looking like an incremental on disk
cgreenoh Interesting, I hadn't heard of this software before. Is it client- or hardware-side dedupe? Agents for all operating systems, I assume? Microsoft SQL? Is this the right place to be asking about it? :)
JasonPowell Does the latest version of Galaxy Express do any dedupe?
Cvitale No dedupe in Galaxy Express yet
JasonPowell When I contacted CommVault this spring they said call back in a few months
Dell_PaulD Cgreenoh, yes. This is the Simpana 8.0 software from CommVault. It does compression at the client side and deduplication at the media server. And, yes, it has all the agents needed for purchase (Microsoft SQL, Exchange, etc.)
Cvitale The client will create a hash and send the data across the wire to be matched with a database, and then the file gets stored or receives an updated pointer to the file if it exists already. This will allow for compression and encryption across the wire as well
Cvitale I don’t think any other dedupe products compress and encrypt client side
CommVault-DarinC Jason, I'll need to dig a bit and get back with you. I don’t want to give you wrong information.
Cvitale In that price range
JasonPowell Darin, thanks. We love the SMB version of Galaxy but getting information about it is like pulling teeth :-)
cgreenoh Interesting, this just might be the hot setup to replace an aging Backupexec infrastructure
CommVault-DarinC Jason, check out: for more information on Express
cgreenoh Because you know, everybody likes it when your agents randomly fail for no clear reason and you miss backups
DELL-ScottH Are there any good demos?
Dell-JeffS Has anyone deployed dedupe?
Dell_PaulD Cgreenoh, we are planning to have a demo (+ some training) on the Web site soon. You will be able to get to it from
Dell-JeffS In case you haven't seen it, Greg (aka @gregorydwhite on twitter), I put up a blog about dedupe a few days ago:
cgreenoh Paul, any idea when? I can set a reminder
Dell_PaulD I believe it should be up within a month
Dell-Sanjeet Cgreenoh, you can follow up with
cgreenoh Rock on, will send an e-mail
wwolfram Is there the ability to dedupe to tape?
JasonPowell Would you expect a WAN accelerator would improve dedupe off site?
cgreenoh I look forward to the day when I can drunken-dance in little clothing around a pile of burning tapes. Tape, I loathe you
Dell-Sanjeet Wwolfram, CommVault Simpana 8.0 has the ability to dedupe to tape. The DL2000, which includes Simpana 8.0 on the platform, also supports that
wwolfram Do I have the ability to turn on or off dedupe to tape with CommVault?
Dell-Sanjeet Wwolfram, yes. You can turn on or off dedupe to disk as well as dedupe to tape
Dell-JeffS Right-click on links to avoid getting bumped
cgreenoh I'm excited that dedupe is becoming more prevalent. I have a 3.5 TB Microsoft SQL database that takes 48 hours to fully back up right now
wwolfram How does Dedupe with the CommVault/DL2000 improve my backup performance?
wwolfram much does Dedupe...
CommVault-DarinC Jason, WAN accelerators always help when WAN pipes are too small or the data starts to grow beyond the WAN pipe's capabilities
Dell-Sanjeet DL2000 (with dedupe turned on) can perform backups up to the rate of 1.5 TB/hr. Here's a good white paper that describes how:
Dell-JeffS There are several white paper/resource links on
Dell-JeffS Also, on our wiki I've linked to those and a few other spots; and anything new that pops up:
Dell_PaulD Cgreenoh, are you moving totally to disk-based backup?
CommVault-DarinC Dedupe is very well suited to virtualized environments. Lots of redundant data with VMs
cgreenoh Paul, yes. I want to try, with tape for DR only. Factors are $$$ and time, and maybe patience...
Dell_PaulD Is replication interesting for DR?
cgreenoh So data > disk > tape. Replication to me is awesome, but not applicable in my environment. If I had the option I would do disk-based replication and drop tape completely
Dell-Sanjeet Cgreenoh, with tape deduplication you can at least reduce your headache of managing tapes—the number of tapes will go down
Cvitale Why won’t it work for you?
CommVault-DarinC Pauld, certainly is. You can replicate from media agent to media agent or host to media agent with Simpana 8. You choose what best suits your environment. Works well for DR
Cvitale I am reading you can replicate only changes of deduplicated data
cgreenoh Yeah, the dedupe to tape is a good feature as well. I don't have an area to securely replicate to; I deal with a lot of PII
