Data deduplication is the process of examining data to identify any redundancy. The same data keeps getting backed up over and over again, consuming more storage space and impacting cost—thereby creating a chain of inefficiency. Rather than duplicate a copy of the same file, a pointer can be used to go to a previously stored version, reducing the physical capacity needed for a backup job.
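The pointer idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of file-level deduplication, not any vendor's implementation; the `DedupStore` class, its method names, and the sample file names are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}    # digest -> file contents, kept once
        self.catalog = {}  # backup name -> digest (the "pointer")

    def backup(self, name, data):
        """Record a file; return True only if new bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[name] = digest  # duplicates only add a pointer
        return is_new

    def restore(self, name):
        """Follow the pointer back to the stored contents."""
        return self.blobs[self.catalog[name]]

    def logical_bytes(self):
        """Total size of everything that was backed up."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def physical_bytes(self):
        """Size actually held after deduplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
report = b"quarterly report" * 1000          # 16,000 bytes
store.backup("monday/report.doc", report)
store.backup("tuesday/report.doc", report)   # same content, pointer only
store.backup("tuesday/notes.txt", b"new notes")
print(store.logical_bytes(), store.physical_bytes())  # 32009 16009
```

In this sketch the second copy of the report costs only a catalog entry, so the physical size is about half the logical size.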

Our featured guest speakers for this chat include Greg White, Dell / EMC storage marketing manager for Global Commercial Marketing; Sanjeet Singh and Paul Davis from Dell’s Product Group; and Darin Camp from CommVault.


Technical Community - Background Reading

You may be considering adding deduplication into your environment, or you may have just heard all the buzz about it and want to know more about the specifics of deduplication. This chat discusses deduplication in a backup setting and how it helps make the cost of backup to disk approach the cost of backup to tape. Join us for a chat with experts who can help you understand:

  • The capacity savings versus processing resources and time required for file, block, and sub-block–level deduplication methods
  • The potential benefits of deduplication, such as:

    • Shortening the backup window
    • Reducing storage capacity, power, cooling and space requirements
    • Eliminating redundant data transfers that are needlessly consuming network resources
    • Centralizing data protection and archive, reducing the burden on staff, and eliminating tape requirements at remote offices
    • Enabling cost-effective disaster recovery (DR)
  • How deduplication may move from backup to disk to archive and other areas with static or inactive content as the technology evolves
  • The pros and cons of source-based deduplication, such as reducing network congestion, common management benefits, ease of use, and reduced costs versus its limited availability
  • The pros and cons of target-based deduplication, such as ease of integration into complex environments and ease of implementation versus the cost and management considerations and the lack of any reduction in network congestion
  • The time and resources trade-offs of in-line versus post-processing deduplication
  • Other considerations, such as whether your data would be a good candidate for deduplication, deduplication ratios, deduplicating across multiple sites, protecting remote offices, and more
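The capacity-versus-processing trade-off between file-level and block-level methods in the list above can be illustrated with a small sketch. It assumes fixed-size blocks and SHA-256 hashing; the `unique_bytes` function, the 512-byte block size, and the sample data are hypothetical choices for the example. Products may chunk data differently (for instance with variable-size sub-blocks, which cope better with inserted bytes).

```python
import hashlib


def unique_bytes(files, block_size=None):
    """Return (bytes stored after dedupe, number of hashes computed).

    block_size=None treats each file as a single unit (file-level);
    otherwise each file is split into fixed-size blocks (block-level).
    """
    seen = set()
    stored = 0
    hashes = 0
    for data in files:
        size = block_size or len(data) or 1
        for start in range(0, len(data), size):
            chunk = data[start:start + size]
            digest = hashlib.sha256(chunk).digest()
            hashes += 1
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored, hashes


# A 4,096-byte file made of distinct 4-byte values, so no block repeats.
version1 = b"".join(i.to_bytes(4, "big") for i in range(1024))
# The next backup sees the same file with a single byte changed.
edited = bytearray(version1)
edited[2000] ^= 0xFF
version2 = bytes(edited)

file_level = unique_bytes([version1, version2])
block_level = unique_bytes([version1, version2], block_size=512)
print(file_level, block_level)  # (8192, 2) (4608, 16)
```

Here file-level deduplication computes only two hashes but stores both versions in full, while block-level deduplication computes sixteen hashes and stores just one extra 512-byte block. The deduplication ratio (logical bytes divided by stored bytes) is 1.0 for the former and roughly 1.8 for the latter.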

Deduplication White Papers

Chat Transcript

JasonPowell Bring on the Dedupe!
Dell-JeffS It’s been brought. We'll get started here in just a min or two
DELL-ScottH That Dedupe!
Dell_gregorydwhite Welcome all
DELL-ScottH To you!
Dell-JeffS In the meantime, feel free to make fun of Scotth
DELL-ScottH Cuz, we do it so good it requires two Caps :-)
JasonPowell CommVault dude in here too? I need information on the newest Galaxy Express version :-)
Dell_gregorydwhite Jasonpowell, you getting any sleep with your after-hours projects going on?
DELL-ScottH Yes, Darin is CommVault
Dell-JeffS Yep, Darin and Karl are your CommVault experts
karl_friedrich Who is Sanjeet?
DELL-ScottH Hey David, how's it going, you lurker?
JasonPowell We're doing 2 pm–10 pm shifts :-)
DELL-ScottH Oh man, bring on the Mt. Dew
JasonPowell So we get to see wife/kids in the morning now that school is out. I spent the morning gutting basement storage for a garage sale tomorrow...what fun!
Dell_gregorydwhite Cool
JasonPowell Found a bunch of 1994/95 computer magazines; smoking hot 100MHz machines. I was laughing hard looking through them
cgreenoh Mid-90s computer
DELL-ScottH Oh, man, I Loved computer shopper magazine!
Dell-JeffS Who didn't?
Dell_gregorydwhite I swore off garage sales after the last one. All goes to charity now and deduct from taxes :)
Dell-JeffS S’pose I ought to kick this thing off
DELL-ScottH Let’s get this party started
wwolfram Darinc, when do you position CV compression? Or what is its sweet spot?
Dell-JeffS Good afternoon folks! Thanks for joining us for today's special Thursday edition Dell TechCenter chat on Deduplication. (Our normal chats are every Tuesday at 3 pm Central; check the main page for upcoming topics.) My name is Jeff Sullivan, the *newest* member of the Dell TechCenter team, focusing on Storage and Linux–related topics.
DELL-ScottH Welcome to the team, Jeff
cgreenoh Nice, a Linux dude
Dell-JeffS ‘Bout time right!
JasonPowell Jeffs = @sanpenguin?
DELL-ScottH Yeah, the "penguin" part of his twitter name :-)
DELL-ScottH Yup
Dell-JeffS A few housekeeping items: today's chat is recorded; you can view the transcript the next day on this same link that got you to the page...speaking of links *** If you click a link, use a right-click; otherwise you'll likely be bumped out of the chat *** should you get disconnected, rejoin and you can select Action, Recent Room History to see previous dialogue
DELL-ScottH Welcome mark
cgreenoh I mean, I got love for AD, but I also have a slew of Linux Web servers, so... :)
CommVault-DarinC Ward, we like compression at the client to minimize data transmitted between client and media agent
DELL-ScottH Welcome Paul
Dell-JeffS So, let’s get to it! Deduplication—Today we have a number of guest speakers to answer all your dedupe questions: Greg White, Dell / EMC storage marketing manager for Global Commercial Marketing; Sanjeet Singh and Paul Davis from Dell’s Product Group; as well as Darin Camp from CommVault; and more! Please feel free to throw out your question at any time. You are not being rude by doing so...this is a chat :-)
DELL-ScottH Awesome Super-de-duper
Dell_gregorydwhite Anyone hear anything about this dedupe stuff before?
wwolfram What is the value add of data compression between clients? Easy to implement or low overhead, or?
ecastro Nope
JasonPowell I do some Datadomain resales on the side but that's probably going away now that NetApp grabbed 'em
DELL-ScottH Hey Todd
Dell-JeffS Hey Todd
VirtualTodd Hey everybody
JasonPowell Would love some Dedupe in EqualLogic :-)
Dell_gregorydwhite You never know; DD may end up at EMC
DELL-ScottH What's the best paper to read if I'm noob to this?
Dell_gregorydwhite For high-level information, check out www.dell.com/deduplication
JasonPowell I also recall seeing a Dell hardware product bundled with CommVault for doing dedupe and backup GS
Dell_gregorydwhite We've got a good informative white paper coming out in the next few days that goes in depth as well
DELL-ScottH Cool, I'll keep an eye out for that
CommVault-DarinC Ward, data compression actually exists between the IDA on the client and the Media Agent.
JasonPowell We’re running CommVault here for backups, so I'm hoping to hear today about dedupe in CV :-)
Dell-JeffS There’s a new page on Dell TechCenter as well that might help a noob get started too: http://www.delltechcenter.com/page/deduplication
CommVault-DarinC Ward, saves bandwidth for the network segments between data agents and media agents
cgreenoh How's the DL2000 compare to EMC's Avamar?
CommVault-DarinC Jason, you’re referring to the DL2000
DELL-ScottH Welcome snowflyer. Use Action, Recent Room History to catch up
JasonPowell Darin, yeah. I think that was what I'd seen
JasonPowell We also have an enormous amount of A/V stuff so I'd love to know what kind of dedupe compression we might see
Dell-JeffS Watch out for those links; make sure and right-click to avoid getting bumped
CommVault-DarinC Jason, check out http://www.dell.com/deduplication
MarkGifford Our backup software: Arcserve (don't laugh) Any dedupe action in that direction?
Dell_PaulD Cgreenoh, the DL2000 is similar to the Avamar platform in some regards, such as where dedupe happens. But DL2000 has a broader offering in terms of supported client and features such as archive, search, etc.
ecastro Deduplication = e-mail archiving
cgreenoh Interesting, Does Dell offer the software to back up to existing disk chassis/PowerVaults we already have?
CommVault-DarinC Jason, what are the specific A/V file formats you guys are storing?
Cvitale Dedupe and compression are actually different but complement each other nicely
Dell_gregorydwhite Cvitale, very true
cgreenoh No wait, someone still uses Arcserve? :) Kidding.
Dell_PaulD Cgreenoh, Actually yes. You can purchase CommVault software and run it on existing hardware
DELL-ScottH Yeah, probably on Netware :-) Or maybe OS/2
MarkGifford Ouch
JasonPowell Darin, our editors all use Final Cut, so we end up with large (several Gigs) files. I'll go look at the file type
JasonPowell We have five 9 GB Quicktime movie files each weekend
Cvitale From a dedupe standpoint, you will only store each movie once, regardless of how many of the same copies you have saved
JasonPowell A quick scan shows a ton of Quicktime files
Cvitale Every deduped full will wind up looking like an incremental on disk
cgreenoh Interesting, I hadn't heard of this software before. Is it client- or hardware-side dedupe? Agents for all operating systems, I assume? Microsoft SQL? Is this the right place to be asking about it? :)
JasonPowell Does the latest version of Galaxy Express do any dedupe?
Cvitale No dedupe in Galaxy Express yet
JasonPowell When I contacted CommVault this spring they said call back in a few months
Dell-JeffS Hi Muffadal. Use Action, Recent Room History to catch up
Dell_PaulD Cgreenoh, yes. This is the Simpana 8.0 software from CommVault. It does compression at the client side and deduplication at the media server. And, yes, it has all the agents needed for purchase (Microsoft SQL, Exchange, etc.)
Dell-JeffS Or not
Cvitale The client will create a hash and send the data across the wire to be matched with a database, and then the file gets stored or receives an updated pointer to the file if it exists already. This will allow for compression and encryption across the wire as well
Cvitale I don’t think any other dedupe products compress and encrypt client side
CommVault-DarinC Jason, I'll need to dig a bit and get back with you. I don’t want to give you wrong information.
Cvitale In that price range
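The hash-and-lookup flow Cvitale describes above (the client hashes the data, the media agent matches the hash against a database, then either stores the data or updates a pointer) can be sketched roughly as follows. This is an illustrative toy, not CommVault's implementation: `client_package`, `MediaAgent`, and the item names are hypothetical, zlib stands in for whatever compression the product uses, and encryption is left out.

```python
import hashlib
import zlib


def client_package(data):
    """Client side: hash the original bytes, then compress for the wire."""
    return hashlib.sha256(data).hexdigest(), zlib.compress(data)


class MediaAgent:
    """Toy media agent that keeps one copy per unique hash."""

    def __init__(self):
        self.index = {}    # digest -> reference count (the "database")
        self.disk = {}     # digest -> compressed payload, stored once
        self.catalog = {}  # item name -> digest

    def receive(self, name, digest, payload):
        """Store new data, or just update the pointer for known data."""
        if digest in self.index:
            self.index[digest] += 1
            action = "pointer updated"
        else:
            self.index[digest] = 1
            self.disk[digest] = payload
            action = "stored"
        self.catalog[name] = digest
        return action

    def restore(self, name):
        """Look up the pointer and decompress the single stored copy."""
        return zlib.decompress(self.disk[self.catalog[name]])


agent = MediaAgent()
movie = b"frame" * 10000
first = agent.receive("week1/service.mov", *client_package(movie))
second = agent.receive("week2/service.mov", *client_package(movie))
print(first, "/", second)  # stored / pointer updated
```

As in the transcript's description, the payload still crosses the wire in this sketch; the saving is on disk at the media agent, where the second copy costs only a catalog entry.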
JasonPowell Darin, thanks. We love the SMB version of Galaxy but getting information about it is like pulling teeth :-)
cgreenoh Interesting, this just might be the hot setup to replace an aging Backupexec infrastructure
CommVault-DarinC Jason, check out: http://documentation.commvault.com/dell/release_8_0_0/books_online_1/default.htm for more information on Express
cgreenoh Because you know, everybody likes it when your agents randomly fail for no clear reason and you miss backups
DELL-ScottH Oh those pesky users :-)
Dell_PaulD :)
DELL-ScottH Are there any good demos?
cgreenoh Dang, stole my question, Scott
cgreenoh Are there any good demos?
DELL-ScottH Oh it's so much better coming from you ;-)
Dell-JeffS Has anyone deployed dedupe?
cgreenoh Does rsync count? :lol:
Dell-JeffS Lol
Dell_PaulD Cgreenoh, we are planning to have a demo (+ some training) on the Web site soon. You will be able to get to it from www.dell.com/dl2000
Dell-JeffS In case you haven't seen it, Greg (aka @gregorydwhite on twitter), I put up a blog about dedupe a few days ago: http://en.community.dell.com/blogs/insideit/archive/2009/06/05/the-future-of-data-deduplication.aspx
JasonPowell Yes, must have demos :-) Words are pointless ;-)
cgreenoh Paul, any idea when? I can set a reminder
Dell_PaulD I believe it should be up within a month
Dell-Sanjeet Cgreenoh, You can follow up with me...sanjeet_singh@dell.com
cgreenoh Rock on, will send an e-mail
wwolfram Is there the ability to dedupe to tape?
JasonPowell Would you expect a WAN accelerator would improve dedupe off site?
cgreenoh I look forward to the day when I can drunken-dance in little clothing around a pile of burning tapes. Tape, I loathe you
Dell-Sanjeet Wwolfram. CommVault Simpana 8.0 has the ability to dedupe to tape. The DL2000 that combines Simpana 8.0 on the platform...also supports that
wwolfram Do I have the ability to turn on or off dedupe to tape with CommVault?
DELL-ScottH @cgreenoh, lol...we will have a tweetup and celebrate together!
Dell_gregorydwhite Love that mental image
cgreenoh Hah!
DELL-ScottH Maybe at Burning Man!
Dell-JeffS Hi Joel
joel_perreault Good day
Dell-JeffS You can select Action, Recent Room History to see previous dialogue
Dell_gregorydwhite Great idea, tape inspired art to take to the next Burning Man
Dell-Sanjeet Wwolfram, yes. You can turn on or off dedupe to disk as well as dedupe to tape
Dell-JeffS Right-click on links to avoid getting bumped
cgreenoh I'm excited; dedupe is becoming more prevalent, I have a 3.5 TB Microsoft SQL database that takes 48 hours to fully back up right now
wwolfram How does Dedupe with the CommVault/DL2000 improve my backup performance?
joel_perreault Thanks
wwolfram Sorry...how much does Dedupe...
CommVault-DarinC Jason, WAN accelerators always help when WAN pipes are too small or the data starts to grow beyond the WAN pipe's capabilities
Dell-Sanjeet DL2000 (with dedupe turned on) can perform backups up to the rate of 1.5 TB/hr. Here's a good white paper that describes how: http://www.dell.com/downloads/global/products/pvaul/en/storage-dl2000-commvault-backup.pdf
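For a sense of scale, the figures quoted in this chat can be put side by side. The sketch below uses only the two numbers mentioned above (a 3.5 TB database that currently takes 48 hours, and the "up to" 1.5 TB/hr rate); it assumes the peak rate could be sustained for the whole job, which real workloads may not achieve, and the `hours_to_back_up` helper is a hypothetical name.

```python
def hours_to_back_up(size_tb, rate_tb_per_hr):
    """Time for a full backup at a constant rate."""
    return size_tb / rate_tb_per_hr


# Effective rate of the 3.5 TB / 48-hour backup mentioned in the chat.
current_rate = 3.5 / 48
# Best case at the quoted "up to" rate of 1.5 TB/hr.
best_case_hours = hours_to_back_up(3.5, 1.5)
print(f"{current_rate:.3f} TB/hr now, {best_case_hours:.2f} h best case")
# 0.073 TB/hr now, 2.33 h best case
```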
cgreenoh Mmm white paper
Dell-JeffS There are several white paper/resource links on www.dell.com/dedup
cgreenoh Rock on
Dell-JeffS Also, on our wiki I've linked to those and a few other spots; and anything new that pops up: http://en.community.dell.com/blogs/insideit/archive/2009/06/05/the-future-of-data-deduplication.aspx
Dell-JeffS Oops, wrong link, dang you cut and paste. http://www.delltechcenter.com/page/deduplication
DELL-ScottH Let’s dance around the fire to that too
KongY-Dell Any best practice for dedupe and virtualized environments?
Cvitale Does tape burn well?
cgreenoh With enough gasoline, you bet your @$$. :)
Dell_gregorydwhite Just don't inhale the fumes
cgreenoh Yeah. that's probably not a good idea
Dell_PaulD Cgreenoh, are you moving totally to disk-based backup?
CommVault-DarinC Dedupe is very well suited to virtualized environments. Lots of redundant data with VMs
cgreenoh Paul, yes I want to try with tape as a DR only. Factors are $$$ and time, and maybe patience...
Dell_PaulD Is replication interesting for DR?
cgreenoh So data > disk > tape. Replication to me is awesome, but not applicable in my environment. If I had the option I would do disk-based replication and drop tape completely
Dell-Sanjeet Cgreenoh, with tape deduplication you can at least reduce your headache of managing tapes—the number of tapes will go down
Cvitale Why won’t it work for you?
CommVault-DarinC Pauld, certainly is. You can replicate from media agent to media agent or host to media agent with Simpana 8. You choose what best suits your environment. Works well for DR
Cvitale I am reading you can replicate only changes of deduplicated data
cgreenoh Yeah, the dedupe to tape is a good feature as well. I don't have an area to securely replicate to, I deal with a lot of PII
Dell-JeffS It’s about that time. Thanks for joining. Should you have further questions, please feel free to post in our "Ask the Experts" section!
Cvitale Let’s take the party to the blogs
cgreenoh Ah well, thanks for the information. Good chat.
Dell-JeffS Thanks Cam!
Cvitale Np
Dell_gregorydwhite Thanks all
VirtualTodd Good chat, much better than CNBC!
ecastro Thanks, we need Spanish Chat
VirtualTodd Funny blog post with link to CNBC hosts trying to talk about what they do not know: www.storagerap.com/2009/06/dedupified-at-cnbc-since-when-am-i-the-expert-on-testicle-stuff-.html