December #SANchat Transcript – All about deduplication

Inside Enterprise IT

Strategic insights on using IT to achieve business goals

 Posted on behalf of Alison Krause, who works in Dell's Storage Product Group, Social Media & Communications.

We have been so excited about all the great new things Dell is doing with the acquisition of Ocarina. We held a SANchat last month on compression, and this month we continued the Ocarina conversation by talking about deduplication. It was a “part 2,” if you will. We had a great conversation!

As I did last month, I want to repeat my thanks to Mike Davis (@mike_davis) for joining us as our expert. Before diving into this chat’s transcript, if you missed the post about last month’s discussion (which also links to an explanation of SANchats), you can find that here.

Mike did a great job explaining the difference between compression and dedupe. He described it as, “Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files. 2 diff implementations.” He also answered questions about when to use each and why Dell chose one over the other in some of our products.

You can find the full transcript below. Be sure to follow us on Twitter to stay up to date on upcoming SANchats, and tweet us if you have any follow-up questions or comments! Join us in January as we talk about Dell Storage Forum 2012 London!

dell_storage

#SANchat starts in 1 hour1 talking all about deduplication today - come chime in!

NewFulcrumPoint

RT @dell_storage: #SANchat starts in 1 hour1 talking all about deduplication today - come chime in!

gminks

We're running a bit behind schedule this morning for our #dedupe  #SANchat

gminks

I'm gonna blame it on the cold - it's 32 degrees and everyone in Austin is frozen :O #SANchat

dell_storage

It's time for #SANchat! We recommend using tweetchat to join the converstation, this month we're talkin dedupe! http://t.co/GBBzIa7v

iSCSIKing

@gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat

LiemNguyen

MT @dell_storage: #SANchat starts in 1 hour1 talking all about deduplication today ! cc\@rogerlund <Rog can I get a follow? :)

mike_davis

Ahhh maybe they dedupe'd the temperature reading in TX #SANchat

LiemNguyen

RT @iSCSIKing: @gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat <<I'm ashamed to admit I agree.

iSCSIKing

RT @mike_davis: Ahhh maybe they deduped the temperature reading in TX < I think so  #SANChat

gminks

as we wait for the austinites to thaw ;) ...here's last month's transcript on compression: http://t.co/7V8eGkLy #SANchat

gminks

hi @iscsiking @mike_davis @liemnguyen! you guys were here last month... any take aways #SANchat

NewFulcrumPoint

RT @gminks: as we wait for the austinites to thaw ;) ...here's last month's transcript on compression: http://t.co/7V8eGkLy #SANchat

gminks

or we can just jump into talking about #dedupe!  #SANchat

johnobeto

Tuning in to #sanchat on dedupe. Shhh. I'm in learning mode

gminks

We're talking #dedupe this morning .... grab a coffee and join in #SANchat

mike_davis

We're here to talk dedupe. Last time was compression. Two different animals, implemented differently,using different sys resources. #SANchat

gminks

@johnobeto nice to see you....anything in particular you want to learn this morning? #SANchat

mike_davis

Hey @johnobeto, you're a veteran at dedupe...from #techfieldday 2010 #SANchat

iSCSIKing

@johnobeto Hey John, thanks for chatting with us this morning #SANChat

johnobeto

@gminks Hello Gina. How dedupe can be driven down to SMBs cost-efficiently. #sanchat

gminks

@mike_davis - so i know you are an expert on #dedupe and compression -- what's your background again? #SANchat

DennisMSmith

Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/DtncDqbW

gminks

actually - maybe everyone could introduce themselves. :) #SANchat

gminks

@DennisMSmith hey Dennis! welcome to the early morning edition of  #SANchat

DennisMSmith

@gminks Thanks Gina!  Teach me all I need to know :) #SANchat

iSCSIKing

@gminks Hey Gina, this is Lance from the Dell TechCenter Storage team #SANChat

mike_davis

Hi @gminks, I ran Marketing for Ocarina Networks (acq 17 mo ago). Implemented dedupe&compress concurrently in our products. #SANchat

gminks

@mike_davis ok cool! so - lets start with 101 questions,...what's the main diffs between compression and deduplication? #SANchat

johnobeto

I learned my baby steps about dedupe from Ocarina...before they grabbed their pot of gold from Dell. They didn't give me any :( #sanchat

gminks

I'm Gina from Dell Storage.  #SANchat

DennisMSmith

I'm Dennis with the @DellTechCenter team #SANchat

gminks

@johnobeto well @mike_davis is sharing the knowledge now, that's worth more than gold right? ;) #SANchat

mike_davis

@johnobeto DD for SMB is definitely useful, although implementation needs low cost overhead = embedded in arrays. #SANchat

storageDiva

good morning all! Sheryl from Dell here @mike_davis - is it fair to call dedup a type of compression? #SANchat

johnobeto

@gminks @mike_davis On the surface. However, I'm a superficial guy. Gimmie the Latinum! #sanchat

mike_davis

Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files.2 diff implementations #SANchat
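
As an illustration of the distinction Mike draws here, the following is a minimal Python sketch, not Ocarina's or Dell's implementation. It assumes fixed 4 KiB chunks and SHA-256 fingerprints for dedupe (real products may chunk and fingerprint differently), uses zlib's DEFLATE as a stand-in for generic compression, and the function names are hypothetical.

```python
import hashlib
import os
import zlib

CHUNK = 4096  # assumed fixed chunk size; an illustration, not a product setting


def dedupe(data: bytes, chunk: int = CHUNK):
    """Store each distinct chunk once; return (store, recipe)."""
    store = {}   # fingerprint -> chunk bytes, kept only once
    recipe = []  # ordered fingerprints needed to rebuild the data
    for i in range(0, len(data), chunk):
        piece = data[i:i + chunk]
        fp = hashlib.sha256(piece).digest()
        store.setdefault(fp, piece)
        recipe.append(fp)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original bytes by looking up each fingerprint in order."""
    return b"".join(store[fp] for fp in recipe)


# One random block repeated 8 times: redundant across chunks,
# but with no pattern inside any single chunk.
block = os.urandom(CHUNK)
data = block * 8

store, recipe = dedupe(data)
stored_bytes = sum(len(c) for c in store.values())

# Compressing the single random block finds no pattern to describe.
compressed = zlib.compress(block)

print(f"logical bytes:           {len(data)}")
print(f"bytes kept after dedupe: {stored_bytes}")
print(f"one block, compressed:   {len(compressed)} (from {len(block)})")
```

In this contrived case dedupe keeps one chunk out of eight, while DEFLATE has essentially nothing to gain on the random block itself. Data with patterns inside a chunk but no repeated chunks would show the opposite result, which is one reason the two techniques are combined.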

johnobeto

@mike_davis Very true. It also need to be totally abstrated from the average SMB supt drone, easy to implement & use. #sanchat

gminks

@storageDiva good morning and welcome to  #SANchat

hansdeleenheer

everytime you guys say goodmorning, I'm trying to go home. Everytime you say good afternoon I'm trying to sleep. Damn' techchats #sanchat

mike_davis

@storagediva technically compression is different. But a less strict interpretation could include DD as a method of compression. #SANchat

gminks

RT @mike_davis: Compression = using math to describe patterns. Dedupe = eliminating redundancy either across/within files<nice def! #SANchat

johnobeto

@mike_davis The initial cost of an SMB dedupe soln can be accounted for. It just needs to be a total background device/process IMO #sanchat

gminks

@hansdeleenheer sorry! What time is it, so I know what to say to you #SANchat

JeffSullivan

RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/4zxE4rJm

iSCSIKing

@hansdeleenheer well thanks for joining us today! #SANChat

mike_davis

@johnobeto making sure it's transparent is hard. All Dell implementations make it end-user xparent, but always a resource cost. #SANchat

DennisMSmith

@hansdeleenheer so would it be good evening to you now :) #SANchat

gminks

@JeffSullivan hey Jeff.  #SANchat

mike_davis

We're very cautious about use-case. In backup we can anticipate different things than primary storage. data patterns, types, etc #SANchat

WarrenAtDell

RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh

hansdeleenheer

what has the most impact on performance? Compression or Dedupe? (lets assume at block level in the SAN) #sanchat

cyndenabc

RT @WarrenAtDell: RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh

calmo

@mike_davis how about comparing dedupe vs. compression WRT recoverable space potential (real world)? #SANchat #SANchat

mike_davis

We need to tailor the design; in-band,post-proc, different levels of aggressiveness. In different products you will see differences #SANchat

hansdeleenheer

Would you prefer Compression/Dedupe at source or at target? before or after initial write? #sanchat

mike_davis

@hansdeleenheer both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what's available. #SANchat

gminks

@cyndenabc @WarrenAtDell @calmo welcome to  #SANchat

storageDiva

@mike_davis amen on use-case - it's no good being the guy with the hammer - esp. when we can anticipate characteristics #SANchat

mike_davis

Content aware compr will have much different resource overhead than generic fast comp (LZ etc). So we play those together also. #SANchat

gminks

@hansdeleenheer wow we can tell its not early morning for you. Great questions!  #SANchat

gminks

RT @mike_davis: Were very cautious about use-case. In backup we can anticipate different things than primary storage.  #SANchat

mike_davis

@calmo in a typ backup workflow, just DD can deliver 90%+ savings. But apply the same alg to primary storage and maybe 40%. YMMV. #SANchat
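
For readers who think in reduction ratios rather than percentages, the savings figures quoted here can be converted with simple arithmetic. The sketch below is only that arithmetic; the helper names are hypothetical, and the 90% and 40% inputs are the illustrative figures from the tweet, not measurements.

```python
def reduction_ratio(savings: float) -> float:
    """Convert fractional space savings (0 <= savings < 1) to an N:1 ratio."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


def stored_size(logical_bytes: float, savings: float) -> float:
    """Physical bytes left on disk after the given fractional savings."""
    return logical_bytes * (1.0 - savings)


for label, savings in (("backup", 0.90), ("primary", 0.40)):
    ratio = reduction_ratio(savings)
    print(f"{label}: {savings:.0%} savings is about {ratio:.2f}:1")
```

So 90% savings corresponds to roughly 10:1, while 40% is only about 1.67:1, which shows how far apart the two workloads are.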

hansdeleenheer

is the impact on performance on rehydration same level as on initial write? #sanchat

johnobeto

Does dedupe require you to tier your storage? If so, are there cost efficiencies in doing that? #sanchat

storageDiva

RT @mike_davis: both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what's available. #SANchat

mike_davis

In primary storage we'd like to have content awareness and policies. But that implies file-based (eg our DX). Block is tougher. #SANchat

hansdeleenheer

@Gminks I can be a pain in the ass all day long. Ask the DellTechCenter people :-) #sanchat

gminks

@hansdeleenheer oh so you are one of us.  #SANchat

mike_davis

@hansdeleenheer we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat

JeffSullivan

@hansdeleenheer Not at all!  #sanchat

gminks

RT @storageDiva: @mike_davis amen its no good being the guy with the hammer - esp. when we can anticipate characteristics #SANchat

DennisMSmith

@hansdeleenheer @Gminks haha, not at all.  You have great questions, we're just here to make finding the answer easier :) #SANchat

johnobeto

@MBLeib Thanks, Matt. My goal is trying to find a sweet [financial] spot where tiering becomes fiscally prudent to implement #sanchat

iSCSIKing

@hansdeleenheer You ask great questions, keeps us on our toes ... #SANChat

mike_davis

@johnobeto DD/Compr definitely a 'virtual' tiering in some cases (post proc). In backup less so. #SANchat

gminks

@mike_davis why is block so much tougher? #SANchat

gminks

hi @MBLeib - you joining us for  #SANchat

gminks

RT @mike_davis: we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat

mike_davis

@gminks Block is easy to implement; generic algorithm.But hard to be content aware; data is opaque in most cases.And less CPU/RAM!  #SANchat

hansdeleenheer

if Compression/Dedupe is such a great thing, why not implementing it in all storage solutions (at block level!) #sanchat

mike_davis

@hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat

gminks

@mike_davis so there is content aware and non-content aware #dedupe (and compression??) #SANchat

storageDiva

another Q for @mike_davis: why did we choose compres (vs. dedup) for the DX? #SANchat

johnobeto

From a technical standpoint, is it possible to add a 'dedupe processor/co-processor' to storage to improve performance? #sanchat

gminks

RT @Mike_Davis: @hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat

hansdeleenheer

grwat question Gina! why would I need content-aware block dedupe? that is what block is all about. #sanchat

mike_davis

content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat

mike_davis

One feature we suppt is 'object' DD. If we see a JPG for ex in a stream, we will treat as 1 'chunk'. Improves performance. #SANchat
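
A rough Python sketch of the "treat a JPG as one chunk" idea follows. It is an assumption-laden illustration, not the product's algorithm: it looks for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers with a naive byte search, which can be fooled by marker bytes inside the image (for example an embedded thumbnail), and it falls back to hypothetical fixed 4 KiB chunks for everything else.

```python
CHUNK = 4096        # assumed fixed chunk size for non-object data
SOI = b"\xff\xd8"   # JPEG start-of-image marker
EOI = b"\xff\xd9"   # JPEG end-of-image marker


def fixed_chunks(buf: bytes, chunk: int = CHUNK):
    """Split a buffer into fixed-size chunks (the last one may be shorter)."""
    return [buf[i:i + chunk] for i in range(0, len(buf), chunk)]


def object_aware_chunks(stream: bytes, chunk: int = CHUNK):
    """Chunk a stream, keeping each JPEG-looking span as a single chunk."""
    out = []
    pos = 0
    while pos < len(stream):
        start = stream.find(SOI, pos)
        end = stream.find(EOI, start + 2) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete object: fixed-size chunks for the rest.
            out.extend(fixed_chunks(stream[pos:], chunk))
            break
        # Bytes before the object get ordinary fixed-size chunks.
        out.extend(fixed_chunks(stream[pos:start], chunk))
        # The whole object, markers included, becomes one chunk.
        out.append(stream[start:end + 2])
        pos = end + 2
    return out
```

Keeping the object whole means an identical image embedded at a different offset in another stream still yields an identical chunk, so one fingerprint lookup can match it instead of several misaligned ones.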

gminks

RT @Mike_Davis: content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat

hansdeleenheer

@gminks It's not! thats what I meant #SANchat

iSCSIKing

@mike_davis which is better in-line or post-process #dedup ? or does it matter? #SANChat

MBLeib

RT @MBLeib: @johnobeto The idea of adding a proc to the purpose of dedupe on the array is very cool, but I've yet to see it done #SANchat

johnobeto

@MBLeib Seems like it would help in offloading the performance hit in dedupe #sanchat

mike_davis

@johnobeto Co-proc is definitely something we consider. break out compute overh into sep box. Adds cost, but also perf, flexibility #SANchat

gminks

@hansdeleenheer I'm confused.  #SANchat

mike_davis

@MBLeib One interesting co-proc idea is GPU. Great at FP operations. Not so much for dedupe. #SANchat

hansdeleenheer

@gminks I want dedupe to happen on block level so I don't need content aware solutions #SANchat

MBLeib

@johnobeto The idea is that most enterprise San's have processor to spare, so unnecessary. @mike_davis makes a good point #sanchat

gminks

@hansdeleenheer aH.ok thx for spelling it out. #needanothercupofcoffee #SANchat

iSCSIKing

RT @mike_davis: @johnobeto Co-proc is def something we consider. break compute overh in2 sep box. Adds $ but also perf, flexibility #SANchat

mike_davis

@storageDiva chose aggress content aware compr for DX because it's an archival workload. getting every GB out is P1. #SANchat

Justin_Lauer

@MBLeib @johnobeto With multicore CPUs why would you need to dedicate a CPU to only dedup?  Should be native to array by now. #SANchat

MBLeib

@Justin_Lauer We came from EMC, in Ent, the vMax has processor expanability up to 8, I believe, so no need to dedicate to dedupe #SANchat

mike_davis

@Justin_Lauer If we have CPU+RAM surplus, can definitely lever that. But some use cases want to throttle/schedule. #SANchat

mike_davis

Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU  ;-) #SANchat

gminks

@Justin_Lauer good morning & welcome to  #SANchat

johnobeto

@Justin_Lauer @MBLeib  Not necessarily a CPU. Maybe a core, ASIC or some dedicated silicon. Anything that nullifies the dedupe hit #sanchat

gminks

RT @mike_davis: Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU  ;-) #SANchat

hansdeleenheer

@Justin_Lauer lso: if you can handle dedupe/compression by he server it doesnt hit the network / connectivity of the SAN #sanchat

Justin_Lauer

@johnobeto @MBLeib I guess what i'm saying is hardware more than powerful enought.  Make it native to the filesystem #sanchat

MBLeib

RT @Justin_Lauer: @johnobeto @MBLeib Make it native to the filesystem>> Well said, Justin. Completely agreed. #sanchat

mike_davis

@Justin_Lauer @johnobeto Agreed, sys resources are more than enough, esp in SMB environment. SQL cares about xaction latency though #SANchat

hansdeleenheer

@Justin_Lauer native to filesystem = MS has this integrated in Win8! #sanchat

johnobeto

@Justin_Lauer @MBLeib Good. That allows me to segue into this: forthcoming Windows Server 8 has rudimentary dedupe built into it...#sanchat

hansdeleenheer

@Justin_Lauer MS has this integrated in Win8! ... but I guess you weren't waiting for this answer :-) #sanchat

gminks

RT @hansdeleenheer: @Justin_Lauer MS has this integrated in Win8! ... but I guess you werent waiting for this answer :-) <haha #SANchat

rogerlund

RT @LiemNguyen: MT @dell_storage: #SANchat starts in 1 hour1 talking all about deduplication today ! cc\@rogerlund <Rog can I get a follow? :)

hansdeleenheer

@Mike > this brings u to: what will be the impact of MS dedupe over SAN dedupe? #sanchat

Justin_Lauer

@johnobeto Win8 is great, but that is only one OS.  Large Enterprise run all sorts of stuff on VMware.  Dedup at array solves that. #sanchat

gminks

@rogerlund hi Roger - you joining for the last five minutes of  #SANchat

mike_davis

The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat

mike_davis

@hansdeleenheer data red at host, in SAN, in file sys, in archive, and in backup will all work together...complementary. #SANchat

hansdeleenheer

@mike_davis will it lead to dedupe on dedupe on dedupe or will the end result be the same? #sanchat

mike_davis

@Justin_Lauer dedupe in array can solve, but helps to be content aware (VMDK). Other solutions can exist in Hypervisor or agent #SANchat

gminks

hey @ISCSIking are these the sorts of questions @hansdeleenheer normally asks? LOL #SANchat

calmo

Interesting question Justin. Googling "filesystem dedupe" fills out my reading list for the next week. #SANchat

mike_davis

@hansdeleenheer keep in mind moving data in deduped form delivers big benefits too. #SANchat

gminks

We're getting close to the end of our hour, thanks @mike_davis for joining us again to have a #dedupe discussion on  #SANchat

Justin_Lauer

@mike_davis Ahhh!  Dedupe and context aware!  Now you are talking my language and it sounds like @TintriInc  #SANchat

johnobeto

RT @Mike_Davis: The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat

iSCSIKing

@gminks yes @hansdeleenheer always has great questions #SANChat

johnobeto

@mike_davis Agree. Anything that improves Exchange is "A Good Thing" #SANchat

gminks

@mike_davis so where will you be in the next couple of months? maybe we can continue this discussion on a future  #SANchat

hansdeleenheer

RT @mike_davis: ...moving data in deduped form delivers big benefits too. > Yep - bandwith reduction for example! #SANchat
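
The bandwidth point can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any Dell replication protocol: chunks are a fixed 4 KiB, fingerprints are SHA-256, the target is modeled as a plain dictionary, and the small cost of exchanging fingerprints is ignored. All names are hypothetical.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size


def fingerprinted_chunks(data: bytes, chunk: int = CHUNK):
    """Yield (fingerprint, chunk bytes) pairs for the data."""
    for i in range(0, len(data), chunk):
        piece = data[i:i + chunk]
        yield hashlib.sha256(piece).digest(), piece


def replicate(data: bytes, target_store: dict, chunk: int = CHUNK) -> int:
    """Copy data to the target; return payload bytes that crossed the wire."""
    sent = 0
    for fp, piece in fingerprinted_chunks(data, chunk):
        if fp not in target_store:  # the target lacks this chunk, so ship it
            target_store[fp] = piece
            sent += len(piece)
    return sent


target = {}
monday = b"A" * CHUNK + b"B" * CHUNK
tuesday = monday + b"C" * CHUNK  # same data plus one new chunk

print(replicate(monday, target))   # both chunks are new to the target
print(replicate(tuesday, target))  # only the added chunk is sent
```

The second transfer moves one chunk instead of three, which is the kind of saving that shortens backup windows and replication time.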

johnobeto

@gminks @mike_davis We've got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y'all & @LiemNguyen Cheers #sanchat

mike_davis

@gminks I spend my time half in Sunnyvale, half in Austin, half on Southwest. #SANchat

hansdeleenheer

Goodnight all! #sanchat

mike_davis

@hansdeleenheer and backup window, and restore time, and replication synchronicity #SANchat

gminks

Ok everyone, time to close it out. If you didn't introduce yourself, now's a good time. Also plz let us know what you're working on #SANchat

gminks

I'm working on the Dell Storage Forum...maybe we can do this live from London next month! #dellsf12 #SANchat

iSCSIKing

@hansdeleenheer good night Hans! Thanks for joining #SANChat

gminks

@hansdeleenheer good night hans thank you for joining and making it so lively today #SANchat

iSCSIKing

Be sure to join us next week for the TechChat - Tuesday at 3 PM #SANChat

gminks

RT @iSCSIKing: Be sure to join us next week for the TechChat - Tuesday at 3 PM <thanks for joining today #SANchat

storageDiva

thanks @gminks @mike_davis et al - that was a great chat today #SANchat

web20education

#WeVideo video editing and collaboration in the cloud #edtech20: #elearning #edchat #ukedchat #sanchat #socialmedia - http://t.co/SirBuXaU

equallogic

We will have the #SANchat transcript posted soon, will tweet the link once it's ready!

web20education

At #leweb via @techcrunch Facebook To Launch A Subscribe Button For Websites http://t.co/2JB2cd3Y  #edtech20 #socialmedia #edchat #sanchat

gminks

RT @johnobeto: @gminks @mike_davis We've got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y'all & @LiemNguyen Cheers #sanchat

gminks

RT @storagediva: thanks @gminks @mike_davis et al - that was a great chat today #SANchat <thx for joining ms. diva!

web20education

#Crowdbooster #SocialMedia Analytics and Optimization #edtech20: #edchat #ukedchat #elemchat #leweb #sanchat #smm - http://t.co/RRLwYd0q

johnobeto

@mletschin @MBLeib Hey Mike. Missed this tweet during the #sanchat. Yes, we should

 

