Posted on behalf of Alison Krause, who works in Dell's Storage Product Group, Social Media & Communications.
With all of the excitement surrounding our acquisition of Ocarina last year, we decided to devote today’s #SANchat to all things compression. This was part 1 of a two-part chat; part 2, on all things dedupe, is coming up on December 7th. As a reminder, #SANchat is a monthly chat hosted by Dell Storage. For more background on #SANchat, see this blog post over on Dell Compellent’s Around the Block blog.
I’d like to extend a huge thanks to Mike Davis (@mike_davis) and Mohammed Farhat (@mfarhat) for joining us as our experts. We had a great conversation with a few customers. We discussed whether we can look forward to native block compression/dedupe on EqualLogic in the future, why compression is important, when compression is implemented, and more. Mike also provided a link to a very helpful book titled “Data Compression Explained.”
You can find the full transcript below. Be sure to follow us on Twitter so that you stay up to date on upcoming SANchats, and tweet us if you have any follow-up questions or comments! Don’t forget to join us on Wednesday, December 7th to talk about deduplication!
LiemNguyen: Have you followed @mike_davis and @mfarhat? They'll be talking about data #compression in 45 minutes on #SANchat
iSCSIKing: 10 minutes until our chat on Compression - Join the discussion. #SANChat
LiemNguyen: @iSCSIKing Quick, how many puns can you think of in 10 minutes. I'll start: I'm aware I'm compressed for time. #SANchat
DellTechDE: RT @iscsiking: 10 minutes until our chat on Compression - Join the discussion. #SANChat
RafaelKnuth: RT @iscsiking: 10 minutes until our chat on Compression - Join the discussion. #SANChat
iSCSIKing: @LiemNguyen That is a very compressed time line to think of puns #SANChat
LiemNguyen: @iSCSIKing Oops, looks like we've reduced our time to nothing #SANchat
LiemNguyen: OK, ready for SANchat to begin now...@mike_davis and @mfarhat are you out there? #SANchat
iSCSIKing: @LiemNguyen Looks like it ... #SANChat
iSCSIKing: RT @LiemNguyen: OK, ready for SANchat to begin now...@mike_davis and @mfarhat are you out there? #SANChat
mattjamesdavies: @LiemNguyen @iSCSIKing You need to decompress some time guys! that will help! #SANchat
AlisonatDell: lets talk compression!! what questions do you have for our experts? #SANchat
iSCSIKing: RT @AlisonatDell: lets talk compression!! what questions do you have for our experts? #SANChat
Mike_Davis: I'm online. Just having a conversation with a customer about h264 compression. #SANchat
iSCSIKing: RT @Mike_Davis: Im online. Just having a conversation with a customer about h264 compression. <Awesome! #SANChat
LiemNguyen: Hey @mike_davis thanks for joining us! Why don't you start with telling everyone what you do for a living. #SANchat
LiemNguyen: @mike_davis I'll ask you about the customer in a second :) #SANchat
Mike_Davis: I managed Marketing and product planning for Ocarina Networks, acquired 15 months ago by Dell. The ole team is still working hard. #SANchat
DellCompellent: #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow the conversation here: http://t.co/SEBG7RUA
dell_storage: RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow the conversation here: http://t.co/SEBG7RUA
JeffHengesbach: Can we look fwd to native block comp/dedupe on Equallogic in the future? #sanchat
iSCSIKing: RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow us here: http://t.co/vD9lHIcC #SANChat
LiemNguyen: @mike_davis Glad to hear that, and full disclosure, I came back to Dell via the #Compellent acquisition. @mfarhat what's your role #SANchat
Mike_Davis: @JeffHengesbach I think we're on record that all Dell platforms will include data reduction. Too early to specify dedupe vs compr. #SANchat
Mike_Davis: @JeffHengesbach Eql data reduct will first arrive with NAS, then at the block level later. #SANchat
mfarhat: Hi Liem, glad to be here, I am a Product Manager on our DX Object Storage Platform, including our DX6000G Storage Compression Node #SANchat
iSCSIKing: RT @Mike_Davis: @JeffHengesbach Eql data reduct will first arrive with NAS, then at the block level later. #SANChat
LiemNguyen: @mfarhat @mike_davis First time we've discussed compression in detail here. Let's start w/ basics: Why is compression important? #SANchat
iSCSIKing: Welcome @mfarhat glad to have you with us today #SANChat
DennisMSmith: RT @iSCSIKing: RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression starting now! http://t.co/or2hgRRD #SANChat
Mike_Davis: Compression importance: not all data is easily deduped, not all data sets have redundancy to take advantage of.... #SANchat
InformaZen: RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression starting now! http://t.co/Dpzl9vM6 #SANchat
Mike_Davis: ...so the data reduction solution needs to be tailored to the workflow. Vertical apps tend to benefit more from compression #SANchat
Mike_Davis: Dedupe is fantastic for backup workflows (full-full-full...) but does nothing against precompressed files (video, images, MSoffice) #SANchat
bdwill: RT @JeffHengesbach: Can we look fwd to native block comp/dedupe on Equallogic in the future? #sanchat | PLEASE!
Mike_Davis: dedupe; not an ounce of benefit for a video archive. Applying generic (eg LZ) compression won't work either. #SANchat
mfarhat: Compression is an important element of storage management, allowing significant efficiencies in physical storage utilization #SANchat
Mike_Davis: To compress video further we developed a specialized set of algorithms that understand the formats (eg EXR, raw/DV, AVI, etc). #SANchat
mfarhat: The DX6000G Storage Compression Node allows the tiering of data through multiple compression options.. #SANchat
Mike_Davis: @mfarhat What are the attributes a customer can use for compression policies? #SANchat
mfarhat: @Mike_Davis customers using DX Object Storage with compression can set policies based on file type or age (life point).. #SANchat
Mike_Davis: Are there any customers online who have deployed SW compressors on server/host (external to app) to shrink data? #SANchat
mfarhat: These policies enable tiering of data through multiple, optimized, compressors. For example, customers may choose.. #SANchat
LiemNguyen: RT @Mike_Davis: Are there any customers online who have deployed SW compressors on server/host (external to app) to shrink data? #SANchat
mfarhat: ..to leave certain file types uncompressed for a period of time, and then apply a compressor optimized for speed of access.. #SANchat
Mike_Davis: h.264 (MPEG4 variants) are a tough one. We found ways to shrink, but at a high cost to CPU, but small savings can have huge payoff #SANchat
mfarhat: ..and at a later time still, apply a compressor that deliver maximum space savings for dormant data. #SANchat
Mike_Davis: Re bloak/array-based data-reduction, it's harder to apply interesting art...data is generall opaque. So we run inline naive algor. #SANchat
Mike_Davis: ...but choice of algor also constrained by CPU/RAM resource available in the array. We don't want to cause DOS attack on IO! #SANchat
the_saltworks: thanks for the heads up @LiemNguyen I had no idea I was missing a compression discussion on #SANchat
the_saltworks: @Mike_Davis there are a cpl ways to further reduce storage consumed by multimedia 1) More lossy compression & 2) single instancing #SANchat
mfarhat: welcome @the_saltworks, what interests you in compression? #SANchat
Mike_Davis: System resources are an interesting variable on design...dedupe is ram IO heavy, compr is CPU heavy. Optimizing for both=hard. #SANchat
the_saltworks: @mfarhat what interests me most w/compression is the misinformation about it. I'm quite familiar w/ it in all its forms #SANchat
Mike_Davis: @the_saltworks yep, SIS can help, but most video repositories don't store files rendantly...maybe in home-shares. #SANchat
Mike_Davis: @the_saltworks ...lossy is interesting. Have some work in that area. lots of non-visual info that can be optimized in these files.. #SANchat
the_saltworks: @mfarhat I prefer to speak of data reduction in terms of inter- and intra-file techniques. #SANchat
Mike_Davis: @the_saltworks block dedupe is both inter and intra, so we way dedupe eliminates redundancy, compr uses math to predict patterns. #SANchat
mfarhat: thanks @the_saltworks, can we dispel some of the misinformation about compression today? what are some common myths you hear? #SANchat
the_saltworks: @Mike_Davis It turns out quite a lot of MM content can be (and is) distributed in lesser quality/format than the original content #SANChat
Mike_Davis: @the_saltworks the video workflow is complex, and transcoding is part of day-to-day life using special tools at workflow level #SANchat
the_saltworks: @Mike_Davis yes, in fact one of the beauties of modern dedupe is that it successfully blended the two #SANchat
the_saltworks: @mfarhat one myth is that you can't further compress already compressed formats (e.g. JPG). Of course you can via lower quality #SANchat
Mike_Davis: DV/raw capture formats are easy to compress in stor, h264/VP7 distribution formats are hard to compress further...very efficient. #SANchat
Mike_Davis: @the_saltworks We have JPG lossless compr that will deliver 30-60% savings, partly because it knows the file format. #SANchat
the_saltworks: @mfarhat and for those who cannot (or simply don't want to) compromise quality…well, I was blindsided this year with another way #SANchat
LiemNguyen: @mike_davis So when would you not want to compress data? #SANchat
the_saltworks: By far one of the most interesting and promising methods of intrafile compression I've seen in recent yrs comes from @balesio #SANchat
Mike_Davis: @LiemNguyen Compression has overhead, so anything transactionally sensitive can feel pain. DB for example. #SANchat
the_saltworks: That a biz has found a smarter more compact way to write existing file formats threw me a curveball. I didn't think it was possible #SANchat
Mike_Davis: @the_saltworks There's no magic here; applying lossy compression/resize to MSoffice docs. #SANchat
LiemNguyen: @mike_davis @mfarhat Speaking of docs, any good resources you can point to for more info? #SANchat
the_saltworks: @Mike_Davis yep, if only all formats were highly efficient. Sadly most SW devs create crappy inefficient file formats #SANchat
mfarhat: @the_saltworks, there is always room for higher efficiency :) #SANchat
the_saltworks: @Mike_Davis you meant lossless right? I'd hate to see lossy office document compression. ;-) #SANchat
Mike_Davis: @the_saltworks The main problem here is users indiscriminantly pasting 2MB JPEG images into their PPT. This is where Balesio wins. #SANchat
rootwyrm: What about ways of detecting whether data is/should be compressible at time of write prior to disk commit? #SANchat
Mike_Davis: @the_saltworks Re Balesio, it is lossy. the images are being resized. "visually lossless"=lossy. #SANchat
mfarhat: @rootwyrm compression is almost always implemented as a post-process operation, as is the case with the DX Storage Compression Node #SANchat
rootwyrm: @mfarhat Exactly. I suppose the question is: why? Isn't it faster to do a boolean operation on the data while it's in write cache? #SANchat
Mike_Davis: Not much research in compression last 20yrs. Our Chief Scientist drafted a new book here: http://t.co/Dj9ODneN #SANchat
mfarhat: @rootwyrm, write speeds are not impacted -- files are compressed (or not) based on user defined policies and timing #SANchat
iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
rootwyrm: @mfarhat Sure - I'm wondering why folks aren't doing avoidance earlier in the line though. Is there significant CPU impact there? #SANchat
DellTechCenter: RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
DennisMSmith: RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
LiemNguyen: RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANchat
Mike_Davis: @rootwyrm avoidance? #SANchat
LiemNguyen: And join @mike_davis and @mfarhat next month, Dec. 7, for a followup #SANChat on #deduplication! #SANchat
the_saltworks: @rootwyrm would be an interesting solution. Would be better if SW devs simply learned to write more efficient formats #SANchat
rootwyrm: @mike_davis Yup; tag data early while in write cache and then avoid the post-write check for compressibility. #SANchat
mfarhat: @rootwyrm, the cost of write I/O is generally higher than the cost of the compressible space, left uncompressed for a short period #SANchat
AlisonatDell: huge thank you to @mike_davis and @mfarhat for the compression chat! looking forward to 12/7 when we talk dedupe! #SANchat
rootwyrm: @the_saltworks Write.. more.. efficient formats? But, how can you not love XML with embedded binary? ;) #SANchat
iSCSIKing: RT @LiemNguyen: And join @mike_davis and @mfarhat next month, Dec. 7, for a followup #SANChat on #deduplication! #SANChat
the_saltworks: btw, on a briefing at the moment. I will take a look at this stream again shortly and comment #SANchat
rootwyrm: @mfarhat Sure, but from an op standpoint, at post-proc you're doing disk read, cache load, CPU test, metadata write, correct? #SANchat
LiemNguyen: @the_saltworks Thanks for joining us on #SANchat today!
mfarhat: @rootwyrm, the operations needed to determine which data to compress and when are the same whether they are done at write or later #SANchat
mfarhat: @rootwyrm, post-process compression allows those operations to be done at an optimal time, without impacting system performance #SANchat
rootwyrm: @mfarhat Ah, so to an extent it does come down to compressibility testing one way or the other, and not always headers and such. #SANchat
mfarhat: @rootwyrm DX compression is very flexible, users select which file types to compress and when, and with which compressors to do so #SANchat
mfarhat: @liemNguyen @the_saltworks @rootwyrm thank for participating today, looking forward to continuing the discussion #SANchat
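For readers who want to try two points from the chat at home, here is a rough sketch of a compressibility test: it shows that a generic LZ-family compressor does little for data that is already high-entropy, and it gives a feel for the kind of quick check rootwyrm asked about. This is only an illustration using Python's standard zlib module, not how Ocarina or the DX Storage Compression Node work. The function names, the 64 KB sample size, and the 0.9 threshold are all made up for the example, and random bytes merely stand in for pre-compressed content such as encoded video.

```python
import os
import zlib


def sample_ratio(data: bytes, sample_size: int = 64 * 1024, level: int = 1) -> float:
    """Compress a leading sample and return compressed/original size.

    Lower is better; a value near (or above) 1.0 means generic
    compression buys almost nothing. Sample size and level are
    arbitrary illustration values, not product settings.
    """
    sample = data[:sample_size]
    if not sample:
        return 1.0  # nothing to compress, treat as incompressible
    return len(zlib.compress(sample, level)) / len(sample)


def worth_compressing(data: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical policy check: compress only if the sample shrinks enough."""
    return sample_ratio(data) < threshold


# Highly redundant data, loosely in the spirit of repeated full backups.
redundant = b"full backup block " * 4096

# Random bytes as a stand-in (assumption) for already-compressed content.
high_entropy = os.urandom(64 * 1024)

if __name__ == "__main__":
    for name, blob in (("redundant", redundant), ("high entropy", high_entropy)):
        print(f"{name}: ratio={sample_ratio(blob):.3f}, "
              f"compress={worth_compressing(blob)}")
```

A test like this costs CPU on every sample, which is one way to read the trade-off discussed above between checking at write time and deferring the work to a post-process pass.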