05-11-10 Dell DX Object Storage Platform
Dell DX Object Storage Platform
Technical Community - Background Reading
Object Storage
Object storage allows the attachment of metadata, which is additional identifying information, to unstructured data. It utilizes an enormous, flat address space for this content, removing some of the limitations and complexity of managing this data on traditional systems. Dell believes that the benefits of object storage will lead to widespread adoption of this technology for fixed digital content.
Dell DX Object Storage Platform
The Dell DX Object Storage Platform is designed to access, store and distribute up to billions of files or other digital content, from archiving all the way to the cloud. It uses an elegant, self-managing, future-proof and cost-effective peer-scaling architecture that is based on Dell’s award-winning x86 standards-based rack server platforms. The platform is optimized for storage and includes fully integrated software for a complete end-to-end solution. A basic configuration consists of a DX Cluster Services Node and two DX Storage Nodes with 12 hard drives each. You can start as small as 6TB of raw capacity and add additional DX Cluster Services and DX Storage Nodes to increase capacity, throughput and access, including replication between other clusters.
It will be available in May.
Dell DX Object Storage Brochure
Dell DX Object Storage Spec Sheet
Chat Transcript
Dell-JeffS
Welcome everyone to the Dell TechCenter TechTuesday Chat on Dell DX Object Storage.
Dell-JeffS
We've got a couple of experts on hand to take all your questions -- Greg White from our Product Group Storage Marketing team as well as Brandon Canaday, the DX Product Manager.
Dell-JeffS
If you see a link, make sure to right-click on it. Otherwise the interface may boot you out.
Dell-ScottH
So what's Object Storage?
EricRR
How does this compare/compete with current offerings (Dell PowerVault/ Dell EqualLogic/ CLARiiON)?
Brandon
Thanks, Scott, for the question. Object Storage is a technology that is uniquely designed for the storage of unstructured data in the virtual era.
Brandon
Think of an object as a file, or a collection of files, enhanced by extensive metadata about the file.
needcaffeine
So it's de-duped raw storage?
Glenn_grabowski
How fast is unstructured data growing compared to things like databases?
ceri
So do we need special clients? This is not a NAS (Network Attached Storage) type solution, right?
Brandon
By using metadata, an object provides "structure" to "unstructured" data.
abartlett
So how can Dell DX Object Storage improve security? Can it better keep track of access/changes than unstructured storage like NAS shares?
Brandon
Using metadata, customers can establish policies and employ technologies that leverage the metadata for things like data movement, storage optimization, data elimination, etc.
needcaffeine
Is the metadata cataloged in some giant XML which is replicated elsewhere?
erson
So, does the object storage have a front end towards users or do I hook it up as a backup for an existing document "handler" like Microsoft SharePoint, content managers and so on?
Brandon
Since unstructured data represents up to 80% of all data and is growing at 60+% year over year, utilizing these features and capabilities gives customers advantages in cost and reduced complexity from a TCO (Total Cost of Ownership) perspective.
Dell-Greg-W
Object storage is a different class of storage versus our Dell PowerVault, Dell EqualLogic & Dell/EMC product lines. Those are either file or block (or unified in some cases). Dell DX Object Storage is for unstructured data - think files, video, movies, voice -- not structured data like databases. Thus, it will complement Dell PowerVault, Dell EqualLogic & Dell/EMC product lines. It can be a lower cost 2nd or archive tier for the unstructured data in those areas.
erson
Since the DX Object Storage doesn't use RAID but is using multiple copies spread among the nodes, can I increase the number of copies depending on the type of data?
erson
I first thought of it as a distributed file system with redundancy built into the software that manages it instead of in the hardware.
needcaffeine
Is this an EMC Centera? Competitive product?
brian_p_sully
Does it automatically extract metadata from objects/files or known types (i.e. DOC, DICOM, etc.)?
erson
How many storage nodes should I have per service node? Since the storage nodes are netbooting from the service nodes, should I have at least one service node per geographical location?
ceri
The main question is, how is this presented to users? It seems to be a lot to ask of them to put stuff in the correct directory on a shared drive; if they have to use a different client then we are probably out of luck with this.
Dell-Greg-W
Ceri, object storage as a whole uses application interfaces. For Dell's DX Object Storage there are options of using HTTP or using an API (Application Programming Interface) designed for a specific application.
ceri
Thought so, thanks Greg.
Brandon
Erson, re: how many storage nodes per service node: The "cluster services node" is an out-of-band "management" node that provides various services to the cluster.
Brandon
For instance, the services node has native Proxy Server, NTP, DNS and other necessary deployment-related services to ease cluster setup and deployment as well as to provide a centralized interface for cluster monitoring.
Dell-Greg-W
Erson, you can designate how many copies you want to keep and the storage will spread them around and monitor their health.
Brandon
Each geographic location (i.e. a local cluster) will have a minimum of one Cluster Services node.
EricRR
How is storage accessed (SCSI, iSCSI, SAS)?
ceri
If I add a new node, does the data slurp on to the new node automatically?
Brandon
A second Cluster Services node can be deployed for HA (high availability) but is not necessary to read from or write to the cluster.
WalidIbrahim
Do we really need a service node? Can’t we just hit the storage nodes directly with the request, and each node can have its configuration?
WalidIbrahim
Does the service node do any load balancing, failover or other management tasks besides providing common configuration?
erson
It feels like there is very little hardware in the storage box. Was the reason for going with a normal 2U box with 12 hard drives instead of something that could hold more hard drives to keep costs down? The actual redundancy demands multiple nodes.
WalidIbrahim
If it's only providing common configuration, then why not have one storage node acting as the configuration "service" node?
farzad
Metadata is not catalogued. The nice thing about this platform is that it allows you to store the metadata with the object. All you need to keep track of are the UUIDs (Universally Unique Identifiers) of the objects that you write to the cluster.
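To make the "metadata travels with the object, you only track the UUID" idea concrete, here is a minimal in-memory sketch. It is not the DX software or its SDK; the class name ToyObjectStore, its methods and the sample metadata keys are all hypothetical and exist only for illustration.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for a flat object address space (illustration only)."""

    def __init__(self):
        # One flat dictionary: no directories, no separate metadata catalog.
        self._objects = {}

    def write(self, data, metadata):
        """Store data together with its metadata and return the new object's UUID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def read(self, object_id):
        """Return (data, metadata) for a previously written object."""
        record = self._objects[object_id]
        return record["data"], record["metadata"]


store = ToyObjectStore()
# The caller keeps only the returned UUID; the metadata lives with the object.
scan_id = store.write(b"example payload", {"type": "DICOM", "patient": "anon-001"})
data, meta = store.read(scan_id)
```

The only thing the application has to remember is scan_id; everything else comes back from the store on read.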
erson
Ceri, looks like you can add and subtract nodes whenever you want.
ceri
Any chance of an S3 API?
erson
Can I dock the DX Object Storage to any popular frontend right off the bat, like SharePoint for example?
Brandon
Needcaffeine and ericrr re: Centera: Centera is the most widely used object-based storage platform but Dell DX Object Storage is not Centera. Centera and Dell DX Object Storage differ in a number of ways. First, each Dell DX Object Storage node performs all functions and doesn't require different access versus storage nodes. Second, Dell DX Object Storage is offered in a cluster as small as 2 storage nodes + 1 services node (Centera requires 4 minimum).
farzad
The storage can be accessed via HTTP or via SDKs (Software Development Kits) that you can use for communication to the cluster. The SDKs will then translate the calls into HTTP. All the SDKs provide a stateless interface to the cluster.
Dell-Greg-W
Ceri, yes, if you add a new node it will add to the load balancing and spread data across it to optimize performance.
erson
WalidIbrahim, the storage nodes are very simple with only the necessary hardware to keep costs down. The service nodes are more like a standard server.
Dell-JeffS
I failed to introduce Farzad earlier. He's from our Advanced Engineering group, too.
erson
The storage nodes netboot from the service nodes and keep the software in RAM, correct?
Dell-Greg-W
Erson, that’s correct. You can add or retire nodes with one click.
Dell-JeffS
We have a video showing the adding of nodes.
farzad
The service node is a required function; the node is particularly needed if you need replication services. It provides a function called Content Routing.
needcaffeine
Brandon, can a storage node become a service node? What happened to having backups? Why does it have only one service node?
Dell-Greg-W
Walidibrahim and erson, when you add storage nodes and services nodes you get more computing power. Peer-scaling.
erson
Do the storage nodes have redundant controllers or any other redundancy besides Dell PSUs (Power Supply Units)? Can I service a storage node easily if it goes down?
Dell-Greg-W
Erson, correct. Network PXE (Preboot Execution Environment) boot for the storage nodes.
farzad
The load balancing is an inherent function within the cluster. Load balancing is not a function of the service node; it is based on the storage node intercommunication.
Brandon
More on Dell DX Object Storage versus Centera: Centera was originally geared to the compliance storage market, whereas Dell DX Object Storage will target both compliance and cloud use cases. Dell DX Object Storage nodes can be added in increments of one node at a time. Dell DX Object Storage clusters can be asymmetrical. You can add a denser node in the future and use it along with smaller nodes.
Dell-JeffS
I've got a link to a couple of Dell DX Object Storage videos Greg-w put together. He's letting us leak them a bit early (they'll be a part of his blog tomorrow). Here's the first.
http://www.youtube.com/watch?v=chi9kdzwffq
Dell-JeffS
And the second
http://www.youtube.com/watch?v=s4r-wvako1m
gkeller
Looking at the datasheet and thread here, this is a "Cloud Based Filesystem" rather than a traditional RAID based filesystem.
erson
Where can I read more about Dell DX Object Storage than those two PDFs?
WalidIbrahim
Farzad, yes, the load balancing is inherent within the storage nodes, as well as the failover and the replication.
WalidIbrahim
Then you’re adding a whole new server "service node" for common configuration and NTP services? While the storage nodes can do that on their own!
gkeller
If that's right, my question is, Can having multiple replicas increase read performance and even be distributed between facilities in the same campus network?
erson
Gkeller, correct, though I’m not sure about talking about it as a "cloud."
Brandon
Needcaffeine, service nodes serve a different purpose altogether than the storage nodes. The storage nodes are based on Dell's PowerEdge R510 chassis and are 2U x 12 drives, optimized for storage density. The storage nodes contain the object storage software OS.
gkeller
Erson right, it's not in the clouds. But the data location is abstracted from the end user / application more than on a typical fileserver setup.
Dell-Greg-W
Erson, more on Dell DX Object Storage will be available tomorrow when it starts shipping -- online product details pages, some whitepapers, etc.
Brandon
The service node is based on the Dell PowerEdge R710 Server and has onboard storage for the content router software and for traffic enumeration in the case of wide-area replication. It is configured with RHEL 5.4 (Red Hat Enterprise Linux) and is not meant for storing objects. So, the service node cannot become a storage node. Also, there is no need for backup.
farzad
Erson, there is no RAID controller in a storage node. The redundancy happens at the object level. So nodes can come and go without notification and the background operation between nodes will recover the information based on the redundancy requirements and how many copies are required.
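A rough sketch of object-level redundancy as described above: each object carries a required copy count, and when a node disappears the remaining nodes bring the count back up. This is a toy model, not the DX algorithm; the ToyCluster class, the "least-loaded node first" placement rule and the node names are assumptions made for illustration, and the model tracks only which node holds which object ID, not the data itself.

```python
class ToyCluster:
    """Tracks which node holds which object IDs (no real data is stored)."""

    def __init__(self, node_names):
        self.holdings = {name: set() for name in node_names}  # node -> object IDs
        self.required = {}  # object ID -> required number of copies

    def holders(self, object_id):
        """Return the set of nodes that currently hold a copy of the object."""
        return {n for n, objs in self.holdings.items() if object_id in objs}

    def _least_loaded(self, exclude):
        # Hypothetical placement rule: fewest objects first, name as tie-breaker.
        candidates = [n for n in self.holdings if n not in exclude]
        return min(candidates, key=lambda n: (len(self.holdings[n]), n))

    def write(self, object_id, copies):
        """Register an object with its required copy count and place the copies."""
        self.required[object_id] = copies
        self.heal()

    def remove_node(self, name):
        """Simulate a node vanishing without notice, then recover in the background."""
        del self.holdings[name]
        self.heal()

    def heal(self):
        """Add copies until every object meets its requirement (or nodes run out)."""
        for object_id, copies in self.required.items():
            holders = self.holders(object_id)
            while len(holders) < min(copies, len(self.holdings)):
                target = self._least_loaded(exclude=holders)
                self.holdings[target].add(object_id)
                holders.add(target)


cluster = ToyCluster(["node-a", "node-b", "node-c"])
cluster.write("obj-1", copies=2)
cluster.write("obj-2", copies=3)
cluster.remove_node("node-a")  # obj-1 regains its second copy on a surviving node
```

With only two nodes left, obj-2 cannot reach three copies, so the sketch settles for one copy per surviving node.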
needcaffeine
What type of drives do these run on? SATA (Serial ATA)? Or something faster?
erson
Farzad, yes, there is no RAID in the storage nodes, but there must be a controller.
Brandon
Needcaffeine, the drives are SATA. Support for 250GB, 500GB, 1TB and 2TB drives.
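For a sense of scale, the raw capacity of the minimum configuration (two storage nodes with 12 drives each) can be worked out per drive size. The function names are hypothetical, and the "unique data" figure is a back-of-the-envelope assumption that simply divides raw capacity by the number of copies kept, ignoring any software or metadata overhead.

```python
DRIVES_PER_NODE = 12  # 2U x 12-drive storage node, as described in the chat


def raw_capacity_tb(storage_nodes, drive_size_gb):
    """Raw capacity in TB (decimal units: 1 TB = 1000 GB)."""
    return storage_nodes * DRIVES_PER_NODE * drive_size_gb / 1000


def approx_unique_data_tb(raw_tb, copies):
    """Assumption: every object is kept `copies` times and overhead is ignored."""
    return raw_tb / copies


for size_gb in (250, 500, 1000, 2000):
    raw = raw_capacity_tb(2, size_gb)
    print(f"{size_gb} GB drives: {raw} TB raw, ~{approx_unique_data_tb(raw, 2)} TB at two copies")
```

With 250GB drives the two-node minimum works out to 6TB raw, which matches the entry-level figure in the background reading.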
erson
I’m guessing 3.5" is the right move if you want large, slow storage.
Brandon
WalidIbrahim, the service node is a value-add to the cluster. Its cost is typically amortized over a large capacity and does not add significantly to the total cost. However, it can significantly reduce cost.
erson
When I request a file through the frontend, how does the cluster figure out where the closest (or fastest) copy of the file is in relation to my location?
farzad
Erson, there is a generic SAS/SATA I/O controller that does direct I/O to the drives. If the failure of such a controller results in loss of communication with all the drives in the system, the node will be marked off and the contents are automatically recovered within the cluster.
WalidIbrahim
Also, for the storage node, are the hard drives hot swappable? Do I need to configure anything if I’m extending my storage with a new hard drive?
needcaffeine
What’s the storage limit to a service node? Or at what point would you create a new cluster?
erson
Farzad, it will then replicate another set of copies of the files that it lost. When I re-attach that downed storage node, will it remove those new copies that were made to replace the downed storage node?
Brandon
Here are the cluster service node advantages: Native NTP (Network Time Protocol), DNS, enables PXE boot for all added nodes, enabling essentially plug-and-play expansion of clusters. Also includes native content routing service so that a local cluster can wide-area-replicate to a remote cluster at no added cost. Also enables application of firmware across an entire cluster and manages all cluster licenses in a central location, which enables licenses to be easily moved from cluster to cluster as customer topologies change.
erson
Walid, yes, they are swappable.
gkeller
So can clients (Microsoft Windows or Linux) mount this filesystem without special clients, or is there a translator needed for that?
erson
Needcaffeine, 12 drives with the largest right now being 2TB.
erson
Seagate is launching 3TB Constellation 7200 RPM SAS drives at the end of this year. So we could probably see those next year in the Dell DX Object Storage nodes.
Brandon
WalidIbrahim: hard drives are hot swappable. If adding drives or nodes, the cluster automatically takes care of redistribution of objects across the new capacity.
Brandon
Erson, yes, we plan to support 3TB drives when Dell launches them.
gkeller
My third question: Can the Object Storage network be InfiniBand?
Brandon
Erson, in addition to the 2U x 12 drive storage node, we have plans to launch a 1U x 4 drive node in the future.
WalidIbrahim
How would a cluster communicate with the other remote cluster? Would that be over HTTP, or more precisely SCSP (Simple Content Storage Protocol)?
erson
How can I as an administrator classify how many copies there should be of a certain type of data? Do I use the metadata to group the files into categories with different redundancy settings?
gkeller
And to follow up on erson, can the user specify non-default copies?
erson
Gkeller, yes, they can. The default seems to be two copies of every object.
farzad
Gkeller and erson, regarding read performance, I should say it depends. You can define geographic regions for your replicas. On reads, the information is read from the geographically closest location possible. In those cases, yes. As for cloud usage, the system definitely has characteristics needed for cloud storage.
Brandon
Gkeller, re: InfiniBand: Storage nodes sit on a private multicast network. Technically, InfiniBand can be used, but Dell isn't supporting InfiniBand as part of our initial launch.
Dell-Greg-W
Erson, you can specify at the object level how many copies of data you want to keep and for how long. The default is two, but you can go up from there.
gkeller
Brandon, as long as we stay under 500 nodes we won't kill IB multicast groups, and should hope it's at least technically feasible.
needcaffeine
I’m not sure why you'd bother with InfiniBand on such slow drives.
erson
But how do I group the objects to set those non-default redundancy settings? By using the metadata or is there another way?
gkeller
So we would likely set up multiple "geographies" in the same data center, and get different nodes to be in different "geographies" manually to balance loads.
erson
Brandon, nice with more storage node options.
gkeller
Needcaffeine, lots of nodes reading the same data, so it's generally cached after the first reader.
farzad
Walidibrahim, the remote cluster communication is based on a publisher/subscriber model. This allows the remote cluster to subscribe to the local cluster and pull information that is required to be replicated/changed. So in a sense, the communication is based on cluster intercommunication.
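The publisher/subscriber pull model described above can be sketched in a few lines. This is only a toy: the class names, the change log and the position counter are assumptions for illustration, not the content router's actual mechanism or wire protocol.

```python
class PublisherCluster:
    """Local cluster: records writes in an ordered change log."""

    def __init__(self):
        self.objects = {}
        self.change_log = []  # object IDs in the order they were written

    def write(self, object_id, data):
        self.objects[object_id] = data
        self.change_log.append(object_id)

    def changes_since(self, position):
        """Return the IDs written after `position` and the new log position."""
        return self.change_log[position:], len(self.change_log)


class SubscriberCluster:
    """Remote cluster: subscribes to a publisher and pulls what changed."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.position = 0  # how far into the publisher's log we have read
        self.objects = {}

    def pull(self):
        """Fetch everything written since the last pull; return how many objects."""
        changed, self.position = self.publisher.changes_since(self.position)
        for object_id in changed:
            self.objects[object_id] = self.publisher.objects[object_id]
        return len(changed)


local = PublisherCluster()
remote = SubscriberCluster(local)
local.write("obj-1", b"first")
local.write("obj-2", b"second")
first_pull = remote.pull()   # copies both objects
second_pull = remote.pull()  # nothing new to copy
```

The point of the pull direction is that the local cluster never needs to know how many remote clusters exist; each subscriber keeps its own position.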
erson
The Dell EqualLogic PS6500E 4U case with 48 hard drives would be a good choice for a big object storage cluster. At least if rack space is valuable for you.
gkeller
Can you set up a synchronization hierarchy for geographically dispersed storage nodes, so that some files are higher priority than others? Maybe based on naming conventions or something like that.
WalidIbrahim
How do you match the object UUIDs to the objects stored?
farzad
You can also replicate information between sub-clusters (like when you set up locations between buildings). In that case it is SCSP.
needcaffeine
Is data anchored from the original source like with EMC DiskXtender and is security controlled that way?
Brandon
Gkeller, we do not have a synchronization hierarchy. Those are part of our ask for the next launch.
erson
Feels like I can't just get the hardware and get going. Is there any software connection that works right away or do I need to purchase a solution from an ISV (Independent Software Vendor) to hook the object storage to my Microsoft SharePoint or enterprise content management system using the APIs?
erson
I'm assuming the frontend must support the usage of metadata tags for the documents/files that you intend to store.
farzad
Needcaffeine, the UUID is not solely generated based on the content. But there is an integrity seal that is associated with every object to guarantee the object's integrity before the UUID is generated.
Brandon
Erson, the hardware can be used headless leveraging the standard HTTP interface and we've published SDKs (C++, Java, Python) so that application interfaces can be written. We also are working with an ecosystem of ISV partners, three of which will have native APIs as of tomorrow: StoredIQ (eDiscovery), Mimosa Systems/Iron Mountain (email archive) and TeraMedica (healthcare archive). We have another 9 ISVs signed up and are pursuing dozens more.
Brandon
Erson, Dell DX Object Storage supports standard metadata and custom metadata. The application will write that metadata via the HTTP header.
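As a rough illustration of writing metadata through HTTP headers, the sketch below builds (but does not send) a write request using Python's standard library. The cluster URL, the POST method and the X-Example-Meta- header prefix are all hypothetical placeholders; the chat does not give the real DX header names or endpoints.

```python
from urllib.request import Request


def build_write_request(cluster_url, payload, metadata):
    """Build an HTTP request whose headers carry the object's custom metadata."""
    headers = {"Content-Type": "application/octet-stream"}
    for key, value in metadata.items():
        # Hypothetical header prefix, used here only to mark custom metadata.
        headers[f"X-Example-Meta-{key}"] = value
    return Request(cluster_url, data=payload, headers=headers, method="POST")


request = build_write_request(
    "http://dx-cluster.example/",  # placeholder address, not a real endpoint
    b"hello",
    {"Department": "radiology", "Retention": "7y"},
)
# Normalize header names to lower case for easy inspection.
sent_headers = {name.lower(): value for name, value in request.header_items()}
```

Passing the request to urllib.request.urlopen would send it; that step is left out because the address above is a placeholder.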
WorkingHardInIt
Could you repost the YouTube links?
Dell-JeffS
http://www.youtube.com/watch?v=chi9kdzwffq
Dell-JeffS
http://www.youtube.com/watch?v=s4r-wvako1m
WalidIbrahim
I was wondering how you match the object UUIDs to the objects? What kind of indexing is used?
farzad
The UUID matching is done on each storage node. The node has a reference to its local UUID index, so it can quickly be looked up. Once the matching is done, the node can bid to win the get call. The node that wins the bid services getting the matching object to that UUID.
Dell-Greg-W
Thanks to everyone for all the great questions. Maybe we need to do this again in a couple weeks once you've digested all the stuff that's up tomorrow.
Dell-JeffS
Tomorrow look for Greg's blog on http://en.community.dell.com.
Dell-JeffS
Next week we've got a recap of this week’s Citrix Synergy event and associated announcements. Kong should have lots to report for that one.
WalidIbrahim
I do have more questions on the UUID matching thing. So each storage node has its own index in memory? Is there any persisting of the index on the hard drives? Or in the service node?
farzad
Walidibrahim, in memory of each node only.
WalidIbrahim
Will the service node determine which node to service the request "after bidding," i.e., who determines the winner node?
WalidIbrahim
If in memory, then what is the recovery in case of power failure? Does a startup service iterate over the storage to parse out, or maybe re-generate, the UUIDs from the objects stored?
farzad
The service node is a passive observer; it has no role in determining who wins the bids. All participating nodes in the bid decide among themselves who wins the bid.
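The bidding described above, where each node checks its own in-memory UUID index and the holders settle the winner among themselves, might look roughly like this. The chat does not say what a bid is based on, so the load field used as the bid value is purely an assumption, as are the class and function names.

```python
class StorageNode:
    """A node with a local, in-memory UUID index (illustration only)."""

    def __init__(self, name, load):
        self.name = name
        self.load = load  # hypothetical cost metric: lower means a better bid
        self.index = {}   # UUID -> object data held by this node

    def bid(self, object_id):
        """Return a bid if this node holds the object, otherwise None."""
        if object_id in self.index:
            return (self.load, self.name)
        return None


def get_object(nodes, object_id):
    """Collect bids from the nodes themselves; the lowest bid serves the read."""
    bids = [(node.bid(object_id), node) for node in nodes]
    bids = [(bid, node) for bid, node in bids if bid is not None]
    if not bids:
        raise KeyError(object_id)
    _, winner = min(bids, key=lambda pair: pair[0])
    return winner.name, winner.index[object_id]


node_a = StorageNode("node-a", load=5)
node_b = StorageNode("node-b", load=2)
node_c = StorageNode("node-c", load=1)  # least loaded, but holds no copy
node_a.index["uuid-1"] = b"payload"
node_b.index["uuid-1"] = b"payload"
served_by, payload = get_object([node_a, node_b, node_c], "uuid-1")
```

Note that no central component appears in get_object beyond collecting the bids: only nodes that hold the object take part, which mirrors the "passive observer" role of the service node.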
WorkingHardInIt
Anyone planning on using this soon?
farzad
On power-up restart, the node takes inventory. Based on the comparison with other nodes, a decision is made on who has the latest data, or whether the node joins the cluster as a "new" node, which means its old data is no longer needed since the cluster has recovered it.
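One way to picture the restart decision is the sketch below. It is a guess at the general shape, not the actual DX logic: the function name, the "keep"/"new" labels and the rule "join as new only if the cluster already holds enough copies of everything in the inventory" are assumptions for illustration.

```python
def rejoin_mode(local_inventory, cluster_copies, required_copies):
    """Decide how a restarted node rejoins the cluster.

    local_inventory: object IDs found on the node's drives after power-up.
    cluster_copies:  object ID -> number of copies the other nodes hold now.
    required_copies: number of copies each object should have.
    """
    for object_id in local_inventory:
        if cluster_copies.get(object_id, 0) < required_copies:
            # The cluster still needs this node's copy, so keep the old data.
            return "keep"
    # Everything was already recovered elsewhere: join as a "new", empty node.
    return "new"


inventory = {"uuid-1", "uuid-2"}
fully_recovered = rejoin_mode(inventory, {"uuid-1": 2, "uuid-2": 2}, required_copies=2)
still_needed = rejoin_mode(inventory, {"uuid-1": 2, "uuid-2": 1}, required_copies=2)
```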
farzad
Thank you all for your participation.