Design Question: Max In-Use Best Practices

This question is answered

I am new to EqualLogic, but have a very good background in storage (multi-protocol SAN/NAS, particularly NetApp & EMC).

That said, I see the setting for "Maximum In-Use Space" on a volume that is thin provisioned.  I am looking at a system that was already configured, and this is what they did:

  • Created a bunch of volumes and made them much larger than they needed to be (overcommitted the aggregate or pool)
  • Set the Maximum In-Use space to about 50% of the volume size (because in reality that is all the pool has)
  • Mounted these volumes to ESXi as datastores

Now what happens is this (and if you ask me this is insane, if not stupid): the VMware administrator uses vSphere Client to review datastores and sees that "Datastore X" has 495GB of free space.  It is a 1TB datastore, so naturally when they are asked to deploy new virtual machines they place the VMs in this datastore.  The reality is that there is only 5GB free (due to Max In-Use set at 50%).  The VMware administrator creates VMs and the array kicks the datastore offline.  Everyone panics and calls the storage administrator, who then raises the Max In-Use Space setting for the volume because the pool has plenty of free space (other volumes are using less than 50%).
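To make the mismatch concrete, here is a minimal sketch of the arithmetic. The function names and the rounded figures (a 1000 GB volume with 495 GB written and a 50% cap) are assumptions for illustration, not the exact numbers from the system above and not anything EqualLogic or vSphere exposes.

```python
def apparent_free_gb(volume_size_gb: float, used_gb: float) -> float:
    """Free space as the host sees it: advertised size minus space used."""
    return volume_size_gb - used_gb


def effective_free_gb(volume_size_gb: float, used_gb: float,
                      max_in_use_pct: float) -> float:
    """Space that can still be written before the in-use cap is hit."""
    cap_gb = volume_size_gb * max_in_use_pct / 100.0
    return max(0.0, cap_gb - used_gb)


# Illustrative figures only (hypothetical, rounded).
size_gb, used_gb, cap_pct = 1000.0, 495.0, 50.0
print("host sees free GB:", apparent_free_gb(size_gb, used_gb))
print("really writable GB:", effective_free_gb(size_gb, used_gb, cap_pct))
```

The gap between the two numbers is exactly the space the VMware administrator believes is there but the cap will never allow.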

To me, the obvious thing to do here is to set "Max In-Use" to 100% of the volume.  The problem is that a previous storage administrator decided to over-commit the pool.  So now I get warnings that Max In-Use exceeds pool capacity (no duh).
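The warning follows from simple addition: once every volume is allowed to fill completely, the sum of the caps is larger than the pool. A rough sketch of that check, using a hypothetical `Volume` record and made-up sizes (this is not how the array itself computes the warning):

```python
from typing import Iterable, NamedTuple


class Volume(NamedTuple):
    name: str              # hypothetical volume name
    size_gb: float         # advertised (thin-provisioned) size
    max_in_use_pct: float  # Max In-Use setting, as a percentage


def total_cap_gb(volumes: Iterable[Volume]) -> float:
    """Sum of the space each volume is allowed to consume."""
    return sum(v.size_gb * v.max_in_use_pct / 100.0 for v in volumes)


def caps_exceed_pool(volumes: Iterable[Volume], pool_capacity_gb: float) -> bool:
    """True when the combined caps could not all be honored by the pool."""
    return total_cap_gb(volumes) > pool_capacity_gb


# Made-up example: three 1000 GB volumes in a 2000 GB pool.
at_50 = [Volume(f"vol{i}", 1000.0, 50.0) for i in range(3)]
at_100 = [Volume(f"vol{i}", 1000.0, 100.0) for i in range(3)]
print(caps_exceed_pool(at_50, 2000.0))   # caps fit inside the pool
print(caps_exceed_pool(at_100, 2000.0))  # caps exceed the pool
```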

My understanding of designing thin provisioning does not include overcommitting the volumes.  In fact, all it should be is the sharing of all the volume white spaces.  By sharing this way, the 495GB in the volume I spoke of earlier actually belongs to ALL the volumes should they need it.  Having a "Max In-Use" setting that kicks a LUN volume offline seems backwards, in my opinion, and counteracts any thin provisioning.

So I am ready to hear the opinion on this....here is how I think it should have been done.

Set the volume sizes to what you think you actually need.  Thin provision the volume, set Max In-Use to 100% (heck, that setting should be able to go to OVER 100% to be true thin provisioning, right?).  The goal of SAN is to never crowd a LUN....let the volumes auto-grow if they have to.

  • So how does this work with EqualLogic?  Please tell me our system was not configured right. :)
  • And is there an auto-grow feature? 
  • And can I set the Max In-Use to 120%?  I tried and it would only go to 95% I think (or 90?).

 

Verified Answer
  • Over-provisioning storage, in my personal opinion, isn't a good idea.  It's only useful when growing a volume isn't supported and you know you're getting additional storage 'soon'.   That limitation doesn't apply to ESX v4 and above.

    My thought would be to create a new, smaller volume, move VMs there, and start removing the large over-provisioned volumes.   Then you can be sure that you won't run out of space and have volumes go offline.  Otherwise, eventually, you will run out of space.

    You are correct, there's no way to make the ESX admins realize they don't have the space they think they have.

    In the future EQL will support SCSI UNMAP, so freed pages will go back to the free space pool as files are deleted.

    That will make thin provisioning a more useful feature (i.e. more free space for snapshot reserve or replication).

    Regards,

    -don

All Replies
  • A couple other comments:

    Over-committing file-level volumes is one thing, but over-committing block-level volumes is another (in storage design-land).  Yeah, it's cool that I can mount five 9TB LUNs when I only have a 10TB pool.  But so what?  That is no help, right?  Why should I expose all that to the server admins?  

    What would be more cool, is if I can auto-grow volumes containing LUNs, and grow the LUN manually when necessary.  Why auto-grow a volume?  Because of snapshots.  Snapshots could potentially crowd LUNs and kick them offline too.

    From a management perspective, I can't get my arms around managing 5 x 9TB LUNs when there is no way to make sure the server admins realize they do not actually HAVE 9TB.  Right?
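For a sense of scale, the five 9TB LUNs on a 10TB pool mentioned above work out as follows. This is just the arithmetic; the helper names are made up for illustration.

```python
def overcommit_ratio(volume_sizes_tb, pool_capacity_tb):
    """Advertised capacity divided by what the pool physically holds."""
    return sum(volume_sizes_tb) / pool_capacity_tb


def unbacked_tb(volume_sizes_tb, pool_capacity_tb):
    """Capacity promised to hosts that has no physical space behind it."""
    return max(0, sum(volume_sizes_tb) - pool_capacity_tb)


luns_tb = [9, 9, 9, 9, 9]  # five 9TB LUNs
pool_tb = 10               # 10TB pool

print("overcommit ratio:", overcommit_ratio(luns_tb, pool_tb))
print("TB promised but not backed:", unbacked_tb(luns_tb, pool_tb))
```

The server admins collectively see 45TB, of which 35TB does not exist, which is the management problem described here.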

  • Good, I don't feel so crazy now.  The current plan is exactly that....attempting to migrate toward a "right sized" environment, and then using thin provisioning to expose white space should we grow faster than we can buy, or should our right sizing on a couple volumes be off a bit.

    I think Dell has a decent solution here.  It would be nice if this product line migrated toward multi-protocol SAN/NAS and supported file-level access too (CIFS/NFS).  I would also love to see some deduplication.

  • Regarding VMware administrators, I think if we set Max In-Use to 100% (or near it) then we are pretty safe....I think the only time Max In-Use should be less than 100% is in an over-committed situation like mine (although ironically we have 12-15% free on the pool - there is one really huge volume created that is using only 15% of its space).

    If I am understanding correctly though, I can add another member to the pool and span the volume out?  Because that would make sense, and Max In-Use would be perfect for this.  Basically, protect a highly utilized SAN from having a single volume administrator taking the entire pool offline until a new member is added.  I would rather this person disconnect a single volume than an entire pool of them. :)

  • Yes, you can add another member to that pool and the data will be striped between them automatically.

    re: 100%.  Yes, assuming you are not overprovisioned, setting the max value to 100% is fine.  Which is the default.  It's the "in use" warning threshold that usually needs adjusting.   Just to stop annoying alerts.  ;-)

    You also need to leave about 5-10% of the member unallocated for best performance, especially when using snapshots and replication.   That's in the FW release notes.

    I would still suggest creating a smaller volume, moving the VMs over from that large one, then deleting the huge one.

    With ESX, more, smaller volumes outperform few (or one) huge one.

    re: CIFS.  I doubt that will happen, since a SAN array should focus on serving data.  There's already a fault tolerant, scalable NAS head available for the EQL arrays.  It's called the 7500.  It's fully integrated into the EQL GUI.  

    re: deduplication.  It is already on the roadmap.  Your sales rep/reseller can get you more specific info on future features.

    -don
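As a rough illustration of the 5-10% guidance in the reply above, the sketch below computes how much of a member could be allocated while keeping that share unallocated. The 10000 GB member size and the function name are hypothetical; the firmware release notes remain the authority on the actual recommendation.

```python
def max_allocatable_gb(member_capacity_gb: float, keep_free_pct: float) -> float:
    """Capacity left to allocate after holding back keep_free_pct of the member."""
    held_back_gb = member_capacity_gb * keep_free_pct / 100.0
    return member_capacity_gb - held_back_gb


member_gb = 10000.0  # hypothetical member size
for pct in (5.0, 10.0):
    print(f"keep {pct}% free -> allocate up to {max_allocatable_gb(member_gb, pct)} GB")
```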

  • Very nice, cool.

    I won't debate with you the benefits of multi-protocol storage.  I am sure you get it all the time from NetApp fan-boys (like myself!) :)

    Thanks for the info on the NAS head!!!!  I will check that out for sure!