SDRS: would 8ms and 65% space be a tad aggressive?

Deathmage · December 2014

Hello,

As the topic states I'm messing around with SDRS settings and I'd like to keep my datastores evenly balanced since to me a loaded datastore just has to spin more to seek the data on a NAS/SAN so if I keep the SDRS moving the data across 3 datastores then in turn it should make the SDRS datastore cluster perform better and more evenly.

But would applying a setting of 65% space allocated and 8ms latency be a bit too aggressive? (right now the latency on the array is less than 4 ms) - right now it takes about 3 minutes to move a power-on VM and 1 minute to move a powered-off VM (NAS has Velco-Raptor 10k drives in it); In this instance I'm wondering if the Jumbo FRames (set at a MTU of 9000) is even playing a part since the data is being transferred between datastores which reside on the same NAS so these transfer could all be happening logically on the same box and not even crossing the iSCSI fabric. - Never really thought this much into it before, lol!!!

Thoughts?

I'd like to hearing a discussion on the benefits and cons of messing around with those two setting in SDRS.

Thanks!!!

Essendon · December 2014

Well, it depends!

Too aggressive and you have too many SvMotions happening at unintended times (of high workloads), too limp and you have overcommitted datastores. My recommendation would be to talk to your storage vendor/SAN guys to determine these values. Usually, the default values of 85% and 5ms are okay, but again go with the recommendations. Also prior to 5.5, SRM wouldnt work with SDRS, so gotta take that into consideration too. SDRS is cool and all, but make sure you need it before you enable it.

And you cant forget auto-tiering. No one explains auto-tiering and SRDS inter-operability better than Frank Denneman.

Deathmage · December 2014

That's good to know, I'll remember to ask that to our storage vendors.

in the in-term, my logic is if you keep the datastores balanced, like I just set mine to 50%, then when you read/write to the datastore the access times will be more even and faster than if the datastores were loaded to the max.

I can see in a production world if you set SDRS to aggressive it could move them for even the slightest benefit in performance and that could make the storage fabric bog down if there is no uplink teaming going on. In the test lab I have uplink teaming on the NAS and on the HP Procurve switch into each vStorage nic's, so I'm sure that's helping with this vStorage vMotion migration at-the-moment.

I'm just curious if the same logical of keeping the datastores with free space still equates to performance gains even though the datastores in my environment and/or in a production environment normally always reside on a single SAN (however not out of the realm of possibilities, I've never seen a multi-SAN storage fabric before) so can anyone shed some light on that discussion?

dave330i · December 2014

If you have a decent storage, then SDRS should be set to manual and have storage manage congestion. Two reasons for this:

1. Storage is better at managing itself vs. vCenter trying to manage it.

2. Storage has a holistic view while vCenter is only aware of datastore provisioned to it.

Essendon · December 2014

There's more to SDRS than meets the eye. First, it only checks for imbalance every 8 hours but does look at past historical data when making recommendations and/or moving workloads around if fully automated. So it may not move vmdk's around when you expect it to. Second, you gotta ensure you are replicating all datastores that are in the SDRS realm, otherwise SRM wont be happy and your recovery plans will be out. Third, faster disks have lower latency, so ensure you select the right latency threshold value depending on the disks backing the datastores. Fourth, it's generally recommended to enable SDRS but dont monitor for latency.

You are correct about faster access times with lightly loaded datastores, but there's more to this than just datastores. Access times depend greatly on SAN front-end port load distribution, the disks backing the datastores and their RAID levels, the queue depths at the LUN level, the adapters level and all the way down to the queue on the disks themselves.

So you are right about free space and performance gains, but think about it in a prod environment. There's a limit of 256 datastores/host so you will end up provisioning more datastores than you have to. Additionally and more importantly though, you need to watch out for the adapter queue. Say you have QLogic HBA's with a queue of 4096. QLogic HBA's have a queue depth of 64, that limits you to 64 LUNs. Get the picture?

And the default values are 80% and 15ms, not what I wrote first. Sorry.

Essendon · December 2014

Yep, Dave's right. Just create a datastore cluster so Bob the VM builder doesnt have to worry about placing the VM on a particular datastore, set it to manual and you approve any recommended SvMotions. Let the array manage itself.

Deathmage · December 2014

Thanks for the feedback, I learned a few things.

joelsfood · December 2014

I definitely agree with what has been said here.

As an addition, dont' forget to check your VCOpsManager reports weekly/monthly (depending on what a glutton for punishment you are), to check for VMs and datastores that show latency issues, and use that information to help do any tuning required.

Essendon · December 2014

Essendon wrote: »

You are right about free space and performance gains, but think about it in a prod environment. There's a limit of 256 datastores/host so you will end up provisioning more datastores than you have to. Additionally and more importantly though, you need to watch out for the adapter queue. Say you have QLogic HBA's with a queue of 4096. QLogic HBA's have a queue depth of 64, that limits you to 64 LUNs. Get the picture?

Wanted to clarify here after I re-read this. You are not actually limited to 64 LUNs, ESXi wont prevent you from going past 64. But a nasty situation will arise with IO retries being issued to workloads because of queue-full conditions (you want to be on a holiday when this happens!).

jibbajabba · December 2014

dave330i wrote: »

If you have a decent storage, then SDRS should be set to manual and have storage manage congestion.

You kinda need a decent storage also when using Automatic SDRS, especially when using Thin-Provisioned LUNs.

Depending on the amount of migrations, you may end up with a lot of wasted space and I am not sure if any SAN automaticially reclaim space from those LUNs.

A customer was using a Hitachi HUS-VM with quite aggressive SDRS settings. The HUS didn't reclaim automaticially and they insisted on those SDRS settings for no reason, none I agreed with anyway.

As a result you end up with administrative overhead to keep an eye on the pools and schedule a maintenance for reclaim jobs (which can trash your performance while running).

joelsfood · December 2014

Nimble (and any others that support TRIM/unmap/VAAI) will reclaim the space, but VAAI will have to be enabled. Just came across that setting on a clients system, enabled, reclaimed about 3.5 tb. Not bad for 2x16tb SANs

jibbajabba · January 2015

Whilst the HUS supports UNMAP / VAAI, it didn't do it automaticially. I suppose, given that the performance goes downhill during reclaim process, you don't necessarilly want it to go automaticially. Ah decisions ...

joelsfood · January 2015

Yeah, I was pleasantly surprised with how well Nimble kept the unmap to a nonimpacting level.

Deathmage · January 2015

See.....this is why I love asking question on here, so many smart people with a wealth of experience over me. I always know I'll learn something.

SDRS: would 8ms and 65% space be a tad aggressive?

Comments