Thin provision vSphere datastore at SAN level?

langenoir · October 2015

Is there a good reason to thin provision a vSphere datastore at the SAN level? I was always under the impression that it's a good idea to thick provision the SAN volumes and then decide whether you want thin or thick VMs within that datastore. Partly that's for better performance, but the bigger reason is that promising too much space and not keeping an eye on it could get you into trouble.

kriscamaro68 · October 2015

I have always thick provisioned the datastores for both VMware and Hyper-V on our SAN. As for filling it up and not keeping an eye on it, I use dedup and have notifications turned on to email me if a volume gets past a certain point. With dedup things really shouldn't get too bad. Also, I never put any data on the VMs themselves; the data always lives on an iSCSI volume attached to the VM from the SAN.
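For anyone who wants to see what that kind of "past a certain point" notification boils down to, here is a minimal sketch. The 80% threshold, the datastore names and the sizes are all made up for illustration; a real setup would pull the numbers from the array or vCenter rather than a hard-coded dictionary.

```python
def datastore_alerts(datastores, threshold_pct=80.0):
    """Return (name, used %) for every datastore at or above threshold_pct.

    datastores maps a name to a (used_gb, capacity_gb) tuple.
    """
    alerts = []
    for name, (used_gb, capacity_gb) in datastores.items():
        used_pct = 100.0 * used_gb / capacity_gb
        if used_pct >= threshold_pct:
            alerts.append((name, round(used_pct, 1)))
    return alerts


# Hypothetical sample data: name -> (used GB, capacity GB)
sample = {"DS-SQL": (900, 1000), "DS-OS": (300, 1000)}

for name, pct in datastore_alerts(sample):
    print(f"WARNING: {name} is {pct}% full")
```

The email part is left out on purpose; the point is only that the check itself is a simple percentage comparison.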

langenoir · October 2015

I have too, but I just noticed the guys at my new job using thin provisioning on the SAN volumes and thin provisioning on the VMs in the datastores. Just trying to see if I'm crazy or if this is an issue I should say something about.

Deathmage · October 2015

It truly depends on whether your SAN supports thick vs. thin provisioning. Some SANs just use one massive LUN, or see all the drives as one drive, in which case you can make 5 different LUNs to access in VMware but as far as the array is concerned it's still one LUN. On others you can see benefits from splitting the LUNs up and controlling the IO.

Like on an EqualLogic, my thought too was to make multiple LUNs and give one LUN higher priority over the others with shares, but shares have no effect on an EqualLogic since it just sees all the LUNs as one massive LUN; the smaller LUNs are essentially just folders in a single partition. You can't do crap to control IO since it still flows over the same IQN.....

Example: most people split datastores on a SQL server so that, say, SQL sits on one LUN and logs sit on another LUN; this way you have LUNs on two different parts of a SAN. An EqualLogic still sees it as one massive LUN, so in that case it's truly pointless to split it up. They are on different partitions, but on the SAN's back end they are still seen as one massive LUN.

Don't get me wrong, the performance of an EqualLogic is awesome, but it just can't split IO traffic very well. Hopefully other SANs have more control over IO.

Best bet: call up the storage vendor and see what they specify. SNIA tried to standardize things but vendors don't always comply. An EqualLogic array may work differently than an HP array or an EMC array. The same basic ideas of array management still apply, but vendors have different features, so it can make things interesting....

As for running out of space, I know some storage guys like using thin provisioning, but I'd rather not worry about the performance hit that thin brings. I personally go thick when possible, but only when I know the LUNs are pretty much static, and just manage storage accordingly. For example, all the O/S partitions are on static 'thick' LUNs with a 1.5 growth factor for updates. All applications are on separate partitions so the Windows O/S stays as pristine as possible; I've found it truly helps a server's performance to keep C: completely clear of 'bloatware' software in Program Files.

I use a factor of 1.5 on storage: take the current size of the VMs in the LUN, multiply by 1.5, and that's the space I give the LUN, say for a thick LUN holding O/S partitions. For LUNs that are dynamic, like a file server, SQL logs/databases or a terminal server, I make the LUN thin on the SAN and then make the datastore thick in VMware; this way when the VM does need space later on I can easily add it from the array.
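To make the 1.5 rule concrete, here is a quick sketch of the arithmetic. The function name and the VM sizes are hypothetical, and rounding up to a whole GB is my own assumption rather than part of the rule.

```python
import math


def lun_size_gb(vm_sizes_gb, growth_factor=1.5):
    """Size a LUN as the current total of its VMs times a growth factor.

    The result is rounded up to a whole GB (an assumption for convenience).
    """
    return math.ceil(sum(vm_sizes_gb) * growth_factor)


# Hypothetical example: three O/S VMs of 40, 60 and 100 GB on one LUN.
print(lun_size_gb([40, 60, 100]))  # 200 GB of VMs -> 300 GB LUN
```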

But whether thin or thick on the SAN, I don't over provision. I've seen zero benefit on the performance side to over provisioning, and let's face it, us IT people forget. I'd rather not run into an 'OOO ****' moment some Monday morning because the SQL logs ran out of space... If a partition does run out of space, it's because it hit a set limit that can easily be increased by a factor of 1.5 from free space on the array. Don't ask about the 1.5 factor; I just use it for my own reasons after doing some math.

OctalDump · October 2015

I think there are too many variables to give a good answer to this. There are performance, availability, technical and platform-specific considerations, but also political ones. If you control both the SAN and the virtualisation provisioning, it's different to where storage is managed by a different team. Different again where end users are being billed.

Probably the preferable situation is to have datastores thick provisioned on the SAN, and then thin/thick provision the VMs as needed. I don't have experience at the really pointy end of this, but I imagine products are available that can better integrate provisioning with the SAN from vSphere.

dave330i · October 2015

If you want to be storage efficient, then thin provisioning is useful (especially on those expensive SSDs). Usually, it's better to thin at the storage level because then the storage admin has a holistic view from a single pane of glass.

Lexluethar · October 2015

We use thick lazy zeroed on the vmware side and thin on the SAN side (we use Dell Compellent so everything is technically thin until written to).

One main reason for thick lazy on the vmware side is better tracking and management of storage usage. I've heard horror stories of people using thin, not keeping an eye on it, and a datastore getting filled up.

To recoup space simply use the unmap command in vmware to reclaim space as VM's are removed over time.

Deathmage · October 2015

Lexluethar wrote: »

To recoup space simply use the unmap command in vmware to reclaim space as VM's are removed over time.

oooo kudos, always wondered how to do that.... googling this now

joelsfood · October 2015

Deathmage has already hit the main rule, which is to do what your storage vendor recommends. E.g., with NetApp (particular aggregates, etc.) I thick provision on disk, don't oversubscribe, and let dedupe give me extra space on top of that. With Nimble (shared spindles, etc.) I thin provision and oversubscribe and let compression give me more space. Both are as recommended by the vendor in question, and of course both assume proper design, installation of the applicable vendor plugins, etc.

VAAI_UNMAP is definitely useful for SAN storage that thin provisions/compresses zeroes. Nimble has a quick writeup on how to do it if your vendor's software plugins don't do it automatically.

https://connect.nimblestorage.com/community/app-integration/blog/2014/03/28/space-reclamation-in-vsphere-55-with-nimble-storage

ninjaturtle · October 2015

What about with EMC XtremIO? I just got the array powered up yesterday, and I saw this post so now I'm curious. I've got a lot of reading and testing to do.

We are looking for a serious performance boost for our core application. We were using EqualLogic SANs in test, but need more juice. You guys got me thinking and now looking further into this inquiry.

I always come on the forums, and leave with ideas and tasks. I really like that!

langenoir · October 2015

Lexluethar wrote: »

We use thick lazy zeroed on the vmware side and thin on the SAN side (we use Dell Compellent so everything is technically thin until written to).

One main reason for thick lazy on the vmware side is better tracking and management of storage usage. I've heard horror stories of people using thin, not keeping an eye on it, and a datastore getting filled up.

To recoup space simply use the unmap command in vmware to reclaim space as VM's are removed over time.

That's actually a good idea, I've never thought of that. We've been having issues with datastores filling up because there aren't enough eyes on them. Still, the real issue is that we're running thin on the SAN and thin on the VMs, so it just smells like trouble.
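A rough way to put a number on the thin-on-thin worry is to work out the overcommit ratio at each layer. This is only a sketch with made-up sizes, and it assumes every datastore is overcommitted at the same ratio, which is what lets the two layers simply multiply.

```python
def overcommit_ratio(provisioned_gb, physical_gb):
    """Ratio of space promised to space actually backing it."""
    return sum(provisioned_gb) / physical_gb


# VM layer (hypothetical): thin VMDKs promised on a 1000 GB datastore.
vm_ratio = overcommit_ratio([500, 500, 1000], 1000)

# SAN layer (hypothetical): thin volumes promised on a 2000 GB physical pool.
san_ratio = overcommit_ratio([1000, 1000, 2000], 2000)

# With a uniform ratio per datastore, the promises stack multiplicatively.
combined = vm_ratio * san_ratio
print(f"VM layer {vm_ratio}x, SAN layer {san_ratio}x, combined {combined}x")
```

In this example each layer looks like a modest 2x on its own, but the VMs have been promised four times the physical disk.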

It sounds like my next step is to go to the vendor.

Lexluethar · October 2015

Thin on thin isn't trouble though. On the vmware side it's looked at as thick; the vmdk just isn't zeroed until writes are committed. In this day and age there is zero excuse IMO for datastores filling up and causing LUN locks. Go thick on thin unless your SAN doesn't support thin. We use Dell Compellent and this works wonders.

Thanks Deathmage - we've been keeping an eye on unmap for a while now and in 5.5 it's finally stable. I've tested it dozens of times and there is no ill impact on the VMs. The host will get busy and you will understandably see a lot of reads/writes at the datastore level, but nothing crazy. I'd suggest simply putting the host in maintenance mode to ensure no CPU contention and then running the unmap tool. I'm sure Google will give you plenty of sources, but here is the command:

esxcli storage vmfs unmap --volume-label=volume_label|--volume-uuid=volume_uuid --reclaim-unit=number

where:
volume_label is the name of the volume
volume_uuid is the ID, if you don't use the label
number is the number of blocks to reclaim per pass; the default is 200, which is recommended for VMFS5, so I never put that switch in
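If you end up scripting this, one small piece is assembling the command line safely. The sketch below only builds the argument list from the flags shown above; it does not run anything, and the function name and the example label are my own. How you actually get it onto the host (SSH, PowerCLI, etc.) is left out.

```python
def build_unmap_command(volume_label=None, volume_uuid=None, reclaim_unit=None):
    """Build the esxcli unmap argument list for one VMFS volume.

    Exactly one of volume_label / volume_uuid must be given. reclaim_unit is
    optional; leaving it out means esxcli uses its default.
    """
    if (volume_label is None) == (volume_uuid is None):
        raise ValueError("give exactly one of volume_label or volume_uuid")
    cmd = ["esxcli", "storage", "vmfs", "unmap"]
    if volume_label is not None:
        cmd.append(f"--volume-label={volume_label}")
    else:
        cmd.append(f"--volume-uuid={volume_uuid}")
    if reclaim_unit is not None:
        cmd.append(f"--reclaim-unit={reclaim_unit}")
    return cmd


# Hypothetical volume name, default reclaim unit.
print(" ".join(build_unmap_command(volume_label="Tier3-VLAN2")))
```

Returning a list instead of one string means a label containing spaces won't get split if the list is later handed to something like subprocess.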

Currently working on a PowerCLI / PowerShell script to automate the process every 6 months or so. If you have generic volumes where servers are storage vMotioned off and/or removed a lot, it's worth running maybe every quarter. We went from a generic volume system (e.g. Tier3-VLAN2) to an application-specific volume system. This helps with troubleshooting and reduces the movement of vmdks around the array.

The argument can be made that you can storage vMotion these servers to new volumes and then delete the old volume. While this is true, and how most people still do it, the unmap tool saves a huge amount of time and administrative effort.

We have Nimble as well and are still testing that out - honestly it shouldn't matter for anything provisioned in vmware, because even with thick eager zeroed VMs you are still going to have unclaimed space over time with the movement of servers (especially if you are using Storage DRS).

Deathmage · October 2015

ninjaturtle wrote: »

What about with EMC XtremIO? I just got the array powered up yesterday, and I saw this post so now I'm curious. I've got a lot of reading and testing to do.

We are looking for a serious performance boost for our core application. We were using EqualLogic SANs in test, but need more juice. You guys got me thinking and now looking further into this inquiry.

I always come on the forums, and leave with ideas and tasks. I really like that!

Are your servers running 2008 R2? dave330i posted a back-end SAN hotfix. We used it in our cluster and saw huge changes on our EqualLogic.

Lexluethar · October 2015

Ninja, we looked at XtremIO but it was Xtremely expensive (at least from what I remember). We used to have EQL for our production SAN, then moved to an EMC Celerra, and now we are on a Dell Compellent. We still use the EQL for unstructured file servers and 'cheap and deep' storage that does not require a lot of high I/O operations.

What type of I/O are you looking for in your application? I personally have steered away from EMC because I haven't had the best of experiences (albeit we were on an old array). We've spoken to EMC a few times, but every time they come in their price points are stupid high for what they are offering. A lot of times their integration is funky too, because they purchase technology instead of cultivating it from within. I once got a demo on how an EMC array (not sure if it was the XtremIO one or not) replicates from site A to site B and, I kid you not, it took about 7 hops because their software couldn't speak directly to the other pieces of the replication puzzle.

Depending on I/O requirements, I would suggest looking at Nimble, a Dell SC4020 or some other type of hybrid system. We use Nimble for our high I/O applications (things that require stupid amounts of IOPS) and have been very happy with it; the arrays come at a reasonable cost and are very expandable.
