Any Reason For Separate Virtual Disks?

cnfuzzd Posts: 208 Member
Hi Everyone!

I was using the Exchange troubleshooting tool on a virtualized Exchange server, and it notified me that my ratio of reads to writes exceeded the recommended amount. The suggested solution was to move the logs off to a separate disk. However, this is a virtual machine, and its storage is hosted on a fibre-attached SAN. Would there be any point in creating a second disk?

What about in other scenarios, such as database apps virtualized on VMware Server? Any other thoughts?


Thanks!

John
__________________________________________

Work In Progress: BSCI, Sharepoint

Comments

  • blargoe Self-Described Huguenot NC, USA Posts: 4,170 Member ■■■■■■■■■□
    If you want to physically separate the logs, create a new LUN on separate spindles and put the virtual disk for the logs on that LUN.
    IT guy since 12/00

    Recent: 2/2019 - Updated VCP to 6.5 (just a few days before VMware discontinued the re-cert policy...)
    Working on: RHCSA 7, learning Ansible
    Future: RHCE? VCAP6.5-DCD?
  • astorrs Posts: 3,139 Member ■■■■■■□□□□
    It totally depends on the storage being used on the back end. Each vendor will have different recommendations/best practices. What makes sense for one vendor will be useless (or worse) on another.

    What kind of array are we talking about? And what are the disk LUNs configured like?

    Don't forget the Exchange Troubleshooting Tool is not virtualization aware and may not adequately account for things like write caching, etc. happening on the SAN.
  • HeroPsycho Posts: 1,940 Inactive Imported Users
    astorrs wrote: »
    It totally depends on the storage being used on the back end. Each vendor will have different recommendations/best practices. What makes sense for one vendor will be useless (or worse) on another.

    What kind of array are we talking about? And what are the disk LUNs configured like?

    Don't forget the Exchange Troubleshooting Tool is not virtualization aware and may not adequately account for things like write caching, etc. happening on the SAN.

    +1. On NetApp, for example, doing that won't really do any good. LeftHand doesn't even have the notion of separating volumes onto specific physical spindles.
    Good luck to all!
  • cnfuzzd Posts: 208 Member
    I apologize for the delays in responding. I got sick and seemed to stay that way forever...

    The customer has an IBM N3600 disk array using Fibre Channel. We have carved it into two 2TB LUNs for virtual machine use, which are accessed by 3 ESX servers. I am guessing that this is a case in which they shouldn't be separated?

    Oh, and the individual aggregates are set up as RAID 6 arrays.


    John
  • astorrs Posts: 3,139 Member ■■■■■■□□□□
    Hi cnfuzzd, glad you're feeling better.

    Given that you're using NetApp (the IBM N3600 is the NetApp FAS2050 with an IBM logo on the front) splitting the disks is not going to have any effect on I/Os for the logs. Your problem is likely elsewhere in the storage configuration.

    Here is the current setup as I understand it:
    • You have 3 ESX hosts connecting over fibre channel to a dual-headed FAS2050 (or is it a single controller model? I'm also assuming this is a single shelf)
    • The array is partitioned into a single aggregate using RAID-DP (or do you have a root aggregate too?)
    • You have created 1 or 2 FlexVolumes with either 1 or 2 LUNs (2 LUNs total)
    • Each LUN is 2TB in size
    Is that correct?

    Are you actually experiencing performance problems on the Exchange servers (or other VMs)? If not why were you running the Exchange troubleshooting tool?

    Here would be my preferred configuration for a 4TB (usable) dual-headed FAS2050 for use with VMware:
    • Create a single RAID-DP aggregate volume containing all the disks in the shelf
    • Create 8 FlexVol's
    • Create a 500GB LUN within each FlexVol
    • Each VMFS datastore = 1 LUN = 1 FlexVol
    • Locate the Exchange database and log disks on separate LUNs
    The reasons behind this boil down to I/Os (some others don't apply right now, like FlexClones and A-SIS). ESX maintains one SCSI queue per LUN, so by dividing up the LUNs we prevent a particular virtual machine that has filled its SCSI queue with pending requests from starving all the other virtual machines (it would only affect other virtual disks stored on the same LUN).
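    If you want to inspect or tune the per-LUN queue depth itself, it can be done from the ESX 3.x service console. A sketch of the sort of commands involved; the module name below assumes a QLogic HBA driver (Emulex uses a different module, e.g. lpfc_740), so check which driver your hosts actually load first:

```shell
# List loaded modules to find your HBA driver's name:
esxcfg-module -q

# Show the options currently set on the (assumed QLogic) driver module:
esxcfg-module -g qla2300_707

# Raise the per-LUN queue depth (example value; takes effect after reboot):
esxcfg-module -s "ql2xmaxqdepth=64" qla2300_707
esxcfg-boot -b   # rebuild the boot configuration so the option persists
```

    Be careful raising queue depths: the array-side queue limits still apply, and a deeper host queue just moves the contention.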

    Check whether the partitions of the I/O-heavy disks are aligned properly (ideally all the virtual disks should be aligned). For example: on your Exchange server open msinfo32.exe, navigate to "System Summary > Components > Storage > Disks > Partition Starting Offset", and verify whether it equals 32,256.
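    If you'd rather not click through msinfo32 on every guest, the same starting-offset figure can be pulled from a Windows command prompt inside the VM (this just reads the value; it doesn't change anything):

```shell
REM Lists every partition with its starting offset in bytes
wmic partition get Name, Index, StartingOffset
```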

    If you're using clustered heads, SSH to the filer and run the following command:
    fcp show cfmode
    Let me know if you're using Standby, Single System Image, or something else.

    Are you running ESX or ESXi as your hypervisor?

    What version and patch/update level of ESX are you using (3.0, 3.5u1, 3.5u3, etc)?

    What version of ONTAP are you running on the filer? 7.2 or 7.3?

    On the ESX hosts can you verify what the path policy is set to on the HBAs? Is it MRU or Fixed (or Round Robin in ESX 3.5)?
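    For reference, one way to check this on an ESX 3.x host is from the service console (read-only; it lists each LUN's paths and the policy in effect):

```shell
# Shows all storage paths and the path policy (MRU, Fixed, or RR on 3.5)
esxcfg-mpath -l
```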

    Thanks,

    Andrew
  • HeroPsychoHeroPsycho Posts: 1,940Inactive Imported Users
    astorrs wrote: »
    The reasons behind this boil down to I/Os (some others don't apply right now, like FlexClones and A-SIS). ESX maintains one SCSI queue per LUN, so by dividing up the LUNs we prevent a particular virtual machine that has filled its SCSI queue with pending requests from starving all the other virtual machines (it would only affect other virtual disks stored on the same LUN).

    Which is precisely why NFS is superior to FC or iSCSI with a large number of VMs.
  • astorrs Posts: 3,139 Member ■■■■■■□□□□
    Yes and no. Not having to deal with SCSI queues is definitely a pro, but there are cons to going that route.

    Personally I have not seen any issues with any of the 3 protocols as long as the storage is designed correctly and the bandwidth to the storage meets the performance requirements.

    FCP, iSCSI, or NFS. Each has its own set of pros & cons.
  • cnfuzzd Posts: 208 Member
    Hi astorrs,

    First, thank you very much for your reply. I am actually working on a plan to reconfigure this setup (the original consultant who sold and installed it didn't do a great job), and I feel like my learning has just been accelerated.

    Your estimation of our setup is correct, and we are having some performance issues on the Exchange server, as well as another server which hosts Numara's Track It software. These issues may be entirely unrelated to the storage setup, but I wanted to eliminate the possibility, and then ExTRA produced the information I mentioned. As for your questions on the specifics of our setup:

    I verified the partitions are properly aligned. It appears that the filer is not running in clustered mode, as running "fcp show cfmode" results in "fcp show cfmode: System is not clustered. Unable to show cfmode." We are using regular ESX, version 3.5. ONTAP is at version 7.2. The HBAs are set to Fixed.

    This is broadening the scope of the thread, but if you have anything else to add, please feel free. This company has massively over-purchased server resources (they have three dual quad-core blades with 16GB of RAM sitting unused), under-purchased storage, and recently lost their entire IT staff. Even when they had a staff, this setup was beyond their capabilities. I will probably reconfigure everything from the ground up once I learn enough to know how to find the best practices, and this stuff is great.

    As an example/sidenote: at one point we rebooted the BladeCenter (an IBM BladeCenter H, by the way) and the SAN. When everything powered up, we could not get the blades to boot (they boot from the SAN). I then spent 8 awesome hours playing with the BIOS until I eventually did something to the HBAs that got them working again, and the blades would boot into ESX. Then ESX would not load the virtual machines. The solution was to disable the option in ESX that was preventing it from loading LUNs it identified as snapshot LUNs. This worked, and I finally made it home at 7am. Please feel free to speculate as much as you want on this.
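    For anyone who hits the same wall: the behaviour John describes is governed by an ESX 3.x advanced setting. A sketch of how it is queried and changed from the service console (change it with care; mounting a LUN the host wrongly treats as writable elsewhere can corrupt data):

```shell
# Query whether ESX refuses to mount VMFS volumes it detects as snapshots
esxcfg-advcfg -g /LVM/DisallowSnapshotLun

# Set it to 0 so the host will mount LUNs it flagged as snapshots
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
```

    The related /LVM/EnableResignature option is the alternative approach (it rewrites the VMFS signature instead), so read the SAN configuration guide before touching either.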

    Again, thank you guys so much. This stuff is going to be very, very useful. Sometimes just knowing the right vocabulary is enough to unlock the best resources.

    Thanks!

    John
  • tiersten Posts: 4,505 Member
    cnfuzzd wrote: »
    Your estimation of our setup is correct, and we are having some performance issues on the exchange server, as well as another server which hosts Numara's Track It software.
    How old is your version of TrackIt? We had an old version and the performance was terrible due to how it was implemented. There was an MDB kept on a share, and the client would actually go open that directly.
  • blargoe Self-Described Huguenot NC, USA Posts: 4,170 Member ■■■■■■■■■□
    We used to run Track-it, our performance was also sucky