Storage bottlenecks and Performance reviews

staggerlee Member Posts: 90
Hi all,
/* Note: this is a bit of a long post, sorry about that. I'm just getting slightly confused on the subject and how best to handle it! */

This is part SQL and part SAN related. I'm mainly interested in the SAN performance review side, hence sticking it here.
I've recently been part of a team building a new SQL cluster. Now that it's built, management has asked for a performance review of what the system can handle. Basically, we bought the servers a while ago (the SAN even longer ago), and since then we have had some changes:

A: A new website will be connecting to the SQL server.
B: A business app we just bought has been having a lot of trouble, and people want to find out where the problems are coming from.

So I'm pretty new to the whole world of SANs and performance-related info. I'm trying to get my head around it, and I keep running into questions where I'm not sure where to look for the right answers.
Our current setup for the SAN as I know it (coming with n00b eyes, I'm afraid):

2 SQL Servers, clustered. Each server has 2 HBAs.
2 Windows file servers, clustered. Not sure if each has 2 HBAs; I would expect so.
A VMware server that has Exchange VMs on it. We currently have 5-10 users on it, but will be moving the rest of the users onto it, totalling around 1000 users.

Each HBA connects to a Brocade switch that goes to JBODs.

The data for each of the 3 sets of servers is on different physical disks.

Not sure if that is enough info, but I'm hoping it's a starting point for some experienced eyes to give some insight!
So for the website, our project manager wants to know things like: can the system handle 1000 users on the site? From that, I guess I need to test:

A: 1000 users reading SQL data at the same time.
B: 1000 users writing SQL data at the same time.
C: Go slightly over and below 1000 in chunks to see if SQL can handle more, and where the line is, to show the max performance.
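
A minimal sketch of the stepped test in point C, assuming you can record an average latency for each user count. The function names, the 20 ms limit and all the measurements are hypothetical and only there for illustration; real figures would come from whichever load tool you end up using.

```python
def load_steps(target, step, spread):
    """Return user counts from (target - spread) to (target + spread), in steps."""
    low = max(step, target - spread)
    return list(range(low, target + spread + 1, step))


def first_step_over_limit(results, latency_limit_ms):
    """Return the lowest user count whose measured latency exceeds the limit.

    results maps user count -> measured average latency in ms.
    Returns None if every step stayed within the limit.
    """
    for users in sorted(results):
        if results[users] > latency_limit_ms:
            return users
    return None


# Hypothetical measurements, for illustration only (not real results).
steps = load_steps(target=1000, step=250, spread=500)
example_results = dict(zip(steps, [8.0, 9.5, 14.0, 31.0, 78.0]))
knee = first_step_over_limit(example_results, latency_limit_ms=20.0)
print(steps, knee)
```

With these made-up numbers, "the line" would sit somewhere between the last step that stayed under the limit and the first one that went over it, so a second pass with smaller steps in that range would narrow it down.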

So I have a rough idea of what I think needs to be tested. Next is how to test it. I know there's SQLIO and SQLIOSim that can do stress and load testing, and another called IOMeter. Are these the tools that anyone else has used before? Will they get the results I need?
But now I'm wondering how I can get solid results, because:

A: How much effect do the file servers and the VMware server have on the I/O I see from the SQL Server? Surely the switch that they all link to could be a bottleneck. If it could, how do I test whether the SQL speeds even matter when the switch is being thrashed out of its mind by 400 users accessing files on the file server and 900 users accessing the Exchange VMs?
B: The SAN is currently only really used in the day, when users access the files and VMs (8am-6pm). If I run the tests at night I would get a clearer picture of peak possible performance for SQL, but would this be accurate? What if we found out people log into the new website during the day? Should I run whatever tests I do both at night and during the day? (And going back to the Brent Ozar post, he says that the SQLIO test will totally thrash the SAN, and if anything else is connected it will seriously affect their results.)
C: Even if I test during the day, the tests are again misleading, as at present the Exchange servers on the VMware server only have 5-10 users and that will end up at 1000.
Again, I guess this all comes down to whether the switch can handle it. If the traffic of the other SAN users is on different disks and the switch can take anything, then I can run tests day or night and it won't affect anyone else.
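
To put a rough number on point C, one option is to scale today's measured Exchange I/O up to the planned user count. This is only a sketch: it assumes load grows linearly with users, which real mailboxes may not follow, and the 40 IOPS for 8 users is a hypothetical figure, not a measurement.

```python
def projected_iops(measured_iops, current_users, future_users):
    """Scale a measured IOPS figure linearly to a future user count.

    Assumes every user generates the same load, which is a simplification.
    """
    per_user = measured_iops / current_users
    return per_user * future_users


# Hypothetical: 40 IOPS observed with 8 Exchange users, projected to 1000 users.
estimate = projected_iops(measured_iops=40.0, current_users=8, future_users=1000)
print(estimate)
```

An estimate like this could then be replayed as background load while the SQL tests run, so a daytime test reflects the future picture rather than today's 5-10 users.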

Now to the learning part, which I'm actually quite excited about! Being able to do this sort of performance review seems pretty cool to me (and very handy for future projects, such as seeing how having all the users on VMware is performing; there's also talk of moving to Hyper-V, so being able to compare the results of each would be a great help!). But I'm lost as to how to do it. I've found a good "How to use SQLIO" guide: SAN Performance Tuning with SQLIO - SQLServerPedia. But nothing for SQLIOSim or IOMeter.

After I've run the tests, how do I tell what the results mean? I often think the same for PerfMon. There are hundreds of counters you can collect, but where do you start in choosing which ones?
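
One way to start making sense of a counter log is to boil each counter down to an average, a 95th percentile and a maximum. The sketch below assumes the log has been exported to CSV; the simplified column header and the sample values are hypothetical (a real PerfMon export uses full counter paths as headers).

```python
import csv
import io
import math


def read_counter(csv_text, column):
    """Return the non-empty values of one CSV column as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader if row[column].strip()]


def summarise(samples):
    """Return average, nearest-rank 95th percentile and maximum of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "avg": sum(ordered) / len(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical export: read latency in seconds, sampled every 15 seconds.
example_csv = """Time,Avg. Disk sec/Read
10:00:00,0.004
10:00:15,0.006
10:00:30,0.005
10:00:45,0.030
"""
summary = summarise(read_counter(example_csv, "Avg. Disk sec/Read"))
print(summary)
```

The average alone can hide short spikes, which is why the percentile and maximum are reported alongside it.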
I was looking at doing the SNIA or the EMC basic certs, but they seem to only cover knowing what each part of the hardware is. I clearly need this too, but in the storage field is there a cert that covers performance checking?

Hoping someone can shed some light on any of my queries!
Thanks
S

Comments

  • bertieb Member Posts: 1,031
    Hi there. Without knowing exactly what kit and SAN you have it's difficult to offer you concrete advice, however from a high level I'd suggest the following. Note this is mostly from the SAN/fabric perspective, I'll leave the SQL and Exchange stuff to someone else!

    Fabric switches - certainly, you need to monitor ports for throughput/performance. Brocades have simple tools built in to give you a graphical representation, but you can also set up Cacti to do this over time.
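
    As a rough illustration of what port monitoring boils down to: take two readings of a port's byte counter, then turn the difference into throughput and utilisation. The counter values and the 4000 Mbit/s link speed below are assumptions for the example (substitute your own port speed), and counter wrap-around is ignored.

```python
def throughput_mbps(bytes_start, bytes_end, seconds):
    """Average throughput in megabits per second between two counter readings."""
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def utilisation_pct(mbps, link_mbps):
    """Throughput as a percentage of the nominal link speed."""
    return 100.0 * mbps / link_mbps


# Hypothetical readings: 3,000,000,000 bytes transferred over 60 seconds.
rate = throughput_mbps(bytes_start=0, bytes_end=3_000_000_000, seconds=60)
busy = utilisation_pct(rate, link_mbps=4000.0)  # assumed link speed
print(rate, busy)
```

    Graphing this per port over a working day should show whether any port gets anywhere near saturation.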

    SAN - More often than not you'll hit noticeable performance issues here before the fabric switches. For example, the cache in the SAN is a huge factor in performance, and if it doesn't have enough for the type of workloads you are throwing at it, or is misconfigured, then you'll notice it - fast. (All the major manufacturers have general guidelines on how the read/write cache should be allocated, and the general rule of 'more cache is better' applies.) Likewise, the RAID layout can have similar implications, and whilst specific applications may have their own general guidelines, some SAN manufacturers might actually recommend another layout, so do some research and see if what you have is considered 'optimal'.

    You might also see more advanced options on the SAN, such as 'SCSI Queue Depths' etc., that can have a really big effect (positive and negative), but you shouldn't change these unless you really know what you are doing. On that point, if you have a 'decent' SAN that's under an active maintenance contract, do you have any consulting days available for the vendor to come in and analyse performance? Sometimes SAN vendors are deliberately stingy with the advanced performance monitoring tools available, and they often come as add-on bits of software or licenses that all help to deplete your budget :)

    VMware - You mentioned VMware in there also. There are some good tools/monitors/performance graphs in there to monitor aspects of SAN performance from the VMware perspective. Whilst it may not give you the overall picture, it will help you monitor things from an Exchange point of view, seeing as you have those servers virtualised, and you can take a baseline and review it as you add more users.

    Testing - Tools like IOMeter are very useful for analysing performance, as you can adjust the settings to match the types of workload you are throwing at it. As with all tools, you need to know what you are doing and what values you need to put in, or else it'll be a case of 'rubbish in = rubbish out'. You'll need to tweak this to 'mimic' SQL and Exchange type workloads, but even so your databases will likely have a different workload type than someone else's. There are some good forums and blogs out there detailing how people used tools like this within their environments to give you some good suggestions.
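
    One reason the block size you set matters so much: throughput is simply IOPS multiplied by block size, so the same array can look 'fast' or 'slow' depending on which number you read. A small sketch; the IOPS figures are hypothetical, and 8 KB is used because it is SQL Server's page size.

```python
def mb_per_sec(iops, block_kb):
    """Throughput in MB/s for a given IOPS rate and block size in KB."""
    return iops * block_kb / 1024


# Hypothetical results from two test profiles.
small_random = mb_per_sec(iops=5000, block_kb=8)      # many small I/Os
large_sequential = mb_per_sec(iops=500, block_kb=64)  # fewer, larger I/Os
print(small_random, large_sequential)
```

    Ten times the IOPS gives only slightly more MB/s here, so compare results using whichever figure matches the workload you are trying to mimic.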
    staggerlee wrote: »
    So for the website our project manager wants to know things like: Can the system handle 1000 users to the site.

    Does he/she actually mean 1000 users connecting to the website, and not 1000 users each connected to SQL and running queries? These are quite different requests in nature, as I wouldn't expect 1000 users on a website to generate 1000 SQL sessions. If it's a Microsoft IIS server, then there are tools available to stress test the website by simulating 'X' number of users, which you could throw at the website while monitoring everything at the server (IIS and SQL, using perfmon etc.), SAN and switching levels to see if there is an obvious bottleneck.
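
    To show why 1000 website users is not the same as 1000 simultaneous queries, here is a back-of-envelope sketch using Little's law (average requests in flight = arrival rate x response time). The think time and response time are hypothetical figures; real ones would have to come from measuring the site.

```python
def requests_per_sec(users, think_time_s, response_time_s):
    """Each user issues one request every (think + response) seconds."""
    return users / (think_time_s + response_time_s)


def concurrent_requests(rate_per_sec, response_time_s):
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return rate_per_sec * response_time_s


# Hypothetical: 1000 users, 9.5 s between clicks, 0.5 s per request.
rate = requests_per_sec(users=1000, think_time_s=9.5, response_time_s=0.5)
in_flight = concurrent_requests(rate, response_time_s=0.5)
print(rate, in_flight)
```

    Under those assumptions, 1000 users would average around 50 requests in flight at any moment, and not all of those would necessarily touch SQL.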

    So yep, a lot of reading/research and trying to combine it into something meaningful. It's not an easy or straightforward task, as you'll often find you need to tailor it to your specific environment and setup - one size definitely does not fit all - but it's very rewarding, as you will definitely learn a lot! You'll need to visit various vendor sites and forums to help you interpret the information, pretty much in the same way you would for working out what the various perfmon counters and their values mean. On a final note, if you are interested in SANs then I'd recommend the EMC Information Storage and Management book. Whilst it's written by EMC, it's mostly vendor neutral and contains some great information that will be very useful for you.

    Hopefully, in between the rambling, I've given you some useful pointers to get you started.
    The trouble with quotes on the internet is that you can never tell if they are genuine - Abraham Lincoln
  • staggerlee Member Posts: 90
    Hi bertieb,

    Thanks for the reply, and sorry for the delay in getting back to you.

    I'm looking into getting some documentation on the LSI JBODs (I believe the model we have is a Pantera, if that helps, though I can't find anything on their site about anything called that :/ so I'm waiting on an email back from our supplier).

    I'm going to check out the Brocade tools and see what I can get from them! (We use a 200E.) Cacti may also be an option, though to be honest it looks slightly confusing to my newbie eyes :(

    I think my choice between SQLIO and IOMeter will come down to which one I can get the best training on for reading results, so I will search the web for some info :)

    As for our suppliers, I fear we may not get much bang for our buck on that one. I asked the guy who deals with our SAN about performance tuning, and he sounded almost excited to hear my results, as he had never done it. He said he had used IOMeter before, but not regularly and not in any depth (so that didn't sound great).

    Hopefully I can start to piece it all together and start getting some results!

    Thanks

    S