Array Design question

Deathmage Banned Posts: 2,496
Hey guys,

So I know for a fact the array on our current Dell server just isn't provisioned correctly. We have faster 10k 1.2 TB drives on order (9 of them), but the current array is 2 TB of 7.2k drives in a RAID 5.

With that said, is the graph below normal for an array with 880 GB used out of 1.8 TB? To me this looks like a textbook example of a stressed array being asked to do things it just wasn't designed to do.

The graph is an overall look at the performance of the sole datastore for the entire server, with 6 VMs on it and vCOps turned off; to me, 25 ms average latency is a tad too high.
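For reference, here's a quick way to sanity-check samples like this against the rule-of-thumb latency thresholds people usually quote (the 20/30 ms cutoffs and the sample values below are my assumptions, not anything official):

    # Minimal sketch: classify datastore latency samples against common
    # rule-of-thumb thresholds (assumptions, not VMware-official numbers;
    # tune them for your environment).
    WARN_MS = 20.0   # sustained latency above this is worth investigating
    CRIT_MS = 30.0   # sustained latency above this usually means a saturated array

    samples_ms = [18, 22, 25, 31, 27, 65, 24]   # hypothetical readings off the graph

    avg = sum(samples_ms) / len(samples_ms)
    print(f"average: {avg:.1f} ms")
    for ms in samples_ms:
        status = "critical" if ms >= CRIT_MS else "warning" if ms >= WARN_MS else "ok"
        print(f"{ms:5.1f} ms -> {status}")

By that yardstick, the 25 ms average off the graph sits squarely in the warning band, and the spikes are well into critical.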


Comments

  • azjag Member Posts: 579 ■■■■■■■□□□
    This is one of those "it depends" answers. Running 6 VMs on a single datastore is going to cause some thrashing. What you have to be careful of is running a bunch of heavy VMs (SQL, Exchange, or other high-utilization apps) on a single datastore. What types of VMs are running on this datastore?
    Currently Studying:
    VMware Certified Advanced Professional 5 – Data Center Administration (VCAP5-DCA) (Passed)
    VMware Certified Advanced Professional 5 – Data Center Design (VCAP5-DCD)
  • Deathmage Banned Posts: 2,496
    azjag wrote: »
    This is one of those "it depends" answers. Running 6 VMs on a single datastore is going to cause some thrashing. What you have to be careful of is running a bunch of heavy VMs (SQL, Exchange, or other high-utilization apps) on a single datastore. What types of VMs are running on this datastore?

    a remote terminal server (15 CALs) - roughly 9 active users at any time
    a secondary DC (domain-level operations masters, DNS, DHCP)
    vCenter
    a transactions server
    a print server - roughly 45 printers
    vCOps (vApp turned off)

    With the new array (7.9 TB), I was curious about splitting it up into smaller datastores (we only need 5.8 TB across all servers, and bear in mind it will be 7.9 TB per server and we have 3 of them): putting the intense VMs, like the SQL and transactions VMs, on their own datastore, since the transactions server just accesses the SQL database anyway; putting the DC and print server on another; and keeping the less intensive servers, like Terminal Services, on a third (see the rough budgeting sketch at the end of this post).

    We also have a file server, an app server, and one other physical DC (the PDC) yet to be P2V'd, as well as the SQL server, which I'm doing last given the impact it has on the systems. It's a very SQL-intensive network: in excess of 40 to 50 large queries every hour from a custom UNIX-based ERP.

    I'm getting into the design thinking now since I want to shape the cluster the right way, so I've been running baselines on all of the physical servers to get an idea of what I need performance-wise to maintain them once they move into the ESXi cluster. Even before you mentioned it, azjag, I was wondering whether splitting SQL onto its own datastore would be beneficial even though the datastores sit on the same array.
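    For what it's worth, here's the kind of rough per-datastore budgeting I have in mind: group the VMs and sum their estimated peak IOPS so no single datastore's demand outruns the array. Only the SQL figure (860, from my baselining) is measured; every other number here is a placeholder, not a measurement.

        # Hypothetical datastore layout with estimated peak IOPS per VM.
        # Only the SQL figure (860) is measured; the rest are guesses.
        datastores = {
            "ds-sql":   {"SQL": 860, "Transactions": 150},
            "ds-infra": {"DC2": 50, "PrintServer": 40, "vCenter": 80},
            "ds-rds":   {"TerminalServer": 120},
        }

        for name, vms in datastores.items():
            total = sum(vms.values())
            print(f"{name}: ~{total} peak IOPS across {len(vms)} VMs")

    Of course, all of these datastores would still share the same spindles, so the split helps with queueing and management more than raw throughput.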
  • Essendon Member Posts: 4,546 ■■■■■■■■■■
    I'd put the SQL server on its own datastore and maybe have everything else on another (or give vCenter its own too, though going by the size of this deployment it should be okay). Check the post I made in your other thread, and like azjag said, it depends.

    What are you going to be doing with the new array? Is this part of a larger project? How big is the deployment anyway, in number of hosts and VMs?
    NSX, NSX, more NSX..

    Blog >> http://virtual10.com
  • Deathmage Banned Posts: 2,496
    Essendon wrote: »
    I'd put the SQL server on its own datastore and maybe have everything else on another (or give vCenter its own too, though going by the size of this deployment it should be okay). Check the post I made in your other thread, and like azjag said, it depends.

    What are you going to be doing with the new array? Is this part of a larger project? How big is the deployment anyway, in number of hosts and VMs?

    Well, I kind of have to do this deployment in stages so that the bottom line is easier to absorb. Long-term I want a 10G FC SAN with the correct fabric redundancy, and so does my colleague, but right now, to get the servers into a more stable state, I need to virtualize the systems, and that's what I've been doing. (Let's put it this way: our entire domain was sitting on a PIII server with a failed RAID 5 array, with all the operations masters on it. On day 1 I spun up another DC; I literally wasn't even there 2 hours and I'd fixed that!) Passed the day-one test.

    A little backdrop: the servers are R710/510s with an assorted suite of 2950s and some really old PIII Dell boxes with only 4 GB of RAM, sitting on a single L2 /24 subnet with a 48 Gbit backplane; the SonicWall is the L3 device (eek, I know). The servers clearly haven't been maintained for the past 8+ years, and frankly it scares the living crap out of me, so I've been P2V'ing the critical servers over the past few weeks. They had a 12 TB NAS that had been sitting in a box in a closet for 9 months; I set it up just to get solid backups with Acronis, so now I have solid backups. The forest functional level is 2000. Yes, Server 2000! I'm checking with vendors to make sure everything will still work if I raise the forest level; I really want to use GPOs, but at the current level they can't do what I need, which is annoying. Most of the servers are out of warranty, and some have failing or failed drives in RAID 5 arrays. As mentioned in a previous post, the upgrades need to be done in stages so the infrastructure keeps working alongside them. Priority 1: make the servers more stable, on a fast, solid storage array.

    The number of hosts is 3 and the number of VMs is 13, right-sized from 4 GB of RAM to 8 GB each, with SQL going from 6 GB to 32 GB with 4 cores (to start). We're also deploying 3 terminal servers for about 15 users per server, each with 16 GB of RAM and 4 cores. The SQL/app/file servers will be on one host all to themselves, another host will carry everything else, and the third host is a hot spare. We have 2 DCs, and I want to keep one physical so we don't have all the cookies in one jar, plus an application server and a file server. We also have 4 specialized UNIX servers (UNIX, neato!). Long-term there is talk of absorbing 4 other companies and growing the forest from one domain to encompass 5 more across 4 locations, so I'm planning big even though the current need is small. Right now the infrastructure can't sustain that kind of addition, which is why I was hired. Rough RAM math is sketched below.
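    The back-of-envelope RAM numbers, assuming (my assumption, worst case) that the SQL box and the terminal servers are on top of the 13 right-sized VMs:

        # Back-of-envelope vRAM demand (GB); all figures from the plan above,
        # treating SQL and the terminal servers as additional to the 13 VMs.
        general  = 13 * 8    # right-sized general VMs
        sql      = 32        # SQL server
        terminal = 3 * 16    # three terminal servers
        total_gb = general + sql + terminal
        print(f"total vRAM demand: {total_gb} GB")   # 184 GB

        active_hosts = 2     # third host held as a hot spare
        print(f"~{total_gb / active_hosts:.0f} GB per active host, "
              "before hypervisor overhead and failover headroom")

    So even doubled for growth, two active hosts should carry it comfortably.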

    As for the array, right now it's sustaining storage growth, but the idea is either vSAN or a full-fledged SAN; if we get a full-fledged SAN, the array will be used to grow the cluster's VM capacity. I do believe these Dell R720xd's can take 2 TB of RAM, so there are tons of routes I can go. The main reason for a SAN or vSAN is backups and/or redundant storage. There is talk of using vSAN for hot data and a SAN for cold data, or the other way around.


    It's a very large, all-encompassing project, but it's like what I dealt with when I came onboard at President, gosh, 3 1/2 years ago; I got good experience as a fixer from that nightmare, lol! The one key thing I like is that they're paying for training toward my MCSE:SI so I can better manage their network now and as it grows.
  • Essendon Member Posts: 4,546 ■■■■■■■■■■
    The read latency is high. What was happening when you took the screenshot?

    I suggest you read up on VSAN (notice the uppercase V) before you make comparisons with a SAN. It's not what you seem to think it is, going by how you've written about it above.

    Sounds like you are going to get some solid experience here, that's for sure. Own it!
    NSX, NSX, more NSX..

    Blog >> http://virtual10.com
  • JBrown Member Posts: 308
    Check the read latency on each VMDK for each VM, and check the disk latency, to confirm whether it's a VM that can't keep up with the IOPS requests or the storage subsystem that can't keep up.
  • Deathmage Banned Posts: 2,496
    JBrown wrote: »
    Check the read latency on each VMDK for each VM, and check the disk latency, to confirm whether it's a VM that can't keep up with the IOPS requests or the storage subsystem that can't keep up.

    The read latency is pretty high, about 26 ms on average, with brief spikes to 65-80 ms, so I don't think the current 3-drive 1 TB RAID 5 array can sustain the load. Basically, I arrived on the job at the beginning of the month and that was the array in the server, so I'm using what I've got. More memory and a larger array are on order and should be here sometime next week.

    See, I have the option to use 15k drives, but those only come in capacities up to 600 GB, which would leave me 1.5 TB short of our current storage needs, so I'm stuck with the 10k 1.2 TB drives in a RAID 5 array. The total IOPS of that array at that size would be about 1200, but my SQL server, the most demanding one, has a peak of 860 IOPS. (Rough math below.)
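    To show my work on that 1200 figure (assumptions: roughly 135 IOPS per 10k spindle and the classic RAID 5 write penalty of 4; real numbers vary by drive and controller):

        # Back-of-envelope RAID 5 sizing. Raw IOPS = spindles * per-spindle
        # IOPS; each logical write costs 4 back-end I/Os in RAID 5 (read
        # data, read parity, write data, write parity).
        def raid5_effective_iops(spindles, iops_per_spindle, read_fraction):
            raw = spindles * iops_per_spindle
            return raw / (read_fraction + 4 * (1 - read_fraction))

        print(raid5_effective_iops(9, 135, read_fraction=1.0))   # ~1215: read-only ceiling
        print(raid5_effective_iops(9, 135, read_fraction=0.7))   # ~639 at a 70/30 read/write mix

    So the ~1200 number is really the read-only ceiling; with any realistic write mix the effective figure drops below SQL's 860 IOPS peak, which is worth keeping in mind.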

    On a side note, and something to go googly-eyed over: I heard from the Dell storage rep I was talking to that the new Dell R720s coming out this month are shipping with a 12 Gb/s SAS controller option; the only problem is that not many vendors are making 12 Gb/s SAS drives yet.
  • Deathmage Banned Posts: 2,496
    JBrown wrote: »
    Check the read latency on each VMDK for each VM, and check the disk latency, to confirm whether it's a VM that can't keep up with the IOPS requests or the storage subsystem that can't keep up.

    I just checked the logs, and from 4:17 pm till 5:17 pm the read latency was: latest 3 ms, max 125 ms, min 0 ms, average 21.25 ms.

    Hopefully this screenshot is viewable; my screen at work is 27 inches, so the full size is huge.
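    Another way to get the same stats without squinting at screenshots would be an esxtop batch capture; something like "esxtop -b -d 2 -n 300 > esxtop.csv" on the host, then a quick summary script. Exact counter names vary between ESXi builds, so this sketch just matches any column containing "MilliSec/Read":

        # Minimal sketch: summarize read latency from an esxtop batch CSV.
        import csv

        latencies = []
        with open("esxtop.csv", newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = [i for i, name in enumerate(header) if "MilliSec/Read" in name]
            for row in reader:
                for i in cols:
                    try:
                        latencies.append(float(row[i]))
                    except (ValueError, IndexError):
                        pass

        if latencies:
            avg = sum(latencies) / len(latencies)
            print(f"{len(latencies)} samples  min {min(latencies):.2f}  "
                  f"max {max(latencies):.2f}  avg {avg:.2f} ms")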



    I think it would be a good idea to find a white paper on what latency an array should ideally aim for, then work to keep it there, with Storage vMotion as one way to balance the load, and to design the datastores properly from the get-go, including eager-zeroed thick vs. thin provisioning. At least to me, eager-zeroed thick for a SQL or file server seems like the correct move; for a DC or a print server I'm less sure, but I'd presume eager-zeroed would be ideal for performance there too.
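    If I end up scripting the disk creation, the provisioning type comes down to two flags on the disk backing. Here's a minimal pyVmomi fragment as a sketch only (assumes pyVmomi is installed; actually attaching the disk to a VM still needs a controller key, unit number, and a ReconfigVM_Task call, which I've left out):

        # Sketch: spec fragment selecting eager-zeroed thick provisioning.
        from pyVmomi import vim

        backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
        backing.diskMode = "persistent"
        backing.thinProvisioned = False   # thick...
        backing.eagerlyScrub = True       # ...and eager-zeroed: blocks zeroed at creation

        disk = vim.vm.device.VirtualDisk()
        disk.backing = backing
        disk.capacityInKB = 100 * 1024 * 1024   # hypothetical 100 GB disk

        spec = vim.vm.device.VirtualDeviceSpec()
        spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
        spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
        spec.device = disk

    For thin provisioning you'd flip thinProvisioned to True and eagerlyScrub to False; the appeal of eager-zeroed is paying the zeroing cost once at creation instead of on first write, which is why it tends to suit write-heavy VMs like SQL.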