iSCSI SAN Implementation with several ESXi hosts and two EqualLogic SANs

sratakhin Member Posts: 818
Hi everyone,

About a month ago I started working for a small state college in the Midwest. Wanted to write a long post about the new job but got too lazy :)


One of my first projects is optimizing our SAN setup. We currently have 4 ESXi hosts (all made by Dell), 2 EqualLogic SANs (PS4000 and PS4100) and a bunch of old HP ProCurve switches. The current setup is very far from being redundant and fast, so we want to improve it. I read several forum threads elsewhere but got even more confused. Here is what I suggested:

The ProCurve switches are 2824s. I know they don't support jumbo frames and flow control at the same time, but we have plans to upgrade to something like the ProCurve 3500yl. Any suggestions? I heard the Dell PowerConnect 6xxx switches are pretty good, but I'm not sure how they compare to the HPs.
There will be a 4-port EtherChannel (link aggregation) between the switches, and the SAN control modules will be connected to different switches.
Is there anything that will make the setup better? Are there better switches than the ProCurve 3500yl that cost less than 5k? What kind of bandwidth can I expect between the ESXi hosts (they will also be connected to the 2824s with multiple cables) and the SANs?
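
For the inter-switch link, here's a rough sketch of what I had in mind on the ProCurve side (assuming 3500yl-class switches, ports 21-24 as the uplink and VLAN 100 for iSCSI; the port numbers and VLAN ID are just placeholders):

  trunk 21-24 trk1 lacp
  vlan 100
     tagged trk1
     exit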

Comments

  • it_consultant Member Posts: 1,903
    Is $5k your budget for this whole project? The best upgrade you can possibly do without investing in FC is upgrading to 10GB iSCSI. I am not sure if the SANs have 10GB ports; most modern SANs do, or it may be a slot upgrade. The setup in your picture will improve redundancy, but the performance will still be sub-par.

    People often underestimate the cost of storage solutions; all in, they can get pricey. The HP small business SANs seem like the best for the money, but you guys already went all in with Dell.
  • undomiel Member Posts: 2,818
    Personally I'd go with 2 Dell PC5524 switches and stack them with the HDMI cables. That will get you redundancy and improved bandwidth between the switches with very little effort and management on your part. I have a preference for Dell switches over HP switches.
  • undomiel Member Posts: 2,818
    No 10 GbE modules available for those arrays.
  • ptilsen Member Posts: 2,835 ■■■■■■■■■■
    10gbps will be a big upgrade, but what kind of disk setup and load are we looking at? How "small" is "small", really? I'd just hate to see you invest in the network if the disks are the problem. I've implemented some very similar setups (with all HP hardware, mind you) for SMBs with relatively low loads, and 1gbps iSCSI with four link-aggregated ports between switches and two NICs per host was plenty fast. Conversely, I've done 8gbps FC and seen huge array bottlenecks leave the bandwidth wasted.

    Not saying one or the other is your problem, just saying there's a lot to it. I wouldn't throw all your money at the storage network until we're sure that's your bottleneck.
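
    For a quick sanity check before spending anything, esxtop on one of the hosts will tell you a lot:

        esxtop        # then press 'd' for the adapter view or 'u' for the device view

    Watch DAVG/cmd (device/array latency) against KAVG/cmd (kernel/queueing latency). As a rough rule of thumb, DAVG consistently up in the 20-25 ms+ range points at the array/spindles, while a high KAVG points at the host and queue depths rather than the disks. Treat those numbers as ballpark guidance, not hard thresholds.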
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    I think we'd need to know what you mean by "far from fast" - is the bottleneck even the network, or maybe just the storage layout? If it's the storage, is the problem with reads or writes?

    You can improve a lot just by changing the RAID level of the arrays. I think RAID 5 or 6 is the default when you do the initial provisioning.
  • it_consultant Member Posts: 1,903
    Yeah, good call on the storage layout. Optimizing that will help. A dedicated 1GB HBA on the server side will probably be OK, but I wouldn't be running highly transactional databases over that link. Someone went cheap on the initial acquisition, which means more pain for you as you try to expand and optimize. That is one of my main problems at my current job. Ever hear of Virtuozzo? Yep, that cheap.
  • sratakhin Member Posts: 818
    Thank you for the responses.

    I guess I should have provided more info. The first SAN was set up before the adoption of VMware; it was used to expand storage because the fileserver at the time ran out of space. The second SAN was installed later on, but the last network guy just plugged it in and called it good. For now, we don't really have any performance problems, but of course I can't really tell, as I don't have any experience with SANs. However, we would like to provide enough bandwidth for future growth.
    Currently, we have about 50 virtual servers and 60 thin clients using VDI-in-a-Box.

    Now to individual responses :)
    @it_consultant: we don't really have a budget for it. The 5k figure I used is just an estimate of what can be spent on each switch. Since we'd like to have two of them, it will be at least 10k. I'm not sure that using 10Gb Ethernet will help in any way, as our college is really small. We only have a couple thousand students and fewer than 200 employees. Moreover, it's too expensive :)

    @undomiel: How do the Dell switches' GUI and CLI compare to HP and Cisco? I guess it probably makes more sense to use Dells, especially given that our servers and SANs are Dells. The rest of the network is HP, and so far it has been rock solid.

    @ptilsen: we have 16 15K SAS drives in one SAN and 12 7.2K SATA drives in the other. The current performance... I'm not really sure how to measure it, but when I did a benchmark in a virtual machine (can't remember which SAN it was on), the results disappointed me. Let's just say that it was slower than 2 drives in RAID 0 on my home computer ;)

    @jibbajabba: My main concern is redundancy. If I could make it fast as well, that would be even better ;)
  • undomiel Member Posts: 2,818
    I've had one Dell switch that was a bad apple from the factory, but all the rest that I've deployed have been rock solid. Since you're using EQL, definitely make sure you're running the latest firmware, as there was an incompatibility in an older version with iSCSI traffic between XenServer and the EQLs. I'm not sure if it extended to any other hypervisors, but I do know that none of the Hyper-V clusters I have running on the older firmware have had any issues. The only other gotcha I can think of is that if you use the HDMI stacking, make sure you use cables that meet the proper specification.

    As far as management goes, I would say the CLI is very similar to Cisco's. You definitely shouldn't have a problem jumping between the two as long as you aren't afraid to hit ? and keep the documentation handy. When I configured my first PowerConnect I had the documentation open the entire time, but since then I think I've only needed to dive back in once to find something fairly obscure. As for the GUI, I can't really make much of a comparison as I'm a CLI guy. Usually the only reason I'm in the GUI is to check some of the performance graphs.
  • it_consultant Member Posts: 1,903
    The main problem is that 1GB is just too darn slow, even on a dedicated HBA. I am only talking about doing 10GB Ethernet between the servers and the SAN. Let me give you an example. Internal drive cabling is what... 6 GB/s or somewhere around there. FC will be between 4 and 8 GB, and they get to use massive FC frames. Your storage access is 1GB and you are on the small frame size of Ethernet. Right now, you are in no position to expand. A better idea would have been to use a shared SCSI bus or direct attached storage. Don't buy a new Dell switch at the same speed as your old HP switch; it won't do anything for you.

    We have about 200 users as well, and we have a fully built out 4/8GB FC SAN with a backup SAN connected by a 2 x 4GB ISL trunk in a location 10KM away. We have really the best an org like us could hope for. Right now, I get faster access to my storage across a 10KM link than you do right next to your SAN.
  • sratakhin Member Posts: 818
    it_consultant, what do you think about using stuff like link aggregation? I used it on all ESXi hosts and it made some difference. However, I only had an old ProCurve 2824 to work with. Now we are looking for something better that supports distributed "trunking".
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Re redundancy: when designing the layout, don't forget that the EQLs aren't active/active, so a failed switch can easily cause connectivity issues. Having said that, the Dells deal with MPIO internally, but you need to make sure it is set up correctly (does your diagram leave out a dedicated mgmt port on purpose?).

    So make sure you connect your ESXi hosts to the GROUP IP (for iSCSI, not mgmt) and not to the individual ports.

    eth0: iSCSI port #1
    eth1: iSCSI port #2
    eth2: mgmt port

    Also, are both SANs in the same group ?
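
    A minimal sketch of the ESXi side, assuming ESXi 5.x, the software iSCSI adapter showing up as vmhba37, vmk1/vmk2 as the iSCSI vmkernel ports and 10.0.10.10 as the group IP (all placeholder names):

        esxcli iscsi networkportal add --adapter=vmhba37 --nic=vmk1
        esxcli iscsi networkportal add --adapter=vmhba37 --nic=vmk2
        esxcli iscsi adapter discovery sendtarget add --adapter=vmhba37 --address=10.0.10.10:3260
        esxcli storage core adapter rescan --adapter=vmhba37

    The host then logs in via the group IP and the array redirects each session to whichever member port it wants to serve it from.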
  • ptilsen Member Posts: 2,835 ■■■■■■■■■■
    @it_consultant, that's all spot on except that it's gbps, not GB/s. Big difference there. You even made me make the same mistake in my first post (yes, I'm blaming you!)

    But yeah, for that kind of setup 1gbps iSCSI is going to be a serious limitation. While I'm inclined to say you may encounter a serious disk I/O bottleneck, I'm also inclined to say that your current SAN setup is a bigger problem. At any RAID level, those 15K drives are going to have higher read and write throughput than iSCSI on top of GbE can deliver. You're talking about over 2gbps just for the 15K drives doing some simple sequential operations. Over 50 servers + VDI is going to be a lot of traffic. I think 10GbE on the SAN side would do it, but undomiel said that's not an option.
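
    Back-of-the-envelope, assuming roughly 150 MB/s of sequential throughput per 15K SAS spindle (a ballpark figure, not a measurement): 16 drives x ~150 MB/s is about 2,400 MB/s raw, and even at a tenth of that after RAID and controller overhead you're at ~240 MB/s, or roughly 2 Gbps, while a single GbE path tops out around 125 MB/s (1 Gbps).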

    For the record I've put organizations with half as many systems on 20-30 15K drives broken out based on projected workloads. I'm not saying your drives can't do that job, but it really wouldn't surprise me if they couldn't. You are probably going to want more drives regardless of the network.

    I would say focus on getting the storage network to a minimum of 4gbps FC on the one SAN and ESXi hosts. You could also do 10GbE on the SAN and 2-4 GbE NICs per host, but I think you're going to be disappointed by a pure GbE storage network at this point. I will say your 7.2K drives are probably not bottlenecked by the network at all. I'm assuming you use those for bulk storage, which should be fine and not a big concern anyway. You might see a throughput limitation on asynchronous sequential transfers. If it's being used for lots of simultaneous I/O the spindle rates are going to be a much bigger bottleneck than the network, IMO.

    As far as redundancy goes, it should be fairly redundant the way you have it cabled. You can tolerate a switch failure, multiple simultaneous cable failures, and a controller failure, assuming things are configured right. What kind of potential failures are you looking to mitigate that you haven't?
  • it_consultant Member Posts: 1,903
    Between the ESXi hosts and the storage (through a switch), trunking would incrementally help. If you got a dedicated HBA, say an Intel quad-port gigabit adapter, bonded the 4 NICs in an aggregation config, and the four ports on the back of the SAN can also be configured as a trunk, you could boost your overall storage performance. In the picture it looks like the back of each SAN has 4 gigabit ports plus one that looks like a management port; is that correct?

    In order to be truly redundant while trunking your links, you need to put your switches in a "Stack" so that you can trunk 2 ports into switch A and 2 ports into switch B - all 4 in the same trunk. That way, if one switch dies you still have 2 active links. Dell and HP support "stacking" on some of their switches, I think Dell uses dedicated stacking ports. $5K should get you what you need with optics on the switch side.

    Before you put any money into that solution, make sure the SAN will support link aggregation.
  • undomiel Member Posts: 2,818
    For EQL you set the RAID level per member, so unfortunately it isn't possible to dedicate spindles to specific workloads. You just migrate based on the target RAID level, assuming you're mixing RAID levels in a storage pool. It's also not possible to bond the ports on the EQL. And jibbajabba is correct about them being active/passive, so you won't be able to use both modules. And to continue the bad news, Fibre Channel is not an option with EQL arrays; they're iSCSI only.

    Probably the best thing that can be done on the EQL is pinning the important servers to the 15k array. Getting jumbo frames implemented should help a bit as well.
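
    If you do go for jumbo frames, remember they have to be enabled end to end (the EQL ports, the switch ports/VLAN, the vSwitch and the vmkernel ports). On the ESXi side it's roughly this, assuming ESXi 5.x with vSwitch1 as the iSCSI vSwitch and vmk1 as the iSCSI vmkernel port (placeholder names):

        esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
        esxcli network ip interface set --interface-name=vmk1 --mtu=9000

    On a ProCurve 3500yl-class switch jumbo frames are enabled per VLAN (e.g. "vlan 100 jumbo"); whether the old 2824s can do that alongside flow control is the limitation sratakhin already mentioned.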
  • sratakhin Member Posts: 818
    Hm... You guys made me think... Unfortunately (or fortunately - it works fine), replacing the SANs isn't even planned until I don't know when.
    We have also been thinking about using DAS for VDI (maybe SSD drives?), leaving the SANs for the servers and storage.

    @it_consultant: both SANs only have 2 controller modules (active/passive), each with 2 x 1Gbit ports plus a 10/100 port for management. Unfortunately, no better modules are available, so we'll have to use whatever we have.

    @ptilsen: let's just say that the current setup is somewhat redundant. Both SANs are connected to two switches, but the switches don't have a connection to each other, and each ESXi host can only see one SAN.

    P.S. I'm still sure that there has to be a way to get more than 1gbps from these SANs... Will do some tests next week :)
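
    For the tests I'll probably start with something crude from a Linux guest just to see raw sequential throughput, e.g.

        dd if=/dev/zero of=/tmp/testfile bs=1M count=4096 oflag=direct

    and then something like IOmeter or fio for more realistic mixed/random workloads. A single VM won't show what the array does under concurrent load, so I'll take the numbers with a grain of salt.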
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    PS, we run several 4000s using GbE only and our bottleneck was not the network - sure, it depends on the specific load. What is important, though, are jumbo frames and flow control. Heck, we even made sure the SANs are connected to the same switches as the hosts to avoid routing :)
  • it_consultant Member Posts: 1,903
    Luckily you aren't the one who bought hardware that's substandard for the task, so you can say that they bought the wrong stuff without taking any personal responsibility. I would be curious to know how much they spent on these SANs. I know HP sells SANs with 10GB capability which are fairly affordable. HP tends to be a little more expensive and they can be a pain to order, but you won't find yourself unable to upgrade to 10GB.
  • dave330i Member Posts: 2,091 ■■■■■■■■■■
    How many VMs? How many VMs/LUN?
  • sratakhin Member Posts: 818
    About 50 servers and 65 desktops. Most of the servers are just sitting idle.