Load Balancing/Sharing Between Multiple ISPs

Eildor · January 2013

What is the best way of implementing load balancing/sharing between multiple ISPs? Is load balancing even possible? I imagine you can configure static routes to load balance traffic, but then what about the differences in speed between ISPs? If it is possible, it sounds kind of complicated. Per-packet load balancing over multiple ISPs surely cannot work?

And I guess you're going to have to do something about BGP also.

networker050184 · January 2013

When you load balance over multiple ISPs, it is a bit tricky working with BGP. I always advise customers against even trying. A primary/backup solution is much easier to troubleshoot and implement.

If you do want to try it, you have to work with things like local preference (LP) communities and AS-path prepending to try to get your inbound traffic going the way you want it. Outbound is a little easier, as you can control your route preference within your own AS fairly easily. You could let traffic pick its best path naturally, but there's no guarantee it will end up anywhere close to an even distribution that way.
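To make those knobs concrete, here is a minimal Python sketch of a few early steps of the BGP best-path comparison as Cisco describes it: highest weight (Cisco-specific), then highest local preference, then shortest AS path, then lowest MED. It is deliberately simplified. Origin, eBGP-vs-iBGP, IGP metric, router ID and the other tie-breakers are left out, and MED is compared unconditionally here, whereas by default routers only compare it between paths from the same neighbouring AS. The `Route` class and all AS numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Route:
    """A simplified BGP path; only the attributes compared below are modelled."""
    via: str                      # which ISP the path uses
    weight: int = 0               # Cisco-specific, local to one router
    local_pref: int = 100         # shared across the AS via iBGP
    as_path: list = field(default_factory=list)
    med: int = 0

def preference_key(r: Route):
    # Higher weight, higher local pref, shorter AS path, lower MED win.
    # Tuples compare left to right, so earlier attributes dominate later ones.
    return (r.weight, r.local_pref, -len(r.as_path), -r.med)

def best_path(routes):
    return max(routes, key=preference_key)

# Outbound: raising local preference on ISP A's routes makes the whole AS prefer A.
isp_a = Route(via="ISP-A", local_pref=200, as_path=[64500, 64510])
isp_b = Route(via="ISP-B", as_path=[64501])
print(best_path([isp_a, isp_b]).via)  # ISP-A, despite the longer AS path

# Inbound, as seen by a remote AS: prepending our AS (65001) on the ISP B
# advertisement makes that path look longer, so the remote AS picks ISP A.
via_a = Route(via="ISP-A", as_path=[64500, 65001])
via_b = Route(via="ISP-B", as_path=[64501, 65001, 65001, 65001])
print(best_path([via_a, via_b]).via)  # ISP-A
```

The asymmetry in the thread shows up directly: local preference is yours to set, but the inbound comparison runs on someone else's router, so prepending is only a hint that their own local preference can override.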

7of9 · January 2013

Most places I've worked used BGP and never really had anything near equal load balancing, but it was better than leaving one link completely unused. The one place I did see anything close to equal load balancing across the links, we used some sort of appliance at the edge to do it. I'm racking my brain trying to remember the name of it, but all I remember is that it had a very nice GUI that you could use to send different traffic types down different paths, with nifty pie charts to show management whenever they questioned whether it was working. If I remember the name, I'll add it here, but suffice it to say that it was likely a very costly solution, geared only towards sites with several thousand users.

networker050184 · January 2013

Curious why you think load balancing is better than leaving a link unused? You have to have it there whether it's used or not. A lot of the time you may be paying per unit of traffic, so using both links may end up costing you more money than an active/passive solution in the long run, especially if you can get a cheap unmetered primary link.

There are hardware appliances that can help you do this that use NAT rather than a BGP solution.

7of9 · January 2013

Everywhere I've been, you pay for a certain amount of bandwidth regardless of whether you use it or not. Since you generally size both links so that, in a DR situation, you could conceivably run on a single link, then you're essentially paying twice what you need to for 99% of the time. To most executives, that looks bad and they'd rather think that they're at least getting something beyond just redundancy for that money.

As long as load balancing doesn't come at the expense of being able to survive on one link if the other goes down, I do see load balancing as better than having a link you're already paying for going unused. It certainly helps when justifying that cost to management.

pert · January 2013

networker050184 wrote:

Curious why you think load balancing is better than leaving a link unused? You have to have it there whether it's used or not. A lot of the time you may be paying per unit of traffic, so using both links may end up costing you more money than an active/passive solution in the long run, especially if you can get a cheap unmetered primary link.

There are hardware appliances that can help you do this that use NAT rather than a BGP solution.

The only real reason is being able to always know the redundant circuit is good when traffic is running over it, but honestly you can do most of that through monitoring software so...yeah.

This is from memory, but the way I've seen it done is through prefix lists, route maps, and weighting. Different sources of traffic are defined through the prefix lists. You'll use several such lists to split the traffic into roughly equal parts. The route maps classify traffic using the referenced prefix lists, then assign a higher weight to one than the other, which is then reversed on the redundant router/link. That way you have redundancy, quick failover, and manual load balancing.

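Here's a super simplified config sketch (IOS-style; the prefixes, AS numbers, and neighbor address are placeholders):

! Primary router
ip prefix-list LIST-A seq 5 permit 198.51.100.0/24
ip prefix-list LIST-B seq 5 permit 203.0.113.0/24
!
! In an inbound BGP route-map, the prefix lists match the prefixes of the routes received
route-map ISP-IN permit 10
 match ip address prefix-list LIST-A
 set weight 120
route-map ISP-IN permit 20
 match ip address prefix-list LIST-B
 set weight 100
! Empty final clause permits all other routes unchanged
route-map ISP-IN permit 30
!
router bgp 65001
 neighbor 192.0.2.1 remote-as 64500
 neighbor 192.0.2.1 route-map ISP-IN in
! Weight is Cisco-specific and only affects this router's own best-path choice

The redundant router, peering with the other ISP, is the same except the weights are swapped:

route-map ISP-IN permit 10
 match ip address prefix-list LIST-A
 set weight 100
route-map ISP-IN permit 20
 match ip address prefix-list LIST-B
 set weight 120
route-map ISP-IN permit 30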

pert · January 2013

7of9 wrote:

Everywhere I've been, you pay for a certain amount of bandwidth regardless of whether you use it or not. Since you generally size both links so that, in a DR situation, you could conceivably run on a single link, then you're essentially paying twice what you need to for 99% of the time. To most executives, that looks bad and they'd rather think that they're at least getting something beyond just redundancy for that money.

As long as load balancing doesn't come at the expense of being able to survive on one link if the other goes down, I do see load balancing as better than having a link you're already paying for going unused. It certainly helps when justifying that cost to management.

This argument makes zero sense. You're still paying for the unused bandwidth either way. The only way you can achieve what you are talking about is with burstable circuits, i.e. you have two OC-12s that are both running near peak, but you can allow for failover through bursting. As far as convincing management it's worth having redundant circuits, eh. It's their call; even as the network guy, it's not your call whether redundant anything is "worth" it. That depends on the business and its needs. If management doesn't want to go for it, I'd make sure they were clear on exactly what the risks are, but I wouldn't try to convince them which way to go. I save that for when people want to buy over/underkill gear or implement stupid solutions. Pick your battles.

7of9 · January 2013

Basically, this is how the conversation goes.

CTO, CIO or <insert non-networky management guy> walks by the nice, new network map, showing the dual internet circuits. He pauses and has a "visionary moment" then says, "um...are we using both of those?" You explain that one is there for redundancy and is bought from a different ISP, run to a different POP, etc. He strokes his manicured goatee or adjusts his silk tie or whatever his "I'm thinking" tic is and then says, "Oh...but are we using them both NOW?" You explain that no, we aren't, we are fine with running on one. He nods a bit and walks away.

Ten minutes later, whichever manager he passes the idea to tells you that you are going to find a way to put both circuits to use and prove that they are being used at any given time. Given that this is less annoying and disruptive than the last 20 things said CIO or CTO came up with during a similar "visionary moment," you make it happen. He is happy looking at his pretty multicolored pie charts, and you are happy that he is not refusing to pay for the redundancy the company needs and then breathing down your neck when ISP A has an outage and you're down.

Often... real-life solutions have to account for more than just the technical factors tested in exams... including the whims and personalities of people who spend too much on suits.

Add to that the fact that most companies I've worked for are only willing to pay for a circuit that, on its own, is only enough to limp along on, versus the much better performance I get by load balancing across the redundant link.

7of9 · January 2013

Of course...now that I've moved to the provider side, I'm more than happy for you to buy all your redundant circuits from me and then not use them unless you have an outage.

networker050184 · January 2013

If you have to talk your CIO into getting a redundant circuit, you probably need to find a new job! Redundancy is very important. Does this same CIO not want to buy spare equipment? Not put an extra sup in the switches? No DR plan either? By that logic the whole DR site should be shut down, since it's only used in emergencies!

Eildor · January 2013

I need to read up on BGP when I get time; I have forgotten so much of it. That's what happens when you aren't applying the things you learn in the real world, huh.

pert · January 2013

This is more a how-to on policy-based routing than BGP, but you do need to know what BGP weights are. You could do the same thing with EIGRP or OSPF by adjusting the appropriate metric.

Eildor · January 2013

How does the example you posted affect inbound traffic, though?

networker050184 · January 2013

That wouldn't affect inbound. That is much more difficult to tackle. Things that can be used are AS-path prepending and communities from your provider. Another popular way (though some think it is not a best practice for the internet as a whole) is to advertise specifics one way and the aggregate out both. Say you have two /24s: you advertise one /24 out each link and the aggregate out both as well. Due to longest-prefix match, all traffic for one range comes in one way and all traffic for the other range comes in the other way. You also have the aggregate out both in case of failure, to pull the rest of your traffic over.
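The specifics-plus-aggregate trick can be sketched with Python's standard `ipaddress` module. The address space (two adjacent /24s aggregating to a /23) and the helper names `advertisements` and `inbound_path` are hypothetical; the sketch only models what a remote router does with longest-prefix match once the routes are in its table.

```python
import ipaddress

# Hypothetical address space: two adjacent /24s that aggregate to one /23.
A = ipaddress.ip_network("172.16.0.0/24")
B = ipaddress.ip_network("172.16.1.0/24")
AGG = next(ipaddress.collapse_addresses([A, B]))  # 172.16.0.0/23

def advertisements(isp1_up=True, isp2_up=True):
    """What the internet sees: one specific out each ISP, the aggregate out both."""
    table = []
    if isp1_up:
        table += [(A, "ISP1"), (AGG, "ISP1")]
    if isp2_up:
        table += [(B, "ISP2"), (AGG, "ISP2")]
    return table

def inbound_path(dest, table):
    """Longest-prefix match over the advertised routes.

    A tie between the two aggregates would really be settled by normal BGP
    best-path selection; here max() simply keeps the first one it sees.
    """
    addr = ipaddress.ip_address(dest)
    matches = [(net, isp) for net, isp in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(inbound_path("172.16.0.10", advertisements()))               # ISP1 (specific /24)
print(inbound_path("172.16.1.10", advertisements()))               # ISP2 (specific /24)
print(inbound_path("172.16.1.10", advertisements(isp2_up=False)))  # ISP1 via the /23
```

The last line is the failure case from the post: once ISP2's /24 disappears, the aggregate still announced via ISP1 pulls that range's traffic over.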

7of9 · January 2013

I've worked for various companies, from small startups to large healthcare systems. In none of them were we allowed to spend money on a full internet circuit that was only for DR. If it was there, they wanted it to be, if at all possible, used for something. And yes, CIOs aren't always the most savvy people when it comes to networking, and making a business case for redundancy often takes a real outage before it sticks. Luckily in healthcare, all we had to do was pretty much mutter "patient impact" and they would pony up the money for redundant gear, but even there... we had to put our redundant internet circuit to work. It wasn't allowed to simply sit idle.

That's just been my experience in the trenches. Other companies may have it easier.

networker050184 · January 2013

I've set up BGP circuits for hundreds of customers that go unused unless there is a failure, so it's definitely not an uncommon theme. It's usually done through mass prepending. Have a look through some route servers and I'm sure you will find many examples.

lordy · January 2013

At our office we have a 300 Mbit/s and a 100 Mbit/s link. When we were talking about using both, he brought up a smart point:

One link is fiber and the other is copper. They have different latencies, which will lead to packets arriving out of order. The hosts would have to reorder them, which would actually degrade throughput.

Therefore we are only using the big link and keep the "slow" link on standby.

networker050184 · January 2013

Hopefully, if you do happen to load balance, you are doing it per flow and not per packet, so you don't have to worry about those types of issues. Most (all that I know of) modern gear is going to use per-flow by default.
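Per-flow load sharing generally works by hashing header fields that stay constant for a flow (for example source/destination address, protocol and ports) and using the hash to pick a link, so every packet of a flow takes the same path and arrives in order. The sketch below assumes a 5-tuple hash and weights the choice 3:1 to roughly match the 300/100 Mbit/s pair above; the link names and weights are hypothetical, and real platforms use their own hash functions and field selections, so this is an illustration rather than any vendor's algorithm.

```python
import hashlib

# Hypothetical link set, weighted roughly by capacity (300 vs 100 Mbit/s).
LINKS = [("fiber-300M", 3), ("copper-100M", 1)]

def pick_link(src, dst, proto, sport, dport, links=LINKS):
    """Per-flow choice: the same 5-tuple always maps to the same link."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    # Stable hash; Python's built-in hash() on strings is salted per process.
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    slot = h % sum(weight for _, weight in links)
    for name, weight in links:
        if slot < weight:
            return name
        slot -= weight

# Every packet of one TCP flow maps to the same link, so no reordering.
flow = ("10.0.0.5", "203.0.113.80", "tcp", 51515, 443)
assert len({pick_link(*flow) for _ in range(1000)}) == 1

# Across many flows, roughly three quarters land on the 300M link.
counts = {}
for port in range(40000, 48000):
    link = pick_link("10.0.0.5", "203.0.113.80", "tcp", port, 443)
    counts[link] = counts.get(link, 0) + 1
print(counts)
```

One consequence worth keeping in mind: this balances flows, not bytes, so a single large transfer still rides one link and the split in bandwidth terms can be far from 3:1.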

Eildor · January 2013

That's exactly what I was thinking. I can't actually think how per-packet load balancing would possibly work... it can't work, can it? It couldn't work inbound due to the nature of BGP.

Eildor · January 2013

Maybe this is a stupid question. I'm not good with BGP at all.

Why can't I simply send half the traffic to one ISP and the other half to the other ISP? I'd match half the network to one ISP and the other half to the other using a route-map. For inbound, I advertise two routes, say 172.16.0.0/24 and 192.168.0.0/24, out of both routers. However, R1 prepends its advertisement of 172.16.0.0/24 so that it isn't preferred, and R2 prepends its advertisement of 192.168.0.0/24 so that it isn't preferred.
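A remote network that hears both advertisements will, other things being equal, prefer the shorter AS path, which is what makes this split work for inbound traffic. The sketch assumes hypothetical AS numbers (65001 for the project network, 64500/64501 for the two ISPs), a prepend count of 3, and the private prefixes from the lab scenario; it ignores every other best-path step (for example local preference set by the remote network), which in real life can override path length.

```python
# Hypothetical AS numbers for the lab: our AS and the two ISPs.
OUR_AS, ISP1_AS, ISP2_AS = 65001, 64500, 64501
PREFIXES = ["172.16.0.0/24", "192.168.0.0/24"]

# R1 (to ISP1) prepends 172.16.0.0/24; R2 (to ISP2) prepends 192.168.0.0/24.
PREPEND = {("R1", "172.16.0.0/24"): 3, ("R2", "192.168.0.0/24"): 3}
EDGE = {"R1": ISP1_AS, "R2": ISP2_AS}

def as_path(router, prefix):
    """AS path a remote network sees for `prefix` when learned via `router`'s ISP."""
    extra = PREPEND.get((router, prefix), 0)
    return [EDGE[router]] + [OUR_AS] * (1 + extra)

def inbound_entry(prefix, routers=("R1", "R2")):
    """Shortest AS path wins; all other best-path steps are ignored here."""
    return min(routers, key=lambda r: len(as_path(r, prefix)))

for p in PREFIXES:
    print(p, "arrives via", inbound_entry(p))
# 172.16.0.0/24 arrives via R2
# 192.168.0.0/24 arrives via R1

# If R1's link fails, 192.168.0.0/24 still arrives via R2's prepended path.
print(inbound_entry("192.168.0.0/24", routers=("R2",)))  # R2
```

The prepended advertisements are what provide the backup: they are never withdrawn, just made less attractive, so they take over automatically when the preferred path disappears.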

networker050184 · January 2013

That could work if you have enough IP space to have two separate advertisements. You are going to need at least two /24s. It's one of my suggestions from above.

Eildor · January 2013

Where would you recommend configuring the route-map in a collapsed core design? I wonder how much impact it would have on network performance if you were to configure it at the distribution layer (before traffic is sent to a particular router).

How about if I were to configure the route-map on the edge routers? Would that be a better design, even though traffic might initially get sent to the wrong router?

networker050184 · January 2013

Personally, I'd do it on the core and have two separate edge routers for the two WAN connections. Modern gear shouldn't have any performance issue with a simple PBR setup.
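The PBR decision at the core boils down to a source-address match that picks an edge router, with normal routing as the fallback. On Cisco gear this corresponds roughly to a route-map using `match ip address` and `set ip next-hop`, applied to the ingress interface with `ip policy route-map`. The Python below only sketches the decision logic; the subnets, the edge names and the `next_hop` helper are hypothetical.

```python
import ipaddress

# Hypothetical campus split: each half of the network prefers one edge router.
POLICY = [
    (ipaddress.ip_network("10.10.0.0/17"), "edge-R1"),    # first half
    (ipaddress.ip_network("10.10.128.0/17"), "edge-R2"),  # second half
]

def next_hop(src, up=frozenset({"edge-R1", "edge-R2"})):
    """Pick the policy next hop for a source address, skipping edges that are down."""
    addr = ipaddress.ip_address(src)
    for net, hop in POLICY:
        if addr in net and hop in up:
            return hop
    # No usable policy match: stand-in for normal destination-based routing,
    # which here just means "any edge that is still up".
    return next(iter(sorted(up)), None)

print(next_hop("10.10.5.20"))                              # edge-R1
print(next_hop("10.10.200.7"))                             # edge-R2
print(next_hop("10.10.5.20", up=frozenset({"edge-R2"})))   # edge-R2 (failover)
```

Doing the match once at the core, as suggested, means traffic is sent straight to the right edge instead of being hauled to one edge router and then redirected to the other.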

Eildor · January 2013

OK, I was just wondering because this particular design is a collapsed core (2x 3750X) with over 1000 IP devices connected to it. I have no idea what kind of impact these things have on CPU utilisation; I guess that comes with experience.

networker050184 · January 2013

What you are concerned about is having your features implemented 'in hardware' rather than 'in software', i.e. not punted to the CPU for processing. I'm not too familiar with the 3750, but nowadays most features can be programmed into ASICs and performed at line rate. Things like tunnels, PBR, and some QoS processing are things you need to take into consideration, but it's less likely you'll find these done in software on core or distribution caliber boxes.

Eildor · January 2013

Thanks, friend. You are a wealth of knowledge.

cisco_trooper · January 2013

You might as well forget about BGP unless you are large enough to get a direct IP allocation from ARIN for a full /24. There are minimum utilization requirements of that IP space as well.

https://www.arin.net/resources/request/ipv4_initial_assign.html
https://www.arin.net/resources/request/ipv4_add_assign.html

If you can't get IP space you own, another thing to consider with dual circuits is the different IP addresses you will be using on each circuit. Do you have inbound traffic? How are you going to manage the DNS changes if you have to fail over to a redundant circuit? Are you willing to invest in global load balancers to achieve super-fast failover? Or are you willing to suffer the delays of DNS propagation in a manual switchover? There is a lot to consider, including the time to manage the BGP configuration if you are able to go that route.
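Without your own address space, inbound failover comes down to DNS: the name has to start resolving to the address on the surviving circuit, and the record's TTL bounds how long resolvers keep handing out the stale answer. A minimal sketch of the decision a DNS-based global load balancer makes, assuming hypothetical per-ISP addresses and a health-check result you already have; `Circuit` and `answer` are illustrative names, not any product's API.

```python
from dataclasses import dataclass

@dataclass
class Circuit:
    name: str
    public_ip: str   # address assigned by that ISP (hypothetical)
    healthy: bool    # result of some external health check

def answer(circuits, ttl=60):
    """Return (A-record addresses, TTL) for the healthy circuits, primary first."""
    ips = [c.public_ip for c in circuits if c.healthy]
    # If nothing passes the health check, keep serving the primary rather than nothing.
    return (ips or [circuits[0].public_ip]), ttl

circuits = [Circuit("ISP-A", "198.51.100.10", True),
            Circuit("ISP-B", "203.0.113.10", True)]
print(answer(circuits))   # (['198.51.100.10', '203.0.113.10'], 60)
circuits[0].healthy = False
print(answer(circuits))   # (['203.0.113.10'], 60)
```

A low TTL shortens the switchover but increases query load; either way, clients that cached the old answer keep using it until it expires, which is the propagation delay mentioned above.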

Eildor · January 2013

This is for my final year University project... so I can assign myself whatever IP addresses I want!

malcybood · January 2013

I've worked on networks where there is a shared service for internet breakout amongst multiple customers and the best way to do this is with BGP.

You can load balance inbound to the organisation by using AS prepend or MED within BGP. If this is a project, then you control the "ISP" configuration, so you don't have to worry about attributes being stripped out (for example, some Tier 1 ISPs don't support MED but do support AS-prepend communities).

So in short, I would recommend AS prepend or MED for inbound traffic, and for outbound you can use local preference within your own AS. Use communities if you can, but I realise you mentioned you don't know BGP that well.

The attached PDF is a great resource and really well written - I used this as a reference when designing / implementing a solution for a government organisation who wanted some web facing services to operate out of the primary DC and some lower scale / priority resources to operate out of the DR DC. This worked perfectly for my scenario.

There is another method called conditional advertisement that you can use but I've only labbed / investigated this method so can't say how reliable it is.
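Cisco's BGP conditional advertisement pairs an advertise-map with either an exist-map or a non-exist-map: the routes in the advertise-map are sent only while the monitored prefix is present (exist-map) or absent (non-exist-map) in the BGP table. The decision itself can be sketched in a few lines of Python; the prefixes and the `conditional_advertise` helper are hypothetical, and the sketch ignores how often the router re-evaluates the condition.

```python
def conditional_advertise(bgp_table, advertise, monitored, mode="non-exist"):
    """Return the prefixes to send to the backup ISP.

    mode="non-exist": advertise only while `monitored` is absent (classic backup use).
    mode="exist":     advertise only while `monitored` is present.
    """
    if mode not in ("exist", "non-exist"):
        raise ValueError(f"unknown mode: {mode}")
    present = monitored in bgp_table
    send = (not present) if mode == "non-exist" else present
    return set(advertise) if send else set()

# Hypothetical: the backup router watches a route learned from the primary ISP.
table = {"0.0.0.0/0", "198.51.100.0/24"}
ours = ["203.0.113.0/24"]
print(conditional_advertise(table, ours, "198.51.100.0/24"))  # set(): primary is fine
table.discard("198.51.100.0/24")                               # primary ISP route withdrawn
print(conditional_advertise(table, ours, "198.51.100.0/24"))  # {'203.0.113.0/24'}
```

Compared with prepending, nothing is announced via the backup ISP at all until the condition trips, so the choice of which prefix to monitor determines how reliable the failover is.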

Configuring and Verifying the BGP Conditional Advertisement Feature - Cisco Systems

I have always used BGP communities with AS prepend inbound and local preference or HSRP with interface tracking statements outbound.
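HSRP interface tracking works by lowering a router's priority when a tracked interface (here the ISP-facing uplink) goes down; the highest priority wins, with ties going to the highest IP address, so the peer takes over and outbound traffic follows it. The sketch assumes hypothetical priorities of 110/100, a decrement of 20, and preemption configured on both routers (without preemption a router will not take over from a still-running active peer just because its priority is higher).

```python
import ipaddress

def hsrp_active(routers):
    """Return the active router: highest effective priority, ties to highest IP.

    Each router is (name, ip, priority, tracked_link_up, decrement).
    Assumes preemption is enabled, so priority changes take effect immediately.
    """
    def election_key(r):
        name, ip, prio, link_up, dec = r
        effective = prio if link_up else prio - dec
        return (effective, ipaddress.ip_address(ip))
    return max(routers, key=election_key)[0]

r1 = ("R1", "10.0.0.2", 110, True, 20)   # primary, tracks the ISP-A uplink
r2 = ("R2", "10.0.0.3", 100, True, 20)   # backup, tracks the ISP-B uplink
print(hsrp_active([r1, r2]))             # R1

r1 = ("R1", "10.0.0.2", 110, False, 20)  # ISP-A uplink down: 110 - 20 = 90
print(hsrp_active([r1, r2]))             # R2
```

The decrement has to be large enough to drop the primary below the backup; a decrement that only brings it level hands the decision to the IP-address tie-break instead.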

You also have the option of a DNS-based approach, with DNS failover using Cisco GSS or F5 GTM load balancers. BGP failover is pretty decent; I think it took about 5-15 seconds to fail over when testing from a root server in Asia to the two UK data centres I was working with.

7of9 · January 2013

You may be limited in what you can do with BGP depending on your agreement with your ISP. Depending on what service you bought, we may or may not peer with you, may or may not pass certain attributes, etc. Of course, we often respond well to being offered more money.
