IP SLA implementation
We have recently had problems at work with HSRP and IP SLAs. We connect through redundant links that we do not control, and one of them has been dropping packets intermittently. This caused the active HSRP gateway to go to standby, and after a short time it would recover again.
There is a single IP SLA configured, so whenever the network dropped these packets, that one probe alone was responsible for bringing the active HSRP switch down.
I was wondering whether it would be better to have two IP SLAs with increased, synchronized timers, so that small packet losses would not be that much of an issue.
An example:
Current: one SLA that pings every 10 seconds and decrements the priority by 20.
Proposed: two SLAs that each ping every 10 seconds and decrement the priority by 10. One would start 5 seconds after the other.
The only downside would be having another process running on the router, but I don't think a process that sends a ping can be that costly...
So... Does it make sense? Have you seen it done somewhere?
Thanks in advance!
(I edited the timers)
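To make the comparison concrete, here is a minimal Python sketch of the two layouts from the example. It is only a model, not router behavior: it assumes a tracked probe is considered down as soon as its most recent ping is lost (no delay configured), and the names `total_decrement`, `SINGLE` and `STAGGERED` are made up for illustration.

```python
# Each probe is (start_second, frequency_seconds, priority_decrement).
SINGLE = [(0, 10, 20)]                  # one SLA, decrement 20
STAGGERED = [(0, 10, 10), (5, 10, 10)]  # two SLAs, decrement 10 each, offset by 5 s


def total_decrement(probes, loss_windows, now):
    """Sum the decrements of all probes whose latest ping fell in a loss window.

    loss_windows is a list of (begin, end) second ranges, end exclusive,
    during which every ping is assumed to be dropped.
    """
    total = 0
    for start, frequency, decrement in probes:
        if now < start:
            continue  # this probe has not sent anything yet
        last_ping = start + ((now - start) // frequency) * frequency
        if any(begin <= last_ping < end for begin, end in loss_windows):
            total += decrement
    return total
```

With a 3-second loss burst such as `[(9, 12)]`, the single probe loses its ping at t=10 and applies the full decrement of 20, while the staggered pair only applies 10 because the second probe pings at t=5 and t=15. A longer outage such as `[(9, 30)]` still ends up at the full 20 in both layouts.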
Comments
-
sandman748 Member Posts: 104
If it's just a small amount of packet loss (less than three minutes), you can delay the object going down on the track statement.
Sample config:
track x ip sla y
 delay down 90
If you want a second SLA to another site in case the object you are tracking is flaky, you can also track a boolean list, so that both sites have to stop responding to ping before HSRP fails over.
Example:
track 1 ip sla 1
track 2 ip sla 2
track 3 list boolean or
 object 1
 object 2
We use a combination of both for our dual-ISP WAN connection. So on track 3 we also have
 delay down 90 up 180
The end result is that both responders have to be down for 90 seconds before the HSRP state changes, and back up again for 3 minutes before it changes back.
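The combined effect of the boolean OR list and the `delay down 90 up 180` timers can be sketched in Python. This is a simplified model under stated assumptions, not how IOS implements tracking: probes are sampled every 10 seconds, a pending state change is cancelled as soon as the raw state flips back, and `DelayedTrack` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedTrack:
    """Tracked object whose reported state lags the raw state by a delay."""
    delay_down: int                      # seconds raw state must stay down
    delay_up: int                        # seconds raw state must stay up
    reported_up: bool = True
    pending_since: Optional[int] = None  # when the raw state started to differ

    def update(self, now: int, raw_up: bool) -> bool:
        if raw_up == self.reported_up:
            self.pending_since = None    # raw state agrees again: cancel change
            return self.reported_up
        if self.pending_since is None:
            self.pending_since = now
        delay = self.delay_up if raw_up else self.delay_down
        if now - self.pending_since >= delay:
            self.reported_up = raw_up
            self.pending_since = None
        return self.reported_up


def simulate(probe1, probe2, delay_down=90, delay_up=180, step=10):
    """Return the reported list state after each probe interval."""
    track = DelayedTrack(delay_down, delay_up)
    states = []
    for i, (p1, p2) in enumerate(zip(probe1, probe2)):
        raw_up = p1 or p2  # boolean OR: up while at least one probe answers
        states.append(track.update(i * step, raw_up))
    return states
```

In this model a 30-second loss on both probes never reaches HSRP, and a long outage on only one probe does not either. Only when both probes fail for the full 90 seconds does the list report down.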
Obviously those timers can be tweaked to your liking. It doesn't have to be that long before going down or back up.
Working on CCIE Collaboration:
Written Exam Completed June 2015 ~ 100 hrs of study
Lab Exam Scheduled for Dec 2015
-
_Gonzalo_ Member Posts: 113
Thanks sandman748!
That boolean list is definitely something I'll use. In fact, I checked in detail today and discovered that the whole HSRP tracking configuration is a mess, so I'm redoing it tomorrow. I'll post it when it's done.
The timers will still have to be 10, but I think I have it almost clear. By the way, I think you meant to type "and" instead of "or".
-
sandman748 Member Posts: 104
I definitely meant to say OR, not AND.
The logic is if track 1 OR track 2 = UP then track 3 = UP
I want both sites to be down before I flip the switch.
If you use AND, one site going down will cause the list to be down.
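The difference between the two list types comes down to a truth table, sketched here in Python. The function name `list_state` is made up for illustration, and it only models the up/down logic described above, not any timers.

```python
def list_state(operator, object_states):
    """Return True (up) or False (down) for a boolean track list.

    operator is "or" or "and"; object_states holds the up/down state
    of each tracked object in the list.
    """
    if operator == "or":
        return any(object_states)  # up while at least one object is up
    if operator == "and":
        return all(object_states)  # up only while every object is up
    raise ValueError(f"unknown operator: {operator}")
```

With one site down and one up, `list_state("or", [False, True])` stays up, while `list_state("and", [False, True])` goes down.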
Depends on what behavior you are looking for.
-
_Gonzalo_ Member Posts: 113
sandman748 wrote: » I definitely meant to say OR not AND
That you did! Logic told me "AND", but I looked it up and realized that you were right.
Thanks again! It was really useful. I do not have the config at hand, but I'll try to post it tonight.
-
_Gonzalo_ Member Posts: 113
Sorry for the delay... The config should end up like this:
On both:
ip sla monitor 101
 type echo protocol ipIcmpEcho 10.X.X.17
 frequency 10
ip sla monitor schedule 101 life forever start-time X
ip sla monitor 102
 type echo protocol ipIcmpEcho 10.X.X.18
 frequency 10
ip sla monitor schedule 102 life forever start-time X+2 SECONDS
ip sla monitor 103
 type echo protocol ipIcmpEcho 10.X.X.17
 frequency 10
ip sla monitor schedule 103 life forever start-time X+5 SECONDS
ip sla monitor 104
 type echo protocol ipIcmpEcho 10.X.X.18
 frequency 10
ip sla monitor schedule 104 life forever start-time X+7/8 SECONDS
and
track 101 rtr 101
track 102 rtr 102
track 103 rtr 103
track 104 rtr 104
track 100 list boolean or
 object 101
 object 102
 object 103
 object 104
On sw01a:
interface FastEthernet0/1
 ip address 2.X.X.91 255.255.255.0
 standby 99 ip 2.X.X.90
 standby 99 priority 129
 standby 99 preempt
 standby 99 preempt delay minimum 1
 standby 99 name TRACKHSRP1
 standby 99 track Tunnel0
 standby 99 track Tunnel1
 standby 99 track 100 decrement 20
On sw01b:
interface FastEthernet0/1
 standby 99 ip 2.X.X.90
 standby 99 priority 110
 standby 99 preempt delay minimum 1
 standby 99 name TRACKHSRP1
 standby 99 track Tunnel10
 standby 99 track Tunnel11
 standby 99 track 100 decrement 20
The other side of the tunnels would just react to tunnel failure like this:
sw03a:
 standby 99 priority 109
 standby 99 track Tunnel0 decrement 12
 standby 99 track Tunnel10 decrement 8
sw04a:
 standby 99 priority 100
 standby 99 track Tunnel1 decrement 10
 standby 99 track Tunnel11 decrement 10
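The priority arithmetic in this config can be checked with a small Python sketch. It assumes preemption is in effect on each pair, uses HSRP's default decrement of 10 for the interface tracks that give no value, and does not model tie-breaking or the preempt delay. The names `ROUTERS`, `effective_priority` and `active_router` are made up for illustration.

```python
DEFAULT_DECREMENT = 10  # HSRP decrement when `standby track` gives no value

# router: (base priority, {tracked object: decrement}), taken from the config
ROUTERS = {
    "sw01a": (129, {"Tunnel0": DEFAULT_DECREMENT,
                    "Tunnel1": DEFAULT_DECREMENT,
                    "track100": 20}),
    "sw01b": (110, {"Tunnel10": DEFAULT_DECREMENT,
                    "Tunnel11": DEFAULT_DECREMENT,
                    "track100": 20}),
    "sw03a": (109, {"Tunnel0": 12, "Tunnel10": 8}),
    "sw04a": (100, {"Tunnel1": 10, "Tunnel11": 10}),
}


def effective_priority(router, down):
    """Base priority minus the decrement of every tracked object in `down`."""
    base, tracks = ROUTERS[router]
    return base - sum(dec for obj, dec in tracks.items() if obj in down)


def active_router(pair, down_by_router):
    """Return (winner, priorities) for an HSRP pair; winner is "tie" if equal."""
    priorities = {
        router: effective_priority(router, down_by_router.get(router, set()))
        for router in pair
    }
    best = max(priorities.values())
    winners = [router for router in pair if priorities[router] == best]
    return (winners[0] if len(winners) == 1 else "tie"), priorities
```

For example, `active_router(("sw01a", "sw01b"), {"sw01a": {"track100"}})` gives sw01b the active role at 109 against 110. If track 100 goes down on both switches at once, the priorities become 109 and 90 and sw01a stays active.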
(I edited the priorities and decrements.)
-
d4nz1g Member Posts: 464
Let me see if I understand: you have one active router for 2 sites?
If so, they are converging due to hello packet loss, and you would have a split-brain scenario.
Keep in mind that this design is not recommended at all; the right one would be one active/standby pair per site (although we run this where I work, haha).
-
_Gonzalo_ Member Posts: 113
Hey!
The first two are actually in two different sites, but share two HSRP instances. sw03 and sw04 are on a separate site and share two other instances. There are even more HSRP instances running on the network, and underneath there are a lot of L2 zones that we do not manage. Also, some traffic is routed out of the tunnel, through a couple more routers and a firewall (per path) that we also manage.
I limited the example to one portion to keep it simple, as the other factors were not affecting this particular case, but basically that's it.
The fact that the instance number is the same is just due to being tired, hehehe (it would not matter, though, as they share L2 links in pairs: s1-s2, s3-s4).