IP SLA implementation

_Gonzalo_ Posts: 113 Member
We have recently had problems at work with HSRP and IP SLAs. We connect through redundant links that we do not control, and one of them has been dropping packets intermittently. This caused the active HSRP gateway to go to standby, and after a short time it would recover again.

There is a single IP SLA configured, and when the network dropped these packets, it alone was responsible for bringing the active HSRP switch down.

I was wondering whether it would be better to have two IP SLAs with longer, staggered timers, so that small packet losses would be less of an issue.

An example:

Current: one SLA that pings every 10 seconds and decreases the priority by 20.
Proposed: two SLAs that each ping every 10 seconds and decrease the priority by 10. One would start 5 seconds after the other.
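
As a rough illustration of the difference between the two designs, here is a small Python sketch. The priorities (115 for the active router, 100 for the standby) are hypothetical, since the post does not give the real ones, and the model assumes the standby takes over as soon as its priority is strictly higher.

```python
# Hypothetical priorities; the post does not state the real values.
ACTIVE_PRIORITY = 115
STANDBY_PRIORITY = 100


def fails_over(decrements_down):
    """Return True if the active router's priority drops below the standby's.

    decrements_down: decrements of the tracked objects that are currently down.
    """
    return ACTIVE_PRIORITY - sum(decrements_down) < STANDBY_PRIORITY


# Current design: one SLA with decrement 20, so one failed probe is enough.
print(fails_over([20]))      # True
# Proposed design: two SLAs with decrement 10 each.
print(fails_over([10]))      # False, a single failed probe is tolerated
print(fails_over([10, 10]))  # True, both probes have to fail
```

Whether one decrement of 10 is tolerated depends entirely on the gap between the two priorities, so the numbers need to be chosen with that in mind.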

The only downside would be having another process running on the router, but I don't think a process that sends a ping can be that costly...

So... Does it make sense? Have you seen it done somewhere?

Thanks in advance!

(I edited the timers)

Comments

  • sandman748 Posts: 104 Member
    If it's just a short period of packet loss (less than three minutes), you can delay the object going down on the track statement.

    sample config

    track x ip sla y
     delay down 90
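
To show what the delay buys you, here is a minimal Python model of a tracked object sampled once per second. It is only a sketch of the idea, not of how IOS implements it; the function name `apply_delay` and the one-second sampling are assumptions made for illustration.

```python
def apply_delay(raw_states, delay_down, delay_up=0):
    """Model a tracked object sampled once per second.

    raw_states: iterable of booleans (True = the probe result is up).
    The reported state only changes after the raw state has disagreed
    with it for delay_down (or delay_up) consecutive seconds.
    """
    reported = True  # assume the object starts in the up state
    pending = 0      # seconds the raw state has disagreed with the reported one
    out = []
    for raw in raw_states:
        if raw == reported:
            pending = 0
        else:
            pending += 1
            needed = delay_up if raw else delay_down
            if pending >= needed:
                reported = raw
                pending = 0
        out.append(reported)
    return out


# A 30-second blip never reaches the 90-second delay, so nothing is reported.
blip = [False] * 30 + [True] * 10
print(all(apply_delay(blip, delay_down=90)))        # True

# A 100-second outage outlasts the delay, so the object is reported down.
outage = [False] * 100
print(apply_delay(outage, delay_down=90)[-1])       # False
```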


    If you want a second SLA to another site in case the object you are tracking is flaky, you can also track a boolean list, so that both sites have to stop responding to ping before HSRP fails over.

    example

    track 1 ip sla 1
    track 2 ip sla 2
    track 3 list boolean or
     object 1
     object 2

    We use a combination of both for our dual ISP wan connection. So on track 3 we also have

    delay down 90 up 180

    The end result is that both responders have to be down for 90 seconds before the HSRP state changes, and back up again for 3 minutes before it changes back.

    Obviously those timers can be tweaked to your liking. It doesn't have to be that long before going down or back up.
    Working on CCIE Collaboration:
    Written Exam Completed June 2015 ~ 100 hrs of study
    Lab Exam Scheduled for Dec 2015
  • _Gonzalo_ Posts: 113 Member
    Thanks sandman748!

    That boolean list is definitely something I'll use. In fact, I checked in detail today and discovered that the whole HSRP tracking configuration is a mess, so I'm redoing it tomorrow. I'll post it when it's done.

    The timers will still have to be 10, but I think I have it almost figured out. By the way, I think you meant to type "and" instead of "or". :)
  • sandman748 Posts: 104 Member
    I definitely meant to say OR not AND.

    The logic is if track 1 OR track 2 = UP then track 3 = UP

    I want both sites to be down before I flip the switch.

    If you use AND, one site going down will cause the list to be down.

    Depends on what behavior you are looking for.
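
The difference between the two list types can be spelled out as a truth table. The short Python sketch below only mirrors the logic described above; `list_or` and `list_and` are illustrative names, not IOS commands.

```python
from itertools import product


def list_or(*objects):
    """Boolean OR list: up while at least one tracked object is up."""
    return any(objects)


def list_and(*objects):
    """Boolean AND list: up only while every tracked object is up."""
    return all(objects)


def label(state):
    return "UP" if state else "DOWN"


for t1, t2 in product([True, False], repeat=2):
    print(f"track 1 {label(t1):4}  track 2 {label(t2):4}  "
          f"or: {label(list_or(t1, t2)):4}  and: {label(list_and(t1, t2))}")
```

With OR, the list only goes down in the last row, when both objects are down. With AND, it goes down in every row except the first.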
  • _Gonzalo_ Posts: 113 Member
    sandman748 wrote: »
    I definitely meant to say OR not AND

    That you do! Logic told me "AND", but I looked it up and realized that you were right.

    Thanks again! It was really useful. I do not have the config at hand, but I'll try to post it tonight.
  • _Gonzalo_ Posts: 113 Member
    Sorry for the delay... The config should end up like this:


    On both


    ip sla monitor 101
     type echo protocol ipIcmpEcho 10.X.X.17
     frequency 10
    ip sla monitor schedule 101 life forever start-time X
    ip sla monitor 102
     type echo protocol ipIcmpEcho 10.X.X.18
     frequency 10
    ip sla monitor schedule 102 life forever start-time X+2 SECONDS
    ip sla monitor 103
     type echo protocol ipIcmpEcho 10.X.X.17
     frequency 10
    ip sla monitor schedule 103 life forever start-time X+5 SECONDS
    ip sla monitor 104
     type echo protocol ipIcmpEcho 10.X.X.18
     frequency 10
    ip sla monitor schedule 104 life forever start-time X+7/8 SECONDS


    and


    track 101 rtr 101
    track 102 rtr 102
    track 103 rtr 103
    track 104 rtr 104
    track 100 list boolean or
     object 101
     object 102
     object 103
     object 104
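
To get a feel for how the staggered start times interact with the OR list, here is a rough Python simulation. It assumes offsets of 0, 2, 5 and 7 seconds (the config leaves the last one as 7/8), that a probe fails exactly when it is sent during the loss burst, and that each track object simply reflects its last probe result. Timeouts and thresholds are ignored, and `list_goes_down` is a made-up helper name.

```python
FREQUENCY = 10                               # seconds between probes, as in the config
OFFSETS = {101: 0, 102: 2, 103: 5, 104: 7}   # assumed start offsets in seconds


def list_goes_down(burst_start, burst_len, horizon=60):
    """Return True if track 100 (boolean OR) would go down during the burst.

    The burst covers the seconds burst_start <= t < burst_start + burst_len.
    """
    last_ok = {sla: True for sla in OFFSETS}
    for t in range(horizon):
        for sla, offset in OFFSETS.items():
            if t >= offset and (t - offset) % FREQUENCY == 0:
                in_burst = burst_start <= t < burst_start + burst_len
                last_ok[sla] = not in_burst
        if not any(last_ok.values()):        # OR list: down only if all are down
            return True
    return False


print(list_goes_down(10, 5))   # False: SLA 103 and 104 still succeed
print(list_goes_down(10, 8))   # True: all four probes fall inside the burst
print(list_goes_down(11, 8))   # False: same length, but SLA 101 probes just before it
```

Under these assumptions a loss burst has to span one probe from each of the four SLAs before HSRP reacts, so short bursts are ignored and the exact threshold depends on how the burst lines up with the probe schedule.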




    On sw01a



    interface FastEthernet0/1
     ip address 2.X.X.91 255.255.255.0
     standby 99 ip 2.X.X.90
     standby 99 priority 129
     standby 99 preempt
     standby 99 preempt delay minimum 1
     standby 99 name TRACKHSRP1
     standby 99 track Tunnel0
     standby 99 track Tunnel1
     standby 99 track 100 decrement 20



    On sw01b



    interface FastEthernet0/1
     standby 99 ip 2.X.X.90
     standby 99 priority 110
     standby 99 preempt delay minimum 1
     standby 99 name TRACKHSRP1
     standby 99 track Tunnel10
     standby 99 track Tunnel11
     standby 99 track 100 decrement 20


    The other side of the tunnels would just react to tunnel failure like this:


    sw03a:
     standby 99 priority 109
     standby 99 track Tunnel0 decrement 12
     standby 99 track Tunnel10 decrement 8

    sw04a:
     standby 99 priority 100
     standby 99 track Tunnel1 decrement 10
     standby 99 track Tunnel11 decrement 10

    (I edited the priorities and decrements.)
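
The priority arithmetic in this config can be checked with a few lines of Python. This is only a sketch: it uses the priorities and decrements posted above, takes 10 as the decrement where none is given (the HSRP default), and assumes that the peer's own tracked objects all stay up. `effective_priority` is an illustrative helper, not anything from IOS.

```python
DEFAULT_DECREMENT = 10  # HSRP default when "decrement" is omitted

ROUTERS = {
    "sw01a": {"priority": 129,
              "track": {"Tunnel0": DEFAULT_DECREMENT,
                        "Tunnel1": DEFAULT_DECREMENT,
                        "100": 20}},
    "sw01b": {"priority": 110,
              "track": {"Tunnel10": DEFAULT_DECREMENT,
                        "Tunnel11": DEFAULT_DECREMENT,
                        "100": 20}},
    "sw03a": {"priority": 109, "track": {"Tunnel0": 12, "Tunnel10": 8}},
    "sw04a": {"priority": 100, "track": {"Tunnel1": 10, "Tunnel11": 10}},
}


def effective_priority(name, down=()):
    """Configured priority minus the decrements of the objects that are down."""
    router = ROUTERS[name]
    lost = sum(dec for obj, dec in router["track"].items() if obj in down)
    return router["priority"] - lost


# sw01a versus sw01b (priority 110 with everything up)
print(effective_priority("sw01a", {"Tunnel0"}))             # 119, stays active
print(effective_priority("sw01a", {"100"}))                 # 109, sw01b takes over
print(effective_priority("sw01a", {"Tunnel0", "Tunnel1"}))  # 109, sw01b takes over

# sw03a versus sw04a (priority 100 with everything up)
print(effective_priority("sw03a", {"Tunnel0"}))             # 97, sw04a takes over
print(effective_priority("sw03a", {"Tunnel10"}))            # 101, stays active
```

Note that track 100 is configured on both sw01a and sw01b, so if the list goes down on both at once they drop to 109 and 90 and sw01a remains active.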
  • d4nz1g Posts: 464 Member
    Let me see if I understand: You have one active router for 2 sites?
    If so, they are converging due to hello packet loss, and you would have a split brain scenario.

    Keep in mind that this design is not recommended at all; the right one would be one active/standby pair per site (although we run this where I work, haha).
  • _Gonzalo_ Posts: 113 Member
    Hey!

    The first two are actually at two different sites, but share two HSRP instances. sw03 and sw04 are at a separate site and share two other instances. There are even more HSRP instances running on the network, and underneath there are a lot of L2 zones that we do not manage. Also, some traffic is routed out of the tunnel, through a couple more routers and a firewall (per path) that we also manage.

    I limited it to one portion to simplify, as the other factors were not affecting this particular case, but basically that's it.

    The instance number being the same is just down to me being tired, hehehe (but it would not matter, as they share L2 links in pairs: s1-s2, s3-s4).