Help -- IP SLA Configuration to Monitor a Remote Peer

Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
So I'll try to keep this brief. A while back at work a backup IPSEC tunnel went down at one of our remote sites and we didn't know about it. We use the same design at all of our remote sites. These are Palo-to-Palo IPSEC tunnels, and once one is configured the Palo never shows a down state at the interface level. We also use Solarwinds, which does not have a lot of Palo integration yet. So these tunnels could go down at any time without us knowing about it. I initially thought "hey, I can configure a simple IP SLA to run an icmp-echo between the tunnel end-points". Well... turns out this is not terribly reliable. It works, but I get a lot of noise. These IPSEC tunnels run over the public internet, so many times (mostly in less developed regions of the world) my SLAs go down and come back up, sometimes less than a minute apart. That causes a lot of noise in alarming. (I had to create a tracking object to monitor the icmp-echo failures and tie it to an EEM event that generates an SNMP trap, just so I could get some form of an alarm.)

Anywho, now I am trying to tune this. I have come to the conclusion that I don't care if a failure occurs and recovers within 10 minutes. Like I mentioned, most of these have been fail / recover events lasting under 1-2 minutes. I have a lab where I am testing my tuning, but I can't find anything inside the IP SLA config that lets me adjust these parameters the way I want. Solarwinds is not a great help here either, as its SNMP trap engine does not have a lot of tuning options.

I am thinking maybe I can instead create some non-routable GRE tunnels that I can use just as a means to monitor the connection path and tune those individually with long keepalives.

Ideas?
Currently Studying: IE Stuff...kinda...for now...
My ultimate career goal: To climb to the top of the computer network industry food chain.
"Winning means you're willing to go longer, work harder, and give more than anyone else." - Vince Lombardi

Comments

  • Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
    EDIT:

    I forgot to mention that the IP SLAs are run from the downstream switch (below the Palo where the actual IPSEC tunnel terminates) to the far-end Palo IPSEC end-point. Also, never mind the GRE tunnel idea. I would effectively create the same problem with a tunnel and long keepalive timers.

    What I need to consider is whether there is a way to code EEM to check the state every minute, start a timer on a down event, and only take action on an event that remains down beyond 10 minutes.

    This would all be so much easier if I could just plug these same threshold requirements into Solarwinds Trap Manager... but it's not nearly that sophisticated. At least from what I can tell so far...
  • d4nz1g Member Posts: 464
    hey daniel,

    you can tune your ip sla to generate a trap when a certain threshold is crossed (for example, 10 failed operations).

    never tested it, but it looks like it is capable of doing this.

    also, you can monitor the IP SLA operation OID from an EEM applet and customize the reactions you would like to implement.

    edit: forgot to add the link lol

    https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipsla/configuration/15-mt/sla-15-mt-book/sla_threshold_mon-0.html
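
    A minimal, untested sketch of the threshold reaction described above. It makes a few assumptions that are not from the thread: operation number 1, a placeholder target of 1.1.1.1, a probe frequency of 60 seconds (so ten consecutive timeouts is roughly ten minutes), and an SNMP trap receiver that is already configured. The commands that enable IP SLA traps toward the receiver differ between IOS releases and are left out here.

    ```
    ! Hypothetical sketch: send a trap only after 10 consecutive probe timeouts
    ip sla 1
     icmp-echo 1.1.1.1
     frequency 60
    ip sla schedule 1 life forever start-time now
    !
    ip sla reaction-configuration 1 react timeout threshold-type consecutive 10 action-type trapOnly
    ! Enable logging messages for IP SLA trap notifications
    ip sla logging traps
    ```

    The point of the consecutive threshold is that a single successful probe resets the count, so short fail / recover blips never reach the trap.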
  • Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
    So I was WAY overthinking this trying to do something inside EEM. I accomplished what I wanted by adding delay timers to the tracking object for the IP SLA. The config tolerates some ping loss over the Internet: the probe runs every 30 seconds, and it has to keep failing for 180 seconds before my connection is finally marked down. Here is the config I used to accomplish it if anybody is curious:

    The only things I had to change were the delay timers on the tracking object (which I didn't have before), the SLA frequency, which I set to 30 seconds, and the statistics history, which I extended to 24 hours.

    track 1 ip sla 1 reachability
     delay down 180 up 90
    !
    ip sla 1
     icmp-echo 1.1.1.1
     frequency 30
     history hours-of-statistics-kept 24
    ip sla schedule 1 life forever start-time now
    !
    !
    event manager applet TRACK_VPN_DOWN
     event syslog pattern "ip sla 1 reachability Up -> Down"
     action 1.0 syslog msg "Internet VPN Connection is Down"
     action 2.0 snmp-trap strdata "Internet VPN Connection is Down"
     action 3.0 exit
    !
    event manager applet TRACK_VPN_UP
     event syslog pattern "ip sla 1 reachability Down -> Up"
     action 1.0 syslog msg "Internet VPN Connection is Up"
     action 2.0 snmp-trap strdata "Internet VPN Connection is Up"
     action 3.0 exit

    I then set up an alert on my Solarwinds Orion server so that receiving the snmp-trap sends me an email to let me know the connection is down.
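
    For anyone trying the same thing, a short sketch of show commands that can help confirm the track object, the SLA operation, and the EEM applets are all in place. The numbers match the config above. Output is omitted since it varies by platform and release.

    ```
    show track 1
    show ip sla statistics 1
    show event manager policy registered
    ```

    show track 1 should list the configured delay values alongside the current reachability state, which is a quick way to confirm the timers took effect.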