Just witnessed my first routing loop

W Stewart Member Posts: 794 ■■■■□□□□□□
Not that big of a deal, but it's not something I've come across at previous jobs, since I've never worked at a big data center with a complex network setup. I'm a jr sys admin, so I'm more of a server guy, but I've got a pretty decent amount of networking knowledge. I initially got a "time to live exceeded" message when pinging a few servers on a specific subnet. We pretty much figured it was due to a change that one of the networking guys made in order to fix another problem, and it got me thinking about what was causing the message. After doing a little reading I ran a traceroute and verified that it was a routing loop. Not that important, but it's nice to see what one looks like.
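
For anyone wondering why a loop shows up as "time to live exceeded", here is a minimal sketch of the mechanism. It is a toy model, not how any real router is implemented: the router names `A` and `B`, the `send` function and the next-hop tables are all made up for illustration. Each router knocks one off the TTL before forwarding, and when two routers point at each other the packet bounces until the TTL hits zero.

```python
def send(ttl, first_router, next_hop, destination):
    """Follow next-hop entries until the packet arrives or its TTL runs out.

    next_hop is a hypothetical table mapping each router to where it
    forwards traffic for this one destination.
    """
    router = first_router
    visited = []
    while True:
        visited.append(router)
        if router == destination:
            return "delivered", visited
        ttl -= 1  # every router decrements TTL before forwarding
        if ttl <= 0:
            # this router would drop the packet and send back ICMP time exceeded
            return "time to live exceeded", visited
        router = next_hop[router]


# Healthy routing: A hands off to B, B reaches the server.
healthy = {"A": "B", "B": "server"}
# Looped routing: A and B each think the other is the way to the server.
looped = {"A": "B", "B": "A"}

print(send(6, "A", healthy, "server"))
print(send(6, "A", looped, "server"))
```

In the looped case the visited list just alternates between the two routers, which is the same pattern a traceroute shows.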

Comments

  • nerdydad Member Posts: 261
    Wow, haven't seen one in production, and I've only seen it once by accident in a lab, many times when it was configured to loop. Were they redistributing between protocols? I have seen some broadcast storms when STP failed, that's pretty awesome except that it was in a production network.
  • networker050184 Mod Posts: 11,962 Mod
    I've seen them plenty of times in production. Once you get a large enough network with a lot of people making changes it's bound to happen!
    An expert is a man who has made all the mistakes which can be made.
  • nerdydad Member Posts: 261
    That is where solid process pays off. I'm not saying mistakes never happen, but by the time we make any changes on the network, those changes have been through a gauntlet of more experienced eyes looking at them. We often complain about all the process involved in even the simplest of changes, but in the end it is in the name of network stability. We were also doing very little redistribution. As we move to a different internal routing protocol I have noticed asymmetrical routing, but fortunately no loops. Being on the build side of the house, I hope to never see one in our current network, as it would mean that I or one of my coworkers caused it; otherwise it would be solved by operations.
  • Mrock4 Banned Posts: 2,359 ■■■■■■■■□□
    I have seen a fair share of loops- most were indeed a misconfiguration. It's never good to see:

    tracert 1.1.1.1
    1- 10.25.1.1
    2- 10.16.19.5
    3- 10.25.1.1
    4- 10.16.19.5
    5- 10.25.1.1

    lol...the good news is 99% of the time the issue is with that second hop (which sends the route back), so often times a look at the routing table there can go a long way.

    Congrats on your first loop!
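
If you wanted to spot that pattern in a script instead of by eye, a rough sketch could look like the following. It assumes you have already parsed the traceroute output into a plain list of hop addresses; the `find_loop` helper is hypothetical, not part of any traceroute tool. A repeated address is a strong hint of a loop, not proof of one.

```python
def find_loop(hops):
    """Return the repeating run of hop addresses, or None if no hop repeats."""
    first_seen = {}
    for index, hop in enumerate(hops):
        if hop == "*":
            continue  # a timed-out hop tells us nothing, so skip it
        if hop in first_seen:
            # everything between the first sighting and now is one trip around the loop
            return hops[first_seen[hop]:index]
        first_seen[hop] = index
    return None


# Hop list taken from the trace above.
hops = ["10.25.1.1", "10.16.19.5", "10.25.1.1", "10.16.19.5", "10.25.1.1"]
print(find_loop(hops))  # ['10.25.1.1', '10.16.19.5']
```

The routers in the returned list are the ones whose routing tables are worth checking first.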
  • VAHokie56 Member Posts: 783
    networker050184 wrote: »
    I've seen them plenty of times in production. Once you get a large enough network with a lot of people making changes it's bound to happen!

    I caused one! On my first real network gig I had a project to set up dual EPLs for a site-to-site connection. Ran them into a couple of 6524s and peered them up via BGP to the main site... long story short, I learned a lot about route tagging and route maps that night at 4am on the fly... I miss my rookie days =P
    .ιlι..ιlι.
    CISCO
    "A flute without holes, is not a flute. A donut without a hole, is a Danish" - Ty Webb
    Reading:NX-OS and Cisco Nexus Switching: Next-Generation Data Center Architectures
  • Xyro Member Posts: 623
    I have mixed sentiments about ever seeing 1. I think it would be cool, but I also have a feeling I'll be the 1 everyone expects to fix it lol.

    Congrats on the 1st loop though!
  • CodeBlox Member Posts: 1,363 ■■■■□□□□□□
    I've witnessed it out at our sites in Utah. I found that reloading one of the routers invokes a temporary EIGRP routing loop that lasts for about 5 minutes.
    Currently reading: Network Warrior, Unix Network Programming by Richard Stevens
  • Mrock4 Banned Posts: 2,359 ■■■■■■■■□□
    Seeing a loop isn't too bad- once you see it you've found the problem :)
  • W Stewart Member Posts: 794 ■■■■□□□□□□
    Networking didn't give us the details on what exactly happened. I figured it might have been misconfiguration but one of my co-workers thought that the server might have been sending spam and they intentionally looped it to cut it off from the network. I believe it was an entire subnet with at least two different customers though so that may not have been the case.
  • W Stewart Member Posts: 794 ■■■■□□□□□□
    CodeBlox wrote: »
    I've witnessed it out at our sites in Utah. I found that reloading one of the routers invokes a temporary EIGRP routing loop that lasts for about 5 minutes.


    That could have been it. When we called networking they said the issue should resolve itself eventually, so it's very possible that this is what happened. They seemed very convinced in the notes that the issue had already been resolved.
  • RouteMyPacket Member Posts: 1,104
    Pfft!

    I have seen one in production alright...nothing like losing half a building due to one. God forbid someone configure STP on a switch. lol
    Modularity and Design Simplicity:

    Think of the 2:00 a.m. test—if you were awakened in the
    middle of the night because of a network problem and had to figure out the
    traffic flows in your network while you were half asleep, could you do it?
  • phoeneous Member Posts: 2,333 ■■■■■■■□□□
    Mrock4 wrote: »
    I have seen a fair share of loops- most were indeed a misconfiguration. It's never good to see:

    tracert 1.1.1.1
    1- 10.25.1.1
    2- 10.16.19.5
    3- 10.25.1.1
    4- 10.16.19.5
    5- 10.25.1.1

    lol...the good news is 99% of the time the issue is with that second hop (which sends the route back), so often times a look at the routing table there can go a long way.

    Congrats on your first loop!

    This just happened to me last week! Misconfig in a GRE tunnel. Fun times!
  • networker050184 Mod Posts: 11,962 Mod
    nerdydad wrote: »
    That is where solid process pays off. I'm not saying mistakes never happen, but by the time we make any changes on the network, those changes have been through a gauntlet of more experienced eyes looking at them. We often complain about all the process involved in even the simplest of changes, but in the end it is in the name of network stability. We were also doing very little redistribution. As we move to a different internal routing protocol I have noticed asymmetrical routing, but fortunately no loops. Being on the build side of the house, I hope to never see one in our current network, as it would mean that I or one of my coworkers caused it; otherwise it would be solved by operations.

    I understand the whole change control process, but it can't be perfect. If I'm reviewing one change and another guy is reviewing another, we don't know about each other's change, and the two could end up causing an issue if both are done. Can't have one person see everything. The change control process usually takes weeks at a time to get written, reviewed and then completed. Other things could have changed in that time, where the change would have been flawless if not for some traffic reroute due to a bad circuit etc. Things happen!
    Mrock4 wrote: »
    Seeing a loop isn't too bad- once you see it you've found the problem :)

    Yeah, not seeing the loop is when you get in trouble!
    An expert is a man who has made all the mistakes which can be made.
  • ShamPOW Member Posts: 6 ■□□□□□□□□□
    I saw a really interesting one just last week. A bit of setup: I work for an ISP. This customer was a remote site for a VERY large chain of stores. We had assigned a /30 block to them, binding their username/WAN interface to the first useable IP of that subnet.

    The issue arose because it looks like their equipment was configured to expect the SECOND useable IP on that WAN interface.

    Here's where it got tricky, since I was actually talking to an offsite 3rd party IT group who had no real idea what was going on out there as far as equipment and configuration. Their WAN interface was pulling the first useable IP. Traceroutes to the second useable would route to that IP, bounce around a couple of hops on a private 10.x.x.x network, hit an IP that belonged to a /16 block owned by this chain of stores, go through Time Warner and Level 3, then BACK to my ISP, and BACK to the site ad infinitum (or 30 hops). I suspect that /16 was their VPN to the corporate office, but I haven't covered VPNs all that much yet so I could be completely wrong there.

    I saved a screenshot of the traceroute for posterity.
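
For anyone fuzzy on what "first useable" and "second useable" mean in a /30, here is a quick sketch using Python's standard `ipaddress` module. The 192.0.2.0/30 block is a documentation prefix picked purely as an assumption; the real block in the story above isn't given.

```python
import ipaddress

# Assumed example block; the customer's real /30 is not known.
block = ipaddress.ip_network("192.0.2.0/30")

# hosts() leaves out the network and broadcast addresses,
# so a /30 has exactly two useable IPs.
first_usable, second_usable = block.hosts()

print("network:       ", block.network_address)
print("first useable: ", first_usable)
print("second useable:", second_usable)
print("broadcast:     ", block.broadcast_address)
```

With only two useable addresses in the block, one end configured for the wrong one is enough to send traffic for the other address off somewhere unexpected.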
  • nerdydad Member Posts: 261
    networker050184 wrote: »
    I understand the whole change control process, but it can't be perfect. If I'm reviewing one change and another guy is reviewing another, we don't know about each other's change, and the two could end up causing an issue if both are done. Can't have one person see everything. The change control process usually takes weeks at a time to get written, reviewed and then completed. Other things could have changed in that time, where the change would have been flawless if not for some traffic reroute due to a bad circuit etc. Things happen!

    Absolutely. Our process involves multiple types of calls depending on the type of change. A dedicated team within operations reviews everything, and when they look at the devices you are making changes on, they can see every other change that has been associated with that device. It's not foolproof and stuff happens, but I have been surprised by some of the things they have caught. I mean, in the end a routing loop is easily discovered, and once it is discovered, it is usually easily mitigated unless you have really crazy amounts of redistribution going on.