Solving a network/ethernet loop

Dr_Atomic Member Posts: 184
If you're experiencing an ethernet loop on a switch(es), what's the troubleshooting process for this? I remember this in my CCNA studies, but it doesn't come to me. I'm researching it, also, in the meantime. Thanks.

Comments

  • /usr Member Posts: 1,768
    I suppose the first troubleshooting step to verify a loop would be capturing traffic. You could make a reasonable guess from which segments are having throughput problems, from user feedback, or from abnormal activity LEDs, but the only sure way I know of is to capture duplicate frames.

    The first plan of action is to break the loop, or manually disable all redundant network paths, then re-enable them one at a time, hopefully finding the cause of the loop in the process.

    Other people might chime in with other, more valuable input.
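
    A rough sketch of the "capture duplicate frames" idea, in Python. It assumes the frames have already been captured and exported as raw bytes; the find_duplicate_frames helper, the min_copies threshold and the sample data are all made up for illustration and are not part of any capture tool. Some protocols legitimately repeat identical frames (periodic ARP requests, for example), so the threshold is a judgment call.

```python
import hashlib
from collections import Counter


def find_duplicate_frames(frames, min_copies=3):
    """Return {digest: count} for frames seen at least min_copies times.

    frames is an iterable of raw frame bytes (hypothetical input; how you
    export them from your capture tool is up to you).
    """
    counts = Counter(hashlib.sha256(frame).hexdigest() for frame in frames)
    return {digest: n for digest, n in counts.items() if n >= min_copies}


if __name__ == "__main__":
    # Made-up sample: one broadcast frame seen five times, one frame seen once.
    looped = bytes.fromhex("ffffffffffff001122334455") + b"\x08\x06" + b"arp-payload"
    normal = bytes.fromhex("00aabbccddee001122334455") + b"\x08\x00" + b"ip-payload"
    capture = [looped, normal, looped, looped, looped, looped]
    for digest, n in find_duplicate_frames(capture).items():
        print(f"{n} copies of frame {digest[:12]}")
```

    In a real loop the same frame tends to show up hundreds or thousands of times within a second, so even a crude count like this should stand out.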
  • tiersten Member Posts: 4,505
    Why don't you have STP enabled? :P
  • chrisone Member Posts: 2,278
    Although there are many methods, and I'll admit I don't know them all, I would suggest checking spanning tree and the root bridges, and seeing whether any ports are flapping between STP states. Start narrowing it down from there. Also check the MAC address tables; you probably shouldn't be seeing a bunch of MACs, or a root port, on a port where a computer/host should be connected.

    Rule of thumb is:

    Use RSTP
    Use PortFast (only on host ports, not uplinks)
    Configure BPDU Guard globally
    Configure Root Guard (per interface; it has no global setting)

    I can't think of anything else at the moment. Maybe someone can chime in with some tips.
    Certs: CISSP, EnCE, OSCP, CRTP, eCTHPv2, eCPPT, eCIR, LFCS, CEH, SPLK-1002, SC-200, SC-300, AZ-900, AZ-500, VHL:Advanced+
    2023 Cert Goals: SC-100, eCPTX
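
    As a hedged illustration of the rule of thumb above, the Python sketch below scans a saved running-config for the matching spanning-tree lines. The audit_stp_config helper and the sample config are hypothetical. The patterns follow classic Catalyst IOS syntax and also accept the "edge" keyword that some newer releases add; check the exact spelling on your own platform. Root Guard is matched as an interface-level command (spanning-tree guard root).

```python
import re

# Classic Catalyst IOS spellings; some newer releases insert "edge",
# e.g. "spanning-tree portfast edge bpduguard default". Verify on your platform.
CHECKS = {
    "rapid spanning tree": r"^spanning-tree mode rapid-pvst[ \t]*$",
    "global BPDU guard": r"^spanning-tree portfast (edge )?bpduguard default[ \t]*$",
    "portfast on at least one port": r"^ spanning-tree portfast( edge)?[ \t]*$",
    "root guard on at least one port": r"^ spanning-tree guard root[ \t]*$",
}


def audit_stp_config(config_text):
    """Return {check name: True/False} for a running-config given as text."""
    return {
        name: re.search(pattern, config_text, re.MULTILINE) is not None
        for name, pattern in CHECKS.items()
    }


if __name__ == "__main__":
    # Made-up config fragment for illustration only.
    sample = (
        "spanning-tree mode rapid-pvst\n"
        "spanning-tree portfast bpduguard default\n"
        "!\n"
        "interface FastEthernet0/1\n"
        " switchport mode access\n"
        " spanning-tree portfast\n"
        "!\n"
        "interface GigabitEthernet0/1\n"
        " switchport mode trunk\n"
    )
    for name, present in audit_stp_config(sample).items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

    This only tells you whether a line is present somewhere in the config, not whether it is on the right ports, so treat it as a first pass rather than a full audit.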
  • Netwurk Member Posts: 1,155
    I once caused a huge loop by bringing down a port channel without first doing a shut on all the ports. Luckily I did this at home and not at work.

    Every light on every port in my lab started blinking like crazy. Luckily I knew what caused it and did a quick no shut on the ports that were formerly part of the channel.

    So for troubleshooting a loop, let's put it this way - if you've got one, you'll know it.
  • Forsaken_GA Member Posts: 4,024
    If a loop develops, you'll know it pretty quickly: your network will come to a screeching halt. Depending on the switches involved, you may no longer be able to get access to them, even from the console, because their CPU is probably pegged. I've only seen a loop in production once. When I found out my switches were inaccessible remotely, I checked the syslog server to see which interface came up last and had someone on site pull the cable from that port.
  • wolverene13 Member Posts: 87
    tiersten wrote: »
    Why don't you have STP enabled? :P

    I get this question all the time at work when customers cause broadcast storms on our Ethernet network. I'm not sure about the original poster's situation, but in our network we have a Cisco ME-3400 switch at each customer's site so that we have end-to-end visibility of the customer's circuit. The customer plugs their equipment into the dot1q-tunnel ports on the ME-3400. We use QinQ tunneling, which essentially wraps the customer's VLAN inside the VLAN we've assigned to that customer. We are transparent to the customer, and the customer's traffic is invisible to us aside from the bandwidth it eats up. So whatever they put inside the tunnel is of no concern to us.

    However, there is a downside to all this. While we do run STP, the STP instance only applies to our network, not the customers'. So if the customer creates a loop on their network and the traffic leaves that site and traverses the Ethernet cloud to another one of their sites, our STP doesn't see it, because it has no instance for the customer's internal VLAN; we don't even know it exists. This shows up on our network as overutilized trunks. As a result, trunks start bouncing, and all our other customers in that particular area of that particular state are affected because our STP has to keep reconverging. So our STP works, but it does nothing to stop loops on our customers' networks. Think of it like a swarm of angry bees inside a tree trunk: there's a lot of craziness going on inside the tree, but you can't see it because you're on the outside.

    Normally in this situation, we try to find a point of commonality between all of the customers who call in while this is happening and correlate it with the network alarms we get through Netcool (our network monitoring system). Starting from that point, we find out which trunks are being killed with traffic and follow that traffic until we get to the Cisco 7609 that has the offending customer on it. If we notice that a customer with a 1-Gig circuit is using 996 Megs or something, we shut the port down to restore service and advise the customer that they need to fix the issue before we turn them back up.
    Currently Studying: CCIP - 642-611 - MPLS
    Occupation: Tier II NOC Tech - Centurylink
    CCIP Progress: [x] BSCI
    [x] BGP
    [ ] MPLS
    [ ] QoS
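
    The utilization check from the post above (a 1-Gig port running at 996 Megs) can be sketched in Python against saved "show interfaces" output. This is a minimal sketch, not a full parser: the function names and the 95% threshold are assumptions, and the regular expressions only cover the common "BW ... Kbit" and "... input/output rate ... bits/sec" lines, whose exact wording can vary by platform.

```python
import re

HEADER = re.compile(r"^(\S+) is .*line protocol is")
BANDWIDTH = re.compile(r"BW (\d+) Kbit")
RATE = re.compile(r"(input|output) rate (\d+) bits/sec")


def port_utilization(show_interfaces_text):
    """Parse 'show interfaces' text into {port: {"input": frac, "output": frac}}."""
    result = {}
    port = None
    bw_bps = None
    for line in show_interfaces_text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new interface section starts; forget the previous bandwidth.
            port, bw_bps = header.group(1), None
            result[port] = {}
            continue
        if port is None:
            continue
        bw = BANDWIDTH.search(line)
        if bw:
            bw_bps = int(bw.group(1)) * 1000
        rate = RATE.search(line)
        if rate and bw_bps:
            result[port][rate.group(1)] = int(rate.group(2)) / bw_bps
    return result


def saturated_ports(show_interfaces_text, threshold=0.95):
    """Return [(port, direction, fraction)] for rates at or above threshold."""
    return [
        (port, direction, fraction)
        for port, rates in port_utilization(show_interfaces_text).items()
        for direction, fraction in rates.items()
        if fraction >= threshold
    ]


if __name__ == "__main__":
    # Abbreviated, made-up output for illustration only.
    sample = (
        "GigabitEthernet1/1 is up, line protocol is up (connected)\n"
        "  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,\n"
        "  5 minute input rate 996000000 bits/sec, 83000 packets/sec\n"
        "  5 minute output rate 12000000 bits/sec, 1000 packets/sec\n"
        "GigabitEthernet1/2 is up, line protocol is up (connected)\n"
        "  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,\n"
        "  5 minute input rate 4000000 bits/sec, 400 packets/sec\n"
        "  5 minute output rate 3000000 bits/sec, 300 packets/sec\n"
    )
    for port, direction, fraction in saturated_ports(sample):
        print(f"{port} {direction} at {fraction:.1%} of line rate")
```

    Keep in mind that the rate lines are averages over the load interval, so a storm that started seconds ago may not show up at full strength yet.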
  • peanutnoggin Member Posts: 1,096
    Dr_Atomic wrote: »
    If you're experiencing an ethernet loop on a switch(es), what's the troubleshooting process for this? I remember this in my CCNA studies, but it doesn't come to me. I'm researching it, also, in the meantime. Thanks.

    You should disable spanning-tree on your switches, plug them up with two crossover cables, plug in a PC (to generate some broadcast packets) and have fun... you can also use a tool like yersinia to generate some STP attack packets that will essentially have the same effect on the switches (slowing them down to a crawl). That's what a home lab is for... happy "geeking"! HTH.

    -Peanut
    We cannot have a superior democracy with an inferior education system!

    -Mayor Cory Booker
  • creamy_stew Member Posts: 406
    You should disable spanning-tree on your switches, plug them up with two crossover cables, plug in a PC (to generate some broadcast packets) and have fun... you can also use a tool like yersinia to generate some STP attack packets that will essentially have the same effect on the switches (slowing them down to a crawl). That's what a home lab is for... happy "geeking"! HTH.

    -Peanut

    Hah, I did this with a couple of Cisco switches. If I turned storm control down to about 2 Mbit/s rising and 1 Mbit/s falling, they could just handle the load before croaking. I ended up using half those values, though (residential/college broadband), plus BPDU filter on access ports.
    Itchy... Tasty!
    [X] DCICN
    [X] IINS

    [ ] CCDA
    [ ] DCICT
  • Netwurk Member Posts: 1,155
    I ran a layer 2 MAC flooding attack from a Linux box on several of my switches while I was labbing BCMSN. It's a good way to see how necessary port security is. With no port security configured, the only switch that could keep going against the attack was my old CatOS 2926. The reason, I think, was its relatively huge MAC address table. It just refused to go down despite my looping endless flood commands its way.

    My 3550's, 2950's, 3500's, and 2900's were dead in the water in less than a minute.

    I was going to name the tool I used but some idiot would then download it and become an instant hacker.

    :)
  • Dr_Atomic Member Posts: 184
    I've gotten a lot of good responses, but so far it's all pretty much theory. Could someone give me some commands I could input and check to see what I should/shouldn't see from them? Like a step-by-step check of things to look for? What would be some sample commands to use to check for loop issues?
  • chrisone Member Posts: 2,278
    Read my post; I gave you some tips on how to help prevent loops. It's your job to get the commands and read up on exactly what they do. They are simple and self-explanatory if you understand Spanning Tree. If you have access to the CLI, look at the STP states.
  • peanutnoggin Member Posts: 1,096
    Dr_Atomic wrote: »
    I've gotten a lot of good responses, but so far it's all pretty much theory. Could someone give me some commands I could input and check to see what I should/shouldn't see from them? Like a step-by-step check of things to look for? What would be some sample commands to use to check for loop issues?

    Dr. Atomic,

    You're right... everyone is giving you theory and as Chrisone said... it's up to you to research and see how/when to use tools. That's a part of the learning curve. What you have to realize is that when someone gives you the step by step instructions... all you're going to typically learn is what they show you. Be adventurous... if you have a home lab, backup your configs (if you want to preserve them) and then start playing around. Change some of the modes of spanning-tree, change some of the port cost, priority values, etc... disable spanning-tree on links... as you go through these different configurations, document what you do and what you find... you'd be quite surprised with the amount of information you'll learn. HTH.

    -Peanut
  • Netwurk Member Posts: 1,155
    Dr_Atomic wrote: »
    I've gotten a lot of good responses, but so far it's all pretty much theory. Could someone give me some commands I could input and check to see what I should/shouldn't see from them? Like a step-by-step check of things to look for? What would be some sample commands to use to check for loop issues?

    When a loop occurs, you have limited time to track it down. After a few minutes, you might not even be able to get to the console on your devices. Set up a syslog server so that you can troubleshoot from there if all the network equipment gets pegged.

    Syslog is very easy to configure. Just get syslog running on a server and use the global IOS command logging x.x.x.x on all your devices.
  • Forsaken_GA Member Posts: 4,024
    Netwurk wrote: »
    When a loop occurs, you have limited time to track it down. After a few minutes, you might not even be able to get to the console on your devices. Set up a syslog server so that you can troubleshoot from there if all the network equipment gets pegged.

    Syslog is very easy to configure. Just get syslog running on a server and use the global IOS command logging x.x.x.x on all your devices.

    Yup, like I said above, the one broadcast storm I've seen in production, by the time I was made aware of it, my switches weren't accessible due to CPU usage, even from the console. I had someone on site go check the syslog server and read me the last few interface events before the crap hit the fan. It was the quickest way to narrow down the problem.
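
    The "read me the last few interface events" step can be roughed out in Python against the syslog server's log file. This is only a sketch under assumptions: the recent_link_ups helper is hypothetical, the log lines are assumed to be in chronological order, and the pattern matches the classic IOS %LINK-3-UPDOWN message, which may be worded differently on other platforms.

```python
import re

# Matches the classic IOS link-state message, e.g.
# "%LINK-3-UPDOWN: Interface GigabitEthernet0/5, changed state to up"
LINK_EVENT = re.compile(
    r"%LINK-3-UPDOWN: Interface ([^,\s]+), changed state to (up|down)"
)


def recent_link_ups(log_lines, count=3):
    """Return the last `count` interfaces that came up, newest first."""
    ups = []
    for line in log_lines:
        match = LINK_EVENT.search(line)
        if match and match.group(2) == "up":
            ups.append(match.group(1))
    return ups[::-1][:count]


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    log = [
        "Mar  1 10:02:11: %LINK-3-UPDOWN: Interface FastEthernet0/3, changed state to up",
        "Mar  1 10:02:12: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/3, changed state to up",
        "Mar  1 10:15:40: %LINK-3-UPDOWN: Interface FastEthernet0/7, changed state to down",
        "Mar  1 10:41:05: %LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to up",
    ]
    print("Most recent link-up events:", recent_link_ups(log))
```

    The port that came up just before everything went quiet is the first cable worth pulling, which is exactly what the post above describes doing by hand.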
  • Dr_Atomic Member Posts: 184
    I'm checking a production network, so I can't experiment with a server at the moment. I've input every conceivable command I can think of to check this problem. If someone could deign to provide some commands and what to look for, it would be nice.

    In other words, how would one know from being logged into a switch if there *was* a loop present causing a problem? From what command would one see the problem?
  • mikej412 Member Posts: 10,086
    Dr_Atomic wrote: »
    I'm checking a production network, so I can't experiment with a server at the moment. I've input every conceivable command I can think of to check this problem.
    Um.... what commands were those?

    Do you have a network diagram that accurately lists the redundant links? Shut down the known redundant links.

    Use the show cdp neighbors command to find any incorrect redundant links created by an idiot randomly plugging in network cables in a wiring closet and misconfiguring switch ports.

    If it's been a day and you can still log into the switches then you either don't have a loop -- or you have some nice switches and storm control enabled or maybe a loop limited to just one VLAN (or two). What's the hardware? What's the topology? Did you run the show spanning-tree command (and use any of the options)? What version(s) of spanning tree are you running?

    How are you logging? Syslog? Local Logs? Is logging to the console turned off?
    :mike: Cisco Certifications -- Collect the Entire Set!
  • Dr_Atomic Member Posts: 184
    mikej412 wrote: »
    Do you have a network diagram that accurately lists the redundant links? Shutdown the known redundant links.

    Find any incorrect redundant links created by an idiot randomly plugging in network cables in a wiring closet and misconfiguring switch ports using the show cdp neighbor command.

    So if I *do* have a redundant link, I could do a sh cdp neighbor and it would show it there? Then I could just disconnect that link to see if it helps the issue?
  • networker050184 Mod Posts: 11,962
    No offense, but it sounds like you are in WAY over your head here if you don't even know how to find redundant links. Is there not a more knowledgeable staff member you can talk to? You are probably going to make things worse if you are just winging it man.
    An expert is a man who has made all the mistakes which can be made.
  • Dr_Atomic Member Posts: 184
    No offense, but it sounds like you are in WAY over your head here if you don't even know how to find redundant links. Is there not a more knowledgeable staff member you can talk to? You are probably going to make things worse if you are just winging it man.

    It's just me, pal. I've been thrown to the wolves on this one.
  • creamy_stew Member Posts: 406
    Netwurk wrote: »
    I ran a layer 2 MAC flooding attack from a linux box on several of my switches while I was labbing BCMSN. It's a good way to see how necessary port security is. With no security, the only unsecured switch that could keep going against the attack was my old CatOS 2926. The reason I think was its relatively huge mac address table. It just refused to go down despite looping endless flood commands its way.

    My 3550's, 2950's, 3500's, and 2900's were dead in the water in less than a minute.

    I was going to name the tool I used but some idiot would then download it and become an instant hacker.

    :)

    Well, I did have port security set, so MAC overflow wasn't an issue; CPU utilization still went up like crazy.

    Also, macof! There, I dun did it!

    Information wants to be free goddammit :)
  • mikej412 Member Posts: 10,086
    Dr_Atomic wrote: »
    So if I *do* have a redundant link, I could do a sh cdp neighbor and it would show it there? Then I could just disconnect that link to see if it helps the issue?
    Exactly what is the issue? If you did have a loop and misconfigured/disabled STP, you probably wouldn't be able to access the switches.

    If you don't have a (current) network diagram, then you should be able to map the Cisco equipment (and links) using the show cdp neighbors command. Of course, you'd first check the configurations to see whether CDP had been disabled anywhere before you waste time drawing an incomplete network map.

    But if you did have a good network diagram, you'd probably want to first look for redundant links that shouldn't be there.
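
    Once the links have been collected from show cdp neighbors on each switch, spotting the redundant ones can be sketched in Python. This assumes you have already turned the CDP output into a list of (device, neighbor) pairs by hand or by script; the find_redundant_links helper and the switch names are made up. It uses union-find: any link whose two ends are already connected through earlier links closes a physical loop.

```python
def find_redundant_links(links):
    """Return links that close a cycle, given (device_a, device_b) pairs.

    A link whose two ends are already connected through earlier links is
    redundant: it creates a physical loop that STP has to block.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


if __name__ == "__main__":
    # Hypothetical topology: SW1-SW2-SW3 form a triangle, SW4 hangs off SW3.
    links = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW1"), ("SW3", "SW4")]
    for a, b in find_redundant_links(links):
        print(f"Link {a} <-> {b} closes a loop")
```

    Which link in a cycle gets reported depends on the order of the list, and a redundant link may well be intentional, so compare the result against the diagram before shutting anything down. Devices with CDP disabled, and unmanaged switches, will not appear at all.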
  • Netwurk Member Posts: 1,155
    Also, macof! There, I dun did it!

    Information wants to be free :)

    My macof experiment post was really off-topic for this thread, but at the time I mistakenly thought the loop issue was solved. Oops.

    I'm not really sure our buddy even has a loop issue, though. Who knows?

    Oliver, how big a network are we talking about? What is making you think you have a loop?

    Found you a Cisco page with loop advice, the section "Troubleshooting Forwarding Loops" should be helpful.

    Troubleshooting STP on Catalyst Switches Running Cisco IOS System Software - Cisco Systems
  • mikej412 Member Posts: 10,086
    Netwurk wrote: »
    What is making you think you have a loop?
    Or has he already moved on to a different thread/problem theory and not told us here?

    From this thread: http://www.techexams.net/forums/ccnp/58713-bouncing-ports.html
    Dr_Atomic wrote: »
    I was told there might be a network loop


    show interfaces summary and look for anything out of the ordinary (assuming you know what would be ordinary)

    show interfaces counters (ditto)
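
    One way to judge "out of the ordinary" is to take the counters twice and compare them. The Python sketch below assumes you have already pulled the InBcastPkts column from two runs of show interfaces counters into dictionaries; the broadcast_rates helper, the port names and the numbers are made up, and counter wrap or a clear counters between the two snapshots is not handled.

```python
def broadcast_rates(before, after, interval_seconds):
    """Return {port: broadcasts per second} between two counter snapshots.

    before/after map port name -> InBcastPkts value (hypothetical, already
    parsed from two runs of 'show interfaces counters').
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        port: (after[port] - before[port]) / interval_seconds
        for port in after
        if port in before
    }


if __name__ == "__main__":
    # Made-up snapshots taken 10 seconds apart.
    first = {"Gi0/1": 1000, "Gi0/2": 500}
    second = {"Gi0/1": 601000, "Gi0/2": 560}
    rates = broadcast_rates(first, second, interval_seconds=10)
    for port, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{port}: {rate:.0f} broadcasts/sec")
```

    What counts as too many broadcasts per second depends on the network, so the useful signal is usually one or two ports that are orders of magnitude above the rest.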
  • wolverene13 Member Posts: 87
    Dr_Atomic wrote: »
    I'm checking a production network, so I can't experiment with a server at the moment. I've input every conceivable command I can think of to check this problem. If someone could deign to provide some commands and what to look for, it would be nice.

    In other words, how would one know from being logged into a switch if there *was* a loop present causing a problem? From what command would one see the problem?

    "show interfaces" is all you really need. Look for ports that are maxed out; those ports are carrying the traffic caused by the loop. If you see maxed-out input traffic on a trunk (meaning the loop traffic is coming into that device from somewhere else), go to the device on the other end of the trunk and issue "show interfaces" there. Keep doing this until you reach a device that only has maxed-out output traffic on its trunks. That means the culprit is directly connected to the device you are logged into and the loop traffic originates there, so you then check traffic on its access ports. Once you find a maxed-out access port, you know that the device or host connected to it is what is causing the loop.

    "show log" will also help in this scenario. A lot of times you'll see MAC flapping messages in the logs on the device where the loop is occurring, because the switch is seeing the same MAC on two different ports.
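
    The MAC flapping check from the post above can be sketched in Python against saved "show log" or syslog output. The flapping_ports helper is hypothetical, and the pattern matches the %SW_MATM-4-MACFLAP_NOTIF message as it appears on Catalyst IOS switches; other platforms log MAC moves with different wording.

```python
import re
from collections import Counter

MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (\S+) in vlan (\d+) "
    r"is flapping between port (\S+) and port (\S+)"
)


def flapping_ports(log_lines):
    """Count how often each port appears in MAC flap messages."""
    ports = Counter()
    for line in log_lines:
        match = MACFLAP.search(line)
        if match:
            ports.update([match.group(3), match.group(4)])
    return ports


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    log = [
        "%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 10 is flapping between port Gi0/1 and port Gi0/24",
        "%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4466 in vlan 10 is flapping between port Gi0/24 and port Gi0/1",
        "%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4477 in vlan 20 is flapping between port Gi0/1 and port Gi0/2",
        "%LINK-3-UPDOWN: Interface FastEthernet0/7, changed state to down",
    ]
    for port, hits in flapping_ports(log).most_common():
        print(f"{port}: named in {hits} flap messages")
```

    The ports named most often are the best candidates for the two ends of the loop, though a host that legitimately moves (a roaming wireless client, for instance) can produce the same message.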
  • chmorin Member Posts: 1,446
    In my experience, most broadcast storms are caused at the access layer by an unmanaged switch that the staff was not aware of, where some user decided to plug in the loose cables. Troubleshooting this across a large WAN from a distance is dirty. At my old high school they had an issue where a 1st grade student accidentally did what I mentioned above. They literally had to bring down segments of the ISD's network until they isolated the issue.

    They couldn't make it any worse by doing this, though, since the storm had already halted the entire network.

    If it is not an access layer issue, I would begin to investigate STP.

    If it has been going on for some hours now and you can still make changes to all of your equipment, it is probably not a broadcast storm.

    What exactly are you experiencing? I think you are misdiagnosing the issue here.
    Currently Pursuing
    WGU (BS in IT Network Administration) - 52%| CCIE:Voice Written - 0% (0/200 Hours)
    mikej412 wrote:
    Cisco Networking isn't just a job, it's a Lifestyle.
  • wolverene13 Member Posts: 87
    chmorin wrote: »
    In my experience, most broadcast storms are caused at the access layer by an unmanaged switch that the staff was not aware of, where some user decided to plug in the loose cables.

    Amen to that. It's always Sally from Accounting who wanted to extend the network to plug in her laptop, or Bill from Sales who walked by a switch and said "Hey, what's this cable doing unplugged? I should probably plug that back in!"