How to tell if a device is failing?

notgoing2fail Member Posts: 1,138
Maybe this is appropriate for the CCNA forum.

But how does one know when their router/switch is failing? Or if a module is failing?

Does anyone have a checklist or guide to check against for known behaviors?

Comments

  • tiersten Member Posts: 4,505
    notgoing2fail wrote: »
    Maybe this is appropriate for the CCNA forum.

    But how does one know when their router/switch is failing? Or if a module is failing?

    Does anyone have a checklist or guide to check against for known behaviors?

    That's a pretty vague checklist, really. Does the device do what it's supposed to do? Yes/No. Are any errors appearing in the logs? Are error counters starting to increment? Is it on fire?

    Some devices have built-in diagnostics, but generally you can't and shouldn't run those during operation.
  • hexem Member Posts: 177
    For either a switch or a router you have the POST (power-on self-test), which runs from ROM when the device first boots. It checks the hardware, such as memory and interfaces.

    On a switch, if the system LED is amber, you have a problem: POST has failed and you will have a dead switch on your hands, so you'd better have a backup :)

    On router interfaces, keep a check on the interface counters for errors. You should look these up and be familiar with what each of them means, especially for dealing with broadcast storms and other nasties.
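A rough sketch of the "keep a check on the counters" advice, in Python. It assumes you have pasted two snapshots of "show interfaces" output into strings. The sample text and its values are invented for illustration, the list of counter names is only a subset, and the function names are made up for this example. What it reports is which counters grew between the two snapshots, since a counter that keeps climbing is more telling than one that is merely non-zero.

```python
import re

# Counter names to watch: an illustrative subset, not an exhaustive list.
WATCHED = ("input errors", "CRC", "runts", "giants",
           "output errors", "collisions", "interface resets")

# Matches "<number> <counter name>" pairs such as "42 CRC".
COUNTER_RE = re.compile(
    r"(\d+) (" + "|".join(re.escape(name) for name in WATCHED) + r")\b")


def parse_counters(show_interfaces_text):
    """Map (interface, counter) -> value from pasted "show interfaces" text."""
    counters = {}
    current = None
    for line in show_interfaces_text.splitlines():
        # Interface header lines start in column 0; counter lines are indented.
        if line and not line[0].isspace():
            current = line.split()[0]
        elif current is not None:
            for value, name in COUNTER_RE.findall(line):
                counters[(current, name)] = int(value)
    return counters


def increments(before, after):
    """Return {(interface, counter): growth} for counters that went up."""
    return {key: value - before.get(key, 0)
            for key, value in after.items()
            if value > before.get(key, 0)}


# Invented sample snapshots (hypothetical values), taken a few minutes apart.
BEFORE = """\
FastEthernet0/1 is up, line protocol is up
     0 runts, 0 giants, 0 throttles
     12 input errors, 12 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
"""
AFTER = BEFORE.replace("12 input errors, 12 CRC", "42 input errors, 42 CRC")

print(increments(parse_counters(BEFORE), parse_counters(AFTER)))
```

In this made-up sample only the input error and CRC counters on FastEthernet0/1 grow, so those are the only entries reported. The single interface reset is left out because it did not change.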

    Here are some guides to look through:

    http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a008015bfd6.shtml

    http://support.verio.com/documents/view_article.cfm?doc_id=482
    ICND1 - Passed 25/01/10
    ICND2 - Passed 9/03/10

    Studying CCNA:S
  • notgoing2fail Member Posts: 1,138
    tiersten wrote: »
    That's a pretty vague checklist, really. Does the device do what it's supposed to do? Yes/No. Are any errors appearing in the logs? Are error counters starting to increment? Is it on fire?

    Some devices have built-in diagnostics, but generally you can't and shouldn't run those during operation.

    Yeah, I figured it was a little vague. I was hoping there were some de facto signs of failure to look out for, especially when remoting into the device...

    hexem wrote: »
    For either a switch or a router you have the POST (power-on self-test), which runs from ROM when the device first boots. It checks the hardware, such as memory and interfaces.

    On a switch, if the system LED is amber, you have a problem: POST has failed and you will have a dead switch on your hands, so you'd better have a backup :)

    On router interfaces, keep a check on the interface counters for errors. You should look these up and be familiar with what each of them means, especially for dealing with broadcast storms and other nasties.

    Here are some guides to look through:

    Troubleshooting Switch Port and Interface Problems - Cisco Systems

    Cisco Show Interface Reference

    Thanks, I'll take a look. I know that over time the experienced can quickly, or at least hopefully quickly, tell a failing device apart from plain bad broadcasts/collisions, mismatched encapsulation types, etc...
  • hexem Member Posts: 177
    Obviously there are a lot of error messages specific to things other than hardware issues, and you'll come across stuff you've never seen before: duplex mismatch and native VLAN mismatch, just to name a couple off the top of my head. It's one of those things you learn over time. Just remember that if you're working on a vty line through SSH/Telnet you won't see console messages; you need to type 'terminal monitor', and to turn it off, 'term no mon'.

    Learning how to use the debug commands will help a lot, especially with things like PPP, NAT, inspects, RIP, EIGRP, OSPF......list goes on lol!
  • notgoing2fail Member Posts: 1,138
    hexem wrote: »
    Obviously there are a lot of error messages specific to things other than hardware issues, and you'll come across stuff you've never seen before: duplex mismatch and native VLAN mismatch, just to name a couple off the top of my head. It's one of those things you learn over time. Just remember that if you're working on a vty line through SSH/Telnet you won't see console messages; you need to type 'terminal monitor', and to turn it off, 'term no mon'.

    Learning how to use the debug commands will help a lot, especially with things like PPP, NAT, inspects, RIP, EIGRP, OSPF......list goes on lol!


    That's what I figured. Isn't there a really dangerous debug command that you shouldn't use?

    Something like debug ip all? Or something crazy like that that can shut down the router immediately if it's already being hammered?
  • hexem Member Posts: 177
    Yeah, in a production environment with a lot of traffic that will definitely cause the CPU usage to spike and can cause the router to crash.

    I was helping a friend remotely on his router, setting up access to a SIP (voice) server behind it, and started debugging IP packets and inspect rules and took it down :) hah, live and learn.
  • notgoing2fail Member Posts: 1,138
    hexem wrote: »
    Yeah, in a production environment with a lot of traffic that will definitely cause the CPU usage to spike and can cause the router to crash.

    I was helping a friend remotely on his router, setting up access to a SIP (voice) server behind it, and started debugging IP packets and inspect rules and took it down :) hah, live and learn.

    I can't wait to get into CCNA Voice, that sounds like such a fun topic. Hopefully I won't have to pay thousands of dollars just to get equipment to practice on.

    The closest I've gotten to VoIP is setting up my own Asterisk server a couple of years ago with FXS/FXO cards in my HP DL380. It was really fun and it surprisingly worked, but I'd never want to do a hack setup like that again...
  • tiersten Member Posts: 4,505
    notgoing2fail wrote: »
    That's what I figured. Isn't there a really dangerous debug command that you shouldn't use?

    Something like debug ip all? Or something crazy like that that can shut down the router immediately if it's already being hammered?

    You have to exercise some caution when dealing with the debug commands. It is possible to completely wedge the router because you're pegging the CPU at 100% as it's just constantly generating debug output. If you're going to be doing anything, then make sure it's during a maintenance period and that you can recover if it does go badly.

    The "reload in 5" command will be your friend if you don't have OOB remote access to a device.
  • Dilbert65 Member Posts: 73
    It does not matter what you are trying to fix (MS, Apple, Linux, Cisco, 3Com, a very long list): troubleshooting is at best an art form. Take little steps at first. The old 50/50 rule can help a lot.

    The 50/50 rule is:

    Test at the halfway point to see if you are getting the results you expect. If so, jump to the 75% point and test again. If not, jump to 25% and test again. You can see where this is going. Everybody will develop their own style of debugging; it just takes time.
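The 50/50 rule is essentially a binary search, and a small Python sketch may make the jumps concrete. It assumes the checkpoints can be put in order along the path and that everything before the fault tests fine while everything from the fault onward fails. The checkpoint names and the `is_ok` test are hypothetical stand-ins for whatever you would actually check (a ping, a link light, a show command).

```python
def first_failing(checkpoints, is_ok):
    """Binary-search an ordered list of checkpoints for the first one that fails.

    Assumes every checkpoint before the fault passes and every checkpoint
    from the fault onward fails. Returns None if all of them pass.
    """
    lo, hi = 0, len(checkpoints)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_ok(checkpoints[mid]):
            lo = mid + 1   # results as expected: jump further along the path
        else:
            hi = mid       # results are wrong: jump back toward the start
    return checkpoints[lo] if lo < len(checkpoints) else None


# Hypothetical path from a PC to a server; names are invented for illustration.
PATH = ["pc-nic", "access-switch", "distribution-switch", "core-router",
        "wan-link", "remote-router", "server"]

tested = []


def is_ok(checkpoint):
    """Pretend test: everything from "wan-link" onward is broken."""
    tested.append(checkpoint)
    return PATH.index(checkpoint) < PATH.index("wan-link")


print(first_failing(PATH, is_ok))
print(tested)
```

With seven checkpoints, this run needs only three tests (core-router, remote-router, wan-link) to land on the WAN link, instead of walking the path hop by hop.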

    Once you find the problem area, go slow. With Cisco, the "debug" command should be the last thing you use (except in a lab). I say this because of how CPU-intensive debugs are. In fact, debug output is given high priority in the CPU process.

    Check the configs, then double-check the configs, then have somebody else look at them. If you feel the need to debug, then do it during off-peak hours if possible. Also make sure you are close to the router in case you have to become a reboot specialist and power it off/on to get control back.

    In my home lab I have run debugs where I lost control of a router and had to yank the power to get control back. Debugs are a very powerful tool, but use them with care.
  • CiskHo Member Posts: 188
    notgoing2fail wrote: »
    Maybe this is appropriate for the CCNA forum.

    But how does one know when their router/switch is failing? Or if a module is failing?

    Does anyone have a checklist or guide to check against for known behaviors?

    Lots of good advice already posted, but my input:

    "show diag" will list installed modules. If something is shown as "unknown", then either it doesn't work in that type of chassis, isn't compatible with the installed IOS, or may just be straight-up broken/dead.

    Watching the system log messages shown while consoled into the device will let you know when things like fans have failed. Or you can run "show environment" (temp/fan) to see specific details.

    Someone mentioned that if the sys light on a switch is amber then you have a problem and POST has failed. Well, it does indicate that there is an issue, but it doesn't mean the switch itself is useless. I just got a 3524XLPoE and upon power-up I discovered that the sys light was amber. Consoled into it and saw a "Fan #5 Failure" message. The switch still runs, pings, configs, etc. Knowing how the LEDs can display messages can be a big help in the real world, but I don't think you will see too much of that stuff on the exams. You're more likely to be asked about it in an interview.

    Personally, I like to set up a syslog server on my network and have all devices relay their messages to it, so you can go to one place and see most of the issues instead of having to log in to multiple devices and read several different logs. Obviously this doesn't help for a device that loses connectivity, though.
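Once everything lands in one log, it helps to pull out just the serious messages. Here is a minimal Python sketch, assuming the collected lines use the usual Cisco "%FACILITY-SEVERITY-MNEMONIC: text" layout, where lower severity numbers are more serious (0 is emergency, 7 is debugging). The sample lines, timestamps, and the function name are invented for illustration, and this is not tied to any particular syslog server.

```python
import re

# Cisco-style log messages look like "%FACILITY-SEVERITY-MNEMONIC: text".
MSG_RE = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+): (.*)")


def serious_messages(lines, max_severity=4):
    """Return (severity, facility, mnemonic, text) for messages whose severity
    number is at or below max_severity (lower numbers are more serious)."""
    found = []
    for line in lines:
        match = MSG_RE.search(line)
        if match and int(match.group(2)) <= max_severity:
            facility, severity, mnemonic, text = match.groups()
            found.append((int(severity), facility, mnemonic, text))
    return found


# Invented sample lines (hypothetical timestamps and interfaces).
LOG = [
    "*Mar  1 00:05:12.003: %LINK-3-UPDOWN: Interface FastEthernet0/1, changed state to down",
    "*Mar  1 00:05:13.003: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/1, changed state to down",
    "*Mar  1 00:06:40.120: %SYS-5-CONFIG_I: Configured from console by console",
    "*Mar  1 00:07:02.511: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet0/2",
]

for severity, facility, mnemonic, text in serious_messages(LOG):
    print(severity, facility, mnemonic, text)
```

With the default threshold of 4 (warnings and worse), the two severity-5 notifications are skipped and only the link-down and duplex-mismatch lines are printed.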
    My Lab Gear:
    2811(+SW/POE/ABGwifi/DOCSIS) - 3560G-24-EI - 3550-12G - 3550POE - (2) 2950G-24 - 7206VXR - 2651XM - (2) 2611XM - 1760 - (2) CP-7940G - ESXi Server

    Just Finished: RHCT (1/8/11) and CCNA:S (Fall 2010)
    Prepping For: VCP and CCNP SWITCH, ROUTE, TSHOOT
  • notgoing2fail Member Posts: 1,138
    tiersten wrote: »

    The "reload in 5" command will be your friend if you don't have OOB remote access to a device.

    Thanks, I am going to write this one down...never even heard of this command before. Well actually I've heard of reload, but not that you can adjust it....


    CiskHo wrote: »
    Lots of good advice already posted, but my input:

    "show diag" will list installed modules. If something is shown as "unknown", then either it doesn't work in that type of chassis, isn't compatible with the installed IOS, or may just be straight-up broken/dead.

    Watching the system log messages shown while consoled into the device will let you know when things like fans have failed. Or you can run "show environment" (temp/fan) to see specific details.

    Someone mentioned that if the sys light on a switch is amber then you have a problem and POST has failed. Well, it does indicate that there is an issue, but it doesn't mean the switch itself is useless. I just got a 3524XLPoE and upon power-up I discovered that the sys light was amber. Consoled into it and saw a "Fan #5 Failure" message. The switch still runs, pings, configs, etc. Knowing how the LEDs can display messages can be a big help in the real world, but I don't think you will see too much of that stuff on the exams. You're more likely to be asked about it in an interview.

    Personally, I like to set up a syslog server on my network and have all devices relay their messages to it, so you can go to one place and see most of the issues instead of having to log in to multiple devices and read several different logs. Obviously this doesn't help for a device that loses connectivity, though.

    What do you use for your syslog server? I mean, what application? Or do you use some kind of Ubuntu box with some kind of open-source app?
  • notgoing2fail Member Posts: 1,138
    Dilbert65 wrote: »
    It does not matter what you are trying to fix (MS, Apple, Linux, Cisco, 3Com, a very long list): troubleshooting is at best an art form. Take little steps at first. The old 50/50 rule can help a lot.

    The 50/50 rule is:

    Test at the halfway point to see if you are getting the results you expect. If so, jump to the 75% point and test again. If not, jump to 25% and test again. You can see where this is going. Everybody will develop their own style of debugging; it just takes time.

    Once you find the problem area, go slow. With Cisco, the "debug" command should be the last thing you use (except in a lab). I say this because of how CPU-intensive debugs are. In fact, debug output is given high priority in the CPU process.

    Check the configs, then double-check the configs, then have somebody else look at them. If you feel the need to debug, then do it during off-peak hours if possible. Also make sure you are close to the router in case you have to become a reboot specialist and power it off/on to get control back.

    In my home lab I have run debugs where I lost control of a router and had to yank the power to get control back. Debugs are a very powerful tool, but use them with care.


    I'd actually like to try to simulate some loops to bring down a switch and hammer a router just for FYI purposes. I think it would be a good learning experience to see the good and the bad...

    Wendell Odom said the same thing in his book, that there is no exact guide to troubleshooting and everyone comes up with their own way of figuring things out. I guess it just takes time...

    I like to try to be as prepared as possible...
  • CiskHo Member Posts: 188
    notgoing2fail wrote: »
    What do you use for your syslog server? I mean, what application? Or do you use some kind of Ubuntu box with some kind of open-source app?

    I use "Kiwi Syslog Server" on a Win XP machine. It's free and fairly small. I'm planning to replace that PC with a new server I am currently building (Win7/64, RHEL, VMware), and I think I'll keep Kiwi for the Win7 boot... not sure about the Linux boot yet, as I have ZERO Linux server experience. My guess is that Kiwi will run off of either OS, though.