Question about recent network outage
CodeBlox
We have a 6509 as our core switch. One of its three 48-port gigabit blades failed, causing a catastrophic failure. When I walked into the NOC I found a single red light and all of the switchport lights off. The problem blade also seemed to be locking up the entire core switch, preventing me from getting in via telnet or console.
Has anyone ever heard of a single blade (48 x 1 Gb ports) locking up an entire switch?
I'm just wondering if there could have been more than one issue. Things have normalized since then with a blade from our old core switch. We plan to cut over tomorrow to the replacement that was flash-shipped.
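For reference, this is the kind of thing I would have checked had the console been reachable (a sketch; exact output varies by supervisor and IOS version):

  show module                ! per-slot status of each blade (Ok, PwrDown, Faulty, ...)
  show environment status   ! power supply, fan, and temperature state
  show logging               ! buffered log -- lost on reload unless exported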
Comments
networker050184
I've seen plenty of situations like this. Could be many things: a jacked-up CEF table, the backplane, the supervisor getting whacked out. There's probably no way to know for sure unless you have a crash file for Cisco to look at (which is unlikely if, as I understand it, the SUP didn't crash). Even then they'll probably just tell you it got into a corrupted state from the line card failing...
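For what it's worth, when a supervisor does crash, a 6500 usually leaves a crashinfo file behind that Cisco can work from (a sketch; the filesystem name varies by supervisor, e.g. bootflash: vs. disk0:):

  dir bootflash:             ! look for crashinfo_* files left by a SUP crash
  show version               ! check the "last reload reason" line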
CodeBlox
The SUPs didn't crash, but in the chaos I was asked to reboot the entire core switch, which means the logs from that time are no longer there. I couldn't console into it. Is there anything else I should check, or should I leave it at "the blade failed"?
inscom.brigade
Try doing a show tech-support and open a case with Cisco.
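Show tech output is huge, so it's worth redirecting it to a file instead of scrolling it over the console (a sketch; the TFTP server address is hypothetical):

  show tech-support | redirect tftp://192.0.2.10/core-showtech.txt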
networker050184
You can open a case with Cisco, but honestly I doubt you'll get much out of it without any information to give them. Show tech probably won't have anything; it's more a snapshot of the router's current state plus the log messages.
inscom.brigade
We had several 4506-E chassis with blade trouble and power supply trouble (I could elaborate, but that would take a page by itself). After opening a case with Cisco and giving them a show tech-support, they came back saying their IOS had a bug and we needed to upgrade it.
If you have all the time in the world to dig through that, sure, you'll figure out what's wrong yourself. But if you need to attend to the things at hand while you're paying for a service contract, putting Cisco to work for you is a good idea.
networker050184
Well yeah, if it's an IOS bug that's one thing, but if it's just a case of a router getting into an inconsistent state, you aren't likely to find anything.
Iristheangel
Time to set up a syslog server and some SNMP traps! In that scenario, at least, your logs would not have been lost on reboot.
I had a somewhat similar situation about a month ago where I was asked to reboot the core. I was on call for the week, and early on a Sunday morning I got the call that our entire data center was down and that the CIO and his boss had already noticed. I jumped in the car in my pajamas and flip-flops and zoomed off to the data center. The console wasn't gummed up, but it was close to it: I was getting HSRP and EIGRP flaps blasting my console, and CPU utilization on both cores was at 100%. Logging into that mess was a process. I'll spare you the gory details, but the CIO ended up showing up to the data center 20 minutes into my troubleshooting and asked, "If we reboot them, is it possible it will fix the problem?" I basically said, "Very, very, very doubtful. Plus we lose the logs, so if the problem goes away and we don't root-cause it, we run the risk of it recurring at a worse time than a non-business day." The CIO was insistent about rebooting anyway, so I did.
In my situation the problem came right back, so the loss of logs didn't matter as much and we were able to find the problem (an STP issue: port channels were flapping on one of the access-layer switches). But it sure lit some fires under some people's butts to get a better logging and alerting solution in place. We ended up pointing SNMP traps at our newer SolarWinds server, adding logging at the emergency, alert, critical, and error levels and pointing it at SolarWinds, and adding out-of-band terminal access and out-of-band alerting.
My point is that it may suck that you lost all your logs and feel like there's a potential underlying issue that could go off at any point, but it's a great opportunity to pitch a syslog and SNMP server to the business. It'll be pretty invaluable in the future if you can get them to approve it.
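A minimal sketch of that setup on IOS, assuming a hypothetical collector address of 192.0.2.50 and a placeholder community string:

  ! ship syslog off-box so a reboot can't eat the logs
  logging host 192.0.2.50
  ! send severities 0-3: emergencies, alerts, critical, errors
  logging trap errors
  ! point SNMP traps at the same collector
  snmp-server community MYCOMMUNITY ro
  snmp-server host 192.0.2.50 version 2c MYCOMMUNITY
  snmp-server enable traps snmp linkdown linkup coldstart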
PCSPreston
I would think that could be IOS image corruption. Hard to tell at this point, however.
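If image corruption is the suspicion, the file on flash can at least be checked against the MD5 hash Cisco publishes for the release (a sketch; the filename here is hypothetical):

  verify /md5 disk0:s72033-ipservices-mz.bin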
CodeBlox
I think this definitely merits sending the logs to our SolarWinds Orion server via syslog. I'll be putting that in place on Monday.