Something I really never Understood...
Danielh22185
Member Posts: 1,195 ■■■■□□□□□□
in CCNP
At my company we tend to, I think, "overdo it" when it comes to failed devices on the network.
Example:
A major core switch fails, and we generally shut down the physical interfaces AND the routing protocols facing that device. I know it's always best to tread carefully in this type of scenario, but I've never been able to get an answer from anybody I work with as to why just shutting down the physical interfaces isn't "good enough". If no ingress interface on the device is up/up, how could the device possibly receive traffic, right?
The only reason I can think of for doing this is to have an additional control point: you confirm interfaces are physically up/up before telling a routing protocol to establish during a restoration. Still, I think it's redundant and unnecessary, because an interface will always come up before the routing protocol does, and if something is actually wrong with an interface you can simply shut it back down...
Now, don't get me wrong. I fully understand there are times when it's best to take down just the routing protocol. An example would be when we need to leave the physical link up but not allow traffic across it, so the routing protocol is shut down while we do physical testing. However, that doesn't apply to my previous example.
Can y'all think of a reason why I should stop focusing on just the physical layer and think more like some of my colleagues, or are they just being overly cautious?
Currently Studying: IE Stuff...kinda...for now...
My ultimate career goal: To climb to the top of the computer network industry food chain.
"Winning means you're willing to go longer, work harder, and give more than anyone else." - Vince Lombardi
Comments
-
docrice Member Posts: 1,706 ■■■■■■■■■■
Some places have a strict procedure. Sure, shutting an interface down can effectively cut off the device's utility, but if there are many cooks in the kitchen it's possible that someone may come around and "fix it" by turning it back on, not knowing about the issue that caused the problem.
Another example - if you remove a server from a network, shutting the port down seems enough ... until someone later sees that port as available, plugs an arbitrary device in, turns the port back on, and all of a sudden you have a new device on a VLAN that it's not supposed to be on. This can be bad news.
Hopefully-useful stuff I've written: http://kimiushida.com/bitsandpieces/articles/
-
fredrikjj Member Posts: 879
Danielh22185 wrote: » At my company we tend to, I think, "overdo it" when it comes to failed devices on the network.
Example:
A major core switch fails, and we generally shut down the physical interfaces AND the routing protocols facing that device.
Manually changing routing protocol configuration at the CLI of an operational device when another device fails seems like a bigger risk to me than just shutting down interfaces and leaving the routing intact. If the person doing that reconfiguration does something wrong, you now have two failed devices.
-
White Wizard Member Posts: 179
docrice wrote: » Some places have a strict procedure. Sure, shutting an interface down can effectively cut off the device's utility, but if there are many cooks in the kitchen it's possible that someone may come around and "fix it" by turning it back on, not knowing about the issue that caused the problem.
Another example - if you remove a server from a network, shutting the port down seems enough ... until someone later sees that port as available, plugs an arbitrary device in, turns the port back on, and all of a sudden you have a new device on a VLAN that it's not supposed to be on. This can be bad news.
If an int is administratively down, then whoever wants to use that int should know it was shut down for a reason and find out why.
"The secret to happiness is doing what you love. The secret to success is loving what you do."
-
networker050184 Mod Posts: 11,962 Mod
I usually don't remove routing protocols but I do raise the metric. That way you can turn it all the way up and gracefully shift traffic back on.
An expert is a man who has made all the mistakes which can be made.
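To make the metric idea concrete, here is a minimal Python sketch rather than anything from a real device. It assumes a made-up four-node topology (`access`, `core1`, `core2`, `dist`) and made-up link costs, and uses a plain Dijkstra search to stand in for what a link-state IGP does. The helper `with_raised_metric` is hypothetical; it only illustrates that raising the cost on every link touching a device moves traffic off it while the adjacencies themselves stay configured.

```python
import heapq

def shortest_path(links, src, dst):
    """Return (cost, path) for the lowest-cost path, Dijkstra-style."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None

def with_raised_metric(links, device, metric):
    """Copy of links with every link touching `device` set to `metric`."""
    return {pair: (metric if device in pair else m) for pair, m in links.items()}

# Hypothetical topology: core2 is normally the preferred path.
links = {
    ("access", "core1"): 10, ("core1", "dist"): 10,
    ("access", "core2"): 5, ("core2", "dist"): 5,
}

normal = shortest_path(links, "access", "dist")
drained = shortest_path(with_raised_metric(links, "core2", 1000), "access", "dist")
print(normal)   # path through core2
print(drained)  # path through core1 while core2's metrics are raised
```

Putting the original metrics back is the "gracefully shift traffic back on" step: the path through `core2` becomes the cheapest again without any protocol having been removed.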
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
networker050184 wrote: » I usually don't remove routing protocols but I do raise the metric. That way you can turn it all the way up and gracefully shift traffic back on.
I believe the mindset is all about doing things gracefully.
Now don't get me wrong: in the situation I described we already have the device isolated from the network (cables pulled; told you we tread carefully). We sometimes even go as far as powering off a bad device like this. I get it, it quickly preserves business continuity, and I work for a large financial firm, so business / money is everything.
The failure was a bad SUP (it had multiple parity errors). To better facilitate a graceful reintroduction to the network, I consoled into the switch and shut down all interfaces, so once the hardware is replaced we still have the device logically isolated from the network. From there we can control its reintroduction from that one device instead of logging into 20+ surrounding network devices and shutting down / re-enabling stuff manually, which leaves less opportunity for human error.
The thing that throws me is that I have several colleagues with years more experience than me who will take it a step further and set interfaces passive / shut down routing protocols on the devices. I never understood from them why it is done that way, other than as a precaution. A routing protocol only works as well as its up/up interfaces...
-
james43026 Member Posts: 303 ■■□□□□□□□□
In the world of networking, you have a lot of people who have learned over the years based only on what they have seen. Some may not even bother to look at best practices and just come up with something on their own. This isn't wrong per se, but it can amount to uneducated guesses at times. You will also find a lot of people out there who only have a high-level view of how something works and lack the lower-level knowledge of how it really works. These are usually the people who start making lots of "excessively safe" policies and creating lots of unnecessary work.
-
networker050184 Mod Posts: 11,962 Mod
Those people with more experience have probably seen devices come back online and wreak havoc, which is why they're so cautious. Either that or someone has told them war stories.
-
daveyb Member Posts: 28 ■□□□□□□□□□
While that does sound a bit extreme, it's always wise to bring a device into service gracefully.
You generally don't want to plug a device in, turn up its interfaces, and boom, it's forwarding traffic. What if a patch cable was damaged during the swap out? You may get a whole load of TCP retransmits and severely limit traffic flow. Policy got fluffed when restoring config? You could have routing loops or other undesirable behaviour. There are many issues that could occur.
Generally a good way of going about things is something like the following:
Bring up interfaces. Run a few pings across, make sure no errors on any ports.
Bring up IGP with high metrics on all links. Ensure all adjacencies are established and that all routes are being distributed/learnt.
Bring up BGP. Ensure all neighbours establish, routes are being distributed.
Drop IGP metrics back to what is normal.
Restore any FHRP that are usually master on this box.
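The steps above can be sketched as a gated checklist. This is only an illustration in Python under stated assumptions: `run_restoration` is a hypothetical helper, and the check functions are placeholders that return fixed values instead of talking to any device. The point it shows is the ordering, where each stage has to pass verification before the next one is attempted.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

def run_restoration(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failed check.

    Returns the names of the stages that completed.
    """
    completed = []
    for name, check in stages:
        if not check():
            print(f"STOP: '{name}' failed verification; do not proceed")
            break
        print(f"ok: {name}")
        completed.append(name)
    return completed

# Placeholder checks: real ones would inspect interface counters,
# adjacencies, BGP sessions and so on. The BGP failure is simulated.
stages = [
    ("interfaces up, pings clean, no port errors", lambda: True),
    ("IGP up with high metrics, adjacencies and routes verified", lambda: True),
    ("BGP up, neighbours established, routes exchanged", lambda: False),
    ("IGP metrics restored to normal", lambda: True),
    ("FHRP master role restored", lambda: True),
]

done = run_restoration(stages)
```

With the simulated BGP failure, the run stops after the IGP stage, so the metrics are still high and the box is not yet attracting traffic when the problem shows up.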
YMMV, but that is roughly the kind of plan I would follow. It always pays to be safe.
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
daveyb wrote: » While that does sound a bit extreme, it's always wise to bring a device into service gracefully.
You generally don't want to plug a device in, turn up its interfaces, and boom, it's forwarding traffic. What if a patch cable was damaged during the swap out? You may get a whole load of TCP retransmits and severely limit traffic flow. Policy got fluffed when restoring config? You could have routing loops or other undesirable behaviour. There are many issues that could occur.
Generally a good way of going about things is something like the following:
Bring up interfaces. Run a few pings across, make sure no errors on any ports.
Bring up IGP with high metrics on all links. Ensure all adjacencies are established and that all routes are being distributed/learnt.
Bring up BGP. Ensure all neighbours establish, routes are being distributed.
Drop IGP metrics back to what is normal.
Restore any FHRP that are usually master on this box.
YMMV, but that is roughly the kind of plan I would follow. It always pays to be safe.
All good advice. In the end, in the networking world there are multiple ways to skin a cat. I wanted to understand whether there is really a hard reason, other than caution, why some of my colleagues operate the way they do. It seems there is no hard reason other than caution, which I fully understand, because as I mentioned I work for a giant financial firm centered around a giant network. Network stability is huge for us. We are involved in almost any technical problem the firm faces, if only to prove the network is not at fault.
I too have seen my fair share of craziness. This is why, when we do restore equipment, it is generally done in the middle of the night when business traffic is at its lowest.
-
james43026 Member Posts: 303 ■■□□□□□□□□
networker050184 wrote: » Those people with more experience have probably seen devices comeback online and wreak havoc which is why they're so cautious. Either that or someone that has told them war stories.
I can't argue against this one. We all make mistakes, and there is nothing like not being cautious enough and causing an outage. I was just trying to make the point that some people use excessive caution as a way to mask the fact that they don't understand how something works. But you are correct that in networking, people who have learned to lean towards caution probably have a few battle scars.