fredrik's thread

fredrikjj · May 2014

I'm moving my OSPF timers posts to my own thread for safekeeping.

Obscure OSPF Timers, part 1

timers pacing lsa-group

Every LSA has a maximum age of 60 minutes, and if that age is reached, it is removed from the database. To prevent this from happening, each LSA is reflooded by its originator every 30 minutes. Early in OSPF's history in IOS, all LSAs were refreshed at the same time, which caused a spike in CPU and bandwidth usage every 30 minutes and could overload weaker routers. To avoid that, each LSA is now refreshed based on its own individual age. However, treating each LSA completely independently would be inefficient, because the router would be constantly processing OSPF packets while sending very little data in each one. Instead, LSAs that are ready to be refreshed are grouped together and sent all at once according to the group pacing timer. The default is 240 seconds. Sources say that you could lower this timer if your LSDB is so massive that too many LSAs would accumulate for refresh within each 240-second window.

PS.
This timer also controls how LSAs are checksummed. I assume that this is to spread the CPU load from that activity.
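To make the grouping idea concrete, here is a rough Python sketch of how LSAs with different ages could end up in the same refresh batch. It is only a toy model: the LSA names are made up, and I'm assuming an LSA is sent at the first pacing tick at or after it becomes due, which may not be exactly what IOS does.

```python
from collections import defaultdict

LS_REFRESH_TIME = 1800  # seconds; LSAs are refreshed every 30 minutes
GROUP_PACING = 240      # seconds; default of "timers pacing lsa-group"


def refresh_batches(origination_times, pacing=GROUP_PACING):
    """Group LSAs by the pacing tick at which they would be refreshed.

    origination_times maps a (hypothetical) LSA name to the second at which
    it was originated. Assumption: an LSA becomes due at
    origination + LS_REFRESH_TIME and is sent at the first pacing tick at or
    after that moment.
    """
    batches = defaultdict(list)
    for lsa, born in origination_times.items():
        due = born + LS_REFRESH_TIME
        tick = -(-due // pacing) * pacing  # round up to the next pacing tick
        batches[tick].append(lsa)
    return dict(sorted(batches.items()))


# Hypothetical LSAs originated at slightly different times.
lsas = {"router-1": 0, "router-2": 10, "net-1": 100, "router-3": 300, "ext-1": 310}
for tick, names in refresh_batches(lsas).items():
    print(f"t={tick}s refresh {names}")
```

With a smaller pacing value the same LSAs are spread over more, smaller batches, which is the trade-off the timer controls.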

Obscure OSPF Timers, part 2

timers pacing flood

This timer is conceptually similar to the previous one, except that it deals with how LSAs are sent out interfaces in a more general sense. Each interface has a list of LSAs that are to be sent out (the 'flood list'), and instead of sending as soon as possible, LSAs are grouped together and sent every 33 ms by default. This is a CPU and bandwidth optimization when multiple LSAs need to be sent: if each LSA were sent immediately, it would need its own OSPF packet and IP packet, and those packets would be very small.

When would you change this timer? If you have a need to speed up convergence and you don't anticipate that lowering the timer would cause CPU issues on your routers. Remember, when an event happens that requires OSPF to converge (i.e. run SPF), the LSA that has been changed must be flooded throughout the area. The 33 ms pacing on each interface would add up. The minimum value is 5 ms. I imagine that it would be safe to run the minimum value on a modern router, but I really have no idea.

edit: The pacing doesn't apply to the first LSA, and only kicks in if another LSA needs to be sent right after it. Tuning this value for convergence purposes thus doesn't seem as important as I previously thought.
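Here is how I picture the flood pacing behaviour, as a small Python sketch. The model is my own assumption based on the description above (first LSA goes out immediately, anything queued during the pacing interval is bundled when the timer expires), and the LSA names and arrival times are invented.

```python
def flood_schedule(arrivals_ms, pacing_ms=33):
    """Return a list of (send_time_ms, [lsa, ...]) packets for one interface.

    arrivals_ms: list of (time_ms, lsa_name) tuples sorted by time.
    Assumed model: an LSA queued while the interface is idle is sent at once
    and starts the pacing timer; LSAs queued before the timer expires wait
    and are sent together when it does.
    """
    packets = []
    next_allowed = 0  # earliest time the interface may send again
    pending = []      # LSAs waiting for the pacing timer
    for t, lsa in arrivals_ms:
        # Flush whatever was waiting on a timer that has expired by now.
        if pending and t >= next_allowed:
            packets.append((next_allowed, pending))
            next_allowed += pacing_ms
            pending = []
        if t >= next_allowed:
            packets.append((t, [lsa]))       # idle interface: send immediately
            next_allowed = t + pacing_ms
        else:
            pending.append(lsa)              # timer running: wait for the batch
    if pending:
        packets.append((next_allowed, pending))
    return packets


events = [(0, "A"), (5, "B"), (10, "C"), (100, "D")]
for send_time, lsas in flood_schedule(events):
    print(f"t={send_time} ms: send {lsas}")
```

In this model a single changed LSA is never delayed, which matches the edit above: the pacing only matters when several LSAs hit the flood list in quick succession.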

Obscure OSPF Timers, part 3

timers pacing retransmission

This timer has the same function as the previous one, but it specifically groups retransmissions of unacknowledged LSAs. It defaults to 66 ms, but it's less obvious to me when you would want to lower this one, and what the effect would be. It wouldn't really have a major effect on convergence because the limiting factor here is the time from when the LSA was first sent to when the router decided that it is unacknowledged (retransmit timer). If you actually have unacknowledged LSAs, you probably have an issue in your network and probably don't want to retransmit a bunch of tiny OSPF packets.

Obscure OSPF Timers, part 4

interface command, ip ospf transmit-delay

I imagine that this is the most useless timer in modern OSPF. Essentially, it adds a set time to the age of all LSAs that are sent out the interface. The idea is that if you have an impossibly slow link, setting this timer allows you to take into account the time it actually takes to send the LSA. Even if you do have a slow link and it takes a few seconds to send the LSA, I don't see what adding a few seconds to the age would accomplish.
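A quick Python sketch of the arithmetic, just to show how small the effect is. The function name is mine, and the 1 second default is what I believe the command defaults to.

```python
MAX_AGE = 3600  # seconds, the 60 minute maximum age mentioned in part 1


def age_after_hop(ls_age, transmit_delay=1):
    """LS age carried in the LSA after it is sent out an interface."""
    # The age is bumped by the interface's transmit delay, never past MaxAge.
    return min(ls_age + transmit_delay, MAX_AGE)


for delay in (1, 5, 30):
    print(f"transmit-delay {delay:2d}s: age 100 -> {age_after_hop(100, delay)}, "
          f"{delay / MAX_AGE:.2%} of the LSA lifetime per hop")
```

Even a deliberately exaggerated 30 second delay is under one percent of the LSA's lifetime per hop.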

Obscure OSPF Timers, part 5

interface command, ip ospf dead-interval minimal hello-multiplier (aka. "fast hellos")

This probably isn't particularly obscure, but I decided to add it to the list anyway. The normal OSPF hello timer has a minimum value of 1 second. Using the dead-interval minimal command sets the dead interval to 1 second, and the hello-multiplier specifies how many hellos are sent within this one second. What's strange about this feature is that you are limited to a one second dead interval, but you are allowed to send a hello every 50 ms with the maximum hello-multiplier (20). Why is this strange? Well, why would you need to refresh a particular hold time that often? The point of sending multiple keepalives within a particular hold time is to prevent your neighbor relationship from going down if one, two or three keepalives are lost in transit. If you lose more than that, your link is probably of such poor quality that the neighbor should go down anyway. Feel free to inform me why you would ever send 20 hellos per second with a 1 second dead interval.
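The numbers are easy to tabulate with a few lines of Python. The lower bound of 3 in the range check is my assumption; the only value stated above is the maximum of 20. The "lost hellos" figure is a rough estimate, not something taken from the documentation.

```python
def fast_hello_interval_ms(multiplier):
    """Hello interval implied by 'ip ospf dead-interval minimal hello-multiplier N'."""
    # Assumption: accepted values are 3..20 (only the maximum is stated above).
    if not 3 <= multiplier <= 20:
        raise ValueError("hello-multiplier assumed to be in the range 3..20")
    return 1000 / multiplier  # the dead interval is fixed at 1 second


for m in (3, 5, 10, 20):
    print(f"multiplier {m:2d}: hello every {fast_hello_interval_ms(m):.0f} ms, "
          f"roughly {m - 1} consecutive hellos can be lost before the neighbor dies")
```

At a multiplier of 20 the neighbor tolerates something like 19 lost hellos in a row, which is the part that looks excessive to me.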

fredrikjj · May 2014

Obscure OSPF Timers, part 6

timers throttle lsa all <start-interval> <hold-interval> <max-interval>

An issue with routing protocol convergence is that while you want the protocol to react quickly to legitimate changes in network conditions, you don't want the protocol to get overwhelmed by very rapid changes to the same link. One example is a link flap that causes a router to repeatedly change its router LSA. It makes sense to react quickly to the first change because it could be a "real" change, but as changes to the same LSA keep coming, it becomes less and less likely that this is the case.

The crudest way of implementing this protection would be to simply delay the generation of the LSA in question for several seconds in order to let things stabilize. For example, a new version of an LSA can only be generated every 5 seconds. The throttle lsa feature is a bit more sophisticated however.

We have three values: start-, hold- and max-interval

With the default start-interval value (0), the first LSA is generated as soon as a change happens to it. This very fast initial reaction is what we are looking for in order to allow for fast reconvergence around the change. It could still be beneficial to set this to a small but nonzero value so that simultaneous link failures are reflected in a single LSA.

Now things get a bit tricky. After the first LSA, the same LSA cannot be generated until the hold-interval expires. If nothing happens within the hold-interval, we are back to square one. However, if a new version of the LSA must be generated and the hold-interval hasn't expired, it is delayed until hold-interval expires, and the next hold-interval is 2xhold-interval. Again, if nothing happens in that period, we're back to the beginning, but if there is another change, it is delayed until this double hold-interval expires, and the next hold is 4xhold-interval. Hopefully you get the picture here; for every change, there's an exponential increase in the delay of the LSA generation.

If there continue to be changes within the exponentially increasing hold time, we eventually reach the max-interval, which is the maximum amount of time there can be between LSAs. If we reach this point, there must be no change to the guilty LSA within this period in order for OSPF to return to normal operation.

Let's take a look at the default values for this feature in 15.4:

start-interval: 0
hold-interval: 5000 (ms)
max-interval: 5000 (ms)

To me, the hold-interval looks somewhat problematic. For example, if there's a quick down/up event on a link, it would take 5 seconds to recover since the second event (the 'up') would be delayed for 1xhold-interval. It seems to me that you would want to drastically lower the hold-interval while maintaining a fairly long max-interval to protect against a longer series of rapid changes. Thoughts on this?
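To sanity check my reading of the backoff, I sketched it in Python. This follows the description in this post, not any IOS source, so treat the exact reset behaviour as an assumption. The change times and the tuned values (200 ms hold) are invented for illustration.

```python
def lsa_generation_times(changes_ms, start=0, hold=5000, max_wait=5000):
    """Times (ms) at which new versions of one LSA are generated.

    Assumed model, taken from the description above:
    - a change after a quiet hold window is generated after `start` ms;
    - a change inside the current hold window is delayed until the window
      expires, and the next window is doubled, capped at `max_wait`.
    """
    gens = []    # generation times, including ones scheduled in the future
    wait = hold  # length of the current hold window
    for t in sorted(changes_ms):
        if gens and t <= gens[-1]:
            continue  # a generation at or after t already covers this change
        if not gens or t >= gens[-1] + wait:
            wait = hold               # quiet period elapsed: back to square one
            gens.append(t + start)
        else:
            gens.append(gens[-1] + wait)      # delayed until the window expires
            wait = min(wait * 2, max_wait)    # exponential backoff
    return gens


# Quick down/up flap 100 ms apart with the defaults: the "up" waits 5 seconds.
print(lsa_generation_times([0, 100]))
# Same idea with a short hold and a long max-interval (hypothetical tuning).
print(lsa_generation_times([0, 100, 250, 300, 700, 5000], hold=200, max_wait=5000))
```

With the defaults the second event isn't advertised until t=5000 ms, while the tuned values advertise it at 200 ms and only back off (600, 1400, ...) if the flapping continues.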

sources/more reading:
IP Routing: OSPF Configuration Guide, Cisco IOS Release 15M&T - OSPF Link-State Advertisement Throttling [Cisco IOS 15.4M&T] - Cisco
OSPF Fast Convergence
Tuning OSPF Performance

fredrikjj · May 2014

Obscure OSPF Timers, part 7

timers lsa arrival <milliseconds>

This timer is related to the previous lsa throttle command, and it puts a strict constraint on how often a particular LSA can be accepted from another router. There seems to be no real purpose to this timer since lsa generation is already limited by lsa throttle. The configuration guide recommends setting it to a lower value than the lsa throttle hold-interval which makes sense because if it's set higher, you would risk dropping LSAs.

Important or not, you must still be aware of this timer because if you are tuning the lsa throttle to have a smaller hold-interval, you would also have to lower this timer from its default of 1,000 ms.
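A small Python sketch of why the two timers need to be tuned together. The acceptance rule is my simplified reading (a new instance of the same LSA is ignored if it arrives sooner than the arrival timer after the last accepted one), and the generation times are hypothetical output from a neighbor with a 200 ms hold-interval.

```python
def accepted_instances(arrivals_ms, min_arrival=1000):
    """Arrival times (ms) of instances of one LSA that would be accepted."""
    kept = []
    for t in sorted(arrivals_ms):
        # Assumed rule: drop instances arriving too soon after the last accepted one.
        if not kept or t - kept[-1] >= min_arrival:
            kept.append(t)
    return kept


neighbor_generates = [0, 200, 600]  # hypothetical, neighbor tuned to hold=200 ms
print(accepted_instances(neighbor_generates))                   # default 1000 ms
print(accepted_instances(neighbor_generates, min_arrival=100))  # lowered to match
```

With the default arrival timer only the first instance is accepted in this model, so lowering the throttle hold-interval on its own wouldn't buy you anything.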

fredrikjj · May 2014

Obscure OSPF Timers, part 8

timers throttle spf <start> <hold> <max-wait>

Our adventure in the underbelly of OSPF is starting to get interesting because by default the spf start time is 5 seconds. Let that sink in for a moment; after receiving an LSA that triggers a recalculation of the SPF tree the router waits 5 seconds before it actually performs the calculation. Until the new calculation is performed, the RIB will not be updated and consequently neither will the FIB. I feel like someone should have told me this at a much earlier point than today, but now I know.

The throttle spf timer follows the same general principles as throttle lsa that I covered in part 6 of this series. You want your routers to respond quickly to an event, but not get overwhelmed when faced with a rapidly oscillating network.

So, should you just set the start time to its minimum value (1 ms) and be done with it? Not quite. Lapukhov points out in one of his articles that you want to flood the LSA and then run SPF. That is, flooding the LSA to neighbors should be of higher priority than locally running SPF. Instead of 1 ms, we're talking about values in the 10 ms range for the spf start time. You would then want to set the hold-time to a value that allows the network to converge before allowing another SPF run to be triggered. This would depend on factors such as SPF run time and RIB/FIB update time. Max-wait time would be set based on the CPU of your routers and what kind of CPU load would be acceptable during a worst case scenario LSA flood.

fredrikjj · May 2014

Obscure OSPF Timers, part 9

interface command, ip ospf retransmit-interval

This timer determines how long OSPF will wait for acknowledgement of a sent LSA before retransmission. The default is 5 seconds. The curious thing about this timer is that the minimum is quite high at a full one second. The conclusion I draw from that is that if your LSAs get lost in transit, you're in big trouble in terms of getting OSPF to react quickly. Tuning SPF to run after 10 ms, for example, isn't exactly helping you if it takes a full extra second before the router actually gets the LSA. In other words, running OSPF over links where packet loss is a possibility could greatly increase convergence time if you are unlucky, and you're quite limited in what you can do about it since retransmit-interval has a lower bound of one second.

I think that should be it, and I'll finish up this series in my next post with some kind of summary unless I manage to dig up yet another timer from the command reference later today.
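To put rough numbers on that, here is a back-of-the-envelope Python sketch. The hop count and the per-hop flooding delay are invented; only the retransmit values (5 s default, 1 s minimum) and the 10 ms SPF start come from these posts.

```python
def time_until_spf_ms(hops, per_hop_ms, spf_start_ms, lost_once=False, retransmit_s=5):
    """Rough delay from the topology change until a remote router starts SPF."""
    delay = hops * per_hop_ms + spf_start_ms
    if lost_once:
        # One lost LSA costs a full retransmit interval before it is resent.
        delay += retransmit_s * 1000
    return delay


# Hypothetical path: 3 hops at 2 ms each, SPF start tuned to 10 ms.
print(time_until_spf_ms(3, 2, 10))                                  # no loss
print(time_until_spf_ms(3, 2, 10, lost_once=True, retransmit_s=1))  # minimum interval
print(time_until_spf_ms(3, 2, 10, lost_once=True))                  # default interval
```

A single lost LSA turns 16 ms into a bit over a second at best, and about five seconds with the default, which dwarfs anything gained from the SPF tuning.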

PS.
Can't be bothered to make a summary of the summaries

fredrikjj · May 2014

I have a blog now (fredrikjj), which is why I haven't been as active in this thread. Check it out and give me some feedback. I need a good name more than anything.

fredrikjj · May 2014

I've moved on from OSPF for the time being to focus on BGP. Spent yesterday and today covering the first 120 pages or so in Routing TCP/IP Vol 2. I encountered a couple of new attributes and concepts here and there, but nothing major. I expect the real hard work with BGP to begin when I start on BGP Design and Implementation. I'm probably skipping Internet Routing Architectures because no one seems to object when I say that I will, and it's not on Lapukhov's reading list (I'm his number one fan, lol).

fredrikjj · June 2014

A quick status update.

- Reviewed BGP in Routing TCP/IP vol 2
- Solved the BGP section in INE's workbook.
- Started on BGP Design and Implementation

I'm writing summaries on the chapters in the design book so if you're curious about what that book is all about, check out my blog. Link's a few posts up. It's also starting to sink in how ridiculous this cert is in terms of how much material you need to learn.

lrb · June 2014

Nice blog. The study seems to be going very well! Keep up the awesome work

fredrikjj · June 2014

lrb wrote: »

Nice blog. The study seems to be going very well! Keep up the awesome work

Thanks

fredrikjj · June 2014

I wrote a post on BGP QoS Policy Propagation today. It's probably not something that people in general are super familiar with, so please go check it out and give me some feedback: BGP Design & Implementation Part 7 – QoS Policy Propagation | A Networking Blog

And if you in fact are an expert on this feature, please point out any mistakes that I've made.

creamy_stew · June 2014

Great discussion! I find it slightly bothersome that I'm having problems following along, though. I mean, I did pass ROUTE...

Sometimes, I feel like I'm forgetting more stuff than I'm learning.

harish.80k@gmail.com · June 2014

>>>>>>>>>>>>>>>>>>>>>>>

In the topology we don't have a direct link b/w R4 and R3 anyway, and hence the traffic will have to go via R5 even if the best path is advertised from R3 as mentioned. The moment the packet to the desired n/w reaches R5, should it not select the intra-area route to the n/w 163.X.12.0/24 via R2?

harish.80k@gmail.com · June 2014

Pls ignore my question as I missed the vc b/w R4 and R3 ... tired eyes. Apologies

fredrikjj · June 2014

I've now finished a series on using BGP in the enterprise core based on chapter 5 in BGP Design and Implementation. Check them out and if you have an opinion, don't be afraid to share it.

BGP Design & Implementation Part 8 – iBGP Enterprise Core | A Networking Blog
BGP Design & Implementation Part 9 – eBGP Enterprise Core | A Networking Blog
BGP Design & Implementation Part 10 – eBGP And iBGP Enterprise Core | A Networking Blog

tomtom1 · June 2014

creamy_stew wrote: »

Great discussion! I find it slightly bothersome that I'm having problems following along, though. I mean, I did pass ROUTE...

Sometimes, I feel like I'm forgetting more stuff than I'm learning.

Same here though

Guess that's what the CCIE does.

fredrikjj · July 2014

Still studying. I've recently started using the Pomodoro Technique after someone on this forum made me aware of it. While I'm not necessarily studying more, I'm more effective when I actually do study. I'm still working my way through BGP Design and Implementation and blogging about it, something that has turned out to be very time consuming. The good news is that I'm probably (hopefully) much better at BGP than a month ago.

creamy_stew wrote: »

Sometimes, I feel like I'm forgetting more stuff than I'm learning.

I know that feeling. It's very difficult to remember every detail of topics that you are no longer actively studying.

fredrikjj · July 2014

Update!

Wrote a few more posts on BGP:

BGP Design & Implementation Part 12 – Route Reflection | Fredrik's Networking Blog
BGP Design & Implementation Part 13 – Confederation | Fredrik's Networking Blog
BGP Design & Implementation Part 14 – Service Provider Architecture Fundamentals | Fredrik's Networking Blog
BGP Design & Implementation Part 15 – Service Provider Community Design Overview | Fredrik's Networking Blog
BGP Design & Implementation Part 16 – A Few BGP Security Considerations | Fredrik's Networking Blog

The BGP book that I'm using as a source is pretty amazing. While Routing TCP/IP Vol 2 was just more of the same "here's a command and this is what it does" that I was used to from CCNP, BGP Design and Implementation actually tries to explain how these different building blocks are used.

gorebrush · July 2014

fredrikjj wrote: »

Update!

Wrote a few more posts on BGP:

BGP Design & Implementation Part 12 – Route Reflection | Fredrik's Networking Blog
BGP Design & Implementation Part 13 – Confederation | Fredrik's Networking Blog
BGP Design & Implementation Part 14 – Service Provider Architecture Fundamentals | Fredrik's Networking Blog
BGP Design & Implementation Part 15 – Service Provider Community Design Overview | Fredrik's Networking Blog
BGP Design & Implementation Part 16 – A Few BGP Security Considerations | Fredrik's Networking Blog

The BGP book that I'm using as a source is pretty amazing. While Routing TCP/IP Vol 2 was just more of the same "here's a command and this is what it does" that I was used to from CCNP, BGP Design and Implementation actually tries to explain how these different building blocks are used.

Thanks for the suggestion of the book. I've been going between the INE videos, Internet Routing Architectures, and well I read Routing TCP/IP vol2 a long time ago.

I like the BGP Design and Implementation guide because as you say, it helps translate the real world requirements of these features into "why" you would use them, and some texts seem to miss this point.

I've been spending the last 2 days on BGP and feeling like I've gotten nowhere but reading that book is helping cement down all the basics again. Honestly, I'd buy you a beer for that!

fredrikjj · July 2014

Wrote a post on MPLS VPN with BGP as the PE-CE and using as-override and what can happen in that scenario if a bad network admin makes some weird changes, and how to fix it with SOO.

https://fredrikjj.wordpress.com/2014/07/26/bgp-design-implementation-part-18-mpls-vpn-bgp-loop-prevention/

The loops I've created originally relied on local preference modification, but someone pointed out that this wouldn't really work and that the "bad admin" in question would have to modify weight to above 32,768 instead. This makes the scenarios a lot less "realistic" than I had originally anticipated, but they are still somewhat plausible. It'll make sense when you read it.

creamy_stew · July 2014

Your company is lucky to have you. Do you work for a provider, or is it something like Eltel or Dotcom?

fredrikjj · July 2014

creamy_stew wrote: »

Your company is lucky to have you. Do you work for a provider, or is it something like Eltel or Dotcom?

That's a great compliment, thanks

I don't work in networking at this point.

fredrikjj · July 2014

I've now finished BGP Design and Implementation except for the multicast chapter, which I don't think I can handle at this point since the only multicast I've studied is a chapter in Comer's Internetworking with TCP/IP book. While I've spent most of my time on BGP lately, I've also read sections of Routing TCP/IP Vol 1, recently starting the IS-IS chapter. It's probably because I haven't touched IS-IS at all before, but that chapter definitely feels harder and denser than the other chapters in the book (I've read everything except that one). My plans for the next month or so are IS-IS, IPsec VPNs and DMVPN, and hopefully I can maintain my current pace.

Dieg0M · July 2014

In what do you work Fredrik?

fredrikjj · July 2014

I'm unemployed.

ninjaturtle · July 2014

That's a very good article. You need to move to the US mate, tons of companies could use your skills out here. Is it hard to find jobs right now in Sweden?

fredrikjj · July 2014

ninjaturtle wrote: »

That's a very good article.

Thanks!

You need to move to the US mate, tons of companies could use your skills out here.

I'm sure they could but it doesn't really work like that.

Is it hard to find jobs right now in Sweden?

I doubt my unemployment reflects some systematic weakness in the Swedish market for networking professionals. At the end of the day, just because I can read some books and write a few articles doesn't mean that I'm particularly employable on paper to people who screen resumes. I've only really studied this stuff for around a year now, and if it takes another year or two until my skill set is wide and deep enough, that's fine by me since I know that this is what I want to do for a living.

Dieg0M · July 2014

How many years of experience do you have in networking? Don't you think it would be better studying part time while working and getting experience?

fredrikjj · July 2014

Dieg0M wrote: »

How many years of experience do you have in networking?

I have two years of experience in networking. However, that was a few years ago now and probably more or less irrelevant from a resume point of view.

Don't you think it would be better studying part time while working and getting experience?

Yes, that would be much better obviously, but the second best alternative is continuing to study as much as I can.

creamy_stew · July 2014

fredrikjj wrote: »

Thanks!

I'm sure they could but it doesn't really work like that.

Check out H-1B visas. They have tightened the requirements, so I doubt a CCNP alone qualifies, but it might be good to know the requirements if it's something you could see yourself doing. Moving to the US, that is.
