EIGRP Flapping...

CodeBlox Member Posts: 1,363
On our network at work, we have two remote sites peered through EIGRP. I'm not sure how long it's been happening, but the neighborship has been flapping. Running debug eigrp packet hello, I can see that the hellos (multicast) are sent and received just fine. The updates (unicast) are what appear not to work. The same debug shows them with the retry counter incrementing all the way up to 16, then the neighborship breaks and the routes are lost. The neighbors reconverge and it happens all over again. The neighbors are peered on SVIs. One thing I noticed is that on one of the neighbors the MTU is set to 1500, while on the other it's set to 1504. The site with the MTU of 1504 has QinQ in place, and I believe the larger MTU is necessary for that to work. Could this MTU mismatch cause the issue I am seeing? The neighborship forms but updates constantly retry. Q Cnt stays at 1 and RTO is 5000.
Currently reading: Network Warrior, Unix Network Programming by Richard Stevens
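
To put numbers on the 1500 versus 1504 question: each 802.1Q tag adds 4 bytes to an Ethernet frame, so a double-tagged (QinQ) frame is 4 bytes longer than the same frame with a single tag. A minimal Python sketch of that arithmetic follows. The function name and the assumption of a 1500-byte IP MTU on the customer side are illustrative and not taken from the actual configuration.

```python
# Rough frame-size arithmetic for 802.1Q and QinQ.
# Assumption: the customer IP MTU is 1500 bytes; names are illustrative.

ETH_HEADER = 14   # destination MAC + source MAC + EtherType
DOT1Q_TAG = 4     # one 802.1Q VLAN tag
FCS = 4           # frame check sequence


def frame_size(ip_mtu: int, vlan_tags: int) -> int:
    """Return the Ethernet frame size in bytes, header through FCS."""
    return ETH_HEADER + vlan_tags * DOT1Q_TAG + ip_mtu + FCS


untagged = frame_size(1500, 0)
single = frame_size(1500, 1)
qinq = frame_size(1500, 2)

print(untagged, single, qinq)
# The extra outer tag accounts for the 4 additional bytes on a QinQ link.
print(qinq - single)
```

On this reading, the extra 4 bytes are a layer 2 allowance for the outer tag; it does not by itself say the IP MTU on an SVI has to change.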

Comments

  • instant000 Member Posts: 1,745
    Hrm.

    Just to confirm that it is not something else, these neighbors can ping each other prior to setting up EIGRP, right?

    Have you tried configuring a neighbor command to get EIGRP to use unicast instead of multicast (in case some security guy is blocking the multicast traffic)?

    I hope these suggestions give you some ideas:

    1. confirm the neighbor is always reachable
    2. try unicast versus multicast

    Hrm ... my connection to cisco.com is down right now. I was going to look over the EIGRP FAQs to see if this issue had surfaced before, but I need to be getting to bed, so I can't lab this up right now. I thought EIGRP supported a neighbor statement for unicast, but my mind is in "go to bed" mode.

    I hope these ideas help, though.

    EDIT:

    I see that I MISREAD your post. Apparently, the unicast is what is not working!

    Disregard everything I said, and allow me to get some sleep. :D
    Currently Working: CCIE R&S
    LinkedIn: http://www.linkedin.com/in/lewislampkin (Please connect: Just say you're from TechExams.Net!)
  • CodeBlox Member Posts: 1,363
    No worries :D I am going to further investigate the QinQ config tomorrow... QinQ might be transparent to us, and if it is, I'm gonna remove the system-wide MTU setting, which requires a reboot of the core :P It might be a default setting, but I'm gonna set it to 1500 like the other end and see what happens.

    Additionally, I'm not sure if it's a reliable test, but I cannot ping the other end if I set the MTU to 1504 with the df bit set. I can otherwise ping it, though. The issue happens roughly every minute (16 update retries). I started a ping of 15000 packets and none of them dropped from end to end.
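
The "roughly every minute (16 update retries)" observation can be sanity-checked with a back-of-the-envelope timing model. The sketch below is only that: it assumes each retry waits 1.5 times the previous interval, that the interval is capped at 5000 ms, and that the neighbor is reset once 16 retries have gone unacknowledged. Those timer details are assumptions for illustration, not something confirmed in this thread.

```python
# Back-of-the-envelope timing for 16 unacknowledged EIGRP retransmissions.
# Assumptions (not from the thread): each retry waits 1.5x the previous
# interval, capped at 5000 ms, and the neighbor is reset after 16 retries.

RETRY_LIMIT = 16
RTO_CAP_MS = 5000
BACKOFF = 1.5


def time_to_reset_ms(initial_rto_ms: int) -> int:
    """Sum the waits that elapse before the retry limit is reached."""
    total = 0.0
    rto = float(initial_rto_ms)
    for _ in range(RETRY_LIMIT):
        rto = min(rto, RTO_CAP_MS)   # never wait longer than the cap
        total += rto
        rto *= BACKOFF               # back off before the next retry
    return int(total)


# RTO is already at the cap in the reported output (RTO 5000).
print(time_to_reset_ms(5000) / 1000, "seconds")
```

With the RTO already at 5000 the model gives about 80 seconds per cycle, which is in the same ballpark as the flap interval described above.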
  • instant000 Member Posts: 1,745
    A crazy idea I was wondering about was this: if the updates are sent from each end with the don't-fragment bit set, and they are larger than the smaller of the two MTUs, then they would never be received on the other end.

    Of course, if there was a packet capture, it would put this question to rest. I've not been able to find a document that simply stated this, and won't be able to lab (to prove this actually occurs) until later. [Have other "work" to do. LOL.]

    Let me know how it goes today. Quite curious now.
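
The don't-fragment idea can be written down as a small decision rule. Whether EIGRP updates actually carry the DF bit is not established anywhere in this thread, so the sketch below only shows what would follow if an oversized packet did have it set. The function name is made up for illustration, and sizes are whole IP packets in bytes.

```python
# Toy model of what one hop does with an IP packet that may exceed
# the outgoing link MTU. Sizes are whole IP packets in bytes.

def handle(packet_size: int, link_mtu: int, df: bool) -> str:
    """Return 'forward', 'fragment', or 'drop' for a single hop."""
    if packet_size <= link_mtu:
        return "forward"
    # Too big for the link: DF forbids fragmentation, so the packet is
    # dropped (normally with an ICMP "fragmentation needed" to the sender).
    return "drop" if df else "fragment"


print(handle(1504, 1500, df=True))    # oversized with DF set
print(handle(1504, 1500, df=False))   # oversized, fragmentation allowed
print(handle(1500, 1500, df=True))    # fits exactly
```

The first case matches the shape of the earlier ping test, where a 1504 ping with the df bit set did not get through while ordinary pings did.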
  • networker050184 Mod Posts: 11,962
    Sounds like an MTU mismatch could be the cause. You have the MTU set to 1504 on the SVI? You shouldn't need it there.
    An expert is a man who has made all the mistakes which can be made.
  • Mrock4 Banned Posts: 2,359
    Ding ding! I think MTU is your issue here at first glance.
  • phoeneous Member Posts: 2,333
    Along with mtu mismatch, take a look at your interface statistics.
  • vanquish23 Member Posts: 224
    We had EIGRP issues with the 4500 switches we use for VoIP/LAN in the office, and ended up restarting the switches to correct the issue.
    He who SYNs is of the devil, for the devil has SYN'ed and ACK'ed from the beginning. For this purpose, that the ACK might destroy the works of the devil.
  • networker050184 Mod Posts: 11,962
    You sound like an MS admin vanquish23. ;)
  • RouteMyPacket Member Posts: 1,104
    lulz
    Modularity and Design Simplicity:

    Think of the 2:00 a.m. test—if you were awakened in the
    middle of the night because of a network problem and had to figure out the
    traffic flows in your network while you were half asleep, could you do it?
  • CodeBlox Member Posts: 1,363
    Update on this... The carrier had an incident today, and now there is an issue in the carrier's layer 2 service that has caused the neighborship to totally break. Even though they're on the same subnet, they can't ping each other now :P Once they resolve that, I can continue to investigate this. I'd be surprised if this issue doesn't still exist after they resolve this outage. I say that because I called them last week about this and was told that there were no issues on their end.

    EDIT: They restored their service so I'll continue to troubleshoot when back in the office since the issue still exists.
  • RouteMyPacket Member Posts: 1,104
    Keep us updated, I would like to see what you find. Were you able to verify MTU?
  • jamesp1983 Member Posts: 2,475
    I'm anxious to hear about this as well...
    "Check both the destination and return path when a route fails." "Switches create a network. Routers connect networks."
  • CodeBlox Member Posts: 1,363
    This has been resolved... Two things here. First, somebody set the MTU to 1400 on one of the VLAN interfaces, which caused an issue with accessing certain websites: they either did not load at all or took a really long time. I found and removed this from the config and that cleared up. Second, I brought the interface down administratively to force traffic another way. Since it came back up, the issue is no longer happening. Very strange indeed, but Q Cnt is now 0 and RTO is 200 like normal. Testing with the MTU set back to 1400, I cannot replicate the EIGRP issue -_-

    I assure you though, it was happening for weeks, judging by the logs.
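
A quick way to see why a 1400-byte interface can break some websites: hosts that believe the path MTU is 1500 send full-size TCP segments, and those no longer fit through the smaller hop. The sketch below is only the arithmetic, assuming IPv4 and TCP headers of 20 bytes each with no options. It does not reproduce the actual interface configuration, and the names are illustrative.

```python
# Why a 1400-byte hop can hurt hosts that assume a 1500-byte path.
# Assumes IPv4 and TCP headers without options (20 bytes each).

IP_HEADER = 20
TCP_HEADER = 20


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - TCP_HEADER


host_mss = tcp_mss(1500)   # what hosts on a 1500-byte LAN would use
hop_mss = tcp_mss(1400)    # what the reduced interface can carry

print(host_mss, hop_mss)
print(host_mss - hop_mss)  # bytes per full-size segment that no longer fit
```

Small requests and responses still fit, which would be consistent with only certain sites hanging or loading slowly.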
  • instant000 Member Posts: 1,745
    Did I read that correctly?

    Putting the "if I was there" hat on:

    You really cannot afford to have customers suffering because Cisco Bob wanted to try that new "ip mtu" command that he learned in a YouTube video.

    Basically, it is not good to have changes occur without approval.

    It appears that your network has an issue with change control. You might want to confirm that you're logging configuration changes, so that you can catch the rogue admin in the act next time.

    Also, go ahead and let them know that the logging is enabled. This might discourage unauthorized changes in the future.
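
One low-tech way to catch an unapproved change is to diff saved configuration snapshots. The sketch below uses Python's standard difflib for that. The two config snippets, the VLAN number, and the IP address are invented for illustration, and `config_changes` is a hypothetical helper name.

```python
# Sketch: diff two saved configuration snapshots to spot an unapproved change.
# The config text below is made up for illustration.
import difflib

before = """interface Vlan10
 ip address 10.0.10.1 255.255.255.0
"""

after = """interface Vlan10
 ip address 10.0.10.1 255.255.255.0
 ip mtu 1400
"""


def config_changes(old: str, new: str) -> list:
    """Return only the lines that were added or removed between snapshots."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes additions with "+ " and removals with "- ".
    return [line for line in diff if line.startswith(("+ ", "- "))]


for change in config_changes(before, after):
    print(change)
```

Run on a schedule against stored backups, something like this at least shows what changed and roughly when, even if it cannot say who made the change.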

    Edit: I use the term Cisco Bob to poke fun at Microsoft Bob. In the old days when I took Microsoft tests, there was this guy Bob who was always having issues. Didn't everyone call him "Microsoft Bob"?

    Edit2: Wow, I tried to look up Microsoft Bob, and came upon this product that looked absolutely horrible. They tried to be user-friendly, but it looked like something for kids.
    http://toastytech.com/guis/bob.html
  • CodeBlox Member Posts: 1,363
    Lol! I have an idea of who dun it, the logs show me that too in Orion. I couldn't agree more about the change management part. Folks have brought up that problem here actually. We very recently started a change management process. Problem with that is, we are a small shop and usually the person submitting the request is the ONLY SME on the subject of their change. Only thing other folks COULD do is go "Ok...?"

    I learned something interesting about HTTPS traffic in all of this... It comes with the don't-fragment bit set in the IP header.
  • Mrock4 Banned Posts: 2,359
    If you didn't already, read up on Path MTU discovery (PMTUD) - it applies to your situation. It's a good read for future reference and it goes in line with the fragmentation bit you mentioned:

    Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC - Cisco Systems

    Glad the problem is resolved.
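
For readers who want the gist of PMTUD before reading the linked document, here is a minimal simulation. It assumes the sender sets DF on every packet and that each hop that cannot forward a packet reports its own MTU back in an ICMP "fragmentation needed" message. The hop MTUs and the function name are hypothetical.

```python
# Minimal simulation of Path MTU Discovery (PMTUD) for one sender.
# The hop MTUs are hypothetical. Real PMTUD relies on ICMP
# "fragmentation needed" messages that carry the next-hop MTU.

def discover_path_mtu(start_size, hop_mtus, icmp_delivered=True):
    """Return the packet size that finally fits, or None for a black hole."""
    size = start_size
    while True:
        # Find the first hop whose MTU is smaller than the packet.
        blocking = next((m for m in hop_mtus if m < size), None)
        if blocking is None:
            return size      # the DF packet reaches the destination
        if not icmp_delivered:
            return None      # sender never learns why packets vanish
        size = blocking      # shrink to the MTU reported by that hop


print(discover_path_mtu(1500, [1500, 1400, 1500]))
print(discover_path_mtu(1500, [1500, 1400, 1500], icmp_delivered=False))
```

The second call models the failure mode where the ICMP messages are filtered somewhere on the path: large packets disappear silently while small ones keep working.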