Help needed for an interesting problem

jasonzwsa Member Posts: 8
Hello Everyone,

I am having a little trouble troubleshooting a problem at work.

We have a customer with two routers, Main and Backup. The two routers are connected to each other via iBGP.

The Backup router's uplink is an ADSL connection (MPLS). When pings are sent from the Backup router to the PE router, there is latency and packet loss.

bf-XXX-gre1-phi-am#ping
Protocol [ip]:
Target IP address: 62.X.X.X
Repeat count [5]: 500
Datagram size [100]: 1000
Timeout in seconds [2]:
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 500, 1000-byte ICMP Echos to 62.X.X.X, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!...!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!
!!!!!!!.!!!!!!!!!!!!!!!!.!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!..!!!!
!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!
!!.!!!!!!!!!!.!!!!!!!!!!!!!!!!!..!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!
!!!!!!!!!!
Success rate is 96 percent (481/500), round-trip min/avg/max = 20/35/1580 ms
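To check the summary line against the raw output, here is a minimal Python sketch that tallies the per-probe characters IOS prints. It assumes that '!' means a reply arrived within the timeout and that every other non-whitespace character ('.', 'U', and so on) is a failed probe; the function name ping_tally is hypothetical.

```python
def ping_tally(result_chars: str) -> tuple[int, int, int]:
    """Tally the per-probe characters printed by a Cisco IOS ping run.

    '!' is a reply received within the timeout; every other non-whitespace
    character ('.', 'U', ...) is counted as a failed probe.
    Returns (replies, probes_sent, success_percent).
    """
    probes = [c for c in result_chars if not c.isspace()]
    replies = probes.count("!")
    # Integer percentage, truncated here (an assumption about IOS rounding;
    # 481/500 gives 96 whether truncated or rounded).
    percent = replies * 100 // len(probes) if probes else 0
    return replies, len(probes), percent
```

Pasting the seven-and-a-bit lines of '!' and '.' above into ping_tally should give 481 replies out of 500, matching "Success rate is 96 percent (481/500)".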

When pings are sent from the Main router towards the Backup's PE router (that is, through the Backup router to the PE router), there are no problems:

bf-XXX-gre1-nyk-am#ping
Protocol [ip]:
Target IP address: 62.X.X.X
Repeat count [5]: 500
Datagram size [100]: 1000
Timeout in seconds [2]:
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 500, 1000-byte ICMP Echos to 62.X.X.X, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!
Success rate is 100 percent (500/500), round-trip min/avg/max = 16/19/76 ms

This is a very interesting problem. The IOS on both routers is the same: flash:c2800nm-spservicesk9-mz.124-3a.bin
Both routers are Cisco 2821.

Please help! I would greatly appreciate it.

Comments

  • dtlokee Member Posts: 2,378
    Many variables could be affecting this. Since the "main" router most likely has load on it, could it simply be that the packet loss is due to queue drops and/or policing, while the "backup" router, with no load on it, is not going to drop the pings for these reasons?

    I really need a little more information, perhaps some show command outputs.
    The only easy day was yesterday!
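A minimal Python sketch of the policing idea, assuming a simple single-rate token-bucket policer: a packet is forwarded only if the bucket holds enough byte tokens, otherwise it is dropped. The 64 kbit/s rate, 3000-byte burst and 50 ms spacing are hypothetical numbers chosen for illustration, not values from this customer's configuration.

```python
class TokenBucketPolicer:
    """Single-rate, single-bucket policer: a packet conforms if enough tokens
    (bytes) are in the bucket; otherwise it is dropped."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.fill_rate = rate_bps / 8    # token refill rate in bytes per second
        self.burst_bytes = burst_bytes   # bucket depth in bytes
        self.tokens = burst_bytes        # bucket starts full
        self.last_t = 0.0                # arrival time of the previous packet (s)

    def offer(self, t: float, size_bytes: int) -> bool:
        """Return True if a packet arriving at time t (seconds) is forwarded."""
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = t - self.last_t
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.fill_rate)
        self.last_t = t
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # exceeds: dropped, no tokens consumed


# Hypothetical: a 64 kbit/s policer with a 3000-byte burst, offered
# 1000-byte echoes every 50 ms (about 160 kbit/s).
policer = TokenBucketPolicer(rate_bps=64_000, burst_bytes=3_000)
verdicts = [policer.offer(i * 0.05, 1000) for i in range(100)]
print(sum(verdicts), "of", len(verdicts), "forwarded")
```

The first few echoes pass on the initial burst and later ones are dropped intermittently, which is the kind of scattered '.' pattern a policer can produce. The same policer forwards every echo when they are spaced 500 ms apart.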
  • jasonzwsa Member Posts: 8
    Hello dtlokee

    There is no load on either router, because the customer will not accept delivery of the service until we resolve the problem.

    The ping from the Main router via the Backup router (iBGP) towards the Backup's PE router (via BGP) works with no problems.

    The problem is when we ping the PE directly from the Backup CE. That is where the degradation is.

    What information do you need? Debugs, etc.?
  • jasonzwsa Member Posts: 8
    OK, so one thing is now clear.

    If I ping the PE from the Backup router with a timeout of 3600 s, there is no packet loss.

    Therefore we can conclude that the packet loss is due to response time: the replies do arrive, but some of them take longer than the default 2-second timeout.
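The reasoning above (a slow reply is reported as '.' even though nothing was lost) can be sketched in Python as a count of how many round-trip times exceed a given timeout. The RTT list is hypothetical; only the 24 ms and 14888 ms values appear in the output in this post.

```python
def split_by_timeout(rtts_ms: list[float], timeout_ms: float) -> tuple[int, int]:
    """Return (counted_as_replies, counted_as_timeouts) for a given ping timeout.

    A reply slower than the timeout is reported by IOS as '.', even though
    the packet itself was not lost.
    """
    late = sum(1 for rtt in rtts_ms if rtt > timeout_ms)
    return len(rtts_ms) - late, late


# Hypothetical RTT samples: only 24 ms and 14888 ms come from the output below.
samples = [24, 30, 2500, 40, 14888]
print(split_by_timeout(samples, 2_000))      # default 2 s timeout  -> (3, 2)
print(split_by_timeout(samples, 3_600_000))  # 3600 s timeout       -> (5, 0)
```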


    Diagnosis:

    (Backup router) bf-XXX-gre1-phi-am#ping 62.X.X.X size 1500 repeat 100 timeout 3600

    Type escape sequence to abort.
    Sending 100, 1500-byte ICMP Echos to 62.X.X.X, timeout is 3600 seconds:
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Success rate is 100 percent (100/100), round-trip min/avg/max = 24/810/14888 ms <<<<<<<<<<<<<< VERY HIGH


    (MAIN ROUTER) bf-XXX-gre1-nyk-am#ping 62.X.X.X size 1500 repeat 100 timeout 3600

    Type escape sequence to abort.
    Sending 100, 1500-byte ICMP Echos to 62.X.X.X, timeout is 3600 seconds:
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Success rate is 100 percent (100/100), round-trip min/avg/max = 24/25/28 ms

    (MAIN ROUTER) bf-XXX-gre1-nyk-am#traceroute 62.X.X.X

    Type escape sequence to abort.
    Tracing the route to 62.X.X.X

    1 192.168.100.98 0 msec 0 msec 0 msec <<<<<<<< LAN address of Backup router
    2 62.X.X.X [AS 65006] 8 msec * 8 msec

    Any suggestions?
  • Ahriakin Member Posts: 1,799
    Okay, excuse the ignorance here, but the idea that popped into my wee noggin involves much lower-level knowledge than I have. At what stage does a router reply to ICMP?

    Is it directly from the 'in' logical path on the interface? That is, just as an inbound ACL processes packets before they reach the routing engine, will a router respond to ICMP requests before any further processing, since it is such a simple and oft-used protocol (and the packet itself already gives it the correct exit interface and source IP)? Or does ICMP have to pass through the interface and be processed before replies are given?

    The reason I ask is that if ICMP needs to be processed internally and then sent back out (which I presume it does), then a processing issue on one router would show loss, while going in the reverse direction might not. It would look like a network issue, but really it's internal processing... if any of that makes sense? In fact, if there are a number of inbound ACL entries on one router, they may be bogging the traffic down enough, for whatever bug or reason, to slow some responses, even though they ultimately allow the reply traffic.

    My vague $0.02
    We responded to the Year 2000 issue with "Y2K" solutions...isn't this the kind of thinking that got us into trouble in the first place?
  • dtlokee Member Posts: 2,378
    Sounds like a service provider issue with that amount of jitter in the ping packets.

    14888 ms = 14.888 seconds; that's insane for a reply to take that long.

    I'd call the provider.
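To put a number on the jitter, here is a rough Python sketch. It borrows the running estimator from RFC 3550 (J += (|D| - J) / 16) but feeds it differences between consecutive round-trip times rather than one-way transit times, so the result is only an indicator; the function name rtt_jitter is hypothetical.

```python
def rtt_jitter(rtts_ms: list[float]) -> float:
    """Smoothed jitter estimate over successive RTT samples (ms).

    Uses the RFC 3550-style update J += (|D| - J) / 16, where D here is the
    difference between consecutive round-trip times (an approximation:
    RFC 3550 defines D over one-way transit times).
    """
    jitter = 0.0
    for prev, cur in zip(rtts_ms, rtts_ms[1:]):
        jitter += (abs(cur - prev) - jitter) / 16
    return jitter
```

A steady series such as the Main router's 24/25/28 ms keeps the estimate near zero, while a single jump from 24 ms to 14888 ms alone pushes it to 14864 / 16 = 929 ms.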
  • Fugazi1000 Member Posts: 145
    Try "clear counters", ping (from both ends), then run "show interfaces" and "show controllers" and post the results here.
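When comparing "show interfaces" output before and after the pings, a small Python helper can pull out the drop and error counters. The regular expressions assume the usual IOS wording ("Input queue: size/max/drops/flushes", "Total output drops", "input errors", "CRC", "output errors"), which can differ between platforms and releases, so treat the patterns as assumptions; the names PATTERNS and drop_counters are hypothetical.

```python
import re

# Assumed IOS "show interfaces" field wording; adjust for your platform/release.
PATTERNS = {
    "input_queue_drops": r"Input queue:\s*\d+/\d+/(\d+)/\d+",
    "output_drops": r"Total output drops:\s*(\d+)",
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
}


def drop_counters(show_int: str) -> dict[str, int]:
    """Extract drop/error counters from 'show interfaces' text.

    Counters whose pattern is not found are simply omitted.
    """
    found: dict[str, int] = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, show_int)
        if match:
            found[name] = int(match.group(1))
    return found
```

Run it on the output captured right after "clear counters" and again after the ping test; any counter that grows on the Backup router's WAN interface points towards queueing or line errors rather than slow ICMP replies.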