6500 issue

nel Member Posts: 2,859
Hi Guys,

I'm hoping someone can help with an issue that has mysteriously appeared on the 6509s at one of our offices. It happened out of the blue and was not the result of any work going on.

Firstly, there are two 6509s (SW1 and SW2) with the usual VLAN routing, EtherChannel, HSRP etc., running EIGRP. Yesterday, out of nowhere, the two could no longer ping each other on certain IP ranges using the default gateway (DG) for that subnet. For example, half of the HSRP virtual IPs are active on SW1 and the other half on SW2. Now SW2 cannot ping virtual IPs on SW1, and SW1 cannot ping certain IPs on SW2.

A big issue is that, for some reason, each of our access switches has only one link, to only ONE of the 6500s. Don't ask, it was a guy before me! I'm looking to rectify this, but there's a LOT of faulty fibre so it's not possible at the moment. A disaster like this has been waiting to happen because no one has solid knowledge of this site. So any access switch connected to SW2 can't use VLAN routing to reach servers etc. when the traffic goes through SW2 (because SW2 can't ping the DG situated on SW1).

We've cleared all the usual caches. The configs have not changed at all, we've confirmed the HSRP groups are configured correctly, and there are no error messages ANYWHERE on the console or from some debug commands. The status lights on the 6500 chassis are green and OK, we've reseated the fibre connections on each switch, and the EtherChannel is showing as OK. When you run show ip route for a DG IP, it knows to go through the EtherChannel. When you run sh cdp neigh on the switch module it can't see SW2 as a neighbour, but if you go into the supervisor it can see SW2's supervisor IP as a CDP neighbour. Also, all interfaces are showing as up/up.

We also failed over HSRP by forcing the priority over to SW1 (which was having fewer issues than SW2), and when we did that we could no longer see SW2. It was like it fell off the earth, never to return. I changed the priorities back and it went back to its original state. We did the same on SW2 and the same thing happened.
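
For anyone following the failover test, here is a minimal Python sketch of what changing the priority is expected to do. It models a fresh HSRP election (highest priority wins, highest interface IP breaks a tie) and derives the well-known HSRP version 1 virtual MAC for a group, which is the MAC the ARP entry for a virtual IP should point at. The router names, addresses and group number are made up for illustration, and HSRPv1 is an assumption, as the thread does not say which version is in use. Note that without preempt configured, a router that is already active keeps the role even if a higher-priority peer appears; the sketch does not model that.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class HsrpRouter:
    name: str            # hypothetical label, e.g. "SW1"
    interface_ip: str    # real (non-virtual) address of the VLAN interface
    priority: int = 100  # 100 is the HSRP default priority


def elect_active(routers):
    """Return the winner of a fresh election: highest priority,
    then highest interface IP address as the tie-breaker."""
    return max(routers, key=lambda r: (r.priority, IPv4Address(r.interface_ip)))


def hsrp_v1_virtual_mac(group):
    """Return the HSRPv1 virtual MAC (0000.0c07.acXX, XX = group in hex)."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group numbers run from 0 to 255")
    return f"0000.0c07.ac{group:02x}"


# Illustrative values only, not taken from the real configuration.
sw1 = HsrpRouter("SW1", "10.0.4.2", priority=110)
sw2 = HsrpRouter("SW2", "10.0.4.3", priority=100)

print(elect_active([sw1, sw2]).name)  # SW1
print(hsrp_v1_virtual_mac(4))         # 0000.0c07.ac04
```

If the ARP entry a switch holds for a virtual IP does not match the MAC computed this way for that group, that would be one place to start looking.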

Both cores have now been rebooted. It wasn't my idea, but it was done to put people's minds at rest as they had been up for around 4 years. Anyway, they came back up and the problem is still the same.

I've gone round and tested a lot of the client machines and found that, regardless of which core switch the access switches go through, VLAN 4 can't route off its subnet. As a temporary measure we've moved all VLAN 4 clients into VLAN 3 and they work OK. It doesn't seem to be a routing issue, though, as routing is enabled for that range. It's strange because these switches have been running for 10 years, no one touches them and, like I say, the config has not changed, so I don't see why it would happen all of a sudden. Also, both core switches are having issues pinging IPs in all VLANs, not just 4, but operationally only clients on VLAN 4 seem to be affected when traffic goes through SW2 to reach a virtual IP on SW1. For some reason clients doing the same in other VLANs are working OK, even though that core switch cannot ping the very virtual IP those clients are using!!
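
To spell out why VLAN 4 clients can still work inside their own subnet but fail off it, here is a small Python sketch of the decision a client makes for each packet: on-subnet destinations are resolved by ARP directly, while everything else is sent to the default gateway, which here would be the HSRP virtual IP. The addresses and the /24 mask are invented for the example and are not the real addressing at this site.

```python
from ipaddress import ip_address, ip_interface


def next_hop(client, destination, gateway):
    """Return the IP address the client has to ARP for.

    client is written as 'address/prefix'. A destination inside the
    client's own subnet is reached directly; anything else goes via
    the default gateway.
    """
    iface = ip_interface(client)
    dest = ip_address(destination)
    if dest in iface.network:
        return str(dest)
    return gateway


# Hypothetical VLAN 4 client with the HSRP virtual IP as its gateway.
print(next_hop("10.0.4.25/24", "10.0.4.80", "10.0.4.1"))  # 10.0.4.80
print(next_hop("10.0.4.25/24", "10.0.3.10", "10.0.4.1"))  # 10.0.4.1
```

So if the gateway address cannot be resolved or reached, only the second kind of traffic breaks, which matches what the VLAN 4 clients are seeing.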

Also, when I run show ip eigrp neigh from the switch module it shows no neighbours; however, if I run it from the supervisor it shows the other core switch's supervisor as its neighbour. BTW, the sup is running CatOS and the switch modules are running IOS.

So, I know there are plenty of people around here with solid 6500 experience, and I'm running out of ideas. I hope I haven't missed anything obvious, but I would welcome opinions.

Thanks guys.
Xbox Live: Bring It On

Bsc (hons) Network Computing - 1st Class
WIP: Msc advanced networking

Comments

  • EdTheLad Member Posts: 2,111
    This could be anything: you could have a memory leak, the port-channels could have gone screwy etc. First I'd check the routing, forwarding and ARP tables, logs, counters etc. to try to find if and where traffic is being dropped. Is the dropped traffic common to one card? You said the system has been running for 10 years; card warranties are about 10 years, so maybe you have a card failure. Do you have DFCs? If so, the data path is a little different and you can remote login to the linecards and check ASIC counters.
    It's probably time to start swapping hardware; if the issue is seen across multiple cards I'd start looking at the MSFC.
    Last chance saloon: contact Cisco support. This is not the kind of thing you will fix on a forum.
    Networking, sometimes i love it, mostly i hate it.Its all about the $$$$
  • nel Member Posts: 2,859
    Yeah, I think we are looking at external support now. Thing is, we have 2 new 6500s waiting to be installed next quarter, damn.

    Are there any commands specifically you would recommend outside the usual stuff? I have to admit the 6500s are a little above me at the minute on a technical level.

    The EtherChannel looks like it's spanned across two modules, but at the time I couldn't see anything pointing to a particular module or set of them. Everything looked normal and operational. Would it still be possible to have a faulty module even though their status lights are green and they all pass the diagnostics on bootup?

    Also, on a separate note: would anyone recommend some good materials on the web to become a master of this type of beast (6500s), or at least understand it a little better? Both hardware and software.
  • cisco_trooper Too many Member Posts: 1,441
    How are switches 1 and 2 connected? Are they trunked? Are the proper VLANs allowed to traverse that trunk?
  • nel Member Posts: 2,859
    Yes and Yes.
  • nel Member Posts: 2,859
    Hi Guys,

    A question about HSRP as a follow-on to this issue.

    Like I said above, both 6500s could not ping IPs in HSRP groups, both virtual and interface IPs. However, I've statically defined the ARP entries and for some reason I can now ping most of the IPs from each 6500, bar 2 remaining virtual IPs for 2 separate groups. I'm not sure why the dynamically learnt entries don't work.

    When I run a traceroute through SW1, it goes across to SW2 and stops at the interface which is active for the HSRP group, and I get no response. However, I can ping the interface IPs in each group on both switches in both directions, and the ARP entries are correct.

    Anyone have any ideas on this?

    I haven't been able to debug ARP as it will crash the 6500s during the day.

    Thanks