LACP troubleshooting, need some help :D
NightShade1
Member Posts: 433 ■■■□□□□□□□
in CCIE
So I have this scenario:
switch
switch
I configured LACP for redundancy, so there are 2 cables going from one switch to the other.
Now it seems something went wrong.
Let's say I have about 6 VLANs going through the aggregated trunk formed by those 2 cables, and administration traffic goes through VLAN 39, which also crosses that trunk.
So I have 2 VLAN interfaces, one on each switch, for example:
192.168.1.1/30 on one switch, on interface VLAN 39
192.168.1.2/30 on the other switch, on interface VLAN 39
All the other VLANs are fine, but I cannot ping 192.168.1.2 from 192.168.1.1.
When I connect both cables it works for a minute or two and then the pings start failing. Only that VLAN fails!
Then if I disconnect one of them it starts working again.
I'm using 2 Allied Telesis switches. If anyone knows what could be happening, that would be great :)
If the config or part of it is needed I can post it, but only once I get back to work. I have a day off, but it bothers me having something I can't fix :S
I just want some ideas of what I could check when I get back.
What really confuses me is that just one VLAN is failing and all the others are working fine.
Any ideas?
PS: Seriously, there should be a troubleshooting forum here. It would be a nice idea, I think.
Comments
-
networker050184 Mod Posts: 11,962 Mod
Sounds like an ARP issue to me. I'd check the ARP table when successful and when it fails to see if the entry is missing.
An expert is a man who has made all the mistakes which can be made.
-
NightShade1 Member Posts: 433 ■■■□□□□□□□
Heh, that's why I like posting in the CCIE forums, answers are so fast.
OK, I'll check that. Any other ideas? -
EMcCaleb Member Posts: 63 ■■■□□□□□□□
It shouldn't have anything to do with ARP, since a switch doesn't learn about VLANs via ARP. You may want to make sure no loop is occurring. Since disconnecting either cable fixes the issue, it sure sounds like a loop.
E -
networker050184 Mod Posts: 11,962 Mod
EMcCaleb wrote: »It shouldn't have anything to do with ARP, since a switch doesn't learn about VLANs via ARP. You may want to make sure no loop is occurring. Since disconnecting either cable fixes the issue, it sure sounds like a loop.
E
It doesn't learn about the VLAN itself via ARP, but it will send an ARP request when you ping the IP address, which is what the OP is doing.
Could be a loop, but that would probably affect more than one SVI's IP address, I would think. -
NightShade1 Member Posts: 433 ■■■□□□□□□□
Yeah, I thought it would be a loop or something, but it's affecting just one VLAN, and I think a loop would affect more than one VLAN. But well, you're a CCIE and I'm a newbie who still has a long way to go, hehe xD. Can you explain how it could affect just one VLAN, Caleb? Is that possible?
I'll also recheck all the config when I get back to work on Friday.
But I don't think there's anything wrong with it, as it's a simple config and it worked just fine in a lab that was done before implementing it. -
dtlokee Member Posts: 2,378 ■■■■□□□□□□
A couple of things to check: are the links actually being bonded into a single logical link via LACP? Is spanning tree turned on for the affected VLAN on both switches?
The only easy day was yesterday!
-
EMcCaleb Member Posts: 63 ■■■□□□□□□□
networker050184 wrote: »It doesn't learn about the VLAN itself via ARP, but it will send an ARP request when you ping the IP address, which is what the OP is doing.
Could be a loop, but that would probably affect more than one SVI's IP address, I would think.
My (attempted) reasoning was that an ARP request is sent initially to bind L2 to L3. He is pinging successfully at first, which would indicate that ARP worked. At that point he wouldn't fail because of ARP while running a continuous ping, since there would be no chance of an ARP cache timeout.
Is it possible to create a second SVI on both switches in a different VLAN? Then ping continuously between the switches on both VLANs. Now see whether they BOTH fail at the same time or whether one fails while the other continues, etc. You say that only VLAN 39 fails while everything else is OK. Have you tried stressing other VLANs in the same manner?
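One way to compare the two continuous pings, sketched in Python. It assumes each ping run has already been logged as (seconds since start, reply received) pairs. The function names, the log format, and the 5-second tolerance are made up for illustration and are not tied to any switch CLI.

```python
def first_failure(results):
    """Return the time of the first lost reply, or None if every probe succeeded.

    results: list of (seconds_since_start, reply_received) tuples in time order.
    """
    for seconds, reply_received in results:
        if not reply_received:
            return seconds
    return None


def compare_vlans(results_a, results_b, tolerance=5):
    """Describe whether two ping runs started failing at about the same time.

    tolerance is in seconds and is an arbitrary assumption.
    """
    a = first_failure(results_a)
    b = first_failure(results_b)
    if a is None and b is None:
        return "neither failed"
    if a is None or b is None:
        return "only one failed"
    if abs(a - b) <= tolerance:
        return "failed together"
    return "failed at different times"


# Hypothetical logs: VLAN 39 starts dropping at 180 s, the second VLAN never does.
vlan39 = [(0, True), (60, True), (120, True), (180, False), (181, False)]
vlan_other = [(0, True), (60, True), (120, True), (180, True), (181, True)]
print(compare_vlans(vlan39, vlan_other))
```

If both runs fail together, the problem is probably in the channel or the path; if only one fails, it points at something specific to that VLAN or its addresses.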
Also, on the ports within the channel group, are you able to look at packets-input/packets-output and see if they jibe with the amount of packets that should be going through them? Or do the numbers increase exponentially, as though some kind of amplification (i.e. a loop) could be occurring?
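A rough Python sketch of that counter check. It assumes the packets-input or packets-output value for one port has been read by hand at a fixed interval. The function names and the 10x threshold are arbitrary assumptions, and counter wrap-around is ignored.

```python
def counter_deltas(samples):
    """Turn cumulative packet counters into per-interval packet counts."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def looks_amplified(samples, factor=10):
    """Return True if any interval carries at least `factor` times the
    packets of the interval before it (a crude hint of a loop or storm)."""
    deltas = counter_deltas(samples)
    for previous, current in zip(deltas, deltas[1:]):
        if previous > 0 and current >= previous * factor:
            return True
    return False


# Hypothetical readings taken once a minute from one channel-group port.
steady = [100, 160, 220, 280]
runaway = [100, 160, 220, 5000]
print(counter_deltas(steady), looks_amplified(steady))
print(counter_deltas(runaway), looks_amplified(runaway))
```

A steady delta that matches the ping rate argues against a loop; a sudden jump of an order of magnitude would be worth chasing.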
Finally, perhaps there is a software bug or buffer issue that is killing the channel. So, instead of sending a typical ping, send larger pings at a faster rate (reduce the timeout to zero). Does this affect the rate of failure? Within these things I assume the solution can be found.
HTH,
Ernest -
networker050184 Mod Posts: 11,962 Mod
Agreed, if a continuous ping is failing it's not ARP.
Try the continuous ping test and run some debugs to see what's going on. -
ITdude Member Posts: 1,181 ■■■□□□□□□□
Man, you guys even work on holidays?! You beat me to a response.
I usually hang out on 224.0.0.10 (FF02::A) and 224.0.0.5 (FF02::5) when I'm in a non-proprietary mood.
__________________________________________
Simplicity is the ultimate sophistication.
(Leonardo da Vinci) -
NightShade1 Member Posts: 433 ■■■□□□□□□□
Actually, in my last job at another NOC I used to work on holidays, haha, when I was a technician.
As an engineer I don't work on holidays unless it's really needed. Anyway, keep the ideas coming. Now I have a lot to test, and I really thank you all for your responses helping this newbie. I appreciate it a lot.
EMcCaleb, I'll try all that tomorrow, thank you. -
NightShade1 Member Posts: 433 ■■■□□□□□□□
I had some time today to do some tests.
I ran some LACP debugs and there wasn't anything weird in them.
I changed the administration VLAN and the same thing happened. The test with 2 VLAN interfaces wasn't possible because the switch is just an L2 switch, which means only one VLAN interface is allowed, for administration. So I just changed the VLAN interface and the VLAN, and the same thing happened: the ping failed after about 3 minutes.
If I clear the ARP cache it starts pinging again, works for 3 more minutes, and so on.
That was at the end of the day. I guess I could work around it by setting the ARP timeout to a small value, but that doesn't tell me what the problem is.
Right after clearing the ARP table, while the ping is working, you can even telnet to the switch and all that.
When I tried the LACP debugs, the channel didn't seem to be failing. It seems to be okay.
The diagram is something like this. Sorry about the diagram above, that was the diagram for the lab I did:
Switch1
DSLAM
Switch2
Between Switch 1 and the DSLAM I have the LACP.
Between the DSLAM and Switch 2 I have just a single port.
Switch 1 and Switch 2 have the VLAN interfaces for administration:
Switch 1: interface VLAN 39 with IP 192.168.1.1
Switch 2: interface VLAN 39 with IP 192.168.1.2
And of course all the equipment has the VLAN configured.
Does this change your thoughts in any way? -
EMcCaleb Member Posts: 63 ■■■□□□□□□□
NightShade1 wrote: »I had some time today to do some tests.
I ran some LACP debugs and there wasn't anything weird in them.
I changed the administration VLAN and the same thing happened. The test with 2 VLAN interfaces wasn't possible because the switch is just an L2 switch, which means only one VLAN interface is allowed, for administration. So I just changed the VLAN interface and the VLAN, and the same thing happened: the ping failed after about 3 minutes.
If I clear the ARP cache it starts pinging again, works for 3 more minutes, and so on.
That was at the end of the day. I guess I could work around it by setting the ARP timeout to a small value, but that doesn't tell me what the problem is.
Right after clearing the ARP table, while the ping is working, you can even telnet to the switch and all that.
When I tried the LACP debugs, the channel didn't seem to be failing. It seems to be okay.
Do you still think it's a loop? I can try more tests tomorrow can anyone suje
So, here is what we know.
1. When only 1 cable is connected there is never an issue.
2. When 2 cables are connected there's no issue for 3 minutes.
3. When the issue presents itself the fix is to disconnect 1 of the cables or clear the ARP cache. With the latter the issue resumes after 3 minutes.
Doesn't sound like a loop to me, but boy, this is a good one. Once the pings begin to fail and you look at the ARP cache, does the ARP cache on both devices hold the correct MAC addresses?
What's interesting is that even if you do NOT clear the ARP cache but simply disconnect one of the cables, it works.
With Cisco, only 1 link would be used in any given direction for a given flow because of Cisco's methods of load balancing across a channel group. I'm not sure this holds true with the vendor you are using, but could there be an issue with a particular link? Have you moved the channel group to other ports to see if the issue follows?
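To illustrate why one conversation stays on one link, here is a simplified Python sketch of address-based hashing across a channel group. XOR-then-modulo is a common textbook simplification assumed here for illustration; it is not taken from Cisco or Allied Telesis documentation, and the MAC addresses are made up.

```python
def mac_to_int(mac):
    """Convert a MAC address string to an integer, ignoring separators."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def pick_link(src_mac, dst_mac, link_count):
    """Pick a member link index by XOR-ing the two addresses and taking
    the result modulo the number of links (simplified, hypothetical hash)."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % link_count


# Hypothetical MAC addresses for the two VLAN 39 interfaces.
switch1 = "00:00:cd:00:00:01"
switch2 = "00:00:cd:00:00:02"

# Every frame between the same two addresses maps to the same member link.
print(pick_link(switch1, switch2, 2))
```

Under a scheme like this, the pings between the two management addresses would always ride the same member link in a given direction, so a fault on that one link could break VLAN 39 while traffic hashed onto the other link keeps working.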
Ernest -
NightShade1 Member Posts: 433 ■■■□□□□□□□
Caleb, thank you for answering me.
I'll talk with my supervisor tomorrow and show him your thoughts.
"What's interesting is that even if you do NOT clear the arp cache but simply disconnect one of the cables it works."
If I disconnect the cable, wouldn't that make the switch refresh the ARP cache? Just wondering...
I'll have to check the ARP table before and after and see if there is any difference.
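A small Python sketch for that before/after comparison. It assumes the ARP table has been pasted as text and that each entry has an IPv4 address and a MAC address on the same line; the actual Allied Telesis output layout is not known here, and the sample addresses and port names are hypothetical.

```python
import re

# An IPv4 address, and a MAC written as aa:bb:cc:dd:ee:ff,
# aa-bb-cc-dd-ee-ff, or aabb.ccdd.eeff.
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
MAC_RE = re.compile(
    r"\b([0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}"
    r"|[0-9A-Fa-f]{4}(?:\.[0-9A-Fa-f]{4}){2})\b"
)


def parse_arp(text):
    """Return {ip: mac_as_12_lowercase_hex_digits} for each line with both."""
    table = {}
    for line in text.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:
            table[ip.group(1)] = re.sub(r"[^0-9A-Fa-f]", "", mac.group(1)).lower()
    return table


def diff_arp(before, after):
    """Return {ip: (old_mac, new_mac)} for entries added, removed, or changed."""
    changes = {}
    for ip in sorted(set(before) | set(after)):
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changes[ip] = (old, new)
    return changes


# Hypothetical snapshots taken while the ping works and after it fails.
working = "192.168.1.2  00:00:cd:11:22:33  port1.0.1\n"
failing = "192.168.1.2  00-00-CD-11-22-44  port1.0.2\n"
print(diff_arp(parse_arp(working), parse_arp(failing)))
```

An entry that disappears or changes MAC between the working and failing snapshots would support the ARP theory; an identical table would point back at the channel itself.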