Spanning tree putting interfaces in err disable
As I am studying for the BCMSN/switching exam this problem has interested me.
Below is a short example of the topology, with multilayer switch 1 as the root bridge. The switches at the top reside in a server rack connected redundantly at 10Gb. The switches to the right reside in another rack and the switch at the bottom is the access layer.

Originally the 10Gb switches were connected to multilayer switch 0 only. When they were connected to the root bridge as well, the 40Gb link between the two cores went into err-disable and spanning tree reconverged across the whole network.
Now I assume the BPDUs met at the 40Gb link, and that is where spanning tree decided to break the loop, without taking into consideration path costs to the root bridge or the speed of the links?
If the 10Gb switches were connected to the root bridge first and then to the secondary, would the 40Gb link still have gone into err-disable?
How would you go about ensuring the 40Gb link never goes into err-disable or blocking due to spanning tree, while ensuring you never get a loop in the redundant network? Would BPDU filter on the port channel ensure the loop is then blocked in the correct location (as shown in the picture)? The book states you should only enable it on ports that have a single host connected and where loops are impossible, which obviously isn't the case here.
Comments
That appears to be confirmed by your diagram: the root ports on the 10Gb switches are the ports directly connected to M1, and the same goes for the 1Gb switches. That makes sense because the cost of that port is lower than the cost of going through one of the other switches (given your configuration). So that port is elected as the root port and the other port is put into blocking state to prevent loops.
Can you post a show spanning-tree here on the Multi-layer switches?
M1 is configured as the root bridge and stayed as the root bridge. M0 is configured as the secondary.
Depends on what you consider the problem to be. The network is fully functioning as shown in the diagram, minus redundancy in one of the 10Gb racks (which is empty at the moment anyway). But if the 10Gb link is connected again, it could result in the 40Gb link going into err-disable as before.
Is this simulated or done in a real lab?
This sounds like an excellent lab for me to test out. I have just enough switches to lab this up.....
But I'm going to have to do it quick, or my power bill is going to doom me!!!
FYI: Did you make this lab up or was it from a book somewhere? Just curious....
Ok, well except for the 10Gb links part, hopefully that shouldn't change the way the lab works all that much.
You basically have 10Gb for the access links and one 40Gb for your etherchannel.
In theory, I should be ok with 100Mb for my access links with 40Gb as my etherchannel.
I'm curious what the expected behavior should be for this diagram.
I went ahead and printed it out so I can write all over it. I'm off to a BBQ so I won't be able to get to it today but I hope to soon.
Would you mind if I made a blog post about this?
Is there any manual configuration done on the switches other than the etherchannel?
My understanding is that my diagram shows the expected behaviour: spanning tree should always block the port on the access side and never the link between the root bridge and the secondary?
As long as it is kept in general, I look forward to reading it.
The etherchannel is effectively the DP for the root switch.
All links connected to the root switch should be DP and so in forwarding state...
Primary and secondary root bridges are defined. All other spanning-tree settings are left to defaults. Only config on the ports are trunk, native vlan and QoS. This is the same for all 1Gb and 10Gb ports.
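For anyone labbing this up, the root placement described above might look roughly like the sketch below. Only the 8192 priority and VLAN 1 come from the thread's "show spanning-tree" output (priority 8192 sys-id-ext 1); the secondary's priority of 16384 is an assumed value, and the rapid-pvst mode line is inferred from the later comment about running rapid.

```
! On M1 (root bridge). Priority 8192 matches the posted output for VLAN 1.
spanning-tree mode rapid-pvst
spanning-tree vlan 1 priority 8192

! On M0 (secondary). 16384 is an assumption; the thread does not state it.
spanning-tree mode rapid-pvst
spanning-tree vlan 1 priority 16384
```

The same two priority lines would be repeated (or given as a VLAN range) for every VLAN carried on the trunks.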
From a quick scan (it will take days to check all ports for all VLANs!) they are all in Desg FWD.
Would it be possible to see the output of "sh spanning-tree" on M1, M0 and Switch7?
I wasn't aware you were running rapid, thought it was just normal STP...
This is good to know though...
No!
Are there any comments on how to prevent this happening in future? Especially on BPDU filter on the port channel?
Do you have the logs or anything AT all you can throw us to give us some way to help you figure out what's going on?
Logs were overwritten on the switch by the time we got to them.
The end result was that we moved to the latest and greatest for the 2960s and then everything seemed to work.
Just a thought
J
Any chance you can give us this info? Kind of hard to help without knowing how things are configured.
Root ID Priority 8193
Address ...
This bridge is the root
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 8193 (priority 8192 sys-id-ext 1)
Address ...
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----
Gi1/1            Desg FWD 4         128.1    P2p
[repeat for every interface on every vlan...]
interface Port-channel1
description [removed]
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan [removed]
switchport mode trunk
mls qos trust cos
interface TenGigabitEthernet5/4
description [removed]
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan [removed]
switchport mode trunk
mls qos trust cos
channel-group 1 mode active
[repeat for the 4 10gig interfaces]
The log on the root switch was already overwritten by the time we got to look at it. Here is a line from an access layer switch.
%SPANTREE-2-UNBLOCK_CONSIST_PORT: Unblocking GigabitEthernet5/1 on VLAN0789. Port consistency restored.
5/1 goes to the root bridge. 6/1 goes to the secondary root bridge. All the lines are for 5/1, per vlan.
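As far as I know, an UNBLOCK_CONSIST_PORT message is the "all clear" for an earlier port VLAN ID or port type inconsistency on that trunk, so the exec commands below might be worth running the next time it happens. This is only a sketch of where to look; the VLAN and interface are taken from the log line above, and nothing here is confirmed as the cause.

```
! Exec mode on the access layer switch.
! Lists any ports currently held in an inconsistent (blocked) state.
show spanning-tree inconsistentports

! Per-VLAN view of the uplink towards the root bridge.
show spanning-tree vlan 789 interface GigabitEthernet5/1 detail
show spanning-tree vlan 789 root

! Native VLAN and allowed VLANs on the trunk; compare with the far end.
show interfaces GigabitEthernet5/1 trunk

! Any related spanning-tree messages still in the local log buffer.
show logging | include SPANTREE
```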
Causes of Errdisable
This feature was first implemented to handle special collision situations in which the switch detected excessive or late collisions on a port. Excessive collisions occur when a frame is dropped because the switch encounters 16 collisions in a row. Late collisions occur after every device on the wire should have recognized that the wire was in use. Possible causes of these types of errors include:
- A cable that is out of specification (either too long, the wrong type, or defective)
- A bad network interface card (NIC) card (with physical problems or driver problems)
- A port duplex misconfiguration
A port duplex misconfiguration is a common cause of the errors because of failures to negotiate the speed and duplex properly between two directly connected devices (for example, a NIC that connects to a switch). Only half-duplex connections should ever have collisions in a LAN. Because of the carrier sense multiple access (CSMA) nature of Ethernet, collisions are normal for half duplex, as long as the collisions do not exceed a small percentage of traffic.
There are various reasons for the interface to go into errdisable. The reason can be:
- Duplex mismatch
- Port channel misconfiguration
- BPDU guard violation
- UniDirectional Link Detection (UDLD) condition
- Late-collision detection
- Link-flap detection
- Security violation
- Port Aggregation Protocol (PAgP) flap
- Layer 2 Tunneling Protocol (L2TP) guard
- DHCP snooping rate-limit
- Incorrect GBIC / Small Form-Factor Pluggable (SFP) module or cable
- Address Resolution Protocol (ARP) inspection
- Inline power
Note: Error-disable detection is enabled for all of these reasons by default. In order to disable error-disable detection, use the no errdisable detect cause command. The show errdisable detect command displays the error-disable detection status.
Source: Errdisable Port State Recovery on the Cisco IOS Platforms - Cisco Systems
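To find out which of these reasons actually hit a port, commands along these lines should help. Treat it as a rough sketch: the bpduguard cause and the 300 second interval in the recovery part are example values, not something reported in this thread, and Port-channel1 is simply the interface name from the config posted earlier.

```
! Exec mode: which ports are err-disabled, and for what reason.
show interfaces status err-disabled
show errdisable detect
show errdisable recovery

! Global config (optional): automatic recovery for one cause.
! Cause and interval are assumed example values.
errdisable recovery cause bpduguard
errdisable recovery interval 300

! Manual recovery once the underlying problem is fixed.
interface Port-channel1
 shutdown
 no shutdown
```

Recovering the port without fixing the trigger will usually just put it back into err-disable, so the reason shown by the first command is the part that matters.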
They are identical except for HSRP standby (for all vlans) and spanning tree priority set for secondary.
We will have syslog again when we get another server for it...
Any tutorials out there for putting in syslog? Does it cause any downtime in a production environment?
I'm sure there is a tutorial out there somewhere. Nothing to it really. No downtime unless you hit some crazy bug or your CPU is pegged. A (very) short config example below.
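Something along these lines, as a minimal sketch. The server address 192.0.2.10 is a placeholder and Loopback0 is an assumed interface name; neither comes from this thread.

```
! Global config. 192.0.2.10 is a placeholder syslog server address.
! Older IOS releases take "logging 192.0.2.10" instead of "logging host".
logging host 192.0.2.10

! Optional tweaks (assumed values):
! severity level sent to the server
logging trap informational
! source IP for syslog packets; Loopback0 is a hypothetical interface
logging source-interface Loopback0
! timestamps on each message
service timestamps log datetime msec
! larger local buffer so messages are not overwritten as quickly
logging buffered 64000 informational
```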
That's all you need. You can of course tweak it a little with options like source IP and logging level.
It looks like what happened is that there was a disagreement over which switch was the root, until the switch you configured as root bridge lowered its priority again and became the root again.