Spanning tree putting interfaces in err disable

Trifidw · June 2010

As I am studying for the BCMSN/switching exam this problem has interested me.

Below is a short example of the topology, with multilayer switch 1 as the root bridge. The switches at the top reside in a server rack connected redundantly at 10Gb. The switches to the right reside in another rack and the switch at the bottom is the access layer.

Originally the 10Gb switches were connected to multilayer switch 0 only. When they were connected to the root bridge as well, the 40Gb link between the two cores went into err-disable and spanning tree reconverged across the whole network.

I assume the BPDUs met at the 40Gb link, and that is where spanning tree decided to break the loop, without taking into account path costs to the root bridge or the speed of the links?

If the 10Gb switches were connected to the root bridge first and then to the secondary, would the 40Gb link still have gone into error disable?

How would you go about ensuring the 40Gb link never goes into err-disable or blocking due to spanning tree, while ensuring you never get a loop with the redundant network? Would BPDU filter on the port group ensure the loop is then blocked in the correct location (as shown in the picture)? The book states you should only enable it on ports that have a single host connected, where loops are impossible, which obviously isn't the case here.

billscott92787 · June 2010

Is the 40Gb EtherChannel between M0 and M1 still in the err-disabled state? What I mean is, is it staying there? If not, I would assume from your explanation that this was bound to happen: once the 10Gb switches were connected to M1, the BPDUs sent out by M1 had a lower BID than M0's, which caused spanning tree to reconverge and M1 to become the root bridge, whereas before the root bridge would have been M0 (depending on whether you altered the priority settings on the switch). What type of configuration do you have on the multilayer switches? Are you leaving the default STP priority, or have you changed it?

That appears to be confirmed by your diagram, since the root ports on the 10Gb switches are the ports directly connected to M1. The same goes for the 1Gb switches. It makes sense, because the cost through that port is lower than going through one of the other switches (due to your configuration). So that port is elected as the root port and the other port is put into the blocking state to prevent loops.

Can you post a show spanning-tree from the multilayer switches here?

Trifidw · June 2010

A shut/no shut cleared the err-disable state, but the secondary 10Gb link (not the one to the root bridge) has been disconnected.

M1 is configured as the root bridge and stayed as the root bridge. M0 is configured as the secondary.
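For anyone following along, the shut/no shut recovery looks roughly like the sketch below. Port-channel1 is an assumed interface name, and the automatic recovery lines are optional and untested here; on a live network you would want to know the cause before letting ports re-enable themselves.

```
! Assumed interface name, substitute the err-disabled port or port channel
configure terminal
 interface Port-channel1
  shutdown
  no shutdown
 end
! Optional: have the switch retry err-disabled ports after a timer (seconds)
configure terminal
 errdisable recovery cause all
 errdisable recovery interval 300
 end
```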

notgoing2fail · June 2010

Is this problem solved now?

Trifidw · June 2010

notgoing2fail wrote: »

Is this problem solved now?

Depends on what you consider the problem to be. The network is fully functioning as shown in the diagram, minus redundancy in one of the 10Gb racks (which is empty at the moment anyway). But if the 10Gb link is connected again, it could result in the 40Gb link going into err-disable as before.

notgoing2fail · June 2010

Trifidw wrote: »

Depends on what you consider the problem to be. The network is fully functioning as shown in the diagram, minus redundancy in one of the 10Gb racks (which is empty at the moment anyway). But if the 10Gb link is connected again, it could result in the 40Gb link going into err-disable as before.

Is this simulated or done in a real lab?

This sounds like an excellent lab for me to test out. I have just enough switches to lab this up.....

But I'm going to have to do it quick, or my power bill is going to doom me!!!

FYI: Did you make this lab up or was it from a book somewhere? Just curious....

Trifidw · June 2010

Unfortunately it was a live network (I wish I had 10Gb links for a lab!). I will probably put it together with some spare 3560/3750s at work, with a 2Gb EtherChannel between the L3 switches, 1Gb redundant links to the rack switches and a couple of 100Mb links to the edge. I also want to test out Flex Links.

notgoing2fail · June 2010

Trifidw wrote: »

Unfortunately it was a live network (I wish I had 10Gb links for a lab!). I will probably put it together with some spare 3560/3750s at work, with a 2Gb EtherChannel between the L3 switches, 1Gb redundant links to the rack switches and a couple of 100Mb links to the edge. I also want to test out Flex Links.

OK, well, except for the 10Gb links, hopefully that shouldn't change the way the lab works all that much.

You basically have 10Gb for the access links and one 40Gb EtherChannel.

In theory, I should be OK with 100Mb for my access links with 40Gb as my EtherChannel.

I'm curious what the expected behavior should be for this diagram.

I went ahead and printed it out so I can write all over it. I'm off to a BBQ so I won't be able to get to it today but I hope to soon.

Would you mind if I made a blog post about this?

networker050184 · June 2010

Let us see your port config for those that got disabled. Did you see anything in the log?

notgoing2fail · June 2010

Also as billscott mentioned, are your switches all left at default settings?

Is there any manual configuration done on the switches other than the etherchannel?

Trifidw · June 2010

notgoing2fail wrote: »

I'm curious what the expected behavior should be for this diagram.

Would you mind if I made a blog post about this?

My understanding is that my diagram shows the expected behaviour, and that it should always block the port on the access side and never the link between the root bridge and the secondary?

As long as it is kept in general, I look forward to reading it.

notgoing2fail · June 2010

Well, what's strange is that once M1 becomes root bridge, the EtherChannel should stay up.

The EtherChannel is effectively the DP for the root switch.

All links connected to the root switch should be DP and so in the forwarding state...
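A quick way to check that on the root, as a sketch only: Port-channel1 is an assumed name for the inter-core channel, so substitute whatever the channel is actually called. On the root bridge the channel should show role Desg and state FWD for each VLAN, and the member ports should show as bundled.

```
! Run on M1 (the root). Port-channel1 is an assumed interface name.
show spanning-tree interface Port-channel1
show spanning-tree interface Port-channel1 detail
show etherchannel summary
```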

Trifidw · June 2010

notgoing2fail wrote: »

Also as billscott mentioned, are your switches all left at default settings?

Is there any manual configuration done on the switches other than the etherchannel?

Primary and secondary root bridges are defined. All other spanning-tree settings are left to defaults. Only config on the ports are trunk, native vlan and QoS. This is the same for all 1Gb and 10Gb ports.

From a quick scan (it will take days to check all ports for all VLANs!), they are all in Desg Forward.

EliZ_ · June 2010

This might be a totally wrong line of thought, but what mode of spanning tree is running?

Would it be possible to see the output of "sh spanning-tree" on M1, M0 and Switch7?

Trifidw · June 2010

Rapid PVST+, and so the output would be huge from show spanning-tree. M1 is the root bridge for all vlans. All ports are as shown in the diagram, per vlan.
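If the size of the full output is the obstacle, a narrower set of commands may be enough to go on. This is only a sketch; VLAN 789 is a placeholder for whichever single VLAN is representative.

```
! One-screen overview of mode, enabled features and per-VLAN port counts
show spanning-tree summary
! Root bridge, root cost and root port for every VLAN, one line each
show spanning-tree root
! Full detail for a single VLAN only (789 is a placeholder)
show spanning-tree vlan 789
! Only the ports that are currently blocked
show spanning-tree blockedports
```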

notgoing2fail · June 2010

Trifidw wrote: »

Rapid PVST+, and so the output would be huge from show spanning-tree. M1 is the root bridge for all vlans. All ports are as shown in the diagram, per vlan.

I wasn't aware you were running rapid, thought it was just normal STP...

This is good to know though...

billscott92787 · June 2010

+2 can we get a show spanning tree, papapappaplease

Trifidw · June 2010

billscott92787 wrote: »

+2 can we get a show spanning tree, papapappaplease

No!

If there was a character limit for posts, I'm sure it would hit it! It wouldn't tell you anything else that I haven't already posted.

Are there any comments on how to prevent this happening in future? Especially on BPDU filter on the port channel?

billscott92787 · June 2010

I think it would help but if you say so, then whatever floats your boat!

I don't really see how BPDU filter would help you in this case. I mean, you already have M1 as the root and M0 as the secondary, so I really don't see what would be throwing your EtherChannel into a down and err-disabled state. You don't have any kind of port security on there, do you? Also, what if you just run STP instead of Rapid PVST? Do you get the same result? Just out of curiosity, or do you not have the ability to change it?

Do you have the logs or anything AT all you can throw us to give us some way to help you figure out what's going on?

Trifidw · June 2010

No port security on the port channel. I am able to change it, but the change will not be authorised, and the site is 24/7, so there is no out-of-hours window for testing. Our proposal is most likely to be moving to VSS.

Logs were overwritten on the switch by the time we got to them.

Sent from my phone.

BADfish10 · June 2010

What switch OS are you running? I have had loads of issues on a smaller scale, but ultimately the same thing, using 3560s connected via EtherChannel and then several 2960s with redundant links.
The end result was that we moved to the latest and greatest for the 2960s, and then everything seemed to work.
Just a thought.

J

networker050184 · June 2010

networker050184 wrote: »

Let us see your port config for those that got disabled. Did you see anything in the log?

Any chance you can give us this info? Kind of hard to help without knowing how things are configured.

Trifidw · June 2010

  Spanning tree enabled protocol rstp
  Root ID    Priority    8193
             Address     ...
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    8193   (priority 8192 sys-id-ext 1)
             Address     ...
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----
Gi1/1            Desg FWD 4         128.1    P2p
[repeat for every interface on every vlan...]

interface Port-channel1
 description [removed]
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk native vlan [removed]
 switchport mode trunk
 mls qos trust cos

interface TenGigabitEthernet5/4
 description [removed]
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk native vlan [removed]
 switchport mode trunk
 mls qos trust cos
 channel-group 1 mode active

[repeat for the 4 10gig interfaces]

The log on the root switch was already overwritten by the time we got to look at it. Here is a line from an access layer switch.
%SPANTREE-2-UNBLOCK_CONSIST_PORT: Unblocking GigabitEthernet5/1 on VLAN0789. Port consistency restored.

5/1 goes to the root bridge. 6/1 goes to the secondary root bridge. All the lines are for 5/1, per vlan.
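That message suggests the port had been held in an inconsistent (blocked) state on that VLAN and was then released. A sketch of commands that may help catch it while it is happening rather than afterwards; VLAN 789 comes from the log line above, and nothing here changes configuration.

```
! Ports currently blocked because of an STP inconsistency, with the reason
show spanning-tree inconsistentports
! Per-port detail for the VLAN in the log line, including BPDU counters
show spanning-tree vlan 789 detail
! Any spanning-tree messages still in the local log buffer
show logging | include SPANTREE
```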

Cyanic · June 2010

I would think the first thing to do is figure out why the ports are going into error disable mode. Then focus on the specific problem.

Causes of Errdisable

This feature was first implemented to handle special collision situations in which the switch detected excessive or late collisions on a port. Excessive collisions occur when a frame is dropped because the switch encounters 16 collisions in a row. Late collisions occur after every device on the wire should have recognized that the wire was in use. Possible causes of these types of errors include:

A cable that is out of specification (either too long, the wrong type, or defective)
A bad network interface card (NIC) (with physical problems or driver problems)
A port duplex misconfiguration
A port duplex misconfiguration is a common cause of the errors because of failures to negotiate the speed and duplex properly between two directly connected devices (for example, a NIC that connects to a switch). Only half-duplex connections should ever have collisions in a LAN. Because of the carrier sense multiple access (CSMA) nature of Ethernet, collisions are normal for half duplex, as long as the collisions do not exceed a small percentage of traffic.

There are various reasons for the interface to go into errdisable. The reason can be:

Duplex mismatch
Port channel misconfiguration
BPDU guard violation
UniDirectional Link Detection (UDLD) condition
Late-collision detection
Link-flap detection
Security violation
Port Aggregation Protocol (PAgP) flap
Layer 2 Tunneling Protocol (L2TP) guard
DHCP snooping rate-limit
Incorrect GBIC / Small Form-Factor Pluggable (SFP) module or cable
Address Resolution Protocol (ARP) inspection
Inline power

Note: Error-disable detection is enabled for all of these reasons by default. In order to disable error-disable detection, use the no errdisable detect cause command. The show errdisable detect command displays the error-disable detection status.

Errdisable Port State Recovery on the Cisco IOS Platforms - Cisco Systems
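To narrow the list above down to the actual cause, the switch will normally report the reason itself while the port is still err-disabled. A short sketch of read-only commands:

```
! Lists err-disabled ports together with the reason for each
show interfaces status err-disabled
! Shows which causes have error-disable detection enabled
show errdisable detect
! Shows which causes have automatic recovery enabled, and the timer
show errdisable recovery
```

These only help before the port is recovered, so it is worth capturing the output before doing a shut/no shut.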

networker050184 · June 2010

Can we see a show spanning-tree summary? I'm guessing root guard had something to do with it. And look into syslog! A few minutes to set it up and you would have the log message you need to know what happened. We have had a few weird STP issues that neither we nor Cisco were ever able to figure out though, so it may remain a mystery.

billscott92787 · June 2010

What's the config on M0? For port channel and everything can you post that here from the switch? Remove whatever you need to remove to CYA. :P

Trifidw · June 2010

billscott92787 wrote: »

What's the config on M0? For port channel and everything can you post that here from the switch? Remove whatever you need to remove to CYA. :P

They are identical except for HSRP standby (for all vlans) and spanning tree priority set for secondary.

We will have syslog again when we get another server for it...

chmorin · June 2010

networker050184 wrote: »

Can we see a show spanning-tree summary? I'm guessing root guard had something to do with it. And look into syslog! A few minutes to set it up and you would have the log message you need to know what happened. We have had a few weird STP issues that neither we nor Cisco were ever able to figure out though, so it may remain a mystery.

Any tutorials out there for putting in syslog? Does it cause any downtime in a production environment?

networker050184 · June 2010

chmorin wrote: »

Any tutorials out there for putting in syslog? Does it cause any downtime in a production environment?

I'm sure there is a tutorial out there somewhere. Nothing to it really. No downtime unless you hit some crazy bug or your CPU is pegged. A (very) short config example is below.

logging [IP of server]

That's all you need. You can of course tweak it a little with options like source IP and logging level.
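A slightly fuller sketch with the tweaks mentioned above. The server address 192.0.2.10 and the Loopback0 source interface are placeholders, not values from this network, and the buffer size and level are only reasonable starting points.

```
! 192.0.2.10 and Loopback0 are placeholders
configure terminal
 ! Timestamp each message so events can be correlated across switches
 service timestamps log datetime msec
 ! Keep a larger local buffer so messages are not overwritten so quickly
 logging buffered 65536
 ! Send messages to the syslog server
 logging host 192.0.2.10
 ! Send severity informational (6) and more severe
 logging trap informational
 ! Source syslog packets from a stable address
 logging source-interface Loopback0
 end
```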

techED · June 2010

Looks like you used “spanning vlan # root <primary|secondary>” instead of “spanning vlan # priority #”. When you use the first command, the switch lowers its priority automatically to beat the best BPDU.

It looks like what happened is that there was a disagreement over who was the root, until the switch you configured as root bridge lowered its priority again to become the root again.
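For comparison, the two forms side by side, as a sketch with VLAN 789 as a placeholder. As far as I know the macro form is evaluated once, when the command is entered, and is then stored in the running config as a plain priority value, so checking the running config should show which value each switch ended up with.

```
! Macro form: picks a priority low enough to win at the time it is entered
spanning-tree vlan 789 root primary
! Explicit form: fixed value, must be a multiple of 4096
spanning-tree vlan 789 priority 8192
! See what is actually stored and what the bridge is advertising
show running-config | include spanning-tree vlan
show spanning-tree vlan 789 bridge
```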


billscott92787 · June 2010

I don't see anything that shows him using the primary | secondary commands. Is that what you are using, or did you set the priority to 8192 with spanning-tree vlan # priority?
