Fredrik's CCNP thread

fredrikjj · December 2013

I've taken notes on the first 60 pages of the book. The first chapter contained some basics on mac address learning, differences between layer 2,3,4 switching, intro to CAM/TCAM, campus hierarchical design, and things of that nature. All books seem to have these intro chapters, and I can't say that I'm a huge fan. Chapter 2 deals with VLANs and VTP, and I've covered most of the VLAN section at this point.

fredrikjj · December 2013

I finished the notes on chapter 2. Next chapter is STP.

fredrikjj · December 2013

I'll finish the first STP chapter today, and my most immediate thought is that I probably underestimated it going in. Anyway, I've made a plan to finish the book and the notes before the end of the year, which should give me plenty of time to prepare for a late January exam date. I'll be going to a remote cabin over Christmas (yes, really) where I'll do most of the remaining 300 pages.

fredrikjj · December 2013

Anyone read the new 6th edition of Comer's Internetworking with TCP/IP? The recent ARP related thread made me realize how weak my fundamentals are, so something like that should probably be my first book after CCNP, but it's $120 at Amazon which is ridiculous.

fredrikjj · January 2014

I finished my notes on CCNP Simplified, and started working on the Lab Manual (with IOU). IOU does seem to have some issue with a few of the technologies, but it’s probably good enough. I looked into renting a rack and as long as you don’t need anything beyond a few switches, it’s very affordable. I like to keep going back and forth between the devices, notes and books, etc, which makes rack rentals less than ideal, but if IOU starts **** me off I guess I’ll have to go that route.

I would like to be ready to take the exam early next month, but who knows at this point. STP in particular seems somewhat complex and might take a while to get a grip on.

fredrikjj · January 2014

Okay, so I've done most of the stuff in the SLM. The next step will be to actually properly learn each piece of technology to the level where I would comfortably pass the exam. I'll start off with the core stuff like vlans and the spanning tree variations. My main resources will be the notes I've taken, the 3560 configuration guide and (probably) some extra stuff on spanning tree.

joetest · January 2014

I'm on the same path using CCNP Simplified and doing intensive notes too. Planning on doing the SLM afterwards in GNS3 as much as possible together with PT.

Hope you pass. Let us know how it goes.

fredrikjj · January 2014

"LAN segment" is a very confusing term. In the context of STP it seems to mean collision domain, but we don't use hubs anymore so all segments are p2p links, right?

joetest · January 2014

p2p links is right on the money yes.. In my mind a LAN segment is something that's connected to the LAN somehow. I don't know if you can get more specific than that.

fredrikjj · January 2014

joetest wrote: »

p2p links is right on the money yes.. In my mind a LAN segment is something that's connected to the LAN somehow. I don't know if you can get more specific than that.

Presumably, the term is used because back in the day, you had hubs, and a hub didn't segment the LAN.

fredrikjj · January 2014

I wrote a short summary on VLANs and VTP, the second chapter of the textbook. If something's wrong or misleading, don't hesitate to point it out.

VLANs

Without VLANs, the Ethernet network is one large broadcast domain, meaning that all broadcasts and unknown unicast/multicasts reach all physically connected hosts. VLANs group switch ports into different logical broadcast domains where a host in one VLAN can't communicate with a host in another VLAN without going through a L3 device.

With VLANs implemented, there are two main types of switch ports:

Access port – a port belonging to a single vlan
Trunk port – a port belonging to multiple vlans

VLANs are assigned to ports in one of two ways:

Manually, by an administrator
Dynamically, by something called a VLAN Membership Policy Server (VMPS), based on the MAC address of the host.

VLANs are identified by a number, or VLAN-ID, between 0 and 4095, with some of these numbers being unavailable for use:

0: reserved for 802.1p
1: default vlan – can be used but not deleted
2-1001: the normal range – can be used, deleted, modified, etc
1002-1005: reserved for legacy token ring and fddi support
1006-4094: the extended range. Mostly works like the normal range, but unsupported in some cases like with VTPv1 and v2.
4095: cannot be used
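
To make the ranges above easier to quiz yourself on, here is a minimal Python sketch that maps a VLAN-ID to its range. The function name and the returned labels are my own and purely illustrative; the boundaries are the ones listed above.

```python
def classify_vlan_id(vlan_id: int) -> str:
    """Return the range a VLAN-ID falls into, following the list above."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("a VLAN-ID must fit in 12 bits (0-4095)")
    if vlan_id == 0:
        return "reserved (802.1p)"
    if vlan_id == 1:
        return "default"
    if vlan_id <= 1001:
        return "normal"
    if vlan_id <= 1005:
        return "reserved (token ring/fddi)"
    if vlan_id <= 4094:
        return "extended"
    return "unusable"


if __name__ == "__main__":
    for vid in (0, 1, 10, 1002, 2000, 4095):
        print(vid, classify_vlan_id(vid))
```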

Trunking:

Trunking refers to carrying frames belonging to multiple vlans within a single physical cable. To accomplish this, the switches need some method to separate the frames. There are two such methods:

Inter-switch Link
802.1Q

ISL is a Cisco proprietary encapsulation method where the frame is given a new 26 byte ISL header and a 4 byte FCS trailer. It is not supported on new platforms because it no longer offers any advantages over 802.1Q, and can be considered legacy. Back in the day, it was required when using Cisco's PVST, but that restriction was removed with PVST+.

802.1Q inserts a 4 byte “tag” into the original frame and therefore must recalculate the FCS before sending the frame across the trunk link. The tag contains 4 different fields: tag protocol identifier, user priority, canonical format indicator, vlan identifier. The vlan identifier field is 12 bits, which gives 4096 possible values (0-4095) and is what limits the number of usable vlans to 4094.
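
As a rough illustration of how those four fields fit into 4 bytes, here is a small Python sketch that packs and unpacks a tag. The function names are made up for this example; the layout is a 16 bit tag protocol identifier (0x8100 for 802.1Q) followed by 3 bits of priority, 1 bit of CFI and 12 bits of vlan identifier.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier used by 802.1Q


def build_dot1q_tag(vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Pack the four tag fields into the 4 byte 802.1Q tag."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("vlan identifier is a 12 bit field")
    if not 0 <= priority <= 7:
        raise ValueError("user priority is a 3 bit field")
    if cfi not in (0, 1):
        raise ValueError("CFI is a single bit")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_dot1q_tag(tag: bytes) -> dict:
    """Split a 4 byte tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    return {
        "tpid": tpid,
        "priority": tci >> 13,
        "cfi": (tci >> 12) & 1,
        "vlan_id": tci & 0xFFF,
    }


if __name__ == "__main__":
    tag = build_dot1q_tag(vlan_id=100, priority=5)
    print(tag.hex(), parse_dot1q_tag(tag))
```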

802.1Q has a feature called the native vlan where a single vlan can be sent “untagged” across the link, which emulates connecting a normal access port between the switches. It can be disabled with sw(config)#vlan dot1q tag native. Disabled in the sense that the native vlan now carries the dot1q tag as well.

Creating and disabling VLANs:

Normally done with the sw(config)#vlan [vlan-id] command, but you can also use the legacy vlan database mode if you are a hipster.

Once created with switch(config)#vlan [vlan-id] you enter vlan config mode where you can change variables such as the name: switch(config-vlan)#name [name]

There are two ways of preventing a vlan from carrying traffic:

Shutting the vlan down with switch(config)#shutdown vlan [number], or typing shutdown in the vlan mode for that particular vlan. This locally shuts the vlan down and it enters the “act/lshut” state, verified by #show vlan
Suspending the vlan with switch(config-vlan)#state suspend. If VTP is used, this is propagated throughout the VTP domain as opposed to shutting it down which is only locally significant.

Configuring an access port:

sw(config-if)#switchport (if the port is an L3 port this makes it L2)
sw(config-if)#switchport mode access
sw(config-if)#switchport access vlan [vlan-id]

You can also end up with an access port if two ports in the dynamic auto state are connected. They will fail to negotiate a trunk and fall back to access.

Configuring a trunk port:

This is slightly more complicated than an access port due to the presence of the Dynamic Trunking Protocol. The purpose of DTP is to dynamically create trunk links and select an appropriate encapsulation type (ISL or dot1q) for said trunk.

To not involve DTP at all in the creation of the trunk you need to issue the following commands:

switch(config-if)#switchport trunk encapsulation [isl|dot1q]
switch(config-if)#switchport mode trunk
switch(config-if)#switchport nonegotiate

Identical configuration must be issued on the other side of the link. The main issue with this is that it becomes more difficult to verify that the trunk's actually up.

With DTP, you have two modes available: “desirable” and “auto”. Desirable will actively try to initiate a trunk connection with another port, and auto will passively listen to another Desirable (or Trunk) port.

Depending on platform, dynamic desirable or dynamic auto will be the default on a switch port with no “mode” command configured.

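Note that switchport mode trunk by itself doesn't disable DTP the way switchport mode access disables it. Mode trunk still sends DTP frames and initiates a trunk just as mode dynamic desirable does. This is honestly a bit strange, because as far as negotiation with the neighbor goes it behaves much like switchport mode dynamic desirable; the difference is that the local port trunks unconditionally instead of falling back to access if the negotiation fails.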

The end result is that of the three relevant switchport modes (dynamic desirable, dynamic auto, and trunk), all combinations except auto ↔ auto will form a trunk.
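
The rule in the previous paragraph can be written as a tiny truth table. This is only a sketch of the three modes discussed here (access and nonegotiate are deliberately left out), and the names are my own.

```python
# Modes that actively try to bring up a trunk vs. the one that only listens.
ACTIVE_MODES = {"trunk", "desirable"}
PASSIVE_MODES = {"auto"}


def forms_trunk(local_mode: str, remote_mode: str) -> bool:
    """True if a trunk forms between two ports in the given DTP-related modes."""
    known = ACTIVE_MODES | PASSIVE_MODES
    if local_mode not in known or remote_mode not in known:
        raise ValueError("only trunk, desirable and auto are modelled here")
    # At least one side has to initiate; two passive sides never will.
    return local_mode in ACTIVE_MODES or remote_mode in ACTIVE_MODES


if __name__ == "__main__":
    for a in ("trunk", "desirable", "auto"):
        for b in ("trunk", "desirable", "auto"):
            print(f"{a:9} <-> {b:9} : {'trunk' if forms_trunk(a, b) else 'no trunk'}")
```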

There's also the encapsulation to contend with. On platforms with ISL and 802.1Q, ISL is the preferred encapsulation if the trunk is negotiated by DTP. If ISL fails, dot1q is chosen. This negotiation can be overridden with the trunk encapsulation command.

switchport trunk encapsulation dot1q
switchport trunk encapsulation isl
switchport trunk encapsulation negotiate (default)

This command is removed when ISL isn't supported.

Native vlan and trunk allowed list

Once the trunk is up and running, by default, all VLANs are allowed to cross it, and the native vlan is 1. The native vlan's original purpose was to communicate with devices that didn't understand vlan tags and it can be changed to any number in the normal vlan range with switch(config-if)#switchport trunk native vlan [vlan-id]. This opens the door for mismatched native vlans between switches.

What will happen in that case is that Spanning-Tree will put the port in a PVID-inconsistent (port VLAN ID inconsistent) state and block the affected vlans on it, but if there's no STP running, the broadcast domains will leak into each other, depending on which direction the traffic is flowing. CDP will also complain, because control plane traffic for protocols like CDP, VTP and PAgP is carried on the native vlan and CDPv2 is capable of detecting this mismatch.

Finally, you can manually adjust which vlans are allowed over a trunk with switch(config-if)#switchport trunk allowed vlan [options]. For example, to manually prune vlans from trunk links.

Verification of VLANs

The main verification command for a particular port is #show interface [name] switchport

Trunk links are verified with #show interface trunk

The vlan database is verified with #show vlan

#show spanning-tree is also an important command for verifying that STP is allowing the port to forward.

VTP – VLAN Trunking Protocol

A Cisco proprietary layer 2 protocol for the administration of vlans.

The basic idea is to let one or more switches act as servers, with the remaining switches being clients. Changes to the vlan database made on a server are propagated down to the clients automatically.

VTP Domain

The VTP domain is simply a string of characters (1-32 of them) that needs to be the same on all switches that wish to belong to that domain and take part in the sharing of vlan information.

The domain can be assigned either manually, or by simply connecting the switch over a trunk link to another switch that already has a domain name configured. I.e. a null value will be replaced by the domain name used by the other switch.

switch(config)#vtp domain [name]

VTP Modes

Server: controls the creation/modification/deletion of vlans. Can be one or several – there is no hard limit.

Client: receives VTP information from the server, but does not create or modify vlans locally. The only exception is if a client has a higher revision number than the server, in which case it will propagate its local information throughout the domain.

Transparent: Relays vtp information... transparently, and can still configure vlans locally with no impact on the rest of the domain. Supports the use of the extended range vlans which VTPv1 and v2 do not support in server/client mode.

switch(config)#vtp mode [server|client|transparent]

VTP Advertisements

Multicast to 01-00-0C-CC-CC-CC

There's a revision number in the updates which is used to keep track of the most recent version of the vlan database. Each time a change is made on a server, the revision number is increased by 1, and as the information is spread throughout the domain every switch has its revision number changed to this new value. If a change is made on a server with a lower revision number than the currently highest number, nothing happens and the change is only locally significant.

This system, where a higher revision number unconditionally overrides the entire database is the main issue with VTP. The classic scenario is someone using a switch in a lab, and then inserting it into the production environment where it deletes the vlans from every switch. The person doing this might also think that setting the mode to client will prevent this, but it really doesn't. Resetting the revision number to zero by changing the mode to transparent and then back to server/client, or by changing to a temporary domain name and back, will prevent this from occurring.

Though, the vlan port assignments will, afaik, remain, so reverting this catastrophic change is simply a matter of adding the vlans again on the real vtp server.
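
Here is a deliberately simplified Python model of the "higher revision wins" behaviour, mostly to make the lab switch scenario concrete. It ignores the domain name, password, mode and the actual message types, and the class and method names are invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class VtpSwitch:
    name: str
    revision: int = 0
    vlans: dict = field(default_factory=lambda: {1: "default"})

    def add_vlan(self, vlan_id: int, vlan_name: str) -> None:
        """A local change on a server bumps the revision number by 1."""
        self.vlans[vlan_id] = vlan_name
        self.revision += 1

    def receive_advertisement(self, sender: "VtpSwitch") -> bool:
        """Overwrite the whole database if the sender's revision is higher."""
        if sender.revision > self.revision:
            self.vlans = dict(sender.vlans)
            self.revision = sender.revision
            return True
        return False


if __name__ == "__main__":
    production = VtpSwitch("prod-server")
    production.add_vlan(10, "users")
    production.add_vlan(20, "voice")

    # A lab switch that has seen many changes but now only has vlan 1 left.
    lab = VtpSwitch("lab-switch", revision=40)

    overwritten = production.receive_advertisement(lab)
    print(overwritten, production.revision, production.vlans)
```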

VTP has 3 message types, responsible for transmitting the vlan information.

Advertisement Request
Summary Advertisement
Subset Advertisement

An advertisement Request is sent by clients to servers when the switch resets, the domain name changes or if the switch has received a vtp summary advertisement frame with a higher revision number than its own. The reason that it must request new information if it reboots is that vlan information isn't stored in vlan.dat on clients like it is on servers and transparent switches.

The summary advertisement is sent by the server every 5 minutes or when receiving a request. It contains things like vtp version, domain name, md5 digest, timestamp, and things of that nature, but also indicates the number of subset advertisements to follow.

The subset advertisements contain the actual VLAN information like vlan number and name.

VTP Password

A case sensitive string of 1 to 32 characters used to generate an md5 hash.

switch(config)#vtp password [string]

Must match on switches wishing to belong to the same domain, and can only be configured on servers and clients.

VTP Versions

Version 1 is the default, version 2 has some modifications, and version 3 is new and outside the scope of the exam. There is (or was) a well known inconsistency in the documentation for VTPv2. Read more: http://blog.ipexpert.com/2010/04/07/old-ccie-myths-vtp/

I probably want to expand this section a bit in the future.

VTP Pruning

Pruning is the process of removing vlans from trunks connecting to switches that do not have these vlans assigned to any access port. This results in fewer broadcasts and less unknown unicast traffic being sent to these switches.

You could accomplish the same thing by manually adjusting the vlan trunk allowed list, but that would likely become administratively difficult.

VTP pruning is enabled on the server with switch(config)#vtp pruning

VTP pruning has something called the eligibility list, which allows you to exclude particular vlans from being pruned, i.e. they will be allowed on all trunk ports, even if they are not used on ports on those switches.

Misc features

You can change the VTP update IP with switch(config)#vtp interface [name] only
This is an ip address shown in the output of #show vtp status belonging to the device that was responsible for the last update to the database.

You can make VTP store vlan and vtp information in a different file than vlan.dat with switch(config)#vtp file [filename]

This copies the information from vlan.dat into the new file, and from that point on, all changes are stored in the new file.

You change the version with switch(config)#vtp version [1|2|3]

Verification

The main verification command is #show vtp status
The configured password can be seen with #show vtp password

There are also debugging capabilities under #debug sw-vlan vtp [options]

fredrikjj · January 2014

I've been reading some blog posts by Petr Lapukhov on how to study (http://blog.ine.com/2009/03/22/how-to-study/). He's big into "active reading", writing essays, spaced repetitions, and things like that. I must say that it makes a ton of sense, and I've been working on a post on STP in the last few days where I've basically tried to condense what I've picked up from various textbooks and online sources into one essay. It's not exactly original and it takes a ton of time, but the fact that I'm going to post it here makes me double and triple check things, and not just assume. I've come to the conclusion that STP is pretty easy after all until you try to predict its behavior during reconvergence. Then it gets a bit tricky, but I think I've pretty much got it.

joetest · January 2014

Go ahead.. I'll read it for sure.. one more source is always good

oh.. and his active learning.. I do that too actually using Anki! It's like flash cards, with a schedule dependent on how you rate the question you wrote


fredrikjj · January 2014

Yes, Flash Cards seem like a really good idea. Especially if you create them yourself, forcing you to evaluate what's important to know, and what's not.

PVST+

This is the default flavor of STP on a Cisco switch and is mostly relevant as a learning tool since the convergence time is way too high for use in modern networks. It's called Per Vlan Spanning Tree because a single instance runs for each vlan (up to 128 instances), allowing you to create separate logical topologies for each vlan. The plus sign at the end refers to the fact that this is an improved version of PVST which had the limitation of only supporting ISL trunks. That in turn was an enhanced version of the IEEE 802.1D standard which only supports a single instance for all vlans.

The goal of STP is to make a set of switches agree on a logical loop free topology in a physical topology with redundant paths. This is accomplished by letting certain ports forward traffic while keeping others in a blocked state. To make these independent elements agree on the topology, special messages, called BPDUs (Bridge Protocol Data Units), are sent across the network. The information contained in these messages, in conjunction with a set of rules for how to interpret them, allows each switch to know its role in the tree.

The first step in achieving this loop free topology is to elect one of the switches as the Root Bridge, which is then used as a reference point for working out the rest of the topology.

When initialized, all switches assume that they are the Root Bridge until they receive a BPDU with a lower Bridge ID. If they do, they stop sending BPDUs of their own. The priority value, vlan-id and the MAC address of the switch make up the Bridge ID. Of these, only the priority value can be modified by the user, and because the priority is placed in the highest 4 bits of this 8 byte field, only increments of 4096 can be used. That the vlan-id is present in the field means that only a single MAC address has to be allocated for all instances of PVST on the switch (the so called extended system ID feature).
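
To see why the priority always wins over the MAC address, here is a small Python sketch that builds the 8 byte Bridge ID as one integer and picks the lowest. The helper name and the example MAC addresses are made up; the layout (4 bits of priority, 12 bits of vlan-id, 48 bits of MAC) follows the description above.

```python
def bridge_id(priority: int, vlan_id: int, mac: str) -> int:
    """Combine priority, vlan-id (extended system ID) and MAC into one value."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("vlan-id must fit in 12 bits")
    mac_value = int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)
    # The configured priority already occupies the top 4 bits of the 16 bit
    # field, so OR-ing in the vlan-id fills the lower 12 bits.
    return ((priority | vlan_id) << 48) | mac_value


if __name__ == "__main__":
    switches = {
        "old-access-switch": bridge_id(32768, 10, "00:00:0c:11:11:11"),
        "core-switch": bridge_id(4096, 10, "00:1a:2b:3c:4d:5e"),
    }
    root = min(switches, key=switches.get)
    print("Root Bridge for vlan 10:", root)
```

With both switches at the default priority of 32768, the lower MAC address would decide the election instead.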

There are a few things to keep in mind when it comes to the Root Bridge election. With the default values, all switches will have the same priority and vlan-id, making the MAC address the tie breaker. This is obviously not desirable because this essentially leaves the placement of the Root to chance or, perhaps even worse, the oldest switch becomes the Root if the vendor has assigned the MAC addresses in chronological order.

Secondly, because the placement of the Root influences the traffic flows, it's essential that the Root is placed in a central location in the domain. For example, if the Root is placed in the access layer, traffic will still reach the core, but it may take suboptimal paths. So, what you want to do is to manually assign the lowest priority to the switch that you think should be the Root, and assign the second lowest value to the switch that is most suitable to take over the role in case there is a failure.

Once this reference point is agreed upon by all switches in the domain, only the Root sends BPDUs. Other switches simply update certain fields before sending the BPDU along. For our next step, the assignment of Root Ports, we're primarily interested in the Root Path Cost field. Similar to OSPF, STP has a cost value associated with each port, based on the bandwidth. If you want to override the default values, you could change the bandwidth, but that makes little sense when there's a specific STP command for changing the cost.

When the Root Bridge generates a BPDU, the Root Path Cost is 0. As each switch receives this BPDU, it updates the Root Path Cost with the cost associated with the port where it received the BPDU, in a cumulative fashion. Before there are any blocked ports in the topology, switches will receive these BPDUs on multiple ports, allowing them to determine which port has the BPDU with the lowest total cost to the Root Bridge. This lowest cost port becomes the Root Port (RP). The RP is an upstream port pointing towards the Root Bridge, and forwards traffic.
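
A quick Python sketch of the cumulative cost calculation. The cost table is an assumption on my part, since the notes above don't list the values: these are the classic 802.1D "short" defaults (100 for 10 Mbps, 19 for 100 Mbps, 4 for 1 Gbps, 2 for 10 Gbps).

```python
# Assumed classic short-mode default port costs, keyed by link speed in Mbps.
DEFAULT_PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def root_path_cost(link_speeds_mbps: list) -> int:
    """Add up the receiving port's cost for every hop away from the Root.

    The Root itself advertises a cost of 0, so an empty path costs 0.
    """
    return sum(DEFAULT_PORT_COST[speed] for speed in link_speeds_mbps)


if __name__ == "__main__":
    # Two candidate paths to the Root seen on two different ports.
    via_port_a = root_path_cost([1000, 1000])  # two gigabit hops
    via_port_b = root_path_cost([100])         # one fast ethernet hop
    print(via_port_a, via_port_b)
```

In this example the two hop gigabit path (cost 8) beats the single fast ethernet hop (cost 19), so the port facing the gigabit path would become the Root Port.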

There must be a single RP on each switch (for each vlan) so if there's a tie in the cost, two other criteria are used to come to a final decision: lowest sender Bridge ID and lowest sender Port ID. I've already covered the Bridge ID, and the Port ID is only used when multiple links are connected between the same two switches (because the Path Cost and Bridge ID will be identical). It's comprised of a priority and a port number in a 2 byte field. The port number can't be changed, but because the priority is in the higher order bits, a lower priority will always defeat a lower port number. It's important to note that it's the sender (of the BPDU) Bridge ID and sender Port ID that are relevant, not the values on the downstream switch where the actual RP election happens.
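
The tie breaking order maps nicely onto a tuple comparison. This is a sketch with invented names, and it assumes every BPDU in the list already advertises the same Root Bridge.

```python
from typing import NamedTuple


class ReceivedBpdu(NamedTuple):
    local_port: str        # port on this switch where the BPDU arrived
    root_path_cost: int    # cost after adding the local port's cost
    sender_bridge_id: int  # Bridge ID of the upstream switch
    sender_port_id: int    # Port ID (priority + number) on the upstream switch


def elect_root_port(bpdus: list) -> str:
    """Pick the port with the lowest (cost, sender Bridge ID, sender Port ID)."""
    best = min(
        bpdus,
        key=lambda b: (b.root_path_cost, b.sender_bridge_id, b.sender_port_id),
    )
    return best.local_port


if __name__ == "__main__":
    # Two parallel links to the same upstream switch: cost and Bridge ID tie,
    # so the sender's Port ID decides.
    candidates = [
        ReceivedBpdu("Fa0/1", 19, 0x8001_0000_0C11_1111, 0x8002),
        ReceivedBpdu("Fa0/2", 19, 0x8001_0000_0C11_1111, 0x8001),
    ]
    print(elect_root_port(candidates))
```

Note that Fa0/2 wins here even though it's the higher numbered local port, because only the sender's values are compared.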

The next step is to elect the Designated Ports. I say next because this entire process is usually presented as if it actually happens in a particular order. If that is actually true or not, I have no idea, and I don't think the material is particularly clear on that matter. I have a feeling that the entire port role process is more an iterative thing where switches gradually come to an agreement on which ports should have which roles, rather than a step by step process where certain roles are selected before others.

Anyway, the Designated Port (DP). On each link between two devices (or segment if you are using a hub still), one of the switches is elected the Designated Bridge, which in turn has the Designated Port. It's responsible for forwarding traffic from that segment. You could say that it's the downstream facing equivalent to the RP. It's elected the same way the RP is; the cost, bridge id or port id, in that order. All ports on the Root Bridge are DPs by definition since they are not RPs or Alternate (aka Blocking).

Actually assigning the DPs is fairly easy because we only use full duplex links between switches nowadays and wherever there's a RP, the other side of the link is a DP. The only links where you have to put some thought into the selection are those with no Root Port on either end. In that case, you select the DP based on the cost, and the other tie breakers, and the remaining port will be given the 'Alternate' role. The Alternate port is put into the Blocking state. The alternate is often called the 'blocking' or 'blocked' port, but blocking is a port state, not a port role. So, strictly speaking, it's an alternate port, or non-designated.

That concludes the process of going from a newly initialized Spanning Tree topology to it reaching a stable state.

So far I've not gone into detail on the port states, but the port states' function is to block traffic (except BPDUs) during convergence of the spanning tree, and to learn station locations (fancy words for mapping an end node's MAC address to a port). The role that STP assigns to a port determines which states it goes through. The states are:

1. Disabled – only applies to a port that's administratively shut down or has failed in some manner. It's not a part of the topology.

2. Blocking – a port assumes this state when it first comes up. If it's determined that the port should be an Alternate port, it never leaves this state. It obviously blocks traffic, but it does receive BPDUs. The fact that it receives BPDUs is important, but I'll get back to that later.

3. Listening – if a port receives BPDUs indicating that it should be a Root or Designated Port, it enters Listening. Note that the port 'role' is already Root/Designated despite the fact that it's not able to forward traffic in this state. The port can both send and receive BPDUs. To realize why a Listening port must send BPDUs, imagine a link between two switches where one port is Designated, and the other one Alternate. An election was held there, with the conclusion that one of the switches offered a worse BPDU than the other. The one sending that inferior BPDU becomes the Alternate port and goes into the blocking state, effectively stopping the propagation of the inferior BPDU. The Designated port becomes Listening, and continues to send its superior BPDU to the Blocking port on the other side. This works as a keepalive. If the BPDUs from the DP were to stop, the Alternate port assumes that the other switch's path to the root is gone, and it must transition out of its blocking state.

4. Learning. In addition to the behaviour in Listening, the port can now also dynamically learn MAC addresses. I imagine that the purpose of this is to lessen the impact of unknown unicasts by allowing switches to 'prepare' in a sense, before the actual traffic starts flowing. Why Listening and Learning are broken up into two states isn't entirely clear to me. As I see it, they could just as well be one single state since no port will stop at either of these steps. The timers for them are not even separated.

5. Forwarding. This is the normal operational state where the port can forward frames that are not BPDUs.

This process is governed by a series of timers, three of which are considered within the scope of the CCNP exam:

1. Hello time. The time between each BPDU generated by the Root Bridge, with a default of 2 s, and a minimum value of 1.

2. Forward Delay. The time spent in each of the Listening and Learning states. i.e. a Forward Delay of 15 seconds (the default) will mean a total of 30 seconds spent in LST and LRN.

3. Max Age. Each port stores its best BPDU for this amount of time (technically, the time is reduced by the message age) before discarding it and assuming that there is an indirect topology change somewhere in the network that the switch needs to react to. You can think of this as the hold time, or dead interval from a routing protocol. Default is 20 seconds which means that reconvergence can be painfully slow if the Max Age actually has to time out.

Sources are very clear on the fact that you shouldn't mess with these timers since they are derived from the “diameter” of the network. The diameter is basically the longest chain of switches that you can find in your topology, and the default timers are based on a diameter of 7. So, if you run a smaller topology, you could configure a lower diameter, which in turn would decrease the timers to suitable values. Note that with one exception (generation of TCNs uses the local Hello time), the timers are only used if configured on the Root Bridge, which then instructs the other switches of the proper values through the BPDU. My hunch is that with modern hardware, you could just crank all these timers to their minimums (1, 4 and 6 seconds, I think, respectively), and nothing bad would happen, but what do I know.
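
For the curious, the relationship between diameter and timers can be sketched in Python. The formulas below are not from the textbook; they are the ones I recall from Cisco's document on tuning spanning tree timers, and rounding the Forward Delay up is my assumption, so treat this as an illustration rather than a reference.

```python
import math


def timers_for_diameter(diameter: int, hello: int = 2) -> tuple:
    """Return (max_age, forward_delay) in seconds for a given diameter.

    Assumed formulas:
      max_age       = 4 * hello + 2 * diameter - 2
      forward_delay = (4 * hello + 3 * diameter) / 2, rounded up
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay


if __name__ == "__main__":
    for dia in (7, 5, 3):
        print(dia, timers_for_diameter(dia))
```

With the default diameter of 7 and a Hello time of 2 this gives a Max Age of 20 and a Forward Delay of 15, which matches the defaults listed above.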

We now have a stable loop free topology, but what happens if something changes, e.g. a link fails?

There are really two separate events that you need to take into consideration here. There's the actual reconvergence of the logical topology in response to the new physical topology, but also the propagation of a new type of BPDU; the TCN BPDU, or just TCN. Every time a port transitions into the Forwarding state, or when a port in Forwarding or Learning moves into Blocking, this Topology Change Notification is generated. It's generated by the switch where the change happens, but doesn't contain any information about the change. In fact, it barely contains any information at all besides a field indicating that it's a TCN.

The TCN is sent out the Root Port every hello interval until the switch receives an Ack from the upstream switch. That switch behaves the same way, and sooner or later the TCN reaches the Root Bridge. The Root then sets the TCN Flag in its (configuration) BPDU. When a BPDU with the TCN Flag set reaches the non-root switches, they lower their MAC table aging timer from the configured value (default: 300 seconds) to whatever the forward delay is. This flag is set for Forward Delay + Max Age seconds.

The purpose of this is to age out dynamically learned MAC addresses from the CAM tables of the switches to prevent traffic from potentially being black holed for up to 5 minutes, or however long the aging time is. Initially I had a hard time understanding why that would prevent anything because aren't switches supposed to just learn MAC addresses automatically? Well, the key to understanding why this is a problem is to consider how the flooding of broadcasts and unknown unicasts works. Such a frame is flooded to all ports except the incoming port. This means that if traffic enters a transit switch that's a dead end due to a link failure, it will not be able to flood back to where it came from. It's trapped.

This reduced aging timer also presents a problem; it could lead to an excessive amount of flooding. Especially if the traffic is destined for a host that doesn't send frames back into the switch very often. For example, a file server that only receives traffic. Unless that station updates its CAM entry at least every 15 seconds, it will age out and all traffic destined for it will be flooded until the file server sends a frame. That example was brought up in an old Cisco document and I have no idea how valid it is in a modern storage solution.
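
The timer arithmetic described above is simple enough to put in a few lines of Python. The function names are mine, and the defaults are the ones mentioned in the text (300 second aging time, 15 second Forward Delay, 20 second Max Age).

```python
def mac_aging_time(tc_flag_seen: bool, configured_aging: int = 300,
                   forward_delay: int = 15) -> int:
    """Aging time a non-root switch uses for dynamically learned MACs."""
    return forward_delay if tc_flag_seen else configured_aging


def tc_flag_duration(forward_delay: int = 15, max_age: int = 20) -> int:
    """How long the Root keeps the flag set after a topology change."""
    return forward_delay + max_age


if __name__ == "__main__":
    print("normal aging:", mac_aging_time(False), "s")
    print("aging during a topology change:", mac_aging_time(True), "s")
    print("flag stays set for:", tc_flag_duration(), "s")
```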

Now, the problem with the TCN is that it's a very crude way of handling this issue because it doesn't discriminate between actual topology changes that would require a reduced aging time, and changes to edge devices. That an edge device goes up or down is more or less irrelevant in the overall topology, but it generates a TCN nonetheless. As you can imagine, with a lot of end users, the switch could potentially be in a constant state of reduced aging time, i.e. the TCN flag never gets removed - just refreshed by a constant influx of TCNs.

The solution to this is to enable PortFast on edge ports as such ports do not generate a TCN when they go up or down.

As for the actual topology changes, we need to differentiate between a “direct” and an “indirect” change. From a particular switch's perspective, a direct change would be if the link failure can be detected due to a loss of layer 1 keepalives. With a direct failure we don't have to wait for the Max Age to reach zero before reacting to the event; the switch removes the information from that port and STP begins to reconverge.

Indirect would be if there is some other device between the switch and the failure which means that we have to rely on the Max Age to time out before we can react to an inferior BPDU.

It's important to note that there's a difference between receiving a superior BPDU and receiving an inferior one. If, due to some change, we receive a BPDU with better parameters (i.e. a lower Bridge ID), the old one is immediately flushed and the port starts going through Listening and Learning and becomes the new Root Port. It's when receiving an inferior message that the Max Age acts as a dead interval. But, in a stable topology, a superior BPDU can only be received if there is a configuration change, or if someone adds a new switch with a lower Bridge ID. It's not associated with a link failure.

(This section draws heavily from Lapukhov's article on STP and RSTP convergence)

A direct link failure is handled differently by the local switch depending on what kind of port detects the failure:

-If it's a blocked port, the switch does nothing except expire the BPDU stored on the port.
-If it's a designated port, the switch does nothing, but it could force a downstream switch to elect a new Root Port.
-If the RP fails, the BPDU information stored in that port is immediately flushed and a new Root Port is elected.
-If there are no new Root Ports to elect, the switch elects itself as the Root Bridge, and starts announcing inferior (to the rest of the topology) BPDUs.

This is pretty straightforward, and the key point is that since the failure is directly detected, Max Age can be ignored. In general the purpose of a hold/dead timer is to not reconverge immediately if a single keepalive is lost in transit. Instead, a series of keepalives in a row has to be lost, greatly increasing the likelihood that the failure is real and not just a flapping interface or some kind of temporary packet loss. However, if the directly connected physical link fails, it is safe to ignore the hold time. IGPs like OSPF operate in a similar fashion.

Now, an indirect failure in STP is when the switch stops receiving BPDUs from an upstream device on any of its ports without the switch being able to detect a downed interface on that port. Upstream is important because if you think about how BPDUs are propagated throughout the topology, a failure in a downstream device will not affect the upstream device (besides the TCN stuff of course) unless it's a direct failure of a link.

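Reconvergence becomes especially slow in this case because we could have to wait for (Max Age - Message Age) + 2xForward Delay, instead of just 2xForward Delay. With the default timers (Max Age 20 seconds, Forward Delay 15 seconds) and a Message Age close to zero, that is up to 20 + 2x15 = 50 seconds, compared to 30 seconds for a direct failure.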

Lapukhov brings up two scenarios:

1. The upstream device has two connections to the root and one of them fails, electing the other as the new root port.

2. The upstream device's link to the root fails, and it has no alternate port.

In the first scenario, BPDUs from the Root Bridge will continue to flow as the new RP transitions through Listening and Learning (remember, those states allow ports to send BPDUs, but not forward traffic). Unless this new port has a new cost that makes the downstream switch change its root port, nothing happens beyond the change of root port on the upstream device.

The second one is more interesting, and where the Max Age timer comes into the picture.

Topology:

    A
   / \
  D   C
   \ /
    B

(A connects to both C and D, and B connects to both C and D. B's port on the C-B link is blocking, which makes the link to D its root port.)

A is the root.

If the link between A and C goes down, C will declare itself the Root Bridge and start sending inferior BPDUs because it no longer has any BPDUs from the Root Bridge stored on any of its ports (because B is blocking on the C-B link and blocking ports only receive BPDUs).

B will ignore the inferior BPDUs from C until the Max Age expires. When it expires, B will transition its blocking port to designated and make it go through listening and learning. C will start receiving BPDUs from A again, and stop thinking of itself as the Root Bridge. After roughly 50 seconds, C can rejoin the topology as a downstream switch of B.

There's also the possibility of the root itself going down. In that case, C and D will detect this directly and immediately flush the BPDUs from their root ports, but B will have to wait for the Max Age to expire. Once B's Max Age expires on both its ports, it can accept inferior (inferior to the original root) information from C and D and a new Root Bridge election is held.

STP was invented at a time when a 50 second convergence time wasn't such a big deal, but as time went by this became more and more unacceptable. In order to fix this, a series of enhancements were developed which I will cover in my next post.

joetest · January 2014

I havent read it thoroughly, but it looks nice!

Few buts: seems you're mixin 802.1D with 802.1W(Rapid) a bit. I.e. you write about Alternate in PVST+?

And you write only the Root Bridge sends BPDUs, yet you mention TCN BPDUs? In my mind I have TCN BPDUs as Boomerangs.. It's sent to the Root Bridge by the switch which got some topology change(like you wrote) and the Root Bridge sends back a new update with the change to the multicast group

awesome review

fredrikjj · January 2014

joetest wrote: »

Few buts: seems you're mixin 802.1D with 802.1W(Rapid) a bit. I.e. you write about Alternate in PVST+?

The port role that isn't root port or designated port is called alternate, at least in cisco's implementation. Its state is blocking.

And you write only the Root Bridge sends BPDUs, yet you mention TCN BPDUs? In my mind I have TCN BPDUs as Boomerangs.. It's sent to the Root Bridge by the switch which got some topology change(like you wrote) and the Root Bridge sends back a new update with the change to the multicast group

1.Configuration BPDU without TCN flags (normal bpdu)
2.Configuration BPDU with TCN bit set
3.Configuration BPDU with TCN-Ack bit set
4.TCN BPDU

Only the root sends the first two. Any switch can send a tcn bpdu, and the upstream device will send a config bpdu with the tcn ack bit set, i.e. the root isn't acking to every single switch; it's handled on the local link. The ack received from the upstream switch on the local link tells the downstream device to stop sending the tcn. Once the tcn reaches the root, the root sets the tcn flag (the lowest bit) in the flags field of the configuration bpdu for a period of max age + forward delay. The thing is, it's much simpler to just refer to these messages as "bpdu" and "tcn". I guess it becomes a little bit confusing because technically a non-root bridge can then send a configuration bpdu, as it's used as an ack mechanism for the tcn bpdu.

joetest · January 2014

nice clarification!

fredrikjj · January 2014

STP Enhancements

BackboneFast

BackboneFast (BF) is designed to solve the slow convergence after an indirect link failure that I covered in the previous STP post. A switch that loses its RP, and has no alternate upstream port, has to wait for a downstream switch to unblock its alternate port before receiving a new BPDU from the Root Bridge. That normally takes (max age - message age) + 2xforward delay.

BF solves this by using a new protocol called Root Link Query (RLQ) that will attempt to detect the status of the root bridge whenever an inferior BPDU is received.

When a switch receives an inferior BPDU, it will send an RLQ request out all non-designated ports, except for the port where the inferior BPDU was received. Upstream switches will either relay this request upstream, or respond with an RLQ response if they are a root bridge. The RLQ request contains the Bridge ID of what the originator of the request thinks is the root. If that matches the Bridge ID of the root that is responding, the RLQ response is positive. If they don't match, the RLQ response is negative.

If at least one response is positive, the switch knows that its path to the root bridge is still available, and it can move its blocked port to listening before the max age has expired. If all responses are negative it has no path to the switch it thought was the root, and it declares itself the root bridge. The max age on the blocked port is expired to speed up the new root bridge election in the topology.

You would need to enable backbonefast on all switches in the topology for it to work properly.
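
For reference, a minimal sketch of what that looks like on something like a 3560. It's a single global command with no per-port tuning; the only assumption here is that you repeat it on every switch running 802.1D/PVST+ in the topology.

```
!
! repeat on every switch in the topology
spanning-tree backbonefast
!
```

show spanning-tree summary should then list BackboneFast as enabled.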

UplinkFast

This enhancement is designed to speed up convergence in cases where a switch has two connections upstream, and doesn't act as transit for traffic from other switches. The fact that the switch isn't a transit switch means that (temporary) loops do not form when the active port changes from one to the other without going through the listening and learning states. In the traditional hierarchical design, this would be applied to the access layer.

There are two moving parts:

1. When the primary link fails, the backup is immediately activated, bypassing listening and learning.
2. "Dummy" multicast frames are sent out the new port, one for each currently known local MAC address, with that address as the source. This is important because it ensures that the distribution layer knows the new location of these stations right away.

When you enable uplinkfast, the switch's bridge priority is raised to a higher than default value (49152) and the cost of all its ports is increased by 3000. This makes it less likely that the switch will become the root or act as a transit switch. You also need to make an educated decision with regards to where you deploy it.

A feature called Flex Link independently implements something similar to this, with the benefit that you could disable spanning tree, and no TCN would be sent when one of the ports fails.

Uplinkfast is enabled globally.
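
A minimal sketch, assuming an access layer switch with two uplinks and nothing downstream of it:

```
!
! access layer switches only, never on a transit switch
spanning-tree uplinkfast
!
```

After this, show spanning-tree summary should report UplinkFast as enabled, and show spanning-tree should show the raised priority and port costs.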

PortFast

I discussed the TCN suppressing feature of portfast in the previous post so I won't go into that here, but the more well known feature is that it makes a port bypass listening and learning when it's initialized. The port still sends BPDUs, and the feature is disabled if a BPDU is received. Meaning, it can only be used (and stay enabled) on ports facing end hosts, or anything else that doesn't generate BPDUs.

You can enable portfast both globally and on the interface level. If enabled globally it's not enabled on trunks. If a trunk needs portfast - perhaps to a server - you need to activate it on the interface, not forgetting the extra trunk keyword.
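
Roughly, the three variants look like this. The interface numbers are made up for the example, and this is the older 3560-style syntax (I believe newer IOS versions add an edge keyword, which I haven't verified on a box).

```
!
! global: portfast on all non-trunking ports
spanning-tree portfast default
!
! per interface, access port
interface FastEthernet0/1
 switchport mode access
 spanning-tree portfast
!
! per interface, trunk to a server: note the trunk keyword
interface FastEthernet0/24
 switchport trunk encapsulation dot1q
 switchport mode trunk
 spanning-tree portfast trunk
!
```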

BPDU Guard

This feature puts the port into errdisable if a bpdu is received on the port. It's often combined with portfast since the assumption is that a portfast enabled port will only receive a bpdu if there is some kind of cabling error or malicious behaviour by a user. It can be configured on the interface level and globally under the portfast command. spanning-tree portfast bpduguard default will enable bpduguard on ports where portfast is enabled, and if portfast is disabled, so is bpduguard.

By default, you need to manually bring the port back up with shut/no shut for it to recover from the errdisable state. However, the errdisable feature itself has options for automatic recovery if that is desired. You're able to set a global recovery time for all features, and select which features should recover automatically, and which shouldn't.
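
A sketch of both ways of enabling it, plus automatic recovery. The interface number is an example, and the 300 second interval is just the default written out explicitly.

```
!
! global: bpduguard on every portfast-enabled port
spanning-tree portfast bpduguard default
!
! or per interface, regardless of portfast
interface FastEthernet0/1
 spanning-tree bpduguard enable
!
! optional: recover automatically from a bpduguard errdisable
errdisable recovery cause bpduguard
errdisable recovery interval 300
!
```

show errdisable recovery lists which causes have automatic recovery turned on and the timer that applies to them.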

BPDU Filter

This feature has two separate uses:

1. If enabled on the interface level, it can effectively disable spanning tree by making the port no longer send or receive bpdus (while still forwarding all other traffic). This is the only way to disable STP on a per port basis. Your other option is to disable STP for the entire vlan with the no spanning-tree vlan command in global config.

2. Under global config you can enable the portfast bpdufilter default feature. What this does is enable bpdufilter whenever portfast is on. If a bpdu is received, portfast is disabled, and so is the filter. The purpose of this is to stop the switch from sending bpdus down to end hosts, though a small number of them will be sent as the port is initialized.
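
The two uses side by side, as a sketch (interface number is an example). The commands look similar, but the behaviour is quite different, so it's worth being careful about which one you type.

```
!
! use 1: interface level, the port neither sends nor processes bpdus
interface FastEthernet0/1
 spanning-tree bpdufilter enable
!
! use 2: global, only applies to ports where portfast is operational
spanning-tree portfast bpdufilter default
!
```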

On an access port you would enable:

portfast: to stop tcn generation and to make the port jump straight to the forwarding state.
portfast bpdufilter default: to stop portfast ports from sending bpdus to the end host.
portfast bpduguard default: to disable the portfast ports if bpdus are received due to a cabling error, or some kind of attack on STP.

But it's my understanding that you wouldn't run bpdufilter with bpduguard because you would no longer be protected against two switch ports being connected. Since neither port sends bpdus, bpduguard wouldn't be able to detect the loop. Is that right? Or, do the bpdus that are sent each time a (global) bpdufilter port is enabled have the purpose of triggering bpduguard even if you run portfast bpdufilter+guard default?

Root Guard

A root guard enabled port gets temporarily disabled ("root inconsistent") if it attempts to transition its role to root port. You would enable it on the designated ports of your core and/or distribution switches. If an access layer switch becomes the root bridge, the upstream switches disable their ports rather than agreeing with this new topology. Once the superior bpdus stop, the port will recover automatically.

This feature didn't really make sense to me at first because the issue of a rogue root bridge is already solved by bpduguard. If you run it on your distribution switches, all you are really protecting yourself against is someone really messing up and configuring an access layer switch as root bridge, which seems really unlikely. However, the configuration guide made the issue much clearer by demonstrating that this is really a service provider feature for running spanning tree with a customer. In that case, running bpduguard is obviously not possible, but you still want protection against an incorrect root bridge being elected.
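
A sketch of the configuration, assuming a distribution (or provider edge) switch with a downlink on Gi0/1. The interface and description are made up for the example.

```
!
interface GigabitEthernet0/1
 description downlink towards access/customer switch
 spanning-tree guard root
!
```

Ports currently held in the root inconsistent state can be listed with show spanning-tree inconsistentports.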

Loop Guard and Unidirectional Link Detection

If a blocked port stops receiving bpdus and the port is still up, the switch will assume that it's safe to bring it to the forwarding state. Now, imagine a link where one side has a designated port and the other side is alternate (blocking). BPDUs will flow out from the designated port and continuously refresh the alternate port's max age timer, but if the link becomes unidirectional due to some fault, BPDUs will stop flowing across the link and the max age will expire. The alternate port will become designated and transition through listening and learning to forwarding.

However, since the link is still operational in the other direction a loop will form, and there's no way for STP to detect it. Loop Guard and Unidirectional Link Detection are both designed to solve this problem.

Loop Guard solves it by keeping track of all non-designated ports. If such a port stops receiving BPDUs and attempts to become a designated port, loop guard puts it into the "loop inconsistent" state which doesn't forward traffic.
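
As a sketch, loop guard can be turned on globally or per port (interface number is an example):

```
!
! global
spanning-tree loopguard default
!
! or per interface, on root and alternate ports
interface GigabitEthernet0/1
 spanning-tree guard loop
!
```

A port that has been put into the loop inconsistent state also shows up in show spanning-tree inconsistentports, and it recovers on its own once BPDUs start arriving again.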

UDLD isn't an STP feature and can be used independently of STP to detect unidirectional links. Instead of simply reacting to the symptoms like loop guard, UDLD implements a new layer 2 message protocol where each switch adds the information received from the neighbor to its own messages. Identifying its own information in a reply tells the switch that the link is bidirectional.

The UDLD frames are sent every 15 seconds by default and the hold time is 45 seconds. This is tuned to be slightly shorter than the default max age + 2xforward delay which means that UDLD can intervene before STP starts forwarding through the unidirectional link.

It can be enabled globally or on the interface level, and both sides of the link should be running the protocol. If enabled globally, it's only enabled on fiber ports because that's where unidirectional links are more likely to occur. If you do want to run it on a copper interface, it must be enabled on the interface level.

UDLD has two modes: normal and aggressive. Normal mode simply marks the port as having an "undetermined state" if it detects that the link is unidirectional, but takes no further action. In aggressive mode the port is put into errdisable if the link is unidirectional.
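
A sketch of the global and interface level variants. The interface number is an example, and you would pick either normal or aggressive, not both.

```
!
! global, fiber ports only: normal mode
udld enable
! global, fiber ports only: aggressive mode
! udld aggressive
!
! per interface, required for copper ports
interface GigabitEthernet0/1
 udld port aggressive
!
```

show udld gives the per-port state and the detected neighbor, and udld reset in exec mode brings back ports that UDLD has shut down.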

fredrikjj · January 2014

Rapid Spanning Tree Protocol (802.1w)

Two flavors:

Rapid PVST - One RSTP instance per vlan
Multiple STP - Allowing you to map particular vlans to particular instances

This post covers the first one.

First, let me just say that the textbooks for the ccnp switch exam cover RSTP very briefly so don't expect this to be more than it is.

RSTP maintains backwards compatibility with 802.1D by using a BPDU that's largely the same. The new functionality simply uses the previously unused bits in the flag field. It doesn't require much additional configuration beyond enabling it with the spanning-tree mode rapid-pvst command. However, under the hood, there are some differences.

-the bpdu now has version 2 in the version field
-switches send BPDUs every hello interval, independently of the root bridge.
-port information is aged out in 3xhello time instead of relying on a max age timer.
-message age is no longer a part of the max age time out - in rstp it's simply used as a hop count.

The port states have been reduced from 5 (disabled, blocking, listening, learning, forwarding) to 3 (discarding, learning, forwarding). Discarding replaces the first 3 states in old STP. Cisco still seems to use the blocking state, not discarding, in the show command output.

The port roles operate the same way as before; the role determines what state a port should be in. RSTP uses the same roles we're familiar with from STP: Root, Designated, Alternate. It also defines the Backup role, but that only applies when a switch has more than one port on the same shared LAN segment (i.e. using hubs).

//

The reason that 802.1D is so very slow is that it has to transition through the listening and learning states before a link between two switches can become operational. RSTP replaces this behaviour with something called rstp synchronization. Think of synchronization as a handshake where a series of messages are sent between two switches, resulting in them agreeing upon what roles the two ports should have.

It's important to note that this process only works on "point to point" links. In this context, that is any full duplex link between two switches. A full duplex interface will automatically become an RSTP p2p port, and a half duplex link will become "shared". A shared port could potentially connect to a hub which means that synchronization is disabled and 802.1D behavior is used.

In today's networks this shouldn't be an issue unless there's some problem that makes a port negotiate half instead of full duplex. You can statically configure the port link type if you don't trust that your interfaces will negotiate full duplex.
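
A sketch of enabling Rapid PVST and pinning the link type on an uplink (the interface number is made up for the example):

```
!
spanning-tree mode rapid-pvst
!
! force p2p if you can't trust duplex negotiation on this link
interface GigabitEthernet0/1
 spanning-tree link-type point-to-point
!
```

The Type column in show spanning-tree shows whether a port ended up as P2p or Shr.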

So how does this synchronization thing work?

1. Two switches are connected over an RSTP p2p link. They both initially assume that they should have the designated port for that link and they indicate this by sending BPDUs with the proposal bit set. The two ports are in discarding or learning at this point.

2. The switch that receives the superior bpdu realizes that the other switch should have the designated port for that link. It blocks all its other ports, except edge ports (edge port = portfast).

3. The switch with the inferior bpdu sends a bpdu with the agreement flag to its upstream neighbor, and places that port into the forwarding state as a root port.

4. The upstream neighbor receives the agreement, and moves its designated port into forwarding.

The synchronization is now complete between these particular switches. However, the downstream switch now potentially has other designated ports in the discarding state. Those ports then in turn synchronize with their downstream neighbors until the process reaches the edge of the network.

//

Topology changes are handled differently in RSTP:

-The TCN BPDU is no longer used; it is replaced by a config BPDU with the TC flag set
-BPDUs with TC flag are sent out all non-edge ports for 2x hello time from the switch where the topology change happened.
-No Acks are used
-These BPDUs are flooded throughout the topology, with every switch sending them out all non-edge ports for the duration of two times their hello time. The only exception is that they do not send them back to the port where they were received.
-Receiving a BPDU(TC) instructs the switch to flush its mac addresses from all non-edge ports except for the one where it was received
-Edge ports do not generate topology changes, which is the same behavior as 802.1D with portfast.

Note the difference here between the reduced aging time in 802.1D and the immediate removal of CAM entries in RSTP. The advantage seems to be faster reconvergence, but at the expense of more unicast flooding.

Another difference is that only ports entering the forwarding state generate a topology change, not links failing. The logic is that the flushing of mac addresses is required when a new path opens up, not when an existing one fails. If an alternate path does exist, it will transition to forwarding, at which point the TC will be generated. This prevents the unnecessary flushing of cam tables when no alternate path exists.

//

Functionality similar to backbonefast and uplinkfast is integrated into RSTP, but operates differently. It's not something you would have to manually configure anymore.

RSTP's "backbonefast" uses a port's stored root bridge information to immediately send a proposal if it receives an inferior bpdu. It then goes through the synchronization process with that other switch. The result is that the inferior switch receives a new root port and can rejoin the rest of the topology very quickly.

Receiving a superior bpdu also triggers a new synchronization, as you would expect, but that would be the result of introducing a new switch or a reconfiguration, not the indirect link failure that backbonefast typically is associated with.

RSTP is capable of switching over to an alternate root port similarly to a PVST+ switch with uplinkfast enabled. The main difference is that it's no longer required to send those dummy frames where the src are the local cam table entries since the TC BPDU will instruct the rest of the topology to flush their tables.

//

RSTP and 802.1D interoperability is centered around making RSTP behave like 802.1D when required.

802.1D switches drop 802.1w BPDUs, which means that the 802.1D side must initiate the communication. Whenever an 802.1w port starts sending BPDUs, it activates a migration delay timer during which it ignores the bpdu version on any received bpdus and the port mode (i.e. 802.1w) is locked. If it receives 802.1D bpdus during the migration delay timer, the port mode is changed to 802.1D once the timer expires. What was previously an RSTP port can now communicate with an 802.1D switch.

joetest · January 2014

Fantastic summary. It's great to use as a reminder of what I've read a week ago.. keeping it fresh!

Though isn't Multiple STP 802.1s even though it uses rapid convergence? iirc 802.1w is just Rapid STP - It's on top at "flavors"

Keep doing what you're doing and go back and do the same for ROUTE when done with SWITCH. I haven't done ROUTE yet. Hehe :P

fredrikjj · January 2014

joetest wrote: »

Though isn't Multiple STP 802.1s even though it uses rapid convergence?

Indeed it is.

MST - Multiple Spanning Tree Protocol

MST allows you to arbitrarily group vlans to RSTP instances. There are some advantages to this:

- It reduces CPU utilization in an environment with many vlans since you no longer need to run a single instance per vlan.

- It gives you the ability to use different logical topologies for different vlans without having to run a single instance per vlan.

The main disadvantage seems to be that the configuration is a bit more involved than for the other types of STP. Also, and I'm not completely sure about this, but it seems like you really can't add a new vlan to an instance while the switches are in production since the regions will be temporarily mismatched. Depending on which ports are forwarding before the change, they could have to reconverge. I'm assuming that you get around that by assigning every single vlan to instances beforehand.

MST is built on top of the 802.1w standard so the underlying behavior is the same as Rapid-PVST with things like synchronization and the new port roles and states. That was covered in a previous post so the focus of this one will be on (some of) the new MST specific features.

An MST region is a group of switches that agree on instance to vlan mappings, the name of the region and a region revision number. Note that the revision number is manually defined, and not something that's continuously updated like in VTP. In PVST, instances are consistent across switches automatically since there's a 1 to 1 mapping of instance to vlan. In MST that is not necessarily the case since you can map any vlan to any instance. Because of this, there must be some mechanism to detect if the vlans belonging to each instance match across switches.

If switches don't agree on the region parameters, they are in different regions. Multi-region MST is outside the scope of the exam, and from what I've read, not widely deployed. In a sentence, the purpose of using multiple regions is fault isolation, trading possibly higher inter-region convergence time for less of the bad things about RSTP (temporary unicast flooding due to a TC event). A switch can only belong to a single region.

By default, MST runs a single instance, 0, mapped to all vlans. This instance zero cannot be deleted, but you can move vlans to other instances of your creation. This configuration is done in a new MST configuration mode that you enter with the spanning-tree mst configuration command.

Once an instance is created you can modify its behaviour in the same way that you would modify an instance belonging to a single vlan in PVST. One thing you could do would be to create two instances and then run two different root bridges in the distribution layer. The access layer switch will then (with equal cost paths) have a separate root port for each instance.

The reason that you can't delete instance zero is that it is the one handling the exchange of BPDUs between switches. Instead of every instance sending separate BPDUs, instance 0 incorporates certain information from the other instances into its own BPDU. If there's a mismatch in the BPDUs exchanged in instance 0, switches will detect this and know that they are in different regions. This instance 0 is called the Internal Spanning Tree (IST).

Implementation of single-region MST is pretty straightforward:

1. Make sure you have consistent region parameters across all switches:

!
spanning-tree mst configuration
name TECHEXAMS
revision 1
instance 1 vlan 5,10,15,20
instance 2 vlan 25,30,35,40
!

2. Replace the VLAN number with the MST instance number when modifying STP behaviour:

!
spanning-tree mst 1 priority 24576
!
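
To tie this back to the idea of two root bridges in the distribution layer, here is a sketch for the first distribution switch. It assumes the region config from step 1 is already in place on every switch; the second distribution switch would get the same thing with primary and secondary swapped.

```
!
spanning-tree mode mst
!
! distribution switch 1: root for instance 1, backup for instance 2
spanning-tree mst 1 root primary
spanning-tree mst 2 root secondary
!
```

show spanning-tree mst configuration displays the region name, revision and instance to vlan mappings, which is the first thing to compare between switches if they don't seem to agree on the region.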

MST can interoperate with the other types of STP while maintaining the MST specific topology within each region. MST does this by collapsing its intra-region topology into a single logical switch that it uses to communicate with devices outside the region. These devices could be RSTP, PVST+, MST, 802.1D, etc. The key point being that as long as a loop free topology is created between the abstraction of the mst region and the external device, the mst region can continue to function internally.

This is also how you would build a multi region MST topology. Each region would run MST specific stuff, but the inter-region topology would be dictated by an overarching CST (Common Spanning Tree - one instance for all vlans).

fredrikjj · January 2014

EtherChannel

EtherChannel (EC) is Cisco's name for link aggregation. It lets you create a bundle of up to 8 physical interfaces, increasing bandwidth and introducing redundancy. The number of supported interfaces varies, but it's my understanding that 8 is the usual number. Redundancy is introduced because if a physical link in the EC fails, traffic is very quickly (milliseconds) redirected to the remaining links.

There are two protocols that help facilitate this:

PAgP: Port Aggregation Protocol (cisco proprietary)
LACP: Link Aggregation Control Protocol (open standard)

You can also run EC without using either of the two control protocols with the 'On' mode, which I will cover later.

You can aggregate both layer 2 and layer 3 ports.

PAgP

When you group interfaces together, PAgP can use two 'modes'; desirable and auto. Desirable will actively initiate an EC with the other side, while auto only listens for PAgP control frames. Note that this is the same functionality and terminology that DTP uses. So, for the channel to form, only the combinations desirable-auto and desirable-desirable will actually work.

For the EC to form, the member interfaces must share certain characteristics. It would be incorrect to say that they must be identical, but things like vlan assignments and the speed do need to match. Because of this, it makes sense to just default the interfaces you want to have in the channel, and then configure them at the same time with the int range command.

Configuration is easy, but can get confusing because the names of the commands used aren't very consistent. For example, to configure ports as member of a PAgP EtherChannel, you would enter:

switch(config)#interface range f0/1 - 3
switch(config-if-range)#channel-group 1 mode {desirable | auto} [non-silent] //the number used is only locally significant.

(optionally, you would revert the interfaces to their defaults first)

To configure the logical interface, you would use the 'port-channel' command.

switch(config)#interface port-channel 1

The main verification command is switch#show etherchannel summary

To create an L3 port channel, the configuration guide says to first create the logical port channel interface and then assign the physical ports to it with the channel-group command. It doesn't explain the significance of this order of operations and just adding no switchport at any point seems to work fine.
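
For what it's worth, the order described in the configuration guide would look something like this. The IP address and interface numbers are placeholders I picked for the example.

```
!
! logical interface first
interface Port-channel1
 no switchport
 ip address 10.0.0.1 255.255.255.252
!
! then the physical members, as routed ports without addresses
interface range FastEthernet0/1 - 3
 no switchport
 no ip address
 channel-group 1 mode desirable
!
```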

The non-silent keyword in the channel-group command requires some extra attention because the information about it in the books is just plain wrong. Obligatory reading on this: "Busting myths – PAgP desirable runs in silent mode by default" (Daniels networking blog).

In short, desirable mode runs in non-silent mode by default which makes the extra non-silent keyword unnecessary. So what is silent and non-silent? The 3560 config guide only says this:

If your switch is connected to a partner that is PAgP-capable, you can configure the switch port for nonsilent operation by using the non-silent keyword. If you do not specify non-silent with the auto or desirable mode, silent mode is assumed.
Use the silent mode when the switch is connected to a device that is not PAgP-capable and seldom, if ever, sends packets. An example of a silent partner is a file server or a packet analyzer that is not generating traffic. In this case, running PAgP on a physical port connected to a silent partner prevents that switch port from ever becoming operational. However, the silent setting allows PAgP to operate, to attach the port to a channel group, and to use the port for transmission.

This leads to a series of questions. Does this mean that two auto-silent ports will form a port channel? Aren’t we told that at least one side needs to run desirable? Is silent/non-silent a completely separate concept that has nothing to do with desirable/auto? Why can we form a PAgP port channel with something that is not PAgP capable? Is silent mode the equivalent of running the On mode?

Different documentation has this explanation:

The silent/non-silent settings affect how ports react to situations that cause unidirectional traffic. When a port is unable to transmit because of a failed physical interface or a broken fiber or cable, the neighbor port can still be left in an operational state. The partner continues to transmit data. But, data are lost because return traffic cannot be received. Spanning-tree loops can also form because of the unidirectional nature of the link.

This seems to indicate that it has nothing to do with whether the devices are PAgP capable per se, but rather how the bundle reacts to one side no longer sending data.

I’m just going to leave it at that because definitive answers would require hardware and research time that I don’t have.

Once the switch has created a port channel, that logical interface will be the one inserted into the CAM table when the switch learns a MAC address on any of the member physical interfaces. This means that a switch can load balance across the links even if frames have the same destination mac address. In fact, the switch can use a number of different load balancing schemes, but it must be configured globally for all port channels. I.e. you can't use source mac for one channel, and source IP for another. Note that if you are using some old platforms, they might only support mac source based load balancing, and can only enter the member interfaces to the CAM table. The main issue with this, beyond severely limiting your load balancing options, is that if you are creating a channel between a switch that only supports this physical port learning, and one that supports logical learning, you must manually configure the logical learner as a physical learner. You must also set the load balance method to mac-src.

Depending on what kind of traffic patterns you have, you might have to adjust the load balancing method. For example, if the majority of the traffic across the channel is downloads from a particular server, the source mac address will be the same for most of the frames. In that case, using src-mac as the load balancing method will be a pretty bad idea since one link will potentially be saturated while the others sit unused. Destination IP might be a better method in that case if multiple clients are downloading from the server. This has to be evaluated on a case by case basis, but note that you can't load balance on a frame by frame basis; it must be some function of the mac and ip addresses.
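To make the server download example concrete, here is a rough Python sketch. It assumes a toy hash (the address modulo the number of links); the real hash is platform specific, so pick_link, the client addresses and the 4-link channel are all made up for illustration.

```python
from collections import Counter


def mac_to_int(mac: str) -> int:
    """Convert a Cisco-style MAC (aabb.cc00.0c00) to an integer."""
    return int(mac.replace(".", ""), 16)


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def pick_link(frame: dict, method: str, links: int) -> int:
    """Toy link selection: low-order part of the chosen address.

    Assumption: a stand-in for the real (platform specific) hash.
    """
    if method == "src-mac":
        return mac_to_int(frame["src_mac"]) % links
    if method == "dst-ip":
        return ip_to_int(frame["dst_ip"]) % links
    raise ValueError(f"unsupported method: {method}")


# One server sending to eight clients across the channel.
frames = [
    {"src_mac": "aabb.cc00.0c00", "dst_ip": f"10.0.0.{host}"}
    for host in range(10, 18)
]


def distribution(method: str, links: int = 4) -> Counter:
    """Count how many frames land on each member link."""
    return Counter(pick_link(frame, method, links) for frame in frames)


print("src-mac:", dict(distribution("src-mac")))
print("dst-ip: ", dict(distribution("dst-ip")))
```

With src-mac every frame hashes to the same member link because the server is always the source, while dst-ip spreads the same frames over all four links.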

LACP

LACP is the open standard version of PAgP, meaning that it is what you would use when interoperating with devices from other vendors. Generally speaking though, there doesn't seem to be any particular advantage to using PAgP even in an all Cisco environment.

Just like PAgP, LACP has two modes, where one initiates the port channel, and the other just listens. Here, they are called active and passive. As you would expect, at least one of the sides must run active. And again, member port parameters must match. Unlike PAgP, LACP doesn't support half duplex links.

LACP has some additional redundancy features. One is called Hot-Standby.

Hot-standby is the ability to bundle more ports than what the switch platform supports, and placing the excess ports in a standby state. If one of the links fails, one of the standby ports will automatically join the port channel. It's obviously useful in a scenario where it's a necessity that the aggregate bandwidth is maintained even after a link failure. There are some knobs that you can turn with this feature, like changing the maximum bundle number, or changing the port priority.

The switch(config-if)#lacp max-bundle [number] command, entered on the port-channel interface, allows you to restrict the number of links in an LACP port channel. This would allow you to run for example 4 links active, with another 2 in a standby state. If you don't use this command, all 6 would be members of the bundle and actively forwarding traffic. Changing the port priority allows you to influence which ports are active and which are in the standby state. A lower numerical value makes it more likely that the port is actively used.
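As a rough illustration of the 4 active plus 2 standby example, the selection can be sketched in Python. This is a simplification that assumes ports are ranked by port priority with the port number as tie breaker, and it ignores which switch gets to make the decision; select_active and the interface names are hypothetical.

```python
DEFAULT_PRIORITY = 32768  # default LACP port priority


def select_active(ports, max_bundle):
    """Split candidate ports into (active, standby).

    ports: list of (name, port_priority, port_number) tuples.
    Assumption: lower priority wins, lower port number breaks ties.
    """
    ranked = sorted(ports, key=lambda port: (port[1], port[2]))
    active = [name for name, _, _ in ranked[:max_bundle]]
    standby = [name for name, _, _ in ranked[max_bundle:]]
    return active, standby


# Six candidate links; Gi0/6 has been given a better (lower) priority.
ports = [
    ("Gi0/1", DEFAULT_PRIORITY, 1),
    ("Gi0/2", DEFAULT_PRIORITY, 2),
    ("Gi0/3", DEFAULT_PRIORITY, 3),
    ("Gi0/4", DEFAULT_PRIORITY, 4),
    ("Gi0/5", DEFAULT_PRIORITY, 5),
    ("Gi0/6", 100, 6),
]

active, standby = select_active(ports, max_bundle=4)
print("active: ", active)
print("standby:", standby)
```

Lowering the priority on Gi0/6 pulls it into the active set, pushing Gi0/4 and Gi0/5 to standby.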

Configuration of LACP port channels is largely identical to PAgP, except for using different modes.

The On mode

If you configure a port channel with this mode, no negotiation takes place which means that certain misconfigurations that would normally be prevented by the negotiation protocols can go live. You would only use this mode if you have to create a channel with a device that doesn't support LACP or PAgP. Incorrectly configuring this mode can make bad things happen like loops.

reaper81 · January 2014

Nice thread Fredrik.

Seems you are on the right path. Some of the things you write are definitely beyond CCNP level.

fredrikjj · January 2014

My plan was to spend this weekend doing labs on the core* technologies that I've covered so far (vlan,vtp,stp,rstp,mst,etherc). I ran out of labs at around the 4 hour mark and since I didn't really run into any issues I'm just going to move on to the second half which covers auxiliary stuff like security, mls, qos, wireless.

*core in the sense that they impact the L2 topology.

reaper81 wrote:

Nice thread Fredrik.

Seems you are on the right path. Some of the things you write are definitely beyond CCNP level.

Thanks

fredrikjj · January 2014

CCNP SWITCH SECURITY

A brief overview of these topics:

Port Security
DHCP Snooping
IP Source Guard
Dynamic ARP Inspection
Securing Trunk Links
Identity Based Networking Services
Private VLANs
Port ACL and VLAN ACL
Storm Control
Protected Ports
Port Blocking

I found this post quite difficult to write because it was sometimes hard to make sense of these features with the limited coverage of them in the CCNP textbooks. I occasionally turned to the 3560 configuration guide, but that introduced another set of problems; how do I determine what's CCNP level material and what isn't? This is a problem in general when studying for the CCNP, but doubly so here.

Port Security

This feature modifies and restricts how MAC addresses are learned on ports.

There are two primary attacks that this is designed to prevent:

CAM Table Overflow
MAC Spoofing

A CAM table overflow attack targets the fact that a switch has a limited amount of memory for storing MAC addresses and that there's no limit on the number of addresses that can be learned on each port. Once the CAM is full, the switch is unable to learn new station locations and the switch will flood all unknown unicasts. The attacker could then potentially intercept this traffic.

MAC spoofing involves an attacker sending a frame to the switch with a source MAC address belonging to a different host. The switch will simply accept this new location and send traffic destined to the original host to this new port. If the other host is active as well, the address would continuously be relearned on the two ports, disrupting traffic.

Port security is disabled by default, which means that these attacks are possible unless you actively configure the feature. While port security prevents certain attacks, it doesn't prevent someone from removing a station and replacing it with another one that spoofs the original machine's MAC address. So, if your goal is to definitively prevent unauthorized machines from using the network, you need to look at other features as well.

Port Security has three modes:

Static
Dynamic
Sticky

Static is pretty self-explanatory. It lets you preconfigure MAC addresses that are allowed to use a specific port. Frames with a different MAC than what is configured will trigger a violation.

Dynamic allows the port to learn a configured number of addresses (default 1), and if any other address attempts to use the port once the limit is reached, a violation is triggered. One thing to note here is that the address will show up as 'static' in the address table despite being learned dynamically.

Dynamically learned secure address:

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    aabb.cc00.0c00    STATIC      Et2/0

No port security:

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    aabb.cc00.0c00    DYNAMIC     Et2/0

I don't know if this really means anything, but if you are used to filtering the address table list with the dynamic keyword, it's a thing to keep in mind.

Sticky is the third type, and the one that could be a bit confusing. Essentially, it's a dynamic address that can be stored in the configuration file, but it can also be configured manually. To understand the purpose of this feature, think of how you would actually have to go about configuring an entire network with static addresses. The administration of it would require knowing the mac address of each station, and then going to the specific port where it is connected and manually configuring it.

Sticky allows you to use assignments that are written to the configuration (and, once saved to the startup configuration, survive a switch reboot), but instead of manually entering the address of the machine, the first machine plugged into the port is assigned.

The default response to a port security violation is a shutdown of the port. I mentioned before that breaking the rules applied to a specific port creates a violation. Less known is that it's also a violation if a MAC address that is learned or configured on one port is seen on a different secure port.

Violation types:

Protect
Shutdown
Restrict
Shutdown VLAN

Protect simply drops traffic from unknown sources. For example, a port is allowed to dynamically learn five addresses. The sixth address will not be able to use the port. There's really no notification sent when this happens and for troubleshooting reasons you probably wouldn't use this mode in most cases.

Restrict behaves like Protect, but notifications are sent: the violation counter is incremented, and a syslog message and SNMP trap are generated.

Shutdown errdisables the port with everything that entails. The errdisable type is 'psecure-violation' in case you want to configure automatic recovery.

Shutdown VLAN disables the vlan on the port instead of the entire port.
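The dynamic learning limit and three of the violation modes can be sketched as a small Python model. This is only my mental model of the behavior described above, not how the switch implements it; SecurePort and its fields are invented names, and Shutdown VLAN is left out to keep it short.

```python
class SecurePort:
    """Toy model of a port with dynamic port security enabled."""

    def __init__(self, maximum=1, violation="shutdown"):
        self.maximum = maximum          # number of secure addresses allowed
        self.violation = violation      # "protect", "restrict" or "shutdown"
        self.secure_macs = set()
        self.violation_count = 0
        self.log = []                   # stands in for syslog/SNMP notifications
        self.err_disabled = False

    def receive(self, src_mac):
        """Return True if a frame from src_mac is forwarded."""
        if self.err_disabled:
            return False
        if src_mac in self.secure_macs:
            return True
        if len(self.secure_macs) < self.maximum:
            self.secure_macs.add(src_mac)   # learned as a secure address
            return True
        # Limit reached: this frame is a violation.
        if self.violation == "protect":
            return False                    # silently dropped
        self.violation_count += 1
        self.log.append(f"security violation caused by {src_mac}")
        if self.violation == "shutdown":
            self.err_disabled = True        # psecure-violation errdisable
        return False


restrict = SecurePort(maximum=2, violation="restrict")
for mac in ("aabb.cc00.0c00", "aabb.cc00.0d00", "aabb.cc00.0e00"):
    print(mac, "forwarded" if restrict.receive(mac) else "dropped")
print("violations:", restrict.violation_count)
```

The point of the sketch is the difference in side effects: protect drops quietly, restrict drops and notifies, and shutdown takes the whole port down so that even the previously learned address stops working.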

Configuration and verification of port security is done with switch#show port-security [options] and switch(config-if)#switchport port-security [options]. Things to configure besides what I've already covered would be things like how addresses age out, and configuring specific addresses for specific vlans.

There are also restrictions on what kind of interfaces are compatible with port security. Notable is that you can use the feature on trunk ports, but not on etherchannel interfaces, which means that if you are running say LACP with a server, port security isn't an option.

DHCP Snooping

This is used to prevent a rogue DHCP server from operating on the LAN. This rogue server could first exhaust the legitimate server's pool of IP addresses, and then begin offering incorrect information to clients with the rogue server's IP as default gateway. This station would then act as a man in the middle, more or less transparently to the legitimate users.

To protect against this, DHCP snooping uses trusted and untrusted ports. Untrusted ports drop inbound DHCP server messages, while trusted ports allow them. Naturally, you would configure the ports where you have legitimate DHCP servers as trusted, and everything else as untrusted.

In addition to discarding illegitimate DHCP traffic, the switch builds a database of the DHCP bindings. A binding would be the combination of the ip address, mac address, vlan, port and lease time for a particular DHCP lease. Normally this type of information would only be stored on the DHCP server, but this is a separate database maintained by the local switch. This database is then used as a reference point for other features so I reckon that there is an assumption that the information stored in it is guaranteed to be correct.

DHCP snooping is first enabled globally, and then on a per vlan basis. By default all ports are untrusted so you have to manually set your trusted ports.
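Here is a minimal Python sketch of the two jobs described above: dropping server messages on untrusted ports and recording bindings. The class, the message type names and the port names are my own assumptions for illustration; the real feature inspects actual DHCP packets.

```python
from dataclasses import dataclass

# DHCP message types that only a server should send (simplified).
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


@dataclass(frozen=True)
class Binding:
    ip: str
    mac: str
    vlan: int
    port: str
    lease_seconds: int


class SnoopingSwitch:
    """Toy model of DHCP snooping on a single switch."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.bindings = {}  # client MAC -> Binding

    def server_message_allowed(self, ingress_port, msg_type):
        """Untrusted ports drop inbound DHCP server messages."""
        if msg_type in SERVER_MESSAGES:
            return ingress_port in self.trusted_ports
        return True

    def process_ack(self, ingress_port, client_mac, client_ip,
                    vlan, client_port, lease_seconds):
        """Record a binding when an ACK arrives on a trusted port."""
        if not self.server_message_allowed(ingress_port, "ACK"):
            return None
        binding = Binding(client_ip, client_mac, vlan, client_port, lease_seconds)
        self.bindings[client_mac] = binding
        return binding


switch = SnoopingSwitch(trusted_ports={"Gi0/1"})
print(switch.server_message_allowed("Fa0/5", "OFFER"))   # rogue server port
print(switch.process_ack("Gi0/1", "aabb.cc00.0c00", "10.0.0.50", 10, "Fa0/5", 86400))
```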

IP Source Guard

Source Guard works in conjunction with DHCP snooping to prevent access on layer 2 ports based on the DHCP snooping binding table. Traffic coming into an untrusted interface with a source IP address other than what is assigned by the dhcp server, and thus stored in the binding table, is discarded. Besides using the snooping binding table, you can also configure static IP source bindings which are required for hosts not using DHCP. You can prevent access based on either IP address only or IP address and MAC address.

IPSG is enabled at the interface level with switch(config-if)#ip verify source [port-security]

The port-security option enables source MAC address verification in addition to IP. For source MAC address filtering to work on a host using DHCP, the DHCP server must support option 82, and Port Security must be enabled on the interface. Option 82 is a feature that enables the switch to add more information about the port where the DHCP client is connected to the DHCP messages.

Behind the scenes, IPSG operates by adding a port-ACL to the port that only allows traffic to pass if the source addresses match what is found in the dhcp bindings or static source bindings. If IPSG is enabled on an interface that isn't found in these tables, the port ACL simply drops all traffic.
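The filtering logic can be sketched like this in Python. It is a simplification that treats the snooping bindings and static source bindings as one per-port table; ipsg_permits and the table layout are hypothetical.

```python
def ipsg_permits(port, src_ip, src_mac, bindings, check_mac=False):
    """Return True if traffic from src_ip/src_mac may enter on port.

    bindings: port -> list of (ip, mac) pairs, merged from the DHCP
    snooping table and any static source bindings (assumption).
    A port with no entries drops all IP traffic.
    """
    for ip, mac in bindings.get(port, []):
        if src_ip == ip and (not check_mac or src_mac == mac):
            return True
    return False


bindings = {"Fa0/5": [("10.0.0.50", "aabb.cc00.0c00")]}

# IP-only filtering: a spoofed MAC with the right IP still passes.
print(ipsg_permits("Fa0/5", "10.0.0.50", "dead.beef.0001", bindings))
# IP and MAC filtering (the port-security option): it no longer does.
print(ipsg_permits("Fa0/5", "10.0.0.50", "dead.beef.0001", bindings, check_mac=True))
# A port without any binding drops everything.
print(ipsg_permits("Fa0/6", "10.0.0.50", "aabb.cc00.0c00", bindings))
```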

Dynamic ARP Inspection

ARP Spoofing, aka ARP poisoning, involves a malicious host sending a gratuitous ARP associating someone else's IP address with the MAC address belonging to the attacker. A gratuitous ARP is basically an ARP reply without there first being a broadcast requesting the information.

The result is incorrect information in the arp caches of machines on that LAN. When someone wants to send traffic to the compromised IP, the Ethernet frame will be created with the destination MAC address that belongs to the attacker instead of the actual owner of the IP address in question. Presumably, the attacker can then relay the frame to the legitimate destination, creating a man in the middle type situation.

As you might glean from the name, Dynamic ARP Inspection (DAI) is designed to prevent this attack by verifying that the ARP replies sent on the segment are legitimate. It does this by referencing the DHCP snooping binding table. That table contains the IP, MAC address, vlan, and port of each host that has received its IP from a DHCP server. Since the ARP spoofing attack sends an arp reply or gratuitous arp with a combination of ip and mac address that doesn't align with this table, it is denied. And just like with the IP source guard features, you can manually create bindings between IP and MAC address for hosts that are not receiving their IP from DHCP.

Note however that the static bindings don't use the same table as the source guard feature does. It's a separate configuration using something called an ARP ACL. This seems to be the major issue with these features – the administration must be a complete nightmare unless you use a DHCP server. But then again, you probably are.

DAI uses the concept of trusted and untrusted interfaces, just like DHCP snooping does. However, trust for DAI is configured separately from DHCP snooping trust. Trusted interfaces are not subjected to inspection while untrusted ones are. DAI is enabled on a per vlan basis, and all interfaces are untrusted by default. Trusted interfaces are then explicitly configured. Another notable configuration option is rate limiting the number of arps that can be sent on a port. Because the CPU now performs inspection of arp frames, it becomes susceptible to a denial of service attack based on arp flooding.
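Putting the pieces of DAI together, the decision for a single ARP packet might look like the Python sketch below. The data layout is invented for illustration, and I'm assuming the static ARP ACL entries are consulted before the DHCP snooping bindings.

```python
def arp_permitted(arp, ingress_port, trusted_ports, snooping_bindings, arp_acl):
    """Decide whether an ARP packet passes inspection.

    arp: dict with "vlan", "sender_ip" and "sender_mac".
    snooping_bindings: (vlan, ip) -> (mac, port), from DHCP snooping.
    arp_acl: (vlan, ip) -> mac, static entries for non-DHCP hosts.
    """
    if ingress_port in trusted_ports:
        return True                      # trusted ports skip inspection
    key = (arp["vlan"], arp["sender_ip"])
    if key in arp_acl:                   # static binding wins (assumption)
        return arp_acl[key] == arp["sender_mac"]
    entry = snooping_bindings.get(key)
    return entry == (arp["sender_mac"], ingress_port)


trusted = {"Gi0/1"}
snooping = {(10, "10.0.0.50"): ("aabb.cc00.0c00", "Fa0/5")}
acl = {(10, "10.0.0.1"): "c200.1664.0000"}

legit = {"vlan": 10, "sender_ip": "10.0.0.50", "sender_mac": "aabb.cc00.0c00"}
spoof = {"vlan": 10, "sender_ip": "10.0.0.50", "sender_mac": "dead.beef.0001"}

print(arp_permitted(legit, "Fa0/5", trusted, snooping, acl))
print(arp_permitted(spoof, "Fa0/6", trusted, snooping, acl))
```

The spoofed reply claims 10.0.0.50 with the attacker's MAC, which doesn't match the binding learned for that IP, so it is dropped.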

Securing Trunk Links

The two attacks against trunks are:

Switch spoofing
Double-tagging aka VLAN hopping

Switch spoofing exploits the fact that Cisco switches run dynamic desirable or auto as the default switch port mode. An attacker connected to a port that has DTP enabled could initiate a trunk and get access to all vlans on the switch. It's quite simple to prevent this by disabling DTP on all relevant ports. This is done either by issuing switchport mode access or switchport nonegotiate. Note that switchport mode trunk by itself does not disable DTP!

Double-tagging exploits the untagged native vlan on 802.1Q trunks with the result that an attacker is able to send frames to a different vlan than what she currently has access to. The attack targets the fact that if a trunk receives a frame that's tagged with the vlan-id of the native vlan, the tag is stripped before being sent across the trunk. Naturally, this requires that the access vlan of the attacker is the same as the native vlan of the trunk; otherwise, the first tag won't be stripped.

The frame has a second vlan tag inside the one belonging to the native vlan, and when it is received on the other side, that switch will interpret the frame as belonging to whatever vlan the attacker has chosen.
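The tag handling can be traced with a small Python sketch. It models a frame as a list of VLAN tags (outermost first) and assumes the attacker's access port accepts a frame tagged with its own access VLAN; the function names are made up and the model ignores everything except the tags.

```python
def access_port_ingress(tags, access_vlan):
    """Attacker-facing access port on the first switch.

    Assumption: an untagged frame, or one whose outer tag equals the
    access VLAN, is accepted and classified into the access VLAN.
    Returns (vlan, remaining_tags) or None if the frame is dropped.
    """
    if not tags:
        return access_vlan, []
    if tags[0] == access_vlan:
        return access_vlan, tags[1:]
    return None


def trunk_egress(vlan, inner_tags, native_vlan):
    """Frames in the native VLAN leave the trunk without a tag added."""
    if vlan == native_vlan:
        return inner_tags            # the inner tag is now the outermost one
    return [vlan] + inner_tags


def trunk_ingress(tags, native_vlan):
    """The second switch trusts whatever outer tag it sees."""
    return tags[0] if tags else native_vlan


def hop(tags, access_vlan, native_vlan):
    """VLAN the frame ends up in on the second switch (None if dropped)."""
    classified = access_port_ingress(tags, access_vlan)
    if classified is None:
        return None
    vlan, inner = classified
    return trunk_ingress(trunk_egress(vlan, inner, native_vlan), native_vlan)


# Attacker sits in VLAN 1, which is also the native VLAN: lands in VLAN 20.
print(hop([1, 20], access_vlan=1, native_vlan=1))
# Attacker sits in VLAN 10, native VLAN is 1: the frame stays in VLAN 10.
print(hop([10, 20], access_vlan=10, native_vlan=1))
```

The second call shows why the attack depends on the attacker's access vlan matching the native vlan: otherwise the outer tag is put back on the trunk and the inner tag is never looked at.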

There are a few different ways of preventing this attack:

Not using the native vlan as a user access vlan.
Pruning the native vlan off trunks
Making the native vlan carry an 802.1Q tag with vlan dot1q tag native

Identity Based Networking Services

The basic premise here is to manage network access based on user identity rather than on some function of the device itself, e.g. the mac or ip address. To accomplish this there are a few moving parts:

802.1X (yes, big X)
EAP
RADIUS

This is a pretty big topic and only a very brief overview is within the scope of the CCNP Switch exam.

802.1X has three primary roles:

Supplicant: This is simply an 802.1X compliant device such as an IP phone, laptop or workstation. This device sends an authentication request to the authenticator.

Authenticator: This is a device that enforces physical access control to the network based on the status of the supplicant. It can be a switch or access point. It relays information between the supplicant and the authentication server.

Authentication server: This is the database policy software, which must support the RADIUS protocol. It performs authentication of the supplicant.

802.1X is basically the framework that allows EAP and RADIUS to do their thing. So what is EAP?

EAP (Extensible Authentication Protocol) defines the frames sent from the supplicant to the authenticator when the supplicant wants to access or disconnect from the network. The authenticator then translates the contents of these frames into the RADIUS format and sends them to the authentication server. If the RADIUS server decides that the information provided by the supplicant through the EAP frames is legitimate, the authenticator provides access to the network. If not, no access, or limited access is provided.

What probably is within the scope of this exam is enabling certain features on the switch (authenticator). Specifically, these would be enabling “aaa new-model”, configuring the location of the RADIUS server(s), enabling dot1x on the switch and on the interfaces.

Private VLANs

This feature allows the segmentation of a broadcast domain while still using the same ip subnet for all hosts. It's typically considered a service provider feature because in that environment you would want to isolate each customer at L2, but using a separate VLAN for each customer would waste IP space, or possibly require renumbering if the customer needs more addresses than expected. You also need to use far fewer vlan-ids if you utilize pvlans.

Terminology:

Primary VLAN: A normal VLAN that acts as a container for the isolated and community VLANs.
Isolated VLAN: Ports in the isolated VLAN are isolated from all other ports, and can only communicate with the promiscuous port. Only a single isolated vlan is required per primary vlan since there's isolation even within the isolated vlan.
Community VLAN: Ports in a community vlan can communicate with each other and with the promiscuous port, but not with ports in other communities or in the isolated vlan. Multiple community VLANs can be defined per primary VLAN.

The promiscuous port belongs to the primary VLAN and acts as a gateway between the secondary vlans, and to the rest of the network. Only a single promiscuous port can exist per PVLAN.

The result of a PVLAN configuration is that all ports, except those in the same community, are isolated at L2, and must use the promiscuous port to reach other ports contained in the same primary VLAN. The fact that the promiscuous port must be used to reach other hosts enables you to use that interface as a choke point, restricting access with ACLs, etc.

Configuration of PVLANs is explicitly mentioned on the exam blueprint.

Port ACL

This is an ACL applied inbound on an L2 port. While the port it's applied to is a layer 2 port, it can filter both on MAC address and IP. One IP ACL and one MAC ACL per interface can be applied. Supported ACL types are standard IP, extended IP and extended MAC. A PACL is similar to a RACL (router acl), except that the RACL is applied to a L3 interface.

VLAN ACL

This is a means of applying access control to traffic within a VLAN. It has no sense of direction like other ACLs, and it's globally enabled on a per-vlan basis. This stands in contrast to other ACLs that are applied to specific interfaces. The literature compares the VACL to the Route Map because configuration involves first matching certain packets in an ACL, and then creating a VLAN access-map where that ACL is referenced. If traffic matches the ACL, you choose to either drop or forward the traffic.

Interaction between VACLs and other ACLs deserves some attention. VACLs and RACLs can be used at the same time, and are processed in a particular order. Scenario: you send traffic from VLAN 10 to VLAN 20 and there are VACLs applied to both of these VLANs in addition to inbound and outbound RACLs on the router interfaces that will route the traffic between them. In that case, the order of operations is:

The VACL for VLAN 10.
The inbound RACL on the VLAN 10 interface.
The outbound RACL on the VLAN 20 interface.
The VACL for VLAN 20.

Additionally, on the 4500 and 6500 platforms, you can adjust how VACLs and PACLs interact with the access-group mode interface command. If a PACL is configured on an interface and there's also a VACL configured on the associated VLAN, you could either issue 'prefer port' to only apply the PACL and ignore the VACL, or 'prefer vlan' to do the opposite. The 'merge' keyword would merge the PACL and VACL.

Storm Control

This feature monitors inbound traffic on an interface and can take various actions based on a 1 second average. You can base the action on the percentage of bandwidth used or on the packets per second rate. The switch is able to distinguish between unicast, multicast and broadcast traffic. You can kind of see where this is going: Storm Control can provide protection against broadcast storms by disabling the interface (or sending an SNMP trap) when broadcast traffic reaches a certain unrealistic threshold in relation to the port's capacity.

For example, this configuration would shut down the port if broadcast traffic uses 20% of the available bandwidth:

switch(config-if)#storm-control action shutdown
switch(config-if)#storm-control broadcast level 20

Shutdown errdisables the port and you can choose to either manually re-enable the port, or have it be reinitialized after some time interval. This would be configured under the errdisable command.

Protected Ports

Protected ports are ports that cannot communicate with other protected ports at layer 2. In a sense, it's a less feature rich implementation of private VLANs available even on lower end switches. Protected ports can only communicate with each other through a layer 3 device, with the exception of certain control plane data that still can be bridged. Protected and non-protected ports communicate normally.

Port Blocking

Normal switch behavior when processing an incoming frame is to look at the CAM table, and if the destination address isn't found, flood the frame out all other interfaces in that broadcast domain. Sometimes, that behavior is undesirable, and the port blocking feature lets you disable the forwarding of these unknown unicast/multicast frames.

switch(config-if)#switchport block multicast
switch(config-if)#switchport block unicast

//

Danielh22185 · January 2014

I haven't kept up much with your thread but I am very impressed! I like the idea of sharing the information you have learned. Like they say... you don't really know something unless you can speak intelligently about it!

Good luck to ya!

fredrikjj · January 2014

This post covers multilayer switching based on the 7th chapter in the CCNP Switch Simplified textbook and some Cisco documents on CEF.

Inter VLAN Routing

Some kind of IP routing is required for traffic to pass from one broadcast domain to another. There are three methods that you can use to accomplish this:

Using physical interfaces
Using router subinterfaces
Using switched virtual interfaces

Using a single physical router interface per vlan is simple to implement, but obviously isn't very practical or scalable. I imagine that it was something people did a very long time ago before the features we are used to today existed.

Using router subinterfaces is the famous “router on a stick” where you configure subinterfaces on the router interface facing the switch, and configure a trunk port on the switch connecting to the router. This implementation overcomes the scalability issues with using a single physical interface for every VLAN, but introduces another limitation in that the bottleneck now becomes the bandwidth of that single physical link.

Switched virtual interfaces, or SVIs, move the inter-vlan routing inside the switch itself. An SVI is simply a logical layer 3 interface created on the switch that in many respects operates like a physical interface would. When configured with an IP address, and with IP routing enabled, the switch is able to move traffic from one SVI to another within the same box. Since you are no longer limited by that one oversubscribed uplink to an external router, this is a powerful upgrade from router-on-a-stick in an environment where heavy inter-vlan traffic is required between hosts located within the same physical device.

Multilayer Switching Overview

Multilayer switching (MLS) is the idea that a switch can make data forwarding decisions based on not only the Ethernet header, but also the IP header and TCP port numbers, without any significant performance loss. A pure layer 2 device, or bridge, doesn't have those capabilities.

For MLS to make sense you probably need a basic understanding of the terms control plane and data plane. The control plane is essentially the part of the device that exchanges topology information with other devices through the various protocols we're familiar with, like OSPF or CDP. The most obvious example of 'control plane' would be routing protocol updates and the routing table. The data plane handles the actual forwarding of user traffic, typically based on information derived from the control plane.

For example, the routing table contains control plane information saying that IP packets with a certain destination address should be sent out a certain interface. However, when a packet enters the switch, the routing table (or the 'RIB') itself isn't used as a basis for making forwarding decisions in a multilayer switch. Instead, that information has already been distributed to a data plane function called the FIB, or forwarding information base. By using the FIB, the device can switch the packet between ingress and egress interfaces using specialized ASICs without interrupting other processes.

CEF

The current method for doing this kind of forwarding is called Cisco Express Forwarding. Before CEF, other methods were used.

The slowest of these is called Process Switching. Process refers to the fact that moving the packet from one interface to another and rewriting/modifying the L2/L3 headers is handled by a normal process running on the operating system. Other processes are not interrupted to handle an incoming packet, and the switching of a packet is scheduled alongside other tasks that the router is interested in executing. The drawback of this is throughput, but it offers flexibility in that packets that can't be handled in hardware (i.e. by dedicated ASICs) for whatever reason often can be process switched “in software”. It also varies from platform to platform which features are supported by the faster switching paths, and which must be process switched.

Depending on the feature, and what throughput needs you have, it is something that you would have to verify before buying a particular piece of kit. For example, a low end router might be able to run code that supports something like Policy Based Routing, but actually forwarding packets through that feature would be very slow due to the switching path it would have to take. On the other hand, PBR on a high-end switch would be forwarded through the more advanced ASICs built into that platform and thus offer orders of magnitude higher throughput.

// A side note: I know that you can get extremely high throughput using a normal x86 server acting as a router as long as it has the necessary horsepower. The question then becomes if it’s just a load of crap that you need dedicated hardware to get high performance; maybe the issue is that the CPUs in the switches are crap? //

The other switching methods fall under what's called “Interrupt Context Switching”. Its main feature is that switching a packet can interrupt any process currently being run by the IOS, i.e. it doesn't have to wait for its turn. Additionally, the information required to move the packet from ingress to egress interface is stored in some kind of route cache instead of being collected from the RIB, ARP cache, and other such tables for every packet.

In older implementations, the route cache entry for a particular session (aka flow) had to be built as a packet was process switched. Subsequent packets could then use the higher performance switching path as long as they maintained the same src-ip, dst-ip, port numbers, etc. The downside to this was that since synchronization between the route cache and the control plane was based on receiving packets, the route cache had to be gradually aged out. If not, it would eventually have reached its maximum capacity, possibly with a bunch of unused entries. The closer to maximum capacity the route cache got, the faster it aged out entries, and the more packets had to be process switched to rebuild it.

CEF removed that problem by inserting relevant information into its version of a route cache independently of receiving traffic. The two data structures CEF uses to store information are the forwarding information base (FIB) and the adjacency table. The FIB contains reachability information; a packet with destination X must exit interface Y. The adjacency table contains information required to forward the packet, like what MAC addresses the new header should contain.

The FIB is derived from the routing table, and is a table of all prefixes and their associated next hops. Changes to the routing table are synchronized with the FIB on an ongoing basis. Note that an unstable routing table can lead to packets being process switched because the FIB isn't properly synchronized.

The adjacency table contains the source and destination MAC addresses belonging to the egress interfaces and the next hops from the FIB. It's derived from the ARP cache. Remember, when an IP packet is switched between L2 domains, it needs a new source and destination MAC address (at least on Ethernet). With this information being “precalculated” in the adjacency table, the switch doesn't have to look at the arp cache to determine the input for that rewrite and can just automagically slap on the new header with lightning speed.
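To tie the two tables together, here is a small Python sketch of a lookup: longest prefix match in the FIB, then a MAC lookup in the adjacency table. The prefixes, the second next hop and its MAC are invented for illustration, and a real FIB uses a much more efficient data structure than scanning a dict.

```python
import ipaddress

# FIB: prefix -> (next hop, egress interface). "attached" means the
# destination itself is the next hop on a directly connected subnet.
fib = {
    ipaddress.ip_network("10.0.0.0/24"): ("attached", "Fa0/0"),
    ipaddress.ip_network("192.168.0.0/16"): ("10.0.0.2", "Fa0/0"),
    ipaddress.ip_network("192.168.5.0/24"): ("10.0.1.2", "Fa0/1"),
}

# Adjacency table: (interface, next hop) -> destination MAC for the rewrite.
adjacency = {
    ("Fa0/0", "10.0.0.2"): "c201.1664.0000",
    ("Fa0/1", "10.0.1.2"): "c202.1664.0000",  # hypothetical neighbor
}


def lookup(dst):
    """Return (interface, next_hop, dst_mac), or None if no prefix matches.

    dst_mac is None when the next hop has no adjacency resolved yet.
    """
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    next_hop, interface = fib[best]
    if next_hop == "attached":
        next_hop = dst
    return interface, next_hop, adjacency.get((interface, next_hop))


print(lookup("192.168.5.9"))   # matches the /24, not the /16
print(lookup("192.168.9.9"))   # only the /16 matches
print(lookup("10.0.0.35"))     # connected subnet, MAC not known yet
```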

Basic CEF verification

Verification of the FIB:

switch#show ip cef [options]

The output of the show ip cef command will look something like this:

switch# show ip cef
Prefix              Next Hop            Interface
1.2.3.4/24          attached            eth0/0
                    drop
                    receive
                    5.6.7.8
                    wildcard

The next hop attribute can assume several different values.

attached

This is an entry that represents a locally connected IP subnet.

drop

Any packet that matches the destination IP address or IP subnet will be dropped.

receive

Packets that match this prefix will be sent to the processor.

resolved

An IP address that has been resolved as the next hop for a particular destination

wildcard

This entry matches any packets that do not match any other entry, and traffic is dropped.

Using various options in the show ip cef command can display more detailed output.

Verification of the adjacency table:

switch#show adjacency [options]

Typical output would be:

switch#show adjacency detail
Protocol Interface                 Address
IP       FastEthernet0/1           150.254.0.6(7)
                                   0 packets, 0 bytes
                                   001319860A200000CCEA7F3A00800
                                   ARP        03:33:44
                                   Epoch: 0

The long string of characters is src and dst mac address (and ethertype) for the new header when a packet is switched out fa0/1.

Glean

The adjacency table can contain other information besides just the MAC and IP next-hop pairing, such as the “glean” entries. A glean entry is used when the router is directly connected to a network and doesn't have an ARP entry for an IP address on that network. For example, let's say that a /24 is used on a link between two routers. Two of the IP addresses in the subnet are in use and their MACs are in the ARP cache. Instead of individually installing the remaining addresses in the subnet in the adjacency table, the switch puts the entire network in the glean state. I think of it as a single entry for the “potential” future full adjacencies.

R1 (10.0.0.1/24) is directly connected to R2 (10.0.0.2/24). Notice how there's a normal adjacency for the 10.0.0.2 address because R1 has an entry for that address in its ARP cache.

R1#show arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.0.0.2 14 c201.1664.0000 ARPA FastEthernet0/0
Internet 10.0.0.1 - c200.1664.0000 ARPA FastEthernet0/0

R1#show ip cef detail
10.0.0.0/24, version 5, epoch 0, attached, connected
0 packets, 0 bytes
via FastEthernet0/0, 0 dependencies
valid glean adjacency
10.0.0.2/32, version 6, epoch 0, connected, cached adjacency 10.0.0.2
0 packets, 0 bytes
via 10.0.0.2, FastEthernet0/0, 0 dependencies
next hop 10.0.0.2, FastEthernet0/0
valid cached adjacency

Now, if CEF receives a packet destined for say 10.0.0.35, it will reach this glean adjacency which will signal the route processor to send an ARP request for that address.
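
A rough way to picture that lookup is the toy Python model below. It is only a sketch and not how IOS implements CEF: the table mirrors the two entries from the R1 output, and the function name and the returned strings are made up for illustration. The lookup picks the longest matching prefix, so 10.0.0.2 hits its own /32 while any other address in the subnet lands on the /24 glean entry.

```python
import ipaddress

# Toy FIB modelled on the R1 output above: prefix -> adjacency type.
FIB = {
    ipaddress.ip_network("10.0.0.0/24"): "glean",
    ipaddress.ip_network("10.0.0.2/32"): "cached adjacency",
}

def lookup(dst):
    """Return (prefix, adjacency) for the longest prefix matching dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), FIB[best]

print(lookup("10.0.0.2"))   # host route with a complete adjacency
print(lookup("10.0.0.35"))  # falls through to the /24 glean entry
```

In the real thing, the glean result is what causes the ARP request to be sent; once the reply arrives, a new /32 entry with a complete adjacency would appear next to the 10.0.0.2 one.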

This is all dependent upon which data link layer you are using. Everything discussed here applies to Ethernet, but glean adjacencies exist with frame relay as well because it's a multiaccess technology. PPP for example doesn't need gleans because only a single device can be reached out an interface, and because of that, it doesn't really need any IP to L2 address resolution.

Punt

Punting is the CEF term for being unable to CEF switch a packet. If a packet is punted it is sent to be process switched. The reasons for this could be, for example, a packet requiring fragmentation, an unsupported encapsulation, or simply that the switch itself is the intended recipient of the packet. You can check the statistics on punted packets with #show cef not-cef-switched. This is significant because too many process-switched packets could overwhelm a router. To protect yourself against this you could configure rate limiters on various types of traffic with #mls rate-limit unicast

CEF Polarization

This is a problem that can occur with CEF where certain links are underutilized in a topology with redundant paths. CEF has a hashing algorithm that, by default, uses the src and dst IP as input for deciding between equal cost links. With the same input IPs, the link selection will always be the same. It's comparable to what happens if you select an inappropriate load balancing method for EtherChannel. Certain methods exist to alleviate this problem. You can introduce port numbers as an additional input to the hashing algorithm, and there's something called the universal ID that you can configure to make one router's hash output different from another one's, though I don't see how that helps link selection for an individual router. Either way, it's something to keep in mind.
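
One way to see where a per-router ID could matter is to look at two routers in a row instead of one. The Python sketch below is a toy model under loud assumptions: the hash is SHA-256, not Cisco's algorithm, and the topology (router A with two uplinks, one of which leads to router B with two uplinks of its own) is invented. If B uses exactly the same hash as A, every flow that A sent towards B produces the same result on B, so B only ever uses one of its links. Mixing a router-specific value into the hash breaks that correlation.

```python
import hashlib

def pick_link(src, dst, n_links, router_id=""):
    """Toy stand-in for the CEF hash: same inputs always give the same link."""
    digest = hashlib.sha256(f"{router_id}|{src}|{dst}".encode()).digest()
    return digest[0] % n_links

flows = [(f"10.1.0.{i}", f"10.2.0.{i}") for i in range(1, 201)]

# Tier 1: router A spreads the flows over its two uplinks; link 0 leads to B.
to_b = [f for f in flows if pick_link(*f, 2) == 0]

# Tier 2 without a unique ID: B repeats A's decision for every flow it receives.
same_hash = {pick_link(*f, 2) for f in to_b}

# Tier 2 with a per-router ID mixed in: B's choice no longer follows A's.
with_id = {pick_link(*f, 2, router_id="B") for f in to_b}

print(same_hash)  # only link 0 is ever used
print(with_id)
```

So the ID doesn't change how a single router balances on its own; it only matters once the same flows are hashed again further down the path.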

DevilWAH · January 2014

Great posts

maybe Techexams needs to give you a blog

fredrikjj · January 2014

CCNP Switch High Availability – Chapter 8 in CCNP Switch Simplified

First Hop Redundancy Protocols – HSRP, VRRP, GLBP
ICMP Router Discovery Protocol
Supervisor Redundancy
Stacking
Power Redundancy
Non-Stop Forwarding

First Hop Redundancy

The problem that this is trying to solve is that the default gateway IP address of an end host is not easily changed. If the interface that owns that IP address goes down, the host can no longer communicate outside its own network. Having two or more devices share the default gateway IP address would remove this single point of failure.

If you want a group of devices to be able to receive traffic destined for the same IP address, how would you go about that? You would need to 'hack' the ARP process in some way to make only one of them respond to ARP requests for that IP. The first hop redundancy protocols VRRP and HSRP accomplish this by sharing a virtual IP and MAC address pair among multiple devices. Normally, it wouldn't really work* to configure the same IP and MAC address on two different interfaces on the same LAN because they would both respond to ARP requests and the switches would have to constantly relearn the location of the MAC address as traffic was received from both devices. The trick with VRRP/HSRP is that only one of the devices in a group is active and responds to ARP requests while the others assume a passive role until the primary fails.

*It works in a lab environment, if your idea of working is to just get basic connectivity going. As long as the address table of the switch and the ARP caches are stable, one of the devices will just not receive traffic destined for the shared IP. However, that will probably break down pretty quickly when the devices are no longer just passively sitting there on an isolated LAN.

VRRP and HSRP are very similar.

HSRP

HSRP Overview

Cisco proprietary.
Allows the sharing of a virtual gateway address among several physical routers.
The primary forwards all packets destined to the virtual gateway and if it fails, the secondary takes over.

HSRP Version 1

The default version.
Group number can be 0-255.
Communicates by sending messages to 224.0.0.2 using UDP port 1985
The version field is 0.

HSRP Version 2

Uses 224.0.0.102 on UDP port 1985.
Not the same packet format as v1.
Uses a TLV format.
Not interoperable with v1.

HSRP v1 & v2 Comparison

Group numbers: v1 0-255, v2 0-4095, but this doesn't mean that you can run this many instances.
Improved management and troubleshooting in v2 by including an identifier field in the hello messages, which holds the router interface MAC to identify the source of HSRP active hello messages. In v1 these messages have the virtual MAC as source, which makes it impossible to identify the sender.
Virtual MAC: v1 0000.0C07.ACxx where xx is the group number in hex, v2 0000.0C9F.F000 to .FFFF
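
The virtual MAC rule is simple enough to write down as a small Python helper. This is just a sketch for checking the mapping by hand; the function name is my own, and the output uses lower-case Cisco dotted notation.

```python
def hsrp_virtual_mac(group, version=1):
    """Build the HSRP virtual MAC for a group number in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        # The last byte is the group number in hex.
        return f"0000.0c07.ac{group:02x}"
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        # The last three hex digits are the group number.
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("unknown HSRP version")

print(hsrp_virtual_mac(10))               # 0000.0c07.ac0a
print(hsrp_virtual_mac(10, version=2))    # 0000.0c9f.f00a
print(hsrp_virtual_mac(4095, version=2))  # 0000.0c9f.ffff
```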

HSRP Primary Gateway Election

By adjusting the priority.
Priority values: 1-255 where a higher number is preferred in the election of primary.
Default value: 100
Priority value is carried in the HSRP packet, as is the current state of the router (active or standby)

HSRP Messages

Hello: Exchanged to tell the other router the HSRP state and priority of the local router. Also includes group ID, HSRP timers, version, and authentication info.
Coup: Sent when the current standby router wants to assume the role of primary.
Resign: Sent when the active gateway is about to shut down or when a gateway that has a higher priority sends a Hello or Coup message.

HSRP Preemption

If a standby is configured with a higher priority than the primary, by default, the primary still remains the primary until it fails. That is, the standby router will not actively take over that role. In techno speak that means that HSRP doesn't run 'preemption'.
Preemption can be configured
You can configure a delay telling the router to wait a certain time before preempting.

HSRP States

Disabled & Init: Gateway not yet ready or unable to participate – possibly because the associated interface isn't up.
Listen: The router knows the virtual IP address but is neither the active nor the standby router. It listens for hello messages from those routers.
Speak: The router sends periodic hello messages and actively participates in the election of the active and standby routers.
Standby: The router is the candidate to become the next active router and sends periodic hellos. If it doesn't hear a hello from the active router within the hold time (10 seconds by default), it assumes that the active router is down and takes over.
Active: A router in this state is acting as the active forwarder for the group.

HSRP Addressing

Sometimes you don't want to use a single virtual MAC address for each HSRP group. For example, because you would have to modify Port Security as new groups get added. In that case you could use the bia address of the physical interface for all groups (bia = burned in address). You could also manually configure an address for each group if that is desired.

HSRP Authentication

By default, HSRP messages are sent with the plain text password cisco.
This provides very little security; MD5 is recommended.
MD5 type authentication can be configured directly in the HSRP configuration or via a key chain.
If using authentication, a gateway will reject an HSRP packet if any of the following is true:
The authentication scheme differs on the router and in the incoming packets.
The MD5 digest differs on the router and in the incoming packet.
The text authentication strings differ on the router and in the incoming packet.

HSRP Interface Tracking

The idea is to track the status of an interface on the primary router, and if that fails, decrease the priority of that gateway.
This feature doesn't make sense unless you also configure preemption because otherwise the standby wouldn't be able to take over.
You could use simple link up/down status, or use Enhanced Object Tracking with IP SLAs and all that jazz.
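
To convince myself why tracking needs preemption, here is a minimal Python sketch of the decision. It is a simplification under stated assumptions: the router names, priorities and decrement are made up, and real HSRP reaches this outcome through hello, coup and resign messages rather than one function call.

```python
def active_after_failure(priorities, active, tracked_down, decrement, preempt):
    """Return which router is active after a tracked interface on one router fails.

    priorities   : dict of router name -> configured priority
    active       : name of the currently active router
    tracked_down : router whose tracked interface went down
    decrement    : how much priority that router loses
    preempt      : True if the other routers are allowed to preempt
    """
    effective = dict(priorities)
    effective[tracked_down] -= decrement
    best = max(effective, key=effective.get)
    # Without preemption a better router does not displace a live active router.
    if best != active and not preempt:
        return active
    return best

prios = {"R1": 110, "R2": 100}
print(active_after_failure(prios, "R1", "R1", 20, preempt=False))  # R1
print(active_after_failure(prios, "R1", "R1", 20, preempt=True))   # R2
```

R1's priority drops to 90 in both calls, but only the second one lets R2 act on it.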

HSRP Load Balancing

If you want to load balance with HSRP you would have to configure different active and standby routers on a per vlan basis.
Technically, you could configure two different virtual gateways within the same subnet and load balance by assigning different default gateways to different hosts. The configuration guide calls this Multiple HSRP (MHSRP).
You should align the STP root bridge with the HSRP active router for each VLAN to get optimal traffic paths.
Creating many HSRP groups will increase CPU utilization and control plane traffic due to multiple HSRP message exchanges.
You can configure HSRP slave groups that follow a master group and don't participate in HSRP elections.
Slave groups follow the master and therefore don't need to send periodic hellos; they only need to periodically refresh their virtual MAC in the switches.

HSRP Configuration

Most, if not all, configuration is done with switch(config-if)#standby [group-number] [command]
The thing to watch out for is certain commands that are configured for all groups, like the use of the bia mac.

HSRP Debugging

HSRP has its own debug commands for errors, events, packets.

VRRP

VRRP Overview

Open standard
Advertisements to 224.0.0.18 using IP protocol 112.
Virtual MAC of 0000.5E00.01xx where xx is the group number in hex.
Unlike HSRP, VRRP doesn't support using a single burned in address for all groups, or statically configuring a MAC address for a group.
One gateway is elected master and the others backup

VRRP Groups

Supports up to 255 groups, but the actual number depends on:
Router processing capability.
Router memory capability.
Router interface support of multiple MAC addresses.

VRRP Master Router Election

Uses priority values 1-254.
Default value 100.
Higher value more preferred.
If same priority, higher IP becomes the master.
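
The election rules fit in a few lines of Python. This is only a sketch of the comparison itself, with invented router names and addresses; it ignores the advertisement exchange and the special priority of the router that owns the virtual IP.

```python
import ipaddress

def elect_master(routers):
    """routers: list of (name, priority, interface_ip).

    Highest priority wins; on a tie, the highest interface IP wins.
    """
    return max(routers, key=lambda r: (r[1], ipaddress.ip_address(r[2])))[0]

group = [
    ("R1", 100, "10.0.0.9"),
    ("R2", 100, "10.0.0.10"),
    ("R3", 90, "10.0.0.200"),
]
print(elect_master(group))  # R2
```

The addresses are compared as numbers via ipaddress, not as strings, because a plain string comparison would rank "10.0.0.9" above "10.0.0.10".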

VRRP Preemption

Preemption is on by default.
Can be disabled.

VRRP Load Balancing

Load balancing on a per-vlan basis is possible by configuring different masters for different vlans.
Make sure STP root is aligned.

VRRP Versions

Version 2 is running on Cisco IOS.
Not possible to switch to version 1.
There was a draft for version 3 when this textbook was written.

VRRP Advertisements

Master virtual router sends advertisements to other VRRP routers in the same group.
Communicates priority and state of the master.
Encapsulated in IP packets and sent to 224.0.0.18.
Sent once every second by default.
Backup routers, optionally, can learn the advertisement interval from the master.

VRRP Authentication

Not enabled by default.
Plain text or MD5.
MD5 can be done with or without key chain.

VRRP Configuration

switch(config-if)#vrrp [group-number] [command]
Verification: switch#show vrrp [options]
Debugging: switch#debug vrrp [options]

An advantage to VRRP is that you can reuse one of the member interfaces' IP address as the virtual IP for the group. Other than that, HSRP and VRRP seem to be pretty much identical to the point where it probably doesn't matter which one you use in most cases. Configuration and verification are very similar as well.

GLBP

A problem with HSRP and VRRP is that you can't load balance within a VLAN without assigning different default gateways to different hosts. GLBP is an attempt to solve this problem by using multiple virtual MAC addresses; one for each gateway. As hosts send ARP requests, GLBP assigns different virtual MACs to different hosts.

GLBP Overview

Gateway Load Balancing Protocol is an attempt to solve the load balancing problem in HSRP/VRRP
It can have multiple active routers for the same VLAN using the same default gateway address.
Cisco proprietary.
Uses 224.0.0.102 UDP port 3222.
Elects one active gateway based on the highest priority.
Other routers become backups.

The active virtual gateway (AVG) answers all ARP requests for the virtual router IP address.
The AVG assigns an individual virtual mac address to each member of the GLBP group.
When an ARP request comes in, the AVG responds with the MAC addresses assigned to the other group members according to the configured load balancing scheme (round robin default).
When hosts send traffic to the virtual gateway IP address, they encapsulate the frames with different destination MAC addresses which achieves a form of load balancing.

GLBP Virtual MAC Assignment

A GLBP group can have up to four virtual MAC addresses/virtual forwarders.
Group members request a virtual MAC after discovering the AVG through hello messages and become Active Virtual Forwarders (AVF).
Gateways are assigned MACs sequentially by the AVG.

GLBP Redundancy

A single gateway is elected as the AVG.
One other gateway is the standby.
The rest are placed in listening.
When an AVF fails, the AVG temporarily assigns its MAC to another AVF.
Eventually, this MAC will be aged out, based on two timers: redirect and timeout.
Redirect timer: This is the time which the AVG continues to use the old virtual forwarder MAC in ARP replies. When this timer expires, the AVG stops using the now failed AVF MAC address, but the AVF that got it assigned will continue to forward packets that are sent to that MAC address.
Timeout timer: When this timer expires, the AVF stops forwarding traffic for that MAC address and clients still using it must refresh their ARP cache.
GLBP uses the Hello messages to communicate the current state of these two timers.
Default redirect: 10 minutes. Default timeout: 4 hours.
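
The two timers are easier to keep apart with a small timeline. The Python sketch below only encodes the defaults listed above and assumes both timers start counting when the AVF fails; the function name and the return format are mine.

```python
REDIRECT = 10 * 60       # default redirect timer, in seconds
TIMEOUT = 4 * 60 * 60    # default timeout timer, in seconds

def failed_vmac_state(seconds_since_failure, redirect=REDIRECT, timeout=TIMEOUT):
    """Return (still handed out in ARP replies?, still forwarded by the stand-in AVF?)."""
    in_arp = seconds_since_failure < redirect
    forwarded = seconds_since_failure < timeout
    return in_arp, forwarded

print(failed_vmac_state(60))        # (True, True)
print(failed_vmac_state(3600))      # (False, True)
print(failed_vmac_state(5 * 3600))  # (False, False)
```

The middle case is the interesting one: new hosts no longer learn the old MAC, but hosts that already cached it keep working until the timeout expires.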

GLBP Preemption

Preemption for the AVG role is disabled by default and can be enabled.
Preemption for the AVF (forwarder) role is enabled by default and can be disabled.
The default forwarder preemption delay is 30 seconds, i.e. how long a better candidate waits before taking over as AVF.

GLBP Weighting

Used to determine the forwarding ratio of the gateways.
Default weight: 100.
Weights can be dynamically adjusted with object tracking. For example, if a particular interface fails, weight for a gateway can be decreased.
A threshold can be set to disable forwarding when the weighting falls below a certain value, and then automatically reenable it when it rises again.

GLBP Load Sharing

Host dependent: A host always receives the same MAC in reply to ARP requests for the virtual router address.
Round-robin: Distributes the virtual MACs evenly across all hosts that request. This is the default.
Weighted: Uses the weighting values to determine the proportion of traffic sent to a particular AVF. A higher weight means more frequent ARP replies using that AVF's MAC.
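
The three methods can be compared with a toy AVG in Python. Treat this as a sketch under assumptions: the class, the "vmac-N" labels and the CRC-based host hash are invented for illustration, and the weighted mode hands out answers in blocks rather than interleaving them the way a real implementation might.

```python
import itertools
import zlib

class Avg:
    """Toy AVG that answers ARP requests for the virtual IP with an AVF's virtual MAC."""

    def __init__(self, forwarders, method="round-robin"):
        # forwarders: dict of virtual MAC label -> weight
        self.forwarders = forwarders
        self.method = method
        macs = list(forwarders)
        if method == "weighted":
            # Repeat each MAC in proportion to its weight.
            macs = [m for m in macs for _ in range(forwarders[m])]
        self._cycle = itertools.cycle(macs)

    def arp_reply(self, host_mac):
        if self.method == "host-dependent":
            macs = sorted(self.forwarders)
            # Stable hash of the host MAC: the same host always gets the same answer.
            return macs[zlib.crc32(host_mac.encode()) % len(macs)]
        return next(self._cycle)

rr = Avg({"vmac-1": 100, "vmac-2": 100})
print([rr.arp_reply(h) for h in ("h1", "h2", "h3", "h4")])

weighted = Avg({"vmac-1": 2, "vmac-2": 1}, method="weighted")
print([weighted.arp_reply("h") for _ in range(6)])
```

Note that all three methods balance hosts, not traffic: one busy host still sends everything to whichever AVF it was given.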

GLBP Client Cache

Contains information about hosts that are using the GLBP group as default gateway.
Which hosts have been assigned to which AVF.
Enabled per group in interface configuration mode with sw(config-if)#glbp [group-number] client-cache maximum [number].

GLBP Authentication

Disabled by default.
Supports plain text and MD5 with or without key chain.

Configuring GLBP

switch(config-if)#glbp [group-number] [commands]

ICMP Router Discovery Protocol

An alternative method to DHCP (or static config) for assigning a gateway to a host.
Uses router advertisements and router solicitation ICMP messages to assign a gateway to a host.
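Looks like a backport from IPv6, but it is the other way around: IRDP (RFC 1256) is older, and IPv6 router advertisements work along the same lines.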
Routers are able to work together and redirect hosts to better gateways with ICMP redirect messages.
RAs are sent to 255.255.255.255, but can be configured to use multicast 224.0.0.1 instead.
IRDP is enabled on a per interface basis.

Supervisor Redundancy

The supervisor is the part of the switch that runs software and handles the control plane functions. In the bigger chassis based switches, you can buy a second supervisor to make sure that you are not dead in the water if the primary fails.

Cat4500/6500 supports 2 supervisor modules.
When booting, one becomes the active and the other the standby
The standby assumes the active role when the primary fails or crashes, the primary reboots, the admin forces a manual failover or the primary is physically removed.

IOS supports 3 different modes for supervisor redundancy, from older to newer:
Route Processor Redundancy (RPR)
Route Processor Redundancy Plus (RPR+)
Stateful Switchover (SSO)

RPR:

The first supervisor to complete the boot process becomes the active one.
The standby is only partially booted and not all subsystems become operational.
Clock synchronization between the active and the standby occurs every 60 seconds and configuration is synchronized.
When the active fails the standby becomes operational, and all switching modules are reloaded and powered up again, etc. Because the standby isn't fully initialized a failover is disruptive and generally takes 2 to 4 minutes.

RPR+:

Failover is improved to 30-60 seconds.
The redundant supervisor is fully initialized and configured.
When the redundant one boots, it copies the configs from the active sup, and overrides its own configs.
During normal operation, when config changes occur, these are synchronized between active and standby.
Despite being fully initialized, the console on the standby is locked and you cannot enter commands.
When the active sup fails, the standby is able to take over without rebooting the other switch modules.
Traffic is disrupted until the standby takes over.
Static routes survive the failover, but the dynamic routing protocol information is flushed.
FIB, CAM, TCP sessions, etc, do not carry over to the new supervisor.
The two supervisors must be of the same model, version, memory, and run the same version of IOS, otherwise they will revert back to the RPR mode.

Stateful Switchover (SSO):

The preferred method.
SSO has one active and one standby supervisor.
The standby is fully booted and initialized.
Both supervisors must be running the same config so that the standby is always ready to assume control.
Unlike RPR+, SSO synchronizes state information like the FIB and adjacency tables between the supervisors.
All system control and routing protocol execution is transferred within 0 to 3 seconds.

Stacking

You can combine up to 9 3750s into a single unit with a 32 Gbps interconnect.
The stack is managed as a single unit from the master switch.
The master is responsible for updating the CAM and routing tables for the stack.
The master is typically elected within 20 seconds of the stack being initialized.
Any switch in the stack can become the master, but there is an order to the election: highest stack number priority, highest hardware and software priority, the switch with non-default config, longest system uptime, and lowest mac address.
A master election is held when: the stack is rebooted, the master is powered off, the master is removed from the stack, the master fails, when switches are added to the existing stack.
Master election doesn't affect data forwarding for layer 2 because the stack continues to operate with the tables received from the old master.
IP routing tables are flushed when a new master has been elected, but Non-Stop Forwarding can rapidly transition the layer 3 forwarding from the old master to the new one.

Power Redundancy

cat4500 and cat6500 power redundancy:

The power supplies must be identical.
Two modes: combined and redundant.

Combined: Both power supplies are used at the same time, and the total power usage of the chassis may exceed that of one PSU. If one PSU fails, the remaining one may not be able to power all currently running modules, in which case some are powered off.
Redundant: The switch will use both PSUs, but only at max 50% capacity each. When one fails, the remaining can run the switch using 100% of its capacity, if necessary.
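
The trade-off between the two modes comes down to simple arithmetic, sketched here in Python. The numbers are made up, and the combined budget of exactly twice one PSU is a simplification; as far as I know real chassis allow somewhat less than double.

```python
def power_check(load_watts, psu_watts, mode):
    """Return (budget accepted?, all modules stay up if one PSU fails?)."""
    if mode == "redundant":
        budget = psu_watts          # never plan for more than one PSU can carry
    elif mode == "combined":
        budget = 2 * psu_watts      # simplification: real chassis offer less than double
    else:
        raise ValueError("mode must be 'redundant' or 'combined'")
    fits = load_watts <= budget
    survives = fits and load_watts <= psu_watts
    return fits, survives

print(power_check(1500, 1000, "combined"))   # (True, False)
print(power_check(1500, 1000, "redundant"))  # (False, False)
print(power_check(800, 1000, "redundant"))   # (True, True)
```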

cat3750 power redundancy:

Unlike 4500 and 6500, it is supplied by an external unit: the Cisco Redundant Power System (RPS) 2300.
Each RPS can have two PSUs and can supply complete power supply redundancy for two attached 3750s.
Protects against internal power supply failure, failure of an AC circuit, etc.

Non-Stop Forwarding

Works in conjunction with Stateful Switchover to minimize downtime following a supervisor failover.
When a standby supervisor takes over, the FIB is maintained, but the RIB is not.
To ensure that the FIB is updated with new information as quickly as possible, the RIB must be rebuilt.
When NSF is enabled, routing protocols communicate with NSF enabled neighbors to quickly rebuild the RIB after a supervisor failover.
Supported by BGP, OSPF, EIGRP, IS-IS.
NSF is configured on a per routing protocol basis.

fredrikjj · January 2014

DevilWAH wrote: »

Great posts maybe Techexams needs to give you a blog

I'm just trying to keep myself motivated since I don't find the switch exam nearly as engaging as route was. I've found that I'm able to study longer hours if I'm writing something at the same time, and I really want to finish in 2 months instead of 3 like last time.
