Cat 6K w SUP720-3B and Policy Based Routing
cisco_trooper
Member Posts: 1,441 ■■■■□□□□□□
in CCNP
Ok. I have a Catalyst 6513 outfitted with a SUP720-3B. Monday night I re-routed a significant portion of the end-user traffic with policy-based routing, using a next hop of Y.Y.Y.Y to an HA pair of ASA 5520s, with the remaining traffic continuing to use the default route of X.X.X.X to an ASA 5510. Traffic from the ASA 5520s hits an HA pair of F5 BIG-IP LTMs and from there hits the internet over circuits B and C, 100M and 150M respectively. Traffic from the ASA 5510 goes straight to the internet over circuit A, a 100M circuit. Now, my understanding of this platform is that PBR is handled in hardware, but already I have had two instances of performance issues.
In one issue, a web application was performing poorly from a terminal server farm that was not included in the PBR. The traffic that was routed through PBR was performing great. Forcing all the traffic headed to the web application to the next hop of Y.Y.Y.Y resolved the issue. Very strange.
Another issue occurred this afternoon when the packet rate reached 380,000 pps. At the same time the pps reached its peak, the CPU hit 100% and stayed there for roughly five minutes. Now, keep in mind this is a SUP720 and this is NOT a lot of packets for this platform. I have never seen this device with CPU anywhere near this high before.
It doesn't make sense, since PBR on the SUP720 is handled in hardware, but the last major configuration change was the implementation of the policy-based routing and NOW I have issues.
Can someone smarter than me maybe shed some light on the matter?
Comments
-
dtlokee Member Posts: 2,378 ■■■■□□□□□□"Handled in hardware" might be a bit misleading, and may not be correct at all depending on the line cards. What line cards are you using, and what DFC modules do they have?
The only easy day was yesterday!
-
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Modules 3, 4, and 6 are being decommissioned. I can't believe they even had these in a Cat 6K with SUP720. What a waste of a SUP720....
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  3    48 48 port 10/100/1000mb EtherModule      WS-X6148-GE-TX     SAXXXXXXXXX
  4    48 48-port 10/100/1000 RJ45 EtherModule   WS-X6148A-GE-TX    SAXXXXXXXXX
  6    48 48-port 10/100/1000 RJ45 EtherModule   WS-X6148A-GE-45AF  SAXXXXXXXXX
  7     2 Supervisor Engine 720 (Active)         WS-SUP720-BASE     SAXXXXXXXXX
  9    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAXXXXXXXXX
 10    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAXXXXXXXXX
 11    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAXXXXXXXXX
 12    24 CEF720 24 port 1000mb SFP              WS-X6724-SFP       SAXXXXXXXXX
 13     4 CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE      SAXXXXXXXXX
Mod MAC addresses                      Hw    Fw           Sw           Status
--- ---------------------------------- ----- ------------ ------------ ------
  3 000f.34d8.b340 to 000f.34d8.b36f   6.0   7.2(1)       8.7(0.22)BUB Ok
  4 001a.6d8a.c550 to 001a.6d8a.c57f   1.7   8.4(1)       8.7(0.22)BUB Ok
  6 0023.5e4c.4ac0 to 0023.5e4c.4aef   2.4   8.4(1)       8.7(0.22)BUB Ok
  7 0013.c42e.ef58 to 0013.c42e.ef5b   3.3   8.1(3)       12.2(33)SXH7 Ok
  9 0013.1a22.e690 to 0013.1a22.e6bf   2.9   12.2(14r)S5  12.2(33)SXH7 Ok
 10 0024.14f5.adf0 to 0024.14f5.ae1f   3.0   12.2(18r)S1  12.2(33)SXH7 Ok
 11 1cdf.0f9f.d088 to 1cdf.0f9f.d0b7   3.4   12.2(18r)S1  12.2(33)SXH7 Ok
 12 5475.d015.b090 to 5475.d015.b0a7   4.3   12.2(18r)S1  12.2(33)SXH7 Ok
 13 0018.19e5.e23c to 0018.19e5.e23f   2.4   12.2(14r)S5  12.2(33)SXH7 Ok
Mod Sub-Module                  Model          Serial      Hw   Status
--- --------------------------- -------------- ----------- ---- ------
  6 IEEE Voice Daughter Card    WS-F6K-48-AF   SAXXXXXXXXX 2.4  Ok
  7 Policy Feature Card 3       WS-F6K-PFC3B   SAXXXXXXXXX 2.1  Ok
  7 MSFC3 Daughterboard         WS-SUP720      SAXXXXXXXXX 2.5  Ok
  9 Centralized Forwarding Card WS-F6700-CFC   SAXXXXXXXXX 2.1  Ok
 10 Centralized Forwarding Card WS-F6700-CFC   SAXXXXXXXXX 2.0  Ok
 11 Centralized Forwarding Card WS-F6700-CFC   SAXXXXXXXXX 4.1  Ok
 12 Centralized Forwarding Card WS-F6700-CFC   SAXXXXXXXXX 4.1  Ok
 13 Centralized Forwarding Card WS-F6700-CFC   SAXXXXXXXXX 2.0  Ok
Mod Online Diag Status
--- ------------------
  3 Pass
  4 Pass
  6 Pass
  7 Pass
  9 Pass
 10 Pass
 11 Pass
 12 Pass
 13 Pass
-
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Modules 9 through 13 carry Centralized Forwarding Cards (CFCs). No Distributed Forwarding Cards (DFCs) are in place.
-
dtlokee Member Posts: 2,378 ■■■■□□□□□□One other question: what does your PBR configuration look like? If I remember correctly, only some of the route-map command set is supported for hardware acceleration. You don't need DFCs for hardware support, but they help when the backplane is oversubscribed.
On another note, since it is a 6513, is it an E chassis or a classic chassis? (The 6513-E is a "new" product in terms of the 6500 family and you have a really old line card in there!)
The only easy day was yesterday! -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□It's the classic chassis. Yeah, the 61XX series aren't even fabric enabled. The whole switch was full of them when I got it. SUP720 without a single fabric enabled module. CRAZY!
The PBR is relatively simple:
route-map PBR_F5_GTM, permit, sequence 10
Match clauses:
ip address (access-lists): PBR_F5_GTM
Set clauses:
ip next-hop Y.Y.Y.Y
Policy routing matches: 2690098 packets, 351378025 bytes
route-map PBR_F5_GTM, permit, sequence 20
Match clauses:
Set clauses:
Policy routing matches: 748939612 packets, 2433433318 bytes -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□I can't wait to get the rest of the old modules out of there. Simply having those in the chassis decreases the performance of the whole chassis because it has to maintain the 32G shared bus. The forwarding capacity will go through the roof once those are removed. I'll have to dig up that documentation to share. It did a good job of explaining the fabric architecture of this platform.
-
dtlokee Member Posts: 2,378 ■■■■□□□□□□You don't need the second clause in the route map. It has an empty set condition, so it will punt the traffic to the MSFC anyhow, the same as the implicit deny at the end of the route map. In a route map used for PBR, a match on a "deny" doesn't drop the packets; it simply "denies" them from being policy routed.
The Policy Feature Card (PFC) and any Distributed Feature Cards (DFCs) provide hardware support for policy-based routing (PBR) for route-map sequences that use the match ip address, set ip next-hop, and ip default next-hop PBR keywords.
When configuring PBR, follow these guidelines and restrictions:
–The PFC provides hardware support for PBR configured on a tunnel interface.
–The PFC does not provide hardware support for PBR configured with the set ip next-hop keywords if the next hop is a tunnel interface.
–If the MSFC address falls within the range of a PBR ACL, traffic addressed to the MSFC is policy routed in hardware instead of being forwarded to the MSFC. To prevent policy routing of traffic addressed to the MSFC, configure PBR ACLs to deny traffic addressed to the MSFC.
–Any options in Cisco IOS ACLs that provide filtering in a PBR route-map that would cause flows to be sent to the MSFC to be switched in software are ignored. For example, logging is not supported in ACEs in Cisco IOS ACLs that provide filtering in PBR route-maps.
–PBR traffic through switching module ports where PBR is configured is routed in software if the switching module resets. (CSCee92191)
–Any permit route-map sequence with no set statement will cause matching traffic to be processed by the MSFC.
–In Cisco IOS Release 12.2(33)SXF16 and later releases, for efficient use of hardware resources, enter the platform ipv4 pbr optimize tcam command in global configuration mode when configuring multiple PBR sequences (or a single PBR sequence with multiple ACLs) in which more than one PBR ACL contains DENY entries. In earlier releases, we recommend avoiding this type of configuration. (CSCsr45495)
–In Cisco IOS Release 12.2(33)SXH4 and later releases, the BOOTP/DHCP traffic will be dropped unless explicitly permitted. In Cisco IOS Release 12.2(18)SXF, BOOTP/DHCP packets are not subjected to a PBR configured in the ingress interfaces and the BOOTP/DHCP packets are forwarded to the BOOTP/DHCP server, although they are not explicitly permitted.
Unless you are doing something odd with the ACL, I don't think the CPU should be spiking at 100%.
The only easy day was yesterday! -
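A minimal sketch of the "deny traffic addressed to the MSFC" guideline quoted above. Only the ACL name PBR_F5_GTM comes from this thread; the real ACL contents were never posted, so the addresses here are hypothetical (10.10.0.0/16 standing in for the policy-routed user range, 10.10.0.1 for an MSFC interface address inside it).

```
! Hypothetical addressing, for illustration only.
! First entry keeps traffic addressed to the MSFC itself out of policy routing;
! second entry selects the user range that should be sent to Y.Y.Y.Y.
ip access-list extended PBR_F5_GTM
 deny   ip any host 10.10.0.1
 permit ip 10.10.0.0 0.0.255.255 any
```

In a PBR ACL the deny entry does not drop anything; matching packets simply skip policy routing and follow the normal routing table.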
chrisone Member Posts: 2,278 ■■■■■■■■■□Correct me if I am wrong, but regarding hardware acceleration with PBR, look at the statement in the PDF that dtlokee posted. It seems your route-map sequence 20 matches the statement below: since it has no match or set clauses, it is forcing a huge load of traffic to the MSFC for software processing. It is as if your route map says: anything in sequence 10, route to Y.Y.Y.Y in hardware; anything else, no matter what it is, route on the MSFC. So you might be pushing traffic that would normally be hardware switched to the MSFC. Compare the policy routing matches for sequences 10 and 20 and look how massive sequence 20 is next to 10. Anyway, this is what I am leaning towards. I could be wrong, but it seems that might be the problem based on what you described: everything matching sequence 10 is working great and everything else is going to ish.
–Any permit route-map sequence with no set statement will cause matching traffic to be processed by the MSFC.
route-map PBR_F5_GTM, permit, sequence 20
Match clauses:
Set clauses:
Policy routing matches: 748939612 packets, 2433433318 bytes
Insert: I just thought of this. How about putting a set clause in sequence 20 with next hop X.X.X.X, where you actually want the rest of the traffic to go? That adds a set statement to sequence 20, which would put traffic matching sequence 20 on the hardware path.
What do you think?
Certs: CISSP, EnCE, OSCP, CRTP, eCTHPv2, eCPPT, eCIR, LFCS, CEH, SPLK-1002, SC-200, SC-300, AZ-900, AZ-500, VHL:Advanced+
2023 Cert Goals: SC-100, eCPTX -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□I think I am going to have to just remove the permit 20 statement. If I set next hop X.X.X.X, then all remaining traffic will go to that address instead of falling through to the routing table. There are quite a few other routes that need to be accounted for besides the default route that heads to X.X.X.X. It's been a while since I last used route maps, so I need to verify their operation in the context of PBR. From what I've read so far, I should be able to just remove the permit 20 and all remaining traffic will use the routing table to determine the next hop.
-
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Or...keep the permit 20.
New ACL that doesn't match any traffic
set the next hop to XXXX
The traffic won't get sent to the next hop in the route-map; there is a match statement and a set statement, so all remaining traffic gets sent back to the normal forwarding channels - the routing table based on destination. -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Yep, I'm going to just remove permit 20.
Going back to old reliable Routing TCP/IP Volume I, page 736.
Again as with access lists, there must be a default action for the route map to take in the
event that a route or packet passes through every statement without a match. An implicit
deny exists at the end of every route map. Routes that pass through a redistribution route
map without a match are not redistributed, and packets that pass through a policy route map
without a match are sent to the normal routing process. -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□For those curious. Removing the permit 20 did indeed fix the issue.
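For reference, a sketch of what the working policy would look like once sequence 20 is gone. The route-map name, ACL name, and next-hop placeholder are from the thread; the interface (Vlan100) is an assumption, since the thread never says where the policy is applied.

```
! Remove the empty permit sequence that was sending traffic to the MSFC
no route-map PBR_F5_GTM permit 20
!
! What remains: one sequence with both a match and a set clause
route-map PBR_F5_GTM permit 10
 match ip address PBR_F5_GTM
 set ip next-hop Y.Y.Y.Y
!
! Hypothetical ingress interface; substitute the real user-facing SVI
interface Vlan100
 ip policy route-map PBR_F5_GTM
```

Packets that match nothing in the route map hit the implicit deny and are forwarded by the normal destination-based routing table, which is the behavior described in the Routing TCP/IP passage quoted earlier.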
-
chrisone Member Posts: 2,278 ■■■■■■■■■□Nice job! Always eager to learn new things on the 65k.
Certs: CISSP, EnCE, OSCP, CRTP, eCTHPv2, eCPPT, eCIR, LFCS, CEH, SPLK-1002, SC-200, SC-300, AZ-900, AZ-500, VHL:Advanced+
2023 Cert Goals: SC-100, eCPTX -
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Yep. You can never learn enough about the Cat 6500 platform. I love this platform because there is always something new to learn and the darn thing is a workhorse if properly outfitted. The 6500-E chassis has a Sup-2T. You should check it out because it is a bamf.
-
cisco_trooper Member Posts: 1,441 ■■■■□□□□□□Oh, and thanks dtlokee. I very much appreciate the help.
-
dtlokee Member Posts: 2,378 ■■■■□□□□□□Just wait until you get a datacenter full of Nexus products...
The only easy day was yesterday!
-
vinbuck Member Posts: 785 ■■■■□□□□□□If you aren't already aware of it, sh ip cef switching statistics is a very helpful command for seeing what isn't being handled in hardware and is landing on the processor instead. It has helped me track down some complex high CPU utilization issues on 7600 series gear with SUP720s and RSP720s.
Cisco was my first networking love, but my "other" router is a Mikrotik...
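A few commands that may help confirm where traffic is being switched, offered as a sketch rather than a full troubleshooting procedure. The route-map name is the one from this thread; output formats vary by release, so none is shown here.

```
! Which interfaces have a policy route map applied
show ip policy
! Per-sequence match counters (watch for sequences with no set clause)
show route-map PBR_F5_GTM
! Reasons packets were punted to, or dropped by, the software path
show ip cef switching statistics
! Check whether the route processor CPU is busy, and with what
show processes cpu sorted
```

Running these before and after a route-map change gives a rough way to see whether punted traffic and CPU load moved together.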