Wireless Outage I Caused Today: Need Help Understanding How
Danielh22185
Member Posts: 1,195 ■■■■□□□□□□
So as you may have read, I caused a boo-boo on the network today. Lucky for me, EVERYBODY in my company uses wireless all day every day, so this looked pretty bad on me.
So here is some background:
I've been troubleshooting with Cisco a loss of NTP / SNMP traffic leaving my WLC. We at least found out why: this management traffic is unexpectedly egressing a controller port on a completely different vlan from the one the management subnet lives in (it comes in on vlan 6 (10.15.6.0/24) and leaves the controller on port 3, which attaches to vlan 350). We detected this by doing an ELAM capture on the 6509 and found NTP traffic arriving on the wrong port.
So this got us digging in further.
Here are the interfaces on the WLC:
Interface Name          Port  Vlan Id   IP Address      Type     Ap Mgr  Guest
----------------------  ----  --------  --------------  -------  ------  -----
management              1     untagged  10.15.6.5       Static   Yes     No
redundancy-management   1     untagged  0.0.0.0         Static   No      No
wireless user vlan 1    1     60        10.15.60.5      Dynamic  No      No
wireless user vlan 2    1     50        10.15.52.5      Dynamic  No      No
hotspot wireless        3     untagged  172.19.16.5     Dynamic  No      No
guest wireless          2     untagged  192.168.250.5   Dynamic  No      No
So Cisco advised me to change the management interface (seen above) to tag it with vlan 6. In doing so I isolated the controller completely, knocking out wireless for the company along with my ability to access the box.
Now I have 3 other interfaces on the 6509 that connect to the WLC:
switch#show ip arp 10.15.6.5
Protocol  Address      Age (min)  Hardware Addr   Type  Interface
Internet  10.15.6.5    84         d0d0.fd1f.8600  ARPA  Vlan6
Two are access ports for vlans 250 and 350. The other is a trunk port with native vlan 6:
switch#sh run int gi1/2/2
Building configuration...
Current configuration : 159 bytes
!
interface GigabitEthernet1/2/2
 description Connection to Cisco 5508 WLAN Controller
 switchport
 switchport trunk native vlan 6
 switchport mode trunk
end
So.... Why did everything blow up when I set the WLC port to tag vlan 6 on the management interface? Does this make it an access port? That is the only thing I would think it did. I am still a bit lost what might have happened because there are other subnets as you can see WITH tagging.
Not going to sleep for days on this... (btw I'm still trying to get answers from Cisco too, but thought I'd approach this from all vectors).
Edit: Cisco's response:
1.) Why is mgmt. traffic forwarding from the WLC out to gi2/2/6 on the 6509 (vlan 350)?
We suspect this behavior is due to having several untagged interfaces mapped to different physical ports. This issue is already documented under bug ID CSCvc12594. The bug could apply to all management traffic (SNMP, NTP, RADIUS, etc.). To confirm this issue, it is necessary to tag all the interfaces on the controller.
2.) Why was all connectivity to the WLC lost when tagging the management interface on the WLC with vlan 6 (the mgmt. vlan)?
The connection to the controller was lost because we only tagged the interface on the controller side. We also need to tag the interface on the switch side so that both devices match the same configuration. In this case the controller was sending its traffic to the switch tagged with vlan 6, but the switch was expecting vlan 6 without a tag [native vlan 6].
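The mismatch Cisco describes in answer 2 can be sketched with a toy model. This is only an illustration under stated assumptions, not a simulation of the 5508 or the 6509: TrunkEnd, round_trip, and the drop_tagged_native flag are hypothetical names made up for this example. Each end of the link has at most one VLAN that it sends and receives untagged. Whether a switch drops or accepts a frame that arrives tagged with its own native VLAN can vary by platform and configuration, so that is left as a flag. In this model the round trip fails either way, because the switch's reply still leaves untagged and the WLC no longer has an untagged interface to receive it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrunkEnd:
    """One end of an 802.1Q link (simplified, hypothetical model)."""
    name: str
    untagged_vlan: Optional[int]      # VLAN carried without a tag, if any
    drop_tagged_native: bool = False  # assumption: platform-dependent behavior

    def send(self, vlan: int) -> Optional[int]:
        """Return the tag put on the wire (None means untagged)."""
        return None if vlan == self.untagged_vlan else vlan

    def receive(self, tag: Optional[int]) -> Optional[int]:
        """Return the VLAN the frame lands in, or None if it is dropped."""
        if tag is None:
            # Untagged frames belong to the untagged VLAN; if this end has
            # no untagged VLAN, nothing claims the frame.
            return self.untagged_vlan
        if tag == self.untagged_vlan and self.drop_tagged_native:
            return None
        return tag


def round_trip(a: TrunkEnd, b: TrunkEnd, vlan: int) -> bool:
    """True if a frame in `vlan` survives a -> b and the reply b -> a."""
    there = b.receive(a.send(vlan)) == vlan
    back = a.receive(b.send(vlan)) == vlan
    return there and back


# Values taken from the post: management is vlan 6, gi1/2/2 has native vlan 6.
switch = TrunkEnd("6509 gi1/2/2", untagged_vlan=6)
wlc_before = TrunkEnd("WLC port 1", untagged_vlan=6)    # management untagged
wlc_after = TrunkEnd("WLC port 1", untagged_vlan=None)  # management tagged 6
# Hypothetical fix: native vlan moved off 6 (1 is the IOS default native vlan).
switch_fixed = TrunkEnd("6509 gi1/2/2", untagged_vlan=1)

print(round_trip(wlc_before, switch, 6))       # original setup
print(round_trip(wlc_after, switch, 6))        # after the change (the outage)
print(round_trip(wlc_after, switch_fixed, 6))  # both sides tagging vlan 6
print(round_trip(wlc_after, switch, 60))       # tagged user vlan, unaffected
```

In this model the first, third, and fourth calls return True and the second returns False. The last call also hints at why the already-tagged dynamic interfaces were not the problem at layer 2: only the VLAN whose tagging differs between the two ends breaks.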
Currently Studying: IE Stuff...kinda...for now...
My ultimate career goal: To climb to the top of the computer network industry food chain.
"Winning means you're willing to go longer, work harder, and give more than anyone else." - Vince Lombardi
Comments
-
jelevated Member Posts: 139
Yep, the WLC was sending tagged frames on VLAN 6, while the switch was expecting tags for everything but VLAN 6. You probably would see native VLAN errors in the console.
-
hurricane1091 Member Posts: 919 ■■■■□□□□□□
I'm new to wireless so I'm curious about this. I can see that the switch would drop management traffic because it was coming in tagged with the native vlan. Exactly what happened to the user data traffic here? Are CAPWAP tunnels built upon the management VLAN? If so, I guess I see how this causes a problem. Totally new to dealing with my own wireless problems, so just spitballing.
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
I guess one thing I didn't fully understand was the fact that there were several untagged "interfaces" assigned to the same port number (1). So I thought that I would only be affecting mgmt traffic on the 10.15.6.0/24 network. However, if CAPWAP traffic rides on the same mgmt subnet, I can see how I blew up all the WAPs. I guess I am slightly disappointed with Cisco TAC...
Either way, it seems my original problem of missing mgmt traffic such as NTP / SNMP would still be there due to the possible bug that Cisco mentions.
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
hurricane1091 wrote: »I'm new to wireless so I'm curious about this. I can see that the switch would drop management traffic because it was coming in tagged with the native vlan. Exactly what happened to the user data traffic here? Are CAPWAP tunnels built upon the management VLAN? If so, I guess I see how this causes a problem. Totally new to dealing with my own wireless problems, so just spitballing.
Ya, I lost literally everything. 42 WAPs nationwide.
-
drewbert87 Member Posts: 16 ■□□□□□□□□□
Literally no WLC experience here but just wanted to share my sympathies. That's a bad day. Sorry
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
drewbert87 wrote: »Literally no WLC experience here but just wanted to share my sympathies. That's a bad day. Sorry
Yep, it was not a fun one. I was all smiles once I figured out why my management traffic was being lost, then quickly felt dumb and defeated after causing an outage.
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
hurricane1091 wrote: »I'm new to wireless so I'm curious about this. I can see that the switch would drop management traffic because it was coming in tagged with the native vlan. Exactly what happened to the user data traffic here? Are CAPWAP tunnels built upon the management VLAN? If so, I guess I see how this causes a problem. Totally new to dealing with my own wireless problems, so just spitballing.
I wanted to comment back to you since you brought this up, and these were my same thoughts when I initially broke stuff. Yes, that interface is the one used for all Layer 3 communications between the controller and the access points joined to it. Have a read here if you want to catch up on more, but effectively, by breaking the AP Mgr interface I took everything out. A big oops on my part, but I have learned a lot recently and managed to keep my job...lol.
Cisco Wireless LAN Controller Configuration Guide, Release 7.4 - Configuring the AP-Manager Interface [Cisco Wireless LAN Controller Software] - Cisco
-
JoJoCal19 Mod Posts: 2,835 Mod
As well, no experience with this but wanted to say sorry to hear about that. Hopefully it doesn't end up being a resume generating event. If anything, I'd do up a full report and submit to management about what happened, why, and lessons learned. Show them the takeaway from this and what you will learn/do differently.
Have: CISSP, CISM, CISA, CRISC, eJPT, GCIA, GSEC, CCSP, CCSK, AWS CSAA, AWS CCP, OCI Foundations Associate, ITIL-F, MS Cyber Security - USF, BSBA - UF, MSISA - WGU
Currently Working On: Python, OSCP Prep
Next Up: OSCP
Studying: Code Academy (Python), Bash Scripting, Virtual Hacking Lab Coursework
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
JoJoCal19 wrote: »As well, no experience with this but wanted to say sorry to hear about that. Hopefully it doesn't end up being a resume generating event. If anything, I'd do up a full report and submit to management about what happened, why, and lessons learned. Show them the takeaway from this and what you will learn/do differently.
I think they are going to let me stay haha! I guess if anything I picked a decent time to make a mistake (on a Friday towards the end of the business day in a holiday week), and we were only down 20-30 mins tops.
As far as giving a full report: my manager yanked me into a meeting with my director about an hour after the event. At the time I was still floundering, trying to figure out how I broke things so badly. However, I owned up to it and explained that I made a mistake and did not choose a wise time to do so. I assured them I would not make changes during the normal production business day, no matter how insignificant / simple they seem at the time (especially in this environment, given how dependent we are on wireless as a company). Also, if I felt the need to do so, it would only be in break/fix scenarios, and I would inform my manager before executing those changes.
I haven't given them a "full written report" because I was yanked into that meeting so quickly after the fact. Do you think I should still give a written one to my immediate manager? Especially now that I know much more about the issue I caused and am much better educated about the environment?
This does still eat at me a bit because I am new to the company, I had a fundamental lapse in understanding, and most importantly: I KNEW BETTER!! I previously worked for a Fortune 50 bank where we would get our hands slapped if we went into config mode to clear an err-disabled user access port, even at the highest technical support level.
Edit: I went ahead and sent my manager an email anyway to give him a detailed written report. I thought about it for a while and didn't see how this could ever be a bad thing.
-
NOC-Ninja Member Posts: 1,403
You live and you learn.
Although, changes like this should be made after hours. Implementation 101.
There should be a backup config and a rollback plan in case the change stops the service, just like in your scenario. Definitely don't make a change in production unless it's on an access port. You never know what can happen. I've seen this again and again. It doesn't matter how great we are, it takes one mistake to make us look really bad.
I'm not really sure about your network, but reading your post, it's a weird design.
Normally it would be designed as a port channel on the 6500 side and LAG on the WLC. It seems that you have one network connected per port?
-
Danielh22185 Member Posts: 1,195 ■■■■□□□□□□
NOC-Ninja wrote: »You live and you learn.
Although, changes like this should be made after hours. Implementation 101.
There should be a backup config and a rollback plan in case the change stops the service, just like in your scenario. Definitely don't make a change in production unless it's on an access port. You never know what can happen. I've seen this again and again. It doesn't matter how great we are, it takes one mistake to make us look really bad.
I'm not really sure about your network, but reading your post, it's a weird design.
Normally it would be designed as a port channel on the 6500 side and LAG on the WLC. It seems that you have one network connected per port?
Hey NOC-Ninja,
I was hoping you might catch wind of this. I'd also like to bounce some other things off you via PM, if you don't mind.
In short, yes, our wireless network design has some holes. I am only a few months into this place, so I am still learning the environment. I'm also getting tons of other stuff dumped in my lap that has pretty weak setups as well. Lots of previous engineers did a little and left. I get the feeling the place has been a bit of a revolving door for the past decade. So I saw this opportunity as a challenge to get knee deep into fixing stuff and make a GOOD name for myself.
So... there are 4 physical interfaces in total coming from the WLC into the 6509. One is the one I messed with (the management port), which is port 1 on the WLC. Two other physical ports connect the hotspot and guest networks (hotspot = port 3, vlan 350; guest = port 2, vlan 250). The last is a backup port that connects to the same 6509 (currently admin down, not sure why...). Why it was set up this way I have no idea, nor does the longer-tenured engineer here seem to know. All of those interfaces are untagged on the WLC; port 1 connects to the trunk, while ports 2 and 3 connect to access ports on the 6509 side. So after much brainstorming, I think I might be able to resolve the other issue I hinted at (lost SNMP / NTP mgmt traffic) by keeping the existing access ports as just that and tagging their interfaces with the appropriate vlan ID on the WLC side, so that I would then have just one untagged interface, as Cisco suggests as best practice.
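The condition Cisco suspects (several untagged interfaces mapped to different physical ports) can be checked against the interface table from the first post with a short script. This is a rough sketch, not a Cisco tool: WlcInterface, untagged_by_port, ports_with_untagged, and VLAN_PLAN are hypothetical names, and the data is transcribed by hand from the table and the vlan numbers mentioned in the thread.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class WlcInterface(NamedTuple):
    name: str
    port: int
    vlan: Optional[int]  # None means untagged


# Transcribed from the interface summary in the first post.
INTERFACES = [
    WlcInterface("management", 1, None),
    WlcInterface("redundancy-management", 1, None),
    WlcInterface("wireless user vlan 1", 1, 60),
    WlcInterface("wireless user vlan 2", 1, 50),
    WlcInterface("hotspot wireless", 3, None),
    WlcInterface("guest wireless", 2, None),
]


def untagged_by_port(interfaces: List[WlcInterface]) -> Dict[int, List[str]]:
    """Group the names of untagged interfaces by physical port."""
    ports: Dict[int, List[str]] = defaultdict(list)
    for intf in interfaces:
        if intf.vlan is None:
            ports[intf.port].append(intf.name)
    return dict(ports)


def ports_with_untagged(interfaces: List[WlcInterface]) -> List[int]:
    """Sorted list of ports that carry at least one untagged interface."""
    return sorted(untagged_by_port(interfaces))


# The plan described in the thread: tag hotspot and guest with their vlans.
VLAN_PLAN = {"hotspot wireless": 350, "guest wireless": 250}
planned = [i._replace(vlan=VLAN_PLAN.get(i.name, i.vlan)) for i in INTERFACES]

print(ports_with_untagged(INTERFACES))  # ports with untagged interfaces today
print(ports_with_untagged(planned))     # after the planned tagging
```

With the table as posted, untagged interfaces sit on ports 1, 2, and 3; after the planned tagging only port 1 is left. Going by Cisco's answer about the outage, the switch side of ports 2 and 3 would presumably have to be changed to match before tagging those interfaces on the WLC, or the same mismatch would repeat there.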