ASA issue... TAC came through.
mikearama
Member Posts: 749
Just wanted to share this with you ASA techies... great learning curve yesterday, and as usual, the smallest things cause the greatest headaches.
We are in the middle of a project to replace all of our PIX firewalls with ASA's. UAT was completed at Xmas, and we installed two pairs (External and Internal) of ASA's... internal 5540's, and 5520's external. They all share the common management vlan for their management ports, and the inside ASA's also share a BRIDGE network to the rest of the network.
Yesterday, we racked the PROD pairs, added power, and connected to the management vlan.
A couple hours later, I tried to ASDM into the internal UAT active ASA... no joy. SSH... no luck either. I then consoled into the Active unit and did some digging, and did a 'sh fail', and got this:
This host: Primary - Active
Active time: 1787 (sec)
slot 0: ASA5520 hw/sw rev (2.0/8.0(4)) status (Up Sys)
admin Interface Brdgnet-admin (10.22.208.9): Normal
admin Interface management-admin (10.22.151.56): Normal (Waiting)
ASA-CON1 Interface Outside-Con1 (172.23.6.1): Normal
ASA-CON1 Interface UAT-WEB (172.23.2.1): Normal
ASA-CON1 Interface UAT-APP (10.22.153.1): Normal
ASA-CON1 Interface Brdgnet-con1 (10.22.208.58 ): Normal
ASA-CON1 Interface management-con1 (10.22.151.58 ): Normal (Waiting)
ASA-CON2 Interface Outside-Con2 (172.23.7.1): Normal
ASA-CON2 Interface DMZ-WebFuture (172.23.3.1): Normal
ASA-CON2 Interface Brdgnet-con2 (10.22.208.50): Normal
ASA-CON2 Interface management-con2 (10.22.151.50): Normal (Waiting)
And according to the syslog server, everything was good until 12:45pm, when this started:
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface management-con1
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105008: (Primary) Testing Interface management-con1
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105009: (Primary) Testing on interface management-con1 Passed
...
over and over, every 15 seconds.
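For anyone who wants to spot this pattern without eyeballing the syslog server, here is a rough Python sketch that counts the 105005 "Lost Failover communications" messages per unit and interface. It assumes the lines look like the three pasted above; the function name and the `sample` list are just for illustration, not anything from the ASA or the syslog server.

```python
import re
from collections import Counter

# Matches ASA message 105005 and captures the unit role and interface name.
# The pattern is based only on the log lines quoted in this post.
LOST_COMMS = re.compile(
    r"%ASA-1-105005: \((?P<unit>\w+)\) Lost Failover communications "
    r"with mate on interface (?P<ifname>\S+)"
)


def count_lost_failover_comms(lines):
    """Return a Counter of (unit, interface) -> number of 105005 messages."""
    counts = Counter()
    for line in lines:
        match = LOST_COMMS.search(line)
        if match:
            counts[(match.group("unit"), match.group("ifname"))] += 1
    return counts


# The three lines from the post, used here as sample input.
sample = [
    "Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: "
    "%ASA-1-105005: (Primary) Lost Failover communications with mate "
    "on interface management-con1",
    "Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: "
    "%ASA-1-105008: (Primary) Testing Interface management-con1",
    "Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: "
    "%ASA-1-105009: (Primary) Testing on interface management-con1 Passed",
]

if __name__ == "__main__":
    for (unit, ifname), n in count_lost_failover_comms(sample).items():
        print(f"{unit} {ifname}: {n} lost-comms message(s)")
```

A count that keeps climbing on the management interfaces only, while the data interfaces stay quiet, would point at the management vlan rather than the failover pair itself.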
No outage, no failover, just an inability to remotely manage the device outside of the console.
I blew a couple hours checking everything I could think of... new cat6 cables, different ports, different IP addresses... nothing.
I opened a TAC case and was put through to an engineer in India... great guy named Rahul. He'd obviously seen this kind of crap before, 'cause he narrowed it down to a layer 2 issue in minutes. Not long after, he asked if I'd installed any other new devices on the management vlan today. When I said I had, and that they were ASA's for production, he asked me to get the mac addresses of the management interfaces.
Turns out that when I enabled the failover pair and issued the 'mac-address auto' command, the ASA generated the three virtual mac's... one for each of our three contexts... as it's supposed to, and as it did for the UAT ASA's back in December. HOWEVER, of interest, is that the new PROD ASA's generated the EXACT SAME MAC ADDRESSES as the UAT ASA's.
So, the engineer changed the UAT mac addresses by one digit, and everything came up just peachy. Let that be a lesson... don't trust 'mac-address auto' if you're running more than one pair of ASA's that share any common network. Now that I mention it, kinda dumb, eh? You would think the ASA's would be smart enough to generate mac's that don't overlap from one pair to the next. Oh, and while the UAT pair are 5540's, the PROD pair are 5550's... and they still share the same mac-generating algorithm! Go figure.
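If you keep an inventory of interface MACs, a quick check like the one below would flag this before it bites. This is only a sketch: the device names and MAC values are made up for illustration (the real addresses aren't in this post), and you'd have to feed it the MACs you collect from each unit yourself.

```python
import re
from collections import defaultdict

HEX12 = re.compile(r"[0-9a-f]{12}")


def normalize_mac(mac):
    """Reduce a MAC in dotted, colon or dash notation to 12 lowercase hex digits."""
    digits = re.sub(r"[.:\-]", "", mac.strip()).lower()
    if not HEX12.fullmatch(digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def find_mac_collisions(inventory):
    """Return {mac: [(device, interface), ...]} for MACs seen on more than one device.

    inventory is an iterable of (device, interface, mac) tuples.
    """
    seen = defaultdict(list)
    for device, ifname, mac in inventory:
        seen[normalize_mac(mac)].append((device, ifname))
    return {
        mac: owners
        for mac, owners in seen.items()
        if len({device for device, _ in owners}) > 1
    }


# Hypothetical inventory: device names and MAC values are invented.
inventory = [
    ("UAT-internal", "management-con1", "1200.0000.0100"),
    ("UAT-internal", "management-con2", "1200.0000.0200"),
    ("PROD-internal", "management-con1", "12:00:00:00:01:00"),
    ("PROD-internal", "management-con2", "1200.0000.0300"),
]

if __name__ == "__main__":
    for mac, owners in find_mac_collisions(inventory).items():
        print(mac, "is used by", owners)
```

The normalizing step matters because the same address can show up in Cisco dotted notation on one box and colon notation in someone's spreadsheet.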
There are only 10 kinds of people... those who understand binary, and those that don't.
CCIE Studies: Written passed: Jan 21/12 Lab Prep: Hours reading: 385. Hours labbing: 110
Taking a time-out to add the CCVP. Capitalizing on a current IPT pilot project.
Comments
-
bertieb
Member Posts: 1,031
Nice, good fix.
I wish this kind of logic would apply between my random lottery numbers and the lottery company's number generator.
The trouble with quotes on the internet is that you can never tell if they are genuine - Abraham Lincoln
-
Ahriakin
Member Posts: 1,799
Thanks for sharing.
A thing I'm finding is that the deeper you go into networking, the less you can troubleshoot with logs, and packet captures become your new best friend. So many little things like this will only show up on captures.
We responded to the Year 2000 issue with "Y2K" solutions...isn't this the kind of thinking that got us into trouble in the first place?