ASA issue... TAC came through.
Just wanted to share this with you ASA techies... a great learning experience yesterday, and as usual, the smallest things cause the greatest headaches.
We are in the middle of a project to replace all of our PIX firewalls with ASAs. UAT was completed at Xmas, and we installed two pairs (External and Internal) of ASAs... 5540s internal and 5520s external. They all share a common management VLAN for their management ports, and the inside ASAs also share a BRIDGE network to the rest of the network.
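(For anyone who hasn't run these boxes in multiple-context mode, here's a rough sketch of what the system-context config looks like. The physical interface IDs, mapped names and config-url paths below are made up for illustration; the context names match the 'sh fail' output further down. The important bit is that the one management interface is shared across all three contexts:

mode multiple
admin-context admin
!
context admin
 allocate-interface Management0/0 management-admin
 config-url disk0:/admin.cfg
!
context ASA-CON1
 allocate-interface Management0/0 management-con1
 allocate-interface GigabitEthernet0/0 Outside-Con1
 config-url disk0:/asa-con1.cfg
!
context ASA-CON2
 allocate-interface Management0/0 management-con2
 allocate-interface GigabitEthernet0/1 Outside-Con2
 config-url disk0:/asa-con2.cfg

Because that management interface is shared, each context needs its own MAC on it... which is where this story is headed.)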
Yesterday, we racked the PROD pairs, added power, and connected them to the management VLAN.
A couple of hours later, I tried to ASDM into the internal UAT active ASA... no joy. SSH... no luck either. I then consoled into the Active unit, did some digging, ran a 'sh fail', and got this:
This host: Primary - Active
Active time: 1787 (sec)
slot 0: ASA5520 hw/sw rev (2.0/8.0(4)) status (Up Sys)
admin Interface Brdgnet-admin (10.22.208.9): Normal
admin Interface management-admin (10.22.151.56): Normal (Waiting)
ASA-CON1 Interface Outside-Con1 (172.23.6.1): Normal
ASA-CON1 Interface UAT-WEB (172.23.2.1): Normal
ASA-CON1 Interface UAT-APP (10.22.153.1): Normal
ASA-CON1 Interface Brdgnet-con1 (10.22.208.58): Normal
ASA-CON1 Interface management-con1 (10.22.151.58): Normal (Waiting)
ASA-CON2 Interface Outside-Con2 (172.23.7.1): Normal
ASA-CON2 Interface DMZ-WebFuture (172.23.3.1): Normal
ASA-CON2 Interface Brdgnet-con2 (10.22.208.50): Normal
ASA-CON2 Interface management-con2 (10.22.151.50): Normal (Waiting)
And according to the syslog server, everything was good until 12:45pm, when this started:
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105005: (Primary) Lost Failover communications with mate on interface management-con1
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105008: (Primary) Testing Interface management-con1
Apr 08 12:45:38 10.22.151.58 local5.alert Apr 08 2009 12:49:55: %ASA-1-105009: (Primary) Testing on interface management-con1 Passed
...
over and over, every 15 seconds.
No outage, no failover, just an inability to remotely manage the device outside of the console.
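(Side note, in case it helps anyone chasing a similar 'Waiting' state: as I understand it, 'Normal (Waiting)' means the interface is up but hasn't heard the mate's hello on that segment, and the 105008/105009 messages are the per-interface health tests that kick in when the hellos stop. The knobs that drive all this, roughly... the timer values shown are just the defaults, check your own config:

! system execution space: interface hello poll/hold timers (defaults shown)
failover polltime interface 5 holdtime 25
! inside each context: pick which interfaces failover health-checks
monitor-interface management-con1
! and to see the per-interface failover state:
show monitor-interface

None of which was the actual problem here, but it's where I went looking first.)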
I blew a couple of hours checking everything I could think of... new Cat6 cables, different ports, different IP addresses... nothing.
I opened a TAC case and was put through to an engineer in India... great guy named Rahul. He'd obviously seen this kind of crap before, because he narrowed it down to a Layer 2 issue within minutes. Not long after, he asked if I'd installed any other new devices on the management VLAN that day. When I said I had, and that they were ASAs for production, he asked me to get the MAC addresses of the management interfaces.
Turns out that when I enabled the failover pair and issued the 'mac-address auto' command, the ASA generated the three virtual MACs... one for each of our three contexts... as it's supposed to, and as it did for the UAT ASAs back in December. HOWEVER, and here's the interesting part: the new PROD ASAs generated the EXACT SAME MAC ADDRESSES as the UAT ASAs.
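(For anyone following along at home, this is the sequence I mean... a minimal sketch, run from the system execution space; the context name is ours, adjust to taste:

! system execution space: auto-generate virtual MACs for context interfaces
mac-address auto
! then hop into each context and check what got generated
changeto context ASA-CON1
show interface | include MAC

Run the same check on both pairs and compare... that's how we spotted the duplicates.)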
So the engineer changed the UAT MAC addresses by one digit, and everything came up just peachy. Let that be a lesson... don't trust 'mac-address auto' if you're running more than one pair of ASAs that share any common network. Now that I think about it, kinda dumb, eh? You would think the ASAs would be smart enough to generate MACs that don't overlap. Oh, and while the UAT pair are 5540s, the PROD pair are 5550s... and they still use the same MAC-generating algorithm! Go figure.
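(If anyone else trips over this, the workaround is easy enough to do yourself: hard-code the MACs on the shared interfaces of one pair so the two pairs can't collide. A rough sketch with made-up addresses... do it in each affected context, and don't forget the standby MAC:

! inside the context, on the interface that shares a segment with the other pair
interface management-con1
 mac-address 0221.0000.0001 standby 0221.0000.0002

I'm also told newer ASA code adds a prefix option to 'mac-address auto' so each chassis seeds its own MAC range... I haven't tried it myself, so check the release notes for your version before relying on it.)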