
Service Provider issue with burned-in MAC

Monkerz (Member, Posts: 842)
I thought I would bounce this off you guys/gals to see what you thought.

So, if you take a gander at the diagram, you will see a snippet containing three sites. At each site there is a 6509E chassis with two connections to the service provider's OCCAM devices. There are 100Mbps point-to-point connections from A-to-B, B-to-C, and C-to-A.

The other day, while implementing OSPF in an effort to rid the enterprise of RIP (yes, I said RIP, and yes, I laughed when I was told RIP was used), we discovered a problem with the circuit from Site A to Site B. Before OSPF, the link was weighted as a backup-only link, so the problem never reared its head because the link was never used.
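One plausible reason the circuit suddenly carried traffic is that OSPF derives interface cost from bandwidth, so with equal-speed links the direct A-to-B hop beats the two-hop path through C. The sketch below assumes the Cisco IOS default reference bandwidth of 100 Mbps and no manually set interface costs (neither is stated in the post), and runs a plain Dijkstra over the three 100Mbps circuits:

```python
import heapq

# Assumption: IOS default OSPF reference bandwidth (100 Mbps) and no manual
# "ip ospf cost" overrides, so cost = reference_bw // interface_bw (minimum 1).
REFERENCE_BW_MBPS = 100

def ospf_cost(interface_bw_mbps: int) -> int:
    return max(1, REFERENCE_BW_MBPS // interface_bw_mbps)

# The three 100Mbps point-to-point circuits from the diagram.
LINKS = [("A", "B", 100), ("B", "C", 100), ("C", "A", 100)]

def build_graph(links):
    # Undirected adjacency list: site -> [(neighbor, cost), ...]
    graph = {}
    for a, b, bw in links:
        cost = ospf_cost(bw)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def shortest_path(graph, src, dst):
    # Dijkstra: returns (total_cost, path) for the cheapest route, or None.
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, link_cost in graph[node]:
            if neighbor not in seen:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None

graph = build_graph(LINKS)
print(shortest_path(graph, "A", "B"))  # (1, ['A', 'B'])
```

With every circuit at cost 1, the direct path costs 1 and the path through Site C costs 2, so OSPF prefers the A-to-B circuit that RIP had held in reserve.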

The problem was initially diagnosed as packet loss between Site A and Site B; the other two circuits had no problem. The provider tested the circuit and it came back clean. At this point, another analyst and I were engaged to troubleshoot. First we pinged from 10.1.0.2 to 10.1.0.1, which resulted in very intermittent packet loss. We then pinged from 10.1.0.1 to 10.1.0.2, which resulted in 100% packet loss.

We threw a couple of laptops on the link and were able to ping through, both ways, without loss.

I found this very odd, so I decided to normalize the hardware and SPAN the ports touching the provider gear to verify traffic.

While pinging from 10.1.0.2 to 10.1.0.1, the switch arp'd and the correct MAC was returned. ICMP requests were sent and I watched every one of them make it to Site B's 6509. I then noticed Site B's 6509 replying to every one of those requests, but not every reply was making it back through the provider's network to Site A.

While pinging from 10.1.0.1 to 10.1.0.2, the switch arp'd and the correct MAC was returned. ICMP requests were generated and sent toward the provider's gear, but never emerged from the provider's gear at Site A.

After getting the provider back on the phone to verify that they were not MAC filtering on that link (I am currently very ignorant when it comes to provider networks, and I'm working to fix that), we decided to test a hybrid of the two initial tests.

We threw a laptop on Site B's end of the link while leaving Site A connected to the 6509. We were able to ping through with no problem both ways. We then switched it up by putting the laptop on Site A's end of the link and connecting Site B's side to the 6509. From the laptop we got 100% packet loss to the 6509, and from the 6509 we saw very intermittent packet loss to the laptop.
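The four combinations tried so far can be treated as a simple elimination table. This is only a sketch of that reasoning (the device labels are illustrative): keep whatever is present in every failing test and absent from every passing one.

```python
# Each test records what was plugged into each end of the A-to-B circuit and
# whether pings worked cleanly in both directions.
TESTS = [
    ({"A": "6509-A", "B": "6509-B"}, False),  # original setup: loss
    ({"A": "laptop", "B": "laptop"}, True),   # laptops on both ends: clean
    ({"A": "6509-A", "B": "laptop"}, True),   # laptop at Site B: clean
    ({"A": "laptop", "B": "6509-B"}, False),  # laptop at Site A: loss
]

def suspects(tests):
    """Return the (site, device) pairs present in every failing test and in
    no passing test. Assumes at least one failing test exists."""
    failing = [set(ends.items()) for ends, ok in tests if not ok]
    passing = [set(ends.items()) for ends, ok in tests if ok]
    common = set.intersection(*failing)
    for ends in passing:
        common -= ends
    return common

print(suspects(TESTS))  # {('B', '6509-B')}
```

The only survivor is Site B's 6509 on Site B's end of the circuit, which lines up with the soft-MAC result that followed.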

We normalized hardware and configured a soft MAC on Site B's 10.1.0.2 interface and were able to ping through with no problems both ways.
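The post does not say which address was used for the soft MAC. One common convention, assumed here, is to pick a locally administered unicast address: the U/L bit (0x02) of the first octet set and the I/G bit (0x01) cleared, so it cannot collide with any vendor's burned-in address. A minimal sketch for checking or deriving such an address; the helper names and the example MAC are hypothetical:

```python
def parse_mac(mac: str) -> bytes:
    # Accept Cisco dotted (aabb.ccdd.eeff) or colon/hyphen separated notation.
    hex_digits = mac.replace(".", "").replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC: {mac!r}")
    return bytes.fromhex(hex_digits)

def is_locally_administered_unicast(mac: str) -> bool:
    first = parse_mac(mac)[0]
    # U/L bit (0x02) set = locally administered; I/G bit (0x01) clear = unicast.
    return bool(first & 0x02) and not (first & 0x01)

def to_local_unicast(mac: str) -> str:
    # Hypothetical helper: derive a soft MAC from a burned-in one by setting the
    # U/L bit and clearing the I/G bit, returned in Cisco dotted notation.
    raw = bytearray(parse_mac(mac))
    raw[0] = (raw[0] | 0x02) & ~0x01 & 0xFF
    h = raw.hex()
    return f"{h[0:4]}.{h[4:8]}.{h[8:12]}"

print(to_local_unicast("0011.2233.4455"))  # 0211.2233.4455
```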

We asked the provider again to verify they were not blocking our MAC address for some reason, and they said they weren't. We asked them to rebuild the circuit from port to port, but that resulted in the exact same problem.

At this point we are running with the soft MAC configured on the interface with no problems. I am beginning to think that, because the 6509 uses the same MAC for all L3 interfaces, the provider is leaking traffic somewhere within their network: traffic destined for 10.1.0.2 is actually being forwarded to 10.2.0.2, since they share the same MAC address. I guess I could have verified this if I had SPAN'd the 10.2.0.2 interface and sniffed traffic.
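Nobody outside the provider knows how their gear is built, so the following is only a toy model of the hypothesis above. It assumes, without confirmation, that the A-to-B and B-to-C circuits terminate in a shared MAC-learning domain somewhere in the provider network. A transparent bridge learns each source MAC on the port it last arrived on, so when Site B's two L3 interfaces send with the same burned-in MAC, the table entry moves to whichever interface spoke last. All MAC addresses and port names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LearningBridge:
    """Toy transparent bridge: learns source MAC -> port and forwards to the
    learned port, or floods when the destination is unknown."""
    ports: list
    table: dict = field(default_factory=dict)

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> list:
        # Learn (or re-learn) where the source MAC lives. A MAC seen on a new
        # port overwrites the old entry; repeated moves are "MAC flapping".
        self.table[src_mac] = in_port
        out = self.table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        # Destination learned on the ingress port is filtered (not forwarded).
        return [] if out == in_port else [out]

B_MAC = "0011.2233.4455"  # hypothetical MAC shared by all of Site B's L3 interfaces
A_MAC = "00aa.bbcc.dd01"  # hypothetical Site A MAC
C_MAC = "00aa.bbcc.dd03"  # hypothetical Site C MAC

bridge = LearningBridge(ports=["A", "B-10.1.0.2", "B-10.2.0.2", "C"])
bridge.receive("B-10.1.0.2", B_MAC, A_MAC)  # Site B talks to A on the A-B circuit
before = bridge.receive("A", A_MAC, B_MAC)
print(before)  # ['B-10.1.0.2'] (correct interface)
bridge.receive("B-10.2.0.2", B_MAC, C_MAC)  # Site B talks to C; same MAC re-learned
after = bridge.receive("A", A_MAC, B_MAC)
print(after)   # ['B-10.2.0.2'] (wrong interface)
```

In this model, frames from Site A reach 10.1.0.2 only until Site B sends something out its 10.2.0.2 interface, which would look like intermittent loss from Site A, and a per-interface soft MAC removes the collision. A SPAN of the 10.2.0.2 interface would show whether misdirected frames actually arrive there.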

What do ya'll think?

