802.3ad fault tolerance?

W Stewart Member Posts: 794
Just thought I'd check with the networking gurus out there. 802.3ad doesn't seem to actually fail over if the links are running at different speeds. I noticed I could ping out from the server but couldn't ping the server itself from another system, though this could just be the result of the way our networking guys have things set up. I can't find any documentation specifically saying that it's not fault tolerant, but I did notice that the kernel documentation for channel bonding describes certain link aggregation modes as fault tolerant and doesn't say so for 802.3ad (mode 4). Is this mode not meant to be fault tolerant?

Comments

  • it_consultant Member Posts: 1,903
    You shouldn't be bonding links of different speeds; in fact, I thought that was not allowed. If the links are the same speed, I have come to expect LACP to be completely fault tolerant. I have tested it in the real world and in a lab, and nary a ping is lost when a link is failed out.
  • rowelld Member Posts: 176
    it_consultant is correct. You shouldn't bond links of different speeds. Bonded links should have the same characteristics.
    Visit my blog: http://www.packet6.com - I'm on the CWNE journey!
  • W Stewart Member Posts: 794
    The links weren't intentionally at different speeds. One of the ports on the NIC seems to be going bad and only operates at 100 Mbps. I'd expect it to cause the channel bonding to fail, but I'd hope that it would at least fail over to one of the links rather than just fail altogether.
  • it_consultant Member Posts: 1,903
    It sounds like the switch actually failed out one of the links but the NIC itself is still trying to talk on the failed-out link, or the hash is messed up. Either way, the problem is usually on the NIC side. I have found Broadcom NICs to be some of the worst offenders.

    Server 2012 WILL let you bond unlike port speeds together if and only if you are not using LACP or static bonding (in other words, the switch is unaware of the setup) and you select one of the links to be a failover link as opposed to an active link. I have this setup in my DR site, where we have 10Gb links but not enough of them to be totally redundant. I have tested this and the failover is instantaneous: not quite as seamless as an LACP bond, but good enough that packets don't drop.
  • phoeneous Member Posts: 2,333
    W Stewart wrote: »
    The links weren't intentionally at different speeds. One of the ports on the NIC seems to be going bad and only operates at 100 Mbps. I'd expect it to cause the channel bonding to fail, but I'd hope that it would at least fail over to one of the links rather than just fail altogether.

    What kind of device?
  • W Stewart Member Posts: 794
    PowerEdge server. Looking at the documentation in the Linux kernel, I'd have to guess that a change in the speed or duplex of one interface prevents 802.3ad from working at all. It seems like it may only provide fault tolerance if one interface goes offline.
  • EdTheLad Member Posts: 2,111
    Is auto-negotiation enabled on both sides for all links? Whether the bundle fails will depend on how it's configured (min-links etc.). I'm not sure offhand what mechanism LACP uses to consider a link down: is it purely a layer 1 failure, or does it look at speed, duplex etc.? If I understand you correctly, the bundle shows no issue, i.e. all links look good to the bundle? Your description indicates that one link is not forwarding traffic while the other link or links are. Which link the traffic takes depends on the hashing algorithm; it just so happens that you hit the good link when pinging out and the bad link when pinging in. You should be able to look at interface statistics to determine whether the individual link is dropping traffic or not being used.
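    To make the hashing point concrete, here is a minimal Python sketch, not the actual kernel or switch code. The server side uses a simplified form of the bonding driver's layer2 policy (source MAC XOR destination MAC, modulo the number of slaves). The switch-side function, the MAC addresses and the two-link setup are all made up for illustration; real switches use their own, vendor-specific hash.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def server_tx_link(src_mac: str, dst_mac: str, link_count: int) -> int:
    """Simplified layer2 policy: (src MAC XOR dst MAC) modulo link count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % link_count


def switch_tx_link(dst_mac: str, link_count: int) -> int:
    """Hypothetical switch policy that hashes on the destination MAC only."""
    return mac_to_int(dst_mac) % link_count


SERVER = "00:11:22:33:44:55"  # made-up address of the bonded server
PEER = "00:11:22:33:44:67"    # made-up address of the machine being pinged

# Each end picks the link for the traffic it sends, independently of the other.
outbound = server_tx_link(SERVER, PEER, 2)  # echo request leaving the server
inbound = switch_tx_link(SERVER, 2)         # frames the switch sends to the server

print(f"server -> peer uses link {outbound}")
print(f"switch -> server uses link {inbound}")
```

    With these made-up addresses the two directions land on different links, so if only one of the two links is healthy, traffic can work in one direction and fail in the other.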
    Networking, sometimes i love it, mostly i hate it.Its all about the $$$$
  • W Stewart Member Posts: 794
    Not too sure about the switch since I don't have access to it, but the server is auto-negotiating. The bond interface and the bad link both showed as down. The good link showed as up. The IP address is on the bonded interface, but I was still able to ping out, so it's possible that it was just using the good link when pinging out while replies were coming in on the bad link. It just seems strange that the bond didn't fail over, but the kernel documentation does make it seem like matching speed and duplex is a prerequisite for the bond, even though it doesn't directly say so:

    IEEE 802.3ad Dynamic link aggregation. Creates
    aggregation groups that share the same speed and
    duplex settings. Utilizes all slaves in the active
    aggregator according to the 802.3ad specification.

    Slave selection for outgoing traffic is done according
    to the transmit hash policy, which may be changed from
    the default simple XOR policy via the xmit_hash_policy
    option, documented below. Note that not all transmit
    policies may be 802.3ad compliant, particularly in
    regards to the packet mis-ordering requirements of
    section 43.2.4 of the 802.3ad standard. Differing
    peer implementations will have varying tolerances for
    noncompliance.

    Prerequisites:

    1. Ethtool support in the base drivers for retrieving
    the speed and duplex of each slave.

    2. A switch that supports IEEE 802.3ad Dynamic link
    aggregation.

    Most switches will require some type of configuration
    to enable 802.3ad mode.
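    As a rough illustration of the "aggregation groups that share the same speed and duplex settings" wording above, here is a minimal Python sketch. It is not the kernel's implementation: the interface names and speeds are made up, and choosing the active group by total bandwidth is a simplifying assumption.

```python
from collections import defaultdict


def group_by_speed_duplex(slaves):
    """Group interface names by their (speed in Mbps, duplex) pair."""
    groups = defaultdict(list)
    for name, settings in slaves.items():
        groups[settings].append(name)
    return dict(groups)


def pick_active(groups):
    """Assumption: the group with the most total bandwidth carries traffic."""
    return max(groups, key=lambda key: key[0] * len(groups[key]))


# Made-up example: one port negotiated 1 Gbps, the other fell back to 100 Mbps.
slaves = {"eth0": (1000, "full"), "eth1": (100, "full")}

groups = group_by_speed_duplex(slaves)
active = pick_active(groups)

for settings, members in groups.items():
    state = "active" if settings == active else "inactive"
    print(settings, members, state)
```

    In this sketch the 100 Mbps port ends up in a group of its own, so it does not share load with the 1 Gbps port.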
  • it_consultant Member Posts: 1,903
    I am not sure what brand of switch you are using but there should be a show trunk or show etherchannel command which will show errors on the bond. If the NIC is not operating properly you will probably see the LACP protocol error counter abnormally high. You would also see hash mismatches etc.
  • phoeneous Member Posts: 2,333
    it_consultant wrote: »
    I am not sure what brand of switch you are using but there should be a show trunk or show etherchannel command which will show errors on the bond. If the NIC is not operating properly you will probably see the LACP protocol error counter abnormally high. You would also see hash mismatches etc.

    With my Dell PE Windows boxes, it is all controlled through BACS. Once you get the switch configured correctly, I'm sure there is a Linux version of BACS that you can set up, provided that it is a Broadcom NIC of course.
  • W Stewart Member Posts: 794
    No access to the switch. I'm the server guy. Either way, it was fixed by using another port on the server. I was just curious whether LACP was supposed to fail over or not, but our network engineer said that it should, so I'm guessing it didn't because having two NICs at different speeds doesn't meet the requirements for using mode 4 with the bonding module in the Linux kernel.
  • deth1k Member Posts: 312
    I've had similar issues on a Linux server using ifenslave for bonding. One of the links was negotiating at a lower speed, so the switch would take that link out of the etherchannel. I had to manually set the speed using ethtool / mii-tool.
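    A hedged sketch for spotting this kind of mismatch from the server side, assuming a Linux host that exposes the negotiated values in /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex. The function names and the interface names eth0 and eth1 are hypothetical.

```python
from pathlib import Path


def read_link(iface, sysfs_root="/sys/class/net"):
    """Return (speed in Mbps, duplex) for iface, or None if it can't be read."""
    base = Path(sysfs_root) / iface
    try:
        speed = int((base / "speed").read_text().strip())
        duplex = (base / "duplex").read_text().strip()
    except (OSError, ValueError):
        # The read can fail, for example when the link is down.
        return None
    return speed, duplex


def mismatched(ifaces, sysfs_root="/sys/class/net"):
    """Return per-interface settings if they are not all identical, else {}."""
    settings = {iface: read_link(iface, sysfs_root) for iface in ifaces}
    return settings if len(set(settings.values())) > 1 else {}


if __name__ == "__main__":
    # Hypothetical slave names; substitute the members of your own bond.
    result = mismatched(["eth0", "eth1"])
    if result:
        print("speed/duplex mismatch:", result)
    else:
        print("all listed interfaces report the same speed and duplex")
```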
  • W Stewart Member Posts: 794
    Yeah sounds like the same issue. I tried manually setting the speed but the NIC wouldn't let me. I think there was a hardware issue preventing the NIC from operating at 1Gbps like it was supposed to.