Solved: EtherChannel group dropping packets to specific IP addresses

fallingtree · 04-07-2014

Hello,

I'm having an issue where an EtherChannel drops packets when I try to connect to a few specific IP addresses. The majority of the traffic goes through fine, but a few specific addresses do not work. When I change back to a single Ethernet cable without channel grouping and just use a trunk, all traffic flows properly.

I am currently connecting 2 ports on a WS-C2960S-48FPS-L to a stack of 3 WS-C3850-48P switches.

Two ports on the 2960 (1/0/49, 1/0/50) are grouped (LACP) and connected to 1/0/47 and 2/0/47 on the stack of three 3850s.

The only commands on the physical ports are "switchport mode trunk" and "channel-group 2 mode active".

The traffic having issues seems to be on the 2960. Note also that this occurs even when there is hardly any traffic on the switch, so it is not being overloaded by any means.

For example, there are a few addresses, say 192.168.1.1, 192.168.1.20, and 192.168.1.30, that I cannot ping or connect to at all, and all of these reside on the stack of three 3850s. Yet if I remove the channel group and just go back to a single trunk line, all is well.

If I try to ping 192.168.1.1 from the 2960 while it is on the EtherChannel, it does not respond. I will also note that this issue does not appear to happen on any of the other EtherChannels from the 3850 stack to other switches, just this particular 2960.

I don't really understand why a few specific IP addresses would be dropped while the majority pass through. To be clear, though, it's not dropping random packets to these IP addresses; no packets seem to make it to these addresses at all.

Any help in this matter would be appreciated.

Other info

(Stack of 3 3850s) - 03.02.02.SE

2960 - 12.2(55)SE3

Akshay Balaganur · 04-08-2014

This is a classic case of one link being bad in an EtherChannel.

When traffic has to traverse an EtherChannel, it first goes through a hashing algorithm that decides which port of the EtherChannel will be used to forward it.
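To make the idea concrete, here is a minimal Python sketch of hash-based member selection. It assumes a simplified src-dst-ip model (XOR the source and destination addresses, then take the result modulo the number of member links). This is an illustration only, not the hash Catalyst hardware actually uses; the member names come from the topology in this thread, and `select_member` is a hypothetical helper.

```python
import ipaddress


def select_member(src_ip: str, dst_ip: str, members: list) -> str:
    """Pick the bundle member a flow is sent on.

    Simplified model (assumption, not Cisco's real hardware hash):
    XOR the source and destination addresses, then take the result
    modulo the number of member links.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return members[(src ^ dst) % len(members)]


if __name__ == "__main__":
    members = ["Gi1/0/49", "Gi1/0/50"]
    for dst in ["192.168.1.1", "192.168.1.20", "192.168.1.30"]:
        # The same source/destination pair always lands on the same link.
        print(dst, "->", select_member("192.168.1.2", dst, members))
```

The key property is determinism: every packet of a given source/destination pair takes the same link, which is why a single bad link can break some conversations completely instead of dropping random packets.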

The hashing algorithm currently being used can be checked with the command:

show etherchannel load-balance

If one of the links/ports is bad and is dropping packets, then only the streams of traffic that hash to that port will be affected.
Once you have identified the bad link, you have to look for issues like Layer 1 problems, interface errors, cabling, etc.
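The effect of one bad member can be simulated in a few lines of Python. This sketch assumes a simplified src-dst-ip hash (XOR the addresses, modulo the member count), which is not the real Catalyst hash; `unreachable` is a hypothetical helper that lists every destination whose flow hashes onto the failing link. Only those destinations fail; the rest are untouched.

```python
import ipaddress

MEMBERS = ("Gi1/0/49", "Gi1/0/50")


def select_member(src_ip: str, dst_ip: str, members=MEMBERS) -> str:
    # Simplified src-dst-ip hash (assumption): XOR, then modulo member count.
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return members[(src ^ dst) % len(members)]


def unreachable(src_ip: str, destinations: list, bad_member: str) -> list:
    """Return the destinations whose flows hash onto the failing member."""
    return [d for d in destinations if select_member(src_ip, d) == bad_member]


if __name__ == "__main__":
    dests = [f"192.168.1.{n}" for n in range(10, 20)]
    # With Gi1/0/50 dropping frames, only part of the destinations fail.
    print(unreachable("192.168.1.2", dests, "Gi1/0/50"))
```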

How to identify which port of etherchannel is bad?

1. Shut one port at a time. This forces the EtherChannel hash algorithm to recalculate the hashes. In your case, since you have only two ports, all the traffic will move to the remaining port.

With this method you will be able to identify the culprit port in at most two steps.

2. The second way is more methodical and quite “cool”! ;-)
In this method, we use the Cisco CLI to identify which port the problematic stream (source IP and destination IP) is hashing to.
The command on the 2960/3750 is:

test etherchannel load-balance interface port-channel number {ip | mac} [source_ip_add |

For example:

                { 1/0/49 ------------ 1/0/47 }
   2960 - Po1                                   Po1 - 3850
                { 1/0/50 ------------ 2/0/47 }

You say you cannot ping 192.168.1.1 from the 2960. For illustration, let's say the interface on the 2960 is 192.168.1.2.

Run the following command on the 2960:

test etherchannel load-balance interface port-channel 1 ip 192.168.1.2 192.168.1.1
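When several addresses are failing, the same test can be repeated for each one. The Python helper below only builds command lines in the exact form shown above, so they can be pasted into the 2960 CLI; it does not talk to the switch. The function name `load_balance_tests` is hypothetical, and the destination list is taken from the addresses in the question.

```python
import ipaddress


def load_balance_tests(port_channel: int, src_ip: str, destinations: list) -> list:
    """Build one 'test etherchannel load-balance' command per destination.

    Addresses are parsed first, so a malformed one raises ValueError here
    instead of producing a bad command line.
    """
    src = ipaddress.ip_address(src_ip)
    commands = []
    for dst in destinations:
        dst_addr = ipaddress.ip_address(dst)
        commands.append(
            f"test etherchannel load-balance interface port-channel "
            f"{port_channel} ip {src} {dst_addr}"
        )
    return commands


if __name__ == "__main__":
    failing = ["192.168.1.1", "192.168.1.20", "192.168.1.30"]
    for line in load_balance_tests(1, "192.168.1.2", failing):
        print(line)
```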

Let me know how it goes.
My bet is that the culprit port is 1/0/50. I say this because when you unbundle the EtherChannel, there will be two redundant links. Spanning tree will put 1/0/49 in the forwarding state, as it is the lower port number, and 1/0/50 should be in the blocking state.
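Method 1 (shutting one port at a time) is a simple elimination loop. The sketch below is hypothetical: `recovers_with_port_shut` stands in for the manual step of shutting a member port and re-pinging the failing addresses, and here it is simulated with a stub that treats Gi1/0/50 as the bad link, matching the guess above.

```python
from typing import Callable, Optional


def find_bad_member(members: list, recovers_with_port_shut: Callable[[str], bool]) -> Optional[str]:
    """Shut one member at a time; if the failing pings recover, that member is the culprit.

    Returns None if shutting no single member fixes the problem.
    """
    for member in members:
        # Manual step: shut 'member', re-test the failing addresses,
        # then bring the port back up before trying the next one.
        if recovers_with_port_shut(member):
            return member
    return None


def stub_test(member: str) -> bool:
    # Stub standing in for the real test: assume Gi1/0/50 is the bad link.
    return member == "Gi1/0/50"


if __name__ == "__main__":
    print(find_bad_member(["Gi1/0/49", "Gi1/0/50"], stub_test))
```

With two members the loop runs at most twice, which is where the "at most two steps" in method 1 comes from.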

More information on EtherChannel load balancing:

http://www.cisco.com/c/en/us/support/docs/lan-switching/etherchannel/12023-4.html#cat2950_3550

fallingtree · 04-09-2014

Thank you for this information. As time allows over the next few days, I will try testing new ports/cables and failover to see what happens.

fallingtree · 04-11-2014

Hello,

I thought I would give a follow-up for anyone who may have similar issues on 3850 stacks.

At first the issue seemed to be EtherChannel-related, but soon after I posted my question more issues began cropping up, followed by some major routing problems.

In the end, we started experiencing random and inconsistent routing issues, and multicast became completely unreliable.

It turns out the IOS XE version we were using has some really nasty bugs in it.

Once I upgraded from 03.02.02SE to 03.03.02, we noticed substantial improvements.