Port-channel and STP problem

nibauramos · ‎03-15-2012

Hello, I'm having a problem with some of my switches. I'll give an example with only two switches, but this is also happening on some others:

I have switchA:

#show version

Cisco IOS Software, C3750 Software (C3750-IPBASE-M), Version 12.2(35)SE5, RELEASE SOFTWARE (fc1)

Compiled Thu 19-Jul-07 19:15 by nachen

Image text-base: 0x00003000, data-base: 0x01080000

and SwitchB:

#show version

Cisco IOS Software, C3750 Software (C3750-IPBASE-M), Version 12.2(35)SE5, RELEASE SOFTWARE (fc1)

Compiled Thu 19-Jul-07 19:15 by nachen

Image text-base: 0x00003000, data-base: 0x01080000

These two switches are connected with two Gigabit Ethernet links, like this:

SwitchA:Gi2/0/21<->SwitchB:Gi1/0/23 and SwitchA:Gi2/0/22<->SwitchB:Gi1/0/24

The configuration of the ports on SwitchA:

SwitchA#show running-config interface po3

interface Port-channel3

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

end

SwitchA#show running-config int gi 2/0/21

interface GigabitEthernet2/0/21

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

channel-group 3 mode desirable

end

SwitchA#show running-config int gi 2/0/22

interface GigabitEthernet2/0/22

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

channel-group 3 mode desirable

end

Configuration of ports in SwitchB:

SwitchB#show running-config interface po3

interface Port-channel3

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

end

SwitchB#show running-config interface gi 1/0/23

interface GigabitEthernet1/0/23

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

channel-group 3 mode desirable

end

SwitchB#show running-config interface gi 1/0/24

interface GigabitEthernet1/0/24

description uplink

switchport trunk encapsulation dot1q

switchport mode trunk

channel-group 3 mode desirable

end

I believe this is a very "simple" configuration. Is everything correct with my port-channel configuration, or am I doing something wrong?

The problem is that STP sometimes kicks in, says it has detected a loop on one of the ports of the port-channel, and disables that port. I thought STP would work at the channel level and not port-level in this case...

Another thing that puzzles me: SwitchA has several VLANs and links to other switches, but SwitchB currently has nothing but the two links to SwitchA. No other cables are physically connected to that switch, so how is it possible to detect a loop?

Here is an example of an STP blocking message:

#show logging

1w1d: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan900, changed state to up

1w1d: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet1/0/23.

1w1d: %PM-4-ERR_DISABLE: loopback error detected on Gi1/0/23, putting Gi1/0/23 in err-disable state

1w1d: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/23, changed state to down

1w1d: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/23, changed state to down

Why is this happening??

Thank you for your help!

Just for reference, I ran these commands:

SwitchA#show run | i spanning

spanning-tree mode rapid-pvst

spanning-tree loopguard default

spanning-tree extend system-id

SwitchB#show run | i spanning

spanning-tree mode pvst

spanning-tree extend system-id

Pekka Majuri · ‎03-15-2012

Could you please check that both of your EtherChannels are OK (and running LACP), where the spanning-tree root for VLAN 900 is (behind B?), and also look at the spanning-tree detail of VLAN 900 on both switches.

Right now it looks like A is missing some of the STP hellos coming through B and opens both links, which are not fully configured as a LAG. When B then receives back a BPDU it has sent to A, the only solution is to take down the link as a redundant path (i.e. your channel is not what you assume it is and should be).

Why are both sides in mode desirable (why not active)?
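The checks requested above can be run from either switch. A minimal sketch, assuming channel group 3 and VLAN 900 as used in this thread; the comments describe what to look for, not guaranteed output:

```
! Member ports that are bundled carry the P flag next to the port name
SwitchB#show etherchannel summary
! Root bridge ID, cost and root port for every VLAN
SwitchB#show spanning-tree root
! Per-port roles, BPDU counters and topology-change history for VLAN 900
SwitchB#show spanning-tree vlan 900 detail
```

Running the same three commands on SwitchA makes it easy to compare which side considers itself root and whether both see Po3 (rather than the individual Gi ports) as the spanning-tree port.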

nibauramos · ‎03-15-2012

Hello,

I placed both ports in desirable, but I can change them to active. According to Cisco (http://www.cisco.com/en/US/docs/routers/7600/ios/12.1E/command/reference/c1.pdf), placing them in desirable will enable PAgP. Is my problem related to PAgP? I have several other port-channels configured with PAgP that work without a problem.

thank you

Carlos Ramos

Jan Hrnko · ‎03-15-2012

Hi Carlos,

I'm here with some updates on your problem. The error message you are observing is definitely not an STP problem. It means that the port has received its own LOOP frame.

My friend suggests that you try upgrading IOS to a newer version, for example 12.2(55), if the cabling is 100% correct and the other device is not in ON mode. Maybe it is a bug in the old IOS.

Best regards,

Jan

Jan Hrnko · ‎03-15-2012

Hi Pekka,

I think it is very reasonable to have both sides in desirable and not in ON mode. If you can, you should always negotiate rather than force them to form an EtherChannel. Why do you think it would be better if they were in ON mode?

Best regards,

Jan

Jan Hrnko · ‎03-15-2012

Hi Carlos,

Are you absolutely sure that the cause of the problem is STP?

1w1d: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet1/0/23.

I think that this error is not related to STP.

I thought STP would work at the channel level and not port-level in this case...

Yes you are correct. It works at the channel level, not port level.

Best regards,

Jan

nibauramos · ‎03-15-2012

Hello, I believe all the cabling is correct. I've changed the cables and tested everything; this is such a simple scenario that I find it difficult to have it wrong.

I'm going to do what you suggest and upgrade the IOS.

thank you

Carlos Ramos

Jan Hrnko · ‎03-15-2012

Hi Carlos,

Let me know if it worked. Yes, you are absolutely correct: a simple scenario, no obvious misconfigurations. I suppose you have tried several times to shutdown and then no shutdown the port because of the errdisable status. If it just keeps falling back into errdisable, the problem must be somewhere else.
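For reference, the shutdown / no shutdown recovery mentioned above could look like the following. This is a sketch only, assuming the affected port is Gi1/0/24 on SwitchB; the 300-second interval is an illustrative value (it is also the IOS default), not something taken from this thread:

```
! List the ports currently in err-disable and the reason
SwitchB#show interfaces status err-disabled
! Manual recovery: bounce the port
SwitchB#configure terminal
SwitchB(config)#interface GigabitEthernet1/0/24
SwitchB(config-if)#shutdown
SwitchB(config-if)#no shutdown
SwitchB(config-if)#exit
! Optional: let the switch retry on its own for the loopback cause
SwitchB(config)#errdisable recovery cause loopback
SwitchB(config)#errdisable recovery interval 300
SwitchB(config)#end
! Shows which causes have timed recovery enabled
SwitchB#show errdisable recovery
```

Automatic recovery only re-enables the port; if whatever reflects the keepalive is still present, the port can be expected to err-disable again.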

Best regards,

Jan

Pekka Majuri · ‎03-15-2012

Could you please provide sh etherchannel 3 port-channel / sh etherchannel 3 detail?

My personal opinion is that it would be better to use desirable on only one end of the LAG (PAgP) channel and AUTO on the other end (which will also negotiate once the other side starts the negotiation).

On the other hand, we have plenty of LAGs in our network (between larger and smaller Cisco switches and Nexuses), and because of intermittent PAgP problems we have actively changed them to run LACP, which also works nicely with other vendors' devices (servers, blade switches, etc.).
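Moving an existing PAgP bundle to LACP, as described above, might be sketched like this. It assumes the port numbers from this thread and a maintenance window, since the channel drops while it is rebuilt. The ports are removed from the group first because IOS normally rejects mixing PAgP and LACP members in one channel group:

```
! On SwitchB; repeat on SwitchA for GigabitEthernet2/0/21 - 22
SwitchB#configure terminal
SwitchB(config)#interface range GigabitEthernet1/0/23 - 24
SwitchB(config-if-range)#no channel-group
SwitchB(config-if-range)#channel-group 3 mode active
SwitchB(config-if-range)#end
! The Protocol column should now read LACP
SwitchB#show etherchannel 3 summary
```

The PAgP alternative mentioned in the same post (desirable on one end, auto on the other) would instead use channel-group 3 mode auto on one of the two switches.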

nibauramos · ‎03-15-2012

Hello, I've just updated the IOS in switchB:

Cisco IOS Software, C3750 Software (C3750-IPSERVICESK9-M), Version 12.2(55)SE4, RELEASE SOFTWARE (fc1)

Technical Support: http://www.cisco.com/techsupport

Compiled Tue 06-Sep-11 02:59 by prod_rel_team

Image text-base: 0x01000000, data-base: 0x02F00000

I still haven't updated switchA because it has several customers connected to it; I have a time window in about an hour that will allow me to do it. However, right after updating this one I did the following: I disconnected one of the links on switchB, leaving only port 24 connected, so the port-channel had only one GigabitEthernet link active. I rebooted the switch to load the new IOS image, and the following happened as soon as the port came up:

00:02:30: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to up

00:02:30: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan56, changed state to up

00:02:30: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan900, changed state to up

00:02:34: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet1/0/24.

00:02:34: %PM-4-ERR_DISABLE: loopback error detected on Gi1/0/24, putting Gi1/0/24 in err-disable state

00:02:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down

00:02:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan56, changed state to down

00:02:34: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan900, changed state to down

00:02:35: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/24, changed state to down

00:02:35: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel3, changed state to down

Then I connected port 23, left 24 in err-disable, and the same happened to 23:

00:04:26: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/23, changed state to up

00:04:30: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/23, changed state to up

00:04:31: %LINK-3-UPDOWN: Interface Port-channel3, changed state to up

00:04:32: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel3, changed state to up

00:04:59: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to up

00:05:00: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan56, changed state to up

00:05:00: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan900, changed state to up

00:05:04: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet1/0/23.

00:05:04: %PM-4-ERR_DISABLE: loopback error detected on Gi1/0/23, putting Gi1/0/23 in err-disable state

00:05:04: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down

00:05:04: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan56, changed state to down

00:05:04: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan900, changed state to down

00:05:05: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/23, changed state to down

00:05:05: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel3, changed state to down

00:05:06: %LINK-3-UPDOWN: Interface Port-channel3, changed state to down

00:05:06: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/23, changed state to down

I'm starting to think there is actually some problem (a loop) in my VLANs, but why is it only detected when I connect this switch? As I said before, this switch has nothing connected except the uplink to switchA.

After this I did a shut/no shut on Gi1/0/23 and port 23 came up, followed by the port-channel. It is still working so far (10 minutes).

Pekka Majuri, here are the commands you requested, executed right now with port 24 in err-disable and port 23 working:

#show etherchannel 3 port-channel

Port-channels in the group:

---------------------------

Port-channel: Po3

------------

Age of the Port-channel = 0d:00h:19m:52s

Logical slot/port = 10/3 Number of ports = 1

GC = 0x00030001 HotStandBy port = null

Port state = Port-channel Ag-Inuse

Protocol = PAgP

Port security = Disabled

Ports in the Port-channel:

Index   Load   Port     EC state        No of bits
------+------+------+------------------+-----------
  0     00     Gi1/0/23 Desirable-Sl       0

Time since last port bundled: 0d:00h:12m:47s Gi1/0/23

Time since last port Un-bundled: 0d:00h:16m:37s Gi1/0/23

#show etherchannel 3 detail

Group state = L2

Ports: 2 Maxports = 8

Port-channels: 1 Max Port-channels = 1

Protocol: PAgP

Minimum Links: 0

Ports in the group:

-------------------

Port: Gi1/0/23

------------

Port state = Up Mstr In-Bndl

Channel group = 3 Mode = Desirable-Sl Gcchange = 0

Port-channel = Po3 GC = 0x00030001 Pseudo port-channel = Po3

Port index = 0 Load = 0x00 Protocol = PAgP

Flags: S - Device is sending Slow hello. C - Device is in Consistent state.

A - Device is in Auto mode. P - Device learns on physical port.

d - PAgP is down.

Timers: H - Hello timer is running. Q - Quit timer is running.

S - Switching timer is running. I - Interface timer is running.

Local information:

                                Hello    Partner  PAgP     Learning  Group
Port      Flags State   Timers  Interval Count   Priority   Method  Ifindex
Gi1/0/23  SC    U6/S7   H       30s      1        128        Any      5003

Partner's information:

          Partner              Partner          Partner         Partner Group
Port      Name                 Device ID        Port       Age  Flags   Cap.
Gi1/0/23  SW00A                0024.5137.5b80   Gi2/0/21    21s SC      30001

Age of the port in the current state: 0d:00h:12m:56s

Port: Gi1/0/24

------------

Port state = Down Not-in-Bndl

Channel group = 3 Mode = Desirable-Sl Gcchange = 0

Port-channel = null GC = 0x00000000 Pseudo port-channel = Po3

Port index = 0 Load = 0x00 Protocol = PAgP

Flags: S - Device is sending Slow hello. C - Device is in Consistent state.

A - Device is in Auto mode. P - Device learns on physical port.

d - PAgP is down.

Timers: H - Hello timer is running. Q - Quit timer is running.

S - Switching timer is running. I - Interface timer is running.

Local information:

                                Hello    Partner  PAgP     Learning  Group
Port      Flags State   Timers  Interval Count   Priority   Method  Ifindex
Gi1/0/24  d     U1/S1           1s       0        128        Any      0

Age of the port in the current state: 0d:00h:19m:19s

Port-channels in the group:

---------------------------

Port-channel: Po3

------------

Age of the Port-channel = 0d:00h:20m:04s

Logical slot/port = 10/3 Number of ports = 1

GC = 0x00030001 HotStandBy port = null

Port state = Port-channel Ag-Inuse

Protocol = PAgP

Port security = Disabled

Ports in the Port-channel:

Index   Load   Port     EC state        No of bits
------+------+------+------------------+-----------
  0     00     Gi1/0/23 Desirable-Sl       0

Time since last port bundled: 0d:00h:12m:59s Gi1/0/23

Time since last port Un-bundled: 0d:00h:16m:50s Gi1/0/23

jimmysands73_2 · ‎03-15-2012

http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a00806cd87b.shtml

Loopback error

A loopback error occurs when the keepalive packet is looped back to the port that sent the keepalive. The switch sends keepalives out all the interfaces by default. A device can loop the packets back to the source interface, which usually occurs because there is a logical loop in the network that the spanning tree has not blocked. The source interface receives the keepalive packet that it sent out, and the switch disables the interface (errdisable). This message occurs because the keepalive packet is looped back to the port that sent the keepalive:

%PM-4-ERR_DISABLE: loopback error detected on Gi4/1, putting Gi4/1 in err-disable state

Keepalives are sent on all interfaces by default in Cisco IOS Software Release 12.1EA-based software. In Cisco IOS Software Release 12.2SE-based software and later, keepalives are not sent by default on fiber and uplink interfaces. For more information, refer to Cisco bug ID CSCea46385.

gazillion_dwolfe · ‎03-15-2012

Doesn't anyone think that having one switch in RPVST mode with loopguard on and the other switch in PVST mode with no loopguard might contribute to this issue?
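If the goal is to rule out the mismatch raised above, one option is to bring SwitchB in line with SwitchA. A minimal sketch, assuming SwitchA's settings (rapid-pvst with loop guard enabled by default) are the ones to keep; the opposite direction, setting both to pvst, is equally possible:

```
SwitchB#configure terminal
SwitchB(config)#spanning-tree mode rapid-pvst
SwitchB(config)#spanning-tree loopguard default
SwitchB(config)#end
! Shows the running STP mode and whether loop guard is enabled by default
SwitchB#show spanning-tree summary
```

Rapid PVST+ is designed to interoperate with PVST+ by falling back to 802.1D behaviour on the affected ports, so the mismatch on its own would not obviously explain a loopback error, but aligning the two removes one variable.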

Diego Acuna · ‎03-15-2012

Run this command on the interfaces:

(config-if)#no keepalive
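Applied to this thread, the command could be entered on both member ports at once. This is a sketch, assuming the ports are Gi1/0/23 and Gi1/0/24 on SwitchB as shown in the logs earlier:

```
SwitchB#configure terminal
SwitchB(config)#interface range GigabitEthernet1/0/23 - 24
SwitchB(config-if-range)#no keepalive
SwitchB(config-if-range)#end
! The Keepalive line of the interface output shows whether keepalives are still sent
SwitchB#show interfaces GigabitEthernet1/0/23 | include Keepalive
```

Note that this turns off the mechanism that raises the loopback error; it does not remove whatever is sending the frame back to the port.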


nibauramos · ‎03-16-2012

Hello,

I updated the IOS on the second switch, but everything remained the same.

After that I disabled the keepalives and it seems to be working. I also changed RPVST to PVST, since I don't think it made sense in my case to have one switch in rapid mode and the other in normal mode.

However, I don't understand why either of those changes solved the problem. Should I be disabling the keepalives? Isn't this only hiding some problem?

thank you for all the help so far!

Carlos Ramos

Pekka Majuri · ‎03-16-2012

Carlos,

I think you should monitor the interface bit rates on your network switches to check that you do not have a data loop reaching the edge switches (especially if you have a hub in your network, or Windows servers that do some kind of bridging with multiple NICs for load balancing). A loop can also occur with old blade chassis, where you might have switches running either IEEE 802.1D spanning tree or 802.1s (MST). Data loops occur easily when there are many redundant paths with older (gen 2 and gen 3) blade switches in them, running circuits over the backplane L2 switch interconnections (which may by default compute their spanning-tree topology only for VLAN 1).

Without knowing your infrastructure, it is difficult to say which devices affect your STP domain and which do not take part in it (even though they pass data flows through it). Hence the advice to monitor your interface loads: if you have a data loop, you may see a very constant load. For example, gigabit switches such as the 3560 and 3750 can easily carry a data loop of up to ~500-600 Mbit/s on gigabit ports, but quite commonly data loops use around ~100 Mbit/s of bandwidth (because there is a hub somewhere). But if you are running HSRP routers as L3 gateways for server segments, a 100 Mbit/s data loop will melt them down much earlier.
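The interface load monitoring suggested above can be started from the CLI before any external tooling is involved. A minimal sketch, assuming the SwitchA uplink port Gi2/0/21 from this thread; the 30-second load interval is an illustrative choice:

```
! Shorten the rate averaging window from the default 5 minutes to 30 seconds
SwitchA#configure terminal
SwitchA(config)#interface GigabitEthernet2/0/21
SwitchA(config-if)#load-interval 30
SwitchA(config-if)#end
! Input and output rate in bits/sec and packets/sec
SwitchA#show interfaces GigabitEthernet2/0/21 | include rate
! Unicast, multicast and broadcast packet counters per direction
SwitchA#show interfaces GigabitEthernet2/0/21 counters
```

Repeating the last two commands a few seconds apart gives a rough picture: a flat, unusually high rate together with fast-growing broadcast or multicast counters is the kind of constant load described above.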

The other element of STP is to know exactly, if you filter BPDUs, what you are trying to solve with the filters.

I have also seen data loops in environments using old EtherChannel (mode on) when the other end is not running EtherChannel at all (separate ports on the other end). In some cases even PAgP has failed (on fiber links, when UDLD was disabled or not used).

If you are using PVST+ with extend system-id (i.e. bridge priorities are n * 4096 + VLAN number), you should check whether multiple VLANs have the same root priority (the lowest is best, and when a higher-numbered VLAN uses the same priority as a lower one, you might see unexpected root priorities).
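To make the priority arithmetic concrete, here is a small worked example using the VLAN numbers seen in this thread. Making SwitchA the root for VLAN 900 with priority 4096 is purely an assumption for illustration:

```
! With extend system-id the advertised priority is the configured value plus the VLAN ID:
!   default    32768 + VLAN 900 = 33668
!   configured  4096 + VLAN 900 =  4996
!   configured  4096 + VLAN 56  =  4152
SwitchA#configure terminal
SwitchA(config)#spanning-tree vlan 900 priority 4096
SwitchA(config)#end
! The Root ID and Bridge ID sections show the priority; the Bridge ID line also shows the sys-id-ext
SwitchA#show spanning-tree vlan 900
```

Because the VLAN ID is added on top, two VLANs configured with the same base priority still advertise different values, which is worth keeping in mind when comparing priorities across VLANs.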