
L2 L1 Loop & STP Failure

nishant.nk

We have a DCN (Data Communication Network) for managing customer end devices (FlexiHybrid microwave). For the past few days we have been facing DCN visibility issues.

Our main concern is network availability.

[Image: Topology_Diag.JPG (network topology diagram)]

All switches use access ports assigned to specific VLANs. MSTP is configured on the switches, and router-on-a-stick is used for inter-VLAN routing.

The end device connected to each switch port is a ring of FlexiHybrid microwave NEs (around 2000 NEs in the network). Each NE has three interfaces, with RSTP configured for its own ring.

[Image: flexi.JPG]

We are seeing frequent STP transitions on the switches every 15-20 seconds, and at the same time visibility of the end devices (FlexiHybrid NEs) is lost.

As we have more than 2000 NEs in the network, the chance of miscabling is very high, so L1 (physical) and L2 loops are common in our network. We want to make our switches more resilient to L1/L2 loops.

On checking the logs, we found some error messages:

Router-Patna#
!
interface FastEthernet0/0.280
  encapsulation dot1Q 400
  ip address 10.203.123.161 255.255.255.224
  no snmp trap link-status
!

switch3750MARWARI#
  082860: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082861: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/21 and port Gi2/0/24
082862: Sep 23 10:17:57: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082863: Sep 23 10:17:58: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/21 and port Gi2/0/24
Sep 23 10:18:12: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:18:53: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:20:40: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:21:17: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
082864: Sep 23 10:23:21: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082865: Sep 23 10:23:27: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
Sep 23 10:35:09: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:36:54: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:40:42: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:43:36: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)

Switch_Ranchi3750#
!
interface GigabitEthernet2/0/6
  description ****3G JamshedpurRing-09(JHJAM-26) connection*****
  switchport access vlan 400
  switchport mode access
  no keepalive
!
GigabitEthernet2/0/6 is up, line protocol is up (connected)
   Hardware is Gigabit Ethernet, address is 04fe.7f82.d106 (bia 04fe.7f82.d106)
   Description: ****3G JamshedpurRing-09(JHJAM-26) connection*****
   MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation ARPA, loopback not set
   Keepalive not set
   Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
   input flow-control is off, output flow-control is unsupported
   ARP type: ARPA, ARP Timeout 04:00:00
   Last input 00:00:23, output 00:00:03, output hang never
   Last clearing of "show interface" counters never
   Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
   Queueing strategy: fifo
   Output queue: 0/40 (size/max)
   5 minute input rate 359000 bits/sec, 130 packets/sec <<<<<<<<<<<<<<<<<<<<<<<<<<
   5 minute output rate 1000 bits/sec, 1 packets/sec
      17085096 packets input, 3677896538 bytes, 0 no buffer
      Received 15412303 broadcasts (0 multicasts)
      0 runts, 0 giants, 0 throttles
      0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
      0 watchdog, 7512423 multicast, 390 pause input
      0 input packets with dribble condition detected
      7388738 packets output, 539793229 bytes, 0 underruns
      0 output errors, 0 collisions, 1 interface resets
      0 babbles, 0 late collision, 0 deferred
      0 lost carrier, 0 no carrier, 0 PAUSE output
      0 output buffer failures, 0 output buffers swapped out

On further analysis, we found that host 001b.d4da.e850 is the MAC address of router interface Fa0/0, which is connected to the Marwari3750 switch.
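The MAC-to-port trace described above can be reproduced with standard show commands. This is only a sketch: the MAC address is taken from the MACFLAP messages in this post, and the exact output format varies by IOS release.

```text
! On each switch: which port(s) currently hold the flapping MAC?
show mac address-table address 001b.d4da.e850
! On the router: confirm that this is the MAC of FastEthernet0/0
show interfaces FastEthernet0/0 | include address is
```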

We found a physical (L1) loop on Switch_Ranchi3750 interface GigabitEthernet2/0/6.

After we removed this loop, the MAC flap messages stopped, but the loss of visibility was not resolved.

We then found that our switch ports were transitioning between STP states, from forwarding to blocking and back to forwarding, every few seconds.

So, we enabled debugging of MSTP role transitions.
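For reference, a sketch of the commands typically used for this kind of MSTP troubleshooting on Catalyst 3750 IOS, presumably similar to what produced the logs that follow. The filtered `show spanning-tree detail` is a common way to see how many topology changes occurred and which port the last one came from; the exact wording of the output differs between releases, and debugging should be used with care on a switch that is already short of memory.

```text
! Show debug output on a VTY session (not needed on the console)
terminal monitor
! MSTP role-transition debugging
debug spanning-tree mstp roles
! Topology change counters and the port the last change came from
show spanning-tree detail | include ieee|occurr|from|is exec
! Disable all debugging when finished
undebug all
```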

Switch_Ranchi3750#
003118: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003119: 1w1d: MST[0]: St1 is now root port
003120: 1w1d: MST[0]: Gi1/0/24 is now designated
003121: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003122: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003123: 1w1d: MST[0]: St1 is now designated
003124: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003125: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003126: 1w1d: MST[0]: Gi1/0/24 is now root port
003127: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24
003128: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003129: 1w1d: MST[0]: St1 is now root port
003130: 1w1d: MST[0]: Gi1/0/24 is now designated
003131: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003132: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003133: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003134: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003135: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003136: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/20
003137: 1w1d: MST[0]: Gi1/0/20 is now root port
003138: 1w1d: MST[0]: St1 is now designated
003139: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/20
003140: 1w1d: MST[1]: Gi1/0/20 is now master port (derived from CIST port)
003141: 1w1d: pm_vp_list_set_stp_state: !vpd || ps == vpd->stpState (vpd=OK)
003142: 1w1d: pm_vp_list_set_stp_state Gi1/0/20(376): ps(forwarding) == vpd->stpState already!
003143: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/17
003144: 1w1d: MST[0]: Gi1/0/17 is now root port
003145: 1w1d: MST[0]: Gi1/0/20 is now designated
003146: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/17
003147: 1w1d: MST[1]: Gi1/0/17 is now master port (derived from CIST port)
003148: 1w1d: MST[1]: Gi1/0/20 is now designated port
003149: 1w1d: pm_vp_list_set_stp_state: !vpd || ps == vpd->stpState (vpd=OK)
003150: 1w1d: pm_vp_list_set_stp_state Gi1/0/17(207): ps(forwarding) == vpd->stpState already!
003151: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003152: 1w1d: MST[0]: Gi1/0/24 is now root port
003153: 1w1d: MST[0]: Gi1/0/17 is now designated
003154: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24
003155: 1w1d: MST[1]: Gi1/0/17 is now designated port
003156: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003157: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24

switch3750MARWARI#
267957: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267958: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267959: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267960: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267961: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267962: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267963: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/14
267964: 21w3d: MST[0]: Gi1/0/14 is now root port
267965: 21w3d: MST[0]: St1 is now designated
267966: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/14
267967: 21w3d: MST[1]: Gi1/0/14 is now master port (derived from CIST port)
267968: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/15
267969: 21w3d: MST[0]: Gi1/0/15 is now root port
267970: 21w3d: MST[0]: Gi1/0/14 is now designated
267971: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/15
267972: 21w3d: MST[1]: Gi1/0/14 is now designated port
267973: 21w3d: MST[1]: Gi1/0/15 is now master port (derived from CIST port)
267974: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/16
267975: 21w3d: MST[0]: Gi1/0/16 is now root port
267976: 21w3d: MST[0]: Gi1/0/15 is now designated
267977: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/16
267978: 21w3d: MST[1]: Gi1/0/15 is now designated port
267979: 21w3d: MST[1]: Gi1/0/16 is now master port (derived from CIST port)

At the same time, we were seeing STP changes on all switches.

All these logs suggest that our switches were receiving BPDUs from the connected FlexiHybrid NEs, which caused the switch STP topology to change frequently. However, the STP configured on the FlexiHybrid NEs is meant only for their own Flexi rings and should play no role in the DCN switch STP, and vice versa.
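One way to confirm that an access port really is receiving BPDUs from the attached NE is to check its per-port BPDU counters. A sketch, using Gi1/0/24 from the Ranchi debug output purely as an example port: a "received" BPDU counter that keeps increasing on an access port indicates a BPDU-speaking device behind it.

```text
! Reset the STP counters on the example port
clear spanning-tree counters interface GigabitEthernet1/0/24
! Per-port STP detail, including the BPDU sent/received counters
show spanning-tree interface GigabitEthernet1/0/24 detail
```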

Given the effect of FlexiHybrid STP packets (BPDUs) on the STP of the DCN switches, we considered enabling additional STP enhancement features to overcome this issue.

1. BPDU Filter:

We could enable BPDU filter on all access ports connecting to FlexiHybrid NEs, which would prevent those interfaces from sending or receiving BPDUs. This looks like a good option because it would completely isolate the FlexiHybrid STP from the DCN switch STP, but we are unsure of the consequences of enabling it on all access ports if an accidental loop forms between a FlexiHybrid ring and a switch. Suggestions on how to implement it are needed.
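A sketch of the two ways BPDU filter can be applied, using a hypothetical access-port range (Gi2/0/1 - 24) that would need to be replaced with the real NE-facing ports. The two forms behave differently. The per-interface form stops BPDUs in both directions unconditionally, so STP on the switch will no longer see or break a loop through a FlexiHybrid ring, and some other mechanism (for example storm control) has to catch it. The global form applies only to PortFast ports and is cancelled on a port as soon as that port receives a BPDU, so with NEs that send BPDUs continuously it would likely end up behaving as if no filter were configured.

```text
! Use ONE of the two options below, not both.
!
! Option 1: per-interface BPDU filter (hypothetical port range).
! The port neither sends nor processes BPDUs; STP cannot detect loops here.
interface range GigabitEthernet2/0/1 - 24
 spanning-tree bpdufilter enable
!
! Option 2: global BPDU filter, effective only on PortFast ports.
! A port that receives a BPDU loses PortFast and filtering and
! returns to normal STP operation.
spanning-tree portfast bpdufilter default
interface range GigabitEthernet2/0/1 - 24
 spanning-tree portfast
```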

2. Root Guard:

We enabled root guard on all access ports, on a per-port basis. Root guard does not allow a port to become the STP root port, so the port always remains STP-designated. If a superior BPDU arrives on such a port, root guard puts the port into the root-inconsistent state instead of accepting the BPDU and electing a new STP root, which is what had been happening in our case. As soon as we applied root guard on all access ports, our DCN became stable and we saw no further frequent STP transitions.

Switch_Ranchi3750(config-if-range)#spanning-tree guard root
Switch_Ranchi3750(config-if-range)#
Oct  7 11:24:46.335 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/1. (Switch_Ranchi3750-2)
Oct  7 11:24:46.352 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/2. (Switch_Ranchi3750-2)
Oct  7 11:24:46.360 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/3. (Switch_Ranchi3750-2)
Oct  7 11:24:46.368 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/4. (Switch_Ranchi3750-2)
Oct  7 11:24:46.377 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/5. (Switch_Ranchi3750-2)
Oct  7 11:24:46.393 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/6. (Switch_Ranchi3750-2)
Oct  7 11:24:46.402 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/7. (Switch_Ranchi3750-2)
Oct  7 11:24:46.410 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/8. (Switch_Ranchi3750-2)
Oct  7 11:24:46.419 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/9. (Switch_Ranchi3750-2)
Oct  7 11:24:46.427 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/10. (Switch_Ranchi3750-2)
Oct  7 11:24:46.435 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/11. (Switch_Ranchi3750-2)
Oct  7 11:24:46.444 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/13. (Switch_Ranchi3750-2)
Oct  7 11:24:46.452 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/14. (Switch_Ranchi3750-2)
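To see which access ports root guard is currently holding back, and whether the CIST root has settled, a sketch of the usual verification commands (standard IOS show commands; the output layout depends on the release):

```text
! Ports currently blocked by root guard (listed as root inconsistent)
show spanning-tree inconsistentports
! CIST (MST instance 0) root bridge and root port; should stay constant
show spanning-tree mst 0
```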

Also, the Bhagalpur3750 and Muz3750 switches became unreachable from the NOC, although traffic was still passing through them and they still responded to ping. We logged in locally through the console port on Bhagalpur3750 and found a memory resource problem, possibly caused by too many STP messages. After a hard reboot, both switches became reachable again.

Switch Bhagalpur3750:

%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
004107: Oct  7 09:26:09.525: %SYS-2-MALLOCFAIL: Memory allocation of 4120 bytes failed from 0x9D2A08, alignment 0
Pool: Processor  Free: 10613572  Cause: Memory fragmentation
Alternate Pool: None  Free: 0  Cause: No Alternate pool
-Process= "Spanning Tree", ipl= 0, pid= 197
-Traceback= D667B8 D66F04 1595DD4 15982F4 1598558 178821C 9D2A0C 86CCAC 14B336C 1490C1C 1490EE8 149109C 14918F8 1491C98 148D80C 14937A8
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
004108: Oct  7 09:26:51.443: %SYS-2-MALLOCFAIL: Memory allocation of 4120 bytes failed from 0x9D2A08, alignment 0
Pool: Processor  Free: 10613592  Cause: Memory fragmentation
Alternate Pool: None  Free: 0  Cause: No Alternate pool
-Process= "Spanning Tree", ipl= 0, pid= 197
-Traceback= D667B8 D66F04 1595DD4 15982F4 1598558 178821C 9D2A0C 86CCAC 14B336C 1490C1C 1490EE8 149109C 14918F8 1491C98 148D80C 14937A8
%% Low on memory; try again later
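The MALLOCFAIL messages above report roughly 10 MB of processor memory free while a 4120-byte allocation by the Spanning Tree process still fails, which is what fragmentation looks like: plenty of free memory in total, but no contiguous block large enough. A sketch of commands that could be captured periodically, before a switch reaches that point, to see whether the Spanning Tree process keeps growing and how CPU load develops (availability of the `sorted` keywords depends on the IOS release):

```text
! Total, used and free processor memory, plus the largest free block
show memory statistics
! Processes ordered by memory held; watch the "Spanning Tree" process
show processes memory sorted
! CPU usage by process, and the CPU history graphs
show processes cpu sorted
show processes cpu history
```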

Our observation is that when there is an L1 loop on an access port, heavy broadcast and multicast storms are detected; after some time the switch STP fails and the switch starts re-electing its root port. During this period, visibility is lost.

Is it possible to disable the port as soon as L1 loop traffic is observed? We have keepalive enabled, but keepalive-based loop detection takes time (10 seconds), and our switch becomes unstable before that. Switch utilization rises to 70-80 percent.
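One option along these lines, sketched under assumptions: the 5 percent broadcast threshold and the 300-second recovery interval are illustrative values only, and Gi2/0/6 is used as the example port. Changing the storm-control action from the default (filter) to shutdown err-disables the port when the threshold is exceeded, instead of only dropping the excess traffic as in the STORM_CONTROL-3-FILTERED messages above. A shorter keepalive period is intended to detect a self-loop sooner than the 10-second default, assuming the IOS release accepts a 1-second period; note that the Gi2/0/6 configuration captured earlier shows `no keepalive`, which disables that detection on that port.

```text
! Example access port facing a FlexiHybrid ring (values are illustrative)
interface GigabitEthernet2/0/6
 storm-control broadcast level 5.00
 ! Err-disable the port on a storm instead of only filtering (default)
 storm-control action shutdown
 ! Shorter keepalive period for faster loopback detection (default is 10 s)
 keepalive 1
!
! Automatically re-enable ports err-disabled for these causes
errdisable recovery cause storm-control
errdisable recovery cause loopback
errdisable recovery interval 300
```

With automatic recovery, a port whose loop is still present will be shut down again shortly after each recovery, so the interval is a trade-off between how quickly a repaired port comes back and how often a persisting storm is briefly let back in.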

What could be causing these events in the network, and what is the best solution to overcome these issues?

I am attaching all the logs and tech-support outputs collected during our troubleshooting.

Thanks,

Nishant


nishant.nk

Also, root guard blocks the port and keeps it in the root-inconsistent state until superior BPDUs stop arriving on it. What if superior BPDUs keep arriving on the port? Then it will stay blocked. Is it possible, in such situations, to keep traffic flowing through the port while it is in the inconsistent state?
