10-14-2011 07:18 AM - edited 03-07-2019 02:48 AM
We have a DCN (Data Communication Network) for managing customer end devices (Flexihybrid microwave). For the past few days we have been facing DCN visibility issues.
Our main concern is network availability.
All switches have access ports assigned to specific VLANs. MSTP is configured on the switches. Router-on-a-stick is configured for inter-VLAN routing.
The end devices connected to each switch port are rings of Flexihybrid microwave NEs (around 2000 NEs in the network). Each NE has three interfaces, with RSTP configured for its own ring.
We are seeing frequent STP transitions on the switches, every 15-20 seconds, and at the same time visibility of the end devices (Flexihybrid NEs) is lost.
As we have more than 2000 NEs in the network, the chance of miscabling is very high. L1 (physical) and L2 loops are therefore common in our network, and we want to make our switches more resilient to them.
On checking the logs, we found these error messages:
Router-Patna#
!
interface FastEthernet0/0.280
encapsulation dot1Q 400
ip address 10.203.123.161 255.255.255.224
no snmp trap link-status
!
switch3750MARWARI#
082860: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082861: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/21 and port Gi2/0/24
082862: Sep 23 10:17:57: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082863: Sep 23 10:17:58: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/21 and port Gi2/0/24
Sep 23 10:18:12: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:18:53: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:20:40: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:21:17: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
082864: Sep 23 10:23:21: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082865: Sep 23 10:23:27: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
Sep 23 10:35:09: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:36:54: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:40:42: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:43:36: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
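A minimal sketch of how log lines like the ones above could be tallied offline to see which host/port pairs are flapping and which ports trip storm control. The sample lines are copied from the log above; the regular expressions and the function name `summarize` are illustrative assumptions, not part of any Cisco tool.

```python
import re
from collections import Counter

# Sample lines copied from the switch3750MARWARI log above.
LOG = """\
082860: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/24 and port Gi2/0/21
082861: Sep 23 10:17:56: %SW_MATM-4-MACFLAP_NOTIF: Host 001b.d4da.e850 in vlan 400 is flapping between port Gi2/0/21 and port Gi2/0/24
Sep 23 10:18:12: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/21. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
Sep 23 10:35:09: %STORM_CONTROL-3-FILTERED: A Broadcast storm detected on Gi2/0/18. A packet filter action has been applied on the interface. (switch3750MARWARI-2)
"""

# Assumed patterns, written to match the message text shown above.
MACFLAP = re.compile(
    r"MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)
STORM = re.compile(
    r"STORM_CONTROL-3-FILTERED: A (?P<kind>\w+) storm detected on (?P<port>[^.\s]+)\."
)


def summarize(log_text):
    """Return (flaps, storms).

    flaps  counts MAC flap messages per (mac, vlan, sorted port pair).
    storms counts storm-control messages per port.
    """
    flaps, storms = Counter(), Counter()
    for line in log_text.splitlines():
        m = MACFLAP.search(line)
        if m:
            # Sort the two ports so A->B and B->A count as the same pair.
            pair = tuple(sorted((m["a"], m["b"])))
            flaps[(m["mac"], int(m["vlan"]), pair)] += 1
            continue
        m = STORM.search(line)
        if m:
            storms[m["port"]] += 1
    return flaps, storms


flaps, storms = summarize(LOG)
for key, count in flaps.items():
    print("flap", key, count)
for port, count in storms.items():
    print("storm", port, count)
```

Grouping by the sorted port pair makes it easier to see that a single host is bouncing between the same two ports, which is the pattern a loop usually produces.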
Switch_Ranchi3750#
!
interface GigabitEthernet2/0/6
description ****3G JamshedpurRing-09(JHJAM-26) connection*****
switchport access vlan 400
switchport mode access
no keepalive
!
GigabitEthernet2/0/6 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 04fe.7f82.d106 (bia 04fe.7f82.d106)
Description: ****3G JamshedpurRing-09(JHJAM-26) connection*****
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:23, output 00:00:03, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 359000 bits/sec, 130 packets/sec <<<<<<<<<<<<<<<<<<<<<<<<<<
5 minute output rate 1000 bits/sec, 1 packets/sec
17085096 packets input, 3677896538 bytes, 0 no buffer
Received 15412303 broadcasts (0 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 7512423 multicast, 390 pause input
0 input packets with dribble condition detected
7388738 packets output, 539793229 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
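As a rough check on the counters above, the share of received packets counted as broadcasts can be computed directly from the output. This is only a sketch: the counter lines are copied from the output above, the function name `broadcast_share` is hypothetical, and the counters cover the whole period since they were last cleared, not just the incident.

```python
import re

# Counter lines copied from the "show interface" output above.
SHOW_INTERFACE = """\
5 minute input rate 359000 bits/sec, 130 packets/sec
17085096 packets input, 3677896538 bytes, 0 no buffer
Received 15412303 broadcasts (0 multicasts)
"""


def broadcast_share(text):
    """Fraction of input packets that the interface counted as broadcasts."""
    packets = int(re.search(r"(\d+) packets input", text).group(1))
    broadcasts = int(re.search(r"Received (\d+) broadcasts", text).group(1))
    return broadcasts / packets if packets else 0.0


share = broadcast_share(SHOW_INTERFACE)
print(f"{share:.1%} of input packets were broadcasts")
```

With the numbers above, roughly 90 percent of everything received on this access port was broadcast traffic, which is consistent with a loop behind the port.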
On further analysis, the flapping MAC address, host 001b.d4da.e850, is the MAC of the router's Fa0/0 interface, which is connected to the Marwari3750 switch.
We found an L1 (physical) loop on Switch_Ranchi3750, interface GigabitEthernet2/0/6.
After that was fixed, the MAC flap messages stopped, but our loss-of-visibility problem was not solved.
We found our switch ports transitioning between STP states, from forwarding to blocking to forwarding, every few seconds.
So we enabled debugging of MSTP role transitions.
Switch_Ranchi3750#
003118: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003119: 1w1d: MST[0]: St1 is now root port
003120: 1w1d: MST[0]: Gi1/0/24 is now designated
003121: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003122: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003123: 1w1d: MST[0]: St1 is now designated
003124: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003125: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003126: 1w1d: MST[0]: Gi1/0/24 is now root port
003127: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24
003128: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003129: 1w1d: MST[0]: St1 is now root port
003130: 1w1d: MST[0]: Gi1/0/24 is now designated
003131: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003132: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003133: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003134: 1w1d: MST[0]: updt roles, received superior bpdu on St1
003135: 1w1d: MST[1]: updt roles, CIST reconcile on St1
003136: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/20
003137: 1w1d: MST[0]: Gi1/0/20 is now root port
003138: 1w1d: MST[0]: St1 is now designated
003139: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/20
003140: 1w1d: MST[1]: Gi1/0/20 is now master port (derived from CIST port)
003141: 1w1d: pm_vp_list_set_stp_state: !vpd || ps == vpd->stpState (vpd=OK)
003142: 1w1d: pm_vp_list_set_stp_state Gi1/0/20(376): ps(forwarding) == vpd->stpState already!
003143: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/17
003144: 1w1d: MST[0]: Gi1/0/17 is now root port
003145: 1w1d: MST[0]: Gi1/0/20 is now designated
003146: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/17
003147: 1w1d: MST[1]: Gi1/0/17 is now master port (derived from CIST port)
003148: 1w1d: MST[1]: Gi1/0/20 is now designated port
003149: 1w1d: pm_vp_list_set_stp_state: !vpd || ps == vpd->stpState (vpd=OK)
003150: 1w1d: pm_vp_list_set_stp_state Gi1/0/17(207): ps(forwarding) == vpd->stpState already!
003151: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003152: 1w1d: MST[0]: Gi1/0/24 is now root port
003153: 1w1d: MST[0]: Gi1/0/17 is now designated
003154: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24
003155: 1w1d: MST[1]: Gi1/0/17 is now designated port
003156: 1w1d: MST[0]: updt roles, received superior bpdu on Gi1/0/24
003157: 1w1d: MST[1]: updt roles, CIST reconcile on Gi1/0/24
switch3750MARWARI#
267957: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267958: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267959: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267960: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267961: 21w3d: MST[0]: updt roles, received superior bpdu on St1
267962: 21w3d: MST[1]: updt roles, CIST reconcile on St1
267963: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/14
267964: 21w3d: MST[0]: Gi1/0/14 is now root port
267965: 21w3d: MST[0]: St1 is now designated
267966: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/14
267967: 21w3d: MST[1]: Gi1/0/14 is now master port (derived from CIST port)
267968: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/15
267969: 21w3d: MST[0]: Gi1/0/15 is now root port
267970: 21w3d: MST[0]: Gi1/0/14 is now designated
267971: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/15
267972: 21w3d: MST[1]: Gi1/0/14 is now designated port
267973: 21w3d: MST[1]: Gi1/0/15 is now master port (derived from CIST port)
267974: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/16
267975: 21w3d: MST[0]: Gi1/0/16 is now root port
267976: 21w3d: MST[0]: Gi1/0/15 is now designated
267977: 21w3d: MST[1]: updt roles, CIST reconcile on Gi1/0/16
267978: 21w3d: MST[1]: Gi1/0/15 is now designated port
267979: 21w3d: MST[1]: Gi1/0/16 is now master port (derived from CIST port)
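To make the churn in the debug output easier to see, the root port changes can be extracted in order. The sketch below uses lines copied from the switch3750MARWARI debug output above; the patterns and function names are assumptions written for this message format only.

```python
import re
from collections import Counter

# Lines copied from the switch3750MARWARI debug output above.
DEBUG = """\
267963: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/14
267964: 21w3d: MST[0]: Gi1/0/14 is now root port
267965: 21w3d: MST[0]: St1 is now designated
267968: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/15
267969: 21w3d: MST[0]: Gi1/0/15 is now root port
267974: 21w3d: MST[0]: updt roles, received superior bpdu on Gi1/0/16
267975: 21w3d: MST[0]: Gi1/0/16 is now root port
"""

ROOT = re.compile(r"MST\[(?P<inst>\d+)\]: (?P<port>\S+) is now root port")
SUPERIOR = re.compile(
    r"MST\[(?P<inst>\d+)\]: updt roles, received superior bpdu on (?P<port>\S+)"
)


def root_port_history(text, instance=0):
    """Ordered list of ports that became root port for one MST instance."""
    return [m["port"] for m in ROOT.finditer(text) if int(m["inst"]) == instance]


def superior_bpdu_sources(text, instance=0):
    """Count of 'received superior bpdu' events per port for one MST instance."""
    return Counter(
        m["port"] for m in SUPERIOR.finditer(text) if int(m["inst"]) == instance
    )


history = root_port_history(DEBUG)
sources = superior_bpdu_sources(DEBUG)
print("root port sequence:", " -> ".join(history))
print("superior BPDUs seen on:", dict(sources))
```

In this excerpt the root port moves across three different access ports in a row, each time after a superior BPDU arrives on that port.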
At the same time, we were seeing STP changes on all switches.
All these logs suggest that our switches were receiving BPDUs from the connected FlexiHybrid NEs, which caused the switch STP topology to change frequently. But the STP configured on the FlexiHybrid NEs is for their own Flexi rings and should play no role in the DCN switch STP, and vice versa.
Given that Flexihybrid BPDUs were affecting the STP of the DCN switches, we considered enabling other STP enhancement features to overcome this issue.
We can use this command on all access ports connecting to Flexihybrid NEs; it prevents the interface from sending or receiving BPDUs. This is a good option and can isolate the Flexihybrid STP from the DCN switch STP completely, but we are unsure of the consequences of enabling it on all access switch ports if an accidental loop forms between the Flexihybrid rings and the switch. Suggestions on its implementation are needed.
We enabled root guard on all access ports, on a per-port basis. Root guard does not allow the port to become an STP root port, so the port is always STP-designated. If a superior BPDU arrives on this port, root guard does not take the BPDU into account, so no new STP root is elected, which is what was happening in our case. As soon as we applied root guard on all access ports, our DCN became stable and we saw no further frequent STP transitions.
Switch_Ranchi3750(config-if-range)#spanning-tree guard root
Switch_Ranchi3750(config-if-range)#
Oct 7 11:24:46.335 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/1. (Switch_Ranchi3750-2)
Oct 7 11:24:46.352 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/2. (Switch_Ranchi3750-2)
Oct 7 11:24:46.360 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/3. (Switch_Ranchi3750-2)
Oct 7 11:24:46.368 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/4. (Switch_Ranchi3750-2)
Oct 7 11:24:46.377 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/5. (Switch_Ranchi3750-2)
Oct 7 11:24:46.393 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/6. (Switch_Ranchi3750-2)
Oct 7 11:24:46.402 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/7. (Switch_Ranchi3750-2)
Oct 7 11:24:46.410 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/8. (Switch_Ranchi3750-2)
Oct 7 11:24:46.419 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/9. (Switch_Ranchi3750-2)
Oct 7 11:24:46.427 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/10. (Switch_Ranchi3750-2)
Oct 7 11:24:46.435 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/11. (Switch_Ranchi3750-2)
Oct 7 11:24:46.444 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/13. (Switch_Ranchi3750-2)
Oct 7 11:24:46.452 BST: %SPANTREE-2-ROOTGUARD_CONFIG_CHANGE: Root guard enabled on port GigabitEthernet2/0/14. (Switch_Ranchi3750-2)
Also, the Bhagalpur3750 and Muz3750 switches became unreachable from the NOC, although traffic was still passing through them and the switches responded to ping. We tried a local login through the console port on Bhagalpur3750 and found a memory resource problem, possibly caused by too many STP messages. After a hard reboot, both switches became reachable again.
Switch Bhagalpur3750:
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
004107: Oct 7 09:26:09.525: %SYS-2-MALLOCFAIL: Memory allocation of 4120 bytes failed from 0x9D2A08, alignment 0
Pool: Processor Free: 10613572 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Spanning Tree", ipl= 0, pid= 197
-Traceback= D667B8 D66F04 1595DD4 15982F4 1598558 178821C 9D2A0C 86CCAC 14B336C 1490C1C 1490EE8 149109C 14918F8 1491C98 148D80C 14937A8
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
%% Low on memory; try again later
004108: Oct 7 09:26:51.443: %SYS-2-MALLOCFAIL: Memory allocation of 4120 bytes failed from 0x9D2A08, alignment 0
Pool: Processor Free: 10613592 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Spanning Tree", ipl= 0, pid= 197
-Traceback= D667B8 D66F04 1595DD4 15982F4 1598558 178821C 9D2A0C 86CCAC 14B336C 1490C1C 1490EE8 149109C 14918F8 1491C98 148D80C 14937A8
%% Low on memory; try again later
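The MALLOCFAIL entries above carry the failing process, the requested size, and the free memory in the pool. A small sketch for pulling those fields out of a saved log follows; the sample is copied from the output above, and the pattern and the name `PATTERN` are assumptions for this message layout.

```python
import re

# One MALLOCFAIL entry copied from the Bhagalpur3750 output above.
LOG = """\
004107: Oct 7 09:26:09.525: %SYS-2-MALLOCFAIL: Memory allocation of 4120 bytes failed from 0x9D2A08, alignment 0
Pool: Processor Free: 10613572 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Spanning Tree", ipl= 0, pid= 197
"""

# Matches the header line, the "Pool:" line right after it, then skips
# lines until the "-Process=" line of the same entry.
PATTERN = re.compile(
    r"MALLOCFAIL: Memory allocation of (?P<size>\d+) bytes failed.*?\n"
    r"Pool: (?P<pool>\S+) Free: (?P<free>\d+) Cause: (?P<cause>[^\n]+)\n"
    r"(?:.*\n)*?"
    r'-Process= "(?P<process>[^"]+)"'
)

events = [m.groupdict() for m in PATTERN.finditer(LOG)]
for e in events:
    print(
        f'{e["process"]}: wanted {e["size"]} bytes, '
        f'{e["free"]} bytes free in {e["pool"]} pool ({e["cause"]})'
    )
```

Here a 4120-byte request from the Spanning Tree process fails even though about 10 MB is reported free in the Processor pool, which matches the "Memory fragmentation" cause given in the log itself.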
Our observation is that when there is an L1 loop on an access port, a broadcast and multicast storm is detected, and after some time STP on the switch fails and the switch starts re-electing its root port. During this period visibility is lost.
Is it possible to disable the port as soon as L1 loop traffic is observed? We have keepalive enabled, but it takes some time (10 sec) to detect the loop, and our switch becomes unstable before that. Switch utilization rises to 70-80 percent.
What could be causing these events in the network, and what is the best solution to overcome these issues?
I am attaching all logs and tech-support outputs taken during our troubleshooting.
Thanks,
Nishant
10-14-2011 07:40 AM
Also, root guard disables the port and keeps it in the inconsistent state until superior BPDUs stop arriving on it. What if superior BPDUs keep arriving on the port? It will stay disabled. Is it possible to keep traffic flowing through the port while it is in the inconsistent state in such situations?