Puzzle - Impossible MAC Flap and Random Forward/Blocking SPANTREE

mriksman
Level 1

Hi,

 

Firstly, please bear with me - I am certainly no expert, but I am the custodian of our beastly network, which covers a large, remote area utilising fiber and Wi-Max.

 

The (simplified) diagram of the bits I am trying to understand is as follows;

MacFlap_Architecture.JPG

 

Randomly, 3 or 4 times a day, I get the following events across the above switches;

SAEQL2501 26/03/2019 6:02:46 %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking Khepri Link
SAEQL2501 26/03/2019 6:02:46 %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from blocking to forwarding Khepri Link
SUCRL201 26/03/2019 6:02:46 %SPANTREE-6-PORT_STATE: Port Gi0/1 instance 0 moving from forwarding to blocking Crossover SUCRL201 to SUCRL202
SUCRL201 26/03/2019 6:02:46 %SPANTREE-6-PORT_STATE: Port Gi0/1 instance 0 moving from blocking to forwarding Crossover SUCRL201 to SUCRL202
SAEQL2501 26/03/2019 6:02:47 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Gi1/0/6 and port Gi1/0/1 SA-ESVT-SRV1A FTE Green
SACRL201 26/03/2019 6:02:47 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Fa0/5 and port Gi0/1 SA-ESVT-SRV1A FTE Green
SACRL201 26/03/2019 6:02:47 %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/5 and port Fa0/23 Zabbix
SAEQL2501 26/03/2019 6:02:49 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Gi1/0/6 and port Gi1/0/1 SA-ESVT-SRV1A FTE Green
SAEQL2502 26/03/2019 6:02:49 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Gi1/0/1 and port Gi1/0/6 SA-ESVT-SRV1A FTE Green
SACRL201 26/03/2019 6:02:51 %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/23 and port Fa0/5 Zabbix
SACRL201 26/03/2019 6:02:51 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Fa0/5 and port Gi0/1 SA-ESVT-SRV1A FTE Green
26/03/2019 6:02:51 Timeout on PLC Comms (Alarmed on SA-ESVT-SRV1A)
SACRL201 26/03/2019 6:03:07 %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/5 and port Fa0/23 Zabbix
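
For anyone wanting to tally events like these rather than read them by eye, here is a minimal Python sketch that pulls the host and port pair out of `%SW_MATM-4-MACFLAP_NOTIF` messages and groups the port pairs by MAC address. The regular expression is written against the message text in the log above; the function name `summarise_flaps` and the `sample` list are illustrative assumptions, not part of any Cisco tooling.

```python
import re
from collections import defaultdict

# Pattern written against the MACFLAP_NOTIF text in the log above.
# The port pattern (letters, then digits separated by slashes) stops at the
# end of the interface name, so a description glued onto it is ignored.
PORT = r"[A-Za-z]+\d+(?:/\d+)*"
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    rf"is flapping between port (?P<a>{PORT}) and port (?P<b>{PORT})"
)

def summarise_flaps(lines):
    """Map each flapping MAC to the set of unordered port pairs it was seen on."""
    seen = defaultdict(set)
    for line in lines:
        m = MACFLAP.search(line)
        if m:
            seen[m["mac"]].add(frozenset((m["a"], m["b"])))
    return dict(seen)

# Hypothetical sample input: message bodies copied from the log above.
sample = [
    "%SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a6.cc65 in vlan 195 is flapping between port Gi1/0/6 and port Gi1/0/1",
    "%SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/5 and port Fa0/23",
    "%SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/23 and port Fa0/5",
    "%SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking",
]

flaps = summarise_flaps(sample)
for mac, pairs in flaps.items():
    print(mac, [sorted(p) for p in pairs])
```

Storing each pair as a `frozenset` treats "Fa0/5 and Fa0/23" and "Fa0/23 and Fa0/5" as the same flap, which makes it easier to see that each host only ever moves between one pair of ports per switch.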

 

I enabled debugging on SAEQL2501, and gathered the following during one of these 'events':

Mar 27 16:46:27: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:27: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:27: STP: Data 000003027C8000D824BDA33300000000008000D824BDA3330080010000140001000F00
Mar 27 16:46:27: STP: MST0 Gi1/0/4:0000 03 02 7C 8000D824BDA33300 00000000 8000D824BDA33300 8001 0000 1400 0100 0F00
Mar 27 16:46:27: MST[0]: Gi1/0/4 state change forwarding -> blocking
Mar 27 16:46:27: STP SW: Gi1/0/4 new blocking req for 1 vlans
Mar 27 16:46:27: Found no corresponding dummy port for instance 0, port_id 128.4
Mar 27 16:46:27: %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking
Mar 27 16:46:27: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:27: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:27: STP: Data 000003027D8000001A2FD6B380000000008000001A2FD6B38080010000140001000F00
Mar 27 16:46:27: STP: MST0 Gi1/0/4:0000 03 02 7D 8000001A2FD6B380 00000000 8000001A2FD6B380 8001 0000 1400 0100 0F00
Mar 27 16:46:28: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:28: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:28: STP: Data 00000302791000000AB84E4600000000001000000AB84E460080010000140001000F00
Mar 27 16:46:28: STP: MST0 Gi1/0/4:0000 03 02 79 1000000AB84E4600 00000000 1000000AB84E4600 8001 0000 1400 0100 0F00
Mar 27 16:46:28: MST[0]: Gi1/0/4 state change blocking -> forwarding
Mar 27 16:46:28: STP SW: Gi1/0/4 new forwarding req for 1 vlans
Mar 27 16:46:28: Found no corresponding dummy port for instance 0, port_id 128.4
Mar 27 16:46:28: %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from blocking to forwarding
Mar 27 16:46:28: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:28: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:28: STP SW: VLAN1: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN8: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN16: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN24: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN192: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN195: topology change over - this bridge is root
Mar 27 16:46:28: STP SW: VLAN210: topology change over - this bridge is root
Mar 27 16:46:28: MST[0]: flushing Gi1/0/1
Mar 27 16:46:28: MST[0]: flushing Gi1/0/2
Mar 27 16:46:28: MST[0]: flushing Gi1/0/5
Mar 27 16:46:28: MST[0]: flushing Gi1/0/6
Mar 27 16:46:28: MST[0]: flushing Gi1/0/8
Mar 27 16:46:28: MST[0]: flushing Gi1/0/9
Mar 27 16:46:28: MST[0]: flushing Gi1/0/10
Mar 27 16:46:28: MST[0]: flushing Gi1/0/12
Mar 27 16:46:28: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:28: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:28: STP: Data 00000302791000000AB84E4600000000001000000AB84E460080010000140001000F00
Mar 27 16:46:28: STP: MST0 Gi1/0/4:0000 03 02 79 1000000AB84E4600 00000000 1000000AB84E4600 8001 0000 1400 0100 0F00
Mar 27 16:46:28: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:28: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:29: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:29: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:29: STP: Data 00000302791000000AB84E4600000000001000000AB84E460080010000140001000F00
Mar 27 16:46:29: STP: MST0 Gi1/0/4:0000 03 02 79 1000000AB84E4600 00000000 1000000AB84E4600 8001 0000 1400 0100 0F00
Mar 27 16:46:29: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:29: MST[0]: port Gi1/0/4 received internal tc
Mar 27 16:46:29: MST[0]: flushing Gi1/0/1
Mar 27 16:46:29: MST[0]: flushing Gi1/0/2
Mar 27 16:46:29: MST[0]: flushing Gi1/0/5
Mar 27 16:46:29: MST[0]: flushing Gi1/0/6
Mar 27 16:46:29: MST[0]: flushing Gi1/0/8
Mar 27 16:46:29: MST[0]: flushing Gi1/0/9
Mar 27 16:46:29: MST[0]: flushing Gi1/0/10
Mar 27 16:46:29: MST[0]: flushing Gi1/0/12
Mar 27 16:46:30: %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Gi1/0/6 and port Gi1/0/1
Mar 27 16:46:44: %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Gi1/0/1 and port Gi1/0/6
Mar 27 16:47:03: MST[0]: tc timer expired
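
The `Data` lines in the debug above are the first 36 bytes of each received BPDU, so they can be decoded to see which bridge each one names as root. The following is a minimal Python sketch, assuming the standard IEEE 802.1D/802.1Q BPDU field layout (in an MSTP BPDU the field after the path cost is the CIST regional root, which occupies the position of the bridge ID in plain STP/RSTP). The function and key names are illustrative, not taken from any Cisco tool.

```python
def decode_bpdu(hex_data):
    """Split the first 36 bytes of an RSTP/MSTP BPDU into named fields."""
    raw = bytes.fromhex(hex_data)

    def bridge_id(b):
        # 2-byte priority (includes the extended system ID) + 6-byte MAC
        prio = int.from_bytes(b[:2], "big")
        mac = b[2:].hex()
        return f"{prio}.{mac[0:4]}.{mac[4:8]}.{mac[8:12]}"

    def seconds(b):
        # Timer fields are carried in units of 1/256 second
        return int.from_bytes(b, "big") / 256

    return {
        "version": raw[2],
        "type": raw[3],
        "flags": raw[4],
        "root": bridge_id(raw[5:13]),
        "ext_path_cost": int.from_bytes(raw[13:17], "big"),
        "regional_root": bridge_id(raw[17:25]),  # "bridge ID" in plain STP/RSTP
        "port_id": int.from_bytes(raw[25:27], "big"),
        "message_age": seconds(raw[27:29]),
        "max_age": seconds(raw[29:31]),
        "hello": seconds(raw[31:33]),
        "forward_delay": seconds(raw[33:35]),
    }

# The three distinct Data strings received on Gi1/0/4 at 16:46:27-16:46:28
samples = [
    "000003027C8000D824BDA33300000000008000D824BDA3330080010000140001000F00",
    "000003027D8000001A2FD6B380000000008000001A2FD6B38080010000140001000F00",
    "00000302791000000AB84E4600000000001000000AB84E460080010000140001000F00",
]

decoded = [decode_bpdu(s) for s in samples]
for d in decoded:
    print(hex(d["flags"]), d["root"], d["hello"], d["max_age"], d["forward_delay"])
```

Decoded this way, the three BPDUs name three different roots within about a second: first 32768.d824.bda3.3300, then 32768.001a.2fd6.b380, and finally the configured root 4096.000a.b84e.4600. The timers decode to hello 1 s, max age 20 s and forward delay 15 s in every case.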

 

The two servers (SA-ESVT-SRV1A and SA-ESVT-SRV1B) are redundant, so only one is communicating with these external PLCs at one time. The PLC communications get disrupted during these 'events' - only briefly - but they log an alarm. When I fail over to server SA-ESVT-SRV1B I receive the same set of logs, but with the MAC address of the SA-ESVT-SRV1B instead of SA-ESVT-SRV1A;

SAEQL2501 27/03/2019 16:46:27 %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking Khepri Link
SAEQL2501 27/03/2019 16:46:27 %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from blocking to forwarding Khepri Link
SUCRL201 27/03/2019 16:46:27 %SPANTREE-6-PORT_STATE: Port Gi0/1 instance 0 moving from forwarding to blocking Crossover SUCRL201 to SUCRL202
SUCRL201 27/03/2019 16:46:28 %SPANTREE-6-PORT_STATE: Port Gi0/1 instance 0 moving from blocking to forwarding Crossover SUCRL201 to SUCRL202
SACRL201 27/03/2019 16:46:28 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Fa0/5 and port Gi0/1 SA-ESVT-SRV1B FTE Green
SAEQL2502 27/03/2019 16:46:32 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Gi1/0/1 and port Gi1/0/6 SA-ESVT-SRV1B FTE Green
27/03/2019 16:46:34 Timeout on PLC Comms (Note there was none on the events below)
SACRL201 27/03/2019 16:46:37 %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/5 and port Fa0/23 Zabbix
SAEQL2502 27/03/2019 16:46:44 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Gi1/0/1 and port Gi1/0/6 SA-ESVT-SRV1B FTE Green
SACRL201 27/03/2019 16:46:44 %SW_MATM-4-MACFLAP_NOTIF: Host 2880.23a4.f63d in vlan 195 is flapping between port Gi0/1 and port Fa0/5 SA-ESVT-SRV1B FTE Green
SACRL201 27/03/2019 16:46:50 %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2994.c8e2 in vlan 195 is flapping between port Fa0/5 and port Fa0/23 Zabbix

 

The PLCs are located on port Fa0/4 of SACRL201 (not shown in the architecture diagram attached).

 

 

I have spent days looking at this, but I just cannot work out why;

  • Is there a MAC Flap? There are definitely no loops. I can't see how it is physically possible for any of these devices to flap between these ports.
    • On the VMware ESXi server, we have many VMs; two are Linux OS - built using the same template. Why does only one of them ("Zabbix") appear to have a MAC Flap issue?
    • Why would having a server set as master (SA-ESVT-SRV1A/B) have any impact on whether it gets caught MAC Flapping?
  • Why does it all start with the fiber crossover link between SUCRL201 and SUCRL202 changing from Forwarding to Blocking and back again? And why does it cause SAEQL2501's port Gi1/0/4 to do the same? All switches run MST, and the Wi-Max "Khepri Link" has STP turned off. Only our Cisco switches have STP turned on.

 

I'm hoping someone will read this and have an 'Aha!' moment. Because I sure haven't...

 

Thanks. Let me know what config files you need (what command to run to get that) and which switches.

27 Replies

Hi Paul,

All Cisco switches (SAEQL2501, SAEQL2502, SACRL201, SACRL202, SUCRL201, SUCRL202) are configured for one MST region that includes all VLANs.

Name      []
Revision  0     Instances configured 1

Instance  Vlans mapped
--------  ---------------------------------------------------------------------
0         1-4094
-------------------------------------------------------------------------------

The wireless link does NOT have STP enabled. See architecture diagram from my first post. 

 

All edge/access ports are set appropriately for the devices connected to them. I do not believe we have BPDU guard enabled on these ports.

 

SUCRL201#show spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: none
Extended system ID is enabled
Portfast Default is disabled
PortFast BPDU Guard Default is disabled
Portfast BPDU Filter Default is disabled
Loopguard Default is disabled
EtherChannel misconfig guard is enabled
UplinkFast is disabled
BackboneFast is disabled

 

SAEQL2501 has been manually set as the Root bridge with priority 4096. This is what SUCRL201 sees (it sees the Root switch over the Wi-Max link):

  Spanning tree enabled protocol mstp
  Root ID    Priority    4096
             Address     000a.b84e.4600
             Cost        0
             Port        1 (FastEthernet0/1)
             Hello Time   1 sec  Max Age 20 sec  Forward Delay 15 sec

 

If "Khepri" (SUCRL201 and SUCRL202) are the only switches/devices on VLAN 16 - shouldn't SUCRL201 be set as the Root for VLAN 16? And SAEQL2501 the Root for all other VLANs? 

Hello
All access ports should be prevented from triggering an STP recalculation, which means enabling STP PortFast either globally (which your output clearly shows isn't enabled on that switch) or per interface. The same applies to BPDU guard.

I would suggest, though, making your root switch the root for ALL VLANs and keeping STP enabled for all VLANs as well. If you have disabled STP for a VLAN, you will eventually incur a loop for that VLAN if the switches have multiple L2 interconnects.

The STP flapping between SAEQL2501 and SUCRL201 could suggest STP is blocking the port because a possible loop is being introduced. Hence I asked whether it is possible the Wi-Fi is bridged between SACRL201 and SUCRL201.

However, it could just mean that what you are seeing is an STP recalculation due to the Wi-Fi link intermittently failing, causing the flapping, which would then involve SUCRL201-SUCRL202 in STP negotiation.


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

All access ports that have an end device (edge port) connected have Portfast enabled manually, on a port by port basis. 

 

Any port (like Gi1/0/4) that only has a single VLAN on it appears to automatically set itself to access mode (but not edge)

Name: Gi1/0/4
Switchport: Enabled
Administrative Mode: dynamic auto
Operational Mode: static access
Administrative Trunking Encapsulation: negotiate
Operational Trunking Encapsulation: native
Negotiation of Trunking: On
Access Mode VLAN: 16 (SU)

 

Whilst a port that carries multiple VLANs appears to have been manually set to trunking mode

Name: Gi1/0/2
Switchport: Enabled
Administrative Mode: trunk
Operational Mode: trunk
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: dot1q
Negotiation of Trunking: On
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled

So from what I can tell, PortFast is set correctly: only on access ports with an end device attached (edge ports), and not on ports like Gi1/0/4 that merely operate in access mode because they carry a single VLAN.

 

There is absolutely no way there is any loop along the Wi-Max link. It is only as it is in the diagram I have attached.

SAEQL2501-->SA-NSM2-STN1-->SA-NSM2-AP1   -----Wi-Max----->  SU-NSM2-AP1-->SU-NSM2-STN1-->SUCRL201

 

I can cause the MAC flap to occur by simply restarting either SA-NSM2-STN1 or SA-NSM2-AP1.

Hello



It's possible the Zabbix VM is continuously being seen between SACRL201 and SAEQL2501, then between SACRL202 and SAEQL2502. Why have you turned off STP, and which ports/VLANs is this applied to?

As for the port transitioning between SAEQL2501 and SUCRL201: are your WAPs bridging between SACRL and SUCRL?

 



Kind Regards
Paul

STP is turned off by default on the Ubiquiti Wi-Max devices. I haven't turned it on (the entire setup of our network has been by amateurs - the cost of working in Egypt). To be fair, I too am an amateur...

 

What benefit would there be in turning STP on for these Wi-Max devices if they are simply point-to-point connections?

 

Ummm, I think 'yes', they are Bridging. Here is the configuration for the SAEQL2501 and SA-NSM2-STN1;

SA-NSM2-STN1_Config.png

Hi,

As per your diagram, it is not required to involve the Wi-Fi solution in STP, but I am looking at the Device Uptime. Did you notice it?

 

Regards,
Deepak Kumar,
Don't forget to vote and accept the solution if this comment will help you!

Adding to what the other guys have said, the wireless bridge is most likely the cause of what you are experiencing. If the bridge has all VLANs, or a portion of them, allowed on the same bridge, you will see the above.
There will be a configuration setting that lets you specify the VLANs on the bridge. This allows you to control what is bridged and what is not, which should in theory resolve what you are seeing.

The only VLAN that runs across Gi1/0/4 on SAEQL2501 (which is connected to the wireless bridge) is VLAN 16

SAEQL2501#show vlan

VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Gi1/0/3, Gi1/0/5, Gi1/0/7, Gi1/0/11, Gi1/0/12
8    Tarek                            active
16   SU                               active    Gi1/0/4
24   ESXi                             active    Gi1/0/10
192  NPCS                             active
195  QASR                             active    Gi1/0/6
210  PMS                              active    Gi1/0/8

The show int switchport command shows

Name: Gi1/0/4
Switchport: Enabled
Administrative Mode: dynamic auto
Operational Mode: static access
Administrative Trunking Encapsulation: negotiate
Operational Trunking Encapsulation: native
Negotiation of Trunking: On
Access Mode VLAN: 16 (SU)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan: none
Trunking VLANs Enabled: ALL
Pruning VLANs Enabled: 2-1001
Capture Mode Disabled
Capture VLANs Allowed: ALL

 

Hello


@mriksman wrote:

The only VLAN that runs across Gi1/0/4 on SAEQL2501 (which is connected to the wireless bridge) is VLAN 16. The show int switchport command shows

Name: Gi1/0/4
Switchport: Enabled
Administrative Mode: dynamic auto
Operational Mode: static access
Administrative Trunking Encapsulation: negotiate
Operational Trunking Encapsulation: native
Negotiation of Trunking: On
Access Mode VLAN: 16 (SU)

If that's the case, make that port a static access port rather than a dynamic port, so you're not leaving it to negotiate its own operational state.


int gig1/0/4
switchport mode access

Also post the following:
sh spanning-tree int gig1/0/4 detail



Kind Regards
Paul

Done.

SAEQL2501#show int Gi1/0/4 switchport
Name: Gi1/0/4
Switchport: Enabled
Administrative Mode: static access
Operational Mode: static access

 

Requested output;

SAEQL2501#sh spanning-tree int gig1/0/4 detail
Port 4 (GigabitEthernet1/0/4) of MST0 is designated forwarding
Port path cost 200000, Port priority 128, Port Identifier 128.4.
Designated root has priority 4096, address 000a.b84e.4600
Designated bridge has priority 4096, address 000a.b84e.4600
Designated port id is 128.4, designated path cost 0
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 2
Link type is point-to-point by default, Internal
BPDU: sent 22333, received 12

 

For completeness, on the 'other' end (SUCRL201);

SUCRL201#sh spanning-tree int fa0/1 detail
Port 1 (FastEthernet0/1) of MST0 is root forwarding
Port path cost 200000, Port priority 128, Port Identifier 128.1.
Designated root has priority 4096, address 000a.b84e.4600
Designated bridge has priority 4096, address 000a.b84e.4600
Designated port id is 128.4, designated path cost 0
Timers: message age 2, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default, Internal
BPDU: sent 6594, received 1949173

 

I'm still stuck on how Root and BPDUs work...

I saw this with debug on SAEQL2501;

Mar 27 16:46:27: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Mar 27 16:46:27: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Mar 27 16:46:27: STP: Data 000003027C8000D824BDA33300000000008000D824BDA3330080010000140001000F00
Mar 27 16:46:27: STP: MST0 Gi1/0/4:0000 03 02 7C 8000D824BDA33300 00000000 8000D824BDA33300 8001 0000 1400 0100 0F00
Mar 27 16:46:27: MST[0]: Gi1/0/4 state change forwarding -> blocking
Mar 27 16:46:27: STP SW: Gi1/0/4 new blocking req for 1 vlans
Mar 27 16:46:27: Found no corresponding dummy port for instance 0, port_id 128.4
Mar 27 16:46:27: %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking

It received a BPDU. My understanding is, that a Root switch shouldn't receive BPDUs? Unless there is some Root contention?

 

So when the Wi-Max link gets disrupted, is SUCRL201 nominating itself as Root, and then when the link comes back moments later, it sends a BPDU which ends up at the proper/original Root switch, which causes both switches to work out who's Root (Blocking/Forwarding)?
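
The contention described here comes down to comparing bridge IDs: the lowest priority wins, and the lowest MAC address breaks a tie. A minimal Python sketch of that comparison, using the three bridge IDs that appear in the debug output in this thread (the helper name `parse_bridge_id` is hypothetical):

```python
def parse_bridge_id(text):
    """Turn '32768.d824.bda3.3300' into a sortable (priority, mac_int) tuple."""
    prio, mac = text.split(".", 1)
    return int(prio), int(mac.replace(".", ""), 16)

# Bridge IDs seen in the debug output in this thread
candidates = [
    "32768.d824.bda3.3300",
    "32768.001a.2fd6.b380",
    "4096.000a.b84e.4600",
]

# The numerically lowest (priority, MAC) pair is the best root candidate.
ranked = sorted(candidates, key=parse_bridge_id)
print(ranked)
```

The bridge with priority 4096 ranks first, so it wins whenever its BPDUs are being heard. A bridge that stops hearing them can only fall back to the best ID it still knows about, which would be consistent with BPDUs briefly naming a 32768-priority bridge as root after a link interruption.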

Device Uptime? As in it appears to have only been online a short while? That's because I restarted the SA-NSM2-STN1 and SA-NSM2-AP2 to test.

 

See attached. This is our event logger Graylog, and collects Syslog and other events from various devices. The events are in reverse order (look at the timestamps), so start from the bottom and scroll up.

The first test (at the bottom) is restarting SA-NSM2-STN1, which is connected directly to SAEQL2501 Gi1/0/4. You can see the port go through blocking/learning/forwarding. Then we are hit with the MAC flapping events.

The second test, just after, is restarting SA-NSM2-AP1. There is also MAC flapping, but I don't see any blocking/forwarding events. Note that in this second test, the devices connected to SACRL201 that SA-ESVT-SRV1 talks to suffer a communication dropout (8:39:14 am).

mriksman
Level 1

Maybe we should first focus on the Root dispute that occurs when there is a disruption in the Wi-Max link between SAEQL2501 and SUCRL201. See attached debug from SAEQL2501.

 

Apr 5 08:15:58: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Apr 5 08:15:58: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Apr 5 08:15:58: STP: Data 000003027C8000001A2FD6B380000000008000001A2FD6B38080010000140001000F00
Apr 5 08:15:58: STP: MST0 Gi1/0/4:0000 03 02 7C 8000001A2FD6B380 00000000 8000001A2FD6B380 8001 0000 1400 0100 0F00
Apr 5 08:15:58: MST[0]: Gi1/0/4 disputed
Apr 5 08:15:58: %SPANTREE-6-PORT_STATE: Port Gi1/0/4 instance 0 moving from forwarding to blocking
Apr 5 08:15:58: MST[0]:<RX- Gi1/0/4 inferior designated BPDU Prot:0 Vers:3 Type:2
Apr 5 08:15:58: MST[0]: Role :Desg Flags[AFL] Age:0 RemHops:19
Apr 5 08:15:58: MST[0]: CIST_root:32768.001a.2fd6.b380 Cost :0
Apr 5 08:15:58: MST[0]: Reg_root :32768.001a.2fd6.b380 Cost :20000
Apr 5 08:15:58: MST[0]: Bridge_ID:32768.d824.bda3.3300 Port_ID:32769
Apr 5 08:15:58: MST[0]: max_age:20 hello:1 fwdelay:15
Apr 5 08:15:58: MST[0]: V3_len:64 region: rev:0 Num_mrec: 0
Apr 5 08:15:58: STP: MST0 rx BPDU: config protocol = mstp, packet from GigabitEthernet1/0/4 , linktype IEEE_SPANNING , enctype 2, encsize 17
Apr 5 08:15:58: STP: enc 01 80 C2 00 00 00 D8 24 BD A3 33 01 00 69 42 42 03
Apr 5 08:15:58: STP: Data 00000302391000000AB84E4600000000001000000AB84E460080010000140001000F00
Apr 5 08:15:58: STP: MST0 Gi1/0/4:0000 03 02 39 1000000AB84E4600 00000000 1000000AB84E4600 8001 0000 1400 0100 0F00
Apr 5 08:15:58: MST[0]:<RX- Gi1/0/4 other BPDU Prot:0 Vers:3 Type:2
Apr 5 08:15:58: MST[0]: Role :Root Flags[FLTc] Age:0 RemHops:20
Apr 5 08:15:58: MST[0]: CIST_root: 4096.000a.b84e.4600 Cost :0
Apr 5 08:15:58: MST[0]: Reg_root : 4096.000a.b84e.4600 Cost :200000
Apr 5 08:15:58: MST[0]: Bridge_ID:32768.d824.bda3.3300 Port_ID:32769
Apr 5 08:15:58: MST[0]: max_age:20 hello:1 fwdelay:15
Apr 5 08:15:58: MST[0]: V3_len:64 region: rev:0 Num_mrec: 0
Apr 5 08:15:58: MST[0]: port Gi1/0/4 received internal tc
Apr 5 08:15:58: MST[0]: port Gi1/0/4 received internal tc
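
The `Role` and `Flags[...]` values in the debug above come from the fifth byte of each `Data` line (`7C` in the first BPDU, `39` in the second). A minimal Python sketch of how that flags octet breaks down, assuming the standard RSTP/MSTP bit layout; the function and label names are illustrative:

```python
# Port role is carried in bits 2-3 of the flags octet.
ROLES = {0: "Unknown/Master", 1: "Alternate/Backup", 2: "Root", 3: "Designated"}

def decode_flags(flags):
    """Decode the RSTP/MSTP BPDU flags octet into its role and flag bits."""
    names = [
        (0x01, "TopologyChange"),
        (0x02, "Proposal"),
        (0x10, "Learning"),
        (0x20, "Forwarding"),
        (0x40, "Agreement"),
        (0x80, "TopologyChangeAck"),
    ]
    return {
        "role": ROLES[(flags >> 2) & 0x03],
        "set": [name for bit, name in names if flags & bit],
    }

print(decode_flags(0x7C))  # first BPDU above
print(decode_flags(0x39))  # second BPDU above
```

This lines up with the debug text: `0x7C` decodes to a Designated role with Agreement, Forwarding and Learning set (`Role :Desg Flags[AFL]`), while `0x39` decodes to a Root role with Forwarding, Learning and Topology Change set (`Role :Root Flags[FLTc]`).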

 

How can we prevent this? I can think of the following;

* Set SUCRL201/202 as a separate MST region for VLAN 16 and the Root as SUCRL201. All other VLANs to be part of the original MST region with its root as SAEQL2501. That way - there will never be a dispute if the comms are lost - each switch stays responsible for its own VLAN region. 

* Set some kind of delay before SUCRL201 decides to become Root. It seems even a dropout of a second or two is enough to cause it to decide it should become Root, and then moments later, when the link becomes healthy again, there is a Root dispute. So perhaps tell SUCRL201 to wait 5, 10 or 15 seconds before declaring itself the Root?
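
On the delay asked about in the last bullet: in the RSTP/MSTP standards, information received on a port is generally aged out after three missed hellos rather than after Max Age, so the hello time largely sets how long a bridge waits before giving up on its root. A small Python sketch of that arithmetic, using the `hello:1` value from the debug above. The function name is hypothetical, and whether a particular IOS release follows this rule exactly is an assumption worth checking against the switch documentation.

```python
def received_info_timeout(hello_time_s, missed_hellos=3):
    """Approximate seconds without BPDUs before port information is aged out.

    Assumes the 3 x Hello Time rule from the RSTP/MSTP standards; this value
    is not read from any switch in this thread.
    """
    return hello_time_s * missed_hellos

print(received_info_timeout(1))  # hello time seen in the debug (hello:1)
print(received_info_timeout(2))  # the IEEE default hello time of 2 s
```

Under that assumption, a hello time of 1 second gives the Wi-Max link only about 3 seconds of tolerance, against about 6 seconds with the default hello time of 2 seconds.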
