
Nexus 7K Loop Detection Messages

mfarrenkopf
Level 1

We had an incident with OTV on Sunday.  During OTV reconvergence, enough MAC addresses flopped around that the 7K (7700-series, running NX-OS 8.2(4)) decided it had had enough.  We logged the following messages:

 

2020 Apr 12 03:26:22.977 DCA-Dist-A %MTM-SLOT13-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 13 FE 1 - MAC MOVE Global threshold reached - Locking MAC Learn for 200 seconds
2020 Apr 12 03:26:22.978 DCA-Dist-A %MTM-SLOT13-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 13 FE 2 - MAC MOVE Global threshold reached - Locking MAC Learn for 210 seconds
2020 Apr 12 03:26:22.979 DCA-Dist-A %MTM-SLOT13-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 13 FE 3 - MAC MOVE Global threshold reached - Locking MAC Learn for 220 seconds
2020 Apr 12 03:26:22.979 DCA-Dist-A %MTM-SLOT13-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 13 FE 4 - MAC MOVE Global threshold reached - Locking MAC Learn for 230 seconds
2020 Apr 12 03:26:23.007 DCA-Dist-A %MTM-SLOT1-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 1 FE 2 - MAC MOVE Global threshold reached - Locking MAC Learn for 180 seconds
2020 Apr 12 03:26:23.008 DCA-Dist-A %MTM-SLOT1-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 1 FE 7 - MAC MOVE Global threshold reached - Locking MAC Learn for 190 seconds
2020 Apr 12 03:26:23.021 DCA-Dist-A %MTM-SLOT3-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 3 FE 1 - MAC MOVE Global threshold reached - Locking MAC Learn for 210 seconds
2020 Apr 12 03:26:23.022 DCA-Dist-A %MTM-SLOT3-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 3 FE 2 - MAC MOVE Global threshold reached - Locking MAC Learn for 220 seconds
2020 Apr 12 03:26:23.023 DCA-Dist-A %MTM-SLOT3-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 3 FE 3 - MAC MOVE Global threshold reached - Locking MAC Learn for 230 seconds
2020 Apr 12 03:26:23.023 DCA-Dist-A %MTM-SLOT3-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 3 FE 4 - MAC MOVE Global threshold reached - Locking MAC Learn for 240 seconds
2020 Apr 12 03:26:23.220 DCA-Dist-A %MTM-SLOT11-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 11 FE 2 - MAC MOVE Global threshold reached - Locking MAC Learn for 180 seconds
2020 Apr 12 03:26:23.221 DCA-Dist-A %MTM-SLOT11-2-MTM_MAC_MOVE_FE_LOCK_STATE: SLOT 11 FE 7 - MAC MOVE Global threshold reached - Locking MAC Learn for 190 seconds

 

I had a case open when this occurred before.  The TAC engineer told me the message applies to the entire module.  With that explanation, all MAC learning was blocked on all (presumably layer 2 only) ports for slots 1, 3, 11, and 13.

But I think the messages and responses are more nuanced than that.  I presume FE is "forwarding engine."  How can I tell what ports were actually impacted by these messages?  If FE IS "forwarding engine," is there a way to know which ports are associated with an FE?
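As a starting point for answering that question, the lock messages themselves can be grouped by slot and FE. In the excerpt above, slots 3 and 13 report FE 1 through 4, while slots 1 and 11 report only FE 2 and FE 7, so the lock does not look uniform across every module. The following is a minimal Python sketch, assuming the syslog lines are available as plain text in exactly the format pasted above; the function name `parse_locks` is hypothetical, and the "lock end" is simply the timestamp plus the logged duration, not something the switch reports.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the MTM_MAC_MOVE_FE_LOCK_STATE lines in the format pasted above.
LOCK_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) (?P<host>\S+) "
    r"%MTM-SLOT\d+-2-MTM_MAC_MOVE_FE_LOCK_STATE: "
    r"SLOT (?P<slot>\d+) FE (?P<fe>\d+) - .*?"
    r"Locking MAC Learn for (?P<secs>\d+) seconds"
)


def parse_locks(lines):
    """Return {slot: [(fe, lock_start, lock_end), ...]} for matching lines."""
    locks = defaultdict(list)
    for line in lines:
        m = LOCK_RE.match(line.strip())
        if not m:
            continue  # not a MAC-learn lock message
        start = datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S.%f")
        # Assumption: the lock simply lasts for the logged number of seconds.
        end = start + timedelta(seconds=int(m["secs"]))
        locks[int(m["slot"])].append((int(m["fe"]), start, end))
    return dict(locks)


# Two lines copied from the log excerpt above.
sample = [
    "2020 Apr 12 03:26:22.977 DCA-Dist-A %MTM-SLOT13-2-MTM_MAC_MOVE_FE_LOCK_STATE: "
    "SLOT 13 FE 1 - MAC MOVE Global threshold reached - Locking MAC Learn for 200 seconds",
    "2020 Apr 12 03:26:23.007 DCA-Dist-A %MTM-SLOT1-2-MTM_MAC_MOVE_FE_LOCK_STATE: "
    "SLOT 1 FE 2 - MAC MOVE Global threshold reached - Locking MAC Learn for 180 seconds",
]

for slot, entries in sorted(parse_locks(sample).items()):
    for fe, start, end in entries:
        print(f"slot {slot} FE {fe}: learning locked until about {end:%H:%M:%S}")
```

This only summarizes which FE instances logged a lock and for how long; it does not by itself say which front-panel ports sit behind each FE.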


Sergiu.Daniluk
VIP Alumni

Hello,

FE is indeed the forwarding engine (basically the ASIC). To see the mapping between FEs and front-panel ports, you can use these commands (X represents the module you are interested in):

N7K# attach module X
module-X# show hardware internal dev-port-map
.. 
+-----------------------------------------------------------------------+
+----------------+++FRONT PANEL PORT TO ASIC INSTANCE MAP+++------------+
+-----------------------------------------------------------------------+
FP port |  PHYS | MAC_0 | L2LKP | L3LKP | QUEUE |SWICHF 
   1               0       0       0       0       0,1     
   2               0       0       0       0       0,1     
   3               1       1       1       1       0,1     
   4               1       1       1       1       0,1     

* FP port = front port

* MAC_0 = ASIC
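If the table is long, the port-to-instance map can be grouped with a few lines of Python. This is a rough sketch, assuming the output has been saved as text and has the same header and column order as the excerpt above; real output may differ between line card types. The function name `ports_by_fe` is hypothetical, and the column to group on is a parameter because which column corresponds to the "FE" in the syslog messages may depend on the card. Whether the FE numbers in the syslog use the same numbering as the instance numbers in this table is also not confirmed here.

```python
from collections import defaultdict


def ports_by_fe(output, fe_column="MAC_0"):
    """Group front-panel port numbers by the instance number in `fe_column`."""
    col = None
    groups = defaultdict(list)
    for line in output.splitlines():
        if "|" in line and "FP port" in line:
            # Header row: column names are separated by '|'.
            header = [name.strip() for name in line.split("|")]
            col = header.index(fe_column)
            continue
        fields = line.split()
        if col is None or not fields or not fields[0].isdigit():
            continue  # banner, legend, or blank line
        if col < len(fields) and fields[col].isdigit():
            groups[int(fields[col])].append(int(fields[0]))
    return dict(groups)


# The excerpt shown above, pasted as-is.
sample = """\
+-----------------------------------------------------------------------+
+----------------+++FRONT PANEL PORT TO ASIC INSTANCE MAP+++------------+
+-----------------------------------------------------------------------+
FP port |  PHYS | MAC_0 | L2LKP | L3LKP | QUEUE |SWICHF
   1               0       0       0       0       0,1
   2               0       0       0       0       0,1
   3               1       1       1       1       0,1
   4               1       1       1       1       0,1
"""

for instance, ports in sorted(ports_by_fe(sample).items()):
    print(f"instance {instance}: front-panel ports {ports}")
```

Passing `fe_column="L2LKP"` groups on the layer 2 lookup column instead, which may be the more relevant one on cards where the MAC and the forwarding engine are separate components.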

If you want to see which interfaces the MAC flapped between, you can use this command:

N7K# show system internal l2fm l2dbg macdb address <MAC-address> vlan <VLAN-ID>

  Time                     If         Db Op                    Src Slot  FE-BMP  Local Remote Detail
Apr  4 14:27:13 2020:0      0x09040e4c 1  INSERT               0    0    0xffff 0     1       0x97     
Apr  4 14:27:13 2020:0      0x09040e4c 1  GWMAC_RSP_SENT_TO_AP 0    0    0xffff 0     0       0x97    

! to see the interface, translate the ifindex to interface number like this: 
N7K# show interface snmp-ifindex | grep 0x09040e4c 
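When the macdb history is long, the ifindex values can be pulled out with a short script before doing the lookup. This is a minimal Python sketch, assuming the rows look exactly like the two sample rows above (timestamp, then `If`, `Db`, and `Op` columns); the function names are hypothetical. The decimal conversion is included only because it may be convenient if the listing you match against shows the index in decimal.

```python
import re

# Matches data rows like: "Apr  4 14:27:13 2020:0   0x09040e4c 1  INSERT ..."
ROW_RE = re.compile(
    r"^\s*(?P<time>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\d{4}):\d+\s+"
    r"(?P<ifindex>0x[0-9a-fA-F]+)\s+\d+\s+(?P<op>\S+)"
)


def interface_history(output):
    """Return [(time, ifindex_hex, ifindex_dec, op), ...] in listed order."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            ifhex = m["ifindex"].lower()
            rows.append((m["time"], ifhex, int(ifhex, 16), m["op"]))
    return rows


def distinct_ifindexes(rows):
    """Unique ifindex values in first-seen order."""
    seen = []
    for _, ifhex, _, _ in rows:
        if ifhex not in seen:
            seen.append(ifhex)
    return seen


# The two sample rows shown above.
sample = """\
  Time                     If         Db Op                    Src Slot  FE-BMP  Local Remote Detail
Apr  4 14:27:13 2020:0      0x09040e4c 1  INSERT               0    0    0xffff 0     1       0x97
Apr  4 14:27:13 2020:0      0x09040e4c 1  GWMAC_RSP_SENT_TO_AP 0    0    0xffff 0     0       0x97
"""

rows = interface_history(sample)
for ifhex in distinct_ifindexes(rows):
    print(f"look up {ifhex} (decimal {int(ifhex, 16)})")
```

More than one distinct ifindex in the history for the same MAC and VLAN would suggest the address was learned on different interfaces over time; each value still needs to be translated to an interface name as described above.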

Hope it helps,

Sergiu

Okay, so MAC_0 = ASIC = FE.

Thank you!

Before this there are a boatload of messages about MACs flapping between the OTV inside interfaces, so it's not a matter of trying to find out the MAC addresses that flapped, or the interfaces between which they flapped, or even why they flapped (OTV convergence).

I think my TAC engineer in November and I were talking past each other, trying to say the same thing and not succeeding.  Because it didn't seem logical that MAC flapping should affect all interfaces on the module, but a subset would make sense.

And it's my understanding this does *not* affect layer 3 interfaces, although it could potentially interrupt layer 3 communication if SVIs are used at the end of a layer 2 trunk.

We have a large VM environment, such that OTV reconvergence triggered this issue.  I'm under the impression this threshold cannot be changed.  But while I'm here, I'll ask:  can the threshold of MAC moves be changed?  If not, we can expect this to happen every time we have an OTV bump.


@mfarrenkopf wrote:

Okay, so MAC_0 = ASIC = FE.

Because it didn't seem logical that MAC flapping should affect all interfaces on the module, but a subset would make sense.

Which model of line card do you have? Here is how the ports are distributed on each:

Note***: for M1 and M2, the forwarding engine is a different component than the MAC. The F-series and M3-series use an SOC (switch on chip) architecture, which is in charge of many features: buffering, forwarding lookups, ACLs, QoS, etc.

 

[Images: m108.png, m132.png, m2.png]

 

For M3, we have the SOC (switch on chip), which handles the forwarding engine functions:

 

[Images: m3.png, m3-2.png, m3100.png]

All of these images are copied from Cisco Live presentations from different years.

 


  can the threshold of MAC moves be changed?  If not, we can expect this to happen every time we have an OTV bump.

No. You cannot change the MAC move threshold. You can only enable/disable notification.

I think you should take a further look at why the MAC moves happened. Once you find the reason, you will also be able to find a proper solution.

My expectation is that, regardless of the size of your VM environment, you should not see MAC moves. They are usually an indication of a loop, an improper configuration, or maybe even a bug.

 

Regards,

Sergiu


 

Hi Sergiu,

We have four data center distribution 7K-series switches.  DCA has two 7700s.  DCB has two 7000s.  DCA is sup 2Es and all F3 cards.  DCB is sup 2Es and a mix of M2/M3 cards.  See the attached for an overview of our OTV implementation.  Po51 is the layer 2 vPC inside interface that connects the A-side OTV context with the primary (Core) context.  Po52 is the vPC for the B-side.

We implemented OTV using NX-OS 6.2.  At that time, OTV adjacencies were multicast-only.  The vPC peer link is the layer 2 interregion between distributions.  We do not have a direct layer 2 adjacency between OTV contexts.  We are now running 8.2(4) but have not otherwise updated the general design.  It has been recommended to us that we migrate to unicast adjacencies.

On Sunday, we experienced a process crash on DCB-Dist-A.  During this time, MAC addresses flapped from the Po51 vPC to Po52 due to loss of AED function on DCB-Dist-A.  When DCB-Dist-A recovered, the MAC addresses were flapping back.  The aggregate flapping caused the 7Ks to complain about the number of MAC address movements and caused the errors I included.  Although I only included messages from one 7K device, all four of them logged MAC move threshold violations.

The included design is based on the original OTV documentation and implementation back in 2014.  Cisco's design/recommendations may have been updated since then and we've not kept up.  We're looking at doing unicast adjacencies.  But regardless, as it exists today, DCB-Dist-A had a process crash, MACs flapped from Po51 to Po52 due to the AED transition, DCB-Dist-A recovered and MACs flapped back to Po51, which hit the MAC move threshold and the 7Ks logged the MAC learn locking messages.

I'm happy to listen to whatever additional input you can give me.  If you have recommendations on the design, I'm happy to listen to them.

Thank you,

Matt

 
