
%FWM-6-MAC_MOVE_NOTIFICATION: MAC flapping between vPC host port and vPC peer-link

mahbvh

Hi,

This has been bugging me for some time. We have VMware ESXi hosts connected in vPC mode to a pair of N5Ks (through FEX). Dozens of times per day we were seeing the following error:

2011 Nov 18 16:24:34 Canal_auber_5548_6258 %FWM-2-STM_LOOP_DETECT: Loops detected in the network among ports Po100 and Po40 vlan 395 - Disabling dynamic learn notifications for 180 seconds

This used to happen only on 2 ESXi hosts running VDI payload (where a lot of VMs are instantiated). Since this was causing a lot of disruption to other servers connected to the N5Ks, we decided to take both ESXi hosts out of service until we know why this happens.

Then we enabled mac-move notification to see whether the problem was still there. Although we no longer get the LOOP message, we still see this (again on an ESXi host running VDI payload):

Nov 20 07:07:08 canal_auber_5548-6258 : 2011 Nov 20 07:07:08 CET: %FWM-6-MAC_MOVE_NOTIFICATION: Host 0050.5693.0416 in vlan 395 is flapping between port Po100 and port Po31
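To get a feel for how often a given MAC moves and between which ports, the notifications can be tallied from a syslog export. The following is only a minimal sketch: the regular expression is written against the message text quoted above, and the function name `count_mac_moves` and the `sample` list are illustrative, not part of any Cisco tool.

```python
import re
from collections import Counter

# Pattern built from the MAC_MOVE_NOTIFICATION text quoted in this thread;
# other NX-OS releases may word the message differently (assumption).
MAC_MOVE = re.compile(
    r"%FWM-6-MAC_MOVE_NOTIFICATION: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)",
    re.IGNORECASE,
)

def count_mac_moves(lines):
    """Count flap events per (mac, vlan, sorted port pair)."""
    counts = Counter()
    for line in lines:
        m = MAC_MOVE.search(line)
        if m:
            # Sort the two ports so A<->B and B<->A land in the same bucket.
            ports = tuple(sorted((m.group("a"), m.group("b"))))
            counts[(m.group("mac").lower(), int(m.group("vlan")), ports)] += 1
    return counts

sample = [
    "Nov 20 07:07:08 canal_auber_5548-6258 : 2011 Nov 20 07:07:08 CET: "
    "%FWM-6-MAC_MOVE_NOTIFICATION: Host 0050.5693.0416 in vlan 395 "
    "is flapping between port Po100 and port Po31",
]

for key, n in count_mac_moves(sample).items():
    print(key, n)
```

Grouping by port pair makes it easy to see whether the moves always involve the peer-link (Po100 here) or also happen between host-facing port channels.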

What I don't get is why the N5K would complain about seeing a MAC address flapping between a vPC member port and the vPC peer-link. (I expect to see virtual machine MACs on both sides, since the ESXi host is load balancing based on IP hash across both sides of the vPC.)

Here is part of the configuration (same on both N5Ks; Po100 is the vPC peer-link, Po40 is the vPC to one of the ESXi hosts, and all ESXi hosts have the same configuration):

interface Ethernet104/1/1

  description Slot40-A1 ESX-vmnic

  switchport mode trunk

  switchport trunk allowed vlan 15,18,65,71,200,312-314,317-321,325-326,328,330,332-341,343,349-350,352-357,363,369,374,376-381,383-385,390-401,411-412,440,460,462,468-469,475,996-999,2024,2026,2701,2801

  spanning-tree port type edge trunk

  channel-group 40

interface port-channel40

  description Slot40 ESX

  switchport mode trunk

  vpc 40

  switchport trunk allowed vlan 15,18,65,71,200,312-314,317-321,325-326,328,330,332-341,343,349-350,352-357,363,369,374,376-381,383-385,390-401,411-412,440,460,462,468-469,475,996-999,2024,2026,2701,2801

  spanning-tree port type edge trunk

  speed 10000

interface port-channel100

  description VPC Link

  switchport mode trunk

  vpc peer-link

  spanning-tree port type network

  speed 10000

And some log output :

N5K# sho vpc brief

vPC Peer-link status

---------------------------------------------------------------------

id   Port   Status Active vlans

--   ----   ------ --------------------------------------------------

1    Po100  up     1,13,15,18,65,71,200,312-314,317-321,325-326,328,3

                   30,332-341,343,349-350,352-357,363,369,374,376-386

                   ,390-401,411-412,440,460,462,468-469,475,996-999,1

                   002-1005,2024,2026,2701,2801

vPC status

----------------------------------------------------------------------------

id     Port        Status Consistency Reason                     Active vlans

------ ----------- ------ ----------- -------------------------- -----------

--- snip ---

40     Po40        up     success     success                    15,18,65,71

                                                                 ,200,312-31

                                                                 4,317-321,3

                                                                 25-326,328,

                                                                 330,332....
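One quick sanity check on output like this is to confirm that every VLAN allowed on the host-facing vPC is also active on the peer-link. The sketch below is a hypothetical helper (`expand_vlans` is not an NX-OS or library function) that expands the comma/range notation into a set. The two strings are copied from the Po40 configuration and the Po100 "Active vlans" column above, with the wrapped lines joined.

```python
def expand_vlans(spec):
    """Expand a VLAN list such as '15,18,312-314' into a set of ints."""
    vlans = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            vlans.update(range(int(lo), int(hi) + 1))
        else:
            vlans.add(int(part))
    return vlans

# Allowed VLANs configured on port-channel40 (from the configuration above).
po40_allowed = expand_vlans(
    "15,18,65,71,200,312-314,317-321,325-326,328,330,332-341,343,349-350,"
    "352-357,363,369,374,376-381,383-385,390-401,411-412,440,460,462,"
    "468-469,475,996-999,2024,2026,2701,2801"
)

# Active VLANs on the peer-link Po100 (from 'show vpc brief' above).
po100_active = expand_vlans(
    "1,13,15,18,65,71,200,312-314,317-321,325-326,328,330,332-341,343,"
    "349-350,352-357,363,369,374,376-386,390-401,411-412,440,460,462,"
    "468-469,475,996-999,1002-1005,2024,2026,2701,2801"
)

# VLANs allowed toward the host but not carried on the peer-link, if any.
print(sorted(po40_allowed - po100_active))
```

An empty difference means the peer-link carries everything the host vPC needs, which points away from a simple VLAN mismatch as the cause.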

Any idea would be greatly appreciated.

Regards,

Vincent.

18 Replies

Hello Prashanth,

This certainly looks promising, especially the related bug information part :

N5K Bcast Packets flooded out of ingress vPC, vPCM out of sync with FWM.
Symptoms: On the impacted device, the port-channel belonging to the vPC is considered non-vPC internally, causing unknown unicast traffic arriving from the vPC peer to be forwarded towards the local port-channel. Typical symptoms include:
- Broadcast traffic is seen to be flooded back out of the ingress vPC (but by the peer device).
- MAC address tables incorrectly point towards a north- or eastbound vPC for southbound attached hosts/devices.
Conditions: The specific triggers are not currently known.
Workaround: Currently the only way to recover from this is via a reload.

Thanks for the hint !

Vincent.

Hello Prashanth,

FYI, we encountered another variant of this bug which impacted our platform even more (massive flooding because the vPC wouldn't learn MAC addresses), so we finally decided to upgrade. Since the upgrade, the MAC flapping no longer seems to occur.

Thank you for your help on this case.

Cheers,

Vincent.

Hi,

A little more on this. The root cause of the problem was that although the port channels were up, the vPC status was down due to an inconsistent state, as the following output shows.

Canal_auber_5548_6258# show vpc brief | inc Po30

30     Po30        up     success     success                    15,18,200,3


Canal_auber_5548_6258# sho int po30

port-channel30 is up

vPC Status: Down, vPC number: 30 [packets forwarded via vPC peer-link]

  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,

Canal_auber_5548_6258# show vpc consistency-parameters vpc 30


    Legend:

        Type 1 : vPC will be suspended in case of mismatch


Name                        Type  Local Value            Peer Value

-------------               ----  ---------------------- -----------------------

STP Port Type               1     Edge Trunk Port        Edge Trunk Port

STP Port Guard              1     None                   None

STP MST Simulate PVST       1     Default                Default

Shut Lan                    1     No                     No

VTP trunk status            2     Enabled                Enabled

mode                        1     -                      on

Native Vlan                 1     -                      1

Port Mode                   1     -                      trunk

MTU                         1     -                      1500

Duplex                      1     -                      full

Speed                       1     -                      10 Gb/s


Canal_auber_5548_6258# show int po30 switchport

  Operational Mode: trunk

  Access Mode VLAN: 1 (default)

  Trunking Native Mode VLAN: 1 (default)

Canal_auber_5548_6258# show port-channel summary | inc Po30

30    Po30(SU)    Eth      NONE      Eth103/1/1(P)
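The pattern in the output above (port channel up, `vPC Status: Down`, and local consistency values shown as `-`) can be spotted mechanically across many vPCs. This is only a sketch against the text as pasted in this post: the function names are made up for illustration, the two-or-more-spaces column split is an assumption about the table layout, and treating `-` as "no local value" is my reading of this output rather than documented behaviour.

```python
import re

def vpc_reported_down(show_interface_text):
    """True if 'show interface' output contains the 'vPC Status: Down' line."""
    return "vPC Status: Down" in show_interface_text

def missing_local_values(consistency_text):
    """Names of parameters whose local value is '-' while the peer has one."""
    missing = []
    for line in consistency_text.splitlines():
        # Columns in the pasted table are separated by runs of 2+ spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) != 4 or not cols[1].isdigit():
            continue  # skip legend, header, separator and blank lines
        name, _type, local, peer = cols
        if local == "-" and peer != "-":
            missing.append(name)
    return missing

consistency = """\
Name                        Type  Local Value            Peer Value
-------------               ----  ---------------------- -----------------------
STP Port Type               1     Edge Trunk Port        Edge Trunk Port
VTP trunk status            2     Enabled                Enabled
mode                        1     -                      on
Native Vlan                 1     -                      1
Port Mode                   1     -                      trunk
MTU                         1     -                      1500
Duplex                      1     -                      full
Speed                       1     -                      10 Gb/s
"""

interface = """\
port-channel30 is up
vPC Status: Down, vPC number: 30 [packets forwarded via vPC peer-link]
"""

print(vpc_reported_down(interface))
print(missing_local_values(consistency))
```

Running both checks per vPC catches the case where `show vpc brief` still reports "up / success" while the interface itself says the vPC is down.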

After the upgrade to 5.0(3)N1(1c) the problem remained, I guess because the vPCs were not reset during the ISSU; however, a reboot of the hosts attached to the faulty vPCs solved it.

Since then our VPCs are stable and all is well !

Cheers,

Vincent.

Do you know if this solution also applies when you have trunk ports instead of Ethernet channels?

 

interface Ethernet100/1/12

switchport mode trunk

switchport access vlan 408

switchport trunk allowed vlan 408, 435, 472, 484-485

duplex full