fly
Explorer

Nexus 7K: UDLD empty echo, spanning-tree dispute, Bridge Assurance, and LACP problems

Hi

Recently we hit the same problem on many different Nexus 7010 switches (Sup2E) running different software versions.

The problem appeared suddenly, and we found many log messages, such as:

1 The Nexus 7010 receives UDLD empty echo on many interfaces (port-channel members) and err-disables these interfaces.

2 The Nexus 7010 puts many interfaces into STP dispute state.

3 The Nexus 7010 Bridge Assurance action disables interfaces.

4 The Nexus 7010 cannot receive LACP PDUs, so it brings the physical interfaces down.

The state is not stable: an STP dispute may clear and then the interface goes into dispute again.

Because the problem is seen on many interfaces and on many Nexus 7010s, I do not think it is a physical unidirectional-link condition.

I think the Nexus 7010 control plane may be busy, or something happened to it, so it cannot send out STP BPDUs, LACP PDUs, or UDLD packets. We ran a monitor session on the control-plane traffic of one Nexus 7010 and found some burst traffic while the problem was happening. I have some questions:

The Nexus 7010 has control plane protection, so why did this happen?

Is it possible that some traffic hits the control plane and makes it busy?

We did not find a loop, a traffic storm, or (as far as we can tell) a CPU problem on the Nexus 7010 during the problem.

The customer is losing confidence in the Nexus 7010. We are trying to solve this problem, but it happens randomly.

Thank you

Tom

      

6 REPLIES
Sergiu.Daniluk
VIP Advisor

What can happen is that a high amount of traffic hits CoPP (Control Plane Policing). Once a class of traffic exceeds its threshold, the excess traffic matching that class is dropped. You should monitor CoPP in the problematic VDC and check whether you see the violated counter increase:

N7K# show policy-map interface control-plane | i "class|conform|violated"
<snip>
class-map copp-class-critical(match-any)
conformed 123126534 bytes; action: transmit
violated 0 bytes; action: drop
</snip>

To clear the statistics, use the "clear copp statistics" command.

I would recommend monitoring the value before and after the issue occurs to see which class of traffic is impacted.
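
The before/after comparison can be scripted. The following is only a minimal sketch in Python, assuming the filtered output shown above has been saved as text twice (once before and once after an incident). The function names `parse_copp` and `violated_delta` are made up for this illustration, and the AFTER sample (4096 violated bytes) is hypothetical, not taken from a real device.

```python
import re

def parse_copp(text):
    """Return {class_name: violated_bytes} from the filtered CoPP output.

    Violated counters are summed per class, because a real device may
    print one counter per module under the same class-map.
    """
    counters = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"class-map\s+(\S+?)\s*\(", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"violated\s+(\d+)\s+bytes", line)
        if m and current is not None:
            counters[current] = counters.get(current, 0) + int(m.group(1))
    return counters

def violated_delta(before, after):
    """Return only the classes whose violated byte count increased."""
    b, a = parse_copp(before), parse_copp(after)
    delta = {}
    for name, value in a.items():
        diff = value - b.get(name, 0)
        if diff > 0:
            delta[name] = diff
    return delta

# Sample copied from the output above.
BEFORE = """class-map copp-class-critical(match-any)
conformed 123126534 bytes; action: transmit
violated 0 bytes; action: drop
"""

# Hypothetical second snapshot, invented for the example.
AFTER = """class-map copp-class-critical(match-any)
conformed 123999999 bytes; action: transmit
violated 4096 bytes; action: drop
"""

if __name__ == "__main__":
    print(violated_delta(BEFORE, AFTER))
```

Any class that shows up in the result dropped traffic between the two snapshots, which narrows down which kind of control-plane traffic was policed during the event.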

 

Stay safe.

Sergiu

I see many drops in the default class, but we did not find dropped packets in the UDLD class on either of the two Nexus 7Ks. So why does the Nexus 7K on the other side seem to have lost UDLD packets? If the UDLD packets did not violate the CoPP UDLD class, I would expect that no UDLD packets were lost.

 

  

What NX-OS version do you have on your 7K?

Sorry for the delay.

The NX-OS version is 7.2(0)D1(1).

We just changed LACP to mode on between the two Nexus 7Ks, but there is still an access switch running LACP mode active toward the Nexus 7K, and it still has the problem.

I captured packets on the Nexus 7K during the problem and found some weird behaviour of the Nexus 7K. Please see attachment 1.

Nexus 7K A (17:d1:85) is the STP root for VLAN 998 and sends BPDUs. You can see packet 778 sent from the Nexus 7K A control plane to a downstream switch. After that, Nexus 7K A suddenly cannot send BPDUs for a while: from packet 778 to packet 34354 (1.13884 seconds to 53.205664 seconds) we cannot find any BPDU for VLAN 892 sent out from the Nexus 7K A control plane. During that time, STP disputes are reported on many interfaces.

We want to figure out why Nexus 7K A cannot send BPDUs during the problem.
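
To find such silent periods without scrolling through the capture, the BPDU timestamps can be checked for gaps. This is a rough sketch in Python, assuming the frame numbers and times of the BPDUs sent by one bridge have already been exported from the capture as (packet number, seconds) pairs. The function name `find_tx_gaps` and the 6-second threshold (three times the default 2-second hello time) are assumptions for illustration. The two sample frames are the ones quoted above.

```python
def find_tx_gaps(frames, max_gap=6.0):
    """Return (from_packet, to_packet, gap_seconds) for every pause
    between consecutive BPDUs that is longer than max_gap seconds.

    frames: iterable of (packet_number, time_in_seconds) for BPDUs
    sent by a single bridge.
    """
    ordered = sorted(frames, key=lambda f: f[1])
    gaps = []
    for (n1, t1), (n2, t2) in zip(ordered, ordered[1:]):
        if t2 - t1 > max_gap:
            gaps.append((n1, n2, round(t2 - t1, 6)))
    return gaps

# The two frames quoted in this post; a real list would hold every
# BPDU sent by the root bridge in the capture.
frames = [(778, 1.13884), (34354, 53.205664)]

if __name__ == "__main__":
    for first, last, gap in find_tx_gaps(frames):
        print(f"no BPDU between packet {first} and {last}: {gap:.3f} s")
```

With the two frames above, the gap is about 52 seconds, far longer than a neighbour waits before it considers the root information stale.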

However, we are using F2e line cards, and the F2e line card inserts a shim header, so we cannot decode these 0xf001 frames. After packet 778, the Nexus 7K A control plane receives some BPDUs from downstream switches, but Wireshark marks these BPDUs as protocol 0xf001 because of the F2e shim header, and we cannot read them. You can see this in the attached picture 0xf001.

I will write a new message on this board.

 

thank you

Tom

I am trying to figure out the cause of this problem. Here is a summary of the case.

1 Two Nexus 7K switches are connected by one port-channel, port-channel 1 (E1/3, 1/4, 3/1). It is not vPC, just a normal port-channel.

2 Many Cisco access switches are connected, with more switches behind them (some are connected and managed by another customer, and we cannot log in). I found at least 3 layers of Layer 2 trunked access switches (I don't like it).

    Nexus 7K A is configured as STP root for many VLANs; Nexus 7K B is configured as STP root for some VLANs.

 

3 From time to time, usually on Nexus 7K A (because it is STP root for many VLANs), we see:

  • no LACP received from the other Nexus 7K (B) or from an access switch, and the physical interface is shut down
  • many interfaces report STP dispute at the same time, then clear, then detect dispute again
  • the problem lasts 5-10 minutes and then goes away
  • MAC flapping on the access switches during the problem; the root disappears and a new root is found
  • duplicate ARP for the VRRP/HSRP VIP; show spanning-tree internal event-history on the Nexus shows that Nexus 7K B lost the spanning-tree root and received a new root from a downstream access switch
  • the spanning-tree problem above happens for VLANs whose root is on Nexus 7K A; for the VLANs whose root is on Nexus 7K B we have not seen the root move

 

4 We captured packets on the Nexus 7K A control plane and found a few things. First, LACP:

    (1) In attachment lacp1 you can see that Nexus 7K A (MAC c2:d8 is Nexus 7K A 3/1, d16e is 1/3, d16f is 1/4) sends LACP many times but does not receive any reply.

    (2) After 219 seconds (attachment lacp2), Nexus 7K B sends an LACP packet (d63e is the Nexus 7K B E1/3 interface MAC). In the picture you can see this packet is marked as protocol 0xf001: because we use F2e line cards, the Nexus 7K inserts an F2e shim header and we cannot decode the packet. I guess it is an LACP PDU.

    (3) We have set LACP to mode on instead of mode active.

    (4) We found input discards incrementing on the port-channel 1 interface (10G), and we changed the downstream access switch from 1G to 10G.

    (5) But this has not actually solved the problem.

5 Now the spanning-tree dispute problem:

    (1) In attachment bpdu1 you can see the Nexus 7K (STP root) sending BPDUs to the other switches. But sometimes Nexus 7K A stops sending root BPDUs: from packet 778 to packet 34353 (MAC 17:d1:85 is the Nexus 7K A interface MAC; 1.13884 seconds to 53.205664 seconds) no BPDU is sent out from the Nexus 7K A control plane.

    (2) In attachment bpdu2 you can see that after packet 778 and before packet 34353 there are many BPDUs sent by other switches, but these packets are (again) marked as protocol 0xf001, and I cannot decode them.

    (3) At the same time, Nexus 7K A reports many spanning-tree root losses and superior BPDUs received from downstream access switches, and it reports spanning-tree dispute on many interfaces (including port-channel 1 between the two Nexus 7Ks).

    (4) The spanning-tree dispute problem lasts 5-10 minutes and then goes away. It happens again about a month later, or sometimes after some days.

    (5) We can see STP Tx drops for the netstack client:

show system internal pktmgr client

Client uuid: 303, 2 filters, pid 11909
Filter 0: EthType 0x4242, Dmac 0180.c200.0000,
Rx: 8817170, Drop: 0
Filter 0: EthType 0x010b, Snap 267,Dmac 0100.0ccc.cccd,
Rx: 965892481, Drop: 0
Options: TO 0, Flags 0x1, AppId 0, Epid 0
Ctrl SAP: 171
Total Data tags : 1 Data tag 1: 131088
Total Rx: 974709651, Drop: 0, Tx: 1599926009, Drop: 924505
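
To put the Tx drop counter in proportion, the totals line can be parsed and the drop fraction computed. This is a small sketch in Python. The function names are hypothetical, and it assumes the "Total Rx ... Tx ... Drop" line keeps the layout pasted above. Note that these counters are lifetime totals, so the fraction says nothing about when the drops happened. Taking the same reading before and after an incident would be more telling.

```python
import re

TOTALS = re.compile(
    r"Total Rx:\s*(\d+),\s*Drop:\s*(\d+),\s*Tx:\s*(\d+),\s*Drop:\s*(\d+)"
)

def pktmgr_totals(text):
    """Extract the Rx/Tx totals and drop counters for one pktmgr client."""
    m = TOTALS.search(text)
    if m is None:
        raise ValueError("no 'Total Rx' line found")
    rx, rx_drop, tx, tx_drop = map(int, m.groups())
    return {"rx": rx, "rx_drop": rx_drop, "tx": tx, "tx_drop": tx_drop}

def tx_drop_fraction(totals):
    """Tx drops divided by Tx packets (0.0 if nothing was sent)."""
    return totals["tx_drop"] / totals["tx"] if totals["tx"] else 0.0

# Line copied from the output above.
SAMPLE = "Total Rx: 974709651, Drop: 0, Tx: 1599926009, Drop: 924505"

if __name__ == "__main__":
    totals = pktmgr_totals(SAMPLE)
    print(totals)
    print(f"Tx drop fraction: {tx_drop_fraction(totals):.4%}")
```

For the counters above the fraction is roughly 0.06% over the lifetime of the client, with no Rx drops at all, so the drops are on the transmit side only.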

6 This is weird:

    (1) LACP problem: it seems the Nexus 7K A control plane sends out LACP but does not receive any.

    (2) STP dispute problem: the Nexus 7K A control plane does not send BPDUs for some time.

 

I think some STP event may happen during the problem. The problem affects the VLANs whose STP root is on Nexus 7K A, not the VLANs whose root is on Nexus 7K B. I think Nexus 7K A is root for more VLANs, and spanning-tree BPDUs hit Nexus 7K A and cause its STP process to get stuck for some time. I found that the STP role and state of interfaces on Nexus 7K B changed rapidly, but I cannot read the 0xf001 packets, so I do not know what they are.

But why is LACP affected? Does a spanning-tree blocking state affect LACP sent from Nexus 7K B to 7K A?

By the way, we found many Cisco 3750 access switches running at high CPU, with the Hulc LED process high. We found that a different IOS version may lower the Hulc LED process CPU, and we are trying to sort this out. Maybe the access switches affect spanning tree: for example, in a chain Nexus 7K A - switch A - switch B - switch C, if switch C has a problem it may affect STP in the whole network.

 

How can I decode protocol 0xf001 in Wireshark?

 Thank you

Tom

      

Hi Sergiu,

After many months of checking, we found the reason for the Nexus 7K STP dispute problem.

I found microbursts during the problem; please see the attachment.

We have seen different attack methods in several recent cases, such as:

1 TCP SYN traffic where the firewall had no null route configured, which caused a small Layer 3 loop between the Nexus 7K and the firewall

2 NTP monlist attack

3 ICMP scan

 

During a microburst we can see the attack traffic above hitting the control plane, and the Nexus 7K cannot, for example, send BPDUs or receive LACP normally.
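
For anyone who wants to look for the same pattern in their own control-plane capture, here is a minimal sketch in Python of one way to spot microbursts: count packets per short time window and flag the windows that exceed a threshold. The function name `find_bursts`, the 10 ms window, and the threshold of 100 packets are assumptions chosen for illustration, and the timestamps below are synthetic, not from our capture.

```python
from collections import Counter

def find_bursts(timestamps, window=0.01, threshold=100):
    """Return [(window_start_seconds, packet_count), ...] for every
    window of `window` seconds holding more than `threshold` packets.

    timestamps: packet arrival times in seconds (e.g. exported from
    a capture of control-plane traffic).
    """
    counts = Counter(int(t / window) for t in timestamps)
    return sorted(
        (index * window, n) for index, n in counts.items() if n > threshold
    )

# Synthetic data: sparse background traffic plus 500 packets packed
# into about 5 ms, all inside the window that starts at 1.0 s.
background = [0.05 + 0.1 * i for i in range(20)]
burst = [1.0021 + i * 0.00001 for i in range(500)]

if __name__ == "__main__":
    for start, count in find_bursts(background + burst):
        print(f"burst: {count} packets in the window starting at {start:.2f} s")
```

A one-second average of the same traffic would look harmless, which is why short windows are needed to see this kind of burst.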

 

After the microburst, the attack traffic disappears and everything works fine.

We are trying to isolate the attack traffic by using ACLs to deny the Layer 3 loop, NTP monlist, and ICMP traffic that hits the Nexus 7K's own interfaces.

I have a question: why can't CoPP protect the CPU resources of the Nexus 7K control plane, so that processes on the control plane get stuck? Also, I have not seen a high-CPU situation on the Nexus 7K.

   thank you

Tom