07-19-2021 06:22 PM
Hi
Recently we hit the same problem on many different Nexus 7010 switches (Sup2E) running different software versions.
The problem appears suddenly, and we see many log messages, such as:
1 The Nexus 7010 receives empty UDLD echoes on many interfaces (port channel) and err-disables those interfaces.
2 The Nexus 7010 puts many interfaces into STP dispute state.
3 Bridge Assurance on the Nexus 7010 disables interfaces.
4 The Nexus 7010 can't receive LACP PDUs, so it brings the physical interfaces down.
The state is not stable: an interface may clear the STP dispute state and then go back into dispute again.
Because this problem shows up on many interfaces and on many Nexus 7010s, I don't think it is a physical unidirectional link condition.
I think the Nexus 7010 control plane may be busy, or something else is happening, so it can't send out STP BPDUs, LACP PDUs, or UDLD packets. We ran a monitor session on the control plane traffic of one Nexus 7010 and found some burst traffic while the problem was happening. I have some questions:
The Nexus 7010 has control plane protection, so why did this happen?
Is it possible that some traffic hits the control plane and keeps it busy?
We didn't find a loop or a traffic storm during the problem and, as far as we can tell, no CPU problem on the Nexus 7010.
The customer has lost some confidence in the Nexus 7010. We are trying to solve this problem, but it happens randomly.
Thank you
Tom
07-19-2021 11:15 PM
What can happen is that a high amount of traffic hits CoPP (Control Plane Policing). Once a class of traffic exceeds its threshold, traffic matching that class is dropped, including legitimate control packets in the same class. You should monitor CoPP in the problematic VDC and check whether the violated counters are increasing:
N7K# show policy-map interface control-plane | i "class|conform|violated"
<snip>
class-map copp-class-critical(match-any)
conformed 123126534 bytes; action: transmit
violated 0 bytes; action: drop
</snip>
To clear the statistics, use the "clear copp statistics" command.
I would recommend monitoring the value before and after the issue occurs to see which class of traffic is impacted.
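In case it helps, here is a minimal Python sketch of that before/after comparison. It assumes the output of the command above is saved as text before and after an event. The function names (parse_copp, violated_delta) and the "after" sample are hypothetical, and the parser only understands the class-map, conformed, and violated lines shown in the snippet.

```python
import re

CLASS_RE = re.compile(r"class-map\s+(\S+?)\s*\(")
COUNTER_RE = re.compile(r"(conformed|violated)\s+(\d+)\s+bytes")


def parse_copp(output):
    """Return {class_name: {"conformed": bytes, "violated": bytes}}."""
    counters = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        m = CLASS_RE.match(line)
        if m:
            current = m.group(1)
            counters.setdefault(current, {"conformed": 0, "violated": 0})
            continue
        m = COUNTER_RE.search(line)
        if m and current is not None:
            # Sum in case the same counter is listed more than once per class.
            counters[current][m.group(1)] += int(m.group(2))
    return counters


def violated_delta(before, after):
    """Return classes whose violated byte count grew between two snapshots."""
    delta = {}
    for name, stats in after.items():
        old = before.get(name, {}).get("violated", 0)
        if stats["violated"] > old:
            delta[name] = stats["violated"] - old
    return delta


# "Before" snapshot: the lines from the snippet above.
before = parse_copp("""
class-map copp-class-critical(match-any)
conformed 123126534 bytes; action: transmit
violated 0 bytes; action: drop
""")

# Hypothetical "after" snapshot, invented only to exercise the code.
after = parse_copp("""
class-map copp-class-critical(match-any)
conformed 123130000 bytes; action: transmit
violated 4096 bytes; action: drop
""")

print(violated_delta(before, after))
```

Any class that shows up in the result had drops between the two snapshots, which is the class to look at first.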
Stay safe.
Sergiu
07-19-2021 11:34 PM
I see many drops in the default class, but we didn't find any dropped packets in the UDLD class on either of the two Nexus 7Ks. Why does the Nexus 7K on the other side seem to lose UDLD packets? If UDLD packets didn't violate the CoPP UDLD class, I would expect no UDLD packets to be lost.
07-20-2021 11:33 AM
What NX-OS version are you running on your 7K?
12-28-2021 04:48 PM
Sorry for the delay.
The NX-OS version is 7.2(0)D1(1).
We changed LACP to mode on between the two Nexus 7Ks, but there is still an access switch running LACP mode active toward the Nexus 7K, and the problem is still there.
I captured packets on the Nexus 7K during the problem and found some weird behaviour; please see attachment 1.
Nexus 7kA (17:d1:85) is the STP root for VLAN 998 and sends BPDUs. You can see packet 778 sent from the Nexus 7kA control plane to the downstream switch. After that, Nexus 7kA suddenly stops sending BPDUs for a while: from packet 778 to packet 34354 (from 1.13884 seconds to 53.205664 seconds) we can't find any BPDU for VLAN 892 sent out from the Nexus 7kA control plane. In the meantime, STP disputes are reported on many interfaces.
We want to figure out why Nexus 7kA can't send BPDUs during the problem.
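To make a silence like this easier to spot in a long capture, a check such as the following could be run over the BPDU timestamps exported from Wireshark. This is only a sketch: find_bpdu_gaps is a hypothetical helper, the 2-second hello time is the 802.1D default and may differ from the configured value, and flagging gaps longer than three hello intervals is a threshold chosen for illustration.

```python
def find_bpdu_gaps(samples, hello_time=2.0, missed_hellos=3):
    """Return (prev_packet, next_packet, gap_seconds) for each long silence.

    samples: list of (packet_number, timestamp_seconds) for BPDUs sent by
    the root, in capture order.
    """
    threshold = hello_time * missed_hellos
    gaps = []
    for (prev_no, prev_t), (next_no, next_t) in zip(samples, samples[1:]):
        gap = next_t - prev_t
        if gap > threshold:
            gaps.append((prev_no, next_no, gap))
    return gaps


# The two BPDUs on either side of the silence, taken from the capture above.
samples = [(778, 1.13884), (34354, 53.205664)]

for prev_no, next_no, gap in find_bpdu_gaps(samples):
    print(f"no BPDU between packet {prev_no} and {next_no}: {gap:.1f} s")
```

With the two timestamps above, the reported gap is about 52 seconds, far longer than a few hello intervals.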
However, because we are using F2E line cards, which insert a shim header, we can't decode these 0xf001 frames. After packet 778, the Nexus 7kA control plane receives some BPDUs from the downstream switch, but Wireshark marks these BPDUs as protocol 0xf001 and we can't read them. You can see this in the attached picture 0xf001.
I will write a new message on this board.
Thank you
Tom
12-28-2021 06:07 PM
I am trying to figure out the reason for this problem. Here is a summary of the case:
1 The two Nexus 7K switches are connected by one port-channel 1 (E1/3, E1/4, E3/1). It is not vPC, just a normal port channel.
2 Many Cisco access switches are connected, with more switches cascaded behind them (connected and managed by another customer, so we can't log in). I found at least 3 layers of Layer 2 trunked access switches (I don't like it).
Many VLANs have their STP root configured on Nexus 7kA; some VLANs have their STP root on Nexus 7kB.
3 The problem is reported occasionally, usually by Nexus 7kA (because many STP roots are configured on Nexus 7kA).
4 We captured packets on the Nexus 7kA control plane and found the following. First, LACP:
(1) In the attached picture lacp1 you can see that Nexus 7kA (MAC c2:d8 is Nexus 7kA 3/1, d16e is 1/3, d16f is 1/4) sends LACP PDUs many times but doesn't receive any reply.
(2) After 219 seconds (attachment lacp2), Nexus 7kB sends an LACP packet (d63e is the MAC of the Nexus 7kB E1/3 interface). In this picture the packet's protocol is marked as 0xf001: because we use F2E line cards, the Nexus 7K inserts an F2E shim header and we can't decode the packet. I guess it is an LACP PDU.
(3) We just set LACP to mode on instead of mode active.
(4) We found input discards incrementing on the port-channel 1 interface (10G), and we upgraded the downstream access switch from 1G to 10G.
(5) But this hasn't actually solved the problem.
5 Now the spanning tree dispute problem:
(1) In attachment bpdu1 you can see the Nexus 7K (STP root) sending BPDUs to the other switches. But sometimes Nexus 7kA stops sending root BPDUs: from packet 778 to packet 34353 (MAC 17:d1:85 is the Nexus 7kA interface MAC; 1.13884 seconds to 53.205664 seconds) no BPDUs are sent out from the Nexus 7kA control plane.
(2) In attachment bpdu2 you can see that after packet 778 and before packet 34353 many BPDUs are sent by other switches, but these packets are (again) marked as protocol 0xf001, and I can't decode them.
(3) At the same time, Nexus 7kA reports many spanning tree root-lost events and superior BPDUs received from downstream access switches, and it reports spanning tree dispute on many interfaces (including port-channel 1 between the two Nexus 7Ks).
(4) The spanning tree dispute problem lasts for 5 to 10 minutes and then goes away. It may happen again a month later, or after some days.
(5) We can see netstack STP Tx drops:
show system internal pktmgr client
Client uuid: 303, 2 filters, pid 11909
Filter 0: EthType 0x4242, Dmac 0180.c200.0000,
Rx: 8817170, Drop: 0
Filter 0: EthType 0x010b, Snap 267,Dmac 0100.0ccc.cccd,
Rx: 965892481, Drop: 0
Options: TO 0, Flags 0x1, AppId 0, Epid 0
Ctrl SAP: 171
Total Data tags : 1 Data tag 1: 131088
Total Rx: 974709651, Drop: 0, Tx: 1599926009, Drop: 924505
6 This is weird:
(1) LACP problem: it seems the Nexus 7kA control plane sends LACP PDUs but doesn't receive any.
(2) STP dispute problem: the Nexus 7kA control plane doesn't send BPDUs for some time.
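The Tx drop counter in item 5 (5) is cumulative, so a single reading doesn't show whether the drops happen during the problem. A rough sketch for comparing two readings of "show system internal pktmgr client" follows. parse_totals and tx_drop_delta are hypothetical helper names, and the second reading is invented for illustration; only the first is taken from the output above.

```python
import re

TOTAL_RE = re.compile(
    r"Total Rx:\s*(\d+),\s*Drop:\s*(\d+),\s*Tx:\s*(\d+),\s*Drop:\s*(\d+)"
)


def parse_totals(output):
    """Extract the Rx/Tx totals and drop counters from the 'Total' line."""
    m = TOTAL_RE.search(output)
    if m is None:
        return None
    rx, rx_drop, tx, tx_drop = (int(v) for v in m.groups())
    return {"rx": rx, "rx_drop": rx_drop, "tx": tx, "tx_drop": tx_drop}


def tx_drop_delta(first, second):
    """Number of Tx drops added between two readings."""
    return second["tx_drop"] - first["tx_drop"]


# First reading: the 'Total' line from the output above.
first = parse_totals(
    "Total Rx: 974709651, Drop: 0, Tx: 1599926009, Drop: 924505"
)

# Hypothetical second reading, invented only to exercise the code.
second = parse_totals(
    "Total Rx: 974800000, Drop: 0, Tx: 1600000000, Drop: 924900"
)

print(tx_drop_delta(first, second))
```

If the delta only grows during a dispute event, that ties the Tx drops to the problem window rather than to background noise.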
I think some STP event may have happened during the problem. It affects the VLANs whose STP root is on Nexus 7kA, not the VLANs whose root is on Nexus 7kB. Nexus 7kA is root for more VLANs, so I suspect spanning tree BPDUs hit Nexus 7kA and caused its STP process to get stuck for some time. I also saw the STP role and state of interfaces on Nexus 7kB change rapidly, but I can't read the 0xf001 packets, so I don't know what they contain.
But why is LACP affected? Does the spanning tree blocking state affect LACP sent from Nexus 7kB to 7kA?
By the way, we found many Cisco 3750 access switches running with high CPU; the Hulc LED process is high. We found that a different IOS version may lower the Hulc LED process CPU, and we are trying to sort this out. An access switch may be affecting spanning tree, and in a chain such as nexus7ka-switcha-switchb-switchc, a problem on switchc might affect STP in the whole network.
How can I decode protocol 0xf001 in Wireshark?
Thank you
Tom
01-04-2022 12:47 AM
Hi Sergiu,
After many months of checking, we found the reason for the Nexus 7K STP dispute problem.
I found micro bursts during the problem; please see the attachment.
We have seen different attack methods in several recent cases, such as:
1 TCP SYN traffic, where the firewall had no null route configured, causing a small Layer 3 loop between the Nexus 7K and the firewall
2 NTP monlist attack
3 ICMP scan
During a micro burst we can see the above attack traffic hitting the control plane, and the Nexus 7K can't, for example, send BPDUs or receive LACP normally.
After the micro burst attack traffic disappears, everything works fine.
We are trying to isolate the attack traffic by using ACLs to block the Layer 3 loop, NTP monlist, and ICMP traffic that hits the Nexus 7K's own interfaces.
I have a question: why can't CoPP protect CPU resources on the Nexus 7K control plane, so that a process on the control plane gets stuck? I also haven't seen a high CPU condition on the Nexus 7K.
Thank you
Tom