10-26-2022 04:59 AM
Hi All,
We are having a weird issue and while we wait on Cisco TAC to evaluate I thought of polling the community.
In this setup we have two ISR routers in an HSRP group. They are connected to two Catalyst 9300 switches with a port channel between them.
We configured an SVI on both switches and added it to the HSRP group. Immediately after adding HSRP, you can feel the terminal getting laggy and slow.
We are connected to the switches via SSH on the management port.
We also had an event where one of the switches became active even though other routers with higher priority were already active. During this time we couldn’t poll the switch in question with SNMP, and it was not responding to ICMP (management port). The device did not reboot.
10-28-2022 07:41 AM - edited 10-28-2022 07:41 AM
2 14 Forus traffic Yes 4000 4000 4223501 5320
There are drops in the Forus traffic queue. Forus is any traffic directed to the CPU, that is:
dmac = Router_MAC
DIP = Router_IP
So I ask you to double-check the IP you added in each VLAN and also the MAC address of the SVI (show standby).
Check this on both switches.
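The checks suggested above could be run roughly as follows on each switch. This is only a sketch: `<vlan-id>` is a placeholder for whichever SVI carries the HSRP group, and the output format varies by release.

```
! Run on both switches
show standby brief
show standby vlan <vlan-id>
show interfaces vlan <vlan-id> | include bia
```

The things to compare are the active and standby roles, the virtual IP, and whether each SVI reports its own distinct MAC address.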
10-28-2022 08:14 AM
I confirmed the IP I added to the VLAN is correct and unused.
! SWITCH1
interface Vlan100
ip address 3.3.3.252 255.255.255.0
standby 0 ip 3.3.3.254
standby 0 priority 50
standby 0 preempt delay minimum 120
standby 0 authentication md5 key-string 7 BlahBlahBlah
load-interval 30
shutdown
end
!SWITCH2
interface Vlan100
ip address 3.3.3.253 255.255.255.0
standby 0 ip 3.3.3.254
standby 0 priority 40
standby 0 preempt delay minimum 120
standby 0 authentication md5 key-string 7 BlahBlahBlah
load-interval 30
shutdown
end
!RTR1
interface GigabitEthernet0/0/3
ip address 3.3.3.250 255.255.255.0
no ip proxy-arp
standby 0 ip 3.3.3.254
standby 0 priority 130
standby 0 preempt
standby 0 authentication md5 key-string 7 BlahBlahBlah
standby 0 track 11 decrement 40
standby 0 track 22 decrement 40
load-interval 30
negotiation auto
!RTR2
interface GigabitEthernet0/0/3
ip address 3.3.3.251 255.255.255.0
no ip proxy-arp
standby 0 ip 3.3.3.254
standby 0 priority 120
standby 0 preempt delay minimum 120
standby 0 authentication md5 key-string 7 BlahBlahBlah
standby 0 track 11 decrement 40
standby 0 track 22 decrement 40
load-interval 30
negotiation auto
10-28-2022 11:39 AM
sh int vlan x | i bia
Hardware is EtherSVI, address is xxxx.xxxx.xxxx <<- check this MAC address for each VLAN SVI
10-28-2022 11:58 AM
They all have unique MAC addresses.
switch1#show int vlan 100 | i bia
Hardware is Ethernet SVI, address is cc7f.7649.8951 (bia cc7f.7649.8951)
switch2#show int vlan 100 | i bia
Hardware is Ethernet SVI, address is cc7f.7653.d451 (bia cc7f.7653.d451)
rtr1#show int GigabitEthernet0/0/1 | i bia
Hardware is ISR4451-X-4x1GE, address is c4f7.d59d.bd21 (bia c4f7.d59d.bd21)
rtr2#show int GigabitEthernet0/0/1 | i bia
Hardware is ISR4451-X-4x1GE, address is d478.9b22.62e1 (bia d478.9b22.62e1)
10-28-2022 12:46 PM
One more step before we find the solution.
A protocol with a huge number of CPU-bound packets may impact other protocols in the same class, as some of these protocols share the same policer. For example, Address Resolution Protocol (ARP) shares 4000 hardware policers with an array of host protocols like Telnet, Internet Control Message Protocol (ICMP), SSH, FTP, and SNMP in the system-cpp-police-forus class. If there is an ARP poisoning or an ICMP attack, hardware policers start throttling any incoming traffic that exceeds 4000 packets per second to protect the CPU and the overall integrity of the system. As a result, ARP and ICMP host protocols are dropped, along with any other host protocols that share the same class.
Sorry to say it, but some Cisco information is not arranged in a way that is easy for an engineer to work with.
Anyway, the quote above is from the Cisco documentation. It describes what you are facing (loss of SNMP, loss of SSH), and we have already seen that the Forus traffic queue has drops.
But the last piece is still missing: what makes HSRP fill this queue?
Could it be because no gateway is configured on the hosts? That would make every host send ARP requests for MAC addresses, which could fill the queue this fast.
Can we check this point?
And since you also have HSRP flapping, we need to check the HSRP config again (a deeper look this time).
10-28-2022 12:53 PM
I'm not following the last request/post
10-28-2022 01:12 PM
The CPU has a queue called Forus traffic, and we see drops in this queue.
This queue receives CPU-bound traffic for any protocol whose destination, as I mentioned before, is the router/L3 switch itself.
2 14 Forus traffic Yes 4000 4000 4223501 5320
When the queue is full it starts dropping frames, including HSRP and SNMP (you mentioned that you have an issue with SNMP).
But here is the question: what makes the queue full?
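To see whether the drops are still happening, the counters can be sampled twice a short time apart. This is a sketch that assumes the row above comes from the CPU policer statistics on the Catalyst 9300 and that its last two columns are dropped bytes and dropped frames. The exact command path can differ between IOS-XE releases.

```
! Run twice, a minute or so apart, and compare the drop counters
show platform hardware fed switch active qos queue stats internal cpu policer | include Forus
!
! Optional: CoPP policy as seen from the control plane
show policy-map control-plane
```

If the drop counters do not change between the two runs, the drops were historical and not an ongoing condition.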
10-28-2022 02:10 PM
There is a debug command, but it affects the switch CPU (high CPU utilization), so I would like to use debug as the last option.
But
Switch#show platform software fed [switch] active punt rates interfaces
can show us which interface is punting a high rate of traffic to the CPU.
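If the per-interface view is empty or not available on a given release, the per-queue and per-cause punt views are another place to look. This is a sketch only: these keywords are believed to exist on recent IOS-XE releases for the Catalyst 9000 family, but their availability depends on the software version, so treat them as an assumption to verify with `?` on the device.

```
show platform software fed switch active punt cpuq rates
show platform software fed switch active punt cause summary
```

These show which CPU queue and which punt cause account for the traffic, even when no single interface stands out.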
10-28-2022 02:27 PM
"show platform software fed switch active punt rates interfaces" does not return any output.
11-01-2022 09:49 AM
Just an update on this. We did a packet capture on the CPU for ~1 minute (actually 54 seconds). During this time we saw over 1000 ARP requests to the CPU. I think 1000 ARP requests for a /24 network is somewhat high.
Would ARP anti-flood mitigate this behavior? Do discarded packets still make it to the CPU?
11-01-2022 09:56 AM
So you have an ARP poisoning attack.
I think DAI will work here; check the link below.
https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst9600/software/release/16-12/configuration_guide/sec/b_1612_sec_9600_cg/configuring_dynamic_arp_inspection.html
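A minimal DAI sketch, under assumptions that are not confirmed in this thread: VLAN 100 is the affected VLAN, DHCP snooping is used to build the bindings DAI validates against, and `Port-channel1` is a hypothetical name for the link toward the routers and the peer switch. Statically addressed devices have no snooping binding, so their ports are trusted here; an ARP ACL is the alternative.

```
! Global configuration
ip dhcp snooping
ip dhcp snooping vlan 100
ip arp inspection vlan 100
!
! Hypothetical uplink: trusted, so its ARP traffic is not inspected
interface Port-channel1
 ip dhcp snooping trust
 ip arp inspection trust
!
! Verification (privileged EXEC)
show ip arp inspection vlan 100
show ip arp inspection statistics vlan 100
```

Enabling DAI without valid bindings or trusted ports drops legitimate ARP, so this would normally be tried in a maintenance window first.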
11-01-2022 10:55 AM - edited 11-01-2022 11:05 AM
Why would this affect only the switches and not routers? I get that routers are more capable devices but we don't even see a slight jump in the CPU.
Also, DAI places the port in err-disable mode, which would be bad. Wouldn't ARP anti-flood be better?
11-01-2022 12:53 PM
""DAI place port to error disable""
This is how DAI protects your network; if it allowed the traffic, the rest of your network would face connectivity issues.
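For context, DAI err-disables a port only when the incoming ARP rate exceeds the configured limit; invalid ARP packets are otherwise dropped and logged. The limit and the recovery behavior can be tuned, as in the sketch below. The interface name and the values are hypothetical and would need to match the real access ports and the expected ARP load.

```
! Hypothetical untrusted access port: raise the ARP rate limit
interface GigabitEthernet1/0/10
 ip arp inspection limit rate 100 burst interval 1
!
! Let ports recover automatically after a DAI err-disable
errdisable recovery cause arp-inspection
errdisable recovery interval 300
```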
""we don't even see a slight jump in the CPU.""
That is because CoPP protects your CPU from high-rate traffic by dropping it. It drops ARP, SNMP, and HSRP packets, because CoPP cannot classify frames any further within the same queue.
But you mentioned you used debug (my last option, as I mentioned above). Can you get the source MAC address of the ARP requests?
This would help us a lot:
1- You can trace the MAC address to the connected device and disconnect it.
2- If the MAC address is the real MAC address of one of the L3 switch ports, then there is surely something wrong in your L2 design: either there is an L2 loop, or there is a bug in HSRP that makes the L3 switch send GARP at this high rate.
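One way to get the source MAC address without debug is an Embedded Packet Capture on the control plane, followed by a MAC table lookup. This is a sketch: `CPUCAP` is an arbitrary capture name, the buffer size is an example value, and `xxxx.xxxx.xxxx` is a placeholder for the MAC address seen in the capture.

```
! Capture traffic punted to the CPU (privileged EXEC)
monitor capture CPUCAP control-plane in
monitor capture CPUCAP match any
monitor capture CPUCAP buffer size 10
monitor capture CPUCAP start
! Wait while the problem is occurring, then stop and inspect
monitor capture CPUCAP stop
show monitor capture CPUCAP buffer brief | include ARP
show monitor capture CPUCAP buffer detailed
!
! Trace the source MAC to the port it was learned on
show mac address-table address xxxx.xxxx.xxxx
```

If the lookup points at the port channel between the switches, repeat it on the peer switch to follow the MAC address toward its real source.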