
HSRP Causes Terminal Lag

SniffingPackets
Level 1

Hi All,

We are having a weird issue, and while we wait on Cisco TAC to evaluate it I thought I would poll the community.

In this setup we have two ISR routers in an HSRP group. They are connected to two Catalyst 9300 switches with a port channel between them.

We configured an SVI on both switches and added it to the HSRP group. Immediately after adding the SVI to the HSRP group, the terminal session becomes noticeably laggy and slow.

We are connected to the switches via SSH on the management port.

We also had an event where one of the switches became active even though a router with a higher priority was already active. During this time we couldn't poll the switch in question with SNMP, and ICMP was not responding (management port). The device did not reboot.


 

 

2    14     Forus traffic               Yes     4000      4000     4223501      5320

 

 

There are drops in the Forus traffic queue. "Forus" is any traffic addressed to the device itself and therefore sent to the CPU:

dmac = Router_MAC
DIP = Router_IP
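
To make the counter row easier to read, here is a minimal Python sketch that splits it into named fields and works out the average size of a dropped frame. The field names (`rate_default`, `drop_bytes`, `drop_frames`, and so on) are assumptions based on the usual column order of the CPU policer statistics table; the thread only shows the data row, not the header.

```python
# Field names are ASSUMED from the usual layout of the CPU policer statistics
# table (QId, PlcIdx, Queue Name, Enabled, default Rate, set Rate,
# Drop(Bytes), Drop(Frames)); the thread shows only the data row.
ROW = "2    14     Forus traffic               Yes     4000      4000     4223501      5320"


def parse_policer_row(row: str) -> dict:
    """Split one policer statistics row into named fields."""
    parts = row.split()
    # The queue name can contain spaces, so peel fixed fields from both ends.
    return {
        "qid": int(parts[0]),
        "plc_idx": int(parts[1]),
        "name": " ".join(parts[2:-5]),
        "enabled": parts[-5] == "Yes",
        "rate_default": int(parts[-4]),
        "rate_set": int(parts[-3]),
        "drop_bytes": int(parts[-2]),
        "drop_frames": int(parts[-1]),
    }


stats = parse_policer_row(ROW)
avg_dropped_frame = stats["drop_bytes"] / stats["drop_frames"]
print(f"{stats['name']}: {stats['drop_frames']} frames dropped, "
      f"average {avg_dropped_frame:.0f} bytes per dropped frame")
```

If the last two columns really are bytes and frames, the average dropped frame is several hundred bytes, which is much larger than a minimum-size ARP frame. That would suggest the drops are not made up of ARP alone.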


So please double-check the IP address you added to each VLAN and also the MAC address of the SVI (show standby).
Check this on both switches.

 

I confirmed that the IP I added to the VLAN is correct and unused.

! SWITCH1
interface Vlan100
 ip address 3.3.3.252 255.255.255.0
 standby 0 ip 3.3.3.254
 standby 0 priority 50
 standby 0 preempt delay minimum 120
 standby 0 authentication md5 key-string 7 BlahBlahBlah
 load-interval 30
 shutdown
end

!SWITCH2
interface Vlan100
 ip address 3.3.3.253 255.255.255.0
 standby 0 ip 3.3.3.254
 standby 0 priority 40
 standby 0 preempt delay minimum 120
 standby 0 authentication md5 key-string 7 BlahBlahBlah
 load-interval 30
 shutdown
end

!RTR1
interface GigabitEthernet0/0/3
 ip address 3.3.3.250 255.255.255.0
 no ip proxy-arp
 standby 0 ip 3.3.3.254
 standby 0 priority 130
 standby 0 preempt
 standby 0 authentication md5 key-string 7 BlahBlahBlah
 standby 0 track 11 decrement 40
 standby 0 track 22 decrement 40
 load-interval 30
 negotiation auto

!RTR2
interface GigabitEthernet0/0/3
 ip address 3.3.3.251 255.255.255.0
 no ip proxy-arp
 standby 0 ip 3.3.3.254
 standby 0 priority 120
 standby 0 preempt delay minimum 120
 standby 0 authentication md5 key-string 7 BlahBlahBlah
 standby 0 track 11 decrement 40
 standby 0 track 22 decrement 40
 load-interval 30
 negotiation auto
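
As a sanity check on which device should be active with the priorities above, here is a rough Python sketch. It is a simplified model only: it picks the highest effective priority and breaks ties on the higher interface IP address, and it ignores hello/hold timers, preempt delays, and which device currently holds the active role. `Peer`, `effective_priority`, and `expected_active` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Peer:
    name: str
    ip: str
    priority: int
    # One entry per "standby 0 track N decrement X" line in the config.
    track_decrements: tuple = ()


def effective_priority(peer: Peer, tracked_down: int = 0) -> int:
    """Configured priority minus the decrements of tracked objects that are down."""
    lost = sum(peer.track_decrements[:tracked_down])
    return max(peer.priority - lost, 0)


def expected_active(peers, tracked_down=None) -> Peer:
    """Simplified election: highest effective priority, then highest IP."""
    tracked_down = tracked_down or {}
    return max(
        peers,
        key=lambda p: (
            effective_priority(p, tracked_down.get(p.name, 0)),
            IPv4Address(p.ip),
        ),
    )


# Values copied from the configs posted in this thread.
PEERS = [
    Peer("RTR1", "3.3.3.250", 130, (40, 40)),
    Peer("RTR2", "3.3.3.251", 120, (40, 40)),
    Peer("SWITCH1", "3.3.3.252", 50),
    Peer("SWITCH2", "3.3.3.253", 40),
]

print(expected_active(PEERS).name)
print(expected_active(PEERS, {"RTR1": 2, "RTR2": 2}).name)
```

Under this model RTR1 wins while its tracked objects are up. If both tracked objects went down on both routers, their priorities would fall to 50 and 40, the same values as the switches, and the tie-break on IP address would favour SWITCH1. Whether that is what happened during the event is not known from the thread, but it may be worth ruling out.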


 

sh int vlan x | i bia
Hardware is EtherSVI, address is xxxx.xxxx.xxxx <<- check this MAC address for each VLAN SVI 

They all have unique MAC addresses.

switch1#show int vlan 100 | i bia
  Hardware is Ethernet SVI, address is cc7f.7649.8951 (bia cc7f.7649.8951)

switch2#show int vlan 100 | i bia
  Hardware is Ethernet SVI, address is cc7f.7653.d451 (bia cc7f.7653.d451)

rtr1#show int  GigabitEthernet0/0/1 | i bia
  Hardware is ISR4451-X-4x1GE, address is c4f7.d59d.bd21 (bia c4f7.d59d.bd21)

rtr2#show int  GigabitEthernet0/0/1 | i bia
  Hardware is ISR4451-X-4x1GE, address is d478.9b22.62e1 (bia d478.9b22.62e1)

 

One more step before we find the solution.

A protocol with a huge number of CPU-bound packets may impact other protocols in the same class, as some of these protocols share the same policer. For example, Address Resolution Protocol (ARP) shares 4000 hardware policers with an array of host protocols like Telnet, Internet Control Message Protocol (ICMP), SSH, FTP, and SNMP in the system-cpp-police-forus class. If there is an ARP poisoning or an ICMP attack, hardware policers start throttling any incoming traffic that exceeds 4000 packets per second to protect the CPU and the overall integrity of the system. As a result, ARP and ICMP host protocols are dropped, along with any other host protocols that share the same class.

Sorry to say it, but some Cisco information is not arranged in a way that is easy for an engineer to work with.
Anyway, the paragraph above is from the Cisco documentation. It describes what you are facing (loss of SNMP, loss of SSH), and we have already seen that the Forus traffic queue has drops.
But the last piece is still missing: what makes HSRP fill this queue?
Could it be that no gateway is configured on the hosts? That would make every host send ARP requests for the MAC address, which could fill the queue this fast.
Can we check this point?

And since there is also HSRP flapping, we need to check the HSRP configuration again (a deeper look this time).

I'm not following the last request/post.

The CPU has a queue called Forus traffic, and we see drops in this queue.
As I mentioned before, this queue receives CPU-bound traffic of any protocol whose destination is the router/L3 switch itself.

2    14     Forus traffic               Yes     4000      4000     4223501      5320

When the queue is full it starts dropping frames, including HSRP and SNMP (you mentioned that you had an issue with SNMP).
But here is the question: what is making the queue full?


There is a debug command, but it affects the switch CPU (high CPU utilization), so I would like to keep debug as a last option.
However, the command
Switch#show platform software fed [switch] active punt rates interfaces

can show us which interface is punting a high rate of traffic to the CPU.

 

"show platform software fed switch active punt rates interfaces" does not return any output.

Just an update on this. We did a packet capture on the CPU for ~1 minute (actually 54 seconds). During this time we saw over 1000 ARP requests to the CPU. I think 1000 ARP requests for a /24 is somewhat high.
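
To put those capture numbers next to the policer rate quoted earlier in the thread, here is a small Python calculation. It treats "over 1000" as exactly 1000 and averages over the whole capture, so it says nothing about short bursts; the variable names are made up for illustration.

```python
# Numbers taken from this thread; "over 1000" is treated as exactly 1000.
ARP_REQUESTS = 1000
CAPTURE_SECONDS = 54
FORUS_POLICER_PPS = 4000  # rate shown for the Forus traffic queue

arp_pps = ARP_REQUESTS / CAPTURE_SECONDS
share_of_policer = arp_pps / FORUS_POLICER_PPS

print(f"Average ARP rate: {arp_pps:.1f} packets per second")
print(f"Share of the Forus policer rate: {share_of_policer:.2%}")
```

Averaged over the capture, the ARP rate is well under one percent of the 4000 packets-per-second policer, so ARP alone would only fill the queue if it arrives in heavy bursts or if other for-us traffic makes up the rest.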

Would arp anti-flood mitigate this behavior? Do discarded packets still make it to the CPU?

Why would this affect only the switches and not the routers? I get that routers are more capable devices, but we don't even see a slight jump in the CPU.

Also, DAI places the port in error-disable mode, which would be bad. Wouldn't arp anti-flood be better?

"DAI places the port in error-disable mode"
This is how DAI protects your network. If it allowed the traffic, the rest of your network would face connectivity issues.
"we don't even see a slight jump in the CPU"
That is because CoPP protects your CPU from high-rate traffic by dropping it. It drops ARP, SNMP, and HSRP packets because CoPP cannot classify frames any further within the same queue.

But you mentioned that you used debug (my last option, as I said above). Can you get the source MAC address of the ARP requests?
This would help us a lot:
1- You can trace the MAC address to the connected device and disconnect it.
2- If the MAC address is the real MAC address of one of the L3 switch ports, then there is certainly something wrong in your L2 design: either there is an L2 loop, or there is a bug in HSRP that makes the L3 switch send GARP at this high rate.
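
One way to find the source MAC from the capture is to tally ARP requests per sender. The Python sketch below assumes the capture has already been exported into simple records with `src_mac` and `op` keys; that record format, the function name `top_arp_senders`, and the sample addresses are all made up for illustration and are not output from any Cisco tool.

```python
from collections import Counter


def top_arp_senders(records, n=3):
    """Count ARP requests per source MAC and return the n busiest senders."""
    counts = Counter(
        r["src_mac"].lower() for r in records if r.get("op") == "request"
    )
    return counts.most_common(n)


# Made-up sample records; real ones would come from the exported CPU capture.
sample = [
    {"src_mac": "AAAA.BBBB.0001", "op": "request"},
    {"src_mac": "aaaa.bbbb.0001", "op": "request"},
    {"src_mac": "aaaa.bbbb.0002", "op": "request"},
    {"src_mac": "aaaa.bbbb.0001", "op": "request"},
    {"src_mac": "aaaa.bbbb.0002", "op": "reply"},
]

for mac, count in top_arp_senders(sample):
    print(f"{mac}: {count} ARP requests")
```

The busiest MAC can then be compared against the SVI and router interface addresses posted earlier in the thread, or traced to a switch port.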
