10-19-2019 11:50 AM - edited 10-19-2019 11:51 AM
Due to business needs, a Cisco 4507 and a Huawei 7706 are interconnected. The Cisco side runs PVST+, the Huawei side runs MSTP, and the interconnect is an access-mode interface.
The LAN connection works, but a throughput test shows a problem: traffic from Cisco to Huawei reaches only about 50 Mb/s, while traffic from Huawei to Cisco reaches the expected rate.
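As a side note, since the slowdown is one-directional, it may be worth first ruling out a speed/duplex or interface-error problem on the interconnect itself. A hedged sketch of such checks, using placeholder interface names (Gi1/1 on the Cisco side, GigabitEthernet1/0/1 on the Huawei side are assumptions, not from the thread):

```
! Cisco 4507 (PVST+ side) -- Gi1/1 is a placeholder for the actual uplink
show spanning-tree interface GigabitEthernet1/1
show interfaces GigabitEthernet1/1 | include duplex|errors

! Huawei 7706 (MSTP side) -- rough equivalents
display stp brief
display interface GigabitEthernet1/0/1
```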
After some debugging, I found that while data is transferred between the two LANs, CPU utilization on the 4507 becomes very high; the Huawei 7706 shows no problem.
Here is some show output; please help me analyze it.
---------------------------------------------------------------------------------------------------
CORE>show processes cpu sorted
Core 0: CPU utilization for five seconds: 7%; one minute: 12%; five minutes: 16%
Core 1: CPU utilization for five seconds: 3%; one minute: 31%; five minutes: 12%
Core 2: CPU utilization for five seconds: 96%; one minute: 18%; five minutes: 6%
Core 3: CPU utilization for five seconds: 8%; one minute: 6%; five minutes: 7%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5227 285221 19150227 218 26.04 16.43 10.57 34816 iosd
5198 690665 41251304 16 2.10 1.89 1.99 0 cli_agent
5163 691879 41200900 16 0.32 0.09 0.03 0 osinfo-provider
5174 717894 41343266 17 0.30 0.26 0.27 0 eicored
4872 2190534 74836837 86 0.02 0.01 0.01 0 system_mgr
5172 1397368 95642400 64 0.02 0.03 0.03 0 ffm
CORE#show processes cpu detailed process iosd | ex 0.0
Core 0: CPU utilization for five seconds: 5%; one minute: 5%; five minutes: 7%
Core 1: CPU utilization for five seconds: 1%; one minute: 38%; five minutes: 25%
Core 2: CPU utilization for five seconds: 97%; one minute: 60%; five minutes: 49%
Core 3: CPU utilization for five seconds: 0%; one minute: 1%; five minutes: 17%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
5227 L 2 5227 2442927 367658931 0 24.4 24.5 22.4 34816 iosd
5227 L 0 8091 3759511 757422881 0 1.54 1.46 1.56 0 iosd.fastpath
68 I 516166 32748015 0 0.11 0.11 0.11 0 IDB Work
91 I 1842433 422001761 0 7.66 7.44 7.33 0 Cat4k Mgmt HiPri
92 I 995370 292727936 0 38.6 36.4 33.5 0 Cat4k Mgmt LoPri
142 I 850871 176993027 0 0.33 0.44 0.44 0 Ethernet Msec Tim
193 I 2071591 260819298 0 44.3 46.2 40.9 0 IP Input
202 I 821621 243065941 0 2.44 2.33 2.22 0 Spanning Tree
Finding: the two processes with very high CPU usage are Cat4k Mgmt LoPri and IP Input.
Since I know which port carries the traffic, I can look at the forwarding statistics directly.
CD_CORE#clear ip traffic
Clear "show ip traffic" counters [confirm]
CD_CORE#show ip traffic
Comparing the counters before and after, the biggest change is in the forwarded-packet count.
CD_CORE#show ip traffic
IP statistics:
Sent: 4329 generated, 791559 forwarded
CORE#debug ip packet detail
CORE#show logging
CD_CORE#show logging
Syslog logging: enabled (..., xml disabled, filtering disabled)
No Active Message Discriminator.
No Inactive Message Discriminator.
Console logging: level debugging, 1122949 messages logged, xml disabled,
filtering disabled
Monitor logging: level debugging, 726814 messages logged, xml disabled,
filtering disabled
Buffer logging: level debugging, 1098603 messages logged, xml disabled,
filtering disabled
Exception Logging: size (8192 bytes)
Count and timestamp logging messages: disabled
Persistent logging: disabled
No active filter modules.
Trap logging: level informational, 91604 message lines logged
Logging Source-Interface: VRF Name:
Log Buffer (32768 bytes):
*Oct 19 11:36:03.140: FIBfwd-proc: Default:192.168.226.0/24 process level forwarding
*Oct 19 11:36:03.140: FIBfwd-proc: depth 0 first_idx 0 paths 1 long 0(0)
*Oct 19 11:36:03.140: FIBfwd-proc: try path 0 (of 1) v4-rcrsv-192.168.96.100 first short ext 0(-1)
*Oct 19 11:36:03.140: FIBfwd-proc: v4-rcrsv-192.168.96.100 valid
5FEA47E4 path type recursive
*Oct 19 11:36:03.140: FIBfwd-proc: depth 1 first_idx 0 paths 1 long 0(0)
*Oct 19 11:36:03.140: FIBfwd-proc: try path 0 (of 1) v4-adp-192.168.96.100-Vl96 first short ext 0(-1)
*Oct 19 11:36:03.140: FIBfwd-proc: v4-adp-192.168.96.100-Vl96 valid
ia fib 0 path type adjacency prefix
*Oct 19 11:36:03.140: FIBfwd-proc: packet routed to Vlan96 192.168.96.100(0)
*Oct 19 11:36:03.140: FIBipv4-packet-proc: packet routing succeeded
*Oct 19 11:36:03.140: IP: tableid=0, s=192.168.99.190 (Vlan2), d=192.168.226.251 (Vlan96), routed via FIB
xp 0
hp 1 deag 0 chgif 0 ttlexp 0 rec 0
rd
*Oct 19 11:36:03.140: TCP src=49754, dst=64656, seq=3721689685, ack=1622375139, win=256 ACK
*Oct 19 11:36:03.140: IP: s=192.168.99.190 (Vlan2), d=192.168.226.251 (Vlan96), len 1500, sending full packet
*Oct 19 11:36:03.141: TCP src=49754, dst=64656, seq=3721689685, ack=1622375139, win=256 ACK
*Oct 19 11:36:03.141: IP: s=192.168.99.190 (Vlan2), d=192.168.226.251, len 1500, input feature
rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
*Oct 19 11:36:03.141: IP: s=192.168.99.190 (Vlan2), d=192.168.226.251, len 1500, input feature
4), rtype 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
*Oct 19 11:36:03.141: IP: s=192.168.99.190 (Vlan2), d=192.168.226.251, len 1500, input feature
type 0, forus FALSE, sendself FALSE, mtu 0, fwdchk FALSE
*Oct 19 11:36:03.141: FIBipv4-packet-proc: route packet from Vlan2 src 192.168.99.190 dst 192.168.226.251
*Oct 19 11:36:03.141: FIBfwd-proc: Default:192.168.226.0/24 process level forwarding
*Oct 19 11:36:03.141: FIBfwd-proc: depth 0 first_idx 0 paths 1 long 0(0)
*Oct 19 11:36:03.141: FIBfwd-proc: try path 0 (of 1) v4-rcrsv-192.168.96.100 first short ext 0(-1)
*Oct 19 11:36:03.141: FIBfwd-proc: v4-rcrsv-192.168.96.100 valid
5FEA47E4 path type recursive
*Oct 19 11:36:03.141: FIBfwd-proc: depth 1 first_idx 0 paths 1 long 0(0)
*Oct 19 11:36:03.141: FIBfwd-proc: try path 0 (of 1) v4-adp-192.168.96.100-Vl96 first short ext 0(-1)
*Oct 19 11:36:03.141: FIBfwd-proc: v4-adp-192.168.96.100-Vl96 valid
ia fib 0 path type adjacency prefix
*Oct 19 11:36:03.141: FIBfwd-proc: packet routed to Vlan96 192.168.96.100(0)
*Oct 19 11:36:03.141: FIBipv4-packet-proc: packet routing succeeded
*Oct 19 11:36:03.141: IP: tableid=0, s=192.168.99.190 (Vlan2), d=192.168.226.251 (Vlan96), routed via FIB
xp 0
hp 1 deag 0 chgif 0 ttlexp 0 rec 0
---------------------------------------------------------------------------------------------------
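As a side note, an unconditioned `debug ip packet detail` is itself very CPU-intensive on a busy switch. A hedged sketch of a safer way to capture the same output, scoping the debug with an ACL (the ACL number 199 is an arbitrary placeholder; the addresses are the ones seen in the debugs above):

```
! Limit console load and scope the debug to the tested flow
no logging console
access-list 199 permit ip host 192.168.99.190 host 192.168.226.251
debug ip packet 199 detail
! ...reproduce the speed test, then stop and review the buffer:
undebug all
show logging | include FIBfwd-proc
```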
I don't understand these debug messages. Can anyone analyze what they mean and help find the fault point? Thank you.
Solved! Go to Solution.
10-20-2019 11:16 AM - edited 10-20-2019 11:17 AM
Hello 1146591025@qq.c
Thanks for collecting these outputs. The management process "K5CpuMan Review" exceeding its target percentage indicates a high rate of traffic being processed by the CPU for some reason. The CPU queues should be checked next for more information.
Looking at the CPU queues you posted, the "L3 Glean" queue is receiving an abnormally high rate of traffic. This queue is used for any traffic that needs ARP resolution before it can be forwarded. Examples: the routing next hop's ARP entry is not resolved, or the switch has a directly connected subnet and ARP must be resolved for the final destination. Please check whether there are any ARP-resolution issues for the traffic hitting the CPU.
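The ARP/adjacency state for this flow could be checked along these lines (a sketch assuming standard IOS commands; the next hop 192.168.96.100 and destination 192.168.226.251 are taken from the debugs earlier in the thread):

```
show ip arp 192.168.96.100           ! is the next hop's ARP entry resolved?
show adjacency vlan 96 detail        ! is a complete (non-glean) adjacency programmed?
show ip cef 192.168.226.251 detail   ! valid CEF path, or punt/glean?
```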
Packets Received by Packet Queue
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
L3 Glean 164838673 5953 5151 3088 923
There is also a bug on the C4500 that can cause this type of issue when static routing is used and a route gets mis-programmed into hardware after a network topology change. Please see CSCvd32541 for more information.
To check for this bug, please verify the following:
10-19-2019 12:45 PM - edited 10-19-2019 12:46 PM
Hello 1146591025@qq.c
Please see these helpful C4500 high CPU troubleshooting guides for reference:
The top CPU consumers under iosd are Cat4k Mgmt LoPri and IP Input. Cat4k Mgmt LoPri refers to a platform management process that is exceeding its target percentage. Please check "show platform health" and look for processes whose actual percentage exceeds the target percentage. The IP Input process indicates IP traffic that is being process-switched by the CPU for some reason.
The next thing to check is which CPU queue is receiving the traffic. Please run "show platform cpu packet statistics all" and look at the section named "Packets Received by Packet Queue". This shows which CPU queue is receiving a high rate of traffic while the issue is occurring. Once the queue is identified, it gives good evidence for the reason for the CPU punt.
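For what it's worth, the relevant section can be jumped to directly with a standard IOS output filter:

```
show platform cpu packet statistics all | begin Packets Received by Packet Queue
```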
As for the debugs, it looks like the traffic in them is being routed and a FIB lookup is done. I am not sure whether this is related to the high CPU or not.
Can you please check the above items for high-CPU troubleshooting? Also, can you check the following:
10-19-2019 11:38 PM - edited 10-19-2019 11:48 PM
10-20-2019 12:02 AM
I have referred to https://www.cisco.com/c/en/us/support/docs/switches/catalyst-4000-series-switches/65591-cat4500-high-cpu.html.
However, I don't see any useful information there.
CORE#show platform health
K5CpuMan Review 30.00 37.15 30 378 100 500 34 37 9 40751:59
K5FlowStatsFifoSearc 25.00 0.00 2 0 100 500 0 0 0 0:00
CORE#show platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)
CPU Subport TxQueue 0 TxQueue 1 TxQueue 2 TxQueue 3
------------ --------------- --------------- --------------- ---------------
3 0 972 0 0
RkGenericPacketMan:
Packet allocation failures: 0
Packet Buffer(SW Common) allocation failures: 0
Packet Buffer(SW ESMP) allocation failures: 0
Packet Buffer(SW EOBC) allocation failures: 0
Packet Buffer(SW SupToSup) allocation failures: 0
Packets Dropped In Processing Overall
Total 5 sec avg 1 min avg 5 min avg 1 hour avg
-------------------- --------- --------- --------- ----------
596879445 74 63 56 51
Packets Dropped In Processing by CPU event
Event Total 5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Sa Miss 634021 0 0 0 0
L2 Router 80 0 0 0 0
Input Acl Fwd 33 0 0 0 0
Input ACl Copy 596245296 74 63 56 51
Sw Packet for Bridge 15 0 0 0 0
Packets Dropped In Processing by Priority
Priority Total 5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Normal 590970257 74 63 56 51
Medium 642499 0 0 0 0
High 5264650 0 0 0 0
Crucial 2039 0 0 0 0
Packets Dropped In Processing by Reason
Reason Total 5 sec avg 1 min avg 5 min avg 1 hour avg
------------------ -------------------- --------- --------- --------- ----------
STPDrop 2645 0 0 0 0
NoDstPorts 15 0 0 0 0
Tx Mode Drop 596876785 74 63 56 51
Total packet queues 64
Packets Received by Packet Queue
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Input ACL fwd(snooping) 7440370 2 0 0 0
Host Learning 633614 0 0 0 0
L2 Control 257086541 2 1 2 2
Input ACL log, unreach 596231538 73 64 56 51
L3 Glean 164838673 5953 5151 3088 923
Ip Option 13 0 0 0 0
L3 Receive 5956586627 5 1 2 0
Ttl Expired 396835 0 0 0 0
Bfd 23171 0 0 0 0
Adj SameIf Fail 254363 0 0 0 0
L2 router to CPU, 7 1100728589 112 94 73 62
L3 Fwd 5448683511 0 0 0 0
Packets Dropped by Packet Queue
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Host Learning 6693 0 0 0 0
Input ACL log, unreach 125517 0 0 0 0
L3 Glean 19206 0 0 0 0
L2 router to CPU, 7 5163397 0 0 0 0
L3 Fwd 1168135 0 0 0 0
CEF is enabled on the 4507 by default.
No ACL matches this traffic.
The next hop also looks fine.
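A hedged sketch of how those three points could be double-checked from the CLI (command set assumed from standard IOS; Vlan2 and the next hop 192.168.96.100 come from the debugs earlier in the thread):

```
show ip cef summary                             ! CEF enabled and populated
show ip interface vlan 2 | include access list  ! any ACL applied to the ingress SVI?
show ip arp | include 192.168.96.100            ! next hop resolved in ARP
ping 192.168.96.100                             ! next hop reachable
```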