06-29-2016 12:09 AM - edited 03-05-2019 04:19 AM
Hi guys,
I am seeing intermittent high CPU on a 7206VXR (NPE-G1) router. It climbs above 70% and then comes back down to a normal ~40% after some time.
When it spikes, the busiest process (Per-Second Jobs) shows only 1.35%, and both the WAN interface and Gi0/1 show input and output drops.
The router's ISP bandwidth is 600 Mbps. I need to know whether this is traffic/QoS load the router cannot handle, since bandwidth monitoring of the WAN link averages only about 220 Mbps and the Rx/Tx loads are not high either.
What could the issue be, and could oversubscription explain the input errors, unknown protocol drops, and output drops?
Below are the interface outputs and the WAN utilization.
sh proc cpu sort
CPU utilization for five seconds: 67%/64%; one minute: 71%; five minutes: 68%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
42 1722308028 99901457 17240 1.35% 1.37% 1.36% 0 Per-Second Jobs
157 4106996 2780438948 1 1.19% 1.11% 1.17% 0 HQF Shaper Backg
301 88 139 633 0.31% 0.02% 0.00% 3 Virtual Exec
2 631196 19798863 31 0.07% 0.07% 0.07% 0 Load Meter
169 87348 193259231 0 0.07% 0.00% 0.00% 0 CCE DP URLF cach
132 35960 93966142 0 0.07% 0.01% 0.00% 0 ILMI Timer Proce
118 998340 333361286 2 0.07% 0.02% 0.00% 0 TCP Timer
IOS: c7200-adventerprisek9-mz.124-24.T5.bin
ID: CISCO7206VXR
interface GigabitEthernet0/3
description IP-CONNECT ISP WAN LINK
ip address 10.x.x.x 255.255.255.252
ip nbar protocol-discovery
ip flow ingress
ip flow egress
duplex auto
speed auto
media-type gbic
negotiation auto
service-policy output QOS
interface GigabitEthernet0/1
description Link to HQ
ip address 10.x.x.x 255.255.255.0
ip access-group 101 out
duplex full
speed 1000
media-type rj45
no negotiation auto
GigabitEthernet0/1 is up, line protocol is up
Description: Link to HQ
Internet address is 10.1.250.200/24
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 15/255, rxload 52/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is RJ45
output flow-control is unsupported, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 8/75/0/113 (size/max/drops/flushes); Total output drops: 3910
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 205578000 bits/sec, 27942 packets/sec
5 minute output rate 61682000 bits/sec, 19778 packets/sec
3741785087 packets input, 3853071941 bytes, 0 no buffer
Received 91556612 broadcasts, 0 runts, 0 giants, 0 throttles
7029005 input errors, 0 CRC, 0 frame, 7029005 overrun, 0 ignored
0 watchdog, 209497288 multicast, 0 pause input
0 input packets with dribble condition detected
921491430 packets output, 585150574 bytes, 0 underruns
9 output errors, 0 collisions, 3 interface resets
3295813 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
9 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
=============================================
GigabitEthernet0/3 is up, line protocol is up
Description: IP-CONNECT to ISP
Internet address is 10.x.x.x/30
MTU 1500 bytes, BW 600000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 83/255, rxload 24/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is autonegotiation, media type is LX
output flow-control is XON, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:24, output 00:00:00, output hang never
Last clearing of "show interface" counters 1y31w
Input queue: 0/75/16/252 (size/max/drops/flushes); Total output drops: 163912473
Queueing strategy: Class-based queueing
Output queue: 63/1000/0 (size/max total/drops)
5 minute input rate 56912000 bits/sec, 18717 packets/sec
5 minute output rate 195545000 bits/sec, 27055 packets/sec
3209462578 packets input, 1125360610 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
21972 input errors, 0 CRC, 0 frame, 21972 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
3899036769 packets output, 3478522230 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
QOS:
policy-map QOS
class APPS_CLASS
priority percent 20
class TRAFFIC_CLASS
bandwidth percent 10
class UPDATE_CLASS
bandwidth percent 20
shape average percent 15
class class-default
bandwidth percent 20
100
90
80 ***** ***** ***** **********
70 ************************************************************
60 ************************************************************
50 ************************************************************
40 ************************************************************
30 ************************************************************
20 ************************************************************
10 ************************************************************
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per second (last 60 seconds)
797776766767777667766777667787877776667777677767666656655665
938375166661111843955742989804077702451301324643831274463414
100
90 *
80 *** * * * * *** **** *
70 #####*#********* ##**###**######### ***** *** **
60 ####################################################**** **
50 ############################################################
40 ############################################################
30 ############################################################
20 ############################################################
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
06-29-2016 12:20 AM
The 7206 is pretty old now. I think if you are getting 600 Mb/s through it you should be pretty happy.
Your "sh proc cpu sort" output doesn't show any individual process using much CPU, so the load is almost certainly interrupt-level. Note the first line: "67%/64%" means 67% total CPU, of which 64% is spent at interrupt level (packet switching), leaving only about 3% for processes. If it is interrupts, then it is more than likely the platform is simply running out of punch.
06-29-2016 12:35 AM
Thanks for the quick inputs.
Both the WAN interface (Gi0/3) and the branch-facing interface (Gi0/1) show input and output drops, and Gi0/1 also shows unknown protocol drops.
I need some evidence to take back to the end client that clearly shows whether this is a hardware limitation, the traffic, or some other factor.
What is the throughput capacity of the NPE-G1?
http://www.gossamer-threads.com/lists/cisco/nsp/131832
06-29-2016 07:19 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
If your show proc cpu keeps showing interrupt CPU within a few percent of total CPU, as Philip already noted, your "platform is simply running out of punch" with your combination of traffic and your configuration.
There might be some configuration changes that will reduce CPU usage. For example, I notice you're using NBAR protocol discovery and an egress ACL. Do you need the former, and might ingress ACL(s) replace the latter?
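If you decide to drop either one, the change itself is small. A sketch only (ACL 102 here is hypothetical; when an ACL moves from out to in, its source/destination logic and the interface it attaches to usually need to be reworked):
conf t
 interface GigabitEthernet0/3
  no ip nbar protocol-discovery
 interface GigabitEthernet0/1
  no ip access-group 101 out
 interface GigabitEthernet0/3
  ip access-group 102 in
end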
Do you use the turbo ACL feature?
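If not, and if your image supports it, it is a single global command; it compiles ACLs into lookup tables so that per-packet evaluation cost stays roughly constant regardless of ACL length:
conf t
 access-list compiled
end
show access-lists compiled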
Do you really need to shape within your UPDATE_CLASS?
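If it turns out the answer is no, removing just the shaper is a one-line change and the class keeps its bandwidth guarantee. Sketch:
conf t
 policy-map QOS
  class UPDATE_CLASS
   no shape average percent 15
end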
Your queue drops, both ingress and egress, might be mitigated by increasing queue sizes.
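For example, something like the following (the values are illustrative only, not a recommendation). Note that the large "overrun" count on Gi0/1 is the receiver outrunning the CPU, which larger queues will not cure:
conf t
 interface GigabitEthernet0/1
  hold-queue 150 in
 policy-map QOS
  class class-default
   queue-limit 512
end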
What do your buffer stats look like? Do you use the auto buffers adjust feature?
What kind of device is on the other end of the interface showing the unknown protocol drops (a L3 switch)?
06-29-2016 10:33 PM
Please find my comments below and the attached configuration.
There might be some configuration changes that will reduce CPU usage. For example, I notice you're using NBAR protocol discovery and an egress ACL. Do you need the former, and might ingress ACL(s) replace the latter? -- Yes, we need them.
Do you use the turbo ACL feature? -- No.
Do you really need to shape within your UPDATE_CLASS? -- Yes, it's required.
Your queue drops, both ingress and egress, might be mitigated by increasing queue sizes. -- What values would you recommend? Also, you are referring to Gi0/1 and Gi0/3, right?
What do your buffer stats look like? Do you use the auto buffers adjust feature? -- Output is below.
What kind of device is on the other end of the interface showing the unknown protocol drops (a L3 switch)? -- It's a 6500 L3 switch.
===========================
------------------ show buffers ------------------
Buffer elements:
1103 in free list (1119 max allowed)
690553244 hits, 0 misses, 619 created
Public buffer pools:
Small buffers, 104 bytes (total 91, permanent 50, peak 347 @ 7w0d):
69 in free list (20 min, 150 max allowed)
731911947 hits, 3062 misses, 7128 trims, 7169 created
150 failures (0 no memory)
Middle buffers, 600 bytes (total 62, permanent 25, peak 62 @ 00:02:56):
45 in free list (10 min, 150 max allowed)
164689267 hits, 5978 misses, 9270 trims, 9307 created
294 failures (0 no memory)
Big buffers, 1536 bytes (total 50, permanent 50, peak 77 @ 7w0d):
49 in free list (5 min, 150 max allowed)
3664817932 hits, 9 misses, 27 trims, 27 created
0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10):
10 in free list (0 min, 100 max allowed)
109460 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Large buffers, 5024 bytes (total 1, permanent 0, peak 2 @ 7w0d):
1 in free list (0 min, 10 max allowed)
2928 hits, 1 misses, 1932 trims, 1933 created
0 failures (0 no memory)
Huge buffers, 18024 bytes (total 1, permanent 0, peak 14 @ 7w0d):
1 in free list (0 min, 4 max allowed)
36713 hits, 13 misses, 9270 trims, 9271 created
0 failures (0 no memory)
Interface buffer pools:
Syslog ED Pool buffers, 600 bytes (total 282, permanent 282):
250 in free list (282 min, 282 max allowed)
39633 hits, 0 misses
IPC buffers, 4096 bytes (total 2, permanent 2):
2 in free list (1 min, 8 max allowed)
0 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
Header pools:
Header buffers, 0 bytes (total 511, permanent 256, peak 511 @ 7w0d):
255 in free list (256 min, 1024 max allowed)
171 hits, 85 misses, 0 trims, 255 created
0 failures (0 no memory)
256 max cache size, 256 in cache
3706689410 hits in cache, 0 misses in cache
Particle Clones:
1024 clones, 2 hits, 0 misses
Public particle pools:
F/S buffers, 128 bytes (total 512, permanent 512):
0 in free list (0 min, 512 max allowed)
512 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
512 max cache size, 512 in cache
0 hits in cache, 0 misses in cache
Normal buffers, 512 bytes (total 2048, permanent 2048):
2048 in free list (1024 min, 4096 max allowed)
46 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Private particle pools:
HQF buffers, 0 bytes (total 2000, permanent 2000):
2000 in free list (500 min, 2000 max allowed)
61966429 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
GigabitEthernet0/1 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 4 fallbacks
1000 max cache size, 697 in cache
3078789427 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
GigabitEthernet0/2 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 0 fallbacks
1000 max cache size, 872 in cache
157327892 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
GigabitEthernet0/3 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 42 fallbacks
1000 max cache size, 867 in cache
1702287216 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
ATM1/0 buffers, 512 bytes (total 1200, permanent 1200):
0 in free list (0 min, 1200 max allowed)
1200 hits, 1 misses
ATM2/0 buffers, 512 bytes (total 1200, permanent 1200):
0 in free list (0 min, 1200 max allowed)
1200 hits, 1 misses
ATM4/0 buffers, 512 bytes (total 4000, permanent 4000):
0 in free list (0 min, 4000 max allowed)
4000 hits, 1 misses
------------------ show buffers usage ------------------
Statistics for the Small pool
Caller pc : 0x6021395C count: 11
Resource User: IP Input count: 11
Output IDB : AT4/0.1 count: 4
Caller pc : 0x618614AC count: 4
Resource User: IP-EIGRP: count: 4
Caller pc : 0x614CF574 count: 3
Resource User: Init count: 3
Input IDB : Gi0/1 count: 4
Output IDB : Gi0/1 count: 1
Caller pc : 0x620769CC count: 1
Resource User: BGP Open count: 1
Number of Buffers used by packets generated by system: 80
Number of Buffers used by incoming packets: 11
Statistics for the Middle pool
Output IDB : Gi0/1 count: 10
Caller pc : 0x620769CC count: 11
Resource User: Virtual Ex count: 11
Number of Buffers used by packets generated by system: 62
Number of Buffers used by incoming packets: 0
Statistics for the Big pool
Caller pc : 0x61AAA718 count: 1
Resource User: Per-Second count: 1
Number of Buffers used by packets generated by system: 50
Number of Buffers used by incoming packets: 0
Statistics for the VeryBig pool
Number of Buffers used by packets generated by system: 10
Number of Buffers used by incoming packets: 0
Statistics for the Large pool
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Huge pool
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Syslog ED Pool pool
Caller pc : 0x634C1A60 count: 32
Resource User: EEM ED Sys count: 32
Number of Buffers used by packets generated by system: 282
Number of Buffers used by incoming packets: 0
Statistics for the IPC pool
Number of Buffers used by packets generated by system: 2
Number of Buffers used by incoming packets: 0
Statistics for the Header pool
Number of Buffers used by packets generated by system: 511
Number of Buffers used by incoming packets: 0
Statistics for the FS Header pool
Caller pc : 0x6063BD38 count: 3
Resource User: Init count: 12
Caller pc : 0x601686D4 count: 3
Caller pc : 0x6083E290 count: 1
Caller pc : 0x61A1C584 count: 1
Caller pc : 0x61B4134C count: 1
Caller pc : 0x62B2A4B8 count: 1
Caller pc : 0x61A0E608 count: 1
Resource User: IP ARP Adj count: 1
Caller pc : 0x6006B7EC count: 1
Caller pc : 0x62B29804 count: 1
Number of Buffers used by packets generated by system: 28
Number of Buffers used by incoming packets: 0
Statistics for the l2frag pak pool pool
Number of Buffers used by packets generated by system: 0
Number of Buffers used by incoming packets: 0
Statistics for the SW Crypto Header pool
Caller pc : 0x63956440 count: 1
Resource User: Init count: 1
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Crypto Fragmentation Header pool
Caller pc : 0x63AB46D4 count: 1
Resource User: Init count: 1
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
06-30-2016 02:49 AM
You don't have a huge number of trims and creates, but both burn a bit of CPU. So, if you're trying to get every CPU cycle for packet forwarding, you'll want to try to reduce those. You might manually buffer tune, or, I believe, your IOS image supports auto buffer tuning.
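For example, manual tuning keyed to the misses shown above might look like this (the numbers are illustrative, not a recommendation), or, if the image supports it, one global command enables automatic tuning:
conf t
 buffers small permanent 150
 buffers small min-free 30
 buffers middle permanent 100
 buffers middle min-free 20
end
conf t
 buffers tune automatic
end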
06-30-2016 03:44 AM
Please find my comments below
every CPU cycle for packet forwarding, you'll want to try to reduce those -- How can I reduce them?
Do you suspect any IOS-related issue or anything else?
06-30-2016 05:50 AM
every CPU cycle for packet forwarding, you'll want to try to reduce those -- How can I reduce them?
As I previously noted, by not using features without a truly needed purpose, and by using the features you do need as optimally as possible. Again, for example, do you need to use NBAR discovery? Must you have NetFlow stats? If not, remove these features. For optimal performance, try the turbo ACL feature, the buffer tune feature, etc.
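For instance, since Gi0/3 runs both ingress and egress NetFlow, and if nothing is actually consuming those stats, removing them takes the per-packet flow accounting off the CPU:
conf t
 interface GigabitEthernet0/3
  no ip flow ingress
  no ip flow egress
end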
Do you suspect any IOS-related issue or anything else?
No. The NPE-G1 is just a 1 Mpps processor. A pair of active gig ports is enough to overrun the capacity of the processor. Besides the gig ports, I see you have a couple of active ATM ports too.
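As a rough back-of-envelope from your own counters: Gi0/1 was receiving about 27,900 pps and Gi0/3 about 18,700 pps, i.e. roughly 47,000 pps forwarded at about 70% CPU. Extrapolating linearly, this configuration saturates the CPU somewhere around 47,000 / 0.70 ≈ 67,000 pps, a small fraction of the ~1 Mpps headline figure, because that figure assumes plain fast switching with no NBAR, NetFlow, ACL, or HQF work added per packet.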
06-30-2016 01:59 PM
OK, thanks.
I need your suggestions for a report stating that the hardware has reached its limit, so we can either upgrade the hardware or reduce the features.
For a hardware upgrade, is there enough information in these logs to justify it convincingly? (If we propose a hardware upgrade, there may be pushback, e.g. that the WAN bandwidth is not fully choked in the monitoring tool and only reaches about 220 Mbps.)
06-30-2016 05:45 PM
Reducing the feature usage will probably only have a small impact.
You need to upgrade the hardware - or reduce the amount of data.
07-01-2016 03:04 AM
Reducing the feature usage will probably only have a small impact.
Usually that's true. However, there can be exceptions.
For example, a couple of years ago, one of our engineers had a policy to count every ToS marking passing out the WAN-facing interface, on a 7200 with an NPE-G1.
Then came the day our telecom folks activated a new VoIP gateway that generated about 200 Mbps of VoIP traffic. That traffic crushed the CPU. (NB: the CPU was showing almost all "interrupt" usage.) Removing the "counting" policy dropped CPU usage by about 40%.
07-01-2016 07:20 AM
OK, thanks for the inputs.
Regarding the hardware limit causing this CPU, I need something that provides good justification in the report that a hardware upgrade should resolve this issue.
If you both could give your inputs, that would be great.
Thanks in advance.
07-01-2016 09:08 AM
I already did - i.e. your CPU usage and the kind of usage (interrupt-level).
07-01-2016 12:04 PM
Yes, I know; you explained the NBAR usage, the other QoS features, the ACLs, and so on. What I still need is information or documentation from a hardware point of view, based on the current capacity, configuration, and logs I provided, that clearly shows the device is exceeding its hardware capacity, so the report gives a solid justification for upgrading to new hardware.
Even on the gig ports the Rx and Tx load was not high during the spike (although drops were there), so a decision based on link load alone won't help much, I guess.
A few bullet points on the hardware limitation would help prove the case.
Thanks for your understanding.
07-01-2016 01:07 PM
You already provided this evidence yourself, in your first post, with the graph showing the platform running out of CPU. Can't get better evidence than that. I don't know what more the router could do to convince you it can't handle the load.
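If the client wants a package of evidence rather than prose, capturing these during a spike makes the case on its own (all standard show commands):
show processes cpu history
show processes cpu sorted
show interfaces GigabitEthernet0/1 | include rate|drops|overrun
show interfaces GigabitEthernet0/3 | include rate|drops|overrun
A high interrupt share of total CPU (here 64% of 67%), climbing overruns, and climbing output drops while link utilization sits well below line rate is exactly the signature of a forwarding-CPU bottleneck rather than a bandwidth one.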