06-29-2016 12:09 AM - edited 03-05-2019 04:19 AM
Hi guys,
I am seeing intermittent high CPU on a 7206VXR (NPE-G1) router. It climbs above 70% and then comes back down to a normal ~40% after some time.
When it spikes, the busiest process (Per-Second Jobs) shows only 1.35%, and both the WAN interface and Gi0/1 show input and output drops.
The router's ISP bandwidth is 600 Mbps. I need to know whether this is traffic/QoS load the router cannot handle, since bandwidth monitoring of the WAN link averages only about 220 Mbps and the Rx/Tx loads are not high either.
What could the issue be, and could oversubscription explain the input errors, unknown protocol drops, and output drops?
Below are the interface outputs and the WAN utilization.
sh proc cpu sort
CPU utilization for five seconds: 67%/64%; one minute: 71%; five minutes: 68%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
42 1722308028 99901457 17240 1.35% 1.37% 1.36% 0 Per-Second Jobs
157 4106996 2780438948 1 1.19% 1.11% 1.17% 0 HQF Shaper Backg
301 88 139 633 0.31% 0.02% 0.00% 3 Virtual Exec
2 631196 19798863 31 0.07% 0.07% 0.07% 0 Load Meter
169 87348 193259231 0 0.07% 0.00% 0.00% 0 CCE DP URLF cach
132 35960 93966142 0 0.07% 0.01% 0.00% 0 ILMI Timer Proce
118 998340 333361286 2 0.07% 0.02% 0.00% 0 TCP Timer
IOS: c7200-adventerprisek9-mz.124-24.T5.bin
ID: CISCO7206VXR
interface GigabitEthernet0/3
description IP-CONNECT ISP WAN LINK
ip address 10.x.x.x 255.255.255.252
ip nbar protocol-discovery
ip flow ingress
ip flow egress
duplex auto
speed auto
media-type gbic
negotiation auto
service-policy output QOS
interface GigabitEthernet0/1
description Link to HQ
ip address 10.x.x.x 255.255.255.0
ip access-group 101 out
duplex full
speed 1000
media-type rj45
no negotiation auto
GigabitEthernet0/1 is up, line protocol is up
Description: Link to HQ
Internet address is 10.1.250.200/24
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 15/255, rxload 52/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is RJ45
output flow-control is unsupported, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 8/75/0/113 (size/max/drops/flushes); Total output drops: 3910
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 205578000 bits/sec, 27942 packets/sec
5 minute output rate 61682000 bits/sec, 19778 packets/sec
3741785087 packets input, 3853071941 bytes, 0 no buffer
Received 91556612 broadcasts, 0 runts, 0 giants, 0 throttles
7029005 input errors, 0 CRC, 0 frame, 7029005 overrun, 0 ignored
0 watchdog, 209497288 multicast, 0 pause input
0 input packets with dribble condition detected
921491430 packets output, 585150574 bytes, 0 underruns
9 output errors, 0 collisions, 3 interface resets
3295813 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
9 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
=============================================
GigabitEthernet0/3 is up, line protocol is up
Description: IP-CONNECT to ISP
Internet address is 10.x.x.x/30
MTU 1500 bytes, BW 600000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 83/255, rxload 24/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is autonegotiation, media type is LX
output flow-control is XON, input flow-control is XON
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:24, output 00:00:00, output hang never
Last clearing of "show interface" counters 1y31w
Input queue: 0/75/16/252 (size/max/drops/flushes); Total output drops: 163912473
Queueing strategy: Class-based queueing
Output queue: 63/1000/0 (size/max total/drops)
5 minute input rate 56912000 bits/sec, 18717 packets/sec
5 minute output rate 195545000 bits/sec, 27055 packets/sec
3209462578 packets input, 1125360610 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
21972 input errors, 0 CRC, 0 frame, 21972 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
3899036769 packets output, 3478522230 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
QOS:
policy-map QOS
class APPS_CLASS
priority percent 20
class TRAFFIC_CLASS
bandwidth percent 10
class UPDATE_CLASS
bandwidth percent 20
shape average percent 15
class class-default
bandwidth percent 20
100
90
80 ***** ***** ***** **********
70 ************************************************************
60 ************************************************************
50 ************************************************************
40 ************************************************************
30 ************************************************************
20 ************************************************************
10 ************************************************************
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per second (last 60 seconds)
797776766767777667766777667787877776667777677767666656655665
938375166661111843955742989804077702451301324643831274463414
100
90 *
80 *** * * * * *** **** *
70 #####*#********* ##**###**######### ***** *** **
60 ####################################################**** **
50 ############################################################
40 ############################################################
30 ############################################################
20 ############################################################
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
06-29-2016 12:20 AM
The 7206 is pretty old now. I think if you are getting 600 Mb/s through it you should be pretty happy.
Your "sh proc cpu sort" output doesn't show any individual process using much CPU, so the load is almost certainly interrupt-level. Note the first line: "67%/64%" means 67% total CPU, of which 64% is spent at interrupt level (packet switching), leaving only about 3% for processes. If it is interrupts, then it is more than likely the platform is simply running out of punch.
06-29-2016 12:35 AM
Thanks for the quick inputs.
Both the WAN interface (Gi0/3) and the branch-facing interface (Gi0/1) show input and output drops, and Gi0/1 also shows unknown protocol drops.
I need some evidence to take back to the end client that clearly shows whether this is a hardware limitation, the traffic, or some other factor.
What is the throughput capacity of the NPE-G1?
http://www.gossamer-threads.com/lists/cisco/nsp/131832
06-29-2016 07:19 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
If your show proc cpu keeps showing interrupt CPU within a few percent of total CPU, as Philip already noted, your "platform is simply running out of punch" with your combination of traffic and your configuration.
There might be some configuration changes that will reduce CPU usage. For example, I notice you're using NBAR protocol discovery and an egress ACL. Do you need the former, and might ingress ACL(s) replace the latter?
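If you decide to drop either one, the change itself is small. A sketch only (ACL 102 here is hypothetical; when an ACL moves from out to in, its source/destination logic and the interface it attaches to usually need to be reworked):
conf t
 interface GigabitEthernet0/3
  no ip nbar protocol-discovery
 interface GigabitEthernet0/1
  no ip access-group 101 out
 interface GigabitEthernet0/3
  ip access-group 102 in
end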
Do you use the turbo ACL feature?
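If not, and if your image supports it, it is a single global command; it compiles ACLs into lookup tables so that per-packet evaluation cost stays roughly constant regardless of ACL length:
conf t
 access-list compiled
end
show access-lists compiled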
Do you really need to shape within your UPDATE_CLASS?
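If it turns out the answer is no, removing just the shaper is a one-line change and the class keeps its bandwidth guarantee. Sketch:
conf t
 policy-map QOS
  class UPDATE_CLASS
   no shape average percent 15
end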
Your queue drops, both ingress and egress, might be mitigated by increasing queue sizes.
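For example, something like the following (the values are illustrative only, not a recommendation). Note that the large "overrun" count on Gi0/1 is the receiver outrunning the CPU, which larger queues will not cure:
conf t
 interface GigabitEthernet0/1
  hold-queue 150 in
 policy-map QOS
  class class-default
   queue-limit 512
end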
What do your buffer stats look like? Do you use the auto buffers adjust feature?
What kind of device is on the other end of the interface showing the unknown protocol drops (a L3 switch)?
06-29-2016 10:33 PM
Please find my comments below and the attached configuration.
There might be some configuration changes that will reduce CPU usage. For example, I notice you're using NBAR protocol discovery and an egress ACL. Do you need the former, and might ingress ACL(s) replace the latter? -- Yes, we need them.
Do you use the turbo ACL feature? -- No.
Do you really need to shape within your UPDATE_CLASS? -- Yes, it's required.
Your queue drops, both ingress and egress, might be mitigated by increasing queue sizes. -- What values would you recommend? Also, you are referring to Gi0/1 and Gi0/3, right?
What do your buffer stats look like? Do you use the auto buffers adjust feature? -- Output is below.
What kind of device is on the other end of the interface showing the unknown protocol drops (a L3 switch)? -- It's a 6500 L3 switch.
===========================
------------------ show buffers ------------------
Buffer elements:
1103 in free list (1119 max allowed)
690553244 hits, 0 misses, 619 created
Public buffer pools:
Small buffers, 104 bytes (total 91, permanent 50, peak 347 @ 7w0d):
69 in free list (20 min, 150 max allowed)
731911947 hits, 3062 misses, 7128 trims, 7169 created
150 failures (0 no memory)
Middle buffers, 600 bytes (total 62, permanent 25, peak 62 @ 00:02:56):
45 in free list (10 min, 150 max allowed)
164689267 hits, 5978 misses, 9270 trims, 9307 created
294 failures (0 no memory)
Big buffers, 1536 bytes (total 50, permanent 50, peak 77 @ 7w0d):
49 in free list (5 min, 150 max allowed)
3664817932 hits, 9 misses, 27 trims, 27 created
0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10):
10 in free list (0 min, 100 max allowed)
109460 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Large buffers, 5024 bytes (total 1, permanent 0, peak 2 @ 7w0d):
1 in free list (0 min, 10 max allowed)
2928 hits, 1 misses, 1932 trims, 1933 created
0 failures (0 no memory)
Huge buffers, 18024 bytes (total 1, permanent 0, peak 14 @ 7w0d):
1 in free list (0 min, 4 max allowed)
36713 hits, 13 misses, 9270 trims, 9271 created
0 failures (0 no memory)
Interface buffer pools:
Syslog ED Pool buffers, 600 bytes (total 282, permanent 282):
250 in free list (282 min, 282 max allowed)
39633 hits, 0 misses
IPC buffers, 4096 bytes (total 2, permanent 2):
2 in free list (1 min, 8 max allowed)
0 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
Header pools:
Header buffers, 0 bytes (total 511, permanent 256, peak 511 @ 7w0d):
255 in free list (256 min, 1024 max allowed)
171 hits, 85 misses, 0 trims, 255 created
0 failures (0 no memory)
256 max cache size, 256 in cache
3706689410 hits in cache, 0 misses in cache
Particle Clones:
1024 clones, 2 hits, 0 misses
Public particle pools:
F/S buffers, 128 bytes (total 512, permanent 512):
0 in free list (0 min, 512 max allowed)
512 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
512 max cache size, 512 in cache
0 hits in cache, 0 misses in cache
Normal buffers, 512 bytes (total 2048, permanent 2048):
2048 in free list (1024 min, 4096 max allowed)
46 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Private particle pools:
HQF buffers, 0 bytes (total 2000, permanent 2000):
2000 in free list (500 min, 2000 max allowed)
61966429 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
GigabitEthernet0/1 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 4 fallbacks
1000 max cache size, 697 in cache
3078789427 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
GigabitEthernet0/2 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 0 fallbacks
1000 max cache size, 872 in cache
157327892 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
GigabitEthernet0/3 buffers, 512 bytes (total 1000, permanent 1000):
0 in free list (0 min, 1000 max allowed)
1000 hits, 42 fallbacks
1000 max cache size, 867 in cache
1702287216 hits in cache, 0 misses in cache
14 buffer threshold, 0 threshold transitions
ATM1/0 buffers, 512 bytes (total 1200, permanent 1200):
0 in free list (0 min, 1200 max allowed)
1200 hits, 1 misses
ATM2/0 buffers, 512 bytes (total 1200, permanent 1200):
0 in free list (0 min, 1200 max allowed)
1200 hits, 1 misses
ATM4/0 buffers, 512 bytes (total 4000, permanent 4000):
0 in free list (0 min, 4000 max allowed)
4000 hits, 1 misses
------------------ show buffers usage ------------------
Statistics for the Small pool
Caller pc : 0x6021395C count: 11
Resource User: IP Input count: 11
Output IDB : AT4/0.1 count: 4
Caller pc : 0x618614AC count: 4
Resource User: IP-EIGRP: count: 4
Caller pc : 0x614CF574 count: 3
Resource User: Init count: 3
Input IDB : Gi0/1 count: 4
Output IDB : Gi0/1 count: 1
Caller pc : 0x620769CC count: 1
Resource User: BGP Open count: 1
Number of Buffers used by packets generated by system: 80
Number of Buffers used by incoming packets: 11
Statistics for the Middle pool
Output IDB : Gi0/1 count: 10
Caller pc : 0x620769CC count: 11
Resource User: Virtual Ex count: 11
Number of Buffers used by packets generated by system: 62
Number of Buffers used by incoming packets: 0
Statistics for the Big pool
Caller pc : 0x61AAA718 count: 1
Resource User: Per-Second count: 1
Number of Buffers used by packets generated by system: 50
Number of Buffers used by incoming packets: 0
Statistics for the VeryBig pool
Number of Buffers used by packets generated by system: 10
Number of Buffers used by incoming packets: 0
Statistics for the Large pool
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Huge pool
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Syslog ED Pool pool
Caller pc : 0x634C1A60 count: 32
Resource User: EEM ED Sys count: 32
Number of Buffers used by packets generated by system: 282
Number of Buffers used by incoming packets: 0
Statistics for the IPC pool
Number of Buffers used by packets generated by system: 2
Number of Buffers used by incoming packets: 0
Statistics for the Header pool
Number of Buffers used by packets generated by system: 511
Number of Buffers used by incoming packets: 0
Statistics for the FS Header pool
Caller pc : 0x6063BD38 count: 3
Resource User: Init count: 12
Caller pc : 0x601686D4 count: 3
Caller pc : 0x6083E290 count: 1
Caller pc : 0x61A1C584 count: 1
Caller pc : 0x61B4134C count: 1
Caller pc : 0x62B2A4B8 count: 1
Caller pc : 0x61A0E608 count: 1
Resource User: IP ARP Adj count: 1
Caller pc : 0x6006B7EC count: 1
Caller pc : 0x62B29804 count: 1
Number of Buffers used by packets generated by system: 28
Number of Buffers used by incoming packets: 0
Statistics for the l2frag pak pool pool
Number of Buffers used by packets generated by system: 0
Number of Buffers used by incoming packets: 0
Statistics for the SW Crypto Header pool
Caller pc : 0x63956440 count: 1
Resource User: Init count: 1
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
Statistics for the Crypto Fragmentation Header pool
Caller pc : 0x63AB46D4 count: 1
Resource User: Init count: 1
Number of Buffers used by packets generated by system: 1
Number of Buffers used by incoming packets: 0
06-30-2016 02:49 AM
You don't have a huge number of trims and creates, but both burn a bit of CPU. So, if you're trying to get every CPU cycle for packet forwarding, you'll want to try to reduce those. You might manually buffer tune, or, I believe, your IOS image supports auto buffer tuning.
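For example, manual tuning keyed to the misses shown above might look like this (the numbers are illustrative, not a recommendation), or, if the image supports it, one global command enables automatic tuning:
conf t
 buffers small permanent 150
 buffers small min-free 30
 buffers middle permanent 100
 buffers middle min-free 20
end
conf t
 buffers tune automatic
end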
06-30-2016 03:44 AM
Please find my comments below
every CPU cycle for packet forwarding, you'll want to try to reduce those -- How can I reduce them?
Do you suspect any IOS-related issue or anything else?
06-30-2016 05:50 AM
every CPU cycle for packet forwarding, you'll want to try to reduce those -- How can I reduce them?
As I previously noted, by not using features without a truly needed purpose, and by using the features you do need as optimally as possible. Again, for example, do you need to use NBAR discovery? Must you have NetFlow stats? If not, remove these features. For optimal performance, try the turbo ACL feature, the buffer tune feature, etc.
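For instance, since Gi0/3 runs both ingress and egress NetFlow, and if nothing is actually consuming those stats, removing them takes the per-packet flow accounting off the CPU:
conf t
 interface GigabitEthernet0/3
  no ip flow ingress
  no ip flow egress
end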
Do you suspect any IOS-related issue or anything else?
No. The NPE-G1 is just a 1 Mpps processor. A pair of active gig ports is enough to overrun the capacity of the processor. Besides the gig ports, I see you have a couple of active ATM ports too.
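As a rough back-of-envelope from your own counters: Gi0/1 was receiving about 27,900 pps and Gi0/3 about 18,700 pps, i.e. roughly 47,000 pps forwarded at about 70% CPU. Extrapolating linearly, this configuration saturates the CPU somewhere around 47,000 / 0.70 ≈ 67,000 pps, a small fraction of the ~1 Mpps headline figure, because that figure assumes plain fast switching with no NBAR, NetFlow, ACL, or HQF work added per packet.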
06-30-2016 01:59 PM
OK, thanks.
I need your suggestions for a report stating that the hardware has reached its limit, so we can either upgrade the hardware or reduce the features.
For a hardware upgrade, is there enough information in these logs to justify it convincingly? (If we propose a hardware upgrade, there may be pushback, e.g. that the WAN bandwidth is not fully choked in the monitoring tool and only reaches about 220 Mbps.)
06-30-2016 05:45 PM
Reducing the feature usage will probably only have a small impact.
You need to upgrade the hardware - or reduce the amount of data.
07-01-2016 03:04 AM
Reducing the feature usage will probably only have a small impact.
Usually that's true. However, there can be exceptions.
For example, a couple of years ago, one of our engineers had a policy to count every ToS marking passing out the WAN-facing interface, on a 7200 with an NPE-G1.
Then came the day our telecom folks activated a new VoIP gateway that generated about 200 Mbps of VoIP traffic. That traffic crushed the CPU. (NB: the CPU was showing almost all "interrupt" usage.) Removing the "counting" policy dropped CPU usage by about 40%.
07-01-2016 07:20 AM
OK, thanks for the inputs.
Regarding the hardware limit causing this CPU, I need something that provides good justification in the report that a hardware upgrade should resolve this issue.
If you both could give your inputs, that would be great.
Thanks in advance.
07-01-2016 09:08 AM
I already did - i.e. your CPU usage and the kind of usage (interrupt-level).
07-01-2016 12:04 PM
Yes, I know; you explained the NBAR usage, the other QoS features, the ACLs, and so on. What I still need is information or documentation from a hardware point of view, based on the current capacity, configuration, and logs I provided, that clearly shows the device is exceeding its hardware capacity, so the report gives a solid justification for upgrading to new hardware.
Even on the gig ports the Rx and Tx load was not high during the spike (although drops were there), so a decision based on link load alone won't help much, I guess.
A few bullet points on the hardware limitation would help prove the case.
Thanks for your understanding.
07-01-2016 01:07 PM
You already provided this evidence yourself, in your first post, with the graph showing the platform running out of CPU. Can't get better evidence than that. I don't know what more the router could do to convince you it can't handle the load.
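If the client wants a package of evidence rather than prose, capturing these during a spike makes the case on its own (all standard show commands):
show processes cpu history
show processes cpu sorted
show interfaces GigabitEthernet0/1 | include rate|drops|overrun
show interfaces GigabitEthernet0/3 | include rate|drops|overrun
A high interrupt share of total CPU (here 64% of 67%), climbing overruns, and climbing output drops while link utilization sits well below line rate is exactly the signature of a forwarding-CPU bottleneck rather than a bandwidth one.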