
7600 process high cpu

etn
Level 1

I have tried the newest IOS releases, 15.1(3)S0a and 12.2(33)SRE.

In both cases, traffic on the interface sometimes falls and CPU utilization climbs as high as 100%.

After clear cef linecard, traffic grows again and CPU goes to 0%.

#sh proc cpu s

CPU utilization for five seconds: 87%/83%; one minute: 91%; five minutes: 96%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

   7     1711512       87883      19474  4.06%  0.84%  0.89%   0 Check heaps     

245        5688    10115351          0  0.16%  0.12%  0.13%   0 Ethernet Msec Ti

210        2276     2538856          0  0.08%  0.02%  0.01%   0 IP ARP Retry Age

211       44772      267882        167  0.08%  0.04%  0.05%   0 IP Input        

244         428      326057          1  0.08%  0.00%  0.00%   0 Ethernet Timer C

#clear cef linecard

#sh proc cpu s

CPU utilization for five seconds: 0%/0%; one minute: 3%; five minutes: 3%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

157       36588        1511      24214  0.23%  0.02%  0.00%   0 Per-minute Jobs 

346      110540       43940       2515  0.15%  0.06%  0.05%   0 HIDDEN VLAN Proc

211       45600      280259        162  0.07%  0.04%  0.03%   0 IP Input     


rsimoni
Cisco Employee

Hi Alex,

What do you mean by lower traffic on the interface? Do you see drops? Can you document them?

How often do you see the problem?

How many boxes are affected?

Next time you see the problem, can you collect the following BEFORE clearing CEF:

show module

show version

show process cpu sorted

show ibc brief

show cef line

After you clear CEF, wait 2-3 minutes and then collect:

show process cpu s

show ibc

show cef line

Let's see if I can get something useful from those outputs or else, as I wrote on the other thread, you'd better open a TAC case for deeper investigation.

Riccardo

Hi, Riccardo

The problem is very irregular: it can happen many times a day, or not at all for a week.

None of the parameters change before and after the clear command.

>how many boxes are affected?

I do not understand: what is a box?

#sh mod

Mod Ports Card Type                              Model              Serial No.

--- ----- -------------------------------------- ------------------ -----------

  1    2  Route Switch Processor 720 (Active)    RSP720-3CXL-GE    

  2    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE     

  3    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE     

Mod MAC addresses                       Hw    Fw           Sw           Status

--- ---------------------------------- ------ ------------ ------------ -------

  1     5.7   12.2(33r)SRB 12.2(33)SRE5 Ok

  2     2.7   12.2(14r)S5  12.2(33)SRE5 Ok

  3     2.3   12.2(14r)S5  12.2(33)SRE5 Ok

Mod  Sub-Module                  Model              Serial       Hw     Status

---- --------------------------- ------------------ ----------- ------- -------

  1  Policy Feature Card 3       7600-PFC3CXL         1.2    Ok

  1  C7600 MSFC4 Daughterboard   7600-MSFC4           1.1    Ok

  2  Centralized Forwarding Card WS-F6700-CFC         4.1    Ok

  3  Centralized Forwarding Card WS-F6700-CFC         4.1    Ok

sh ver

Cisco IOS Software, c7600rsp72043_rp Software (c7600rsp72043_rp-IPSERVICESK9-M), Version 12.2(33)SRE5, RELEASE SOFTWARE (fc1)

Technical Support: http://www.cisco.com/techsupport

Copyright (c) 1986-2011 by Cisco Systems, Inc.

Compiled Thu 15-Sep-11 01:11 by prod_rel_team

ROM: System Bootstrap, Version 12.2(33r)SRB4, RELEASE SOFTWARE (fc1)

BOOTLDR: Cisco IOS Software, c7600rsp72043_rp Software (c7600rsp72043_rp-IPSERVICESK9-M), Version 12.2(33)SRE5, RELEASE SOFTWARE (fc1)

OnePower uptime is 5 days, 3 hours, 39 minutes

Uptime for this control processor is 5 days, 3 hours, 37 minutes

System returned to ROM by reload (SP by reload)

System restarted at 11:35:01 YEKST Fri Oct 7 2011

System image file is "bootdisk:/c7600rsp72043-ipservicesk9-mz.122-33.SRE5.bin"

Last reload type: Normal Reload

Cisco CISCO7604 (M8500) processor (revision 2.0) with 1900544K/131072K bytes of memory.

Processor board ID FOX1326GDYU

BASEBOARD: RSP720

CPU: MPC8548_E, Version: 2.0, (0x80390020)

CORE: E500, Version: 2.0, (0x80210020)

CPU:1200MHz, CCB:400MHz, DDR:200MHz,

L1:    D-cache 32 kB enabled

        I-cache 32 kB enabled

Last reset from power-on

9 Virtual Ethernet interfaces

2 Gigabit Ethernet interfaces

8 Ten Gigabit Ethernet interfaces

3964K bytes of non-volatile configuration memory.

507024K bytes of Internal ATA PCMCIA card (Sector size 512 bytes).

Configuration register is 0x2102

show process cpu sorted | e 0.00%  0.00%  0.00%

CPU utilization for five seconds: 3%/2%; one minute: 5%; five minutes: 5%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

367       97620     1475871         66  0.07%  0.01%  0.00%   0 BGP I/O         

306        3900     2421176          1  0.07%  0.02%  0.01%   0 TCP Timer       

  29       88848      674355        131  0.07%  0.01%  0.00%   0 IPC Seat Manager

438      210188     2401185         87  0.07%  0.03%  0.02%   0 Port manager per

  47         244      444522          0  0.07%  0.00%  0.00%   0 GraphIt         

245        6396    54021009          0  0.07%  0.08%  0.07%   0 Ethernet Msec Ti

248        2900    13539680          0  0.07%  0.01%  0.00%   0 IPAM Manager    

  59          80       44615          1  0.07%  0.00%  0.00%   0 Net Background  

   7     5529260      295431      18715  0.00%  0.90%  1.02%   0 Check heaps     

   2       17712       89010        198  0.00%  0.01%  0.00%   0 Load Meter      

  73        3040        4847        627  0.00%  0.01%  0.00%   1 Virtual Exec    

210        2496    13539674          0  0.00%  0.02%  0.00%   0 IP ARP Retry Age

211      200952     2778465         72  0.00%  0.05%  0.07%   0 IP Input        

294      116484      358092        325  0.00%  0.02%  0.02%   0 XDR mcast       

346      254440      222523       1143  0.00%  0.05%  0.05%   0 HIDDEN VLAN Proc

444      878864     2088940        420  0.00%  0.05%  0.07%   0 BGP Router      

499     7460608       36888     202250  0.00%  0.87%  1.22%   0 BGP Scanner

show cef line

Slot   Flags

1/0    up

VRF IPv4:Default, 368765 routes

Slot    I/Fs State    Flags

1/0        6 Active   sync, table-up

VRF IPv6:Default, 2 routes

Slot    I/Fs State    Flags

1/0        0 Active   sync, table-up

show ibc brief

Interface information:

        Interface IBC0/0(idb 0x150A73E8)

        5 minute rx rate 11585000 bits/sec, 1529 packets/sec

        5 minute tx rate 23542000 bits/sec, 3059 packets/sec

        2970488354 packets input, 2217147403408 bytes

        2970394762 broadcasts received

        2969724007 packets output, 2214293033620 bytes

        66688948 broadcasts sent

        0 Bridge Packet loopback drops

        2967333578 Packets CEF Switched, 0 Packets Fast Switched

        0 Packets SLB Switched, 0 Packets CWAN Switched

        Label switched pkts dropped: 0    Pkts dropped during dma: 1097

        Invalid pkts dropped: 0    Pkts dropped(not cwan consumed): 0

        IPSEC pkts: 4553242

        Xconnect pkts processed: 0, dropped: 0

        Xconnect pkt reflection drops: 0

        Total paks copied for process level 0

        Total short paks sent in route cache 438681305

        Total throttle drops 0    Input queue drops 0

        total spd packets classified (1269812 low, 1627108 medium, 48840 high)

        total spd packets dropped (1097 low, 0 medium, 0 high)

        spd prio pkts allowed in due to selective throttling (0 med, 0 high)

        IBC resets   = 1; last at 23:35:59.527 YEKST Sat Jul 15 2000

rsimoni
Cisco Employee
Cisco Employee

Alex,

Have you rated/closed the other question yet?

Riccardo

nkarpysh
Cisco Employee

Hi Alex,

As we can see, the high CPU is due to interrupts:

CPU utilization for five seconds: 87%/83%; one minute: 91%; five minutes: 96%

The value after the slash, 83%, is time spent in interrupts, which are used to handle traffic sent to the CPU. So for some reason some traffic is being software switched instead of being HW switched.
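To make the split concrete, here is a minimal Python sketch that pulls the two five-second figures out of the summary line and derives the process-level share. The function name split_cpu is made up for illustration, and the regular expression assumes the summary line has exactly the format pasted in this thread.

```python
import re

# Matches the summary line printed by "show processes cpu", for example:
#   CPU utilization for five seconds: 87%/83%; one minute: 91%; five minutes: 96%
CPU_LINE = re.compile(r"five seconds:\s*(\d+)%/(\d+)%")


def split_cpu(line):
    """Return (total, interrupt, process) percentages for the 5-second window.

    split_cpu is a hypothetical helper; it is not part of any Cisco tool.
    """
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Whatever is not interrupt time is spent in scheduled IOS processes.
    return total, interrupt, total - interrupt


if __name__ == "__main__":
    line = ("CPU utilization for five seconds: 87%/83%; "
            "one minute: 91%; five minutes: 96%")
    total, interrupt, process = split_cpu(line)
    print(f"total={total}% interrupt={interrupt}% process={process}%")
```

For the line quoted above this reports 87% total, 83% interrupt and 4% process, which is why the per-process list shows nothing unusual while the box is busy.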

Since clear cef solves the issue, I guess some CEF/GRT mismatch is happening. Possibly some CEF entries are lost, making the router send packets for those lost prefixes to the CPU.

First of all, we need to understand what those packets are.


The best tool to sniff CPU-bound traffic is netdr. It is safe to use with high CPU.

Configure it when you see CPU going up again:

- debug netdr capture rx (let it run a few seconds)

- show netdr capture

http://www.cisco.com/en/US/docs/routers/7600/ios/15S/configuration/guide/dos.html#wp1163918

You will see the packets coming in to the CPU. Then you will be able to check whether you have CEF entries for the destination IPs.

That will give more clues about the possible root cause.
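Once the destination addresses have been copied out of the capture by hand, a small script can show which prefixes are punted most often and list the CEF lookups to run. The following Python sketch is only an illustration: the helper names, the /24 grouping and the sample addresses (taken from the documentation ranges) are assumptions, and it does not parse netdr output itself.

```python
import ipaddress
from collections import Counter


def top_destinations(dst_ips, prefix_len=24, limit=5):
    """Count punted packets per destination prefix, most frequent first.

    Hypothetical helper: prefix_len=24 is an arbitrary grouping choice.
    """
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in dst_ips
    )
    return counts.most_common(limit)


def cef_checks(dst_ips):
    """Build one 'show ip cef <address>' command per distinct destination."""
    unique = sorted(set(dst_ips), key=ipaddress.ip_address)
    return [f"show ip cef {ip}" for ip in unique]


if __name__ == "__main__":
    # Illustrative addresses only, not taken from any real capture.
    punted = ["192.0.2.10", "192.0.2.20", "198.51.100.7", "192.0.2.10"]
    for network, hits in top_destinations(punted):
        print(f"{network}: {hits} packets")
    for command in cef_checks(punted):
        print(command)
```

If the most frequent prefixes turn out to have perfectly normal CEF entries, that points toward a hardware programming problem rather than a missing route.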

Nik

HTH,
Niko

Hi, Nikolay

The CPU is growing again now:

sh proc cpu s

CPU utilization for five seconds: 41%/40%; one minute: 44%; five minutes: 43%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

244        1784    55116403          0  0.15%  0.10%  0.10%   0 Ethernet Msec Ti

498      388716      566545        686  0.07%  0.04%  0.06%   0 BGP Task        

  12       28076      190864        147  0.07%  0.00%  0.00%   0 ARP Input       

   4           0         174          0  0.00%  0.00%  0.00%   0 Retransmission o

   5           0           3          0  0.00%  0.00%  0.00%   0 IPC ISSU Dispatc

   6           0           1          0  0.00%  0.00%  0.00%   0 PF Redun ICC Req

I do not see that:

>So for some reason some traffic getting software switched instead of being HW switched.

All the show commands give me the picture that all traffic goes through CEF:

show ibc brief

Interface information:

        Interface IBC0/0(idb 0x150A73E8)

        5 minute rx rate 304992000 bits/sec, 48826 packets/sec

        5 minute tx rate 621909000 bits/sec, 97638 packets/sec

        4090902684 packets input, 2930329196392 bytes

        4090652673 broadcasts received

        4090450834 packets output, 2926531746102 bytes

        107393 broadcasts sent

        0 Bridge Packet loopback drops

        4088722150 Packets CEF Switched, 0 Packets Fast Switched

        0 Packets SLB Switched, 0 Packets CWAN Switched

        Label switched pkts dropped: 0    Pkts dropped during dma: 0

        Invalid pkts dropped: 0    Pkts dropped(not cwan consumed): 0

        IPSEC pkts: 5277877

        Xconnect pkts processed: 0, dropped: 0

        Xconnect pkt reflection drops: 0

        Total paks copied for process level 0

        Total short paks sent in route cache 536662685

        Total throttle drops 0    Input queue drops 0

        total spd packets classified (787923 low, 1116063 medium, 54980 high)

        total spd packets dropped (0 low, 0 medium, 0 high)

        spd prio pkts allowed in due to selective throttling (0 med, 0 high)

        IBC resets   = 1; last at 10:53:29.327 YEKST Fri Oct 7 2011

nkarpysh
Cisco Employee

Hi Alex,

Actually these values:

41%/40% mean that the CPU is 41% busy in total, with 40% taken by traffic (interrupts) and only 1% by SW processes (STP, telnet, etc.). I don't mean that all traffic is hitting the CPU. It may be just some of it, but that is what is causing the spike. As I advised previously, you need to understand what traffic is hitting the CPU.

You can use either a CPU SPAN or netdr, as suggested before.
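The two show ibc brief outputs posted in this thread already hint at this: the 5 minute rx rate was 1529 packets/sec while the CPU was at 3%, and 48826 packets/sec while it was at 41%. A minimal Python sketch of that comparison follows. The helper name ibc_rx_pps is made up for illustration, and reading the IBC rx rate as "packets delivered to the route processor" is an assumption.

```python
import re

# Matches the rate line from "show ibc brief" as pasted in this thread.
RX_RATE = re.compile(
    r"5 minute rx rate\s+(\d+) bits/sec,\s+(\d+) packets/sec"
)


def ibc_rx_pps(show_ibc_output):
    """Return the 5-minute receive rate in packets/sec (hypothetical helper)."""
    match = RX_RATE.search(show_ibc_output)
    if match is None:
        raise ValueError("no '5 minute rx rate' line found")
    return int(match.group(2))


# Rate lines copied from the two outputs in this thread.
NORMAL = "        5 minute rx rate 11585000 bits/sec, 1529 packets/sec"
HIGH_CPU = "        5 minute rx rate 304992000 bits/sec, 48826 packets/sec"

if __name__ == "__main__":
    ratio = ibc_rx_pps(HIGH_CPU) / ibc_rx_pps(NORMAL)
    print(f"inband rx rate is {ratio:.1f}x the normal rate")
```

A jump of that size on the inband channel is consistent with extra traffic being punted, even though the CEF counters in the same output keep increasing.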

Nik

HTH,
Niko

Alexandr Gurbo
Level 1

Hello Alex,

Did you find the problem?

I have the same problem on SRE6 and SRE7a.

Same problem here with SRE6, RSP720-3CXL.

CPU load in interrupts grows to 50-80%. "clear cef linecard" helps for a random time, from hours to days.

It is one of two almost identical boxes, and the configuration is almost the same too. The other box has no such problem.

show netdr captured gives lots of normal packets, which are supposed to be routed in hardware but for some reason spontaneously hit the CPU.

If anybody has a solution, please let me know. Thanks!

Nicholas Oliver
Cisco Employee

Alex,

Just to reiterate what Nikolay said before, CPU utilization under interrupts on an RSP720 or a SUP720, as you see, is generally due to punted traffic. There will be no single solution for this type of problem, as each situation is unique. The question that must be answered in a situation like this is WHY the traffic is being punted. The key to answering this question is always figuring out what that traffic is. A netdr capture or a SPAN of the inband channel will help, but tracking down why the punts occur is not easy. In your situation it sounds like we may be dealing with an issue in which the CEF forwarding table is either too large, which results in punts, or is somehow becoming stale, and the clear allows it to be reprogrammed. This issue will not be easily troubleshot through posts to the support forums, though we can certainly help if you prefer to handle the issue in this way. My suggestion would be that you open a case with the TAC while the issue is being seen, so that we can have a live view of what is being punted and assist in determining why.

If you open a case, the best tech/subtech to select for this type of problem is:

TECH: LAN Switching

SUBTECH: Cat 6000, 6500 Troubleshooting High CPU Running IOS

PROBLEM CODE: Error Messages, Logs, Debugs

The same answer goes for sbr@infonet.ee: if you are actively seeing this problem, engaging the TAC will get you the fastest resolution for this type of issue. If you would prefer to go through the support forums, we will need to see the packets being punted through a netdr capture, as Nikolay described above.

-Nick

sbr
Level 1

Six days ago I issued a command found in an internet forum discussion, "remote command switch test mls cef tcam-shadow off", and the CPU load is still at a normal 2-5%.

Alex,

Do you see anything in the output of the 'show mls cef inconsistency' or 'show mls cef logg' outputs?  I can't come up with a valid reason why you should have to disable tcam-shadowing in order for the punts to stop.  I expect there's a problem that is causing an inconsistency and resulting in punts to the RP CPU.  I would not be content to leave it as is, and would be interested in investigating further the reason for these punts.

-Nick

show mls cef exception status is and was always FALSE; I checked it during high CPU load.

show mls cef logg is empty

#show mls cef inconsistency

Consistency Check Count       : 52973

TCAM Consistency Check Errors : 0

SSRAM Consistency Check Errors : 0
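For anyone collecting this output repeatedly, the error counters can be checked mechanically. This is a minimal Python sketch, assuming the output keeps the "<name> Consistency Check Errors : <n>" layout shown above; the function name consistency_errors is made up for illustration.

```python
import re

# One counter per line, e.g. "TCAM Consistency Check Errors : 0".
ERROR_LINE = re.compile(
    r"^\s*(\w+) Consistency Check Errors\s*:\s*(\d+)", re.MULTILINE
)


def consistency_errors(output):
    """Map each error counter name to its value (hypothetical helper)."""
    return {name: int(value) for name, value in ERROR_LINE.findall(output)}


# Output copied from this thread.
SAMPLE = """Consistency Check Count       : 52973
TCAM Consistency Check Errors : 0
SSRAM Consistency Check Errors : 0
"""

if __name__ == "__main__":
    errors = consistency_errors(SAMPLE)
    nonzero = {name: count for name, count in errors.items() if count}
    if nonzero:
        print(f"inconsistencies reported: {nonzero}")
    else:
        print("no consistency check errors reported")
```

Zero errors here only says the checker found nothing; it does not rule out a mismatch that the check does not cover.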

I am still waiting for a high CPU load event since the reload 6 days ago. There was a single one, lasting a minute or so right after the reboot, but it went away after (or maybe it was a coincidence) "remote command switch test mls cef tcam-shadow off".

Alex,

The exception status would be an indication that we had filled the TCAM and had to punt everything.  My suspicion is that we are not in an exception state where we are punting everything, but rather that there is an inconsistency causing certain types of traffic to be punted, when they shouldn't.  It is possible that disabling tcam shadowing is avoiding this issue, by going directly to the source, rather than relying on the shadow copy to determine if forwarding can take place without a punt. 

-Nick

Dear All,

Please advise how to use "remote command switch test mls cef tcam-shadow off" in this case. I entered the command on the router, but there is no test option. The output below is for your reference:

remote command switch ?

  LINE  Remote command string

Best Regards
