
%SYS-3-CPUHOG: Task is running for...

kthned
Level 3

Hi

I am observing strange behaviour: high CPU spikes once a week, at exactly the same time and in the same pattern. The logs are below, followed by the show version output. The switch did not crash, but a lot of %SYS-3-CPUHOG messages appear, with a couple of %PLATFORM_UCAST-4-PREFIX messages at the end. Any idea?

Mar  5 10:25:38: %SYS-3-CPUHOG: Task is running for (2099)msecs, more than (2000)msecs (8/3),process = HL3U bkgrd process.

-Traceback= 0x1BA3250z 0x27A933Cz 0x27A9264z 0xE4ABD8z 0x1BA2300z 0x1BA3824z 0x27A933Cz 0x27A9264z 0x1FAB190z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FBAB58z 0x1FAFCDCz

Mar  5 10:25:48: %SYS-3-CPUHOG: Task is running for (2098)msecs, more than (2000)msecs (3/0),process = HL3U bkgrd process.

-Traceback= 0x1C21168z 0x1C214F8z 0x1BFE4F0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FAF3ECz 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz

Mar  5 10:25:59: %SYS-3-CPUHOG: Task is running for (2098)msecs, more than (2000)msecs (6/4),process = HL3U bkgrd process.

-Traceback= 0x1FD2578z 0x2597634z 0x1FDE560z 0x1FBAFECz 0x1FAF4E8z 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z 0x1FA4DA4z 0x2687C04z 0x2682358z

Mar  5 10:26:09: %SYS-3-CPUHOG: Task is running for (2106)msecs, more than (2000)msecs (27/5),process = HL3U bkgrd process.

-Traceback= 0x1C21368z 0x1BFE4D0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FAF3ECz 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z

Mar  5 10:26:19: %SYS-3-CPUHOG: Task is running for (2099)msecs, more than (2000)msecs (10/7),process = HL3U bkgrd process.

-Traceback= 0x1BFE4B0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FBAB58z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z 0x1FA4DA4z 0x2687C04z

Mar  5 10:26:41: %PLATFORM_UCAST-4-PREFIX:  One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

Mar  5 10:28:59: %PLATFORM_UCAST-4-PREFIX:  One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

Switch Ports Model              SW Version            SW Image
------ ----- -----              ----------            ----------
*    1 18    WS-C3560E-12SD     15.0(2)SE1            C3560E-UNIVERSALK9-M
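To confirm the "same time every week" pattern across archived logs, the CPUHOG lines can be pulled apart with a small script. This is only a sketch: the regular expression is written against the exact line format pasted above, the function name `parse_cpuhog` is made up for illustration, and the `year` argument is an assumption because the syslog timestamps shown here carry no year.

```python
import re
from datetime import datetime

# Matches CPUHOG lines in the format pasted in this thread (assumption:
# other IOS releases or timestamp settings may format the line differently).
CPUHOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}): %SYS-3-CPUHOG: "
    r"Task is running for \((?P<ran>\d+)\)msecs, "
    r"more than \((?P<limit>\d+)\)msecs \(\d+/\d+\),\s*"
    r"process = (?P<proc>.+?)\.?$"
)


def parse_cpuhog(line, year):
    """Return a dict for a CPUHOG line, or None for any other line.

    `year` must be supplied by the caller (hypothetical parameter): the
    log timestamps in this thread do not include it.
    """
    m = CPUHOG_RE.match(line.strip())
    if m is None:
        return None
    # Collapse the double space in "Mar  5" before parsing the timestamp.
    ts_text = " ".join(m["ts"].split())
    timestamp = datetime.strptime(f"{year} {ts_text}", "%Y %b %d %H:%M:%S")
    return {
        "timestamp": timestamp,
        "ran_ms": int(m["ran"]),
        "limit_ms": int(m["limit"]),
        "process": m["proc"],
    }
```

Feeding every archived log line through `parse_cpuhog` and grouping the results by `timestamp.strftime("%a %H:%M")` should make a weekly schedule (a backup, a scan, a monitoring sweep) stand out.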


Giuseppe Larosa
Hall of Fame

Hello Syerdumairali,

>> Mar  5 10:28:59: %PLATFORM_UCAST-4-PREFIX:  One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

This is much more important than the CPUHOG messages.

This is a sign that the switch is learning too many routes: there are too many CEF entries to be programmed into the TCAM for hardware-based forwarding.

You can check the total number of routes with

show ip route summary

Now, the use of TCAM resources is decided by the SDM template in use. A routing SDM template may give you more routing entries and fewer entries for MAC addresses.

see

http://www.cisco.com/en/US/docs/switches/lan/catalyst3560/software/release/15.0_2_se/configuration/guide/swsdm.html

A reload is needed after an SDM template change.

If the SDM template cannot be changed, or it is already the routing template, you need to reduce the number of routes received by the switch using route summarization or other protocol-specific tools (like OSPF stub areas).

Hope to help

Giuseppe


Thanks Giuseppe,

The output of sh ip route summary and the CEF stats are below. Could you point out whether the number of routes is the issue here? I can see there are far fewer routes than allowed (8k). Or am I looking at the wrong numbers?

swgw2#sh ip route summary
IP routing table name is default (0x0)
IP routing table maximum-paths is 32
Route Source    Networks    Subnets     Replicates  Overhead    Memory (bytes)
connected       1           115         0           6960        20416
static          0           0           0           0           0
ospf 1          83          605         0           53340       123840
  Intra-area: 496 Inter-area: 41 External-1: 0 External-2: 151
  NSSA External-1: 0 NSSA External-2: 0
internal        85                                              52424
Total           169         720         0           60300       196680

swgw2#sh ip cef switching statistics
       Reason                          Drop       Punt  Punt2Host
RP LES No route                         682          0          2
RP LES No adjacency                   13640          0        333
RP LES Incomplete adjacency         5272818          0          0
RP LES TTL expired                        0          0        177
RP LES IP options set                     0          0       4212
RP LES Features                           1          0          3
RP LES Neighbor resolution req       191820       6085          0
RP LES Total                        5478961       6085       4727
All    Total                        5478961       6085       4727
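As a quick sanity check on the numbers above, the Total row of the route summary can be added up and compared with the limit. The sketch below is illustrative only: `total_routes` is a made-up helper, and `ASSUMED_ROUTE_LIMIT` is simply the "8k" figure quoted in this thread, not a value read from the switch (the real limit depends on the SDM template in use).

```python
def total_routes(summary_text):
    """Return Networks + Subnets from the 'Total' row of the pasted summary."""
    for line in summary_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Total":
            return int(fields[1]) + int(fields[2])
    raise ValueError("no 'Total' row found in the summary text")


# Assumption: the "8k" figure mentioned in the thread, not a measured value.
ASSUMED_ROUTE_LIMIT = 8000

# Rows copied from the output posted above.
sample = """\
Route Source    Networks    Subnets     Replicates  Overhead    Memory (bytes)
connected       1           115         0           6960        20416
static          0           0           0           0           0
ospf 1          83          605         0           53340       123840
internal        85                                              52424
Total           169         720         0           60300       196680
"""

used = total_routes(sample)
print(f"{used} routes, {used / ASSUMED_ROUTE_LIMIT:.1%} of the assumed limit")
```

With the figures posted here this gives 889 routes (169 + 720), which is well under the assumed limit, so the routing table size alone does not appear to explain the TCAM message.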

Hello Syedumairali,

Yes, the total number of routes looks to be less than 8k. This is strange: the error message is typically seen in a scenario like the one I described in my previous post.

>> IP routing table maximum-paths is 32

Even considering the multiple CEF entries used for equal-cost multipath, this cannot justify the error message.

Hope to help

Giuseppe

Hi Giuseppe,

I have seen exactly the same message again, at the very same time. I took a few traces and found that there is a machine in our network which scans all of the subnets (maybe an Exchange server).

Then I looked at the TCAM utilization (sh platform tcam uti), which shows that the unicast directly-connected routes are fully utilized.

     IPv4 unicast directly-connected routes:      2048/2048       2048/2048
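For anyone who wants to watch this counter over time, a row like the one above can be parsed and flagged when it is full. This is a rough sketch under one stated assumption: that the first pair on the row is the maximum masks/values and the second pair is the used masks/values. Check that against the column header of your own output. The names `parse_tcam_row` and `is_exhausted` are invented for this example.

```python
import re

# One utilization row: "<name>:  <a>/<b>  <c>/<d>".
# Assumption: a/b = max masks/values, c/d = used masks/values.
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?):\s+"
    r"(?P<max_masks>\d+)/(?P<max_values>\d+)\s+"
    r"(?P<used_masks>\d+)/(?P<used_values>\d+)\s*$"
)


def parse_tcam_row(line):
    """Return the row as a dict, or None if the line is not a utilization row."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    row = {"name": m["name"]}
    for key in ("max_masks", "max_values", "used_masks", "used_values"):
        row[key] = int(m[key])
    return row


def is_exhausted(row):
    """True when the used values have reached the maximum for this region."""
    return row["used_values"] >= row["max_values"]


row = parse_tcam_row(
    "     IPv4 unicast directly-connected routes:      2048/2048       2048/2048"
)
if row is not None and is_exhausted(row):
    print(f"TCAM region full: {row['name']}")
```

Run against periodic captures of the command output, this would show whether the region only fills while the scan is running.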

sh platform ip unicast faile routes also shows a large number of entries, over 2000.

sh platform ip unicast faile adjacencies also shows a high number of failures, with a strange message "ATM fail when added, still has ATM fail" for some of the entries.

Questions:

Why is the TCAM for directly-connected routes overutilized when a system is scanning the subnets?

Why the strange message "ATM fail when added, still has ATM fail"?

Do you still think the SDM template needs to be changed?

Regards,

Umair
