%SYS-3-CPUHOG: Task is running for...

kthned · ‎03-05-2013

Hi

I am observing strange behaviour: high CPU spikes once a week, at the very same time and in the same pattern. The logs are below, and the show version output is attached. The switch did not crash, but a lot of %SYS-3-CPUHOG messages appear, with a couple of %PLATFORM_UCAST-4-PREFIX messages at the end. Any idea?

Mar 5 10:25:38: %SYS-3-CPUHOG: Task is running for (2099)msecs, more than (2000)msecs (8/3),process = HL3U bkgrd process.

-Traceback= 0x1BA3250z 0x27A933Cz 0x27A9264z 0xE4ABD8z 0x1BA2300z 0x1BA3824z 0x27A933Cz 0x27A9264z 0x1FAB190z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FBAB58z 0x1FAFCDCz

Mar 5 10:25:48: %SYS-3-CPUHOG: Task is running for (2098)msecs, more than (2000)msecs (3/0),process = HL3U bkgrd process.

-Traceback= 0x1C21168z 0x1C214F8z 0x1BFE4F0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FAF3ECz 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz

Mar 5 10:25:59: %SYS-3-CPUHOG: Task is running for (2098)msecs, more than (2000)msecs (6/4),process = HL3U bkgrd process.

-Traceback= 0x1FD2578z 0x2597634z 0x1FDE560z 0x1FBAFECz 0x1FAF4E8z 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z 0x1FA4DA4z 0x2687C04z 0x2682358z

Mar 5 10:26:09: %SYS-3-CPUHOG: Task is running for (2106)msecs, more than (2000)msecs (27/5),process = HL3U bkgrd process.

-Traceback= 0x1C21368z 0x1BFE4D0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FAF3ECz 0x1FBAC30z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z

Mar 5 10:26:19: %SYS-3-CPUHOG: Task is running for (2099)msecs, more than (2000)msecs (10/7),process = HL3U bkgrd process.

-Traceback= 0x1BFE4B0z 0x4EFBA8z 0x4E5904z 0x4D944Cz 0x1FAB140z 0x1EFF338z 0x1F006B8z 0x1F00BE4z 0x1F01F14z 0x1FABCF8z 0x1FBAB58z 0x1FAFCDCz 0x1FB06CCz 0x1FB1524z 0x1FA4DA4z 0x2687C04z

Mar 5 10:26:41: %PLATFORM_UCAST-4-PREFIX: One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

Mar 5 10:28:59: %PLATFORM_UCAST-4-PREFIX: One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

Switch  Ports  Model           SW Version  SW Image
------  -----  -----           ----------  ----------
*    1  18     WS-C3560E-12SD  15.0(2)SE1  C3560E-UNIVERSALK9-M
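For anyone who wants to track these spikes week over week, the CPUHOG lines above can be pulled out of a saved log with a short script. This is only a minimal sketch: the regular expression is written against the lines quoted in this post, and the group names (`stamp`, `ran`, `limit`, `process`) are illustrative labels, not Cisco field names.

```python
import re

# Pattern matched against the %SYS-3-CPUHOG lines quoted in this thread.
# Group names are illustrative assumptions, not official field names.
CPUHOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}): %SYS-3-CPUHOG: "
    r"Task is running for \((?P<ran>\d+)\)msecs, "
    r"more than \((?P<limit>\d+)\)msecs \(\d+/\d+\),\s*"
    r"process = (?P<process>.+?)\.?$"
)

def parse_cpuhog(line):
    """Return (timestamp, ran_ms, limit_ms, process), or None if no match."""
    m = CPUHOG_RE.match(line.strip())
    if m is None:
        return None
    return m["stamp"], int(m["ran"]), int(m["limit"]), m["process"]

sample = [
    "Mar 5 10:25:38: %SYS-3-CPUHOG: Task is running for (2099)msecs, "
    "more than (2000)msecs (8/3),process = HL3U bkgrd process.",
    "Mar 5 10:26:41: %PLATFORM_UCAST-4-PREFIX: One or more, more specific "
    "prefixes could not be programmed into TCAM",
]

# Keep only the lines that are CPUHOG events.
events = [e for e in map(parse_cpuhog, sample) if e is not None]
for stamp, ran, limit, process in events:
    print(f"{stamp}  {process}: {ran} ms (limit {limit} ms, over by {ran - limit} ms)")
```

Run against a full week of syslog, the timestamps make it easy to confirm whether the events really recur at the same time each week.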

Giuseppe Larosa · ‎03-05-2013

Hello Syedumairali,

>> Mar 5 10:28:59: %PLATFORM_UCAST-4-PREFIX: One or more, more specific prefixes could not be programmed into TCAM and are being covered by a less specific prefix, and the packets may be software forwarded

This is much more important than the CPUHOG messages.

It is a sign that the switch is learning too many routes: there are more CEF entries than can be programmed into the TCAM for hardware-based forwarding.

You can check the total number of routes with:

show ip route summary

Now, the use of TCAM resources is decided by the SDM template in use. A routing SDM template may give you more routing entries and fewer entries for MAC addresses.

See:

http://www.cisco.com/en/US/docs/switches/lan/catalyst3560/software/release/15.0_2_se/configuration/guide/swsdm.html

A reload is needed after an SDM template change.

If the SDM template cannot be changed, or it is already the routing template, you need to reduce the number of routes received by the switch using route summarization or other protocol-specific tools (like OSPF stub areas).

Hope to help

Giuseppe

kthned · ‎03-05-2013

Thanks Giuseppe,

The output of sh ip route summary and the CEF stats are attached. Could you point out whether the number of routes is the cause here? I can see there are far fewer routes than allowed (8k). Or am I looking at the wrong numbers?

swgw2#sh ip route summary

IP routing table name is default (0x0)

IP routing table maximum-paths is 32

Route Source    Networks    Subnets     Replicates  Overhead    Memory (bytes)
connected       1           115         0           6960        20416
static          0           0           0           0           0
ospf 1          83          605         0           53340       123840
  Intra-area: 496 Inter-area: 41 External-1: 0 External-2: 151
  NSSA External-1: 0 NSSA External-2: 0
internal        85                                              52424
Total           169         720         0           60300       196680
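As a quick sanity check on the numbers in this summary, the Total row can be compared against the 8k figure mentioned in the post. This is a rough sketch: it treats Networks plus Subnets as an upper bound on the route count, and `ROUTE_LIMIT` is simply the poster's "8k" taken as an assumption, not a value read from the switch or its SDM template.

```python
def count_routes(summary_text):
    """Sum the Networks and Subnets columns of the 'Total' row."""
    for line in summary_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Total":
            networks, subnets = int(fields[1]), int(fields[2])
            return networks + subnets
    raise ValueError("no Total row found")

# Rows copied from the 'sh ip route summary' output in this post.
summary = """\
connected 1 115 0 6960 20416
static 0 0 0 0 0
ospf 1 83 605 0 53340 123840
internal 85 52424
Total 169 720 0 60300 196680
"""

ROUTE_LIMIT = 8000  # assumption: the "8k" figure quoted in the post
routes = count_routes(summary)
print(f"{routes} routes, {routes / ROUTE_LIMIT:.1%} of {ROUTE_LIMIT}")
```

Even as an upper bound, the count is a small fraction of the assumed limit, which matches the reading that the overall route count by itself does not explain the TCAM message.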

swgw2#sh ip cef switching statistics

Reason                          Drop      Punt      Punt2Host
RP LES No route                 682       0         2
RP LES No adjacency             13640     0         333
RP LES Incomplete adjacency     5272818   0         0
RP LES TTL expired              0         0         177
RP LES IP options set           0         0         4212
RP LES Features                 1         0         3
RP LES Neighbor resolution req  191820    6085      0
RP LES Total                    5478961   6085      4727
All Total                       5478961   6085      4727
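To see which drop reason dominates the CEF statistics, the per-reason counters can be turned into shares of the total. The sketch below is an illustration only: the row pattern is written against the output pasted in this post, and the helper name `drop_shares` is hypothetical.

```python
import re

# Matches the "RP LES <reason> <drop> <punt> <punt2host>" rows pasted above.
ROW_RE = re.compile(
    r"^RP LES (?P<reason>.+?)\s+(?P<drop>\d+)\s+(?P<punt>\d+)\s+(?P<punt2host>\d+)$"
)

def drop_shares(stats_text):
    """Map each drop reason to its fraction of all RP LES drops."""
    drops = {}
    for line in stats_text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and m["reason"] != "Total":  # skip the summary row
            drops[m["reason"]] = int(m["drop"])
    total = sum(drops.values())
    if total == 0:
        return {}
    return {reason: count / total for reason, count in drops.items()}

stats = """\
RP LES No route 682 0 2
RP LES No adjacency 13640 0 333
RP LES Incomplete adjacency 5272818 0 0
RP LES TTL expired 0 0 177
RP LES IP options set 0 0 4212
RP LES Features 1 0 3
RP LES Neighbor resolution req 191820 6085 0
RP LES Total 5478961 6085 4727
"""

shares = drop_shares(stats)
top_reason = max(shares, key=shares.get)
top_share = shares[top_reason]
print(f"{top_reason}: {top_share:.1%} of drops")
```

With these counters, incomplete adjacencies account for the large majority of drops, so that counter is probably the one worth watching during the next weekly event.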

Giuseppe Larosa · ‎03-05-2013

Hello Syedumairali,

Yes, the total number of routes looks to be fewer than 8k. This is strange: the error message is typically seen in a scenario like the one I described in my previous post.

>> IP routing table maximum-paths is 32

Even allowing for the multiple CEF entries used for equal-cost multipath, this cannot justify the error message.

Hope to help

Giuseppe

kthned · ‎03-12-2013

Hi Giuseppe,

I have again seen exactly the same message at the very same time. I have taken a few traces and found out that there is a machine in our network which scans all of the subnets (maybe an Exchange server).

Then I looked at the TCAM utilization (sh platform tcam uti), which shows that the unicast directly connected routes are fully utilized.

IPv4 unicast directly-connected routes: 2048/2048 2048/2048

sh platform ip unicast faile routes also shows a huge number of entries, over 2000.

sh platform ip unicast faile adjacencies also shows a high number of failures, with a strange message "ATM fail when added, still has ATM fail" for some of the entries.
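The utilization line quoted above can also be checked automatically, for example from a script that collects the output on a schedule. This is a minimal sketch and rests on an assumption: the first number pair is read as the maximum masks/values and the second as the used masks/values, which is how the columns are normally labelled in this output but is not shown in the post. The function name `tcam_usage` is hypothetical.

```python
import re

# Assumed column order: "<region>: max_masks/max_values used_masks/used_values".
TCAM_RE = re.compile(
    r"^(?P<region>.+?):\s+(?P<max_masks>\d+)/(?P<max_values>\d+)"
    r"\s+(?P<used_masks>\d+)/(?P<used_values>\d+)$"
)

def tcam_usage(line):
    """Return (region, used_values, max_values, ratio), or None if no match."""
    m = TCAM_RE.match(line.strip())
    if m is None:
        return None
    used, cap = int(m["used_values"]), int(m["max_values"])
    ratio = used / cap if cap else 0.0  # avoid dividing by a zero-sized region
    return m["region"], used, cap, ratio

line = "IPv4 unicast directly-connected routes: 2048/2048 2048/2048"
region, used, cap, ratio = tcam_usage(line)
status = "FULL" if used >= cap else "ok"
print(f"{region}: {used}/{cap} ({ratio:.0%}) {status}")
```

Logging this value shortly before and after the weekly scan would show whether the region fills up only while the scan is running.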

Questions:

Why is the TCAM for directly connected routes overutilized when a system is scanning?

Why the strange message "ATM fail when added, still has ATM fail"?

Do you still think the SDM template needs to be changed?

Regards,

Umair