
Catalyst 6500 with SUP720 High SP utilization due to heartbeat process (not interrupt / traffic)

Shadow-82
Level 1

Hi!

We recently ran into problems with our Cat 6500 (2x SUP720).
SP utilization is high, but it is not interrupt based.
There are many troubleshooting articles about traffic- or interrupt-driven problems, but here it looks like a process is what is eating the CPU.
For now I can't really reboot or swap supervisors (I'd have to wait for a maintenance window to do that).

See my SP utilization stats below, with "Heartbeat Process" at the very top:

 

sw#remote command switch show processe cpu sort

CPU utilization for five seconds: 99%/5%; one minute: 99%; five minutes: 99%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
102 1516076 164598279 9 100.00% 53.05% 56.05% 0 Heartbeat Proces
297 187048188 572222625 326 7.85% 6.59% 6.61% 0 Spanning Tree
486 25630268 67979841 377 3.69% 1.30% 1.26% 0 LTL MGR
114 2243987624 1738490297 1290 3.14% 3.46% 3.53% 0 slcp process
269 2707503320 111421942 24299 1.72% 0.79% 0.79% 0 Vlan Statistics
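
As a side note on reading that first line: in IOS output, the figure before the slash is total CPU and the figure after it is the share spent at interrupt level, so the difference is process-level load. A minimal Python sketch of that arithmetic follows; the helper name `split_cpu_header` is made up for illustration and is not part of any Cisco tool.

```python
import re


def split_cpu_header(line):
    """Return (total, interrupt, process) percentages from the five-second field."""
    # The header looks like "... five seconds: 99%/5%; one minute: ..."
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("no five-second utilization field found")
    total = int(match.group(1))
    interrupt = int(match.group(2))
    # Whatever is not interrupt-level time is consumed by scheduled processes.
    return total, interrupt, total - interrupt


header = "CPU utilization for five seconds: 99%/5%; one minute: 99%; five minutes: 99%"
total, interrupt, process = split_cpu_header(header)
print(f"total={total}% interrupt={interrupt}% process={process}%")
```

With the header above this prints total=99% interrupt=5% process=94%, which matches the observation that the load is process driven rather than interrupt driven.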

 

Any hints to solve this?

Thanks in advance


Accepted Solutions

It seems we've solved the problem

I'm posting this for further reference

We reviewed the HSRP config on the 2 switches working as a pair (one of which had its SP at 100% utilization). We put the priorities in order and added preempt on both sides for every HSRP gateway.

We also added the following to every SVI:

  interface Vlanxxx
    no ip redirects
    no ip proxy-arp
    standby 0 preempt
    ! priority only where necessary
    standby 0 priority 200
    arp timeout 1500

After that, SP utilization went down to 40%. It has been running like this for the 3rd day in a row.
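
For anyone applying the same lines to many SVIs, here is a rough Python sketch that only generates the config text for pasting or review. The VLAN IDs 10 and 20 and the function name `svi_config` are hypothetical; HSRP group 0, priority 200 and the ARP timeout of 1500 are taken from the snippet above.

```python
def svi_config(vlan_ids, priorities=None):
    """Build the per-SVI config lines as one string.

    priorities maps a VLAN ID to an HSRP priority; VLANs that are not in
    the mapping get no explicit priority line.
    """
    priorities = priorities or {}
    lines = []
    for vlan in vlan_ids:
        lines.append(f"interface Vlan{vlan}")
        lines.append(" no ip redirects")
        lines.append(" no ip proxy-arp")
        lines.append(" standby 0 preempt")
        if vlan in priorities:
            lines.append(f" standby 0 priority {priorities[vlan]}")
        lines.append(" arp timeout 1500")
        lines.append("!")
    return "\n".join(lines)


# Hypothetical example: two VLANs, explicit priority only on VLAN 10.
config_text = svi_config([10, 20], priorities={10: 200})
print(config_text)
```

The script does not touch the switch; it just prints text, so the result can be checked by eye before it goes anywhere near a production box.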


5 Replies

Leo Laohoo
Hall of Fame
Post the complete output of the following commands:
1. sh version; and
2. sh proc cpu sort | ex 0.00

Thanks for the reply.

The RP is OK; it's the SP that runs high.

Below you'll find the requested show output (full output attached as a file).

 

sw#show version
Cisco IOS Software, s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(33)SXI9, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Fri 24-Feb-12 21:38 by prod_rel_team

ROM: System Bootstrap, Version 12.2(17r)SX5, RELEASE SOFTWARE (fc1)

sw uptime is 5 years, 3 weeks, 6 days, 3 hours, 58 minutes
Uptime for this control processor is 5 years, 3 weeks, 6 days, 3 hours, 47 minutes
Time since sw switched to active is 4 weeks, 1 day, 10 hours, 27 minutes
System returned to ROM by Stateful Switchover at 22:19:22 CET Fri Oct 9 2009 (SP by power-on)
System restarted at 08:49:46 CET Sun Jul 29 2012
System image file is "sup-bootdisk:s72033-ipservicesk9_wan-mz.122-33.SXI9.bin"
Last reload reason: reload

 

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

cisco WS-C6513 (R7000) processor (revision 2.0) with 983008K/65536K bytes of memory.
Processor board ID XXXXXXXXXXX
SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
4294967294 Ethernet interfaces
66 Virtual Ethernet interfaces
452 Gigabit Ethernet interfaces
1917K bytes of non-volatile configuration memory.
8192K bytes of packet buffer memory.

65536K bytes of Flash internal SIMM (Sector size 512K).
Configuration register is 0x2102

 

sw#sho proc cpu sort | ex 0.00%
CPU utilization for five seconds: 8%/2%; one minute: 11%; five minutes: 11%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
11 24242396 47617397 509 1.19% 1.11% 1.14% 0 ARP Input
121 287896 1951815 147 1.03% 0.82% 0.82% 0 CDP Protocol
256 20047944 256422482 78 0.87% 0.89% 0.91% 0 IP Input
446 51836168 351091288 147 0.71% 1.89% 1.94% 0 Port manager per
48 5109800 25330285 201 0.31% 0.32% 0.45% 0 Logger
323 7080068 191081744 37 0.23% 0.19% 0.18% 0 CEF: IPv4 proces
23 13864728 199813645 69 0.23% 0.33% 0.43% 0 IPC Seat Manager
137 9960 11675 853 0.23% 0.05% 0.01% 5 SSH Process
515 255844 1366302949 0 0.15% 0.21% 0.23% 0 HSRP Common
279 201844 3081601723 0 0.15% 0.14% 0.15% 0 Ethernet Msec Ti
516 2599904 60770574 42 0.15% 0.10% 0.08% 0 HSRP IPv4
347 2108884 80013108 26 0.07% 0.03% 0.03% 0 HIDDEN VLAN Proc
129 92228 518304 177 0.07% 0.04% 0.05% 0 Compute load avg
445 328772 691224163 0 0.07% 0.05% 0.07% 0 PM Callback
239 2799032 9683125 289 0.07% 0.11% 0.12% 0 Earl NDE Task
520 889900 539279306 1 0.07% 0.07% 0.07% 0 EIGRP-IPv4 Hello
501 32910688 92128769 357 0.07% 0.18% 0.23% 0 SNMP ENGINE
85 2484936 55982095 44 0.07% 0.03% 0.02% 0 ARP HA
50 2517304 159974297 15 0.07% 0.08% 0.07% 0 Per-Second Jobs

 

sw#remote login switch
Trying Switch ...
Entering CONSOLE for Switch
Type "^C^C^C" to end this session

 

sw-sp#show processes cpu sort

CPU utilization for five seconds: 99%/3%; one minute: 99%; five minutes: 99%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

102 1517012 164608343   9 100.00% 50.71% 54.92%   0 Heartbeat Proces
297 187357416 572261698 327 6.27% 6.54% 6.45% 0 Spanning Tree
114 2244135568 1738622319 1290 3.49% 4.25% 3.67% 0 slcp process
502 33010632 47541673 694 2.85% 1.29% 1.17% 0 SEA write CF pro
225 22292764 291875318 76 0.71% 0.71% 0.71% 0 cpf_process_tpQ
539 4629012 18110937 255 0.71% 0.10% 0.07% 0 DiagCard11/-1
535 3727832 21243014 175 0.55% 0.07% 0.04% 0 DiagCard6/-1
127 3692828 21154205 174 0.47% 0.06% 0.04% 0 DiagCard3/-1
540 4786868 19122967 250 0.39% 0.16% 0.08% 0 DiagCard12/-1
537 4783620 19109410 250 0.39% 0.14% 0.08% 0 DiagCard9/-1
93 118981224 83226500 1429 0.39% 0.30% 0.25% 0 Compute load avg
494 31195764 215149470 144 0.31% 0.33% 0.36% 0 NDE - IPV4
538 4780320 19087764 250 0.23% 0.20% 0.10% 0 DiagCard10/-1
129 3881532 22984558 168 0.23% 0.53% 0.36% 0 RFS server proce
340 233968 2667255 87 0.23% 0.27% 2.49% 0 XDR LC PRIO Crit
533 4119600 25398817 162 0.23% 0.12% 0.07% 0 DiagCard4/-1
376 8589132 3059603 2807 0.23% 0.36% 0.35% 0 Env Poll
299 2816656 1743916415 1 0.15% 0.14% 0.15% 0 UDLD
314 4137996 1569570387 2 0.15% 0.17% 0.13% 0 Mcast Hw Agent6
353 126578084 1986742713 63 0.15% 0.46% 0.29% 0 mls-gc Process
143 584 1799 324 0.15% 0.44% 0.12% 0 RFSS worker proc
416 4474504 4127867871 0 0.07% 0.08% 0.07% 0 PM Callback
283 7448832 97018485 76 0.07% 0.04% 0.05% 0 Netflow Aging Ta
75 69994976 396199822 176 0.07% 0.18% 0.16% 0 SCP Download Lis
531 2145700 12527586 171 0.07% 0.07% 0.06% 0 DiagCard1/-1
94 796328 5003641 159 0.07% 0.04% 0.05% 0 OIR Process
331 3876136 193524154 20 0.07% 0.04% 0.05% 0 CEF LC Stats
265 10562548 72072076 146 0.07% 0.23% 0.19% 0 SCP async: LCP#1

Read THIS.

Thanks for the reply, but... are you sure?
I've read this one before, and the article focuses on debugging what comes into the SP.
My high CPU utilization is not caused by excessive traffic but by an internal process.
Should I run "debug netdr capture rx" despite that difference?

I collected some of the show outputs from that troubleshooting procedure, but for the debug I'll need to wait until after business hours (please reply if the netdr results would be helpful here). The rest is attached.

