CISCO 6509 high CPU

andresdavid
Level 1

Hello everybody, I've had this problem for two days and I don't know what to do:

bb01.network.ro>sh processes cpu | exclude 0.00%__0.00%__0.00%

CPU utilization for five seconds: 76%/67%; one minute: 84%; five minutes: 91%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
2 56640 661700 85 0.00% 0.01% 0.00% 0 Load Meter
8 34163576 1740542 19628 0.00% 0.88% 0.67% 0 Check heaps
11 4609760 7352294 626 0.15% 0.12% 0.13% 0 ARP Input
20 469000 3278408 143 0.00% 0.01% 0.00% 0 IPC Periodic Tim
23 26914948 162975847 165 0.00% 0.45% 0.51% 0 IPC Seat Manager
50 1405336 3451726 407 0.07% 0.07% 0.07% 0 Per-Second Jobs
51 2748004 116538 23580 0.87% 0.10% 0.06% 0 Per-minute Jobs
63 3541328 146553080 24 0.07% 0.07% 0.07% 0 Net Input
65 4048 463 8742 0.31% 0.19% 0.26% 2 SSH Process
84 2107256 5299555 397 0.07% 0.05% 0.05% 0 DHCP Snooping
218 659728 743227 887 0.00% 0.03% 0.02% 0 Compute load avg
241 526228 93192878 5 0.00% 0.03% 0.02% 0 ACE Tunnel Task
242 297764 6528667 45 0.07% 0.01% 0.00% 0 ACE Config Prop
253 1524904 3120700 488 0.00% 0.02% 0.01% 0 esw_vlan_stat_pr
266 1424060 2545041 559 0.07% 0.08% 0.08% 0 CDP Protocol
272 425015252 275710734 1541 2.63% 5.69% 8.45% 0 IP Input
298 98168 20468998 4 0.07% 0.00% 0.00% 0 Ethernet Timer C
299 1543800 374615844 4 0.23% 0.17% 0.16% 0 Ethernet Msec Ti
315 3613924 2274463 1588 0.15% 0.10% 0.11% 0 QOS Stats Gather
322 690244 120644 5721 0.00% 0.01% 0.00% 0 CEF background p
327 443864 77336 5739 0.07% 0.00% 0.00% 0 IP Background
333 526848 11680137 45 0.00% 0.03% 0.04% 0 TCP Timer
363 2090900 4989795 419 0.00% 0.06% 0.07% 0 CEF: IPv4 proces
376 95343748 35597873 2678 0.71% 0.78% 1.05% 0 FM core
384 1511928 1655151 913 0.00% 0.04% 0.05% 0 HIDDEN VLAN Proc
403 517800 93032054 5 0.00% 0.03% 0.02% 0 RADIUS
495 27500908 125788459 218 1.19% 0.70% 0.72% 0 Port manager per
555 153215372 12309297 12447 1.35% 2.20% 1.76% 0 IP NAT Ager
557 525376 32595667 16 0.00% 0.02% 0.01% 0 IGMP Input
558 302768 3315067 91 0.00% 0.01% 0.00% 0 PIM Process
559 344156 32519050 10 0.00% 0.01% 0.00% 0 Mwheel Process
562 4821188 317657 15177 0.00% 0.06% 0.09% 0 BGP Scanner
570 283060 13253757 21 0.00% 0.01% 0.00% 0 MLD

 

 

 

65 Replies

Whether the UDP timeout value was changed is unclear.  What's described is that the timers were set and the CPU immediately dropped for a short time.  So, possibly a TCAM flush?  If so, TCAM overflow will hit the main CPU.

I do agree with checking the NAT stats, even first, but since any TCAM overflow may cause a CPU hit, NAT-related or not, I think that's worth checking too.
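If the temporary relief happens again, it might also be worth capturing (just a generic IOS check, nothing 6500-specific):

show processes cpu history

right afterwards; it draws ASCII graphs of CPU over the last 60 seconds, 60 minutes and 72 hours, which makes it easy to see whether a drop was momentary or sustained.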

bb01.netw.ro>show ip nat statistics
Total active translations: 263507 (0 static, 263507 dynamic; 263507 extended)
Outside interfaces:
Vlan1881, Vlan1882
Inside interfaces:
Vlan10
Hits: 917669387 Misses: 0
CEF Translated packets: 903754570, CEF Punted packets: 1104889329
Expired translations: 103970292
Dynamic mappings:
-- Inside Source
[Id: 9] access-list 1309 pool NAT_9 refcount 5199
pool NAT_9: netmask 255.255.255.252
start 86.xxx.xx.0 end 86.xxx.xx.3
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 10] access-list 1310 pool NAT_10 refcount 4105
pool NAT_10: netmask 255.255.255.252
start 86.xxx.xx.4 end 86.xxx.xx.7
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 11] access-list 1311 pool NAT_11 refcount 6873
pool NAT_11: netmask 255.255.255.252
start 86.xxx.xx.8 end 86.xxx.xx.11
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 12] access-list 1312 pool NAT_12 refcount 1706
pool NAT_12: netmask 255.255.255.252
start 86.xxx.xx.12 end 86.xxx.xx.15
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 13] access-list 1313 pool NAT_13 refcount 5773
pool NAT_13: netmask 255.255.255.252
start 86.xxx.xx.16 end 86.xxx.xx.19
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 14] access-list 1314 pool NAT_14 refcount 3478
pool NAT_14: netmask 255.255.255.252
start 86.xxx.xx.20 end 86.xxx.xx.23
type generic, total addresses 4, allocated 1 (25%), misses 136
[Id: 15] access-list 1315 pool NAT_15 refcount 2919
pool NAT_15: netmask 255.255.255.252
start 86.xxx.xx.24 end 86.xxx.xx.27
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 16] access-list 1316 pool NAT_16 refcount 7963
pool NAT_16: netmask 255.255.255.252
start 86.xxx.xx.28 end 86.xxx.xx.31
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 17] access-list 1317 pool NAT_17 refcount 9010
pool NAT_17: netmask 255.255.255.252
start 94.xxx.xx.0 end 94.xxx.xx.3
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 18] access-list 1318 pool NAT_18 refcount 9781
pool NAT_18: netmask 255.255.255.252
start 94.xxx.xx.4 end 94.xxx.xx.7
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 19] access-list 1319 pool NAT_19 refcount 77385
pool NAT_19: netmask 255.255.255.252
start 94.xxx.xx.8 end 94.xxx.xx.11
type generic, total addresses 4, allocated 2 (50%), misses 0
[Id: 20] access-list 1320 pool NAT_20 refcount 7740
pool NAT_20: netmask 255.255.255.252
start 94.xxx.xx.12 end 94.xxx.xx.15
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 21] access-list 1321 pool NAT_21 refcount 6333
pool NAT_21: netmask 255.255.255.252
start 94.xxx.xx.16 end 94.xxx.xx.19
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 22] access-list 1322 pool NAT_22 refcount 9542
pool NAT_22: netmask 255.255.255.252
start 94.xxx.xx.20 end 94.xxx.xx.23
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 23] access-list 1323 pool NAT_23 refcount 8653
pool NAT_23: netmask 255.255.255.252
start 94.xxx.xx.24 end 94.xxx.xx.27
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 24] access-list 1324 pool NAT_24 refcount 10135
pool NAT_24: netmask 255.255.255.252
start 94.xxx.xx.28 end 94.xxx.xx.31
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 25] access-list 1325 pool NAT_25 refcount 17963
pool NAT_25: netmask 255.255.255.252
start 94.xxx.xx.32 end 94.xxx.xx.35
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 26] access-list 1326 pool NAT_26 refcount 10442
pool NAT_26: netmask 255.255.255.252
start 94.xxx.xx.36 end 94.xxx.xx.39
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 27] access-list 1327 pool NAT_27 refcount 10440
pool NAT_27: netmask 255.255.255.252
start 94.xxx.xx.40 end 94.xxx.xx.43
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 28] access-list 1328 pool NAT_28 refcount 9566
pool NAT_28: netmask 255.255.255.252
start 94.xxx.xx.44 end 94.xxx.xx.47
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 29] access-list 1329 pool NAT_29 refcount 7805
pool NAT_29: netmask 255.255.255.252
start 94.xxx.xx.48 end 94.xxx.xx.51
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 30] access-list 1330 pool NAT_30 refcount 11477
pool NAT_30: netmask 255.255.255.252
start 94.xxx.xx.52 end 94.xxx.xx.55
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 31] access-list 1331 pool NAT_31 refcount 8078
pool NAT_31: netmask 255.255.255.252
start 94.xxx.xx.56 end 94.xxx.xx.59
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 32] access-list 1332 pool NAT_32 refcount 10905
pool NAT_32: netmask 255.255.255.252
start 94.xxx.xx.60 end 94.xxx.xx.63
type generic, total addresses 4, allocated 1 (25%), misses 0
[Id: 33] access-list 1666 pool NAT_666 refcount 0
pool NAT_666: netmask 255.255.255.252
start 46.xxx.xxx.0 end 46.xxx.xxx.3
type generic, total addresses 4, allocated 0 (0%), misses 0

When you can, please try the following commands (and post their output):

Show sdm prefer

Show tcam count

Show platform hardware utilization 

bb01.netw.ro#show tcam counts
Used Free Percent Used Reserved
---- ---- ------------ --------
Labels:(in) 8 4064 0 24
Labels:(eg) 4 4068 0 24

ACL_TCAM
--------
Masks: 95 4001 2 72
Entries: 119 32649 0 576

QOS_TCAM
--------
Masks: 37 4059 0 25
Entries: 54 32714 0 200

LOU: 0 128 0
ANDOR: 0 16 0
ORAND: 0 16 0
ADJ: 3 2045 0

263,507 active NAT translations.

Each public IP gives roughly 65,000 ports, and I see multiple NAT pools with multiple IPs each, so NAT is not exhausted: about two dozen /30 pools is roughly 100 addresses, or on the order of 6 million possible translations, far more than the 263,507 in use.

Now, there is one statistic here that hints at what is happening:

CEF Punted packets

Punt/inject is the path used when packets have to be sent to or received from the CPU rather than being switched in hardware.

I need to dig deeper into how this process behaves on the 6500 series switches.
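If it comes to that, one way to see exactly what is being punted on a 6500 is the built-in netdr capture of RP-bound packets (assuming the IOS release on this box supports it; it captures a bounded number of packets, but use it carefully on a busy chassis):

debug netdr capture rx
show netdr captured-packets
undebug all

The source addresses, protocols and ports in the capture usually point straight at the traffic class responsible for the punts.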

"so the NAT is not exhaust."

Ah, but that could be part of the problem.  Too large a NAT table could overflow TCAM.  NAT would continue to work, but it would be done by the CPU.

Too small a NAT table would break some NAT operations.

Ideally you want the NAT table sized to contain all active flows but no dead flows.  Finding the ideal size can be difficult, so often you err in favor of a table larger than needed.

Is this the actual problem, or part of the problem?  Don't know, yet.  Just a possibility.
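One simple thing that might help decide is to sample the translation count and CPU together every few minutes (generic commands, nothing fancy):

show ip nat statistics | include Total active
show processes cpu | include CPU utilization

If CPU tracks the table size, the table (and the NetFlow/TCAM entries behind it) is a likely suspect; if CPU stays high while the table shrinks, the cause is probably elsewhere.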

Yes, I too noticed the CEF punts.  Indeed very concerning, for their CPU impact, as you also noted.

In my mind, I wouldn't normally think of CEF together with NAT, unless Cisco is just using "CEF" to mean the fast path.

To me, perhaps it means the traffic is not being handled in TCAM.  If so, why?  Obviously for some reason, which might include running out of TCAM, or something like the ACE logging you asked about in your other post.

Two things I keep in mind.  First, I recall the OP noted no config changes, implying the issue is due to a change in traffic type and/or volume.  Second, the OP reported temporary CPU relief after making a NAT config change.  If the CPU issue is due to traffic alone, that temporary relief might have been coincidental rather than caused by the config change.

Basically, I agree a deep dive into the causes of those NAT punts is needed, but as running out of TCAM causes so many issues, I believe checking its stats is warranted too.

Yes, I agree; you could ask him for:

Show platform hardware utilization 

That can show us whether the TCAM is large enough for NAT or not.

Already did.  ;  )

Was just doing some deep diving, and found:

ACL logging combined with NAT on a 6500 is very, very bad.

The 6500 ideally uses NetFlow for NAT.

6500 CEF punts might be normal for a NAT flow's first packet.

Interesting community posting:  https://community.cisco.com/t5/networking-knowledge-base/troubleshooting-high-cpu-on-a-6500/ta-p/3125784
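On the ACL logging point, a quick way to rule it out (a generic check) is:

show ip access-lists | include log
show logging | include IPACCESSLOG

Any output from either would mean matched packets are being punted to the CPU just to generate log messages.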

bb01.netw.ro#show platform hardware capacity
System Resources
PFC operating mode: PFC3BXL
Supervisor redundancy mode: administratively sso, operationally sso
Switching resources: Module Part number Series CEF mode
1 WS-X6724-SFP CEF720 CEF
4 WS-X6748-GE-TX CEF720 dCEF
5 WS-SUP720-3BXL supervisor CEF
6 WS-SUP720-3BXL supervisor CEF
7 WS-X6724-SFP CEF720 CEF
8 WS-X6704-10GE CEF720 dCEF

Power Resources
Power supply redundancy mode: administratively redundant
operationally non-redundant (single power supply)
System power: 2331W, 0W (0%) inline, 1893W (81%) total allocated
Powered devices: 0 total, 0 Class3, 0 Class2, 0 Class1, 0 Class0, 0 Cisco

Flash/NVRAM Resources
Usage: Module Device Bytes: Total Used %Used
1 dfc#1-bootflash: 15990784 0 0%
4 dfc#4-bootflash: 15990784 0 0%
5 RP bootflash: 65536000 2511524 4%
5 SP disk0: 512073728 262144000 51%
5 SP disk1: 512065536 0 0%
5 SP sup-bootflash: 65536000 2886136 4%
5 SP const_nvram: 129004 1576 1%
5 SP hidden-nvram: 1964024 28190 1%
5 RP nvram: 1964024 28190 1%
6 slavenvram: 1964024 28190 1%
6 slaveconst_nvram: 129004 1576 1%
6 slavedisk0: 512073728 459366400 90%
6 slavesup-bootdisk: 512024576 437231616 85%
6 slavebootflash: 65536000 11393628 17%
7 dfc#7-bootflash: 15990784 0 0%
8 dfc#8-bootflash: 15990784 0 0%

CPU Resources
CPU utilization: Module 5 seconds 1 minute 5 minutes
1 1% / 0% 2% 2%
4 4% / 1% 5% 5%
5 RP 55% / 28% 78% 93%
5 SP 23% / 0% 25% 24%
6 RP 0% / 0% 1% 1%
6 SP 8% / 0% 5% 5%
7 0% / 0% 1% 2%
8 10% / 3% 13% 13%
Processor memory: Module Bytes: Total Used %Used
1 198918912 38938664 20%
4 1004225152 177515496 18%
5 RP 890709616 693337092 78%
5 SP 824701356 142678992 17%
6 RP 890677392 142542572 16%
6 SP 824728412 136759448 17%
7 198918912 38944876 20%
8 1004225152 177561400 18%
I/O memory: Module Bytes: Total Used %Used
5 RP 67108864 21605604 32%
5 SP 67108864 20884952 31%
6 RP 67108864 21605604 32%
6 SP 67108864 19082712 28%

EOBC Resources
Module Packets/sec Total packets Dropped packets
1 Rx: 29 471719780 0
Tx: 23 37763786 3
4 Rx: 50 471719585 0
Tx: 44 67134593 3
5 RP Rx: 144 207470126 42
Tx: 134 185384009 0
5 SP Rx: 49 80305695 60
Tx: 59 91688006 0
6 RP Rx: 0 3178151 10
Tx: 0 3249122 0
6 SP Rx: 6 6407304 1
Tx: 6 6444575 0
7 Rx: 24 471716471 0
Tx: 18 31149491 4
8 Rx: 27 471714957 0
Tx: 30 48835199 3

VLAN Resources
VLANs: 4094 total, 22 VTP, 53 extended, 18 internal, 4001 free

L2 Forwarding Resources
MAC Table usage: Module Collisions Total Used %Used
4 0 65536 511 1%
5 0 65536 2637 4%
6 0 65536 2636 4%
8 0 65536 702 1%

VPN CAM usage: Total Used %Used
512 0 0%
L3 Forwarding Resources
FIB TCAM usage: Total Used %Used
72 bits (IPv4, MPLS, EoM) 1032192 21733 2%
144 bits (IP mcast, IPv6) 8192 8 1%

detail: Protocol Used %Used
IPv4 21733 2%
MPLS 0 0%
EoM 0 0%

IPv6 1 1%
IPv4 mcast 4 1%
IPv6 mcast 3 1%

Adjacency usage: Total Used %Used
1048576 262479 25%

Forwarding engine load:
Module pps peak-pps peak-time
4 14154 277585 21:26:29 EETDST Sat Apr 29 2023
5 381276 766937 18:14:44 EETDST Thu Apr 20 2023
6 390231 535099 21:07:17 EETDST Sun Apr 30 2023
8 605213 1318133 18:24:25 EETDST Tue Apr 18 2023

Netflow Resources
TCAM utilization: Module Created Failed %Used
4 52325 0 19%
5 51366 0 19%
6 54442 0 20%
8 72454 0 27%
ICAM utilization: Module Created Failed %Used
4 7 0 5%
5 6 0 4%
6 6 0 4%
8 7 0 5%

Flowmasks: Mask# Type Features
IPv4: 0 reserved none
IPv4: 1 Intf Ful NAT_INGRESS NAT_EGRESS FM_GUARDIAN
IPv4: 2 unused none
IPv4: 3 reserved none

IPv6: 0 reserved none
IPv6: 1 unused none
IPv6: 2 unused none
IPv6: 3 reserved none

CPU Rate Limiters Resources
Rate limiters: Total Used Reserved %Used
Layer 3 9 5 1 56%
Layer 2 5 3 3 60%

ACL/QoS TCAM Resources
Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
QoSent - QoS TCAM entries, QOSmsk - QoS TCAM masks, OR - ORAND,
Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
LOUdst - LOU destination, ADJ - ACL adjacency

Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst AND OR ADJ
4 1% 2% 1% 1% 1% 1% 0% 0% 0% 0% 1%
5 1% 2% 1% 1% 1% 1% 0% 0% 0% 0% 1%
6 1% 2% 1% 1% 1% 1% 0% 0% 0% 0% 1%
8 1% 2% 1% 1% 1% 1% 0% 0% 0% 0% 1%

L3 Multicast Resources
IPv4 replication mode: egress
IPv6 replication mode: egress
Bi-directional PIM Designated Forwarder Table usage: 4 total, 0 (0%) used
Replication capability: Module IPv4 IPv6
1 egress egress
4 egress egress
5 egress egress
6 egress egress
7 egress egress
8 egress egress
MET table Entries: Module Total Used %Used
4 65516 6 1%
5 65516 6 1%
6 65516 0 0%
8 65516 6 1%

QoS Policer Resources
Aggregate policers: Module Total Used %Used
4 1024 3 1%
5 1024 3 1%
8 1024 3 1%
Microflow policer configurations: Module Total Used %Used
4 64 1 1%
5 64 1 1%
8 64 1 1%

Switch Fabric Resources
Bus utilization: current: 3%, peak was 6% at 22:11:40 EETDST Sun Apr 30 2023
Fabric utilization: Ingress Egress
Module Chanl Speed rate peak rate peak
1 0 20G 1% 7% @16:12 19Apr23 11% 20% @20:56 18Apr23
4 0 20G 0% 1% @22:11 30Apr23 0% 4% @18:46 27Apr23
4 1 20G 0% 1% @22:16 30Apr23 17% 28% @22:24 29Apr23
5 0 20G 12% 23% @15:31 17Apr23 14% 23% @22:24 29Apr23
6 0 20G 0% 0% 0% 1% @22:15 30Apr23
7 0 20G 3% 8% @12:06 25Apr23 2% 10% @18:43 17Apr23
8 0 20G 13% 25% @22:12 21Apr23 13% 23% @22:12 21Apr23
8 1 20G 13% 32% @18:43 17Apr23 1% 7% @15:40 29Apr23
Switching mode: Module Switching mode
1 acef
4 dcef
5 dcef
6 crossbar
7 acef
8 dcef

Interface Resources
Interface drops:
Module Total drops: Tx Rx Highest drop port: Tx Rx
1 1460171 43342 14 13
4 3889465860 21 3 41
5 0 261 0 2
7 905306 143372 3 24
8 11732387 139410 4 1

Interface buffer sizes:
Module Bytes: Tx buffer Rx buffer
1 (asic-1) 1221120 173504
4 (asic-1) 1221120 174016
7 (asic-1) 1221120 173504
8 (asic-1) 14622592 2068416
IBC Resources
Module Packets/sec Total packets Dropped packets
5 RP Rx: 2470 2457745229 0
Tx: 21031 15128120129 0
5 SP Rx: 9 9327824 0
Tx: 170 160256660 0
6 RP Rx: 0 70308 0
Tx: 0 70308 0
6 SP Rx: 4 3988244 0
Tx: 0 70455 0

SPAN Resources
Source sessions: 16 maximum, 1 used
Type Max Used
Local 2(*) 1
Local-tx 14 0
RSPAN source 2(*) 0
ERSPAN source 2(*) 0

Capture 1(*) 0
Service module 1(*) 0
OAM loopback 1(*) 0
* - shared source sessions and the total can not exceed 2
Destination sessions: 64 maximum, 0 used
Type Max Used
RSPAN destination 64(*) 0
ERSPAN destination 23(*) 0
* - shared destination sessions and the total can not exceed 64

Multicast LTL Resources
Usage: 30656 Total, 1386 Used

 

 

I see your sup720s are XLs; are your two line cards with DFCs also XLs?
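For reference, the DFC type is listed in the sub-module section of a plain:

show module

Look for a WS-F6700-DFC3BXL daughter card against the DFC-equipped slots; a non-XL DFC would force the PFC into a lower common operating mode, though the "PFC operating mode: PFC3BXL" line above suggests that hasn't happened here.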

CPU Resources
CPU utilization: Module 5 seconds 1 minute 5 minutes
1 1% / 0% 2% 2%
4 4% / 1% 5% 5%
5 RP 55% / 28% 78% 93%
5 SP 23% / 0% 25% 24%
6 RP 0% / 0% 1% 1%
6 SP 8% / 0% 5% 5%
7 0% / 0% 1% 2%
8 10% / 3% 13% 13%
Processor memory: Module Bytes: Total Used %Used
1 198918912 38938664 20%
4 1004225152 177515496 18%
5 RP 890709616 693337092 78%
5 SP 824701356 142678992 17%
6 RP 890677392 142542572 16%
6 SP 824728412 136759448 17%
7 198918912 38944876 20%
8 1004225152 177561400 18%
I/O memory: Module Bytes: Total Used %Used
5 RP 67108864 21605604 32%
5 SP 67108864 20884952 31%
6 RP 67108864 21605604 32%
6 SP 67108864 19082712 28%

In the above, we continue to see high RP CPU on sup in slot 5 - no surprise there - as we're still trying to find the cause.

Also in the above, memory utilization seems somewhat high on your sup.  As CPU drops to 28% during the last 5 seconds, I'm unsure what time period the memory % is based on.  It might be interesting to see whether your syslog traceback NAT memory errors also happen when CPU utilization is high.  (Understand, the memory errors in your other topic were due to fragmentation, not just lack of free memory; i.e. even though 22% of memory is free, due to fragmentation it's not usable.)
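To quantify that fragmentation when the errors occur, something like the following is useful (generic IOS commands; exact output format varies by release):

show memory statistics
show memory allocating-process totals

If the "Largest" free block is much smaller than "Free(b)", allocations can still fail even though plenty of memory is nominally free.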

Looking at the TCAM usage stats, they appear okay.

Possibly when you reset the NAT settings, NAT memory usage was reduced, which reduced the memory failures until NAT memory usage grew again.

I.e. there might still be some benefit in reducing some of the NAT timers.

That aside, @MHM Cisco World's investigation of other possible causes of high CPU usage is still worthwhile.

@andresdavid, are you using the log keyword on the ACLs used for NAT?

Sorry, no, we don't use log on the NAT ACLs.

First, let's troubleshoot the CEF punts.

Are you using log on the ACLs for NAT??

Joseph W. Doherty
Hall of Fame

You might consider trying:

ip nat translation timeout 600

ip nat translation tcp-timeout 3600

ip nat translation dns-timeout 60
ip nat translation finrst-timeout 60
ip nat translation icmp-timeout 60
ip nat translation syn-timeout 60

ip nat translation udp-timeout 300
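If you try these, it might also be worth grabbing a before/after snapshot so we can tell whether any relief tracks the translation table actually shrinking:

show ip nat statistics | include Total active
show processes cpu sorted | exclude 0.00

(Generic commands; "sorted" may not be available on every 12.2SX image, in which case the plain show processes cpu you used above works just as well.)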

I put in these values, and the CPU dropped from 80%. Now it's at 38% and it keeps dropping. Let's see how long it lasts, and whether it stays like that or goes back up. :))
