05-22-2007 09:52 AM - edited 03-05-2019 04:13 PM
I have a Catalyst 6509 with a SUP720 running a modular IOS. The IOS filename is s72033-adventerprisek9_wan-vz.122-18.SXF6.bin. I have noticed that the CPU utilization on this switch has increased steadily since it was last rebooted. On all of our other switches and routers the CPU is higher during the day and lower during the evening, but on this 6509 the CPU only climbs. It never decreases at all. The climb is slow: utilization starts at about 10% after a reboot, and within 2 months it is near 40%.
I have another 6509 that was just deployed and is also running a modular IOS (but a different version), and I am experiencing the exact same thing. On other 6509s that are running the standard (non-modular) IOS we do not see this.
Does anyone know if there are any known issues like this? I tried searching the bug lists, but I didn't see any obvious bugs.
Thanks,
-Steve
05-22-2007 10:12 AM
Please post the output of show proc cpu and let's see what is using up your resources.
Thanks
Steve
05-22-2007 10:20 AM
Here is the output from the 'show proc cpu' command.
CPU utilization for five seconds: 30%; one minute: 33%; five minutes: 34%
PID 5Sec 1Min 5Min Process
1 0.3% 15.0% 16.0% kernel
3 0.0% 0.0% 0.0% qdelogger
4 0.0% 0.0% 0.0% devc-pty
5 0.0% 0.0% 0.0% devc-mistral.proc
6 0.0% 0.0% 0.0% pipe
7 0.0% 0.0% 0.0% dumper.proc
4104 0.0% 0.0% 0.0% pcmcia_driver.proc
4105 0.0% 0.0% 0.0% bflash_driver.proc
20490 0.0% 0.0% 0.0% mqueue
20491 0.0% 0.0% 0.0% flashfs_hes.proc
20492 0.0% 0.0% 0.0% dfs_bootdisk.proc
20493 0.0% 0.0% 0.0% ldcache.proc
20494 0.0% 0.0% 0.0% watchdog.proc
20495 0.0% 0.0% 0.0% syslogd.proc
20496 0.0% 0.0% 0.0% name_svr.proc
20497 0.0% 0.0% 0.0% wdsysmon.proc
20498 0.0% 0.0% 0.0% sysmgr.proc
24578 0.0% 0.0% 0.0% chkptd.proc
24595 0.0% 0.0% 0.0% sysmgr.proc
24596 0.0% 0.0% 0.0% syslog_dev.proc
24597 0.0% 0.0% 0.0% itrace_exec.proc
PID 5Sec 1Min 5Min Process
24598 0.0% 0.0% 0.0% packet.proc
24599 0.0% 0.0% 0.0% installer.proc
24600 25.9% 16.6% 16.7% ios-base
24601 0.0% 0.0% 0.0% fh_fd_oir.proc
24602 0.0% 0.0% 0.0% fh_metric_dir.proc
24603 0.0% 0.0% 0.0% fh_fd_snmp.proc
24604 0.0% 0.0% 0.0% fh_fd_none.proc
24605 0.0% 0.0% 0.0% fh_fd_intf.proc
24606 0.0% 0.0% 0.0% fh_fd_gold.proc
24607 0.0% 0.0% 0.0% fh_fd_timer.proc
24608 0.0% 0.0% 0.0% fh_fd_ioswd.proc
24609 0.0% 0.0% 0.0% fh_fd_counter.proc
24610 0.0% 0.0% 0.0% fh_fd_rf.proc
24611 0.0% 0.0% 0.0% fh_fd_cli.proc
24612 0.0% 0.0% 0.0% fh_server.proc
24613 0.0% 0.0% 0.0% fh_policy_dir.proc
24614 2.8% 0.3% 0.2% tcp.proc
24615 0.0% 0.0% 0.0% ipfs_daemon.proc
24616 0.4% 0.2% 0.2% raw_ip.proc
24617 0.0% 0.0% 0.0% inetd.proc
24618 0.0% 0.1% 0.2% udp.proc
24619 0.0% 0.1% 0.1% iprouting.iosproc
24620 0.2% 0.1% 0.1% cdp2.iosproc
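With this many rows, it can help to sort the output by one of the averages. Below is a minimal Python sketch, assuming the output has been saved as text in the PID / 5Sec / 1Min / 5Min / Process layout pasted above. The names parse_proc_cpu and top_by are made up for this illustration and are not part of any Cisco tool.

```python
import re

# One data row of the modular-IOS "show proc cpu" layout shown above:
# PID, three percentages, process name.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\S+)\s*$")


def parse_proc_cpu(text):
    """Return one dict per process row; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pid, five_sec, one_min, five_min, name = match.groups()
            rows.append({
                "pid": int(pid),
                "5sec": float(five_sec),
                "1min": float(one_min),
                "5min": float(five_min),
                "process": name,
            })
    return rows


def top_by(rows, column="5min", n=2):
    """Return the n rows with the highest value in the given column."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]


# A few rows copied from the output above.
sample = """\
PID 5Sec 1Min 5Min Process
1 0.3% 15.0% 16.0% kernel
24600 25.9% 16.6% 16.7% ios-base
24614 2.8% 0.3% 0.2% tcp.proc
24619 0.0% 0.1% 0.1% iprouting.iosproc
"""

for row in top_by(parse_proc_cpu(sample)):
    print(f'{row["process"]}: {row["5min"]}% (5 min)')
```

On the rows above this puts ios-base and kernel at the top of the five-minute column, which together account for nearly all of the 34% five-minute figure.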
05-22-2007 10:50 AM
What about your logs? Is there anything in them that indicates a problem?
I wonder if you are having issues because packets are getting punted to the CPU instead of being hardware switched by CEF.
Please include
show logging
show ip arp sum
show processes cpu | exclude 0.00
show mls statistics
05-22-2007 11:01 AM
There is nothing in the logs to indicate any sort of a problem. Like I said, the CPU ramps up over several months. After a reboot, the cycle starts again. I was also wondering about packets being punted to the CPU, but everything appears to be running CEF. There are no ACLs or anything either.
Here are the outputs you requested. I omitted the 'show log' output, as it has nothing useful in it, but is quite long.
6509#show ip arp sum
1222 IP ARP entries, with 16 of them incomplete
6509#show processes cpu | exclude 0.0
CPU utilization for five seconds: 19%; one minute: 42%; five minutes: 43%
PID 5Sec 1Min 5Min Process
1 0.1% 11.9% 15.6% kernel
24600 17.1% 26.8% 24.7% ios-base
24614 1.7% 1.8% 1.4% tcp.proc
24616 0.2% 0.3% 0.3% raw_ip.proc
24619 0.1% 0.2% 0.1% iprouting.iosproc
6509#show mls statistics
Statistics for Earl in Module 5
L2 Forwarding Engine
Total packets Switched : 48640339585
L3 Forwarding Engine
Total packets L3 Switched : 48594000495 @ 15575 pps
Total Packets Bridged : 26677053072
Total Packets FIB Switched : 20778492539
Total Packets ACL Routed : 0
Total Packets Netflow Switched : 0
Total Mcast Packets Switched/Routed : 127984744
Total ip packets with TOS changed : 2
Total ip packets with COS changed : 2
Total non ip packets COS changed : 0
Total packets dropped by ACL : 0
Total packets dropped by Policing : 0
Total packets exceeding CIR : 0
Total packets exceeding PIR : 0
Errors
MAC/IP length inconsistencies : 13
Short IP packets received : 0
IP header checksum errors : 0
Total packets L3 Switched by all Modules: 48594000495 @ 15575 pps
05-22-2007 11:13 AM
Can we try clearing the ARP table, and let's see if the issue is with the MAC/IP length inconsistencies?
05-22-2007 11:25 AM
I cannot clear the ARP table at this time. This switch is in production.
I don't believe the 13 errors are the cause of this issue though. There have only been 13 of them, and the switch has been up almost 8 weeks. We have other switches with almost 40 inconsistencies, and they have no problems.
-Steve
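To put the 13 inconsistencies in proportion, here is a short Python sketch of the arithmetic. It uses the "Total packets Switched" counter from the show mls statistics output above and treats "almost 8 weeks" as exactly 8 weeks, which is an assumption. It also assumes the counters have not been cleared since the reboot.

```python
# Figures taken from the posts above; uptime is rounded to 8 weeks (assumption).
length_errors = 13
total_packets_switched = 48_640_339_585
uptime_weeks = 8

errors_per_week = length_errors / uptime_weeks
error_fraction = length_errors / total_packets_switched

print(f"about {errors_per_week:.2f} errors per week")
print(f"about {error_fraction:.1e} of all switched packets")
```

That is under two errors per week and well below one in a billion packets, which fits the view that these errors are unlikely to explain a CPU climb of tens of percentage points.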
05-22-2007 11:35 AM
http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml
The release notes link below has a lot of information on high CPU for your code version in different environments.
http://www.cisco.com/univercd/cc/td/doc/product/lan/cat6000/122sx/ol_4164.htm#wp3144175
Steve
05-22-2007 12:21 PM
I am probably going to revert to a non-modular IOS. I need to schedule downtime to do this. I was just hoping someone had some input or had seen this before.
Thanks.
-Steve
05-25-2007 06:05 AM
I haven't seen anything in this thread asking, but do you by chance run BGP on this box? We had this issue with 7600's (same architecture) when using SUP720-B.
What version of SUP720 are you using and are you taking the full BGP table?
05-25-2007 08:08 AM
I just want to share this: also check the IOS version of the switch. We had an incident where both of our 4500 core switches crashed at the same time because they had been turned on at the same time. In that case, redundancy was totally useless. It was because of a memory leak.
10-31-2007 04:51 PM
Did you ever resolve this?
I'm also seeing this exact problem with my 6509 running 12.2(33)SXH modular. Slow creep over time.
The show processes cpu output seems to indicate this is in the kernel process. If I get the details, the high CPU use is from TID 14 of PID 1:
1 14 10 Running 0 (128K) 5h26m procnto-cisco
I can't find any other information past this.
10-31-2007 06:22 PM
The only resolution I found was to move to a non-modular IOS image. I had to change 2 switches and haven't had a problem since.
After I had this problem I had a Cisco engineer on site about 2 weeks later and he said not to use modular in production systems until it evolves some more.
Hope this helps!
-Steve
11-01-2007 08:31 AM
Hi,
We have had exactly the same issues with Modular IOS on the Sup720.
CPU was high on the ios-base process. We also suffered from excessively high CPU when issuing show tech-support, or even show run.
We are currently downgrading all sups to non-modular IOS on the latest safe harbour release, 12.2(18)SXF8.
This seems to have solved all our issues.
Stay away from the modular IOS!
11-01-2007 12:10 PM
I've got a TAC case open now on this so I'll let you know what happens. I've got a graph of CPU over time, and there is a nice stair-step progression of the CPU climbing. On Oct 27th I was at my normal baseline of 8%; now I'm at almost 30%. It looks like it jumps up every 5.5 to 6 hours.
Worst case, I'm thinking I'll just force a fail-over to the redundant Sup, and see if that at least helps while Cisco sorts it out.
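The stair-step figures in this post allow a rough estimate of how much each jump adds. The Python sketch below assumes about 5 days (120 hours) elapsed between Oct 27th and this post and treats "almost 30%" as 30%. Both are approximations, and step_estimate is a name made up for this illustration.

```python
def step_estimate(start_pct, end_pct, elapsed_hours, step_interval_hours):
    """Return (number of steps, percentage points added per step)."""
    steps = elapsed_hours / step_interval_hours
    return steps, (end_pct - start_pct) / steps


elapsed_hours = 5 * 24  # assumed: roughly Oct 27th to Nov 1st

# Try both ends of the reported 5.5 to 6 hour interval.
for interval in (5.5, 6.0):
    steps, size = step_estimate(8.0, 30.0, elapsed_hours, interval)
    print(f"every {interval} h: about {steps:.0f} steps, "
          f"about {size:.2f} points per step")
```

Under these assumptions each jump is worth roughly one percentage point of CPU, so a climb of 22 points corresponds to about 20 to 22 jumps.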