10-08-2014 12:31 PM - edited 03-07-2019 09:02 PM
We are seeing high CPU on at least 2 of our 3850x stacks. I spoke to TAC, who noted:
CSCuo14511 fed and stack-mgr causing High CPU on 3850
They suggest upgrading to 03.03.04SE.
Has anyone seen this and solved it without an upgrade?
Thanks,
Tom
ST3-Stack1-3850#sho proc cpu sort | e 0.00
Core 0: CPU utilization for five seconds: 94%; one minute: 92%; five minutes: 90%
Core 1: CPU utilization for five seconds: 94%; one minute: 93%; five minutes: 94%
Core 2: CPU utilization for five seconds: 98%; one minute: 96%; five minutes: 93%
Core 3: CPU utilization for five seconds: 80%; one minute: 83%; five minutes: 89%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5719 3399767 25647721 923 49.61 49.34 49.07 0 stack-mgr
5717 2287217 89195731 434 27.47 27.21 26.97 0 fed
10243 733637 16770188 371 13.71 13.54 13.70 34816 iosd
6250 40832 82186167 665 0.78 0.60 0.59 0 pdsd
10239 1299808 36806083 15 0.24 0.12 0.11 0 wcm
6261 692231 40238409 871 0.10 0.06 0.05 0 cpumemd
19 974400 53966893 33 0.05 0.03 0.04 0 sirq-net-rx/1
43 2452848 54120184 28 0.05 0.04 0.05 0 sirq-net-rx/3
5718 115760 32235590 20 0.05 0.09 0.10 0 platform_mgr
10240 1899130 59100638 3 0.05 0.04 0.02 0 table_mgr
ST3-Stack1-3850#sho ver
Cisco IOS Software, IOS-XE Software, Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 03.03.01SE RELEASE SOFTWARE (fc1)
Switch Ports Model SW Version SW Image Mode
------ ----- ----- ---------- ---------- ----
* 1 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
2 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
3 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
4 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
5 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
6 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
7 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
8 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
9 56 WS-C3850-48P 03.03.01SE cat3k_caa-universalk9 INSTALL
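For anyone who wants to track this over time instead of eyeballing it, here is a minimal Python sketch that parses the `show proc cpu sort` output pasted above. It assumes the output has already been captured as text (how you collect it is up to you), and the function names `parse_cores` and `parse_processes` are made up for illustration. It also assumes process names are a single word, which holds for the rows above.

```python
import re

# Matches the per-core summary lines printed by "show proc cpu sort".
CORE_RE = re.compile(
    r"Core (\d+): CPU utilization for five seconds: (\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cores(output):
    """Return {core_id: (five_sec, one_min, five_min)} as integers."""
    return {
        int(m.group(1)): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in CORE_RE.finditer(output)
    }


def parse_processes(output):
    """Return [(process_name, five_sec_percent)] for each process row.

    Expects 9 columns: PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process.
    Assumption: the process name is a single token.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 9 and parts[0].isdigit() and "." in parts[4]:
            rows.append((parts[8], float(parts[4])))
    return rows


# Sample copied from the output in this post.
SAMPLE = """\
Core 0: CPU utilization for five seconds: 94%; one minute: 92%; five minutes: 90%
Core 1: CPU utilization for five seconds: 94%; one minute: 93%; five minutes: 94%
Core 2: CPU utilization for five seconds: 98%; one minute: 96%; five minutes: 93%
Core 3: CPU utilization for five seconds: 80%; one minute: 83%; five minutes: 89%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5719 3399767 25647721 923 49.61 49.34 49.07 0 stack-mgr
5717 2287217 89195731 434 27.47 27.21 26.97 0 fed
10243 733637 16770188 371 13.71 13.54 13.70 34816 iosd
"""

cores = parse_cores(SAMPLE)
processes = parse_processes(SAMPLE)
busiest = max(processes, key=lambda row: row[1])
hot_cores = sorted(c for c, (five_sec, _, _) in cores.items() if five_sec >= 90)

print("Busiest process:", busiest)
print("Cores at or above 90% (five seconds):", hot_cores)
```

The 90% cut-off is an arbitrary choice for the example, not a Cisco recommendation.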
08-09-2016 03:52 AM
Hi
Well, our experience is that whatever IOS-XE version is running, after 2-3 months the CPU goes up to 31% again.
When it starts hitting 85%, we reload the stacks in service windows.
That's our fix...
08-09-2016 04:16 AM
Hi Ton,
This is not a proper solution. Have you checked with Cisco TAC?
08-09-2016 04:29 AM
Oh yeah.
They claimed it was a station generating excessive broadcast traffic.
So I brought down all ports, consoled into the stack, and ran the debug commands again.
They said it was still a station generating excessive broadcast traffic (with all ports down, you know?).
I really gave up on this platform. Too many issues in the last 2 years.
We've even found ISL issues in the logging, while dot1Q is now the default...
09-01-2016 09:05 AM
Hey,
I have the same problem. Any news? I tried the "ipv6 mld snooping" but that didn't work.
Sw#show process cpu detail process fed sorted | ex 0.0
Core 0: CPU utilization for five seconds: 99%; one minute: 98%; five minutes: 97%
Core 1: CPU utilization for five seconds: 99%; one minute: 95%; five minutes: 96%
Core 2: CPU utilization for five seconds: 99%; one minute: 89%; five minutes: 95%
Core 3: CPU utilization for five seconds: 88%; one minute: 82%; five minutes: 93%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
5712 L 1858252 4132141 497 26.55 26.90 26.57 1088 fed
5712 L 3 12349 3193139 1511072 0 24.21 24.66 24.62 0 PunjectRx
5712 L 0 6152 2835105 6395961 0 0.59 0.85 0.75 0 fed-ots-main
5712 L 0 12350 2855645 3551507 0 0.29 0.25 0.22 0 PunjectTx
5712 L 0 10690 386394 2256693 0 0.24 0.13 0.10 0 Xcvr
5712 L 0 6147 238130 6694802 0 0.20 0.21 0.17 1088 CMI default xdm
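To see which fed thread is doing the work, the detail output above can be broken down with a short Python sketch. This is only an illustration: the function name `parse_fed_rows` is hypothetical, and the column handling is inferred from the rows pasted above (the process summary row has no C and TID columns, the thread rows do, and a name such as "CMI default xdm" can contain spaces).

```python
def parse_fed_rows(output):
    """Parse rows of 'show process cpu detail process fed sorted'.

    Returns a list of dicts with the row name, its 5Sec percentage, and
    whether the row is a thread (True) or the process summary (False).
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit() or parts[1] != "L":
            continue  # skip core summaries and header lines
        # The first token containing a decimal point is the 5Sec column.
        idx = next((i for i, p in enumerate(parts) if "." in p), None)
        if idx is None or idx + 4 >= len(parts):
            continue
        rows.append({
            # 5Sec, 1Min, 5Min, TTY come first; the rest is the name.
            "name": " ".join(parts[idx + 4:]),
            "five_sec": float(parts[idx]),
            # Thread rows have PID T C TID Runtime Invoked uSecs before 5Sec.
            "is_thread": idx == 7,
        })
    return rows


# Sample copied from the output in this post.
SAMPLE = """\
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5712 L 1858252 4132141 497 26.55 26.90 26.57 1088 fed
5712 L 3 12349 3193139 1511072 0 24.21 24.66 24.62 0 PunjectRx
5712 L 0 6152 2835105 6395961 0 0.59 0.85 0.75 0 fed-ots-main
5712 L 0 12350 2855645 3551507 0 0.29 0.25 0.22 0 PunjectTx
5712 L 0 10690 386394 2256693 0 0.24 0.13 0.10 0 Xcvr
5712 L 0 6147 238130 6694802 0 0.20 0.21 0.17 1088 CMI default xdm
"""

rows = parse_fed_rows(SAMPLE)
threads = [r for r in rows if r["is_thread"]]
top_thread = max(threads, key=lambda r: r["five_sec"])
print("Top fed thread:", top_thread["name"], top_thread["five_sec"])
```

On the sample above the top thread is PunjectRx, which accounts for nearly all of the fed process figure.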
11-22-2016 03:22 AM
Hi,
I have the same problem as everyone else.
I wonder if someone has a proper solution at the moment?
We are using 03.03.05SE. Any idea whether upgrading to 3.06.05, as suggested, will resolve the problem?
11-22-2016 06:16 AM
Hello,
I have been running 3.06.05 since August and haven't seen the issue since. Running LAN Base, if that helps.
09-27-2016 03:18 AM
Cisco TAC have advised that this is a cosmetic bug which will be fixed in the yet-to-be-released versions 3.6.6E and 3.7.5E. For now, a reload of the stack is a temporary fix, as the CPU will creep up again after a few months.
The cosmetic bug ID is CSCuz57493 - High CPU observed in punjectrx fed-ots-main thread. This will be modified soon to include stack-mgr - replenish OOB/OOBnd RX.
Thanks
04-08-2016 09:15 AM
Since upgrading all my 3850s to 03.06.03E, I have an uptime of 8 weeks, 6 days, 9 hours, 9 minutes, and my CPU looks like this now.
This stack has mild usage on it, but was pegged at 98-99% previously:
Core 0: CPU utilization for five seconds: 3%; one minute: 5%; five minutes: 5%
Core 1: CPU utilization for five seconds: 1%; one minute: 0%; five minutes: 0%
Core 2: CPU utilization for five seconds: 0%; one minute: 1%; five minutes: 1%
Core 3: CPU utilization for five seconds: 1%; one minute: 2%; five minutes: 1%
This is a stack with heavier usage on it, which was also running at 98-99% before:
Core 0: CPU utilization for five seconds: 26%; one minute: 18%; five minutes: 17%
Core 1: CPU utilization for five seconds: 14%; one minute: 11%; five minutes: 11%
Core 2: CPU utilization for five seconds: 8%; one minute: 8%; five minutes: 7%
Core 3: CPU utilization for five seconds: 9%; one minute: 6%; five minutes: 6%
09-11-2017 01:08 AM
Hi Guys,
I am having a similar issue on one of my switches. The version is below:
03.07.03E
I did a reset, which fixed the problem, but it is building up again. Any help, please?
11-20-2017 12:18 AM
Hi!
I have the same issue with the same version. Did you find a solution?
Best regards.
11-29-2017 09:10 PM - edited 11-29-2017 09:12 PM
If anyone faces this issue with this version (3.7.3.E), it is a cosmetic bug, as it has no impact on performance.
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuz57493/?reffering_site=dumpcr
The workaround is to reload the stack, but the final solution is to upgrade to 3.7.5.E.
Best regards.
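If you have many stacks on mixed releases, a quick check against the fixed versions mentioned in this thread (3.6.6E and 3.7.5E, as relayed from TAC and not verified here) could look like the Python sketch below. The names `parse_version` and `has_reported_fix` are hypothetical, and the sketch deliberately returns `None` for any train the thread does not mention rather than guessing.

```python
import re

# Fixed releases per train, as reported from TAC in this thread.
# Assumption: nothing is known here about other trains.
FIXED_IN = {
    (3, 6): (3, 6, 6),
    (3, 7): (3, 7, 5),
}


def parse_version(text):
    """Turn strings like '03.07.03E' or '3.7.5.E' into (3, 7, 3) / (3, 7, 5)."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in m.groups())


def has_reported_fix(text):
    """True/False for trains named in the thread, None when the train is unknown."""
    version = parse_version(text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None
    return version >= fixed


for v in ("03.03.01SE", "03.07.03E", "3.7.5.E"):
    print(v, "->", has_reported_fix(v))
```

A `None` result just means this thread gives no fixed release for that train; check the bug entry for the authoritative list.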
02-02-2016 01:17 PM
I am working with TAC, and they are suggesting upgrading to 3.07.02.E. I have upgraded (2) 3850s. So far, after 2 weeks, they are running well. I will wait at least a month to verify, however, since I have seen this same situation before, where the issue does not recur for several weeks.
-The CPU broadcast queue is congested, and ARP is around 60% of the capture.
-ARP traffic still hits the CPU regardless of whether the SVI is still configured, because this platform by design allows 200 pps (packets per second) of this kind of traffic.
-There are a few duplicate ARP packets due to software bug CSCur30273, “3850 duplicates pass-through ARP packets”.
10-22-2014 12:30 PM
Bruno... No new news on this.
Aninda... do you have any interest in resolving this?
We downloaded 3.6 but got errors during the upgrade. We downloaded it again, with the same results. It did boot up... but I am not even going to try it until I get a clean upgrade.
All my 3850s are stacks... some 9 deep. Still at 90%+.
Thanks,
Tom
10-22-2014 07:16 PM
Hey Tom,
It is going to be very difficult to troubleshoot this issue over a medium like this. I'd suggest that everyone here who is facing this problem open a TAC case (or another one, in Tom's case) for live troubleshooting.
As I had stated earlier, there may be genuine, underlying network issues that can cause stack-mgr to stay up. If that is not the case, and you're not seeing any impact from your cores running very high, then you could be hitting an internal defect where the show process cpu output is misreporting the CPU values.
You can post your SRs here and I can take a look at them as well.
Additionally, may I know what errors you encountered while trying to upgrade to 3.6.0?
Regards,
Aninda
10-23-2014 11:06 AM
Aninda,
We got the error during the copy from the USB port to flash. The copy did finish. We can try again if you need the exact error. I believe it was a checksum error.
Tom