Re: Cisco 3850 High CPU

johnlloyd_13 · ‎04-12-2019

hi,

we have an issue on a secondary core C3850 (3.6.4), which is consistently averaging 50%+ CPU.

the primary core, another C3850 on 3.6.4, averages only 2% CPU.

not sure what could be the culprit. any ideas?

C3850#show processes cpu sort | exclude 0.0
Core 0: CPU utilization for five seconds: 99%; one minute: 95%; five minutes: 93%
Core 1: CPU utilization for five seconds: 82%; one minute: 87%; five minutes: 91%
Core 2: CPU utilization for five seconds: 6%; one minute: 19%; five minutes: 24%
Core 3: CPU utilization for five seconds: 18%; one minute: 22%; five minutes: 26%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5715 4011600 27402184 4878 25.73 26.07 26.42 1088 fed
6258 3292482 92102444 4593 23.34 23.44 23.39 0 pdsd
8581 3259771 25430981 187 2.39 5.04 6.86 0 iosd
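
As a rough sanity check on the "50%+" figure, the per-core lines above can be averaged. This is only a sketch in Python, not a Cisco tool: the regex and the helper names `per_core` and `average` are my own, and the input is just the four core lines pasted from the output above. It also makes visible that the load is not spread evenly: cores 0 and 1 are pegged while cores 2 and 3 are mostly idle.

```python
import re

# Per-core lines copied from the "show processes cpu sort" output above.
OUTPUT = """\
Core 0: CPU utilization for five seconds: 99%; one minute: 95%; five minutes: 93%
Core 1: CPU utilization for five seconds: 82%; one minute: 87%; five minutes: 91%
Core 2: CPU utilization for five seconds: 6%; one minute: 19%; five minutes: 24%
Core 3: CPU utilization for five seconds: 18%; one minute: 22%; five minutes: 26%
"""

# Hypothetical pattern written for this output format only.
CORE_RE = re.compile(
    r"Core (\d+): CPU utilization for five seconds: (\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def per_core(text):
    """Return {core: (5sec, 1min, 5min)} parsed from the per-core lines."""
    return {
        int(m[1]): (int(m[2]), int(m[3]), int(m[4]))
        for m in CORE_RE.finditer(text)
    }


def average(cores, column):
    """Mean of one column (0=5sec, 1=1min, 2=5min) across all cores."""
    return sum(values[column] for values in cores.values()) / len(cores)


cores = per_core(OUTPUT)
print("5-second average:", average(cores, 0))   # 51.25
print("5-minute average:", average(cores, 2))   # 58.5
```

So the averaged number agrees with the 50%+ reported, but the per-core view is the more useful one here.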

C3850# show processes cpu detailed process fed sorted | ex 0.0
Core 0: CPU utilization for five seconds: 98%; one minute: 92%; five minutes: 93%
Core 1: CPU utilization for five seconds: 100%; one minute: 98%; five minutes: 95%
Core 2: CPU utilization for five seconds: 8%; one minute: 13%; five minutes: 19%
Core 3: CPU utilization for five seconds: 3%; one minute: 14%; five minutes: 18%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
5715 L 1 6155 4009829 1574356 0 24.76 24.66 24.49 0 fed-ots-main
5715 L 3 9087 1784445 9273872 0 0.54 0.71 0.66 0 PunjectTx
5715 L 3 6158 259374 9013314 0 0.25 0.25 0.25 0 fed-ots-nfl

C3850# show processes cpu detailed process pdsd sorted | ex 0.0
Core 0: CPU utilization for five seconds: 91%; one minute: 93%; five minutes: 93%
Core 1: CPU utilization for five seconds: 99%; one minute: 94%; five minutes: 94%
Core 2: CPU utilization for five seconds: 5%; one minute: 16%; five minutes: 18%
Core 3: CPU utilization for five seconds: 9%; one minute: 10%; five minutes: 16%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
6258 L 1540442 9209233 4593 23.89 23.92 23.80 0 pdsd
6258 L 0 8533 1349466 3162035 0 23.89 23.91 23.79 0 pdsd

C3850# show processes cpu detailed process iosd sorted | ex 0.0
Core 0: CPU utilization for five seconds: 97%; one minute: 94%; five minutes: 94%
Core 1: CPU utilization for five seconds: 100%; one minute: 96%; five minutes: 95%
Core 2: CPU utilization for five seconds: 6%; one minute: 14%; five minutes: 18%
Core 3: CPU utilization for five seconds: 5%; one minute: 9%; five minutes: 16%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
8581 L 2935671 2542610 186 2.62 2.32 3.90 0 iosd
230 I 1679065 3548848 0 4.11 3.44 3.11 0 Spanning Tree
197 I 4032665 2719257 0 0.22 0.44 0.44 0 Tunnel IOSd shim DB

balaji.bandi · ‎04-12-2019

What is the uptime of the switch? Try rebooting the switch and see if that resolves it. If not, it's time to upgrade to newer code.

Hope the URL below helps:

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/117594-technote-hicpu3850-00.html

BB


johnlloyd_13 · ‎04-12-2019

that's the link/doc i used for my show commands :)

not sure if it's related to a port having low reliability and high output error/drop.

#sh int g1/0/12 counters errors

Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi1/0/12 0 0 18446744073687917395 0 0 18446744073687917395

Port Single-Col Multi-Col Late-Col Excess-Col Carri-Sen Runts
Gi1/0/12 0 0 0 0 0 0
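
One thing worth noting about the Xmit-Err / OutDiscards value above: it sits just below 2^64, so it does not look like a real count of errors. A minimal Python sketch, assuming the counter is a 64-bit unsigned integer (that width is my assumption, inferred only from the size of the number; `as_signed64` is a hypothetical helper name):

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("not a 64-bit unsigned value")
    return value - 2**64 if value >= 2**63 else value


# Xmit-Err and OutDiscards for Gi1/0/12, copied from the output above.
xmit_err = 18446744073687917395

print(2**64 - xmit_err)       # 21634221
print(as_signed64(xmit_err))  # -21634221
```

Read as a signed number the counter is negative, which would be consistent with a counter that was decremented below zero and wrapped around. That is an inference from the number itself, not something the switch reports.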

#sh int g1/0/12

GigabitEthernet1/0/12 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is c414.3c99.a48c (bia c414.3c99.a48c)
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 251/255, txload 13/255, rxload 3/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output never, output hang never
Last clearing of "show interface" counters 00:46:43
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 168972
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 1302000 bits/sec, 778 packets/sec
30 second output rate 5225000 bits/sec, 1137 packets/sec
2672113 packets input, 651052605 bytes, 0 no buffer
Received 30856 broadcasts (13432 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 13432 multicast, 0 pause input
0 input packets with dribble condition detected
3902024 packets output, 2459883905 bytes, 0 underruns
168972 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

joseph.h.nguyen · ‎04-12-2019

Your symptoms may be related to a previous post: https://community.cisco.com/t5/switching/catalyst-3850-high-total-output-drops-and-output-errors/td-p/2896553

In short, it may be a bad cable or some type of electromagnetic interference. Swap the cable to rule that out, as long as doing so doesn't disrupt any service.

Leo Laohoo · ‎04-12-2019

@johnlloyd_13 wrote:

Total output drops: 168972

168972 output errors

John,

This is a known bug. Notice the Total Output Drops and Output Errors are exactly equal? This is due to CSCvb65304. The switch running 3.6.X is a giveaway.

Can you try this command "ipv6 mld snooping" and see if it makes any improvements?
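
For anyone who wants to run the same "drops equal errors" comparison against saved "show interface" output from more than one port, here is a minimal Python sketch. The regexes and the helper name `drops_and_errors` are my own and are only matched to the output format pasted earlier in this thread. Equal counters are a hint worth following up, not proof of the bug.

```python
import re

# Two lines copied from the "sh int g1/0/12" output earlier in the thread.
SHOW_INT = """\
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 168972
168972 output errors, 0 collisions, 0 interface resets
"""


def drops_and_errors(text):
    """Return (total output drops, output errors), or None if either is missing."""
    drops = re.search(r"Total output drops: (\d+)", text)
    errors = re.search(r"(\d+) output errors", text)
    if not drops or not errors:
        return None
    return int(drops[1]), int(errors[1])


drops, errors = drops_and_errors(SHOW_INT)
print(drops, errors, drops == errors)  # 168972 168972 True
```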

johnlloyd_13 · ‎04-13-2019

hi leo,
i checked and the said command was already there, but it doesn't fix the high CPU.

the bug id didn't mention 3.6(4) which is the current IOS. the other switch also has the same code but doesn't show the same symptoms.

Leo Laohoo · ‎04-13-2019

@johnlloyd_13 wrote:

the bug id didn't mention 3.6(4) which is the current IOS. the other switch also has the same code but doesn't show the same symptoms.

Bug affects 3.6.X and 3.7.X. No idea why no one (from Cisco) bothered to update the Bug ID.

balaji.bandi · ‎04-13-2019

For temporary remediation, if you have a maintenance window, reboot the device and test.

Since you have tried all the options we have, the last resort is to open a TAC case with the bug ID, so they can suggest the best way forward.

BB
