Cisco CBS350 High CPU

groupccologin · ‎06-06-2023

I've recently deployed a 2x CBS350-48FP-4X stack and the CPU is pinned at 100% during production hours. There's a 2x CBS350-24FP-4X stack in the same location at the access layer with an almost identical config and that thing is fine.

This problem exists on both the 3.2 and 3.3 firmware (3.3 being the latest at the time of writing).

The stack is based in a hub office and the config isn't complicated: it has 9 APs, 2 firewalls in HA, 2 routers in HA, a couple of door entry controllers, a CCTV DVR, a couple of ACLs and 1 static route. There are no cameras plugged into the switch.

When bandwidth testing, the switch is actually moving traffic at wire speed, so I don't know whether I'm overthinking this, but the management plane and CLI are awfully slow and my monitoring looks bad with the processor pinned all day!

We have tried all sorts now:

Temperatures are fine

I've rebooted it

There are no loops (I've also switched between MST and RPVST) and the appropriate ports are set to PortFast.

I've tried SNMPv2 and SNMPv3, and I've also tried disabling SNMP altogether

Disabled SmartPort thingy, GVRP, sFlow, Bonjour, loop-back detection, link-flap detection, storm-control. If it's useful, assume I've turned it off and on and observed the results, but I'm very open to more ideas!

We're not using the Business Dashboard

Typically there's a max of ~180 addresses in the MAC address table

No jumbo frames are hitting the switch

None of the interfaces have errors or are flapping

I've raised a case with Cisco TAC. So far they've suggested rebuilding the switch and virus scanning the switch (yuh huh), and I've been sent lots of IOS-based device troubleshooting articles and YouTube videos. They've also told us the switch hardware might not be powerful enough, which is rubbish because the switches are moving considerably less traffic than the Dell stack they just replaced.

All in all, Cisco support has been poor to date. I guess these switches are deemed the cheaper option and that is most definitely reflected in the support.

Thanks in advance

KJK99 · ‎06-06-2023

@groupccologin

Are you saying that one switch in the stack is at v3.2 and the other at v3.3? If you stack CBS350 switches, they should be running the same version of the firmware.

Kris K

groupccologin · ‎06-06-2023

Sorry if it wasn't clear, both switches in the stack are on v3.3.

Damian

buzzbombers · ‎05-01-2024

Did you end up getting to the bottom of this? I have the same issue too. I am not routing traffic back out the same interface (which is apparently a known bug) and I have done everything you mentioned as well. I've yet to contact support, as I find I can often work it out myself instead of going down a dead-end path.

groupccologin · ‎05-01-2024

We did indeed.

The high CPU on the CBS350 was caused by bug CSCwe47566 (TAC found it so I can't take any credit, I only implemented the fix), where traffic is routed out of the same subnet/VLAN interface it came in on.

An example of this scenario would be:

Clients: DHCP range 10.0.0.50-200

Switch SVI: 10.0.0.2

Firewall Interface: 10.0.0.1

Clients have their gateway as 10.0.0.2 and the switch default route is 10.0.0.1.
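To make the scenario above concrete, here is a minimal Python sketch that flags it: the switch's next hop sits inside the same subnet as the SVI the clients use as their gateway, so routed traffic leaves on the VLAN it arrived on. The function name `is_same_vlan_route` is made up for illustration, and the /24 mask is an assumption (the post only gives the addresses, not the prefix length).

```python
import ipaddress


def is_same_vlan_route(svi: str, next_hop: str) -> bool:
    """Return True if next_hop lies inside the SVI's own subnet.

    svi is the SVI address with its prefix length, e.g. "10.0.0.2/24".
    When this is True, traffic from clients on that VLAN is routed
    back out of the interface it came in on.
    """
    interface = ipaddress.ip_interface(svi)
    return ipaddress.ip_address(next_hop) in interface.network


# Addresses from the example; the /24 mask is an assumption.
print(is_same_vlan_route("10.0.0.2/24", "10.0.0.1"))  # True
```

If this returns True for the SVI your clients point at and the next hop of your default route, the setup matches the pattern described for the bug.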

Solutions: Implement a /30 transit subnet between the switch and the firewall, or implement a router-on-a-stick configuration.
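As a rough illustration of the /30 transit option, the sketch below picks the two usable addresses of a transit subnet and checks that it does not overlap the client subnet. The subnet 10.255.255.0/30, the /24 client mask and the helper name `transit_pair` are all assumptions for the example, not values from the actual fix.

```python
import ipaddress


def transit_pair(transit: str, client_subnet: str):
    """Return (firewall_ip, switch_ip) for a /30 transit subnet.

    Raises ValueError if the transit subnet is not a /30 or if it
    overlaps the client subnet.
    """
    net = ipaddress.ip_network(transit)
    clients = ipaddress.ip_network(client_subnet)
    if net.prefixlen != 30:
        raise ValueError("transit subnet must be a /30")
    if net.overlaps(clients):
        raise ValueError("transit subnet overlaps the client subnet")
    # A /30 has exactly two usable host addresses.
    firewall_ip, switch_ip = net.hosts()
    return firewall_ip, switch_ip


# Hypothetical transit subnet; the client subnet mask is assumed to be /24.
fw, sw = transit_pair("10.255.255.0/30", "10.0.0.0/24")
print(fw, sw)  # 10.255.255.1 10.255.255.2
```

With a layout like this, the switch's default route would point at the firewall's transit address on a separate VLAN, so client traffic is routed between VLANs rather than back into the one it came from.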

I was advised that the switch is operating as designed. Apparently inter-VLAN routing happens in hardware, whereas same-VLAN routing happens in software.

Bigger switches like the 9200 do not have this issue.

If it's not the above bug then I'm unsure, sorry. I'd start switching features off (SmartPort thing, GVRP, sFlow, Bonjour, loop-back detection, link-flap detection, storm-control etc.) or check other things like jumbo frames and interface errors, and see what the results are.

I hope that helps.

buzzbombers · ‎05-01-2024

I did see others stating an issue with redirecting clients to a different gateway (in/out the same interface) but we aren't doing that.

I do use SmartPort but for VoIP handsets only as it helps with ease of deployment and the feature is awesome.

I just disabled SNMP, Bonjour and HTTP, enabled portfast and bpduguard on every client-facing port, and set link type point-to-point on all switch uplinks. CPU is down from nearly 100% to about 70%.

I will advise if I find anything else for others that may have the same problem.

Thanks for your input.