10-30-2024 08:15 AM - edited 10-30-2024 08:16 AM
We have four CBS220 switches that periodically reboot. At first I thought it might be a power issue, but we have confirmed that our UPS units are good, and another Cisco switch (a CBS250) on the same UPS as one of our CBS220s does not reboot. A CBS220 may stay up for a few weeks, and then the next day "show version" will show an uptime of just a few hours. "show logging" does not reveal anything in particular.
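If you poll "show version" on a schedule, you can catch these reboots automatically instead of noticing them by hand. The sketch below assumes you have already turned each uptime reading into a Python timedelta. It does not parse the CBS220 output format, and the UptimeSample and detect_reboots names are hypothetical. A reboot is inferred whenever the reported uptime is lower than the previous uptime plus the time elapsed between polls. This check still works when the poll interval is long enough that the new uptime has grown past the old one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UptimeSample:
    """One poll of a switch: when it was polled and the uptime it reported."""
    polled_at: datetime
    uptime: timedelta


def detect_reboots(samples: list[UptimeSample],
                   tolerance: timedelta = timedelta(minutes=1)) -> list[datetime]:
    """Return the estimated boot time for each reboot seen between consecutive polls."""
    ordered = sorted(samples, key=lambda s: s.polled_at)
    reboots = []
    for prev, curr in zip(ordered, ordered[1:]):
        # Without a reboot, uptime should have grown by the wall-clock time between polls.
        expected = prev.uptime + (curr.polled_at - prev.polled_at)
        if curr.uptime + tolerance < expected:
            # Boot time implied by the new (shorter) uptime.
            reboots.append(curr.polled_at - curr.uptime)
    return reboots
```

For example, daily polls reporting 40 days, 41 days, and then 3 hours at 09:00 would flag one reboot with an estimated boot time of 06:00 on the last day.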
I started by disabling unused services:
no pnp enable
That did not appear to resolve it (we had reboots two or three weeks later).
Next, I disabled Bonjour:
no bonjour enable
That was a few days ago, and we are continuing to monitor for further reboots.
I see that someone has opened Cisco bug reports describing similar behavior, though specific to having a MikroTik router in the environment:
https://bst.cisco.com/bugsearch/bug/CSCwh29387
https://bst.cisco.com/bugsearch/bug/CSCwf78354
We are running v2.0.2.14, which I believe is the latest version available.
I'm also watching "show memory statistics" to see if available memory drops slowly over time.
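A slow leak is easier to see as a trend than by eyeballing individual readings. The sketch below fits a least-squares line to (poll time, available memory) pairs and returns the slope per day. It assumes you record one available-memory number per poll from "show memory statistics", in whatever units the command reports. The function name is hypothetical. A slope that stays clearly negative over several weeks would be consistent with a leak. A slope near zero would match memory "holding steady".

```python
from datetime import datetime


def memory_slope_per_day(samples: list[tuple[datetime, float]]) -> float:
    """Least-squares slope of available memory, in units per day."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    # x = days since the first sample, y = available memory reading
    xs = [(t - t0).total_seconds() / 86400 for t, _ in samples]
    ys = [float(value) for _, value in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```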
CPU appears to be normal when I check.
I manage these CBS220s strictly via SSH, not via the web UI, as that interface is just painfully slow. I suppose I could also run "no ip http secure-server" ("no ip http server" is already in the config).
Any other thoughts, or things to check or disable?
I'm trying to avoid disabling other services like CDP/LLDP, as those are generally useful for troubleshooting.
Regards,
Jason
10-30-2024 08:47 AM
- Besides 'show logging', it would be advisable to configure a central syslog server on these switches to capture all logging. After a reboot, the logs from just before the device went down can then be examined to check whether a (last gasp) pattern can be detected.
The same applies to sending all SNMP traps to an SNMP manager (trap receiver), whose records should be examined for the same purpose when these devices reboot.
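To look for a last-gasp pattern, you can estimate when the switch booted from its uptime and then pull out everything the collector received shortly before that. The sketch below assumes the collected logs are available as (timestamp, message) pairs. Using the syslog server's receive time avoids relying on the switch clock. The function names and the 30-minute default window are illustrative assumptions. The crash happens somewhat before the boot time, because the switch also needs time to come back up, so the window should be generous.

```python
from datetime import datetime, timedelta


def estimate_boot_time(polled_at: datetime, uptime: timedelta) -> datetime:
    """Boot time implied by an uptime reading taken at `polled_at`."""
    return polled_at - uptime


def last_gasp_window(records: list[tuple[datetime, str]],
                     boot_time: datetime,
                     before: timedelta = timedelta(minutes=30)) -> list[tuple[datetime, str]]:
    """Return the records logged in the `before` window preceding boot_time, oldest first."""
    start = boot_time - before
    return [(ts, msg) for ts, msg in sorted(records) if start <= ts < boot_time]
```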
M.
10-30-2024 04:01 PM
Hi,
There have been known issues with this platform. I assume the switches are kept under normal environmental conditions, right? Make sure you are running the latest version (upgrade if not), and open a TAC case if it keeps happening; most likely you'll get an RMA.
Best,
Cristian.
10-31-2024 11:15 AM
Yes, latest version 2.0.2.14 and a normal-temperature environment. Knock on wood, it has been stable so far with Bonjour disabled, starting Friday, Oct 25th. That's only 6 days or so, though; we need several weeks of uptime to gain some confidence that this is a valid workaround. I did set up logging to file ("warning" level or more severe). Unfortunately, we don't have a syslog server or SNMP management server to capture logs (we're a small non-profit business). Maybe I can spin up something temporarily for syslog troubleshooting.
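"Warning level or more severe" maps to syslog severities 0 through 4, because lower numbers are more severe. In standard syslog (RFC 5424), each message starts with a <PRI> field where PRI = facility × 8 + severity. The sketch below decodes that field and applies the same threshold. It assumes the switch emits a standard <PRI> prefix, which has not been verified against CBS220 output. The function names are hypothetical.

```python
import re

# Standard syslog severity levels (RFC 5424), most severe first.
SEVERITY_NAMES = ["emergency", "alert", "critical", "error",
                  "warning", "notice", "informational", "debug"]

# A syslog message normally begins with "<PRI>", e.g. "<188>...".
_PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message: str) -> tuple[int, int]:
    """Return (facility, severity) from the leading <PRI> field."""
    match = _PRI_RE.match(message)
    if match is None:
        raise ValueError("message has no leading <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 24 facilities (0-23) * 8 severities - 1
        raise ValueError(f"PRI value out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return facility, severity


def is_at_least(message: str, threshold: str = "warning") -> bool:
    """True if the message is at least as severe as `threshold` (lower number = more severe)."""
    _, severity = decode_pri(message)
    return severity <= SEVERITY_NAMES.index(threshold)
```

For example, <188> decodes to facility 23 (local7) with severity 4 (warning) and passes the default threshold. <191> is a debug message and does not.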
10-31-2024 11:29 AM
- A syslog server is easy to configure on a Linux VM (spin it up!), and the same probably goes for an SNMP trap receiver on the same host.
Searching the internet for these subjects will get you there fast.
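For a temporary setup, the Python standard library alone is enough to receive syslog over UDP. The sketch below is a minimal listener that appends each datagram to a file. Each line gets the collector's own UTC receive time and the sender's IP, which makes the logs easy to line up against estimated reboot times. It is not a replacement for rsyslog or syslog-ng. The class name, the serve() helper, and the log file name are hypothetical. How you point the CBS220 at this host is not covered here.

```python
import socketserver
from datetime import datetime, timezone


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received syslog datagram to a file with a receive timestamp."""

    log_path = "switch-syslog.log"  # hypothetical output file; change to suit

    def handle(self) -> None:
        data, _sock = self.request  # for UDP servers, request is (bytes, socket)
        message = data.decode("utf-8", errors="replace").strip()
        received = datetime.now(timezone.utc).isoformat()
        sender = self.client_address[0]
        with open(self.log_path, "a", encoding="utf-8") as fh:
            fh.write(f"{received} {sender} {message}\n")


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Listen for syslog over UDP until interrupted (Ctrl+C)."""
    # Binding port 514 on Linux normally requires root. A high port avoids that,
    # but only if the switch lets you set a non-default syslog port.
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling serve() as root, or serve(port=5140) as a normal user, runs the listener until you stop it.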
M.
11-06-2024 05:55 AM
Still monitoring... still stable, without any reboots. Available memory stats are about where they were back on Oct 25th, holding steady.
12-02-2024 06:54 PM
Well, it made it to about 42 days of uptime and then rebooted. 3 of our 4 CBS220s have rebooted in the last week, at different times. Not much in the log files.
12-03-2024 10:59 AM
I turned on debug logging and noticed these in the log every 15 minutes or so:
*Dec 03 2024 13:38:45.768+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:44.808+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:44.158+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:41.098+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:40.518+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:39.808+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:23.237+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:22.947+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:22.157+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.807+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.467+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.357+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
These (4) CBS220 switches have IGMP snooping enabled, although the documentation states IGMP snooping is disabled by default and the running-config has no lines related to IGMP. So, as another "thing to try", I've disabled IGMP snooping to see if that helps (again, grasping at straws). I determined the source of the IGMPv3 messages that trigger the syslog messages above: it is a known system on the same local broadcast domain as the switches' management interfaces.
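To confirm the "every 15 minutes or so" rhythm over a longer capture, you can parse these lines and group them into bursts. The regex below is derived only from the sample lines above. It skips the "+-500" offset suffix rather than interpreting it, and the 1-minute burst gap is an arbitrary assumption. In the usual Cisco FACILITY-SEVERITY-MNEMONIC format, the 7 in %MCAST-7- is the debugging severity. That fits with these messages only appearing once debug logging was enabled. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Pattern derived from lines such as:
# *Dec 03 2024 13:38:45.768+-500: %MCAST-7-GROUP_RANGE_INVALID: Received ...
LINE_RE = re.compile(
    r"^\*?(?P<ts>[A-Z][a-z]{2} \d{2} \d{4} \d{2}:\d{2}:\d{2}\.\d{3})\S*: "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):"
)


def parse_events(lines):
    """Return sorted (timestamp, facility, severity, mnemonic) tuples for matching lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a message in this format
        ts = datetime.strptime(match.group("ts"), "%b %d %Y %H:%M:%S.%f")
        events.append((ts, match.group("facility"),
                       int(match.group("severity")), match.group("mnemonic")))
    return sorted(events)


def burst_starts(timestamps, gap=timedelta(minutes=1)):
    """Group timestamps into bursts separated by more than `gap`; return each burst's first timestamp."""
    starts = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > gap:
            starts.append(ts)
        previous = ts
    return starts
```

On the twelve lines above, this yields two bursts starting at 13:25:17.357 and 13:38:39.808, about 13 minutes apart.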
12-03-2024 11:15 AM
12-09-2024 05:08 AM
Our 4th CBS220 rebooted over the weekend at around 50 days of uptime, and that was with IGMP snooping disabled as well. I had logging to an external syslog server set at debug level. I will check those logs, but I'm not confident they will reveal anything.
12-09-2024 11:09 AM
As expected, there was nothing in the syslogs around the time of the suspected crash and reboot, even with debug level enabled.