10-30-2024 08:15 AM - edited 10-30-2024 08:16 AM
We have four CBS220 switches that periodically reboot. At first I thought it might be a power issue, but we have confirmed that our UPS units are good, and another Cisco switch (CBS250) on the same UPS as one of our CBS220s does not reboot. A CBS220 may be up for a few weeks and then the next day show an uptime (in "show version") of just a few hours. "show logging" does not reveal anything in particular.
I started by disabling unused services:
no pnp enable
That did not appear to resolve it (we had reboots a few weeks later).
Next, I disabled Bonjour:
no bonjour enable
That was a few days ago, and we are continuing to monitor to see whether any further reboots occur.
I see there are Cisco bug reports from someone with a similar experience, but specific to having a MikroTik router in the environment:
https://bst.cisco.com/bugsearch/bug/CSCwh29387
https://bst.cisco.com/bugsearch/bug/CSCwf78354
We are running v2.0.2.14, which I believe is the latest version available.
I'm also watching "show memory statistics" to see if available memory drops slowly over time.
CPU appears to be normal when I check.
I manage these CBS220s strictly via SSH and not via the web UI, as that interface is just painfully slow. I suppose I could also do "no ip http secure-server" ("no ip http server" is already present in the config).
Any other thoughts, or things to check or disable?
I'm trying to avoid disabling other services like CDP/LLDP, as those are generally useful for troubleshooting.
regards,
Jason
10-30-2024 08:47 AM
- Besides 'show logging', it would be advisable to configure a central syslog server for these switches to capture all logging; the logs from just before a reboot can then be examined to check whether a (last gasp) pattern can be detected.
The same applies to configuring all SNMP traps to be sent to an SNMP manager (trap receiver), which can be examined for the same purpose when these devices reboot.
M.
10-30-2024 04:01 PM
Hi,
There have been known issues with this platform. I assume the switches are kept under normal environmental conditions, right? Make sure you run the latest version (upgrade if not) and open a TAC case if it keeps happening; most likely you'll get an RMA.
Best,
Cristian.
10-31-2024 11:15 AM
Yes, latest version 2.0.2.14 and normal temperature conditions. Knock on wood, so far stable with Bonjour disabled starting on Friday, Oct 25th. So it has only been 6 days or so... we need several weeks of uptime to gain some confidence in that being a valid workaround. I did set up logging to file ("warning" level or better). Unfortunately, we don't have a syslog server or SNMP management server to capture logs (small non-profit business here). Maybe I can spin up something temporarily for syslog troubleshooting.
10-31-2024 11:29 AM
- A syslog server is easy to configure on a Linux VM (spin it up!); the same probably goes for an SNMP trap receiver on the same host.
Searching on the internet for the subjects will get you there fast.
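For a temporary troubleshooting setup, something as small as the following sketch may be enough. It is a minimal UDP syslog sink in Python, not a replacement for rsyslog or syslog-ng. The port (5514), the output file name, and the `--serve` flag are assumptions made for illustration; the switch would need to be pointed at this host and port, and if it can only send to the standard port 514 the script would have to run with privileges to bind it.

```python
import re
import socket
import sys
from datetime import datetime

LISTEN_ADDR = ("0.0.0.0", 5514)   # assumption: unprivileged port (514 needs root)
LOG_PATH = "cbs220-syslog.log"    # hypothetical output file name

# Syslog datagrams normally start with a priority field such as "<189>".
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def split_priority(datagram: bytes):
    """Return (facility, severity, text); facility/severity are None if no <PRI>."""
    text = datagram.decode("utf-8", errors="replace").rstrip()
    match = PRI_RE.match(text)
    if not match:
        return None, None, text
    pri = int(match.group(1))
    # Priority encodes facility * 8 + severity.
    return pri // 8, pri % 8, match.group(2)


def serve():
    """Receive datagrams forever and append them, with a local timestamp, to LOG_PATH."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN_ADDR)
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        while True:
            data, (src_ip, _port) = sock.recvfrom(8192)
            facility, severity, text = split_priority(data)
            stamp = datetime.now().isoformat(timespec="milliseconds")
            log.write(f"{stamp} {src_ip} fac={facility} sev={severity} {text}\n")
            log.flush()  # keep the file current in case the host itself goes down


if __name__ == "__main__" and "--serve" in sys.argv:
    serve()
```

Run it with `--serve` on the VM. Stamping each line with the receiver's own clock means the last message before a reboot can be dated even if the switch's clock is off.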
M.
11-06-2024 05:55 AM
Still continuing to monitor... still stable without any reboots. Available memory stats are about where they were back on Oct 25th... holding steady.
12-02-2024 06:54 PM
Well, made it to about 42 days or so of uptime and then rebooted. 3 of the 4 CBS220s we have rebooted in the last week, at different times. Not much in the log files.
12-03-2024 10:59 AM
Turned on debug logging and noticed these every 15 min or so in the log...
*Dec 03 2024 13:38:45.768+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:44.808+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:44.158+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:41.098+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:40.518+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:39.808+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:23.237+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:22.947+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:22.157+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.807+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.467+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.357+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
These four CBS220 switches have IGMP snooping enabled (although the documentation states IGMP snooping is disabled by default, and the running-config has no line items related to IGMP). So, as another "thing to try", I've disabled IGMP snooping to see if that helps (again, grasping at straws). I determined the source of the IGMPv3 messages that trigger the above syslog messages, and it is a known system on the same local broadcast domain as the mgmt interfaces of these switches.
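To put a number on the "every 15 min or so" cadence in the excerpt above, a short Python sketch like this can group the log lines into bursts and report the time between burst starts. The regular expression is written only against the line format shown in this post, and the 60-second gap used to separate bursts is an arbitrary assumption. Month-name parsing also assumes an English locale.

```python
import re
from datetime import datetime

# Matches lines such as:
# *Dec 03 2024 13:38:45.768+-500: %MCAST-7-GROUP_RANGE_INVALID: Received ...
LINE_RE = re.compile(
    r"^\*(\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}\.\d{3})\S*: %(\S+?):"
)


def parse_line(line):
    """Return (timestamp, mnemonic) for a matching log line, else None."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    stamp = datetime.strptime(match.group(1), "%b %d %Y %H:%M:%S.%f")
    return stamp, match.group(2)


def burst_starts(timestamps, gap_seconds=60):
    """Return the first timestamp of each burst (bursts split by > gap_seconds)."""
    starts = []
    previous = None
    for stamp in sorted(timestamps):
        if previous is None or (stamp - previous).total_seconds() > gap_seconds:
            starts.append(stamp)
        previous = stamp
    return starts


SAMPLE = """\
*Dec 03 2024 13:38:45.768+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:38:39.808+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:23.237+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
*Dec 03 2024 13:25:17.357+-500: %MCAST-7-GROUP_RANGE_INVALID: Received invalid group range 224.0.0.X
"""

parsed = [p for p in (parse_line(l) for l in SAMPLE.splitlines()) if p]
starts = burst_starts([stamp for stamp, _ in parsed])
intervals = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
print(intervals)  # seconds between the starts of consecutive bursts
```

For the two bursts quoted here the gap comes out at a little over 13 minutes. Feeding in a longer capture would show whether the interval is steady, which could help match it to a query or report timer on the sending system.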
12-03-2024 11:15 AM
12-09-2024 05:08 AM
Our 4th CBS220 rebooted over the weekend at around 50 days of uptime. That was with IGMP snooping disabled as well. I had logging to an external syslog server set at debug level. I will check those logs, but I'm not confident that they will reveal anything.
12-09-2024 11:09 AM
As expected, nothing in the syslogs around the time of the suspected crash and reboot. That was with debug level enabled.
01-13-2025 07:10 AM - edited 01-13-2025 07:12 AM
Two of the four CBS220s appeared to "reboot" over the weekend (one on Friday and one on Sunday) at the 50-day mark. I have a new theory: the switches are NOT rebooting; there is something with the "uptime" that gets reset after 50 days. Why do I think that? The logs do not show any up/down events for the uplink ports around the time of the "reboot". I am using NTP (an internal NTP server on our network, not the default NTP servers) and it shows in sync. The log entries ("show logging") all show with an asterisk before them. Usually with Cisco IOS/IOS-XE switches that means the time was NOT in sync when the log entry was generated. But "show sntp configuration" and "show clock" both show "time source is sntp" and "sntp server status: Up".
So, what do folks think about this theory? Is something amiss with the "uptime" function in the "show version" output that causes it to reset uptime at 50 days? I have two consistent 50-day cycles now on 2 of 4 switches. My other two switches are at 36 and 41 days, so I'm going to watch those at 50 days.
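One general data point that would fit this theory: a 32-bit unsigned counter that ticks once per millisecond wraps after about 49.7 days, which is a classic cause of uptime displays resetting at "around 50 days". Whether the CBS220 firmware actually tracks uptime this way is an assumption; nothing in this thread or the linked bugs confirms it. The arithmetic in Python:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wrap_period_days(bits=32, ticks_per_second=1000):
    """Days until an unsigned counter of `bits` bits overflows at the given tick rate."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


# Assumed case: 32-bit counter incremented every millisecond.
print(round(wrap_period_days(), 2))                      # 49.71

# For comparison: the same counter at 100 ticks per second.
print(round(wrap_period_days(ticks_per_second=100), 2))  # 497.1
```

If the remaining two switches also show a fresh uptime just short of 50 days, with no link flaps logged, that would point toward a counter wrap and away from a real reboot.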
01-28-2025 02:25 PM
I just posted about a reboot around 50 days. Your message interests me especially since my switch currently has an uptime of 48 days, while the tech support file indicates 147 days.
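As a rough cross-check of those two figures, here is what a displayed uptime would look like if the "show version" value came from a 32-bit millisecond counter that wraps roughly every 49.7 days. That counter is an assumption for illustration, not something confirmed for this firmware, and the 147 days is taken as an approximate true uptime.

```python
WRAP_DAYS = 2**32 / 1000 / 86400  # assumed 32-bit millisecond counter, ~49.71 days


def displayed_uptime(true_days, wrap_days=WRAP_DAYS):
    """Return (number of wraps, days shown) for a counter that wraps every wrap_days."""
    wraps, shown = divmod(true_days, wrap_days)
    return int(wraps), shown


wraps, shown = displayed_uptime(147)
print(wraps, round(shown, 1))  # 2 47.6
```

Two wraps and about 47.6 days shown is roughly in line with a displayed uptime of 48 days against 147 days in the tech support file, allowing for rounding in both numbers.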
01-28-2025 02:59 PM - edited 01-30-2025 12:41 AM
But no, it can’t be that. My CCTV server stops recording because the PoE cameras reboot.
EDIT: no, in the end it does not cut PoE power with this latest firmware.
01-29-2025 05:08 AM
This is not the same here: other than "show version" indicating a recent uptime/reboot, there is no actual reboot or ports down/up in the syslog. In my case, it truly appears to be something wonky with the uptime in "show version". Maybe just a cosmetic bug in my case.