11-13-2024 10:03 PM
Hi, looking for any advice on solving this issue. Any help would be greatly appreciated since I don't have an active TAC support account.
I have a two-member 3650 stack running IOS XE 16.9.5 that has been crashing at random for the past 6 months. It will be up for a month or less, then one of the members will reboot or crash.
I have another stack with the same switches and IOS version with no issues, so I can't figure out what the issue is, as the config is almost the same. I was able to download the crash log and have attached what might be relevant information.
I tried doing some research on my own to figure out what open bugs 16.9.5 has, but the 3650 IOS version info only seems to show resolved caveats. I tried using the bug tracker, but you need to know the bug you're looking for. I was hoping someone has a URL where I can get more info on a particular IOS version and see all of its open bugs.
Again, any suggestions or pointers are greatly appreciated. I know I will probably have to upgrade to a newer IOS; suggestions on that are welcome too.
Thanks in advance, P 
 
NAME: "Switch 1", DESCR: "WS-C3650-48FD-E"
PID: WS-C3650-48FD-E   , VID: V01 
Switch Ports Model              SW Version        SW Image              Mode
------ ----- -----              ----------        ----------            ----
     1 52    WS-C3650-48PD      16.9.5            CAT3K_CAA-UNIVERSALK9 INSTALL
*    2 52    WS-C3650-48PD      16.9.5            CAT3K_CAA-UNIVERSALK9 INSTALL
Here's the onboard uptime log showing the software crash.
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 05/29/2020 19:41:24
Total uptime            :  4  years  22 weeks  1  days  6  hours  2  minutes
Total downtime          :  0  years  2  weeks  0  days  3  hours  29 minutes
Number of resets        : 18
Number of slot changes  : 0
Current reset reason    : Critical software exception
Current reset timestamp : 11/13/2024 06:08:08
Current slot            : 1
Chassis type            : 0
Current uptime          :  0  years  0  weeks  0  days  23 hours  5  minutes
--------------------------------------------------------------------------------
Here's part of the crash file below; I can supply more info upon request. I pasted what I thought was the most relevant part of the file, as I don't fully understand all the info I'm seeing. It looks like it's not a memory leak, and I'm not sure whether spanning tree caused the issue.
========= Start of Crashinfo Collection (01:00:36 est Wed Nov 13 2024) =========
For image:
Cisco IOS Software [Fuji], Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 16.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Thu 30-Jan-20 18:17 by mcpre
Uptime = 6d21h
========= Exception Tracebacks =================================================
Exception to Fastpath Thread:
Frame pointer 0xFF8A4B6DA0, PC = 0xFFDC39A5FC
IOS Thread backtrace:
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+78DC7B8 :AAABB0F000+78DC770 :AAABB0F000+78DA240 :AAABB0F000+76A6FAC :AAABB0F000+78E96F8
Fastpath Thread backtrace:
-Traceback= 1#29f72ee29020089947733224e5b632bb   ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Auxiliary Thread backtrace:
-Traceback= 1#29f72ee29020089947733224e5b632bb   iosd_ngwc_unix:FFCEB5C000+47F2C prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
========= Top 100 Allocator PC summary =========================================
Nov 13 01:00:54 est: %SYS-3-INVMEMINT: Invalid memory action (malloc) at interrupt level
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+790B6EC :AAABB0F000+790BFC0 :AAABB0F000+8DDC950 :AAABB0F000+54B5FAC :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
%Log packet overrun, PC 0xAAB101D40C, format:
%s
Nov 13 01:00:54 est: %SYS-2-MALLOCFAIL: Memory allocation of 64 bytes failed from 0xAAB0FC4FA4, alignment 0
Pool: Processor  Free: 491754096  Cause: Interrupt level allocation
Alternate Pool: None  Free: 0  Cause: Interrupt level allocation
 -Process= "<interrupt level>", ipl= 2, pid= 250
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+790A418 :AAABB0F000+790B730 :AAABB0F000+790BFC0 :AAABB0F000+8DDC950 :AAABB0F000+54B5FAC :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F
000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdll**MSG 00001 TRUNCATED**
**MSG 00001 CONTINUATION #01**ib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Nov 13 01:00:54 est: %SYS-3-INTPRINT: Illegal printing attempt from interrupt level. -Process= "<interrupt level>", ipl= 2, pid= 250
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+7829560 :AAABB0F000+78279FC :AAABB0F000+54B6088 :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Allocator PC Summary for: Processor
OVERFLOW = TRUE
 
Processor Memory: total 852233264, free 491754096, used 360479168
IO Memory: total 852233264, free 491754096, used 360479168
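
As a rough check on the "not a memory leak" reading, the Processor pool figures above can be turned into a free percentage. This is only a sketch using the numbers from this one crashinfo; the helper name `free_percent` is made up for illustration. The MALLOCFAIL line itself gives "Interrupt level allocation" as the cause, which would be consistent with the pool not being exhausted.

```python
# Figures copied from the "Processor Memory" line of the crashinfo above.
total = 852_233_264
free = 491_754_096
used = 360_479_168


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free memory as a percentage of the whole pool."""
    return 100.0 * free_bytes / total_bytes


# Sanity check: used + free should account for the whole pool.
assert used + free == total

print(f"Processor pool free: {free_percent(total, free):.1f}%")
```

With more than half of the pool free at crash time, the failed 64-byte allocation looks like a side effect of the crash handling rather than a slow leak, although a single snapshot cannot rule a leak out.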
 
11-13-2024 11:49 PM
- FYI : https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvw82898
Advising to upgrade to https://software.cisco.com/download/home/284850400/type/282046477/release/Gibraltar-16.12.12
M.
11-14-2024 11:15 AM
Marce, thank you!! I'm curious, how did you find that bug? The bug tool asks for the bug ID, which I didn't have. Did you just search for part of the trace log on Google? I'm trying to get better at looking up these bugs.
Thanks, I really appreciate it, P
11-14-2024 11:54 AM
@paul amaral wrote : >...Marce, thank you!! I'm curious how did you search that bug?
- Normally I always use Bug Search first but it didn't come up with anything (for :
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
and it still pointed me to the (a) bug report. Bug Search from Cisco sometimes has limited parsing capabilities and then does not give you the info that you need immediately.
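
For anyone wanting to repeat this kind of lookup, here is a minimal sketch of pulling candidate search strings out of a crashinfo file. It is not a Cisco tool; the function name `crash_signatures` and the two patterns are assumptions chosen to match the lines seen in this thread (the `UNIX-EXT-SIGNAL` line and `%FACILITY-SEVERITY-MNEMONIC` syslog tags).

```python
import re

# Hypothetical helper, not part of any Cisco tooling: collect lines from a
# crashinfo file that tend to make good search terms.
SIGNATURE_PATTERNS = [
    re.compile(r"UNIX-EXT-SIGNAL:.*"),         # signal plus crashing process
    re.compile(r"%[A-Z0-9_]+-\d-[A-Z0-9_]+"),  # tags such as %SYS-2-MALLOCFAIL
]


def crash_signatures(text: str) -> list[str]:
    """Return unique signature strings in order of first appearance."""
    found: list[str] = []
    for line in text.splitlines():
        for pattern in SIGNATURE_PATTERNS:
            match = pattern.search(line)
            if match:
                value = match.group(0).strip()
                if value not in found:
                    found.append(value)
    return found


# Lines taken from the crashinfo posted earlier in this thread.
sample = """\
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
Nov 13 01:00:54 est: %SYS-3-INVMEMINT: Invalid memory action (malloc) at interrupt level
Nov 13 01:00:54 est: %SYS-2-MALLOCFAIL: Memory allocation of 64 bytes failed from 0xAAB0FC4FA4, alignment 0
Nov 13 01:00:54 est: %SYS-3-INTPRINT: Illegal printing attempt from interrupt level.
"""

for signature in crash_signatures(sample):
    print(signature)
```

The first string printed is the same `UNIX-EXT-SIGNAL` line quoted above, which can then be pasted into Bug Search or a general search engine together with the platform and release.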
M.
11-14-2024 12:26 PM
thank you!
11-13-2024 11:53 PM
Upgrade to 16.12.X and check, remove all crash logs from device to make some space.
https://software.cisco.com/download/home/284846032/type/282046477/release/Gibraltar-16.12.12
11-14-2024 11:17 AM
Thank you, I believe I am hitting bug CSCvw82898. I will be upgrading to a known fixed version.
11-14-2024 11:51 AM
Sure let us know the outcome.
11-14-2024 04:12 PM
16.9.X with an uptime of >4 years? What are the chances it was caused by FN72323 - Cisco IOS XE Software: QuoVadis Root CA 2 Decommission Might Affect Smart Licensing, Smart Call Home, and Other Functionality?
Post the complete output to the following commands:
dir crashinfo-1:/tracelogs
dir crashinfo-1:/tracelogs | exclude gz
dir crashinfo-2:/tracelogs
dir crashinfo-2:/tracelogs | exclude gz
sh logs
11-14-2024 06:30 PM
Hi, there were never any smart license errors in the logs prior to the crash. The software exception also points to a spanning tree issue/bug. I will be upgrading to a newer version, most likely 16.12.12, which should take care of this SSL issue as well. I appreciate your input.