
Cisco 3650 stack random crashes - IOS 16.9.5

paul amaral
Level 4

Hi, I'm looking for advice on solving this issue. Any help would be greatly appreciated, since I don't have an active TAC support account.
I have a two-member 3650 stack running IOS XE 16.9.5 that has been crashing at random for the past 6 months. It will be up for a month or less, then one of the members will crash and reboot.

I have another stack with the same switches and IOS version that has no issues, so I can't figure out what the problem is, as the config is almost the same. I was able to download the crash log and have attached what might be some relevant information.

I tried doing some research on my own to find out which open bugs 16.9.5 has, but the 3650 IOS version info
only seems to show resolved caveats. I tried using the bug tracker, but you need to know the bug you're looking for. I was hoping someone has a URL where I can get more info on a particular IOS version and see all of its open bugs.

Again, any suggestions or pointers are greatly appreciated. I know I will probably have to upgrade to a newer IOS version; suggestions on that are welcome too.

Thanks in advance, P 
 

NAME: "Switch 1", DESCR: "WS-C3650-48FD-E"
PID: WS-C3650-48FD-E   , VID: V01 

Switch Ports Model              SW Version        SW Image              Mode
------ ----- -----              ----------        ----------            ----
     1 52    WS-C3650-48PD      16.9.5            CAT3K_CAA-UNIVERSALK9 INSTALL
*    2 52    WS-C3650-48PD      16.9.5            CAT3K_CAA-UNIVERSALK9 INSTALL

Here's the onboard uptime log showing the software crash.

UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 05/29/2020 19:41:24
Total uptime            :  4  years  22 weeks  1  days  6  hours  2  minutes
Total downtime          :  0  years  2  weeks  0  days  3  hours  29 minutes
Number of resets        : 18
Number of slot changes  : 0
Current reset reason    : Critical software exception
Current reset timestamp : 11/13/2024 06:08:08
Current slot            : 1
Chassis type            : 0
Current uptime          :  0  years  0  weeks  0  days  23 hours  5  minutes
--------------------------------------------------------------------------------

Here's a partial crash file; I can supply more info upon request. I pasted what I thought was the most relevant part, as I don't fully understand all the info I'm seeing. It looks like it's not a memory leak, and I'm not sure whether spanning tree caused the issue.

========= Start of Crashinfo Collection (01:00:36 est Wed Nov 13 2024) =========

For image:
Cisco IOS Software [Fuji], Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 16.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Thu 30-Jan-20 18:17 by mcpre


Uptime = 6d21h



========= Exception Tracebacks =================================================

Exception to Fastpath Thread:
Frame pointer 0xFF8A4B6DA0, PC = 0xFFDC39A5FC
IOS Thread backtrace:
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+78DC7B8 :AAABB0F000+78DC770 :AAABB0F000+78DA240 :AAABB0F000+76A6FAC :AAABB0F000+78E96F8
Fastpath Thread backtrace:
-Traceback= 1#29f72ee29020089947733224e5b632bb   ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Auxiliary Thread backtrace:
-Traceback= 1#29f72ee29020089947733224e5b632bb   iosd_ngwc_unix:FFCEB5C000+47F2C prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC



========= Top 100 Allocator PC summary =========================================

Nov 13 01:00:54 est: %SYS-3-INVMEMINT: Invalid memory action (malloc) at interrupt level
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+790B6EC :AAABB0F000+790BFC0 :AAABB0F000+8DDC950 :AAABB0F000+54B5FAC :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
%Log packet overrun, PC 0xAAB101D40C, format:
%s

Nov 13 01:00:54 est: %SYS-2-MALLOCFAIL: Memory allocation of 64 bytes failed from 0xAAB0FC4FA4, alignment 0
Pool: Processor  Free: 491754096  Cause: Interrupt level allocation
Alternate Pool: None  Free: 0  Cause: Interrupt level allocation
 -Process= "<interrupt level>", ipl= 2, pid= 250
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+790A418 :AAABB0F000+790B730 :AAABB0F000+790BFC0 :AAABB0F000+8DDC950 :AAABB0F000+54B5FAC :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdll**MSG 00001 TRUNCATED**
**MSG 00001 CONTINUATION #01**ib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Nov 13 01:00:54 est: %SYS-3-INTPRINT: Illegal printing attempt from interrupt level. -Process= "<interrupt level>", ipl= 2, pid= 250
-Traceback= 1#29f72ee29020089947733224e5b632bb   :AAABB0F000+5510954 :AAABB0F000+550E414 :AAABB0F000+7829560 :AAABB0F000+78279FC :AAABB0F000+54B6088 :AAABB0F000+54B6C14 :AAABB0F000+54B6B80 :AAABB0F000+54B70E0 :AAABB0F000+1E30800 :AAABB0F000+1E26E14 :AAABB0F000+1E27014 iosd_ngwc_unix:FFCEB5C000+428CC ld:FFF150C000+38010 tdllib:FFDC2F0000+AA5FE tdllib:FFDC2F0000+AA8AC iosd_ngwc_unix:FFCEB5C000+49178 iosd_ngwc_unix:FFCEB5C000+3A848 iosd_ngwc_unix:FFCEB5C000+2EC14 prelib:FFF1487000+374C pthread:FFCE40F000+B240 c:FFCE94B000+11C0DC
Allocator PC Summary for: Processor
OVERFLOW = TRUE
 
Processor Memory: total 852233264, free 491754096, used 360479168
IO Memory: total 852233264, free 491754096, used 360479168
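
As a rough check on the "not a memory leak" reading, the figures in the excerpt above can be parsed and compared. This is only a minimal Python sketch, not a Cisco tool: the function name `summarize_memory` and both regular expressions are assumptions written against the exact line formats pasted in this post, and other releases may format these lines differently.

```python
import re

# Lines copied from the crashinfo excerpt in this post.
CRASH_EXCERPT = """\
Nov 13 01:00:54 est: %SYS-2-MALLOCFAIL: Memory allocation of 64 bytes failed from 0xAAB0FC4FA4, alignment 0
Pool: Processor  Free: 491754096  Cause: Interrupt level allocation
Processor Memory: total 852233264, free 491754096, used 360479168
"""

# Hypothetical patterns; they assume the layout shown above.
POOL_RE = re.compile(r"Pool:\s+(\S+)\s+Free:\s+(\d+)\s+Cause:\s+(.+)")
TOTAL_RE = re.compile(r"(\w+) Memory: total (\d+), free (\d+), used (\d+)")


def summarize_memory(text):
    """Return (cause, free_fraction) from the first matching lines, or None."""
    pool = POOL_RE.search(text)
    total = TOTAL_RE.search(text)
    if pool is None or total is None:
        return None
    cause = pool.group(3).strip()
    free_fraction = int(total.group(3)) / int(total.group(2))
    return cause, free_fraction


result = summarize_memory(CRASH_EXCERPT)
if result is not None:
    cause, free_fraction = result
    print(f"cause={cause!r}, free={free_fraction:.1%}")
```

With more than half of the Processor pool reported free and the cause given as "Interrupt level allocation", the MALLOCFAIL message here looks like a side effect of the crash handler allocating at interrupt level rather than evidence of memory exhaustion, though that is an interpretation and not something the log states outright.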


 


marce1000
Hall of Fame

 

            - FYI https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvw82898

   I advise upgrading to https://software.cisco.com/download/home/284850400/type/282046477/release/Gibraltar-16.12.12

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Marce, thank you!! I'm curious, how did you find that bug? The bug tool asks for the bug ID, which I didn't have. Did you just search for part of the traceback on Google? I'm trying to get better at looking up these bugs.

Thanks, I really appreciate it, P

 

 @paul amaral wrote :    >...Marce, thank you!! I'm curious how did you search that bug?

  - Normally I always use Bug Search first, but it didn't come up with anything for:

UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree

Then I searched Google for

UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree

 and it pointed me to the bug report. Cisco's Bug Search sometimes has limited parsing capabilities and does not immediately give you the info you need.

 M.
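
The search approach described above can be sketched in a few lines: pull the `UNIX-EXT-SIGNAL` line out of a crashinfo file and wrap it in quotes for pasting into a search engine. This is a minimal Python sketch under stated assumptions; `crash_signatures` and `quoted_query` are hypothetical helper names, the pattern is written against the single line format shown in this thread, and using a quoted exact-phrase query is a general search habit rather than something confirmed in the reply.

```python
import re

# Assumed format, taken from the one line shown in this thread:
#   UNIX-EXT-SIGNAL: <signal text>, Process = <process name>
SIGNAL_RE = re.compile(
    r"^(UNIX-EXT-SIGNAL: .+?, Process = .+?)\s*$", re.MULTILINE
)


def crash_signatures(crashinfo_text):
    """Return unique UNIX-EXT-SIGNAL lines in order of first appearance."""
    seen = []
    for match in SIGNAL_RE.finditer(crashinfo_text):
        line = match.group(1)
        if line not in seen:
            seen.append(line)
    return seen


def quoted_query(signature):
    """Wrap a signature in double quotes for an exact-phrase search."""
    return f'"{signature}"'


# Fragment of the crashinfo file posted at the top of the thread.
sample = """\
Exception to Fastpath Thread:
Frame pointer 0xFF8A4B6DA0, PC = 0xFFDC39A5FC
IOS Thread backtrace:
UNIX-EXT-SIGNAL: User defined signal 2(17), Process = Spanning Tree
"""

for signature in crash_signatures(sample):
    print(quoted_query(signature))
```

The traceback addresses are deliberately left out of the query, since the signal and process name are the parts that matched a bug report in this case.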




Thank you!

balaji.bandi
Hall of Fame

Upgrade to 16.12.X and check. Also remove all crash logs from the device to free up some space.

https://software.cisco.com/download/home/284846032/type/282046477/release/Gibraltar-16.12.12

 

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Thank you, I believe I am hitting bug CSCvw82898. I will be upgrading to a known fixed version.

 

Sure, let us know the outcome.

BB


Leo Laohoo
Hall of Fame

16.9.X with a total uptime of more than 4 years? What are the chances this was caused by FN72323 (Cisco IOS XE Software: QuoVadis Root CA 2 Decommission Might Affect Smart Licensing, Smart Call Home, and Other Functionality)?

Post the complete output of the following commands:

dir crashinfo-1:/tracelogs
dir crashinfo-1:/tracelogs | exclude gz
dir crashinfo-2:/tracelogs
dir crashinfo-2:/tracelogs | exclude gz
sh logging

Hi, there were never any Smart Licensing errors in the logs prior to the crash. The software exception also points to a spanning-tree issue/bug. I will be upgrading to a newer version, most likely 16.12.12, which should also take care of this SSL issue. I appreciate your input.