
9200L Stack Crash with no recovery

CaeCae
Level 1

I have a few 9200L stacks that crash every so often (usually about a week apart, upwards of 3 weeks). The stack goes down and does not recover, and I am unable to console into 3 of the 4 switches in the stack until I power cycle the whole stack. The switch that stays up is ALWAYS the standby switch. Hardware is C9200L-48P-4X. IOS release is 17.03.05 (the same issue was present on 17.03.04b).

I'm still skimming the crash logs, but I do see an instance where it looks like the stack-mgr process crashes. It lists: PROCESS : exit code for stack-mgr was 69. I've posted some outputs below. Thanks!

 

sh log on sw 1 up de

--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 10/18/2021 09:39:10
Total uptime : 0 years 24 weeks 2 days 20 hours 43 minutes
Total downtime : 0 years 31 weeks 1 days 9 hours 23 minutes
Number of resets : 30
Number of slot changes : 0
Current reset reason : Power Failure or Unknown
Current reset timestamp : 11/11/2022 13:46:37
Current slot : 1
Chassis type : 247
Current uptime : 0 years 0 weeks 0 days 2 hours 0 minutes
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
10/18/2021 09:39:10 Power Failure or Unknown 0 0 0 0 0
10/18/2021 09:57:41 Image Install 0 0 0 0 15
10/18/2021 10:02:02 Power Failure or Unknown 0 0 0 0 0
10/18/2021 11:08:06 Power Failure or Unknown 0 0 0 0 0
10/18/2021 12:23:58 Reload Command 0 0 0 0 0
10/18/2021 12:42:35 Image Install 0 0 0 0 15
10/25/2021 06:34:16 Power Failure or Unknown 0 0 0 0 0
10/25/2021 06:40:31 Reload Command 0 0 0 0 0
10/25/2021 06:45:32 Power Failure or Unknown 0 0 0 0 0
05/17/2022 14:44:49 Power Failure or Unknown 0 0 0 0 0
05/17/2022 16:38:59 Image Install 0 0 0 1 15
05/17/2022 18:08:45 Reload Command 0 0 0 1 0
05/17/2022 18:23:49 Reload Command 0 0 0 0 10
05/17/2022 18:37:47 Power Failure or Unknown 0 0 0 0 5
05/25/2022 15:14:17 Power Failure or Unknown 0 0 0 20 0
05/25/2022 16:23:48 Reload Slot Command 0 0 0 1 0
06/28/2022 21:03:03 Power Failure or Unknown 0 4 6 1 10
06/30/2022 19:11:42 Power Failure or Unknown 0 0 1 20 0
08/03/2022 05:11:00 Power Failure or Unknown 0 4 5 9 2
10/02/2022 19:46:57 Power Failure or Unknown 0 8 4 14 5
10/07/2022 01:56:31 EHSA keepalive timeout 0 0 4 6 0
10/07/2022 14:04:42 Power Failure or Unknown 0 0 0 12 0
10/11/2022 17:28:52 EHSA keepalive timeout 0 0 4 3 0
10/11/2022 18:52:36 Power Failure or Unknown 0 0 0 1 0
10/11/2022 19:18:07 active removed before switch beca 0 0 0 0 20
10/15/2022 21:46:30 EHSA keepalive timeout 0 0 4 2 0
10/17/2022 12:19:49 Power Failure or Unknown 0 0 1 14 0
10/19/2022 01:29:28 Image Install 0 0 1 13 0
11/11/2022 00:38:05 EHSA keepalive timeout 0 3 1 23 2
11/11/2022 13:46:37 Power Failure or Unknown 0 0 0 13 0
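For anyone who wants to tally a reset history like the one above, here is a minimal Python sketch. It assumes the rows keep the layout shown (timestamp, free-text reason, then five integers for years, weeks, days, hours, minutes); the names `parse_resets` and `SAMPLE` are hypothetical and the sample rows are copied from the table above. It counts resets per reason and reports how many hours the switch was up before each "EHSA keepalive timeout" reset.

```python
import re
from collections import Counter

# Rows copied from the UPTIME CONTINUOUS INFORMATION table above.
SAMPLE = """\
10/02/2022 19:46:57 Power Failure or Unknown 0 8 4 14 5
10/07/2022 01:56:31 EHSA keepalive timeout 0 0 4 6 0
10/07/2022 14:04:42 Power Failure or Unknown 0 0 0 12 0
10/11/2022 17:28:52 EHSA keepalive timeout 0 0 4 3 0
10/15/2022 21:46:30 EHSA keepalive timeout 0 0 4 2 0
11/11/2022 00:38:05 EHSA keepalive timeout 0 3 1 23 2
11/11/2022 13:46:37 Power Failure or Unknown 0 0 0 13 0
"""

# Timestamp, free-text reason, then years weeks days hours minutes.
ROW = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(.+?)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)

def parse_resets(text):
    """Return a list of (timestamp, reason, uptime_hours) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, separators, and truncated rows
        years, weeks, days, hours, _minutes = (int(x) for x in m.groups()[2:])
        # Assumption: a year is approximated as 52 weeks; minutes are dropped.
        uptime_hours = ((years * 52 + weeks) * 7 + days) * 24 + hours
        rows.append((m.group(1), m.group(2), uptime_hours))
    return rows

resets = parse_resets(SAMPLE)
by_reason = Counter(reason for _, reason, _ in resets)
ehsa_uptimes = [h for _, reason, h in resets if reason == "EHSA keepalive timeout"]

print(by_reason)
print(ehsa_uptimes)
```

On the sample rows this shows four EHSA keepalive timeouts, three of them after roughly four days of uptime and one after about three weeks.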

 

sh pla so statu con bri

Load Average
Slot Status 1-Min 5-Min 15-Min
1-RP0 Healthy 1.27 1.29 1.35
2-RP0 Healthy 0.74 0.84 0.83
3-RP0 Healthy 0.44 0.44 0.49
4-RP0 Healthy 0.35 0.42 0.44
5-RP0 Healthy 0.33 0.54 0.63
6-RP0 Healthy 0.93 0.90 0.83

Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
1-RP0 Healthy 1984308 1313808 (66%) 670500 (34%) 1878608 (95%)
2-RP0 Healthy 1984308 1282816 (65%) 701492 (35%) 1722128 (87%)
3-RP0 Healthy 1984308 826592 (42%) 1157716 (58%) 841292 (42%)
4-RP0 Healthy 1984308 830172 (42%) 1154136 (58%) 832164 (42%)
5-RP0 Healthy 1984308 829388 (42%) 1154920 (58%) 815784 (41%)
6-RP0 Healthy 1984308 829624 (42%) 1154684 (58%) 815096 (41%)

CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
1-RP0 0 18.29 12.61 0.00 66.66 1.89 0.52 0.00
1 17.19 11.78 0.00 69.74 0.74 0.53 0.00
2 17.92 9.65 0.00 71.15 0.74 0.53 0.00
3 18.55 10.27 0.00 69.91 0.62 0.62 0.00
2-RP0 0 11.14 8.12 0.00 79.06 1.25 0.41 0.00
1 11.55 8.50 0.00 78.78 0.73 0.42 0.00
2 13.72 7.73 0.00 77.29 0.72 0.51 0.00
3 10.42 7.19 0.00 81.23 0.62 0.52 0.00
3-RP0 0 8.52 3.35 0.00 87.00 1.01 0.10 0.00
1 7.09 3.39 0.00 88.90 0.51 0.10 0.00
2 6.95 3.53 0.00 88.88 0.51 0.10 0.00
3 8.76 3.36 0.00 87.15 0.50 0.20 0.00
4-RP0 0 8.57 3.71 0.00 86.57 0.92 0.20 0.00
1 7.86 3.67 0.00 87.74 0.51 0.20 0.00
2 7.96 3.20 0.00 88.00 0.62 0.20 0.00
3 7.05 3.57 0.00 88.65 0.51 0.20 0.00
5-RP0 0 9.48 3.67 0.00 85.40 1.12 0.30 0.00
1 9.37 4.38 0.00 85.42 0.50 0.30 0.00
2 11.96 3.44 0.00 83.87 0.60 0.10 0.00
3 10.70 4.28 0.00 84.40 0.50 0.10 0.00
6-RP0 0 12.06 4.22 0.00 82.16 1.34 0.20 0.00
1 9.69 4.58 0.00 84.88 0.62 0.20 0.00
2 11.32 4.63 0.00 83.21 0.61 0.20 0.00
3 13.17 4.80 0.00 81.30 0.51 0.20 0.00

 

3 Accepted Solutions


marce1000
Hall of Fame

 

 - You may try latest advisory release : https://software.cisco.com/download/home/286320060/type/282046477/release/Bengaluru-17.6.4

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '


Numbers look OK.  

Try upgrading to 17.6.4.


CaeCae
Level 1

We just upgraded one of our IDF stacks to 17.6.4, and memory utilization is already better than it was before. Note: this is not the same stack as in the previous outputs in this thread; it is the same environment minus 2 switches. All outputs and crashes were the same/similar as on the other stacks. I will continue to monitor and update this thread with any changes. I'm going to accept this as a solution in the meantime. Thank you both for your help!

sh pla so statu con bri
Load Average
Slot Status 1-Min 5-Min 15-Min
1-RP0 Healthy 0.56 1.15 1.29
2-RP0 Healthy 0.35 0.67 0.78
3-RP0 Healthy 0.20 0.41 0.47
4-RP0 Healthy 0.35 0.61 0.55

Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
1-RP0 Healthy 1984168 1153740 (58%) 830428 (42%) 1866780 (94%)
2-RP0 Healthy 1984168 1078448 (54%) 905720 (46%) 1717740 (87%)
3-RP0 Healthy 1984168 800268 (40%) 1183900 (60%) 840688 (42%)
4-RP0 Healthy 1984168 801976 (40%) 1182192 (60%) 840456 (42%)

CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
1-RP0 0 10.75 6.88 0.00 80.89 1.25 0.20 0.00
1 9.17 7.50 0.00 82.48 0.52 0.31 0.00
2 11.15 5.73 0.00 82.27 0.62 0.20 0.00
3 9.51 5.33 0.00 84.41 0.52 0.20 0.00
2-RP0 0 7.08 3.69 0.00 88.29 0.82 0.10 0.00
1 6.48 2.98 0.00 89.80 0.51 0.20 0.00
2 7.82 2.88 0.00 88.67 0.51 0.10 0.00
3 5.48 3.61 0.00 90.27 0.51 0.10 0.00
3-RP0 0 6.87 1.82 0.00 90.39 0.80 0.10 0.00
1 5.72 3.06 0.00 90.70 0.40 0.10 0.00
2 6.39 2.94 0.00 90.15 0.40 0.10 0.00
3 6.67 3.13 0.00 89.58 0.50 0.10 0.00
4-RP0 0 9.21 3.95 0.00 85.61 1.11 0.10 0.00
1 7.88 3.84 0.00 87.76 0.40 0.10 0.00
2 7.47 3.68 0.00 88.21 0.51 0.10 0.00
3 9.77 3.86 0.00 85.84 0.40 0.10 0.00


23 Replies

marce1000
Hall of Fame

 

 - You may try latest advisory release : https://software.cisco.com/download/home/286320060/type/282046477/release/Bengaluru-17.6.4

 M.




We currently have a TAC case open, which is why we upgraded from 17.03.04b. I will run your response by them before proceeding.

Thanks!

Leo Laohoo
Hall of Fame

@CaeCae wrote:
11/11/2022 00:38:05 EHSA keepalive timeout 0 3 1 23 2
11/11/2022 13:46:37 Power Failure or Unknown 0 0 0 13 0

Reading this thread makes me "angry" (trying to keep this wholesome and rated "PG").  There are so many "red flags" and if Cisco TAC cannot see ... (remember, keep it "wholesome").  "EHSA keepalive timeout" is red flag #1. 


@CaeCae wrote:
1-RP0 Healthy 1984308 1313808 (66%) 670500 (34%) 1878608 (95%)
2-RP0 Healthy 1984308 1282816 (65%) 701492 (35%) 1722128 (87%)

Memory leak present in switch member 1 and switch member 2.  Memory utilization right after a crash should be <40%, and normal memory utilization should be <50%.  This is already red flag #2.
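As a rough way to apply that rule of thumb to pasted output, here is a minimal Python sketch. It assumes the "Memory (kB)" rows keep the column order shown earlier in the thread (slot, status, total, used, free, committed); the 50% limit is the threshold suggested in this reply, not a documented Cisco limit, and the function name `slots_over` is hypothetical.

```python
import re

# Rows copied from the "Memory (kB)" table posted earlier in the thread.
SAMPLE = """\
1-RP0 Healthy 1984308 1313808 (66%) 670500 (34%) 1878608 (95%)
2-RP0 Healthy 1984308 1282816 (65%) 701492 (35%) 1722128 (87%)
3-RP0 Healthy 1984308 826592 (42%) 1157716 (58%) 841292 (42%)
4-RP0 Healthy 1984308 830172 (42%) 1154136 (58%) 832164 (42%)
"""

# slot, status, total, used (pct), free (pct), committed (pct)
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+\((\d+)%\)"
    r"\s+(\d+)\s+\((\d+)%\)\s+(\d+)\s+\((\d+)%\)$"
)

def slots_over(text, used_pct_limit=50):
    """Return (slot, used_pct) for every slot whose used memory exceeds the limit."""
    flagged = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # not a memory row
        total_kb, used_kb = int(m.group(3)), int(m.group(4))
        used_pct = 100 * used_kb / total_kb
        if used_pct > used_pct_limit:
            flagged.append((m.group(1), round(used_pct, 1)))
    return flagged

for slot, pct in slots_over(SAMPLE):
    print(f"{slot}: {pct}% used")
```

With the rows above, only slots 1-RP0 and 2-RP0 are flagged, which matches the two members called out in this reply.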

Post the complete output to the following commands: 

  1. dir flash-1:
  2. dir flash-1:crashinfo
  3. dir flash-1:tracelogs | ex *bin*.*
  4. dir flash-2:
  5. dir flash-2:crashinfo
  6. dir flash-2:tracelogs | ex *bin*.*

@CaeCae wrote:
IOS release is 17.03.05 (Same issue was present on 17.03.04b).

Be prepared to upgrade to 17.6.4.  


@CaeCae wrote:
Skimming the crash logs still, but I do see an instance where it looks like the stack-mgr process crashes.

There are many IOS-XE bugs where a switch, router, or WLC on IOS-XE would crash and not leave any useful crashinfo file(s).  It will only say the platform rebooted due to a "power issue" or "EHSA keepalive timeout".  Finding the cause of the crash is going to be a challenge.  

Next, each IOS-XE version has different "standards" for which crashinfo file(s) it generates, and they are kept in different parts of the flash.  Sometimes they are kept in the main flash directory, sometimes in the crashinfo sub-directory.  Recently, we found some in the tracelogs subdirectory, buried in between junk BIN or GZ files.  

Thank you for your response. Here are the outputs requested

dir flash-1:
Directory of flash:/

73128 drwx 4096 Nov 12 2022 01:30:55 +00:00 sdavc
40505 -rw- 109509 Nov 11 2022 21:15:22 +00:00 collated_log_20221111-211521
72868 drwx 4096 Nov 11 2022 21:09:07 +00:00 tech_support
80961 drwx 4096 Nov 11 2022 13:52:20 +00:00 .installer
40495 -rw- 1336 Nov 11 2022 13:51:57 +00:00 stby-vlan.dat
40492 -rw- 2097152 Nov 11 2022 13:49:20 +00:00 nvram_config_bkup
40491 -rw- 2097152 Nov 11 2022 13:49:20 +00:00 nvram_config
72904 drwx 4096 Nov 11 2022 13:49:04 +00:00 license_evlog
40506 -rw- 1477802 Nov 11 2022 13:49:00 +00:00 AvcWeb.tar.zip
40502 -rw- 1336 Nov 11 2022 13:48:56 +00:00 vlan.dat
40488 -rw- 15500 Nov 11 2022 13:47:54 +00:00 rdope_out.txt
40630 -rw- 0 Nov 11 2022 13:47:54 +00:00 dope_hist
40489 -rw- 89 Nov 11 2022 13:47:50 +00:00 rdope.log
40486 -rw- 134458 Nov 11 2022 13:45:53 +00:00 memleak.tcl
40482 -rw- 2149 Nov 11 2022 13:45:38 +00:00 boothelper.log
72891 drwx 4096 Nov 11 2022 13:45:36 +00:00 dc_profile_dir
40483 -rw- 999 Nov 11 2022 13:45:17 +00:00 bootloader_evt_handle.log
97153 drwx 4096 Nov 11 2022 00:38:05 +00:00 .prst_sync
40485 -rw- 2149 Nov 11 2022 00:37:06 +00:00 boothelper.log.old
89057 drwx 4096 Oct 19 2022 01:24:56 +00:00 .rollback_timer
40509 -rw- 4920 Oct 19 2022 01:23:09 +00:00 packages.conf
73144 -rw- 4920 Oct 19 2022 01:15:54 +00:00 cat9k_lite_iosxe.17.03.05.SPA.conf
73148 -rw- 40713581 Oct 19 2022 01:15:54 +00:00 cat9k_lite-rpboot.17.03.05.SPA.pkg
73147 -rw- 11043860 Oct 19 2022 01:14:31 +00:00 cat9k_lite-webui.17.03.05.SPA.pkg
73146 -rw- 4367384 Oct 19 2022 01:14:31 +00:00 cat9k_lite-srdriver.17.03.05.SPA.pkg
73145 -rw- 427349016 Oct 19 2022 01:14:29 +00:00 cat9k_lite-rpbase.17.03.05.SPA.pkg
40494 -rw- 483131849 Oct 19 2022 01:05:16 +00:00 cat9k_lite_iosxe.17.03.05.SPA.bin
73134 drwx 4096 Oct 11 2022 19:13:11 +00:00 .rommon_sync
72935 drwx 4096 Sep 27 2022 19:26:09 +00:00 nbar2
40504 -rw- 2672640 Sep 27 2022 19:26:04 +00:00 nbar2_http_default.tar
72901 drwx 4096 May 17 2022 18:10:35 +00:00 pnp-info
40493 -rw- 12472 May 17 2022 16:41:34 +00:00 pnp-archive-May-17-16-41-33.729-0
72907 drwx 4096 May 17 2022 16:33:55 +00:00 pnp-tech
72865 drwx 4096 May 17 2022 16:31:29 +00:00 core
40501 -rw- 40716729 May 17 2022 16:25:54 +00:00 cat9k_lite-rpboot.17.03.04b.SPA.pkg
40500 -rw- 11035672 May 17 2022 16:24:34 +00:00 cat9k_lite-webui.17.03.04b.SPA.pkg
40499 -rw- 4256792 May 17 2022 16:24:34 +00:00 cat9k_lite-srdriver.17.03.04b.SPA.pkg
40498 -rw- 427189272 May 17 2022 16:24:34 +00:00 cat9k_lite-rpbase.17.03.04b.SPA.pkg
40490 -rw- 2318 May 17 2022 15:03:11 +00:00 dir_copy
72902 drwx 4096 Oct 18 2021 12:24:47 +00:00 onep
72894 drwx 4096 Oct 18 2021 12:23:14 +00:00 Tbot
72893 drwx 4096 Oct 18 2021 12:23:08 +00:00 sys_report
72867 drwx 4096 Oct 18 2021 12:23:06 +00:00 ss_disc
40484 -rw- 5242880 Oct 18 2021 12:23:06 +00:00 ssd
80966 drwx 4096 Oct 18 2021 09:43:03 +00:00 .dbpersist
40497 drwx 4096 Feb 25 2019 13:13:15 +00:00 avcui

1956839424 bytes total (376373248 bytes free)

dir crashinfo-1:
Directory of crashinfo:/

36641 drwx 32768 Nov 12 2022 01:33:15 +00:00 tracelogs
33 -rw- 1897327 Nov 11 2022 02:42:53 +00:00 RP_0_trace_archive_3-20221111-024251.tar.gz
32 -rw- 1790532 Nov 11 2022 02:10:51 +00:00 RP_0_trace_archive_2-20221111-021050.tar.gz
31 -rw- 1711647 Nov 11 2022 01:38:50 +00:00 RP_0_trace_archive_1-20221111-013848.tar.gz
30 -rw- 1657620 Nov 11 2022 01:06:49 +00:00 RP_0_trace_archive_0-20221111-010647.tar.gz
29 -rw- 15243748 Nov 11 2022 00:35:16 +00:00 SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221111-003457-UTC.tar.gz
28 -rw- 1930287 Oct 16 2022 00:00:10 +00:00 RP_0_trace_archive_3-20221016-000008.tar.gz
27 -rw- 1819059 Oct 15 2022 23:25:56 +00:00 RP_0_trace_archive_2-20221015-232554.tar.gz
26 -rw- 1710278 Oct 15 2022 22:51:42 +00:00 RP_0_trace_archive_1-20221015-225140.tar.gz
25 -rw- 1594418 Oct 15 2022 22:17:29 +00:00 RP_0_trace_archive_0-20221015-221727.tar.gz
24 -rw- 15117732 Oct 15 2022 21:43:43 +00:00 SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221015-214327-UTC.tar.gz
23 -rw- 1869519 Oct 11 2022 19:13:14 +00:00 RP_0_trace_archive_2-20221011-191310.tar.gz
22 -rw- 1891191 Oct 11 2022 19:13:11 +00:00 RP_0_trace_archive_1-20221011-191309.tar.gz
21 -rw- 2103580 Oct 11 2022 19:08:17 +00:00 RP_0_trace_archive_0-20221011-190815.tar.gz
20 -rw- 1707221 Oct 11 2022 18:34:04 +00:00 RP_0_trace_archive_1-20221011-183402.tar.gz
19 -rw- 1589496 Oct 11 2022 17:59:51 +00:00 RP_0_trace_archive_0-20221011-175949.tar.gz
18 -rw- 14931455 Oct 11 2022 17:26:05 +00:00 SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221011-172550-UTC.tar.gz
17 -rw- 1936359 Oct 7 2022 04:10:11 +00:00 RP_0_trace_archive_3-20221007-041009.tar.gz
16 -rw- 1828602 Oct 7 2022 03:35:58 +00:00 RP_0_trace_archive_2-20221007-033556.tar.gz
15 -rw- 1717225 Oct 7 2022 03:01:45 +00:00 RP_0_trace_archive_1-20221007-030143.tar.gz
14 -rw- 1603181 Oct 7 2022 02:27:32 +00:00 RP_0_trace_archive_0-20221007-022730.tar.gz
13 -rw- 14928891 Oct 7 2022 01:53:44 +00:00 SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221007-015329-UTC.tar.gz
12 -rw- 2522138 May 25 2022 15:47:39 +00:00 SANBROCK_ES-9200-MDF_1_RP_0_trace_archive_0-20220525-154735.tar.gz
11 -rw- 0 Apr 30 2021 03:53:24 +00:00 koops.dat

825638912 bytes total (679247872 bytes free)

dir crashinfo-1:tracelogs | ex .bin
Directory of crashinfo:/tracelogs/

37108 -rw- 51574 Nov 11 2022 00:35:23 +00:00 shutdown_rp0.log
36642 -rw- 210490 Nov 11 2022 00:35:23 +00:00 shutdown_journal_rp0.log
37092 -rw- 383 Nov 11 2022 00:34:50 +00:00 shutdown_fp0.log
37091 -rw- 383 Nov 11 2022 00:34:50 +00:00 shutdown_cc0.log
37148 -rw- 391 Oct 19 2022 01:33:46 +00:00 cia-confderr.20221111003451.log.1.gz
37149 -rw- 65 Oct 19 2022 01:32:07 +00:00 cia-confderr.20221111003451.log.idx.gz
37150 -rw- 63 Oct 19 2022 01:32:07 +00:00 cia-confderr.20221111003451.log.siz.gz
36991 -rw- 417 Oct 17 2022 12:24:16 +00:00 cia-confderr.20221019012643.log.1.gz
36999 -rw- 63 Oct 17 2022 12:22:22 +00:00 cia-confderr.20221019012643.log.siz.gz
36993 -rw- 65 Oct 17 2022 12:22:22 +00:00 cia-confderr.20221019012643.log.idx.gz
37227 -rw- 416 Oct 11 2022 19:22:29 +00:00 cia-confderr.20221015214320.log.1.gz
37229 -rw- 63 Oct 11 2022 19:20:38 +00:00 cia-confderr.20221015214320.log.siz.gz
37228 -rw- 65 Oct 11 2022 19:20:38 +00:00 cia-confderr.20221015214320.log.idx.gz
36661 -rw- 10 Sep 14 2022 14:47:37 +00:00 timestamp
36657 -rw- 20126 Sep 14 2022 14:47:37 +00:00 dmesg

825638912 bytes total (678690816 bytes free)

Posting in 2 parts since it's so much info. Also, those exact commands didn't exist for me, so if I'm missing some, please let me know. I'm still new to this (I only got my CCNA about 4 or 5 months ago).

dir flash-2:
Directory of flash-2:/

80966 drwx 4096 Nov 12 2022 01:44:40 +00:00 sdavc
81198 -rw- 0 Nov 11 2022 18:35:10 +00:00 dope_hist
11 drwx 4096 Nov 11 2022 13:56:39 +00:00 .installer
80968 -rw- 1336 Nov 11 2022 13:53:58 +00:00 vlan.dat
16217 -rw- 2097152 Nov 11 2022 13:53:57 +00:00 nvram_config_bkup
16216 -rw- 2097152 Nov 11 2022 13:53:57 +00:00 nvram_config
72865 drwx 4096 Nov 11 2022 13:53:38 +00:00 license_evlog
80986 -rw- 1477802 Nov 11 2022 13:48:58 +00:00 AvcWeb.tar.zip
16212 -rw- 15500 Nov 11 2022 13:47:53 +00:00 rdope_out.txt
16213 -rw- 89 Nov 11 2022 13:47:50 +00:00 rdope.log
16201 -rw- 134458 Nov 11 2022 13:45:52 +00:00 memleak.tcl
80963 -rw- 2149 Nov 11 2022 13:45:37 +00:00 boothelper.log
32408 drwx 4096 Nov 11 2022 13:45:35 +00:00 dc_profile_dir
16193 -rw- 888 Nov 11 2022 13:45:16 +00:00 bootloader_evt_handle.log
16194 drwx 4096 Oct 19 2022 01:34:53 +00:00 .prst_sync
80962 -rw- 2149 Oct 19 2022 01:28:30 +00:00 boothelper.log.old
8097 drwx 4096 Oct 19 2022 01:24:58 +00:00 .rollback_timer
81182 -rw- 4920 Oct 19 2022 01:23:07 +00:00 packages.conf
16220 -rw- 4920 Oct 19 2022 01:15:39 +00:00 cat9k_lite_iosxe.17.03.05.SPA.conf
16224 -rw- 40713581 Oct 19 2022 01:15:39 +00:00 cat9k_lite-rpboot.17.03.05.SPA.pkg
16223 -rw- 11043860 Oct 19 2022 01:14:18 +00:00 cat9k_lite-webui.17.03.05.SPA.pkg
16222 -rw- 4367384 Oct 19 2022 01:14:17 +00:00 cat9k_lite-srdriver.17.03.05.SPA.pkg
16221 -rw- 427349016 Oct 19 2022 01:14:17 +00:00 cat9k_lite-rpbase.17.03.05.SPA.pkg
80969 -rw- 483131849 Oct 19 2022 01:05:16 +00:00 cat9k_lite_iosxe.17.03.05.SPA.bin
81177 -rw- 2 Oct 13 2022 16:49:39 +00:00 collated_log_20221013-164938
80978 drwx 4096 Sep 27 2022 19:26:13 +00:00 nbar2
80977 -rw- 2672640 Sep 27 2022 19:26:09 +00:00 nbar2_http_default.tar
80967 drwx 4096 May 25 2022 16:20:56 +00:00 .rommon_sync
113347 drwx 4096 May 17 2022 18:10:26 +00:00 pnp-info
80965 -rw- 12472 May 17 2022 16:41:12 +00:00 pnp-archive-May-17-16-41-11.162-0
18 drwx 4096 May 17 2022 16:33:31 +00:00 pnp-tech
14 drwx 4096 May 17 2022 16:31:03 +00:00 core
80973 -rw- 40716729 May 17 2022 16:25:28 +00:00 cat9k_lite-rpboot.17.03.04b.SPA.pkg
80972 -rw- 11035672 May 17 2022 16:24:05 +00:00 cat9k_lite-webui.17.03.04b.SPA.pkg
80971 -rw- 4256792 May 17 2022 16:24:04 +00:00 cat9k_lite-srdriver.17.03.04b.SPA.pkg
80970 -rw- 427189272 May 17 2022 16:24:04 +00:00 cat9k_lite-rpbase.17.03.04b.SPA.pkg
80964 -rw- 2318 May 17 2022 15:02:45 +00:00 dir_copy
16 drwx 4096 Oct 18 2021 07:53:59 +00:00 .dbpersist
16215 drwx 4096 Oct 18 2021 07:50:56 +00:00 onep
16200 drwx 4096 Oct 18 2021 07:49:31 +00:00 .USWAP
16202 drwx 4096 Oct 18 2021 07:49:22 +00:00 Tbot
48577 drwx 4096 Oct 18 2021 07:49:21 +00:00 .CRFT
32410 drwx 4096 Oct 18 2021 07:49:16 +00:00 sys_report
32385 drwx 4096 Oct 18 2021 07:49:15 +00:00 tech_support
16198 drwx 4096 Oct 18 2021 07:49:15 +00:00 ss_disc
16197 -rw- 5242880 Oct 18 2021 07:49:15 +00:00 ssd
80975 drwx 4096 Feb 25 2019 13:13:15 +00:00 avcui

1956904960 bytes total (376438784 bytes free)

dir crashinfo-2:
Directory of crashinfo-2:/

7329 drwx 65536 Nov 12 2022 01:45:32 +00:00 tracelogs
31 -rw- 4667948 Nov 11 2022 02:42:57 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_4-20221111-024252.tar.gz
30 -rw- 4708868 Nov 11 2022 02:10:55 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_3-20221111-021050.tar.gz
29 -rw- 4494282 Nov 11 2022 01:38:52 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_2-20221111-013848.tar.gz
28 -rw- 4374005 Nov 11 2022 01:06:53 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_1-20221111-010648.tar.gz
27 -rw- 4136990 Nov 11 2022 00:34:53 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_0-20221111-003446.tar.gz
26 -rw- 4686888 Oct 16 2022 00:00:14 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_4-20221016-000010.tar.gz
25 -rw- 4638288 Oct 15 2022 23:26:00 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_3-20221015-232556.tar.gz
24 -rw- 4593034 Oct 15 2022 22:51:46 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_2-20221015-225142.tar.gz
23 -rw- 4476998 Oct 15 2022 22:17:33 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_1-20221015-221729.tar.gz
22 -rw- 4124487 Oct 15 2022 21:43:23 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_0-20221015-214316.tar.gz
21 -rw- 6124533 Oct 11 2022 19:13:16 +00:00 RP_0_trace_archive_1-20221011-191310.tar.gz
20 -rw- 6318533 Oct 11 2022 19:08:20 +00:00 RP_0_trace_archive_0-20221011-190816.tar.gz
19 -rw- 6059056 Oct 11 2022 18:34:06 +00:00 RP_0_trace_archive_0-20221011-183402.tar.gz
18 -rw- 5010573 Oct 7 2022 14:01:23 +00:00 SANBROCK_ES-9200-MDF_trace_archive_5-20221007-140118.tar.gz
17 -rw- 4631617 Oct 7 2022 04:10:15 +00:00 SANBROCK_ES-9200-MDF_trace_archive_4-20221007-041011.tar.gz
16 -rw- 4673297 Oct 7 2022 03:36:01 +00:00 SANBROCK_ES-9200-MDF_trace_archive_3-20221007-033557.tar.gz
15 -rw- 4497549 Oct 7 2022 03:01:48 +00:00 SANBROCK_ES-9200-MDF_trace_archive_2-20221007-030144.tar.gz
14 -rw- 4263595 Oct 7 2022 02:27:35 +00:00 SANBROCK_ES-9200-MDF_trace_archive_1-20221007-022731.tar.gz
13 -rw- 3788061 Oct 7 2022 01:53:23 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_0-20221007-015318.tar.gz
11 -rw- 2041374 May 25 2022 15:47:35 +00:00 SANBROCK_ES-9200-MDF_2_RP_0_trace_archive_0-20220525-154732.tar.gz
12 -rw- 0 Apr 30 2021 03:53:24 +00:00 koops.dat

825753600 bytes total (675020800 bytes free)

dir crashinfo-2:tracelogs | ex .bin
Directory of crashinfo-2:/tracelogs/

7365 -rw- 660 Oct 19 2022 01:26:48 +00:00 shutdown_rp0.log
7339 -rw- 282398 Oct 19 2022 01:26:47 +00:00 shutdown_journal_rp0.log
7442 -rw- 383 Oct 19 2022 01:26:43 +00:00 shutdown_fp0.log
7458 -rw- 383 Oct 19 2022 01:26:41 +00:00 shutdown_cc0.log
8128 -rw- 62 Oct 17 2022 12:27:20 +00:00 cia-confderr.20221019012642.log.1.gz
8130 -rw- 63 Oct 17 2022 12:27:20 +00:00 cia-confderr.20221019012642.log.siz.gz
8129 -rw- 65 Oct 17 2022 12:27:20 +00:00 cia-confderr.20221019012642.log.idx.gz
7722 -rw- 62 Oct 7 2022 14:12:19 +00:00 cia-confderr.20221007141240.log.1.gz
7724 -rw- 63 Oct 7 2022 14:12:19 +00:00 cia-confderr.20221007141240.log.siz.gz
7723 -rw- 65 Oct 7 2022 14:12:19 +00:00 cia-confderr.20221007141240.log.idx.gz
7344 -rw- 20468 Sep 14 2022 14:47:33 +00:00 dmesg
7345 -rw- 10 Sep 14 2022 14:47:33 +00:00 timestamp

825753600 bytes total (675282944 bytes free)
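Since crash artifacts can end up in different directories, one way to triage long listings like the ones above is to filter them by file name. The following Python sketch is only an illustration: it assumes the `dir` line layout seen in this thread (index, permissions, size, date, time, UTC offset, name), and the keyword list is an assumption based on the two file types requested in this thread (the system report and the shutdown journal). The names `find_artifacts` and `KEYWORDS` are hypothetical.

```python
import re

# Lines copied from the crashinfo listings above.
SAMPLE = """\
36641 drwx 32768 Nov 12 2022 01:33:15 +00:00 tracelogs
30 -rw- 1657620 Nov 11 2022 01:06:49 +00:00 RP_0_trace_archive_0-20221111-010647.tar.gz
29 -rw- 15243748 Nov 11 2022 00:35:16 +00:00 SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221111-003457-UTC.tar.gz
11 -rw- 0 Apr 30 2021 03:53:24 +00:00 koops.dat
36642 -rw- 210490 Nov 11 2022 00:35:23 +00:00 shutdown_journal_rp0.log
"""

# index, permissions, size, month, day, year, time, UTC offset, name
ENTRY = re.compile(
    r"^(\d+)\s+([d-][rwx-]{3})\s+(\d+)\s+(\w{3})\s+(\d{1,2})\s+(\d{4})"
    r"\s+(\d{2}:\d{2}:\d{2})\s+([+-]\d{2}:\d{2})\s+(\S+)$"
)

# Assumption: these substrings mark the files worth pulling first.
KEYWORDS = ("system-report", "shutdown_journal")

def find_artifacts(listing, keywords=KEYWORDS):
    """Return (name, size_bytes) for regular files whose name contains a keyword."""
    hits = []
    for line in listing.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # header, blank line, or "bytes total" summary
        perms, size, name = m.group(2), int(m.group(3)), m.group(9)
        if perms.startswith("d"):
            continue  # skip directories
        if any(k in name for k in keywords):
            hits.append((name, size))
    return hits

for name, size in find_artifacts(SAMPLE):
    print(f"{size:>10}  {name}")
```

The same function can be run over each member's listing in turn, which makes it easier to confirm that every crash date has a matching system report.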


@CaeCae wrote:
36642 -rw- 210490 Nov 11 2022 00:35:23 +00:00 shutdown_journal_rp0.log


Please post this file.  This is found in flash-1:tracelogs. 

Here is the requested file


@CaeCae wrote:
Nov 11 00:34:45 SANBROCK_ES-9200-MDF_1_RP_0 stack_mgr[8041]: %STACKMGR-1-RELOAD: Reloading due to reason EHSA keepalive timeout
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: deregister interrupt 
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: devinfo null
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Doppler found: slotid: 0,dplrid: 0
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 22 indx 0 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 23 indx 1 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 21 indx 2 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 20 indx 3 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 24 indx 4 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: Irq 25 indx 5 is cleaned up
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 ncsshd[13997]: Received signal 15; terminating.
Nov 11 00:34:48 SANBROCK_ES-9200-MDF_1_RP_0 pvp[7668]: %PMAN-5-EXITACTION: Process manager is exiting: reload fp action requested
Nov 11 00:34:48 SANBROCK_ES-9200-MDF_1_RP_0 pvp[7969]: %PMAN-5-EXITACTION: Process manager is exiting: reload cc action requested
Nov 11 00:34:50 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[8256]: %SERVICES-2-NORESOLVE_LOCAL: Error resolving local FRU: Invalid argument
Nov 11 00:34:50 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[8289]: %SERVICES-2-NORESOLVE_LOCAL: Error resolving local FRU: Invalid argument
Nov 11 00:34:50 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[8289]: %SERVICES-3-INVALID_CHASFS: Thread 0xf664f010 has no global chasfs context
Nov 11 00:34:50 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[8256]: %SERVICES-3-INVALID_CHASFS: Thread 0xf6800010 has no global chasfs context
Nov 11 00:34:55 SANBROCK_ES-9200-MDF_1_RP_0 kernel: LSMPI: Deregister dual stack diverter
Nov 11 00:34:56 SANBROCK_ES-9200-MDF_1_RP_0 pvp[9037]: %PMAN-5-EXITACTION: Process manager is exiting: rp processes exit with reload switch code
Nov 11 00:34:56 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[9115]: %SERVICES-2-NORESOLVE_LOCAL: Error resolving local FRU: Invalid argument
Nov 11 00:34:56 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[9115]: %SERVICES-3-INVALID_CHASFS: Thread 0xf6242010 has no global chasfs context

Very "generic" output, however, it did say the "stack_mgr" process crashed and took down the switch.  
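To pick the important lines out of a long shutdown journal, a small Python sketch like the one below can help. It assumes the line layout seen in the quoted excerpt (timestamp, host, `process[pid]:`, then a `%FACILITY-SEVERITY-MNEMONIC:` message) and relies on the usual syslog convention that a lower severity number is more serious. The name `severe_messages` and the cutoff of 2 are illustrative assumptions.

```python
import re

# Lines copied from the shutdown_journal_rp0.log excerpt quoted above.
SAMPLE = """\
Nov 11 00:34:45 SANBROCK_ES-9200-MDF_1_RP_0 stack_mgr[8041]: %STACKMGR-1-RELOAD: Reloading due to reason EHSA keepalive timeout
Nov 11 00:34:47 SANBROCK_ES-9200-MDF_1_RP_0 kernel: dplr_intrpt: devinfo null
Nov 11 00:34:48 SANBROCK_ES-9200-MDF_1_RP_0 pvp[7668]: %PMAN-5-EXITACTION: Process manager is exiting: reload fp action requested
Nov 11 00:34:50 SANBROCK_ES-9200-MDF_1_RP_0 btman_rotate_immediate[8256]: %SERVICES-2-NORESOLVE_LOCAL: Error resolving local FRU: Invalid argument
"""

LINE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"%(?P<fac>[A-Z0-9_]+)-(?P<sev>\d)-(?P<mnem>[A-Z0-9_]+): (?P<text>.*)$"
)

def severe_messages(journal, max_sev=2):
    """Return dicts for messages whose severity number is <= max_sev."""
    found = []
    for line in journal.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # e.g. kernel lines without a %FAC-SEV-MNEMONIC tag
        if int(m.group("sev")) <= max_sev:
            found.append(m.groupdict())
    return found

for msg in severe_messages(SAMPLE):
    print(msg["ts"], msg["proc"], f'%{msg["fac"]}-{msg["sev"]}-{msg["mnem"]}', msg["text"])
```

On the excerpt this surfaces the `%STACKMGR-1-RELOAD` line first, which is the one that names the reload reason.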


@CaeCae wrote:
SANBROCK_ES-9200-MDF_1_RP_0-system-report_1_20221111-003457-UTC.tar.gz

Can we see this file?  It should be located in flash-1:crashinfo sub-directory and should have a file size of 14887 KB.  

Here is the requested file. I had to extract it down to just a .tar since it wouldn't let me upload it as a .gz

edit - I have to figure out why it won't let me upload. please hold

edit # 2 - file is now properly uploaded as a .zip

 

 - Connect to the 9200L with : https://cway.cisco.com/cli , at the top right press (or run) 'Crashdump Analyzer'

 M.




Thank you for your reply!

I will have to get setup on Monday to be able to use this program, marce1000, as I don't have the required access permissions from Cisco.

Thanks for the output.  Unfortunately, the crashinfo file did not provide any additional information that is not present in the shutdown_journal_rp0.log file.

From what I can discern, the stack-mgr caused a kernel-level "panic" and caused the Stack Manager to reboot.  Since Switch 1 and Switch 2 will always be "Stack Manager", this crash will always crash Switch 1 and Switch 2 (round robin). 


@CaeCae wrote:
Reloading due to reason EHSA keepalive timeout

Don't be misled by that line.  The "reason" of "EHSA keepalive timeout" is very generic.  There are more than a dozen "unknown" causes of crashes which will be blamed on "EHSA keepalive timeout".  

Can I see the output to the following command:  

sh platform soft mount switch active r0 | i ^tmpfs.*tmp | e /tmp/
sh platform soft mount switch st r0 | i ^tmpfs.*tmp | e /tmp/

 

I couldn't get any output with the exclude pipe, but here it is with the include pipe.

Edit - forgot to make it tmpfs.tmp. That pipe does not display any results. Leaving the original output just in case it's helpful. Thanks!

 

sh pla so mou sw act r0 | i tmpfs
devtmpfs 0 911900 0% /dev
tmpfs 28716 963436 3% /dev/shm
tmpfs 8616 983536 1% /run
tmpfs 0 992152 0% /sys/fs/cgroup
tmpfs 308 991844 1% /var
tmpfs 0 0
tmpfs 24 5096 1% /var/log/audit
tmpfs 8616 983536 1% /run/netns
tmpfs 0 1587448 0% /tmp/cc/tdldb

sh pla so mou sw st r0 | i tmpfs
devtmpfs 0 911900 0% /dev
tmpfs 22252 969900 3% /dev/shm
tmpfs 8620 983532 1% /run
tmpfs 0 992152 0% /sys/fs/cgroup
tmpfs 0 0
tmpfs 272 991880 1% /var
tmpfs 24 5096 1% /var/log/audit
tmpfs 8620 983532 1% /run/netns
tmpfs 0 1587448 0% /tmp/cc/tdldb
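To see why the requested filter returned nothing, the two pipes can be emulated on the pasted output. This is a sketch under assumptions: it treats `| i` and `| e` as per-line regex include and exclude, uses Python's `re` as a stand-in for the IOS regex engine (fine for patterns this simple), and works on the text as pasted here, which may differ in spacing or columns from what the switch actually prints. The helper names `include` and `exclude` are hypothetical.

```python
import re

# Output of "sh pla so mou sw act r0 | i tmpfs" as pasted above.
OUTPUT = """\
devtmpfs 0 911900 0% /dev
tmpfs 28716 963436 3% /dev/shm
tmpfs 8616 983536 1% /run
tmpfs 0 992152 0% /sys/fs/cgroup
tmpfs 308 991844 1% /var
tmpfs 0 0
tmpfs 24 5096 1% /var/log/audit
tmpfs 8616 983536 1% /run/netns
tmpfs 0 1587448 0% /tmp/cc/tdldb
"""

def include(lines, pattern):
    """Keep lines where the pattern matches anywhere (like '| i')."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

def exclude(lines, pattern):
    """Drop lines where the pattern matches anywhere (like '| e')."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]

lines = OUTPUT.splitlines()
step1 = include(lines, r"^tmpfs.*tmp")  # starts with tmpfs and has "tmp" later on
step2 = exclude(step1, r"/tmp/")        # then drop anything under /tmp/

print(step1)
print(step2)
```

Only the `/tmp/cc/tdldb` line passes the include step, and the exclude step then removes it, so an empty result is consistent with the output shown here.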

Thanks.  

Can you arrange for a maintenance window and COLD REBOOT (remove the power cables) of the entire stack?