04-29-2020 01:27 AM
I am having issues across the estate where Cisco C9300-48UXM switches, especially in stacks, reboot at random, usually a few weeks apart but at any time, and nothing in the logs shows exactly why.
Apr 28 09:43:55 Cab-NLabsAC_2_RP_0 stack_mgr[13890]: %STACKMGR-1-RELOAD: Reloading due to reason Configuration mismatch
We thought this was due to an IOS bug that was making the switches unstable (i.e. thresholds set too low, the switch constantly triggering warning/critical, a small memory leak, and eventually a switch crash), as we found a known software defect, CSCvn79101.
But the behaviour has been the same across three different IOS versions and two gold-standard builds, the latest being 16.9.5.
Output of the 'sh platform resources' command from one of the switches is below.
#sh platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource Usage Max Warning Critical State
----------------------------------------------------------------------------------------------------
Control Processor 1.70% 100% 5% 10% H
DRAM 2528MB(33%) 7583MB 90% 95% H
However, when I look through the Cisco docs, the example output shows the Warning threshold as 90%:
Switch# show platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource Usage Max Warning Critical State
----------------------------------------------------------------------------------------------------
Control Processor 7.20% 100% 90% 95% H
DRAM 2701MB(69%) 3883MB 90% 95% H
So I really want to find out whether this is a Cisco IOS issue or whether we have a QoS settings issue.
Many thanks
Phil
04-29-2020 01:32 AM
Post the complete output to the following command:
dir crashinfo-<CRASHED SWITCH MEMBER>:
04-29-2020 01:39 AM
Morning Leo
Thanks for your response. Switches 2 and 4 of the 5 in one stack did this yesterday.
Please see the output below.
Thanks
Phil
Cab-NLabsAC#dir crashinfo-2:
Directory of crashinfo-2:/
31553 drwx 36864 Apr 29 2020 09:38:23 +01:00 tracelogs
11 -rw- 0 Jun 20 2018 10:25:07 +01:00 koops.dat
12 -rw- 410460 Aug 24 2018 23:27:10 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_0-20180824-222703.tar.gz
13 -rw- 403904 Aug 24 2018 23:34:04 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_1-20180824-223358.tar.gz
14 -rw- 572086 Aug 25 2018 01:48:41 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_4-20180825-004833.tar.gz
15 -rw- 590746 Aug 25 2018 01:52:05 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_5-20180825-005158.tar.gz
16 -rw- 809989 Aug 25 2018 02:07:36 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_0-20180825-010729.tar.gz
17 -rw- 1164655 Aug 25 2018 02:25:49 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_1-20180825-012542.tar.gz
18 -rw- 1574874 Aug 25 2018 02:48:58 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_0-20180825-014851.tar.gz
19 -rw- 1647484 Aug 25 2018 03:02:49 +01:00 RP_0_trace_archive_0-20180825-020242.tar.gz
20 -rw- 1658861 Aug 25 2018 03:03:04 +01:00 RP_0_trace_archive_1-20180825-020257.tar.gz
21 -rw- 1633446 Aug 25 2018 03:27:37 +01:00 Cab-NLabsAC_2_RP_0_trace_archive_0-20180825-022729.tar.gz
22 -rw- 1380610 Apr 28 2020 09:44:00 +01:00 system-report_2_20200428-094359-BST.tar.gz
1651507200 bytes total (1553989632 bytes free)
Cab-NLabsAC#dir crashinfo-4:
Directory of crashinfo-4:/
7889 drwx 36864 Apr 29 2020 09:34:31 +01:00 tracelogs
11 -rw- 0 Jun 20 2018 10:24:15 +01:00 koops.dat
12 -rw- 462786 Aug 25 2018 01:41:18 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_0-20180825-004111.tar.gz
13 -rw- 490520 Aug 25 2018 01:48:39 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_3-20180825-004832.tar.gz
14 -rw- 546696 Aug 25 2018 01:51:03 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_4-20180825-005056.tar.gz
15 -rw- 557237 Aug 25 2018 01:52:05 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_5-20180825-005158.tar.gz
16 -rw- 663325 Aug 25 2018 02:07:37 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_0-20180825-010731.tar.gz
17 -rw- 920736 Aug 25 2018 02:25:50 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_0-20180825-012543.tar.gz
18 -rw- 1212323 Aug 25 2018 02:48:58 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_0-20180825-014851.tar.gz
19 -rw- 1239533 Aug 25 2018 03:27:35 +01:00 Cab-NLabsAC_4_RP_0_trace_archive_0-20180825-022728.tar.gz
1651507200 bytes total (1556086784 bytes free)
Cab-NLabsAC#
04-29-2020 03:08 AM
22 -rw- 1380610 Apr 28 2020 09:44:00 +01:00 system-report_2_20200428-094359-BST.tar.gz
Switch 2 showed signs of crash. Switch 4 didn't.
Is there a way for you to post the above file?
Kindly post the output to this command:
remote command 4 sh log on up detail
The command above will give us a hint as to what caused the reboot of switch 4.
04-29-2020 04:37 AM
Thanks for your reply
Can you please tell me how I would get to the file? I am unable to see it with 'dir' or 'dir flash:', which is where I thought it would be, and I can't see anything for a system report when doing a '?'.
system-report_2_20200428-094359-BST.tar.gz
As for the command below, I am unable to get it to work, and there is nothing at the show or privileged level either. I only get 'redundancy' or 'renew' as options.
remote command 4 sh log on up detail
Sorry about this.
Thanks
Phil
04-29-2020 05:32 AM
If there is a TFTP server, try this:
copy crashinfo-2:system-report_2_20200428-094359-BST.tar.gz tftp://<TFTP IP ADDRESS>/system-report_2_20200428-094359-BST.tar.gz
Can you try this command:
sh log on switch 4 up detail
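Once the system-report .tar.gz has been copied off the switch via TFTP, its contents can be listed on a workstation before sending it anywhere. A minimal sketch (my own helper, not Cisco tooling) using Python's standard tarfile module:

```python
import tarfile

def list_report_members(path):
    """Return the member names inside a system-report .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        return [member.name for member in tar.getmembers()]
```

For example, `list_report_members("system-report_2_20200428-094359-BST.tar.gz")` would show the crash files and tracelogs bundled in the report.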
04-29-2020 07:53 AM
Leo
I will try the tftp one tomorrow, but the show log is below.
Many thanks
Phil
Cab-NLabsAC#sh log on switch 4 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 06/11/2018 13:37:09
Total uptime : 1 years 35 weeks 3 days 14 hours 55 minutes
Total downtime : 0 years 10 weeks 4 days 9 hours 22 minutes
Number of resets : 17
Number of slot changes : 1
Current reset reason : EHSA standby down
Current reset timestamp : 04/28/2020 08:49:59
Current slot : 4
Chassis type : 0
Current uptime : 0 years 0 weeks 1 days 5 hours 5 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
06/11/2018 13:37:09 Reload 0 0 0 0 0
06/11/2018 13:50:08 Reload 0 0 0 0 5
06/20/2018 09:05:51 Reload 0 0 0 0 5
06/20/2018 09:13:57 Reload 0 0 0 0 5
06/20/2018 09:25:31 Reload 0 0 0 0 5
08/24/2018 22:24:39 Reload 0 0 0 0 0
08/24/2018 22:29:13 Reload 0 0 0 0 0
08/24/2018 22:36:05 Reload 0 0 0 0 0
08/25/2018 00:58:14 Reload 0 0 0 2 5
08/25/2018 01:09:59 Reload 0 0 0 0 5
08/25/2018 01:30:51 Reload 0 0 0 0 5
08/25/2018 01:39:13 Reload 0 0 0 0 0
08/25/2018 01:51:19 Reload 0 0 0 0 5
08/25/2018 02:13:53 Reload 0 0 0 0 5
08/25/2018 03:17:58 Reload 0 0 0 0 5
02/02/2020 09:24:46 Reload 1 23 0 5 5
02/02/2020 10:05:52 Reload 0 0 0 0 5
04/28/2020 08:49:59 EHSA standby down 0 12 1 22 5
--------------------------------------------------------------------------------
Cab-NLabsAC#
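The 'show log on switch ... uptime detail' table above can be screened mechanically across a large estate. This is a minimal sketch (my own helper, not Cisco tooling; the row format is assumed from the output above) that flags any reset whose reason is not a plain operator "Reload", since those point at crashes or failovers:

```python
def unusual_resets(rows):
    """Return (timestamp, reason) for rows whose reset reason is not 'Reload'.

    Each row is assumed to look like:
    'MM/DD/YYYY HH:MM:SS  Reason words  years weeks days hours minutes'
    """
    out = []
    for row in rows:
        tokens = row.split()
        timestamp = " ".join(tokens[:2])
        # The last five tokens are the uptime counters; the middle is the reason.
        reason = " ".join(tokens[2:-5])
        if reason != "Reload":
            out.append((timestamp, reason))
    return out
```

Run against the switch 4 table above, this would surface only the "EHSA standby down" entry from 04/28/2020.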
04-30-2020 04:01 AM - edited 04-30-2020 04:24 AM
ReloadReason=Configuration mismatch RET_2_RCALTS=1588063435 RET_2_RTS=09:43:55 BST Tue Apr 28 2020
So one of the files spat out this (and only this).
Apr 28 09:43:38 Cab-NLabsAC_2_RP_0 xinetd[12517]: execve /usr/binos/conf/in.telnetd.sh
Apr 28 09:43:55 Cab-NLabsAC_2_RP_0 xinetd[12986]: execve /usr/bin/rsync
Apr 28 09:43:55 Cab-NLabsAC_2_RP_0 stack_mgr[13890]: %STACKMGR-1-RELOAD: Reloading due to reason Configuration mismatch
Apr 28 09:43:56 Cab-NLabsAC_2_RP_0 kernel: LSMPI: Deregister dual stack diverter
Apr 28 09:43:57 Cab-NLabsAC_2_RP_0 pvp[14735]: %PMAN-5-EXITACTION: Process manager is exiting: reload fp action requested
Apr 28 09:43:59 Cab-NLabsAC_2_RP_0 pvp[14796]: %PMAN-5-EXITACTION: Process manager is exiting: rp processes exit with reload switch code
Apr 28 09:43:59 Cab-NLabsAC_2_RP_0 systemd[1]: agetty-iosd.service: Main process exited, code=killed, status=9/KILL
Apr 28 09:43:59 Cab-NLabsAC_2_RP_0 systemd[1]: agetty-iosd.service: Unit entered failed state.
Apr 28 09:43:59 Cab-NLabsAC_2_RP_0 systemd[1]: agetty-iosd.service: Failed with result 'signal'.
Another one is this.
04/28/2020 08:49:59 EHSA standby down 0 12 1 22 5
The above was taken from switch 4.
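Lines like the ones quoted above can be pulled out of the extracted tracelogs without reading each file by hand. A minimal sketch (my own filter, not Cisco tooling; the marker strings are taken from the log excerpts in this thread):

```python
def find_reload_reasons(lines):
    """Return log lines that record a reload reason."""
    markers = ("%STACKMGR-1-RELOAD", "ReloadReason=")
    return [line for line in lines if any(m in line for m in markers)]

sample = [
    "Apr 28 09:43:55 Cab-NLabsAC_2_RP_0 xinetd[12986]: execve /usr/bin/rsync",
    "Apr 28 09:43:55 Cab-NLabsAC_2_RP_0 stack_mgr[13890]: "
    "%STACKMGR-1-RELOAD: Reloading due to reason Configuration mismatch",
]
print(find_reload_reasons(sample))
```

Pointing this at every file under tracelogs/ makes it quick to confirm whether each crash carries the same "Configuration mismatch" reason.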
I think you've hit two (2) bugs.
The first bug talks about "configuration mismatch". This usually happens when a stack merge or "split brain" occurs. I cannot find the cause of the split brain.
The second bug is what happened to switch 4: CSCvi15897
You will need to raise this with TAC and get them to identify the first and/or confirm the second bug.
Question: Before the crash, did someone telnet into the switch and leave the telnet session running (until the active/master switch crashed)?
04-30-2020 07:31 AM
Leo
Thanks for your help, and I don't believe anyone had left a session open.
And unfortunately I can't raise a TAC case as these switches are not under support, and I am not sure they will be.
Thanks
Phil