05-28-2021 08:20 AM
Hello Everyone,
We have a stack of three C9200-48T switches running the following IOS XE version:
Switch Ports Model     SW Version SW Image         Mode
------ ----- -----     ---------- ----------       ----
     1 56    C9200-48T 16.12.3a   CAT9K_LITE_IOSXE INSTALL
     2 56    C9200-48T 16.12.3a   CAT9K_LITE_IOSXE INSTALL
*    3 56    C9200-48T 16.12.3a   CAT9K_LITE_IOSXE INSTALL
The stack has started reloading; it has happened 3 times in the past week.
This is what we got in the logs:
May 28 18:46:28: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 3: EMP_RELAY: Channel UP!
May 28 18:46:28: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel UP!
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, received chassis event to become active
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_NOT_PRESENT)
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_DOWN)
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_REDUNDANCY_STATE_CHANGE)
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi2/0/37 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi2/0/38 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi3/0/11 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, sent message became active. IOS is ready to switch to primary after chassis confirmation
May 28 18:46:28: %HMANRP-6-EMP_NO_ELECTION_INFO: Could not elect active EMP switch, setting emp active switch to 0: EMP_RELAY: Could not elect switch with mgmt port UP
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, received chassis event became active
May 28 18:46:28: %PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply A is restored
May 28 18:46:28: %PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply B is restored
May 28 18:46:28: %STACKMGR-4-SWITCH_REMOVED: Switch 2 R0/0: stack_mgr: Switch 1 has been removed from the stack.
May 28 18:46:28: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 10.30.26.130 port 514 started - CLI initiated
May 28 18:46:28: %STACKMGR-4-SWITCH_REMOVED: Switch 3 R0/0: stack_mgr: Switch 1 has been removed from the stack.
May 28 18:46:28: %PLATFORM-6-HASTATUS_DETAIL: RP switchover, received chassis event became active. Switch to primary (count 1)
May 28 18:46:28: %HA-6-SWITCHOVER: Route Processor switched from standby to being active
May 28 18:46:28: %IOSXE_MGMTVRF-3-SET_TABLEID_FAIL: Installing ipv4 Management interface tableid 0x1 failed
May 28 18:46:28: %IOSXE_MGMTVRF-3-SET_TABLEID_FAIL: Installing ipv6 Management interface tableid 0x1E000001 failed
May 28 18:46:28: Unable to set IPV4 table id for BT interface
May 28 18:46:28: Unable to set IPV6 table id for BT interface
May 28 18:46:28: %HMANRP-6-EMP_NO_ELECTION_INFO: Could not elect active EMP switch, setting emp active switch to 0: EMP_RELAY: Could not elect switch with mgmt port UP
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Gi3/0/10 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Te3/1/1 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Te3/1/2 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %STACKMGR-6-STACK_LINK_CHANGE: Switch 3 R0/0: stack_mgr: Stack port 2 on Switch 3 is down
May 28 18:46:30: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 1 gone DOWN!
May 28 18:46:31: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:31: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:31: %STACKMGR-6-STACK_LINK_CHANGE: Switch 3 R0/0: stack_mgr: Stack port 1 on Switch 3 is down
May 28 18:46:32: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 2 gone DOWN!
May 28 18:46:33: %SMART_LIC-6-HA_ROLE_CHANGED: Smart Agent HA role changed to Active.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Gi3/0/10 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Gi3/0/11 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Te3/1/1 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Te3/1/2 was bounced by Consistency Check IDBS Down.
Any ideas why this is happening? It seems like the stack is being rebuilt.
Thank you in advance
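For a log burst like the one above, it can help to tally messages by their `%FACILITY-SEVERITY-MNEMONIC` tag and pull out the warning-or-worse lines. The following is only a minimal sketch: the function name `summarize` and the severity cutoff of 4 are illustrative assumptions, and the sample lines are copied from the log in this post.

```python
import re
from collections import Counter

# Matches the "%FACILITY-SEVERITY-MNEMONIC:" tag seen in the log lines above.
TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)


def summarize(lines, max_severity=4):
    """Count messages per (facility, mnemonic) and collect lines whose
    severity is numerically <= max_severity (lower number = more severe)."""
    counts = Counter()
    notable = []
    for line in lines:
        m = TAG.search(line)
        if m is None:
            # Continuation lines such as "link: up (...)" carry no tag.
            continue
        counts[(m["facility"], m["mnemonic"])] += 1
        if int(m["severity"]) <= max_severity:
            notable.append(line.strip())
    return counts, notable


sample = [
    "May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_NOT_PRESENT)",
    "May 28 18:46:28: %STACKMGR-4-SWITCH_REMOVED: Switch 2 R0/0: stack_mgr: "
    "Switch 1 has been removed from the stack.",
    "link: up (2d03h ago), admin: up (2d03h ago).",
    "May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period",
]
counts, notable = summarize(sample)
for (facility, mnemonic), n in counts.most_common():
    print(f"{facility}-{mnemonic}: {n}")
```

Filtering this way makes the STACKMGR and REDUNDANCY events stand out from the repeated SMART_LIC and PM messages.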
05-28-2021 09:53 AM
>...%PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply A is restored
- You seem to have power issues; make sure stable power is available at all times.
M.
05-28-2021 10:04 AM
Well, I agree, but these switches have dual power supplies connected to 2 different UPSes.
Another question is why only 2 out of 3 switches reload, and they are always different: once it was 2 and 3 that reloaded, now it seems to be 1 and 2.
And now the 3rd switch is Active:
Switch#  Role     Mac Address     Priority  Version  State
-----------------------------------------------------------
 1       Standby  b0c5.3c4e.1880  1         V01      Ready
 2       Member   e8eb.3422.4580  1         V01      Ready
*3       Active   e8eb.3410.6980  1         V01      Ready
Moreover, the log mentions the Switch 3 power supplies, but Switch 3, which is currently Active, shows it has been up for more than 2 days:
Uptime for this control processor is 2 days, 6 hours, 11 minutes
The first time, last week, it was Switch 2 and 3 that reloaded.
05-28-2021 02:54 PM
Post the complete output of the following commands:
05-28-2021 03:22 PM
Hello Leo,
show version attached, others below
SWH_STACK_Server_Room#dir flash-1:/core
Directory of flash:/core/
64772 drwx 4096 Dec 4 2020 13:34:09 +04:00 modules
64775 -rw- 1 May 29 2021 02:06:46 +04:00 .callhome
1956839424 bytes total (1402101760 bytes free)
SWH_STACK_Server_Room#dir flash-2:/core
Directory of flash-2:/core/
89059 drwx 4096 Dec 4 2020 17:32:35 +04:00 modules
89061 -rw- 1 May 26 2021 14:44:29 +04:00 .callhome
1957167104 bytes total (1402470400 bytes free)
SWH_STACK_Server_Room#dir flash-3:/core
Directory of flash-3:/core/
48579 drwx 4096 Dec 4 2020 17:32:22 +04:00 modules
48581 -rw- 1 May 29 2021 02:02:18 +04:00 .callhome
1957167104 bytes total (1402470400 bytes free)
SWH_STACK_Server_Room#dir crashinfo-1:
Directory of crashinfo:/
36641 drwx 24576 May 29 2021 02:12:20 +04:00 tracelogs
12 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
11 -rw- 891918 Jan 29 2021 11:56:49 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_0-20210129-075647.tar.gz
13 -rw- 920718 Jan 29 2021 11:57:52 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_1-20210129-075750.tar.gz
14 -rw- 1273017 May 1 2021 14:38:31 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_0-20210501-143829.tar.gz
17 -rw- 2454114 May 29 2021 01:38:50 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_2-20210529-013845.tar.gz
15 -rw- 1163745 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_1-20210526-145049.tar.gz
16 -rw- 2259322 May 29 2021 01:13:36 +04:00 SWH_STACK_Server_Room_trace_archive_0-20210529-011331.tar.gz
825638912 bytes total (768987136 bytes free)
SWH_STACK_Server_Room#dir crashinfo-2:
Directory of crashinfo-2:/
14657 drwx 40960 May 29 2021 02:14:49 +04:00 tracelogs
12 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
14 -rw- 1416108 May 26 2021 14:50:57 +04:00 system-report_2_20210526-145054-Baku.tar.gz
11 -rw- 1805733 May 1 2021 14:38:30 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210501-143828.tar.gz
13 -rw- 1063824 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210526-145049.tar.gz
15 -rw- 1736020 May 28 2021 18:46:33 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210528-184628.tar.gz
16 -rw- 2720388 May 29 2021 01:13:34 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-011329.tar.gz
17 -rw- 2959211 May 29 2021 01:38:48 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-013844.tar.gz
825753600 bytes total (764936192 bytes free)
SWH_STACK_Server_Room#dir crashinfo-3:
Directory of crashinfo-3:/
11 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
21985 drwx 12288 May 29 2021 02:16:28 +04:00 tracelogs
14 -rw- 2000413 Jan 29 2021 11:58:04 +04:00 system-report_3_20210129-075801-UTC.tar.gz
12 -rw- 871872 Jan 29 2021 11:57:49 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210129-075747.tar.gz
13 -rw- 950865 Jan 29 2021 11:57:51 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210129-075750.tar.gz
17 -rw- 1611290 May 1 2021 15:04:27 +04:00 system-report_3_20210501-150424-Baku.tar.gz
15 -rw- 1217171 May 1 2021 14:39:31 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210501-143929.tar.gz
16 -rw- 1215691 May 1 2021 14:40:31 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_1-20210501-144029.tar.gz
21 -rw- 2340336 May 29 2021 01:13:40 +04:00 system-report_3_20210529-011336-Baku.tar.gz
18 -rw- 1748605 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210526-145049.tar.gz
19 -rw- 1070398 May 28 2021 18:46:32 +04:00 SWH_STACK_Server_Room_trace_archive_0-20210528-184629.tar.gz
23 -rw- 1288907 May 29 2021 01:40:11 +04:00 system-report_3_20210529-014010-Baku.tar.gz
20 -rw- 1287126 May 29 2021 01:13:35 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_2-20210529-011330.tar.gz
22 -rw- 1269934 May 29 2021 01:39:51 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210529-013946.tar.gz
SWH_STACK_Server_Room#show log onboard switch 1 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 08:19:43
Total uptime : 0 years 14 weeks 6 days 17 hours 55 minutes
Total downtime : 0 years 10 weeks 6 days 19 hours 34 minutes
Number of resets : 12
Number of slot changes : 0
Current reset reason : stack merge due to incompatiblity
Current reset timestamp : 05/28/2021 14:48:55
Current slot : 1
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 7 hours 0 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 08:19:43 Power Failure or Unknown 0 0 0 0 0
11/28/2020 08:34:22 Image Install 0 0 0 0 10
11/28/2020 08:37:51 Reload Command 0 0 0 0 0
11/30/2020 00:46:43 Power Failure or Unknown 0 0 0 0 0
12/04/2020 09:34:49 Reload Command 0 0 0 0 0
12/04/2020 09:49:38 Image Install 0 0 0 0 10
12/04/2020 09:53:08 Reload Command 0 0 0 0 0
01/25/2021 06:37:47 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:31:25 Power Failure or Unknown 0 0 0 0 30
02/17/2021 06:01:03 Power Failure or Unknown 0 0 4 3 0
02/17/2021 06:31:46 Power Failure or Unknown 0 0 0 0 10
02/17/2021 10:10:00 Power Failure or Unknown 0 0 0 0 25
05/28/2021 14:48:55 stack merge due to incompatiblity 0 14 2 4 0
--------------------------------------------------------------------------------
SWH_STACK_Server_Room#show log onboard switch 2 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 13:02:20
Total uptime : 0 years 14 weeks 6 days 16 hours 45 minutes
Total downtime : 0 years 10 weeks 6 days 16 hours 30 minutes
Number of resets : 16
Number of slot changes : 1
Current reset reason : EHSA standby down
Current reset timestamp : 05/28/2021 21:44:06
Current slot : 2
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 0 hours 35 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 13:02:20 Power Failure or Unknown 0 0 0 0 0
11/28/2020 13:17:07 Image Install 0 0 0 0 10
11/28/2020 13:20:42 Reload Command 0 0 0 0 0
12/04/2020 00:32:48 Power Failure or Unknown 0 0 0 0 0
12/04/2020 13:33:15 Reload Command 0 0 0 0 0
12/04/2020 13:47:57 Image Install 0 0 0 0 10
12/04/2020 13:51:27 Reload Command 0 0 0 0 0
01/25/2021 06:41:17 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:35:02 Power Failure or Unknown 0 0 0 0 30
01/29/2021 08:00:24 lost both active and standby 0 0 3 23 56
02/17/2021 06:02:43 Power Failure or Unknown 0 0 0 2 0
02/17/2021 06:33:26 Power Failure or Unknown 0 0 0 0 8
02/17/2021 10:11:39 Power Failure or Unknown 0 0 0 0 23
05/26/2021 10:53:18 stack merge 0 13 6 23 57
05/28/2021 14:48:55 lost both active and standby 0 0 2 3 0
05/28/2021 21:16:01 lost both active and standby 0 0 0 6 0
05/28/2021 21:44:06 EHSA standby down 0 0 0 0 25
--------------------------------------------------------------------------------
SWH_STACK_Server_Room#show log onboard switch 3 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 13:35:30
Total uptime : 0 years 14 weeks 6 days 17 hours 2 minutes
Total downtime : 0 years 10 weeks 6 days 15 hours 40 minutes
Number of resets : 17
Number of slot changes : 1
Current reset reason : stack merge
Current reset timestamp : 05/28/2021 21:44:06
Current slot : 3
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 0 hours 35 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 13:35:30 Power Failure or Unknown 0 0 0 0 0
11/28/2020 13:50:15 Image Install 0 0 0 0 10
11/28/2020 13:53:47 Reload Command 0 0 0 0 0
12/04/2020 00:32:21 Power Failure or Unknown 0 0 0 0 0
12/04/2020 13:33:02 Reload Command 0 0 0 0 0
12/04/2020 13:47:45 Image Install 0 0 0 0 10
12/04/2020 13:51:15 Reload Command 0 0 0 0 0
12/04/2020 13:54:56 Power Failure or Unknown 0 0 0 0 0
01/25/2021 06:37:11 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:31:05 Power Failure or Unknown 0 0 0 0 30
01/29/2021 08:00:25 stack merge 0 0 4 0 0
02/17/2021 06:01:20 Power Failure or Unknown 0 0 0 2 0
02/17/2021 06:32:02 Power Failure or Unknown 0 0 0 0 9
02/17/2021 10:10:13 Power Failure or Unknown 0 0 0 0 24
05/01/2021 11:06:48 stack merge 0 10 2 23 57
05/26/2021 10:53:19 lost both active and standby 0 3 3 23 0
05/28/2021 21:16:01 stack merge 0 0 2 9 59
05/28/2021 21:44:06 stack merge 0 0 0 0 20
--------------------------------------------------------------------------------
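With three switches and more than a dozen resets each, counting reset reasons by hand is error-prone. The sketch below tallies the "Reason" column from the `show log onboard switch N up detail` continuous-uptime rows. It assumes each row has the shape shown above (date, time, free-text reason, then five integer uptime fields); the function name `reset_reasons` is made up for illustration, and the sample rows are taken from the Switch 3 output.

```python
import re
from collections import Counter

# One row: "MM/DD/YYYY HH:MM:SS <reason text> <years> <weeks> <days> <hours> <minutes>"
ROW = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<reason>.+?)\s+(?:\d+\s+){4}\d+\s*$"
)


def reset_reasons(text):
    """Return a Counter of reset reasons found in pasted OBFL uptime output.
    Lines that do not look like a data row (headers, rulers) are skipped."""
    counts = Counter()
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            counts[m["reason"]] += 1
    return counts


sample = """\
Time Stamp | Reset | Uptime
02/17/2021 10:10:13 Power Failure or Unknown 0 0 0 0 24
05/01/2021 11:06:48 stack merge 0 10 2 23 57
05/26/2021 10:53:19 lost both active and standby 0 3 3 23 0
05/28/2021 21:16:01 stack merge 0 0 2 9 59
05/28/2021 21:44:06 stack merge 0 0 0 0 20
"""
counts = reset_reasons(sample)
for reason, n in counts.most_common():
    print(f"{n:2d}  {reason}")
```

Running the same function over the output for each switch gives a quick per-member comparison of "stack merge" resets versus power-related ones.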
05-28-2021 04:42 PM
@fgasimzade wrote:
system-report_2_20210526-145054-Baku.tar.gz
This file will be very useful. If you can attach this file I would like to take a peek inside.
@fgasimzade wrote:
05/28/2021 14:48:55 stack merge due to incompatiblity 0 14 2 4 0
05/28/2021 21:44:06 stack merge 0 0 0 0 20
I suspect the cause of the issue is CSCvq56135. It is widely known that Cisco has been unable to fix "stack merge" issue(s) since introducing this bug on 16.10.X (affecting 9300 only). This bug has been seen on 16.11.X (affecting 9300 only) and 16.12.X (affecting 9200 only).
As far as I am concerned, 16.12.X is not a stable version. If you can downgrade to, say, 16.9.X this should fix the problem.
05-28-2021 09:40 PM - edited 05-28-2021 10:04 PM
05-29-2021 04:53 AM - edited 05-29-2021 04:53 AM
@fgasimzade wrote:
May 26 14:50:48 SWH_STACK_Server_Room stack_mgr[7768]: %STACKMGR-4-SWITCH_REMOVED: Switch 1 has been removed from the stack.
May 26 14:50:48 SWH_STACK_Server_Room stack_mgr[7768]: %STACKMGR-1-RELOAD: Reloading due to reason stack merge
I have reviewed the archive and extracted a file called "SWH_STACK_Server_Room-bootuplog-20210526-145054-Baku.log".
See the attachment. The lines above appear at the bottom of that file.
This points to CSCvq56135, but get TAC to check and verify.
Stay away from 16.12.X if possible and downgrade to 16.9.X if you can. Alternatively, 17.3.4 is due out in July 2021.
05-29-2021 05:29 AM
Hello Leo,
Thank you for your time
I wanted to download 16.9, but it looks like it is not available.
What if we go with 17.X?
05-29-2021 05:34 AM
Go to 17.3.3 and be ready to jump to 17.3.4 when it becomes available.
05-30-2021 07:26 AM
Hello Leo,
We have updated to 17.3.3
Let's see how it goes. Thank you for your help.
05-30-2021 09:44 AM
- It may also be advisable to configure a (central) syslog server for capturing logs. The benefits: logs are easier to follow up on and can be kept for longer (syslog servers usually handle log-file maintenance automatically, such as daily rotation). In such 'uncertain times' with the stack, that can be a real help.
M.
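As a rough illustration of what a central syslog collector does, here is a minimal UDP receiver sketch. It is not a substitute for a real syslog server with rotation and retention. The listening port 5514 and the output file name are assumptions chosen for the example (the standard syslog port 514 usually requires elevated privileges), and the names `split_priority`, `SyslogHandler`, and `serve` are hypothetical.

```python
import re
import socketserver
from datetime import datetime, timezone

LOG_PATH = "stack-syslog.log"  # assumed output file for this sketch
PRI = re.compile(rb"^<(\d{1,3})>")


def split_priority(datagram):
    """Split a syslog datagram into (facility, severity, text).
    The leading <PRI> value encodes facility * 8 + severity."""
    m = PRI.match(datagram)
    if m is None:
        return None, None, datagram.decode("utf-8", "replace")
    pri = int(m.group(1))
    text = datagram[m.end():].decode("utf-8", "replace")
    return pri // 8, pri % 8, text


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        _facility, severity, text = split_priority(data)
        stamp = datetime.now(timezone.utc).isoformat()
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{stamp} {self.client_address[0]} sev={severity} {text.strip()}\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener; the switch would then need its logging host pointed at this machine and port. Because messages are stored off-box, they survive a stack reload that clears the local log buffer.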
05-30-2021 11:34 AM
Thank you Marce1000, we will definitely set-up one
05-30-2021 11:32 AM
Hello Leo,
The stack has just reloaded, all 3 switches at the same time, not like before, when just 2 switches out of 3 were being restarted
05-30-2021 05:41 PM
@fgasimzade wrote:
The stack has just reloaded, all 3 switches at the same time, not like before, when just 2 switches out of 3 were being restarted
Same thing as before: give us the complete new output of the following commands: