9200 Stack reloads once in a while

fgasimzade
Level 4

Hello Everyone,

We have a stack of three C9200-48T switches running the following IOS XE version:

Switch Ports Model SW Version SW Image Mode
------ ----- ----- ---------- ---------- ----
1 56 C9200-48T 16.12.3a CAT9K_LITE_IOSXE INSTALL
2 56 C9200-48T 16.12.3a CAT9K_LITE_IOSXE INSTALL
* 3 56 C9200-48T 16.12.3a CAT9K_LITE_IOSXE INSTALL

It has started reloading: three times in the past week already.

This is what we got in the logs:

May 28 18:46:28: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 3: EMP_RELAY: Channel UP!
May 28 18:46:28: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel UP!
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, received chassis event to become active
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_NOT_PRESENT)
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_DOWN)
May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_REDUNDANCY_STATE_CHANGE)
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi2/0/37 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi2/0/38 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PM-4-PORT_INCONSISTENT: Port Gi3/0/11 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, sent message became active. IOS is ready to switch to primary after chassis confirmation
May 28 18:46:28: %HMANRP-6-EMP_NO_ELECTION_INFO: Could not elect active EMP switch, setting emp active switch to 0: EMP_RELAY: Could not elect switch with mgmt port UP
May 28 18:46:28: %PLATFORM-6-HASTATUS: RP switchover, received chassis event became active
May 28 18:46:28: %PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply A is restored
May 28 18:46:28: %PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply B is restored
May 28 18:46:28: %STACKMGR-4-SWITCH_REMOVED: Switch 2 R0/0: stack_mgr: Switch 1 has been removed from the stack.
May 28 18:46:28: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 10.30.26.130 port 514 started - CLI initiated
May 28 18:46:28: %STACKMGR-4-SWITCH_REMOVED: Switch 3 R0/0: stack_mgr: Switch 1 has been removed from the stack.
May 28 18:46:28: %PLATFORM-6-HASTATUS_DETAIL: RP switchover, received chassis event became active. Switch to primary (count 1)
May 28 18:46:28: %HA-6-SWITCHOVER: Route Processor switched from standby to being active
May 28 18:46:28: %IOSXE_MGMTVRF-3-SET_TABLEID_FAIL: Installing ipv4 Management interface tableid 0x1 failed
May 28 18:46:28: %IOSXE_MGMTVRF-3-SET_TABLEID_FAIL: Installing ipv6 Management interface tableid 0x1E000001 failed
May 28 18:46:28: Unable to set IPV4 table id for BT interface
May 28 18:46:28: Unable to set IPV6 table id for BT interface
May 28 18:46:28: %HMANRP-6-EMP_NO_ELECTION_INFO: Could not elect active EMP switch, setting emp active switch to 0: EMP_RELAY: Could not elect switch with mgmt port UP
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Gi3/0/10 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Te3/1/1 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %PM-4-PORT_INCONSISTENT: Port Te3/1/2 is inconsistent: IDB state down (set 00:00:02 ago),
link: up (2d03h ago), admin: up (2d03h ago).
May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:29: %STACKMGR-6-STACK_LINK_CHANGE: Switch 3 R0/0: stack_mgr: Stack port 2 on Switch 3 is down
May 28 18:46:30: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 1 gone DOWN!
May 28 18:46:31: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:31: %SMART_LIC-5-EVAL_START: Entering evaluation period
May 28 18:46:31: %STACKMGR-6-STACK_LINK_CHANGE: Switch 3 R0/0: stack_mgr: Stack port 1 on Switch 3 is down
May 28 18:46:32: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 2 gone DOWN!
May 28 18:46:33: %SMART_LIC-6-HA_ROLE_CHANGED: Smart Agent HA role changed to Active.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Gi3/0/10 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Gi3/0/11 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Te3/1/1 was bounced by Consistency Check IDBS Down.
May 28 18:46:33: %PM-4-PORT_BOUNCED: Port Te3/1/2 was bounced by Consistency Check IDBS Down.

Any ideas why this is happening? It looks as if the stack is being rebuilt.

Thank you in advance
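To follow the sequence in a log like this, it can help to filter out everything except the stack and redundancy events. The following is a minimal Python sketch, not a Cisco tool: `stack_events` is a hypothetical helper that reads the `%FACILITY-SEVERITY-MNEMONIC:` tag on each line, and the set of facilities it keeps (STACKMGR, REDUNDANCY, HA, HMANRP, PLATFORM) is an assumption picked from the messages above.

```python
import re
from typing import Iterable, List, NamedTuple

# IOS XE messages carry a tag of the form %FACILITY-SEVERITY-MNEMONIC: text
# (facility names may contain underscores, e.g. PLATFORM_FEP).
TAG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

# Facilities treated as stack/HA related in this sketch (an assumption, picked from the log above).
STACK_FACILITIES = {"STACKMGR", "REDUNDANCY", "HA", "HMANRP", "PLATFORM"}


class Event(NamedTuple):
    prefix: str      # whatever precedes the tag: the timestamp, possibly hostname/process
    facility: str
    severity: int
    mnemonic: str
    text: str


def stack_events(lines: Iterable[str]) -> List[Event]:
    """Return the stack/redundancy events found in raw syslog lines, in their original order."""
    events = []
    for line in lines:
        match = TAG_RE.search(line)
        if not match or match["facility"] not in STACK_FACILITIES:
            continue  # untagged lines and other facilities are dropped
        prefix = line[: match.start()].strip().rstrip(":")
        events.append(Event(prefix, match["facility"], int(match["severity"]),
                            match["mnemonic"], match["text"].strip()))
    return events


if __name__ == "__main__":
    sample = [
        "May 28 18:46:28: %REDUNDANCY-3-SWITCHOVER: RP switchover (PEER_DOWN)",
        "May 28 18:46:29: %SMART_LIC-5-EVAL_START: Entering evaluation period",
        "May 28 18:46:30: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 1 gone DOWN!",
    ]
    for event in stack_events(sample):
        print(event.prefix, f"%{event.facility}-{event.severity}-{event.mnemonic}:", event.text)
```

Run against the full log above, this would keep the HASTATUS, SWITCHOVER, SWITCH_REMOVED, STACK_LINK_CHANGE and CHASSIS_DOWN_EVENT lines (plus the HMANRP channel messages) and drop the licensing, power-supply and port-consistency noise.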

Accepted solution: Leo Laohoo's reply pointing to CSCvq56135 (reproduced in full later in the thread).

22 Replies

marce1000
VIP

  >...%PLATFORM_FEP-1-FRU_PS_SIGNAL_OK: Switch 3: signal on power supply A is restored

  - You seem to have power issues; make sure stable power is available at all times.

 M.



-- ' 'Good body every evening' ' this sentence was once spotted on a logo at the entrance of a Weight Watchers Club !

Well, I agree, but these switches have dual power supplies connected to two different UPSs.

Another question: why do only two of the three switches reload, and why is it a different pair each time? Once it was switches 2 and 3; now it seems to be 1 and 2.

And now switch 3 is Active:

Switch# Role Mac Address Priority Version State
-------------------------------------------------------------------------------------
1 Standby b0c5.3c4e.1880 1 V01 Ready
2 Member e8eb.3422.4580 1 V01 Ready
*3 Active e8eb.3410.6980 1 V01 Ready

Moreover, the log mentions switch 3's power supplies, yet the currently Active switch 3 shows an uptime of more than 2 days:

Uptime for this control processor is 2 days, 6 hours, 11 minutes

The first time, last week, it was switches 2 and 3 that reloaded.

Leo Laohoo
Hall of Fame

Post the complete output of the following commands:

  • sh version
  • dir flash-1:/core
  • dir flash-2:/core
  • dir flash-3:/core
  • dir crashinfo-1:
  • dir crashinfo-2:
  • dir crashinfo-3:
  • sh log on switch 1 up detail
  • sh log on switch 2 up detail
  • sh log on switch 3 up detail

Hello Leo,

The show version output is attached; the others are below:

SWH_STACK_Server_Room#dir flash-1:/core
Directory of flash:/core/

64772 drwx 4096 Dec 4 2020 13:34:09 +04:00 modules
64775 -rw- 1 May 29 2021 02:06:46 +04:00 .callhome

1956839424 bytes total (1402101760 bytes free)


SWH_STACK_Server_Room#dir flash-2:/core
Directory of flash-2:/core/

89059 drwx 4096 Dec 4 2020 17:32:35 +04:00 modules
89061 -rw- 1 May 26 2021 14:44:29 +04:00 .callhome

1957167104 bytes total (1402470400 bytes free)


SWH_STACK_Server_Room#dir flash-3:/core
Directory of flash-3:/core/

48579 drwx 4096 Dec 4 2020 17:32:22 +04:00 modules
48581 -rw- 1 May 29 2021 02:02:18 +04:00 .callhome

1957167104 bytes total (1402470400 bytes free)


SWH_STACK_Server_Room#dir crashinfo-1:
Directory of crashinfo:/

36641 drwx 24576 May 29 2021 02:12:20 +04:00 tracelogs
12 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
11 -rw- 891918 Jan 29 2021 11:56:49 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_0-20210129-075647.tar.gz
13 -rw- 920718 Jan 29 2021 11:57:52 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_1-20210129-075750.tar.gz
14 -rw- 1273017 May 1 2021 14:38:31 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_0-20210501-143829.tar.gz
17 -rw- 2454114 May 29 2021 01:38:50 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_2-20210529-013845.tar.gz
15 -rw- 1163745 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_1_RP_0_trace_archive_1-20210526-145049.tar.gz
16 -rw- 2259322 May 29 2021 01:13:36 +04:00 SWH_STACK_Server_Room_trace_archive_0-20210529-011331.tar.gz

825638912 bytes total (768987136 bytes free)


SWH_STACK_Server_Room#dir crashinfo-2:
Directory of crashinfo-2:/

14657 drwx 40960 May 29 2021 02:14:49 +04:00 tracelogs
12 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
14 -rw- 1416108 May 26 2021 14:50:57 +04:00 system-report_2_20210526-145054-Baku.tar.gz
11 -rw- 1805733 May 1 2021 14:38:30 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210501-143828.tar.gz
13 -rw- 1063824 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210526-145049.tar.gz
15 -rw- 1736020 May 28 2021 18:46:33 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210528-184628.tar.gz
16 -rw- 2720388 May 29 2021 01:13:34 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-011329.tar.gz
17 -rw- 2959211 May 29 2021 01:38:48 +04:00 SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-013844.tar.gz

825753600 bytes total (764936192 bytes free)


SWH_STACK_Server_Room#dir crashinfo-3:
Directory of crashinfo-3:/

11 -rw- 0 Dec 11 2019 20:56:58 +04:00 koops.dat
21985 drwx 12288 May 29 2021 02:16:28 +04:00 tracelogs
14 -rw- 2000413 Jan 29 2021 11:58:04 +04:00 system-report_3_20210129-075801-UTC.tar.gz
12 -rw- 871872 Jan 29 2021 11:57:49 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210129-075747.tar.gz
13 -rw- 950865 Jan 29 2021 11:57:51 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210129-075750.tar.gz
17 -rw- 1611290 May 1 2021 15:04:27 +04:00 system-report_3_20210501-150424-Baku.tar.gz
15 -rw- 1217171 May 1 2021 14:39:31 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210501-143929.tar.gz
16 -rw- 1215691 May 1 2021 14:40:31 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_1-20210501-144029.tar.gz
21 -rw- 2340336 May 29 2021 01:13:40 +04:00 system-report_3_20210529-011336-Baku.tar.gz
18 -rw- 1748605 May 26 2021 14:50:53 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_0-20210526-145049.tar.gz
19 -rw- 1070398 May 28 2021 18:46:32 +04:00 SWH_STACK_Server_Room_trace_archive_0-20210528-184629.tar.gz
23 -rw- 1288907 May 29 2021 01:40:11 +04:00 system-report_3_20210529-014010-Baku.tar.gz
20 -rw- 1287126 May 29 2021 01:13:35 +04:00 SWH_STACK_Server_Room_3_RP_0_trace_archive_2-20210529-011330.tar.gz
22 -rw- 1269934 May 29 2021 01:39:51 +04:00 SWH_STACK_Server_Room_trace_archive_1-20210529-013946.tar.gz
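Several of these archive names embed a YYYYMMDD-HHMMSS timestamp, so lining them up across the three crashinfo listings shows which members wrote an archive at about the same moment. The sketch below is a hypothetical Python helper (`events_by_time`, not part of IOS XE) that takes the file names per crashinfo-N: directory and groups stamps that fall within an assumed two-minute window. The naming is not uniform: the January files appear to use UTC (note the `-UTC` system report, whose file time is 11:58 +04:00), while the May files match the local file time or carry a `-Baku` suffix, so only names written under the same time zone should be compared.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Many crashinfo file names carry a YYYYMMDD-HHMMSS stamp, e.g. "..._trace_archive_0-20210528-184628.tar.gz".
STAMP_RE = re.compile(r"(\d{8}-\d{6})")


def events_by_time(files: Dict[int, List[str]],
                   window: timedelta = timedelta(minutes=2)) -> List[Tuple[datetime, List[int]]]:
    """Cluster file-name timestamps from all stack members.

    `files` maps a switch number (the N in crashinfo-N:) to the names listed there.
    A stamp at most `window` after the previous one joins the same cluster.
    Returns (earliest stamp, sorted switch numbers) for each cluster.
    """
    stamps = []
    for switch, names in files.items():
        for name in names:
            match = STAMP_RE.search(name)
            if match:  # names without a stamp (e.g. koops.dat) are ignored
                stamps.append((datetime.strptime(match.group(1), "%Y%m%d-%H%M%S"), switch))
    stamps.sort()

    clusters = []  # each entry: [first stamp, set of switches, last stamp seen]
    for stamp, switch in stamps:
        if clusters and stamp - clusters[-1][2] <= window:
            clusters[-1][1].add(switch)
            clusters[-1][2] = stamp
        else:
            clusters.append([stamp, {switch}, stamp])
    return [(first, sorted(members)) for first, members, _last in clusters]


if __name__ == "__main__":
    # May 28/29 names copied from the three crashinfo listings above.
    listing = {
        1: ["SWH_STACK_Server_Room_trace_archive_0-20210529-011331.tar.gz",
            "SWH_STACK_Server_Room_1_RP_0_trace_archive_2-20210529-013845.tar.gz"],
        2: ["SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210528-184628.tar.gz",
            "SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-011329.tar.gz",
            "SWH_STACK_Server_Room_2_RP_0_trace_archive_0-20210529-013844.tar.gz"],
        3: ["SWH_STACK_Server_Room_trace_archive_0-20210528-184629.tar.gz",
            "SWH_STACK_Server_Room_3_RP_0_trace_archive_2-20210529-011330.tar.gz",
            "system-report_3_20210529-011336-Baku.tar.gz",
            "SWH_STACK_Server_Room_trace_archive_1-20210529-013946.tar.gz",
            "system-report_3_20210529-014010-Baku.tar.gz"],
    }
    for first, members in events_by_time(listing):
        print(first, "switches", members)
```

With these May 28 and 29 names, the sketch groups 18:46 on May 28 as switches 2 and 3 only, and both 01:13 and 01:38 on May 29 as all three switches. Whether an archive always corresponds to a reload is not established here; it is only a way to line the events up.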


SWH_STACK_Server_Room#show log onboard switch 1 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 08:19:43
Total uptime : 0 years 14 weeks 6 days 17 hours 55 minutes
Total downtime : 0 years 10 weeks 6 days 19 hours 34 minutes
Number of resets : 12
Number of slot changes : 0
Current reset reason : stack merge due to incompatiblity
Current reset timestamp : 05/28/2021 14:48:55
Current slot : 1
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 7 hours 0 minutes
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 08:19:43 Power Failure or Unknown 0 0 0 0 0
11/28/2020 08:34:22 Image Install 0 0 0 0 10
11/28/2020 08:37:51 Reload Command 0 0 0 0 0
11/30/2020 00:46:43 Power Failure or Unknown 0 0 0 0 0
12/04/2020 09:34:49 Reload Command 0 0 0 0 0
12/04/2020 09:49:38 Image Install 0 0 0 0 10
12/04/2020 09:53:08 Reload Command 0 0 0 0 0
01/25/2021 06:37:47 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:31:25 Power Failure or Unknown 0 0 0 0 30
02/17/2021 06:01:03 Power Failure or Unknown 0 0 4 3 0
02/17/2021 06:31:46 Power Failure or Unknown 0 0 0 0 10
02/17/2021 10:10:00 Power Failure or Unknown 0 0 0 0 25
05/28/2021 14:48:55 stack merge due to incompatiblity 0 14 2 4 0
--------------------------------------------------------------------------------


SWH_STACK_Server_Room#show log onboard switch 2 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 13:02:20
Total uptime : 0 years 14 weeks 6 days 16 hours 45 minutes
Total downtime : 0 years 10 weeks 6 days 16 hours 30 minutes
Number of resets : 16
Number of slot changes : 1
Current reset reason : EHSA standby down
Current reset timestamp : 05/28/2021 21:44:06
Current slot : 2
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 0 hours 35 minutes
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 13:02:20 Power Failure or Unknown 0 0 0 0 0
11/28/2020 13:17:07 Image Install 0 0 0 0 10
11/28/2020 13:20:42 Reload Command 0 0 0 0 0
12/04/2020 00:32:48 Power Failure or Unknown 0 0 0 0 0
12/04/2020 13:33:15 Reload Command 0 0 0 0 0
12/04/2020 13:47:57 Image Install 0 0 0 0 10
12/04/2020 13:51:27 Reload Command 0 0 0 0 0
01/25/2021 06:41:17 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:35:02 Power Failure or Unknown 0 0 0 0 30
01/29/2021 08:00:24 lost both active and standby 0 0 3 23 56
02/17/2021 06:02:43 Power Failure or Unknown 0 0 0 2 0
02/17/2021 06:33:26 Power Failure or Unknown 0 0 0 0 8
02/17/2021 10:11:39 Power Failure or Unknown 0 0 0 0 23
05/26/2021 10:53:18 stack merge 0 13 6 23 57
05/28/2021 14:48:55 lost both active and standby 0 0 2 3 0
05/28/2021 21:16:01 lost both active and standby 0 0 0 6 0
05/28/2021 21:44:06 EHSA standby down 0 0 0 0 25
--------------------------------------------------------------------------------


SWH_STACK_Server_Room#show log onboard switch 3 up detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/28/2020 13:35:30
Total uptime : 0 years 14 weeks 6 days 17 hours 2 minutes
Total downtime : 0 years 10 weeks 6 days 15 hours 40 minutes
Number of resets : 17
Number of slot changes : 1
Current reset reason : stack merge
Current reset timestamp : 05/28/2021 21:44:06
Current slot : 3
Chassis type : 255
Current uptime : 0 years 0 weeks 0 days 0 hours 35 minutes
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp | Reset | Uptime
MM/DD/YYYY HH:MM:SS | Reason | years weeks days hours minutes
--------------------------------------------------------------------------------
11/28/2020 13:35:30 Power Failure or Unknown 0 0 0 0 0
11/28/2020 13:50:15 Image Install 0 0 0 0 10
11/28/2020 13:53:47 Reload Command 0 0 0 0 0
12/04/2020 00:32:21 Power Failure or Unknown 0 0 0 0 0
12/04/2020 13:33:02 Reload Command 0 0 0 0 0
12/04/2020 13:47:45 Image Install 0 0 0 0 10
12/04/2020 13:51:15 Reload Command 0 0 0 0 0
12/04/2020 13:54:56 Power Failure or Unknown 0 0 0 0 0
01/25/2021 06:37:11 Power Failure or Unknown 0 0 0 0 0
01/25/2021 07:31:05 Power Failure or Unknown 0 0 0 0 30
01/29/2021 08:00:25 stack merge 0 0 4 0 0
02/17/2021 06:01:20 Power Failure or Unknown 0 0 0 2 0
02/17/2021 06:32:02 Power Failure or Unknown 0 0 0 0 9
02/17/2021 10:10:13 Power Failure or Unknown 0 0 0 0 24
05/01/2021 11:06:48 stack merge 0 10 2 23 57
05/26/2021 10:53:19 lost both active and standby 0 3 3 23 0
05/28/2021 21:16:01 stack merge 0 0 2 9 59
05/28/2021 21:44:06 stack merge 0 0 0 0 20
--------------------------------------------------------------------------------


@fgasimzade wrote:
system-report_2_20210526-145054-Baku.tar.gz

This file will be very useful. If you can attach it, I would like to take a peek inside.


@fgasimzade wrote:
05/28/2021 14:48:55 stack merge due to incompatiblity 0 14 2 4 0
05/28/2021 21:44:06 stack merge 0 0 0 0 20

I suspect the cause of the issue is CSCvq56135. It is widely known that Cisco has been unable to fix the "stack merge" issue(s) since introducing this bug in 16.10.X (affecting the 9300 only). The bug has also been seen in 16.11.X (affecting the 9300 only) and 16.12.X (affecting the 9200 only).

As far as I am concerned, 16.12.X is not a stable release. If you can downgrade to, say, 16.9.X, this should fix the problem.
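This reading of the onboard history rests on spotting the stack-related reset reasons among the routine ones. A hypothetical Python helper like the one below (a sketch, not a Cisco utility) can pull the history rows out of the `show log onboard switch N up detail` text and tally them. Which reasons count as stack-related (`stack merge`, `lost both active and standby`, `EHSA standby down`) is an assumption based only on the outputs in this thread, and a year is approximated as 52 weeks when the uptime columns are converted to minutes.

```python
import re
from collections import Counter
from typing import List, Tuple

# A history row looks like: MM/DD/YYYY HH:MM:SS <reset reason> <years> <weeks> <days> <hours> <minutes>
ROW_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(?P<reason>.+?)"
    r"\s+(?P<y>\d+)\s+(?P<w>\d+)\s+(?P<d>\d+)\s+(?P<h>\d+)\s+(?P<m>\d+)$"
)

# Reset reasons treated as stack-related (an assumption based on the outputs in this thread).
STACK_REASONS = ("stack merge", "lost both active and standby", "EHSA standby down")


def parse_uptime_rows(text: str) -> List[Tuple[str, str, int]]:
    """Return (timestamp, reset reason, uptime before the reset in minutes) per history row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue  # headers, separators and summary lines are skipped
        years, weeks, days, hours, minutes = (int(match[k]) for k in ("y", "w", "d", "h", "m"))
        # A year is approximated as 52 weeks; the device only reports whole units.
        total = (((years * 52 + weeks) * 7 + days) * 24 + hours) * 60 + minutes
        rows.append((match["stamp"], match["reason"], total))
    return rows


def stack_resets(text: str) -> Counter:
    """Tally the reset reasons that look stack-related (prefix match, so
    'stack merge due to incompatiblity' counts under its own full text)."""
    return Counter(reason for _stamp, reason, _mins in parse_uptime_rows(text)
                   if reason.startswith(STACK_REASONS))


if __name__ == "__main__":
    switch2_tail = """\
02/17/2021 10:11:39 Power Failure or Unknown 0 0 0 0 23
05/26/2021 10:53:18 stack merge 0 13 6 23 57
05/28/2021 14:48:55 lost both active and standby 0 0 2 3 0
05/28/2021 21:16:01 lost both active and standby 0 0 0 6 0
05/28/2021 21:44:06 EHSA standby down 0 0 0 0 25
"""
    for row in parse_uptime_rows(switch2_tail):
        print(row)
    print(stack_resets(switch2_tail))
```

Fed the last five rows of switch 2's history, it reports two `lost both active and standby` resets plus one `stack merge` and one `EHSA standby down`, while the February power failure is left out of the tally.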

Hello Leo,

Requested file attached.

I have already opened a TAC case, but I would really appreciate it if you could take a look at the report.

P.S. I have checked the bug; it says the issue is fixed in our IOS XE version. Should we consider the downgrade anyway?


@fgasimzade wrote:
May 26 14:50:48 SWH_STACK_Server_Room stack_mgr[7768]: %STACKMGR-4-SWITCH_REMOVED: Switch 1 has been removed from the stack.
May 26 14:50:48 SWH_STACK_Server_Room stack_mgr[7768]: %STACKMGR-1-RELOAD: Reloading due to reason stack merge

I have reviewed the file and extracted a file called "SWH_STACK_Server_Room-bootuplog-20210526-145054-Baku.log".

See attachment. The lines above are at the bottom of this file.

This points to CSCvq56135, but get TAC to check and verify.

Stay away from 16.12.X as much as possible. Downgrade to 16.9.X if you can. Alternatively, 17.3.4 will be out in July 2021.

Hello Leo,

Thank you for your time

I wanted to download 16.9, but it looks like it is not available:

[Screenshot: 16-9.jpeg]

What if we go with 17.X?

Go to 17.3.3 and be ready to jump to 17.3.4 when it becomes available.

Hello Leo,

We have upgraded to 17.3.3.

Let's see how it goes. Thank you for your help.

 - It may also be advisable to configure a central syslog server to capture logs. The benefits: logs are easier to follow up on and are kept for longer (syslog servers usually maintain their log files automatically, e.g. with daily rotation). In such 'uncertain times' with the stack, this can be particularly helpful.
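As an illustration of this suggestion (in practice a packaged server such as rsyslog or syslog-ng would normally be used), here is a minimal Python sketch of a UDP syslog receiver that writes to a file rotated at midnight. It decodes the standard `<PRI>` prefix from RFC 3164 and RFC 5424, where PRI = facility * 8 + severity. The listening port 5514, the file name `switch-stack.log` and the 30-file retention are assumptions for the example; the standard port 514 usually needs root privileges, and the switches would have to be pointed at whichever port is used.

```python
import logging
import re
import socketserver
from logging.handlers import TimedRotatingFileHandler
from typing import Optional, Tuple

# RFC 3164 / RFC 5424: a syslog datagram starts with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(rb"^<(\d{1,3})>")
SEVERITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def split_pri(data: bytes) -> Tuple[Optional[int], Optional[int], str]:
    """Return (facility, severity, message text) for one raw syslog datagram."""
    match = PRI_RE.match(data)
    if not match:
        # No PRI header: keep the whole payload as the message.
        return None, None, data.decode("utf-8", "replace").strip()
    pri = int(match.group(1))
    text = data[match.end():].decode("utf-8", "replace").strip()
    return pri // 8, pri % 8, text


def build_logger(path: str, keep_files: int = 30) -> logging.Logger:
    """Logger writing to `path`, starting a new file at midnight and keeping `keep_files` old ones."""
    handler = TimedRotatingFileHandler(path, when="midnight", backupCount=keep_files)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger("stack-syslog")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


class SyslogUDPHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        data, _sock = self.request  # for UDP servers, request is (payload bytes, socket)
        facility, severity, text = split_pri(data)
        sev_name = SEVERITY_NAMES[severity] if severity is not None else "unknown"
        self.server.logger.info("%s fac=%s sev=%s %s",
                                self.client_address[0], facility, sev_name, text)


def serve(host: str = "0.0.0.0", port: int = 5514, path: str = "switch-stack.log") -> None:
    """Listen for syslog datagrams until interrupted (this call blocks)."""
    with socketserver.UDPServer((host, port), SyslogUDPHandler) as server:
        server.logger = build_logger(path)  # the handler reads the logger from its server
        server.serve_forever()
```

Calling `serve()` blocks and appends one line per datagram; for example, a message starting with `<189>` would be recorded with facility 23 (local7) and severity 5 (notice).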

 M.




Thank you, Marce1000, we will definitely set one up.

Hello Leo,

The stack has just reloaded: all three switches at the same time, unlike before, when only two of the three were restarting.


@fgasimzade wrote:

The stack has just reloaded, all 3 switches at the same time, not like before, when just 2 switches out of 3 were being restarted


Same as before: give us the complete new output of the following commands:

  • sh version
  • dir flash-1:/core
  • dir flash-2:/core
  • dir flash-3:/core
  • dir crashinfo-1:
  • dir crashinfo-2:
  • dir crashinfo-3:
  • sh log on switch 1 up detail
  • sh log on switch 2 up detail
  • sh log on switch 3 up detail