
One stack member switch removed from stack after reload

Hi Everyone,

In a stack, a member switch (Cisco 9300-48P) went into the Removed state after a reload.

Switch#  Role    Mac Address     Priority  Version  Current State
-------------------------------------------------------------------------------------
*1       Active  4ce1.75f1.a400  1         V03      Ready
 2       Member  0000.0000.0000  0         V03      Removed

 

Is there any way to fix it without a reload?

 

This is what I see in "show logging":


%HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel DOWN!

 

Please share your thoughts!


Hello

Have you performed a physical check of the failed switch?
Does it have power, and did it even come back up?

Are the stack cables seated correctly?


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

Hello Paul

 

I have exactly the same issue. I added a second switch to my single-switch stack and it worked for 2-3 hours, with the second switch marked as STANDBY, but now I find it has been removed. Any ideas what is wrong here? The two C9200L switches are new.

 

Below is the output of "sh platform":

 

#sh platform
Switch  Ports  Model            Serial No.   MAC address     Hw Ver.  Sw Ver.
------  -----  ---------        -----------  --------------  -------  --------
 1      52     C9200L-48PXG-4X  ##########   28af.fd04.ec80  V01      17.03.03
Switch/Stack Mac Address : 28af.fd04.ec80 - Local Mac Address
Mac persistency wait time: Indefinite
                               Current
Switch#  Role    Priority      State
-------------------------------------------
*1       Active  1             Ready
 2       Member  0             Removed

 

In the log I have:

 

Aug 4 17:48:41.715: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel DOWN!
Aug 4 17:50:08.060: %SMART_LIC-3-COMM_FAILED: Communications failure with the Cisco Smart License Utility (CSLU) : Unable to resolve server hostname/domain name

Leo Laohoo
Hall of Fame

Post the complete output of the command "sh platform".

If my suspicion is correct, the stack could be running a buggy version with "stack merge" as the cause.

marce1000
VIP

 

           - Use the latest advisory software release and check whether that helps.

 M.



-- ' 'Good body every evening' ' this sentence was once spotted on a logo at the entrance of a Weight Watchers Club !

How do I find out the version of the latest advisory software release of IOS?

 

 - FYI : https://software.cisco.com/download/home/286313983/type/282046477/release/Amsterdam-17.3.3

            (Check download page, for the particular model , look for Suggested Release)

 M.




Thank you for the pointer. So the suggested release is 17.3.3 and that's exactly the version number both of my switches in the stack are running.

 

          - Besides the possible solutions offered here, the usual next step is to escalate the problem by contacting TAC.

 M.




If that 2nd switch has joined the stack, post the complete output of the following commands:

dir flash-2:
dir flash-2:core
dir crashinfo-2:
sh log on switch 2 uptime detail

I actually managed to reload that switch earlier today running "reload slot 2" and it rejoined the stack, so far so good. Here is the output you have asked for:

 

sw#dir flash-2:
Directory of flash-2:/

8106    -rw-          2097152   Aug 5 2021 12:09:30 +02:00  nvram_config_bkup
8112    -rw-          2097152   Aug 5 2021 12:09:30 +02:00  nvram_config
40481   drwx             4096   Aug 5 2021 11:10:46 +02:00  .installer
8109    -rw-              556   Aug 5 2021 11:09:30 +02:00  vlan.dat
48579   drwx             4096   Aug 5 2021 11:09:26 +02:00  license_evlog
56673   drwx             4096   Aug 5 2021 11:09:09 +02:00  .prst_sync
8107    -rw-            15139   Aug 5 2021 11:08:04 +02:00  rdope_out.txt
8105    -rw-                0   Aug 5 2021 11:08:04 +02:00  dope_hist
8108    -rw-               89   Aug 5 2021 11:08:01 +02:00  rdope.log
8102    -rw-           134458   Aug 5 2021 11:05:55 +02:00  memleak.tcl
8098    -rw-             2130   Aug 5 2021 11:05:40 +02:00  boothelper.log
80984   drwx             4096   Aug 5 2021 11:05:38 +02:00  dc_profile_dir
8099    -rw-              407   Aug 5 2021 11:05:19 +02:00  bootloader_evt_handle.log
8110    drwx             4096   Aug 5 2021 11:04:08 +02:00  .rommon_sync
8101    -rw-             2130   Aug 5 2021 10:59:05 +02:00  boothelper.log.old
40491   drwx             4096  Jun 28 2021 11:57:05 +02:00  pnp-tech
48578   drwx             4096  May 24 2021 12:17:45 +02:00  .rollback_timer
40518   -rw-         40648801  May 24 2021 12:12:24 +02:00  cat9k_lite-rpboot.17.03.03.SPA.pkg
40514   -rw-             4919  May 24 2021 12:12:24 +02:00  packages.conf
40517   -rw-         11031572  May 24 2021 12:11:00 +02:00  cat9k_lite-webui.17.03.03.SPA.pkg
40516   -rw-          4133912  May 24 2021 12:11:00 +02:00  cat9k_lite-srdriver.17.03.03.SPA.pkg
40515   -rw-        426931224  May 24 2021 12:11:00 +02:00  cat9k_lite-rpbase.17.03.03.SPA.pkg
40486   drwx             4096  May 24 2021 12:05:48 +02:00  .dbpersist
40484   drwx             4096  May 24 2021 12:02:56 +02:00  core
48580   drwx             4096  May 24 2021 12:02:36 +02:00  pnp-info
40488   drwx             4096  May 24 2021 12:02:35 +02:00  onep
89057   drwx             4096  May 24 2021 12:01:21 +02:00  .USWAP
113345  drwx             4096  May 24 2021 11:56:40 +02:00  Tbot
105249  drwx             4096  May 24 2021 11:56:39 +02:00  .CRFT
80986   drwx             4096  May 24 2021 11:56:33 +02:00  sys_report
80961   drwx             4096  May 24 2021 11:56:32 +02:00  tech_support
56676   drwx             4096  May 24 2021 11:56:32 +02:00  ss_disc
8100    -rw-          5242880  May 24 2021 11:56:32 +02:00  ssd

1956904960 bytes total (1359216640 bytes free)

sw#dir flash-2:core
Directory of flash-2:/core/

40490   -rw-                1   Aug 5 2021 14:09:26 +02:00  .callhome
64769   drwx             4096  May 24 2021 11:56:29 +02:00  modules

1956904960 bytes total (1359216640 bytes free)

sw#dir crashinfo-2:
Directory of crashinfo-2:/

29313   drwx            24576   Aug 5 2021 14:19:42 +02:00  tracelogs
15      -rw-         11033250   Aug 5 2021 10:57:35 +02:00  sw_2_RP_0-system-report_2_20210805-105728-CEST.tar.gz
14      -rw-          2932907   Aug 4 2021 19:48:30 +02:00  sw_2_RP_0_trace_archive_0-20210804-194826.tar.gz
13      -rw-          9819246   Aug 4 2021 18:00:54 +02:00  sw_2_RP_0-system-report_2_20210804-180048-CEST.tar.gz
11      -rw-          2797408   Aug 4 2021 18:00:46 +02:00  sw_2_RP_0_trace_archive_0-20210804-180041.tar.gz
12      -rw-                0  Dec 12 2020 05:36:51 +01:00  koops.dat

825753600 bytes total (751304704 bytes free)

sw#sh log on switch 2 uptime detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 05/24/2021 12:01:55
Total uptime            :  0  years  0  weeks  1  days  0  hours  43 minutes
Total downtime          :  0  years  10 weeks  2  days  1  hours  20 minutes
Number of resets        : 8
Number of slot changes  : 1
Current reset reason    : Reload Slot Command
Current reset timestamp : 08/05/2021 11:06:41
Current slot            : 2
Chassis type            : 247
Current uptime          :  0  years  0  weeks  0  days  3  hours  0  minutes
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp          | Reset                       | Uptime
MM/DD/YYYY HH:MM:SS | Reason                      | years weeks days hours minutes
--------------------------------------------------------------------------------
05/24/2021 12:01:55   Power Failure or Unknown      0     0     0     0     0  
05/24/2021 12:20:42   Image Install                 0     0     0     0     15 
05/24/2021 12:24:27   Reload Command                0     0     0     0     0  
06/28/2021 11:54:49   Power Failure or Unknown      0     0     0     0     0  
06/28/2021 15:37:30   Power Failure or Unknown      0     0     0     1     0  
08/04/2021 15:39:30   Power Failure or Unknown      0     0     0     1     0  
08/04/2021 18:03:26   stack merge                   0     0     0     1     58 
08/05/2021 11:00:07   stack merge                   0     0     0     16    0  
08/05/2021 11:06:41   Reload Slot Command           0     0     0     0     0  
--------------------------------------------------------------------------------
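
For anyone who wants to check several stack members for the same pattern, here is a rough Python sketch (not a Cisco tool) that counts the reset reasons in the "UPTIME CONTINUOUS INFORMATION" table. It assumes the row layout shown above (timestamp, reason, five uptime columns); the names `ROW`, `parse_reset_history` and `count_reasons` are made up for this example, and the sample is a subset of the rows posted above.

```python
import re
from collections import Counter

# A few rows copied from the "sh log on switch 2 uptime detail" output above.
SAMPLE = """\
05/24/2021 12:01:55   Power Failure or Unknown      0     0     0     0     0
05/24/2021 12:20:42   Image Install                 0     0     0     0     15
08/04/2021 18:03:26   stack merge                   0     0     0     1     58
08/05/2021 11:00:07   stack merge                   0     0     0     16    0
08/05/2021 11:06:41   Reload Slot Command           0     0     0     0     0
"""

# Assumed layout: timestamp, free-text reason, then years/weeks/days/hours/minutes.
ROW = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"   # MM/DD/YYYY HH:MM:SS
    r"(.+?)\s{2,}"                                  # reset reason
    r"(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"    # uptime columns
)


def parse_reset_history(text):
    """Return (timestamp, reason, uptime_tuple) for every table row found."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            timestamp, reason, *uptime = match.groups()
            rows.append((timestamp, reason, tuple(int(x) for x in uptime)))
    return rows


def count_reasons(rows):
    """Count how often each reset reason appears."""
    return Counter(reason for _, reason, _ in rows)


if __name__ == "__main__":
    for reason, count in count_reasons(parse_reset_history(SAMPLE)).most_common():
        print(f"{count:3d}  {reason}")
```

Header and separator lines are skipped simply because they do not match the row pattern, so the whole command output can be pasted in as-is.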


@MichaelBalzer77353 wrote:
08/04/2021 18:03:26   stack merge                   0     0     0     1     58 
08/05/2021 11:00:07   stack merge                   0     0     0     16    0 

Hello there, my "friend".  Nice to see you again.  (F*ck!)


@MichaelBalzer77353 wrote:
15      -rw-         11033250   Aug 5 2021 10:57:35 +02:00  sw_2_RP_0-system-report_2_20210805-105728-CEST.tar.gz
13      -rw-          9819246   Aug 4 2021 18:00:54 +02:00  sw_2_RP_0-system-report_2_20210804-180048-CEST.tar.gz

@MichaelBalzer77353,

There are several known bugs with the 9200/9200L and 9300/9300L that involve "stack merge".  This is a very "generic" message that shows up when something "blows up" in the switch stack process, like a memory leak.

I cannot guarantee anything but if you can attach the two system reports, I may be able to determine what is causing the stack merge.  Again, I cannot guarantee anything -- I do not work for Cisco or Cisco TAC and I have very simple methods of "looking" (vs analyzing) those crashinfo files.  
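
As a side note, picking the system reports out of a long "dir crashinfo-2:" listing can be scripted. The Python sketch below is only an illustration under the assumption that each entry has the columns shown in the output above (index, permissions, size, date, name) and that the report files contain "system-report" in their name; `system_reports` is a hypothetical helper name.

```python
# Sample copied from the "dir crashinfo-2:" output posted above.
SAMPLE = """\
29313   drwx            24576   Aug 5 2021 14:19:42 +02:00  tracelogs
15      -rw-         11033250   Aug 5 2021 10:57:35 +02:00  sw_2_RP_0-system-report_2_20210805-105728-CEST.tar.gz
14      -rw-          2932907   Aug 4 2021 19:48:30 +02:00  sw_2_RP_0_trace_archive_0-20210804-194826.tar.gz
13      -rw-          9819246   Aug 4 2021 18:00:54 +02:00  sw_2_RP_0-system-report_2_20210804-180048-CEST.tar.gz
12      -rw-                0  Dec 12 2020 05:36:51 +01:00  koops.dat
"""


def system_reports(dir_output):
    """Return (file_name, size_in_bytes) for each system report in the listing."""
    reports = []
    for line in dir_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank lines, "Directory of ..." and the byte totals
        permissions, size, name = fields[1], fields[2], fields[-1]
        # Regular files only (permissions start with "-"), matched by name.
        if permissions.startswith("-") and "system-report" in name:
            reports.append((name, int(size)))
    return reports


if __name__ == "__main__":
    for name, size in system_reports(SAMPLE):
        print(f"{size / 1_000_000:6.1f} MB  {name}")
```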

Yes, I joined switch #2 to the stack yesterday, but when I checked this morning it was "Removed". So earlier today I ran "reload slot 2" and now it all looks good. But I wonder how it could end up "Removed" on its own...

 


It crashed again and now I have the full log from the buffer:

 

Aug  5 15:29:15: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
Aug  5 15:29:15: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)
Aug  5 15:29:15: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)
Aug  5 15:29:15: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 2 gone DOWN!
Aug  5 15:29:15: %STACKMGR-4-SWITCH_REMOVED: Switch 1 R0/0: stack_mgr: Switch 2 has been removed from the stack.
Aug  5 15:29:15: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down
Aug  5 15:29:16: %IOSXE_REDUNDANCY-6-PEER_LOST: Active detected switch 2 is no longer standby
Aug  5 15:29:16: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel DOWN!
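
To make it easier to see which messages came first and which are the more severe ones, the buffer can be split into fields. This is a minimal Python sketch, assuming the usual "%FACILITY-SEVERITY-MNEMONIC: text" layout of the lines above, where a lower severity number means a more serious message; `MSG` and `parse_events` are names invented for this example.

```python
import re

# A few lines copied from the log buffer posted above.
LOG = """\
Aug  5 15:29:15: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
Aug  5 15:29:15: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 2 gone DOWN!
Aug  5 15:29:15: %STACKMGR-4-SWITCH_REMOVED: Switch 1 R0/0: stack_mgr: Switch 2 has been removed from the stack.
Aug  5 15:29:16: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 2: EMP_RELAY: Channel DOWN!
"""

# Timestamp, then %FACILITY-SEVERITY-MNEMONIC, then the free-text message.
MSG = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+):\s+"
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):\s+"
    r"(?P<text>.*)$"
)


def parse_events(log, max_severity=7):
    """Return one dict per message whose severity is <= max_severity."""
    events = []
    for line in log.splitlines():
        match = MSG.match(line.strip())
        if not match:
            continue  # not a syslog message line
        event = match.groupdict()
        event["severity"] = int(event["severity"])
        if event["severity"] <= max_severity:
            events.append(event)
    return events


if __name__ == "__main__":
    # Show only warnings and worse (severity 0-4), in the order they were logged.
    for event in parse_events(LOG, max_severity=4):
        print(event["ts"], event["facility"], event["mnemonic"], sep=" | ")
```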

 

        >....

Aug  5 15:29:15: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)

  Are there any other related messages around the same time, before this message (see above)?

 M. 


