3650 - Switches boot in a loop after renumbering

yleduc
Level 1

Hello all,

I have 3 new switches that were stacked and working properly. They were numbered 1 - 3. I renumbered them to 3 - 5 as they will end up in an existing stack.

After rebooting, the whole stack is stuck in a boot loop. Following someone's comment, I got to the switch: prompt, ran set, then unset STACK_1_1, and finally rebooted. After that, it boots into IOS but does not survive a reload: it goes back into the loop.

 

Any idea how to solve this?

 

==================================================

switch: set
ABNORMAL_RESET_COUNT=3
ASIC_PCI_RESET=1
BOOT=flash:packages.conf
BOOT_LOADER_UPGRADE_DISABLE=1
BSI=0
CFG_MODEL_NUM=WS-C3650-48TD-S
CLEI_CODE_NUMBER=IPMV410BRE
CSR_PCIERST_DISCONNECTED=yes
DC_COPY=yes
D_STACK_DOMAIN_NUM=1
ECI_CODE_NUMBER=469507
LICENSE_BOOT_LEVEL=ipbasek9,all:C3650_48;
MAC_ADDR=70:6D:15:C6:F7:00
MANUAL_BOOT=no
MODEL_NUM=WS-C3650-48TD
MODEL_REVISION_NUM=Q0
MOTHERBOARD_ASSEMBLY_NUM=73-15896-05
MOTHERBOARD_REVISION_NUM=A0
MOTHERBOARD_SERIAL_NUM=xxxxxxxxxxx
RANDOM_NUM=1111594249
RECOVERY_BUNDLE=sda9:cat3k_caa-recovery.bin
RET_2_RCALTS=1645049601
STACK_1_1=1_0
SWITCH_IGNORE_STARTUP_CFG=0
SWITCH_NUMBER=3
SWITCH_PRIORITY=15
SYSTEM_SERIAL_NUM=xxxxxxxxxx
TAN_NUM=800-41239-03
TAN_REVISION_NUMBER=D0
TEMPLATE=advanced
TERMLINES=0
VERSION_ID=V04

 

=========================================================

Booting...
Interface GE 0 link down***ERROR: PHY link is down
The "IP_ADDR" environment variable is not set.

Getting rest of image
Reading full image into memory...Check base package header ...: done = 16384
Getting rest of image
Reading full image into memory....done
Reading full base package into memory...: done = 30877263
Bundle Image
--------------------------------------
Kernel Address : 0x537783bc
Kernel Size : 0x368a28/3574312
Initramfs Address : 0x53ae0de4
Initramfs Size : 0x19b106b/26939499
Compression Format: mzip

Bootable image at @ ram:0x537783bc
Bootable image segment 0 address range [0x81100000, 0x81c145b0] is in range [0x80180000, 0x90000000].
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@boot_system: 623
Loading Linux kernel with entry point 0x816e5650 ...
Bootloader: Done loading app on core_mask: 0xf

### Launching Linux Kernel (flags = 0x5)

%IOSXEBOOT-38672a0c4940f675bd0f0df3c8783244-new_cksum: (rp/0): 4
%IOSXEBOOT-38672a0c4940f675bd0f0df3c8783244-saved_cksum: (rp/0): 4

Waiting for 120 seconds for other switches to boot
############
Switch number is 4
All switches in the stack have been discovered. Accelerating discovery

Chassis 4 reloading, reason - Active/standby selection failed in 1+1 Mode
Feb 16 23:02:34.812: %PMAN-5-EXITACTION: F0/0: pvp: Process manager is exiting: reload fp action requested
Feb 16 23:02:37.680: %PMAN-5-EXITACTION: R0/0: pvp: Process manager is exiting: rp processes exit with reload switch code

reboot: Restarting system

 

Booting...

==============================================================

Accepted Solutions

Fixed the issue.

 

One suggestion was to re-image the switch. I was not sure about that, since the switch would boot once the STACK_1_1 variable was removed.

 

It seems the value STACK_1_1=1_0 is tied to the switch number. After renumbering switch 1 to switch 3, the switch software could no longer find switch 1 and therefore looped. I am not sure this is the intended behaviour; it is kind of funny that the failover itself fails!
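
For anyone who captures the console output to a file, here is a minimal Python sketch for spotting this loop signature. The pattern is based only on the "Chassis 4 reloading, reason - ..." line quoted in the original post; other releases may word it differently, and the function name `find_reload_reasons` is just an illustrative choice.

```python
import re

# Pattern taken from the log line quoted in this thread.
# Assumption: other IOS-XE releases may phrase the message differently.
RELOAD_RE = re.compile(r"Chassis (\d+) reloading, reason - (.+)")


def find_reload_reasons(log_text):
    """Return (chassis_number, reason) tuples for every reload line found."""
    return [
        (int(match.group(1)), match.group(2).strip())
        for match in RELOAD_RE.finditer(log_text)
    ]


# Sample lines copied from the boot log in the original post.
sample = """\
Switch number is 4
All switches in the stack have been discovered. Accelerating discovery

Chassis 4 reloading, reason - Active/standby selection failed in 1+1 Mode
"""

for chassis, reason in find_reload_reasons(sample):
    # Flag reloads whose reason mentions the 1+1 stack mode.
    note = " (1+1 stack mode involved)" if "1+1 Mode" in reason else ""
    print(f"chassis {chassis}: {reason}{note}")
```

If the same reason shows up on every boot cycle in the capture, that would be consistent with the loop described here rather than a one-off reload.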

 

To fix the problem, I renumbered switch 3 back to 1 from ROMMON, saved it to NVRAM with the set_param command, and rebooted. It went through the boot cycle twice, the first time complaining about authentication.

 

Once I was in IOS, I issued sh switch stack-mode and noticed that the mode was 1+1. I ran switch clear stack-mode and rebooted. This time it came back as N+1. I recreated the stack and rebooted a few times. One time it came back as 1+1, so I decided to go into ROMMON on each switch, set STACK_1_1=0_0, run set_param, and reboot.
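
Since this has to be checked on every stack member, a small Python sketch like the one below could help compare the saved output of set from each switch. It assumes the output is plain VAR=value lines as in the original post; the helper names `parse_bootloader_env` and `stack_mode_warnings` are made up for illustration, and the only rule it applies is the one from this thread (STACK_1_1 should read 0_0).

```python
def parse_bootloader_env(text):
    """Parse 'VAR=value' lines from saved bootloader 'set' output into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the prompt line and anything that is not a VAR=value pair.
        if "=" not in line or line.startswith("switch:"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def stack_mode_warnings(env):
    """Flag a STACK_1_1 value other than 0_0 (the value used in this thread's fix)."""
    warnings = []
    value = env.get("STACK_1_1")
    if value is not None and value != "0_0":
        warnings.append(f"STACK_1_1={value} (this thread's fix set it to 0_0)")
    return warnings


# A few lines copied from the 'set' output in the original post.
sample = """\
switch: set
BOOT=flash:packages.conf
STACK_1_1=1_0
SWITCH_NUMBER=3
SWITCH_PRIORITY=15
"""

env = parse_bootloader_env(sample)
print("switch number:", env["SWITCH_NUMBER"])
for warning in stack_mode_warnings(env):
    print("warning:", warning)
```

This only reads text captured from the console; it does not talk to the switch, so the actual change still has to be made at the switch: prompt.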

 

After a few reboots to confirm, I renumbered switch 1 to 3, and we are back in business the way it was planned.


balaji.bandi
Hall of Fame

This looks to me like a configuration issue caused by the renumbering. Follow the thread below and see if that fixes the issue:

https://community.cisco.com/t5/switching/bootloop-c3850xs-in-vss-configuration-how-to-disable-vss-from/td-p/3192441

 

If not, I suggest breaking up the stack and checking each switch individually so you can rebuild it as a fresh stack; that should work.

 

BB


I will try removing the stack cables and rebooting the switches one at a time to see which switch is affected by this problem.

As mentioned, the unset and boot works, but only once. On the second boot (reload), we are back in the loop.

I will try the following:

 switch: STACK_1_1=0_0

 switch:boot

 

switch clear stack-mode

no stackwise-virtual 

 

and see if that makes a difference. I will keep you posted.


Leo Laohoo
Hall of Fame

@yleduc wrote:

Feb 16 23:02:34.812: %PMAN-5-EXITACTION: F0/0: pvp: Process manager is exiting: reload fp action requested
Feb 16 23:02:37.680: %PMAN-5-EXITACTION: R0/0: pvp: Process manager is exiting: rp processes exit with reload switch


Funny, I just finished fixing two brand-new (fresh out of the box) 9300-48H switches with similar behaviour.

 

NOTE: This is a very well-known issue with the IOS-XE platform.

 

To fix this, a copy of the firmware file must be on a USB thumb drive. Here are the steps:

  1. Remove the power cable. 
  2. Remove all stacking cables. 
  3. Attach a console cable. 
  4. While holding down the Mode button, apply power.  
  5. Keep holding down the Mode button for about 15 seconds and then let go of the Mode button. 
  6. Look at the switch prompt. The switch should have booted into ROMMON.
  7. If in ROMMON, proceed.  If not, start again. 
  8. In ROMMON, enter the command to force the switch to un-pack the firmware from the USB thumb drive:  emergency-install usbflash0:IOS_FILENAME.bin
  9. Rinse-n-Repeat until all switch members have recovered.  

Hope this helps.
