01-13-2020 06:58 AM - edited 01-13-2020 07:52 AM
When specifying a file path for our config archive, the standby and member switches reboot. While the active switch stays up, the other switches in the stack reboot as soon as the command is entered. If the command is not removed before the switches complete their reboot sequence, they boot into ROMmon.
Is this expected behavior, a config issue, or a bug? Any help is greatly appreciated. Thanks!
Hardware and versions of tested stacks.
Bootloader: 16.12.2r
IOS: 16.12.2 and 16.9.4
9300-48A-UXM
9300-48A-P
Here are the commands entered:
(config)# archive
(config-archive)# path flash:/Netops/Rollback/
Prior to entering the commands:
Switch# Role Mac Address Priority Version State
-------------------------------------------------------------------------------------
*1 Active XXXX.XXXX.XXXX 15 V02 Ready
2 Standby XXXX.XXXX.XXXX 5 V02 Ready
3 Member XXXX.XXXX.XXXX 1 V02 Ready
Immediately After:
Labs-1FL-SS#sh switch
Switch/Stack Mac Address : XXXX.XXXX.XXXX - Local Mac Address
Mac persistency wait time: Indefinite
H/W Current
Switch# Role Mac Address Priority Version State
-------------------------------------------------------------------------------------
*1 Active XXXX.XXXX.XXXX 15 V02 Ready
2 Member 0000.0000.0000 0 V02 Removed
3 Member 0000.0000.0000 0 V02 Removed
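For anyone wanting to check this state across many stacks, here is a minimal sketch that pulls the switch number, role, and state out of `show switch` text and lists members that are not `Ready`. The function names and the regex are illustrative assumptions based only on the row layout shown above; rows with a different column layout would need adjusting.

```python
import re

# Matches rows such as "*1 Active XXXX.XXXX.XXXX 15 V02 Ready".
# Assumption: six whitespace-separated columns, as in the output above.
ROW = re.compile(
    r"^\s*\*?(?P<num>\d+)\s+(?P<role>\S+)\s+(?P<mac>\S+)\s+"
    r"(?P<priority>\d+)\s+(?P<version>\S+)\s+(?P<state>\S+)\s*$"
)


def parse_show_switch(output):
    """Return one dict per stack member row found in the text."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            members.append({
                "switch": int(match["num"]),
                "role": match["role"],
                "state": match["state"],
            })
    return members


def not_ready(output):
    """Return the switch numbers whose state is anything other than Ready."""
    return [m["switch"] for m in parse_show_switch(output) if m["state"] != "Ready"]


SAMPLE = """\
Switch# Role Mac Address Priority Version State
-------------------------------------------------------------------------------------
*1 Active XXXX.XXXX.XXXX 15 V02 Ready
2 Member 0000.0000.0000 0 V02 Removed
3 Member 0000.0000.0000 0 V02 Removed
"""

print(not_ready(SAMPLE))  # [2, 3]
```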
01-16-2020 08:59 AM
01-13-2020 07:27 AM
01-13-2020 07:38 AM
01-13-2020 07:51 AM
Thanks. Yes, that was just a typo in the post here.
01-13-2020 07:50 AM
Yeah, I could not find any bugs. Here is the log output.
001288: .Jan 9 21:03:04.818: Config Sync: Bulk-sync failure due to PRC mismatch. Please check the full list of PRC failures via:
show redundancy config-sync failures prc
001289: .Jan 9 21:03:04.818: Config Sync: Starting lines from PRC file:
archive
! <submode> "archive"
- path flash:/NetOps/Rollback
! </submode> "archive"
001290: .Jan 9 21:03:04.818: Config Sync: Bulk-sync failure, Reloading Standby
001291: .Jan 9 21:03:05.825: %RF-5-RF_TERMINAL_STATE: Terminal state reached for (SSO)
001292: .Jan 9 21:03:06.274: %RF-5-RF_RELOAD: Peer reload. Reason: Bulk Sync Failure
001293: .Jan 9 21:03:06.644: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 3 gone DOWN!
001294: .Jan 9 21:03:06.657: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
001295: .Jan 9 21:03:06.657: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)
001296: .Jan 9 21:03:06.657: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)
001297: .Jan 9 21:03:06.578: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 2 on Switch 1 is down
001298: .Jan 9 21:03:06.620: %STACKMGR-6-STACK_LINK_CHANGE: Switch 2 R0/0: stack_mgr: Stack port 1 on Switch 2 is down
001299: .Jan 9 21:03:08.149: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down
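The log above names the config line that failed to sync. As a rough sketch, the snippet below extracts those lines from a saved log, assuming the leading `- ` inside the "Starting lines from PRC file" block marks the line the standby rejected (that is how this log reads, but it is an inference) and that the block ends at the next sequence-numbered log line. The function name is hypothetical.

```python
import re

# A sequence-numbered syslog line, e.g. "001290: .Jan 9 21:03:04.818: ...".
LOG_LINE = re.compile(r"^\d+:\s")


def prc_mismatch_lines(log_text):
    """Return config lines flagged with a leading '- ' in the PRC block."""
    flagged = []
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if "Starting lines from PRC file" in line:
            in_block = True
            continue
        if not in_block:
            continue
        if LOG_LINE.match(line):
            # The next numbered log message closes the PRC block.
            in_block = False
        elif line.startswith("- "):
            flagged.append(line[2:].strip())
    return flagged


SAMPLE_LOG = """\
001289: .Jan 9 21:03:04.818: Config Sync: Starting lines from PRC file:
archive
! <submode> "archive"
- path flash:/NetOps/Rollback
! </submode> "archive"
001290: .Jan 9 21:03:04.818: Config Sync: Bulk-sync failure, Reloading Standby
"""

print(prc_mismatch_lines(SAMPLE_LOG))  # ['path flash:/NetOps/Rollback']
```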
01-13-2020 08:33 AM
01-16-2020 08:25 AM
Hi,
I just had this issue.
What I did, and it seems to fix the issue, was to create the archive folder on each switch's flash.
If I created it only on flash:, it would reboot the other members, since the other stack members can't save to their local directory as it doesn't exist.
Try creating the directories on the other switches.
Example:
mkdir flash-1:Netops/Rollback/
mkdir flash-2:Netops/Rollback/
mkdir flash-3:Netops/Rollback/
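For larger stacks, the per-member commands can be generated rather than typed. This is only a sketch: `mkdir_commands` is a hypothetical helper, it assumes every component of the archive path is a directory, and it emits one `mkdir` per level on the assumption that parent directories may need to exist first. The CLI may also prompt for confirmation on each command.

```python
def mkdir_commands(members, archive_path):
    """Build mkdir commands for each stack member's local flash.

    members:      iterable of stack member numbers, e.g. [1, 2, 3]
    archive_path: the configured archive path, e.g. "flash:/Netops/Rollback/"
    """
    prefix, sep, rest = archive_path.partition(":")
    if not sep or not prefix:
        raise ValueError("expected a path like 'flash:/dir/subdir/'")
    # Assumption: all path components are directories (no filename prefix).
    parts = [p for p in rest.split("/") if p]
    commands = []
    for number in members:
        for depth in range(1, len(parts) + 1):
            # One command per level: parent first, then the nested directory.
            commands.append(f"mkdir {prefix}-{number}:" + "/".join(parts[:depth]))
    return commands


for command in mkdir_commands([1, 2, 3], "flash:/Netops/Rollback/"):
    print(command)
```

With members 1 to 3 this prints six commands, starting with `mkdir flash-1:Netops` and `mkdir flash-1:Netops/Rollback`. Running the same list again when a new member joins the stack would cover the case of a switch added later.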
01-16-2020 08:59 AM
11-23-2020 01:16 PM - edited 11-23-2020 01:17 PM
This problem appeared after I implemented config archiving on Friday on about 20 switches. Five of those were stacks (sets of 2-4 switches). Four of the stacks were running IOS 16.12.4, and two of those had this issue (one stack of 2x 9300s and one of 2x 3850s). The fifth is a stack of 4x 3750s running 12.2(44)SE5, and it had no issues. After adding the archive folder to the other switches in each stack, I have not had any stacks reload the standby switches into ROMmon.
11-03-2023 01:33 PM
This issue is still present on c9300, c9200, c3850, and c3650. It looks code dependent: it hit almost all of our c9300s running 17.3.4 and 17.6.5, but not all c9200s and only some of the c3850s and c3650s, which are running multiple legacy code versions.
The workaround is to create the directory on each stack member's flash, or to not use a folder at all in the archive path configuration (just store it in the root folder). Still, what would happen if you connect a new switch to a stack with archive configured? Since the folder does not exist on the new unit, it would most likely end up in ROMmon anyway.
I can't believe such a catastrophic issue has not been addressed in years. You can literally break your whole network with configuration that relates only to the management plane, and since the only remediation is booting manually from ROMmon via console, it may take a very long time.
04-15-2024 12:50 PM - edited 04-15-2024 12:51 PM
Hello,
Just got hit by this bug as well, on a 9200L-48P-4X running version 17.9.4.
Luckily I saw it on the first stack we implemented. I rolled back with "no archive" and the stack recovered (luckily it didn't boot into ROMmon, so it recovered automatically after some reboots).
But I have the same remark as above: what happens when config archive is deployed in a directory and you add a switch to the stack later on? It won't be stable until you add the directory manually. And because I feel implementing config archive in the root is messy, I am going to stop using archive. Too bad.
07-12-2024 12:00 PM
This bug is still alive and kicking stack members into a reboot loop.
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvc49871
We've just hit the bug on July 5th 2024.
I'm both surprised and disappointed that Cisco has done nothing to fix this.
They are definitely declining in terms of quality and reliability.