09-28-2018 09:50 AM - edited 03-08-2019 04:16 PM
I have been upgrading stacks of 2960x switches across our environment from 15.0.X to 15.2.6E1 and I have been experiencing the same issue repeatedly.
After loading the new IOS on the switch stack and reloading (from either console or vty), the stack drops a member and reports its stack cables as DOWN/DOWN, and you lose remote control of the dropped member. Reseating the stack cables does not affect the dropped member; the only way the member can rejoin the stack is after a power cycle. Obviously this is a serious complication, since physical intervention is required to remediate the issue.
I have upgraded a few dozen stacks in the past few months and have seen this issue over 10 times now. Has anyone else experienced this before?
I have found a workaround for this: if you change the boot variable on all switches and then reload each member's slot from the master, the members will boot and join the stack in VERSION MISMATCH mode. Then you can reload the master and the stack will converge on the new IOS version. I have only tested this in a non-production environment though... just kind of a clunky workaround :/
09-28-2018 12:14 PM
Can you connect a laptop to the console and post the full boot output?
09-28-2018 02:01 PM
I can't provide any console output or logs related to this issue. I will say that, observing the console output from the stack master, the election timer expires, the master is elected, and the switches join the stack *except* for the affected member, which drops its membership and reports a broken stack ring.
09-28-2018 02:12 PM
How many switches are in the stack? Do you have priority set for the master and the rest of the members?
09-28-2018 07:09 PM - edited 09-28-2018 07:11 PM
@HungryDog100 wrote:
I have been upgrading stacks of 2960x switches across our environment from 15.0.X to 15.2.6E1 and I have been experiencing the same issue repeatedly.
Is this an upgrade from 15.0(2)EX3 or earlier? If so, there is a ROMmon bug, CSCut90593, which requires the power cable to the unit to be pulled (the "reload" command is not sufficient).
09-29-2018 04:28 AM
Hi,
This looks like bug CSCut90593 or CSCuu00752. Is the uplink connected using an SFP? If so, try removing the SFP and hard rebooting the switch.
If possible, please share the console logs of the boot process.
Regards,
Deepak Kumar
09-29-2018 06:48 AM
We have asked before for the console logs of the boot process. He commented that he cannot provide them at this moment; we don't know the reason.
10-01-2018 09:53 AM
I have encountered the bug CSCuu00752 on another switch in my environment, and that was very obvious from the logs claiming "POST: ACT2 Authentication : End, Status Passed FlexStack Module SmartChip Authentication Failed"
This issue is only marked by the dropped member in the stack and logs that claim the stack ring has been broken; no errors were observed in the console output during boot.
CSCut90593 does not apply in this instance because the affected switch will boot itself every time, but fail to join the stack and report DOWN/DOWN stack cables.
10-03-2018 01:05 PM
Attempting one last bump at this. Someone has got to have experienced this same issue...?
09-29-2018 07:25 AM
10-01-2018 09:40 AM
For my method of upgrading the stacks:
I use the .bin file and FTP it to the master switch, then I copy the .bin from the master's flash to each member's flash.
Afterward, I use "boot system switch all flash:IOS-NAME.bin", then write the config and reload the stack from either SSH or console. I have seen this issue arise from both SSH and console sessions.
As I said in the initial post, the workaround I have discovered is to reload members individually to force a VERSION MISMATCH status, then reload the master, which allows all switches in the stack to converge as READY with the new IOS version. I have not had an issue with a member retrieving or loading the new IOS.
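The workaround sequence described above can be sketched as a small Python helper. It only builds the ordered list of IOS commands and does not connect to any device. The function name `staged_upgrade_commands`, the image name, and the member numbers are hypothetical placeholders, and `reload slot <n>` is assumed to be the per-member reload the post refers to.

```python
def staged_upgrade_commands(image, master, members):
    """Return the IOS commands for the staged-reload workaround, in order.

    image   -- image file name already copied to flash (placeholder name)
    master  -- stack member number of the current master
    members -- stack member numbers of every switch in the stack
    """
    if master not in members:
        raise ValueError("master must be one of the stack members")

    # Point every switch in the stack at the new image and save the config.
    commands = [
        "configure terminal",
        f"boot system switch all flash:{image}",
        "end",
        "write memory",
    ]
    # Reload each non-master member on its own. Per the post, each one
    # rejoins in VERSION MISMATCH state while the master runs the old image.
    # (Waiting for each member to rejoin is left to the operator.)
    for slot in sorted(m for m in members if m != master):
        commands.append(f"reload slot {slot}")
    # Reload the master last so the stack converges on the new image.
    commands.append(f"reload slot {master}")
    return commands


if __name__ == "__main__":
    for line in staged_upgrade_commands("IOS-NAME.bin", master=1, members=[1, 2, 3]):
        print(line)
```

Keeping the master for last mirrors the order in the post; the sketch makes no claim about why that order avoids the dropped member.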
10-03-2018 03:32 PM
I have just encountered this error again in a 2-switch stack. Sw1 was the stack master and sw2 was a member. After changing the boot variable and reloading the stack, the following is observed:
Switch/Stack Mac Address : 204c.9e2f.8a80
                                           H/W      Current
Switch#  Role    Mac Address     Priority  Version  State
----------------------------------------------------------
 1       Member  0000.0000.0000  0         0        Provisioned
*2       Master  204c.9e2f.8a80  10        4        Ready

         Stack Port Status        Neighbors
Switch#  Port 1     Port 2        Port 1   Port 2
--------------------------------------------------------
 2       Down       Down          None     None
Switch 1 booted its new IOS .bin but dropped from the stack, and the stack cables are reporting down/down. What gives? I cannot be the only individual seeing this type of behavior; I have encountered this dozens of times now.
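For anyone checking many stacks after a reload, output like the above can be scanned automatically. The following is a minimal Python sketch, assuming the output has the column layout shown in this post. The function name `find_stack_problems` is hypothetical, and treating any state other than "Ready" or any port status other than "Ok" as a problem is an assumption.

```python
import re

# Member table row: optional "*", switch number, role, MAC, priority,
# H/W version, then the state (which may contain spaces).
MEMBER_RE = re.compile(
    r"^\s*\*?\s*(\d+)\s+(\S+)\s+((?:[0-9a-f]{4}\.){2}[0-9a-f]{4})"
    r"\s+(\d+)\s+(\S+)\s+(.+?)\s*$",
    re.IGNORECASE,
)
# Stack port row: switch number, port 1 status, port 2 status, two neighbors.
PORT_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")

SAMPLE = """\
Switch/Stack Mac Address : 204c.9e2f.8a80
H/W Current
Switch# Role Mac Address Priority Version State
----------------------------------------------------------
1 Member 0000.0000.0000 0 0 Provisioned
*2 Master 204c.9e2f.8a80 10 4 Ready
Stack Port Status Neighbors
Switch# Port 1 Port 2 Port 1 Port 2
--------------------------------------------------------
2 Down Down None None
"""


def find_stack_problems(show_switch_output):
    """Return a list of human-readable problems found in the output."""
    problems = []
    for line in show_switch_output.splitlines():
        member = MEMBER_RE.match(line)
        if member:
            number, state = int(member.group(1)), member.group(6)
            if state.lower() != "ready":
                problems.append(f"switch {number} state is {state}")
            continue
        port = PORT_RE.match(line)
        if port:
            number = int(port.group(1))
            statuses = (("port 1", port.group(2)), ("port 2", port.group(3)))
            for label, status in statuses:
                if status.lower() != "ok":
                    problems.append(f"switch {number} stack {label} is {status}")
    return problems


if __name__ == "__main__":
    for problem in find_stack_problems(SAMPLE):
        print(problem)
```

Run against the output in this post, the sketch flags switch 1 as Provisioned and both stack ports on switch 2 as Down.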
06-03-2020 09:21 AM
I have had this problem too (multiple times), even up through version 15.2(7)E0a.
If you can monitor the console port, you will see this error:
000071: Jun 2 22:40:39.898 PDT: %ILET-1-DEVICE_AUTHENTICATION_FAIL: The FlexStack Module inserted in this switch may not have been manufactured by Cisco or with Cisco's authorization. If your use of this product is the cause of a support issue, Cisco may deny operation of the product, support under your warranty or under a Cisco technical support program such as Smartnet. Please contact Cisco's Technical Assistance Center for more information
My resolution is to cold boot the switch (per the bugs listed in this link).
Issue 2
The hardware defect in Cisco bug ID CSCuu00752 applies specifically to the FlexStack Plus module (C2960X-STACK=) only. These errors might be seen when a 2960X is booted up with an affected Flexstack Plus module. Note that this issue affects less than 0.03% of the install base.
Solution
Issue 1
In order to resolve Cisco bug IDs CSCul88801, CSCur56395, and CSCut53599, upgrade the software to release 15.2(2)E4, 15.2(3)E3, or 15.2(4)E or later and then hard boot (unplug the power cable in order to power off/on the switch) the switch. If a switch stack is in use, hard boot each switch in the stack. If RPS is in use, hard boot the RPS as well.
Why a One Time Hard Boot is Required
This issue has to do with the internal i2c bus getting into a bad state. The releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later images have the fix, but the switch might require a hard boot (unplug the power cable in order to power off/on the switch) in order to reset power to the bus if the bus was already in the bad state prior to the upgrade. The code upgrade procedure itself initiates a soft boot in order to load the image, but the bus maintains power through that process so an existing bad bus state might not get cleared. Once the issue is cleared after the hard boot, releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later will ensure it does not come back in the future even during future reloads or power outages. If a switch has not yet experienced the issue, then an upgrade to releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later without a hard boot will be enough to avoid the issue in the future.
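The release rule quoted above can be written down as a small check. This is a minimal Python sketch under one reading of that sentence: 15.2(2)E4 or a later 15.2(2)E rebuild, 15.2(3)E3 or a later 15.2(3)E rebuild, or 15.2(4)E and later. The function names are hypothetical, and the sketch returns None for trains or release families it cannot judge. As the text notes, a switch already in the bad state may still need a hard boot even on a fixed release.

```python
import re

# e.g. "15.2(6)E1" -> 15, 2, 6, "E", "1", ""; "15.2(7)E0a" -> ..., "0", "a"
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)([A-Z]+)(\d*)([a-z]*)$")


def parse_release(release):
    """Split an IOS release string into comparable parts."""
    match = VERSION_RE.match(release.strip())
    if not match:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, maint, train, rebuild, suffix = match.groups()
    return int(major), int(minor), int(maint), train, int(rebuild or 0), suffix


def is_listed_fixed_release(release):
    """True/False for releases this sketch can judge, None otherwise."""
    major, minor, maint, train, rebuild, _suffix = parse_release(release)
    if (major, minor) < (15, 2):
        return False  # older than every release named in the text
    if (major, minor) > (15, 2) or train != "E":
        return None  # outside the 15.2 E train: not covered by this sketch
    if maint == 2:
        return rebuild >= 4
    if maint == 3:
        return rebuild >= 3
    return maint >= 4


if __name__ == "__main__":
    for name in ("15.0(2)EX3", "15.2(2)E3", "15.2(6)E1"):
        print(name, is_listed_fixed_release(name))
```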
06-03-2020 06:29 PM
@gwarn wrote:
%ILET-1-DEVICE_AUTHENTICATION_FAIL
If you're getting this error message, RMA the switch. The "one time hard reboot" doesn't work most of the time. The problem will re-surface at a later date.
This is a known issue: FN - 63972 - Cisco C2960X-STACK= Module Might Display Authentication Failure Error Message
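To check saved console logs for the two authentication-failure signatures quoted in this thread, a minimal Python sketch follows. The function name `find_auth_failures` and the labels are hypothetical; the two search strings are copied from the messages posted above, and no other message formats are assumed.

```python
import re

# Signatures taken verbatim from the log excerpts quoted in this thread.
SIGNATURES = {
    "ILET device authentication failure": re.compile(
        r"%ILET-1-DEVICE_AUTHENTICATION_FAIL"
    ),
    "FlexStack SmartChip authentication failure": re.compile(
        r"FlexStack Module SmartChip Authentication Failed"
    ),
}


def find_auth_failures(log_text):
    """Return (line_number, label) pairs for every matching log line."""
    hits = []
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((line_number, label))
    return hits


if __name__ == "__main__":
    example = (
        "000071: Jun 2 22:40:39.898 PDT: %ILET-1-DEVICE_AUTHENTICATION_FAIL: "
        "The FlexStack Module inserted in this switch may not have been "
        "manufactured by Cisco or with Cisco's authorization.\n"
    )
    print(find_auth_failures(example))
```

A stack that drops a member without either signature in its boot log matches the original poster's case rather than the FlexStack module defect.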