
2960x stack members drop after IOS upgrade/reboot

HungryDog100
Level 1

I have been upgrading stacks of 2960x switches across our environment from 15.0.X to 15.2.6E1 and I have been experiencing the same issue repeatedly.

 

After loading the new IOS on the switch stack and reloading (from either the console or a vty session), the stack drops a member, reports that member's stack cables as DOWN/DOWN, and I lose remote control of the dropped member. Reseating the stack cables does not affect the dropped member; the only way it can rejoin the stack is after a power cycle. Obviously this is a serious complication, since remediating it requires physical intervention.

 

I have upgraded a few dozen stacks in the past few months and have seen this issue more than 10 times now. Has anyone else experienced this?

 

I have found a workaround for this: if you change the boot variable on all switches and then reload each member's slot from the master, the members boot and join the stack in VERSION MISMATCH mode. You can then reload the master, and the stack will converge on the new IOS version. I have only tested this in a non-production environment, though... it's a clunky workaround :/
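For anyone who wants to script this, here is a rough Python sketch (an illustration, not something posted in the thread) that only lists the workaround's CLI commands in order; it does not connect to any switch. The image filename and stack numbering are placeholders, and using `reload slot` with the master's number for the last step is an assumption: the post only says "reload the master".

```python
def workaround_plan(image: str, master: int, stack: list[int]) -> list[str]:
    """List, in order, the CLI commands of the VERSION MISMATCH workaround.

    Nothing is sent to a switch. `image` is a placeholder filename and is
    assumed to already be on every member's flash.
    """
    plan = [
        f"boot system switch all flash:{image}",  # new boot variable on every switch
        "write memory",                            # save it before reloading anything
    ]
    # Reload each non-master member on its own, from the master. Per the post,
    # each one boots the new image and rejoins in VERSION MISMATCH mode.
    for sw in stack:
        if sw != master:
            plan.append(f"reload slot {sw}")
            plan.append("show switch")  # confirm the member rejoined before the next one
    # Last step: reload the master so the stack converges on the new image.
    # Assumption: `reload slot <master>`; the post only says "reload the master".
    plan.append(f"reload slot {master}")
    return plan


for command in workaround_plan("NEW-IMAGE.bin", master=1, stack=[1, 2, 3]):
    print(command)
```

Reloading members one at a time is slower than a single stack-wide reload, but it lets you confirm each member rejoined before moving on.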

13 Replies

balaji.bandi
Hall of Fame

Can you connect a laptop to the console and post the full boot process?

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

I can't provide any console output or logs related to this issue. I will say that, watching the console output from the stack master, the election timer expires, the master is elected, and the switches join the stack *except* for the affected member, which drops its membership and reports a broken stack ring.

How many switches are in the stack? Do you have a priority set for the master and the rest of the members?

 

BB


Leo Laohoo
Hall of Fame

@HungryDog100 wrote:

I have been upgrading stacks of 2960x switches across our environment from 15.0.X to 15.2.6E1 and I have been experiencing the same issue repeatedly. 



Is this from 15.0(2)EX3 (or earlier)? If so, there is a ROMmon bug, CSCut90593, that requires the power cable to be pulled from the unit (the "reload" command is not sufficient).

Deepak Kumar
VIP Alumni

Hi,

This looks like bug CSCut90593 or CSCuu00752. Is the uplink connected using an SFP? If so, try removing the SFP and hard rebooting the switch.

If possible, share the console logs of the boot process.

 

Regards,

Deepak Kumar

 


We already asked for the console logs of the boot process; he said he cannot provide them at the moment, and we don't know why.

BB


I have encountered bug CSCuu00752 on another switch in my environment, and it was very obvious from the logs, which reported: "POST: ACT2 Authentication : End, Status Passed FlexStack Module SmartChip Authentication Failed"

 

This issue shows up only as the dropped stack member and logs reporting that the stack ring has been broken; no errors were observed in the console output during boot.

 

CSCut90593 does not apply in this instance because the affected switch boots itself every time, but fails to join the stack and reports DOWN/DOWN stack cables.

Attempting one last bump at this. Someone has got to have experienced this same issue...?

Hello,

You don't state how you are upgrading these stacks. Via what method? Have you tried a manual approach, such as copying the .bin file to each switch, changing the boot variable for the whole stack, and then reloading the whole stack?

Do you have auto-upgrade enabled? If so, try disabling this feature:

sh boot
no boot auto-copy-sw

Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

For my method of upgrading the stacks:

 

I FTP the .bin file to the master switch, then copy the .bin from the master's flash to each member's flash.

 

Afterwards, I use "Boot system switch all flash:IOS-NAME.Bin", write the config, and reload the stack from either SSH or the console. I have seen this issue arise from both SSH and console sessions.
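The same method written out as an ordered command list, as a hypothetical Python sketch that sends nothing to a switch. It assumes member N's flash is addressed from the master as `flashN:` and uses a placeholder image name; neither detail comes from the post.

```python
def standard_upgrade_plan(image: str, master: int, stack: list[int]) -> list[str]:
    """List, in order, the CLI commands of the upgrade method described above.

    Assumptions: `image` has already been FTP'd to the master's flash, and
    member N's flash is reachable from the master as `flashN:`.
    """
    plan = [
        # Copy the .bin from the master's flash to each member's flash.
        f"copy flash:{image} flash{sw}:{image}"
        for sw in stack
        if sw != master
    ]
    plan += [
        f"boot system switch all flash:{image}",  # one boot variable for all switches
        "write memory",                            # write the config
        "reload",                                  # reload the whole stack (SSH or console)
    ]
    return plan


for command in standard_upgrade_plan("NEW-IMAGE.bin", master=1, stack=[1, 2]):
    print(command)
```

The final stack-wide `reload` is the step after which a member drops in this thread, so checking `show switch` once the stack is back is a sensible addition.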

 

As I said in the initial post, the workaround I have discovered is to reload members individually to force a VERSION MISMATCH status, then reload the master, which allows all switches in the stack to converge as READY on the new IOS version. I have not had an issue with a member retrieving or loading the new IOS.

HungryDog100
Level 1

I have just encountered this error again in a 2-switch stack. Sw1 was the stack master and Sw2 was a member. Upon reloading the stack after changing the boot variable, I observed the following:

 

Switch/Stack Mac Address : 204c.9e2f.8a80
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 0000.0000.0000     0      0       Provisioned         
*2       Master 204c.9e2f.8a80     10     4       Ready               

         Stack Port Status             Neighbors     
Switch#  Port 1     Port 2           Port 1   Port 2
--------------------------------------------------------
  2       Down       Down             None     None
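To spot this condition without eyeballing the table, here is a minimal Python sketch that parses output shaped like the sample above and reports members not in the Ready state and stack ports reported as Down. The regular expressions are fitted to this one sample (the `Ok` keyword for a healthy port in the test is an assumption, since the thread only shows `Down`), so treat them as a starting point rather than a robust parser.

```python
import re

# Member-table row, e.g. "*2       Master 204c.9e2f.8a80     10     4       Ready"
MEMBER_ROW = re.compile(
    r"^\s*\*?(?P<num>\d+)\s+(?P<role>Master|Member|Standby)\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+"
    r"(?P<prio>\d+)\s+(?P<hw>\S+)\s+(?P<state>\S.*?)\s*$"
)
# Stack-port row, e.g. "  2       Down       Down             None     None"
PORT_ROW = re.compile(
    r"^\s*(?P<num>\d+)\s+(?P<p1>\w+)\s+(?P<p2>\w+)\s+(?P<n1>\S+)\s+(?P<n2>\S+)\s*$"
)


def stack_problems(output: str) -> list[str]:
    """Return human-readable problems found in `show switch`-style output."""
    problems = []
    for line in output.splitlines():
        member = MEMBER_ROW.match(line)
        if member:
            if member["state"] != "Ready":
                problems.append(f"switch {member['num']} state is {member['state']}")
            continue
        port = PORT_ROW.match(line)
        if port and "down" in (port["p1"].lower(), port["p2"].lower()):
            problems.append(
                f"switch {port['num']} stack ports {port['p1']}/{port['p2']} "
                f"(neighbors {port['n1']}/{port['n2']})"
            )
    return problems


SAMPLE = """\
Switch/Stack Mac Address : 204c.9e2f.8a80
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 0000.0000.0000     0      0       Provisioned
*2       Master 204c.9e2f.8a80     10     4       Ready

         Stack Port Status             Neighbors
Switch#  Port 1     Port 2           Port 1   Port 2
--------------------------------------------------------
  2       Down       Down             None     None
"""

for problem in stack_problems(SAMPLE):
    print(problem)
```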

 

Switch 1 booted its new IOS .bin, but dropped from the stack and the stack cables are reporting down/down. What gives? I cannot be the only individual seeing this type of behavior, I have encountered this dozens of times now.

gwarn
Level 1

I have had this problem too (multiple times), even up through version 15.2(7)E0a.

If you monitor the console port, you'll see this error:

000071: Jun 2 22:40:39.898 PDT: %ILET-1-DEVICE_AUTHENTICATION_FAIL: The FlexStack Module inserted in this switch may not have been manufactured by Cisco or with Cisco's authorization. If your use of this product is the cause of a support issue, Cisco may deny operation of the product, support under your warranty or under a Cisco technical support program such as Smartnet. Please contact Cisco's Technical Assistance Center for more information
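A small Python sketch for scanning captured console logs for the two CSCuu00752 signatures quoted in this thread (the POST SmartChip message and the %ILET-1-DEVICE_AUTHENTICATION_FAIL message). The function name and the idea of feeding it a list of log lines are illustrative assumptions. An empty result would be consistent with the original poster's case, where no such errors appeared at boot.

```python
# Signatures of the FlexStack Plus authentication failure (Cisco bug ID CSCuu00752),
# copied from the messages quoted in this thread.
FLEXSTACK_AUTH_SIGNATURES = (
    "FlexStack Module SmartChip Authentication Failed",
    "%ILET-1-DEVICE_AUTHENTICATION_FAIL",
)


def flexstack_auth_failures(log_lines):
    """Return the log lines that contain one of the known signatures."""
    return [
        line for line in log_lines
        if any(sig in line for sig in FLEXSTACK_AUTH_SIGNATURES)
    ]


boot_log = [
    "POST: ACT2 Authentication : End, Status Passed FlexStack Module SmartChip Authentication Failed",
    "000071: Jun 2 22:40:39.898 PDT: %ILET-1-DEVICE_AUTHENTICATION_FAIL: The FlexStack Module "
    "inserted in this switch may not have been manufactured by Cisco",
    "placeholder line standing in for unrelated console output",
]

for hit in flexstack_auth_failures(boot_log):
    print(hit)
```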

 

My resolution is to cold boot the switch (per the bugs listed in this link):

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-2960-x-series-switches/118837-technote-catalyst-00.html

 Issue 2

The hardware defect in Cisco bug ID CSCuu00752 applies specifically to the FlexStack Plus module (C2960X-STACK=) only. These errors might be seen when a 2960X is booted up with an affected Flexstack Plus module. Note that this issue affects less than 0.03% of the install base.

  • "POST: ACT2 Authentication : End, Status Passed FlexStack Module SmartChip Authentication Failed". Note that POST is the "Power On Self Test" that is run when the switch boots up. ACT2 is the Smart Chip responsible for hardware authentication.
  • "%ILET-1-DEVICE_AUTHENTICATION_FAIL: The FlexStack Module inserted in this switch might not have been manufactured by Cisco or with Cisco's authorization. If your use of this product is the cause of a support issue, Cisco might deny operation of the product, support under your warranty or under a Cisco technical support program such as Smartnet. Please contact Cisco's Technical Assistance Center (TAC) for more information."

 Solution

 Issue 1

In order to resolve Cisco bug IDs CSCul88801, CSCur56395, and CSCut53599, upgrade the software to release 15.2(2)E4, 15.2(3)E3, or 15.2(4)E or later and then hard boot (unplug the power cable in order to power off/on the switch) the switch. If a switch stack is in use, hard boot each switch in the stack. If RPS is in use, hard boot the RPS as well.

 Why a One Time Hard Boot is Required

This issue has to do with the internal i2c bus getting into a bad state. The releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later images have the fix, but the switch might require a hard boot (unplug the power cable in order to power off/on the switch) in order to reset power to the bus if the bus was already in the bad state prior to the upgrade. The code upgrade procedure itself initiates a soft boot in order to load the image, but the bus maintains power through that process so an existing bad bus state might not get cleared. Once the issue is cleared after the hard boot, releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later will ensure it does not come back in the future even during future reloads or power outages. If a switch has not yet experienced the issue, then an upgrade to releases 15.2(2)E4, 15.2(3)E3, and 15.2(4)E or later without a hard boot will be enough to avoid the issue in the future.
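Whether a given release is at or past the fixed releases listed above (for Issue 1: CSCul88801, CSCur56395, CSCut53599) can be sketched in Python as follows; the original poster's "15.2.6E1" corresponds to 15.2(6)E1. The parsing assumes the usual `15.2(x)E<rebuild>` string format. Treating any release above 15.2 as fixed, and any 15.2 train other than E as not fixed, are assumptions, not statements from the tech note.

```python
import re

# e.g. "15.2(6)E1" -> major 15, minor 2, release 6, train "E", rebuild "1"
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)([A-Z]+)(\d*)")


def has_i2c_fix(version: str) -> bool:
    """True if `version` is at/after 15.2(2)E4, 15.2(3)E3, or 15.2(4)E."""
    m = VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unrecognised IOS version string: {version!r}")
    major, minor, release = int(m[1]), int(m[2]), int(m[3])
    train, rebuild = m[4], int(m[5] or 0)  # "15.2(4)E" has no rebuild number
    if (major, minor) != (15, 2):
        # Assumption: anything newer than 15.2 carries the fix; older does not.
        return (major, minor) > (15, 2)
    if train != "E":
        return False  # assumption: only the 15.2(x)E train is covered here
    if release == 2:
        return rebuild >= 4
    if release == 3:
        return rebuild >= 3
    return release >= 4


for v in ("15.0(2)EX3", "15.2(2)E3", "15.2(6)E1", "15.2(7)E0a"):
    print(v, has_i2c_fix(v))
```

Per the note above, a fixed release alone does not clear a bus that was already in the bad state; a one-time hard boot is still needed in that case.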


@gwarn wrote:

%ILET-1-DEVICE_AUTHENTICATION_FAIL


If you're getting this error message, RMA the switch. The "one time hard reboot" doesn't work most of the time; the problem will resurface at a later date.

This is a known issue:  FN - 63972 - Cisco C2960X-STACK= Module Might Display Authentication Failure Error Message
