3650 Stack won't boot with one specific switch

stonent01
Level 1

This issue has played out over the course of a few years.  A year or two ago, we had a 3650 stack (3 switches) lock up on us, and it wouldn't boot even when fully power cycled (done by my predecessor).  My predecessor determined that if switch 1 was unplugged from the stacking cables, the other switches would boot fine.  So in the interest of time, and since the stack was not fully populated, switch 1 was disconnected and all Ethernet cabling was migrated off it.
When I took over management of the switches from him, I noticed the top switch in that rack was not connected. I asked about it and was told, "Don't plug it in, that switch got fried a while ago."
Again, we weren't hurting for ports, so I let it be.  Fast forward to 2 days ago: we are now hurting for ports, and I noticed someone had powered on that switch at some point.  I serial consoled into it, and it has been running with no data cables attached for over a year (not connected to the stack).

So I asked again and they were surprised the switch seemed to be working so they said "ok, put it back in the stack".

I was serial consoled into switch 1 and SSH'd into the rest of the stack (switches 2 and 3).  As soon as I hot plugged the switch into the stack, the stack locked up.  It stopped responding to pings on the management SVI, and there was no activity at all on the serial console.  Switch 1, however, was unaffected.  Nothing showed in the logs, and there were no console messages.

I unplugged the stack cables from switch 1 and instantly got serial output on switches 2 and 3 showing that IOS was booting.

So yeah, something is still going on with that switch.  The following day I hot plugged in another 3650, and it was recognized correctly. It requested that I do an OS upgrade because the new switch 1 was running from a BIN file and the stack was running from packages.
The install took place and the stack was happy with no connectivity loss.

Has anyone seen this situation before?  I'd like to get the switch replaced, but I'm not sure if they are going to want to do a lot of troubleshooting. We know everything in the rack is good (power is on a UPS, and the stack cables work on the "new" switch).
I can't think of what I would do to troubleshoot it, because to me it almost seems like a hardware issue in the switch, or an internal short on the stack ports, that would cause a lock-up like that consistently.

4 Replies

stonent01
Level 1

Hey! I think I may have solved my issue.  I powered up this switch today to wipe it ahead of a possible RMA, and after the write erase and reload I'm getting the output below, so I think it may have a bad system board.

Reload command is being issued on Active unit, this will reload the whole stack
Proceed with reload? [confirm]

Chassis 1 reloading, reason - Reload command
Sep 1 10:20:46.095 FP0/0: %PMAN-5-EXITACTION: Process manager is exiting: reload fp action requested
Sep 1 10:20:51.407 RP0/0: %PMAN-5-EXITACTION: Process manager is exiting: process exit w
octeon_wdt: WDT device closed unexpectedly. WDT will not stop!
reboot: Restarting system

 

Booting...
Interface GE 0 link down***ERROR: PHY link is down

Getting rest of image
Reading full image into memory...Check base package header ...: done = 16384
Getting rest of image
Reading full image into memory....done
Reading full base package into memory...: done = 27904017
Bundle Image
--------------------------------------
Kernel Address : 0x5342f3bc
Kernel Size : 0x365d89/3562889
Initramfs Address : 0x53795145
Initramfs Size : 0x16ddecc/23977676
Compression Format: mzip

Bootable image at @ ram:0x5342f3bc
Bootable image segment 0 address range [0x81100000, 0x81bffb30] is in range [0x80180000, 0x90000000].
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@boot_system: 377
Loading Linux kernel with entry point 0x816e0330 ...
Bootloader: Done loading app on core_mask: 0xf

### Launching Linux Kernel (flags = 0x5)

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

Mainboard hardware authentication failed. Abort init ...

How many switches are in the stack? It looks like the master switch is bad. Can you isolate the master switch and see whether the other switches work as expected?

 

BB



@stonent01 wrote:
Mainboard hardware authentication failed.

This is a very well-known issue/feature with the switch.  There is nothing else to do (or fix) except to RMA.
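
For anyone who wants to check a saved console capture for the same signature before opening an RMA, here is a minimal Python sketch. It only counts how often the "Mainboard hardware authentication failed" line from the output above appears in a block of text; the function name `summarize_console_log` and the sample text are made up for illustration, and this is not a Cisco tool or an official diagnostic.

```python
from collections import Counter

# Signature string copied from the console output posted in this thread.
SIGNATURE = "Mainboard hardware authentication failed"


def summarize_console_log(text, signature=SIGNATURE):
    """Return (signature_count, three_most_common_lines) for a console capture.

    Hypothetical helper: blank lines are ignored, and each remaining line is
    stripped of surrounding whitespace before counting.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    counts = Counter(lines)
    hits = sum(n for line, n in counts.items() if signature in line)
    return hits, counts.most_common(3)


if __name__ == "__main__":
    # Tiny sample built from lines that appear in the capture above.
    sample = "\n".join([
        "### Launching Linux Kernel (flags = 0x5)",
        "",
        "Mainboard hardware authentication failed. Abort init ...",
        "",
        "Mainboard hardware authentication failed. Abort init ...",
    ])
    hits, top = summarize_console_log(sample)
    print(f"signature seen {hits} time(s)")
    for line, n in top:
        print(f"{n:>4}  {line}")
```

To use it on a real capture, read the log file into a string and pass it to `summarize_console_log`; a non-zero count just means the message is present, and the RMA decision still rests on the advice above.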

Yeah I'll have the new switch on Tuesday.