
CSCvf33653 - Controller port error, Power given, but State Machine Power Good wait timer timed out - 2

I am hitting this bug as well, running 16.12.4 on multiple 3850s. The workaround was to power cycle the affected stack member; after the power cycle, PoE was restored on all the ports. This is not a permanent fix, and I would like a more stable solution. As mentioned before, this bug has not been resolved in version 16.12.4.


Leo Laohoo
Hall of Fame

You need to raise a TAC Case and get your Cisco AM/SE/PSS involved. 

The 3850 is on its last legs: Cisco has announced the end-of-sale for the 16.12.X train. This means maybe one final release and then the curtains fall.

Get your Cisco AM/SE/PSS involved so they can help "push" for this bug to be fixed in last/final release.  

I have the same problem on 16.12.3a code, which is supposed to include the fix for this bug. So the bug was not properly fixed for all situations, and another fix is needed.

 

Hello,

 

I have been working with TAC, and so far one solution has worked for me: enter the command "power inline static max 30000" in the port configuration. The port will not come up right away, but leave it for about 1 minute and the interface and PoE will start to work. Hope this helps. We are also working on a more permanent solution, which is to install a Software Maintenance Upgrade (SMU, link below), but we haven't tested that option yet.
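For anyone applying this workaround to several ports at once, here is a minimal Python sketch that renders the configuration lines to paste into the switch. The function name and the interface names are hypothetical; only the "power inline static max 30000" command comes from the post above, and the script just builds text, it does not connect to any device.

```python
def static_poe_workaround(interfaces, max_milliwatts=30000):
    """Render IOS config lines that apply the static PoE workaround.

    `interfaces` is a list of interface names (hypothetical examples below).
    `max_milliwatts` defaults to the 30000 value quoted in the thread.
    """
    lines = ["configure terminal"]
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(f" power inline static max {max_milliwatts}")
    lines.append("end")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical affected ports; replace with the ports from your own logs.
    print(static_poe_workaround(["GigabitEthernet1/0/5", "GigabitEthernet1/0/7"]))
```

As described above, expect each port to take about a minute to come back after the command is applied.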

 

https://software.cisco.com/download/home/284455429/type/286308587/release/Gibraltar-16.12.4

 

 

Thank you. I will give it a try the next time this issue shows up. As for upgrading to 16.12.4, I doubt it will fix the issue, because I have seen multiple posts from people reporting the same issue on 16.12.4 code.

 


@Art Astafiev wrote:

Thank you. I will give it a try the next time this issue shows up. As for upgrading to 16.12.4, I doubt it will fix the issue, because I have seen multiple posts from people reporting the same issue on 16.12.4 code.


Do not, under any circumstances, upgrade to 16.12.4.

I am considering calling IOS-XE version 16.12.4 a train-wreck in the middle of a forest fire -- this version should be pulled from download.

Thank you. It looks like Cisco has recognized the issues with the 16.12.4 train on older hardware such as the 3850: they changed the recommended release for this hardware to 16.9.6. 16.12.4 is probably really bad on this hardware, because the 16.9.x train has its last software maintenance date on Jan-31-2021, which basically means that after January Cisco will no longer fix bugs on this train (only security patches). To make code on such a train the "most recommended" release, Cisco must be aware of big issues in the more recent code. I will hold off upgrading the 3850s until the new 16.12.5 is released.


@Art Astafiev wrote:

I will hold off upgrading the 3850s until the new 16.12.5 is released.


There is no actual guarantee the PoE bug(s) will be fixed in 16.12.5. Remember, Cisco no longer tests their code. It is up to paying customers to report back to Cisco whether the bug appears or not.

From what I am seeing, this bug appears after more than 2 weeks of uptime.

I have over 300 3850s on my network with thousands of PDs, so I am sure I will face this issue again soon. When I do, I will try the command "power inline static max 30000" and will also keep the broken switch running with the issue until TAC can fix it without a reboot on my 16.12.3a code. I will post an update with the results.

I just received two responses on my TAC case. I will list both, one after the other.

-------------------------------------------

Response 1

---------------------------------------

I’ve been researching this issue, and it has already been identified that there are some PoE issues on the 16.12.x train. This issue has been recreated and fixed. The fix will come with the 16.12.5 code release, which is tentatively due out on Jan 22, 2021.

Please check this out:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvv50628

Cat3850 : PoE doesn't work - Power given, but State Machine Power Good wait timer timed out

CSCvv50628

Description

Symptom:

Switches and/or stack running versions Gibraltar 16.12.3 and Gibraltar 16.12.3a stop providing PoE on certain ports. This issue is seen after the following log is seen:

%ILPOWER-3-CONTROLLER_PORT_ERR: Controller port error, Interface Gix/0/y: Power given, but State Machine Power Good wait timer timed out.

The impacted ports could experience the following scenarios:

a) The PoE device will power up for a few seconds (5-45) and then it dies, there have been cases where the device powers up for up to 5 minutes.
c) No power at all is seen.
Disconnecting/reconnecting the cable or a shut/no shut on the impacted port does not resolve the issue.

Conditions:

It looks like it is a matter of time (weeks) when the issue is seen.

Workaround:

So far the only workaround found is to reload the impacted stack/switch

Further Problem Description:

Once a single port reaches this state, all the ports are likely to experience the same issue. That is, if a single port in a stack has this issue and any other port or ports are disconnected/reconnected, those ports will experience the same issue.
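Since one affected port tends to spread to others on reconnect, it can help to spot the first occurrence in collected syslog. Below is a minimal Python sketch that pulls the interface names out of log text. The regular expression is built from the log line quoted in the Symptom section; the function name and the sample interface names are hypothetical, and real syslog lines may carry extra prefixes (timestamps, hostnames) that this pattern simply skips over.

```python
import re

# Pattern built from the %ILPOWER-3-CONTROLLER_PORT_ERR line quoted above.
ILPOWER_ERR = re.compile(
    r"%ILPOWER-3-CONTROLLER_PORT_ERR: Controller port error, "
    r"Interface (?P<intf>\S+): Power given, but State Machine "
    r"Power Good wait timer timed out"
)


def affected_interfaces(log_text):
    """Return interfaces named in matching log lines, in first-seen order."""
    seen = []
    for match in ILPOWER_ERR.finditer(log_text):
        intf = match.group("intf")
        if intf not in seen:
            seen.append(intf)
    return seen


if __name__ == "__main__":
    # Hypothetical sample; interface names are made up for illustration.
    sample = (
        "%ILPOWER-3-CONTROLLER_PORT_ERR: Controller port error, Interface "
        "Gi1/0/5: Power given, but State Machine Power Good wait timer timed out.\n"
        "%LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to down\n"
    )
    print(affected_interfaces(sample))
```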

Alternatively, a software patch (SMU) for 16.12.3a is tentatively planned for Oct 23, 2020

The SMU fix for this PoE issue is cat3k_caa-universalk9.16.12.03a.CSCvv28324.SPA.smu.bin. Please note that this patch is for the 16.12.3a release only:

https://software.cisco.com/download/home/284455380/type/286308587/release/Gibraltar-16.12.3a

More info regarding SMU upgrade procedures in general can be found here:

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3850/software/release/16-12/configuration_guide/sys_mgmt/b_1612_sys_mgmt_3850_cg/software_maintenance_upgrade.html

--------------

Response 2

-----------------

I was looking at the TAC case. Per the TAC engineer, the fix will be in version 16.12.5, with a tentative release date of Jan 22, 2021. Also, a software maintenance upgrade (SMU) has been released to fix this issue; below is the link where we can get the SMU for the PoE-related issue. The description of this SMU mentions that it is for SNMP, but it has been working fine for the PoE issue as well. https://software.cisco.com/download/home/284455429/type/286308587/release/Gibraltar-16.12.4

 

File name: “cat3k_caa-universalk9.16.12.04.CSCvv28324.SPA.smu.bin”

This SMU will solve the issue with the %ILPOWER-3-CONTROLLER_PORT_ERR: Controller port error, Interface Gix/0/y: Power given, but State Machine Power Good wait timer timed out

Please see below the steps to install the SMU:

Copy the SMU to the switch.
Then use #install add file flash:<filename> active commit
Example: #install add file flash:cat3k_caa-universalk9.16.12.04.CSCvv28324.SPA.smu.bin active commit

It will ask you to reload the switch.
After the reload, the SMU will be installed.
Use show install summary; the state needs to be "C" = Activated & Committed.
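Because each SMU file is tied to a specific release, a small pre-check before building the install command can catch a mismatched download. The following Python sketch is only an illustration: the filename pattern and the zero-padding rule (16.12.4 becomes 16.12.04, 16.12.3a becomes 16.12.03a) are assumptions inferred from the three filenames mentioned in this thread, and the function names are hypothetical.

```python
import re
import string

# Assumed naming pattern, inferred from the SMU filenames in this thread.
SMU_NAME = re.compile(
    r"^cat3k_caa-universalk9\.(?P<release>\d+\.\d+\.\d+[a-z]?)"
    r"\.(?P<bug>CSC[a-z]{2}\d{5})\.SPA\.smu\.bin$"
)


def smu_release(filename):
    """Return (release, bug_id) parsed from an SMU filename."""
    match = SMU_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized SMU filename: {filename!r}")
    return match.group("release"), match.group("bug")


def normalize_release(version):
    """Pad the last field the way the filenames do: '16.12.4' -> '16.12.04'."""
    major, minor, rest = version.split(".")
    digits = rest.rstrip(string.ascii_lowercase)
    suffix = rest[len(digits):]
    return f"{major}.{minor}.{int(digits):02d}{suffix}"


def install_command(filename, running_version):
    """Build the install command, refusing an SMU for a different release."""
    release, _bug = smu_release(filename)
    if release != normalize_release(running_version):
        raise ValueError(f"SMU is for {release}, switch runs {running_version}")
    return f"install add file flash:{filename} active commit"


if __name__ == "__main__":
    print(install_command(
        "cat3k_caa-universalk9.16.12.04.CSCvv28324.SPA.smu.bin", "16.12.4"))
```

A filename with stray whitespace fails the pattern check, which also guards against a space slipping in after "flash:".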


@Art Astafiev wrote:

The SMU fix for this PoE issue is cat3k_caa-universalk9.16.12.03a.CSCvv28324.SPA.smu.bin. Please note that this patch is for the 16.12.3a release only:


Thanks for the update.  

1.  The SMU is not "exclusive" to 16.12.3a (filename cat3k_caa-universalk9.16.12.03a.CSCvv28324.SPA.smu.bin), because it can also be downloaded and applied on stacks running 16.12.3 and 16.12.4 (examples: cat3k_caa-universalk9.16.12.03.CSCvv28324.SPA.smu.bin & cat3k_caa-universalk9.16.12.04.CSCvv28324.SPA.smu.bin). This bug has been present since 16.12.1 and has not been fixed since.

2.  I have several stacks running 16.12.4 with this specific SMU applied. Just this week, one of the stack members (16.12.4 and SMU applied) has STOPPED providing PoE. I had to reboot the switch member to get the PD to work. The stack had only 2 weeks of uptime.

 


@Art Astafiev wrote:

Use show install summary; the state needs to be "C" = Activated & Committed.


The command "sh install summary" will cause the terminal to "hang", producing a "<expletive>, (did) the stack crash?" moment (aka "code brown"). A "less stressful" option is to use the "sh version" command.

 


@Art Astafiev wrote:

The fix will come with the 16.12.5 code release


This bug was first reported on 16.12.1 and has still not been fixed. Fingers crossed that 16.12.5 will really fix it, given that "quality control" is not really a priority.

 


@LizsanderRoque1725 wrote:

We are also working on a more permanent solution, which is to install a Software Maintenance Upgrade (SMU, link below), but we haven't tested that option yet.


I have 16.12.4 with the SMU (cat3k_caa-universalk9.16.12.04.CSCvv28324.SPA.smu.bin) applied and the issue re-appeared two weeks later.

Hi,

Are you still facing this issue? We are on 16.12.4 and planning to install the SMU for the PoE issue. Is there any other workaround for this?

Hello,

 

So far, a month after installing the SMU on a handful of 3850s, the issue has not reappeared on those switches.

 

Hope this helps!

Give it time, @LizsanderRoque1725.

 

FYI:  16.12.5 is scheduled for release on 21 January 2021.