Power state on Chassis # is redundancy-failed

HamR
Level 1

Hi,

I know this error used to be common a while back, but I'm running 2.0(2q) with four apparently healthy PSUs per chassis in N+1 mode, and the chassis is drawing less than 800 watts across four B200 M2 blades. Help?

Thanks,

Hamish

30 Replies

Robert Burns
Cisco Employee

Can you paste the output of the following commands:

scope chassis x

show psu detail

show psu-control detail

show fault

Regards,

Robert

David Alpizar
Cisco Employee

Hi Hamish,

If you access the UCSM, do you see any PSU with a N/A status?

Does UCSM report any alerts such as upper non-recoverable, thermal alerts, or power redundancy lost?

Do you see any amber light on the power supplies?

You can reseat the power supplies one by one in order to see if they come back online.

Sometimes the chassis can generate thermal alerts and it can be related to a fan or even the IOM.

Go ahead and check the status of the power supplies physically and on UCSM.

Also collect the information from the commands Robert suggested.

Hello,

I'm having the same issue on 2.0(4d). I have a UCS system with two 6120XP fabric interconnects and three 5108 chassis.

The system is configured for N+1 redundancy:

show psu-control detail

Psu Control:

    Redundancy: NPlus1

    Input Power: Ok

    Output Power: Ok

    Cluster Power: Slot 2 Master

    Overall Status: Failed

    Config Error: Redundancy Lost

C61UCSscto01-B /chassis # show fault

Severity  Code     Last Transition Time     ID       Description

--------- -------- ------------------------ -------- -----------

Major     F0408    2013-05-14T18:33:53.694    740475 Power state on chassis 1 is redundancy-failed

show psu detail

PSU:

    PSU: 1

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: 2500W 200-240VAC PSU for UCS 5108 Blade Server Chassis

    PID: N20-PAC5-2500W

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM1616016M

    HW Revision: 0

    PSU: 2

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: 2500W 200-240VAC PSU for UCS 5108 Blade Server Chassis

    PID: N20-PAC5-2500W

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM161703MM

    HW Revision: 0

    PSU: 3

    Overall Status: N/A

    Operability: N/A

    Threshold Status: N/A

    Power State: PwrSave

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: N/A

    Product Name: 2500W 200-240VAC PSU for UCS 5108 Blade Server Chassis

    PID: N20-PAC5-2500W

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM161703L8

    HW Revision: 0

    PSU: 4

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: 2500W 200-240VAC PSU for UCS 5108 Blade Server Chassis

    PID: N20-PAC5-2500W

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM170304LJ

    HW Revision: 0

Any clue?

Regards,

Javier
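With four PSUs per chassis and several chassis, it can help to scan the `show psu detail` output mechanically rather than by eye. Below is a minimal Python sketch, assuming the output has been saved as text in the `Key: Value` layout shown above. The function names `parse_psu_detail` and `suspect_psus` are made up for illustration and are not part of any Cisco tool, and `SAMPLE` is an abbreviated copy of the output in this post.

```python
def parse_psu_detail(text):
    """Split saved `show psu detail` output into one dict per PSU."""
    psus = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "PSU":
            if not value:
                continue  # the bare "PSU:" header line carries no ID
            current = {"PSU": value}
            psus.append(current)
        elif current is not None:
            current[key] = value
    return psus


def suspect_psus(psus):
    """Return the IDs of PSUs whose Overall Status is not 'Operable'."""
    return [p["PSU"] for p in psus if p.get("Overall Status") != "Operable"]


# Abbreviated sample taken from the output posted above.
SAMPLE = """\
PSU:
    PSU: 1
    Overall Status: Operable
    Power State: On
    PSU: 3
    Overall Status: N/A
    Power State: PwrSave
"""

if __name__ == "__main__":
    print(suspect_psus(parse_psu_detail(SAMPLE)))  # prints ['3']
```

On the output in this post, the only PSU flagged is PSU 3, which reports Overall Status N/A and Power State PwrSave.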

Hi,

Not sure why this is the first reply I've received a notification for.

I opened a case with Cisco on this and was advised to change the policy to Grid, then back to N+1. This cleared the error for me.

Hope this helps.

Hamish

The problem might be with the subordinate IOM. The active one, which returned the outputs requested above, looks good.

Please do the following from the UCSM CLI:

SSH to fabric interconnect A

connect iom x (x = the number of the chassis showing redundancy lost)

show platform software cmcctrl power redundancy

SSH to fabric interconnect B

connect iom x

show platform software cmcctrl power redundancy

Thanks,

Robert

Hi Robert,

Here is the output of the command:

FabricB (active)

fex-1# show platform software cmcctrl power redundancy

==============================

Last update TS                 : 1718362

Stale TS                 : 1718422

Now                         : 1718378

Cluster master                 : yes

Policy                        : N+1

State                        : Lost

Total power available        : 7500

Total power usage        : 1713

Power budget requested        : 5472

-----------

Grid                        : 0

Active PS        : 0 1 3

Spare PS        : 2

Unavailable PS        :

-----------

==============================

FabricA (subordinate)

fex-1# show platform software cmcctrl power redundancy

==============================

Last update TS                 : 1718486

Stale TS                 : 1718546

Cluster master                 : no

Policy                        : N+1

State                        : Lost

Total power available        : 7500

Total power usage        : 1732

Power budget requested        : 5472

-----------

Grid                        : 0

Active PS        : 0 1 3

Spare PS        : 2

Unavailable PS        :

-----------

==============================

Thanks!
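Comparing the two IOM views field by field is easy to get wrong by eye. Here is a rough Python sketch, assuming the `show platform software cmcctrl power redundancy` output from each IOM has been saved as text in the `key : value` layout shown above. The names `parse_redundancy`, `diff_views`, and `IGNORED` are hypothetical helpers, not part of any Cisco tooling, and the choice of which fields to ignore (timestamps, master flag, instantaneous usage) is my own assumption.

```python
def parse_redundancy(text):
    """Parse `key : value` lines into a dict; separator lines are skipped."""
    fields = {}
    for raw in text.splitlines():
        if ":" not in raw:
            continue  # skips "=====", "-----" and the command prompt line
        key, _, value = raw.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Fields expected to differ between IOMs (assumption, see note above).
IGNORED = {"Last update TS", "Stale TS", "Now", "Cluster master",
           "Total power usage"}


def diff_views(a, b, ignore=IGNORED):
    """Return {field: (a_value, b_value)} for fields that differ."""
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys)
            if a.get(k) != b.get(k)}


# Abbreviated copies of the two outputs posted above.
FABRIC_B = """\
Cluster master : yes
Policy : N+1
State : Lost
Total power available : 7500
Total power usage : 1713
Power budget requested : 5472
Grid : 0
Active PS : 0 1 3
Spare PS : 2
Unavailable PS :
"""

FABRIC_A = """\
Cluster master : no
Policy : N+1
State : Lost
Total power available : 7500
Total power usage : 1732
Power budget requested : 5472
Grid : 0
Active PS : 0 1 3
Spare PS : 2
Unavailable PS :
"""

if __name__ == "__main__":
    view_a = parse_redundancy(FABRIC_A)
    view_b = parse_redundancy(FABRIC_B)
    print(diff_views(view_a, view_b))  # prints {}
```

With the values posted here, the two views agree on policy, state, available power, requested budget, and the active/spare PSU assignment; only the timestamps, the master flag, and the instantaneous usage differ.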

Have you tried to change the power policy to non-redundant and then back to N+1?

This is not disruptive.

Hi,

Yes, we tried that, but it didn't work for us.

Regards,

Javier

Javier,

Please open a TAC case and attach the chassis tech support bundles (one for each chassis) and the UCSM tech support bundle to the case.

-Kenny

Please open a TAC case as Kenny suggested, but also look into the following known defect and apply the workaround if applicable:

CSCub53747

Hi,

According to the Bug Toolkit, CSCub53747 should be fixed in 2.0(4a). We're now on 2.0(4d).

We've opened a TAC case. I'll post the conclusions here.

Thanks!

Javier

Hi Javier,

Did TAC provide you with a workable solution? I seem to have the same issue on my end. In my case it's a UCS 5108 with four power supplies connected, set to N+1.

Hi,

Not yet. We're working on it, but TAC recommends upgrading to 2.0(5b). These bugs seem to be hitting us:

CSCue49366, CSCud48637/CSCue33889.

Javier

Hi Javier,

Have you solved the issue yet?

I'm having the same problem, and I'm on 2.1(1e).

Regards,

Bruno
