Power state on Chassis # is redundancy-failed

HamR
Level 1

Hi,

I know this error used to be common a while back, but I'm running 2.0(2q) with four apparently healthy PSUs per chassis in N+1 mode, consuming less than 800 watts across four B200 M2 blades. Help?

Thanks,

Hamish


Bruno,

Could you please run the following commands and attach them here:

*connect local a

*connect iom #   <<< # is the number of the chassis where you see the power problem

*show platform soft cmc thermal status

*show platform soft cmc power redundancy

Next

*connect local b

*connect iom #   <<< again, the same chassis number

*show platform soft cmc power redundancy

-Kenny

Hi Kenny,

See the attached files regarding Fabric-A and Fabric-B

FI-BERNA-A /chassis # show psu-control detail

Psu Control:

    Redundancy: NPlus1

    Input Power: Ok

    Output Power: Ok

    Cluster Power: Slot 1 Master

    Overall Status: Failed

    Config Error: Redundancy Lost

FI-BERNA-A /chassis # show fault

Severity  Code     Last Transition Time     ID       Description

--------- -------- ------------------------ -------- -----------

Major     F0408    2013-06-20T11:03:06.262    341965 Power state on chassis 2 is redundancy-failed

FI-BERNA-A /chassis # show psu detail

PSU:

    PSU: 1

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM163000M7

    HW Revision: 0

    Firmware Version: N/A

    PSU: 2

    Overall Status: Operable

    Operability: Operable

    Threshold Status: N/A

    Power State: PwrSave

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: N/A

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM163000MT

    HW Revision: 0

    Firmware Version: N/A

    PSU: 3

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM162900A0

    HW Revision: 0

    Firmware Version: N/A

    PSU: 4

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM162900A1

    HW Revision: 0

    Firmware Version: N/A

Kind regards,

Bruno Fernandes
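For anyone who has to check several chassis, the output above can be scanned mechanically. Below is a minimal Python sketch, assuming the `show psu detail` output keeps the `Key: Value` layout pasted in this thread. The helper names `parse_psu_detail` and `suspicious` are hypothetical and not part of any Cisco tool, and the "expected" values are simply what the fully active PSUs report here.

```python
def parse_psu_detail(text):
    """Split pasted `show psu detail` output into one dict of fields per PSU."""
    psus = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "PSU":
            if not value:
                continue  # the bare "PSU:" header line carries no PSU number
            psus.append({"PSU": value})
        elif psus:
            psus[-1][key] = value
    return psus


def suspicious(psu):
    """Return the fields that differ from what an active PSU reports in this thread."""
    # Assumption: these three values are what PSUs 1, 3 and 4 show above.
    expected = {"Power State": "On", "Threshold Status": "OK", "Voltage Status": "OK"}
    return {k: psu.get(k) for k, v in expected.items() if psu.get(k) != v}


# Shortened sample taken from the output in this thread.
sample = """
PSU:
    PSU: 1
    Overall Status: Operable
    Threshold Status: OK
    Power State: On
    Voltage Status: OK
    PSU: 2
    Overall Status: Operable
    Threshold Status: N/A
    Power State: PwrSave
    Voltage Status: N/A
"""

for psu in parse_psu_detail(sample):
    flags = suspicious(psu)
    if flags:
        print(f"PSU {psu['PSU']}: {flags}")
```

With the sample above, only PSU 2 is reported. A PSU in `PwrSave` is not necessarily faulty, so treat the result as a pointer for where to look, not as a diagnosis.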

Bruno,

Thanks for the information.

So PSU 2 is in power-save mode:

PSU: 2

    Overall Status: Operable

    Operability: Operable 

    Power State: PwrSave  <<<

From the Active IOM, I can see this:

fex-1# show platform software cmcctrl power redundancy

==============================

Cluster master                 : yes   <<< Shows we are in the primary IOM

Policy                        : N+1

State                        : Lost   <<< This is the only problem, because the PSU is fine

Total power available        : 7500  <<< 3 PSUs available

Total power usage        : 856 <<<< 1 PSU is more than enough to cover this

Power budget requested        : 5472 <<< However, the chassis asks for 3 PSUs to be active; this is not an expected value

-----------

Grid                        : 0

        Active PS        : 0 2 3

        Spare PS        : 1    <<<< 1 is actually PSU 2, which shows up in power save mode

        Unavailable PS        :

-----------

==============================

Actions suggested:

1-Change the power policy from N+1 to Grid and vice versa

2-Follow the instructions in the bug CSCty64894 (Note those steps are not disruptive)

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCty64894

I hope this helps, otherwise, let us know.

-Kenny
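The arithmetic behind the annotations in the post above can be sketched in a few lines of Python. This is only an illustration: the 2500 W per PSU figure is inferred from "7500 = 3 PSUs" and the PID `UCSB-PSU-2500ACPL`, the function names are hypothetical, and the real redundancy logic inside the IOM/CMC is not documented in this thread.

```python
import math

# Assumption: 2500 W per PSU, inferred from "Total power available: 7500 <<< 3 PSUs".
PSU_CAPACITY_W = 2500


def psus_needed(load_w, capacity_w=PSU_CAPACITY_W):
    """Smallest number of PSUs whose combined capacity covers load_w."""
    return math.ceil(load_w / capacity_w)


def n_plus_1_ok(load_w, installed, capacity_w=PSU_CAPACITY_W):
    """N+1 holds when at least one PSU is left over after covering the load."""
    return installed >= psus_needed(load_w, capacity_w) + 1


def ps_index_to_psu(index):
    """The IOM output counts power supplies from 0; `show psu detail` counts from 1."""
    return index + 1


usage_w = 856    # "Total power usage"
budget_w = 5472  # "Power budget requested"

print(psus_needed(usage_w))                # PSUs needed for the actual draw
print(psus_needed(budget_w))               # PSUs needed for the requested budget
print(n_plus_1_ok(budget_w, installed=4))  # does N+1 hold with four PSUs?
print(ps_index_to_psu(1))                  # "Spare PS: 1" in IOM numbering
```

Under these assumptions the actual draw needs one PSU and the requested budget needs three, so four installed PSUs still leave one spare. That is consistent with the reading that the hardware is fine and only the reported state is wrong.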

Kenny,

Just to confirm with you:

Regarding the suggested actions:

1-Change the power policy from N+1 to Grid and vice versa

2-Follow the instructions in the bug CSCty64894 (Note those steps are not disruptive)

Neither step 1 nor step 2 is disruptive, correct? Step 2 is stated to be non-disruptive. For step 1 I see no reason why it would be disruptive, but I'm not 100% confident. Sorry for the basic question, but I have no spare UCS to test on and this system is already in production, so I need to be 100% confident.

Kind regards,

Bruno

Bruno,

Totally safe steps; no disruption whatsoever, since all your PSUs show up as operable.

-Kenny

Hi Kenny,

I did both steps with no result, but this morning I repeated step 1 and waited a little longer, and the fault cleared. The chassis also returned to a healthy state (regarding power redundancy). However, the same PSU still shows a strange result for "Threshold Status" and "Voltage Status".

Could this be because we are using N+1 and, in this case, only 3 PSUs are in use?

FI-BERNA-B /chassis # show psu detail

PSU:

    PSU: 1

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM163000M7

    HW Revision: 0

    Firmware Version: N/A

    PSU: 2

    Overall Status: Operable

    Operability: Operable

    Threshold Status: N/A

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: N/A

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM163000MT

    HW Revision: 0

    Firmware Version: N/A

    PSU: 3

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM162900A0

    HW Revision: 0

    Firmware Version: N/A

    PSU: 4

    Overall Status: Operable

    Operability: Operable

    Threshold Status: OK

    Power State: On

    Presence: Equipped

    Thermal Status: OK

    Voltage Status: OK

    Product Name: Platinum AC PSU for N20-C6508 Blade Server Chassis

    PID: UCSB-PSU-2500ACPL

    VID: V00

    Vendor: Cisco Systems Inc

    Serial (SN): DTM162900A1

    HW Revision: 0

    Firmware Version: N/A

Kind regards,

Bruno Fernandes

Bruno,

Thanks for the feedback, I am glad the power redundancy error message is gone now.

Regarding the power supply not showing all the correct status info, I recommend that you open a case. As Javier mentioned, this can be an I2C bus issue, where either your PSU is not able to deliver its status messages or the primary IOM is just not receiving them, but this definitely needs deeper analysis.

Please open a TAC case.

-Kenny

Hi Bruno,

Not yet. TAC engineers are still working on the case (625642101). We're waiting for an RMA of the 4 PSUs in one chassis. One PSU seems to be causing errors on the I2C bus. We'll probably upgrade to 2.1 for compatibility with new SAN equipment (and also to resolve the bugs that seem to be affecting the system).

Regards

Saad Jalees
Level 1

Hi Guys

Just to give everyone an update:

1-Change the power policy from N+1 to Grid and vice versa

Worked for us.

Hi,

We recently upgraded to 2.1(2a). All seems to be working fine. Let's see how it behaves from now on...

Regards,

Javier

Got a notification just this week regarding the power supplies for the chassis.

UCS B-Series chassis power supplies have an issue which can cause shutdown when activated in a redundancy switchover.

Affected units can be identified by the version and serial number format defined in the link below.

http://www.cisco.com/en/US/ts/fn/636/fn63628.html

Hi,

Thanks for the info. We have 2 chassis potentially affected by this issue. We have to check the deviation label.

Regards

Hello All,

If you happen to be affected by this Field Notice, please remember that you need a TAC Service Request number; just make reference to this FN#. If you can attach screenshots or pictures, that will speed up the process, since TAC will not have to ask for further information.

Also, please remember that there is no need to open a separate case for each PSU. You can confirm how many of these PSUs have problems and then specify the quantity in the form, with the serial numbers separated by commas. If the list exceeds 4,000 characters (including blank spaces and commas), you will need to fill out additional forms.

-Kenny
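If you have many PSUs to report, the comma-separated lists described above can be prepared with a small script. This is a minimal Python sketch; `chunk_serials` is a hypothetical helper, the 4,000-character limit is taken from the post above, and the serial numbers are the ones posted earlier in this thread, used only as sample data (they are not known to be affected by the Field Notice).

```python
def chunk_serials(serials, limit=4000):
    """Group serial numbers into comma-separated strings of at most `limit` characters.

    A single serial longer than `limit` would still get its own (oversized)
    entry; real serial numbers are far shorter than that.
    """
    forms, current = [], ""
    for sn in serials:
        candidate = sn if not current else current + "," + sn
        if len(candidate) > limit and current:
            forms.append(current)  # current form is full, start a new one
            current = sn
        else:
            current = candidate
    if current:
        forms.append(current)
    return forms


# Sample data only: serials copied from the `show psu detail` output in this thread.
serials = ["DTM163000M7", "DTM163000MT", "DTM162900A0", "DTM162900A1"]

for number, entry in enumerate(chunk_serials(serials), start=1):
    print(f"Form {number}: {entry}")
```

With the default limit, four serial numbers fit in a single form. Passing a smaller `limit` shows how the list would be split across several forms.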

I am getting this information in UCSM for UCSB:

one PSU in the first chassis and one PSU in the second chassis.

Please tell me how I can fix it, as all four power supplies are connected and OK.

Warm regards,
Amit Sahrma