
ASR 9010 power supply over temperature troubleshooting

Stephen Craven
Level 4

We have several 9010 chassis deployed and one of them is experiencing repeated PWR-6KW-AC-V3 power module over temperature conditions.

Since there is nothing special about this location (same cabinet & power supplies, similar environment, etc.) and the cabinet meets the stated specs for clearance and door perforation, we filed a TAC case and had the power supplies replaced.

The events have persisted with the replaced power supplies.

pwr_mgmt[371]: %PLATFORM-PWR_MGMT-2-MODULE_FAILURE : Power-module 0/PM1/0/SP failure condition raised : Module shutdown due to input state over temperature condition

There are multiple commands to show current and historical environmental conditions for the line cards, but I cannot find any of these commands that work on the power supplies. Is there a way to view power supply temperatures?

Our power supplies are running a newer version of FPGA code than is recommended for IOS XR version 5.3.2. Would downgrading their code help with this issue?

         SW Version
fpga11   4.01+
fpga12   4.00+
fpga13   4.01+

The routers all have four power supplies installed, even though a single supply could carry the router. I assume that all four could overheat at the same time. Is there anything we can do to automatically restore the router should this happen?

Thank you,

Stephen

1 Accepted Solution

Try issuing 'run pwr_mgmt_debug show' from CLI. This will give more details including inlet and outlet temperatures.

Thanks,

Sam

5 Replies

smilstea
Cisco Employee

Hi Stephen,

You are likely running into CSCva25435. By default, once the over-temperature alarm clears, the power supplies should come back up, but due to this bug only an OIR (online insertion and removal) will recover them.

A SMU is available for 5.3.2.

Thanks,

Sam

Sam,

Thank you. We'll give that a try.

However, this still wouldn't explain why the power supplies are overheating in a cabinet that meets the spec, in a room that is at 70 degrees F. Do you know of any way we can determine the root cause of the overheating?

None of the measured line card inlet temperatures are above 32 degrees C, but I know of no way to poll the power supplies for environmental parameters.

Are others experiencing this?

Thanks,

Stephen

Try issuing 'run pwr_mgmt_debug show' from CLI. This will give more details including inlet and outlet temperatures.

Thanks,

Sam

Thank you. This command is exactly what I was looking for.

Is this data available in the MIB for SNMP to poll? If not, is there some other easy way to collect this data to see if there is a correlation between inlet temperature and these overtemperature events?

Thanks,

Stephen

I am not aware of a MIB that exposes this data, though that doesn't mean one doesn't exist (I'll have to do some digging). This command provides only real-time data, not historical data.

One way to track this would be to tie an EEM script to the over-temperature syslog message so that it automatically saves this CLI output to a file.

Sam
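
For anyone who would rather do the capture off-box (for example on a syslog server) than with EEM, here is a minimal Python sketch of the same idea: watch for the over-temperature message and, when it appears, append the output of 'run pwr_mgmt_debug show' to a file with a timestamp. The regex is built only from the one syslog line quoted in this thread, so other releases may need a different pattern. `run_command` is a hypothetical callable you would supply yourself (for instance a wrapper around your own SSH session); it is not a Cisco API.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional

# Pattern derived from the single syslog line quoted in this thread.
# Assumption: other IOS XR releases may word the message differently.
OVERTEMP_RE = re.compile(
    r"%PLATFORM-PWR_MGMT-2-MODULE_FAILURE\s*:\s*"
    r"Power-module (?P<module>\S+) failure condition raised\s*:\s*"
    r".*over temperature"
)


def match_overtemp(line: str) -> Optional[str]:
    """Return the power-module location if the line is an over-temperature event."""
    m = OVERTEMP_RE.search(line)
    return m.group("module") if m else None


def capture_on_event(
    line: str,
    run_command: Callable[[str], str],  # hypothetical: sends a CLI command, returns its output
    out_file: Path,
) -> bool:
    """Append timestamped CLI output to out_file when line is an over-temperature event."""
    module = match_overtemp(line)
    if module is None:
        return False
    output = run_command("run pwr_mgmt_debug show")
    stamp = datetime.now(timezone.utc).isoformat()
    with out_file.open("a", encoding="utf-8") as fh:
        fh.write(f"=== {stamp} module={module} ===\n{output}\n")
    return True
```

Since the command only returns real-time data, a capture triggered by the event shows conditions at or just after the shutdown. Calling the same `run_command` on a timer as well would give a baseline to compare against.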