ASR9006 LC Crash

jedielbarreto
Level 1

Greetings,

I am having problems with an ASR9k line card (A9K-8T-L):

LC/0/0/CPU0:Aug 20 02:41:47.954 : pfm_node_lc[283]: %PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR : Set|prm_server_tr[159827]|0x1008004|NP DOUBLE ECC ERROR, NP=4, memId=18, subMemId=0x2
LC/0/0/CPU0:Aug 20 02:41:47.958 : pfm_node_lc[283]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 159827 (prm_server_tr), Fault Sev: 0, Target node: 0/0/CPU0, CompId: 0x1f, Device Handle: 0x1008004, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=4, memId=18, subMemId=0x2
LC/0/0/CPU0:Aug 20 02:41:47.958 : syslog_dev[88]: pfm_node_lc[283] PID-159820: Request Graceful Reboot via Sysmgr: Reason: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 159827 (prm_server_tr), Fault Sev: 0, Target node: 0/0/CPU0, CompId: 0x1f, Device Handle: 0x1008004, CondID: 1001, Fault Reason: NP DOUBLE ECC ERROR, NP=4, memId=18, subMemId=0x2
LC/0/0/CPU0:Aug 20 02:41:47.958 : syslog_dev[88]: pfm_node_lc[283] PID-159820:
RP/0/RSP0/CPU0:Aug 20 02:41:48.186 : shelfmgr[401]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/0/CPU0 indicates it is doing a kernel dump.
RP/0/RSP0/CPU0:Aug 20 02:41:48.186 : shelfmgr[401]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:IOS XR FAILURE
RP/0/RSP0/CPU0:Aug 20 02:41:48.192 : shelfmgr[401]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:BRINGDOWN

 

I would appreciate any information about this.

Thanks

2 Accepted Solutions


Hello,

 

I found the bug below:

 

CSCte19077: prm_server should not crash/abort upon encountering HW problem

Description

Symptom:
prm_server crashes after certain HW problems are detected on the Network Processor (NP), for example "%PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR". There will also be another syslog displaying "NP0 fails to setup".

Conditions:
Hardware problem found on the network processor

Workaround:
none

Recovery:
none

Further Problem Description:


Giuseppe Larosa
Hall of Fame

Hello @jedielbarreto ,

>> %PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR : 

 

This looks like a hardware issue. You can try removing the line card, waiting a few minutes, and then reinserting it to force a hard reset.

If that does not fix the problem, I would consider an RMA of the line card.

Edit:

The bug that @Georg Pauwen found looks similar, but it applies only to IOS XR 3.9.0, which is quite an old version now.

Also, some of the error messages listed in the bug, such as "NP0 fails to setup", are missing from your logs.
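
If you want to check a saved log capture against the two symptoms in the bug description, something like the following Python sketch could help. It is only an illustration: the function name `summarize` is made up here, the ECC pattern is modeled on the single syslog line posted above, and the exact wording of the "fails to setup" message is assumed from the bug text rather than from a real log.

```python
import re

# Modeled on the %PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR line posted in this thread.
ECC_RE = re.compile(
    r"%PLATFORM-NP-0-HW_DOUBLE_ECC_ERROR\b.*?"
    r"NP=(?P<np>\d+), memId=(?P<mem_id>\d+), "
    r"subMemId=(?P<sub_mem_id>0x[0-9a-fA-F]+)"
)

# ASSUMPTION: wording taken from the CSCte19077 description ("NP0 fails to setup");
# the real syslog text may differ.
SETUP_FAIL_RE = re.compile(r"NP\d+ fails to setup")


def summarize(lines):
    """Collect NP double-ECC errors and note whether a setup failure was logged."""
    ecc_errors = []
    setup_failure_seen = False
    for line in lines:
        match = ECC_RE.search(line)
        if match:
            ecc_errors.append(
                (
                    int(match["np"]),
                    int(match["mem_id"]),
                    int(match["sub_mem_id"], 16),  # hex string such as "0x2"
                )
            )
        if SETUP_FAIL_RE.search(line):
            setup_failure_seen = True
    return {"ecc_errors": ecc_errors, "setup_failure_seen": setup_failure_seen}
```

Each entry in `ecc_errors` is an `(NP, memId, subMemId)` tuple. If the same NP and memory ID keep recurring after a reseat, that would support the hardware-fault reading; `setup_failure_seen` staying false matches the observation that the second message from the bug is absent.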

 

Hope this helps

Giuseppe

 

 

 
