Cisco ASR 9001 XR Line Card Reboot

ezzaariyouness
Level 1

Hello everyone,

I have a Cisco ASR 9001 running IOS XR version 6.7.3 whose line card reboots frequently, as shown in the log below.

LC/0/0/CPU0:Aug 23 21:36:03.763 EDT: prm_server_ty[322]: %PLATFORM-NP-3-ECC : prm_ser_check: Completed NP fast reset to successfully recover from soft error on NP 1. No further corrective action is required.
RP/0/RSP0/CPU0:Aug 23 21:36:04.655 EDT: bgp[1087]: %ROUTING-BGP-5-MAXPFX : No. of IPv4 Unicast prefixes received from 206.108.34.112 has reached 161306, max 200000
LC/0/0/CPU0:Aug 23 21:36:07.292 EDT: prm_server_ty[322]: %PLATFORM-NP-0-LC_RELOAD: NP1 had 3 fast resets within an hour, initiating NPdatalog collection and automatic LC reboot
LC/0/0/CPU0:Aug 23 21:36:09.060 EDT: prm_server_ty[322]: %PLATFORM-NP-0-LC_RELOAD : Too many fast reset attempts, LC reboot needed to recover the NP
LC/0/0/CPU0:Aug 23 21:36:09.067 EDT: pfm_node_lc[309]: %PLATFORM-NP-0-FAILED_LC_RELOAD : Set|prm_server_ty[143440]|0x1008000|An unrecoverable error has been detected. The linecard will be reloaded.
LC/0/0/CPU0:Aug 23 21:36:09.070 EDT: pfm_node_lc[309]: %PLATFORM-PFM-0-CARD_RESET_REQ : Card reset requested by: Process ID: 143440 (prm_server_ty), Target node: 0/0/CPU0, CondID: 1048, Fault Reason: An unrecoverable error has been detected. The linecard will be reloaded.
LC/0/0/CPU0:Aug 23 21:36:09.070 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335: Request Graceful Reboot via Sysmgr: Reason: Card reset requested by: Process ID: 143440 (prm_server_ty), Target node: 0/0/CPU0, CondID: 1048, Fault Reason: An unrecoverable error has been detected. The linecard will be reloaded.
LC/0/0/CPU0:Aug 23 21:36:09.070 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335:
LC/0/0/CPU0:Aug 23 21:36:09.111 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335: reboot internal : cause code 671088667 cause Card reset requested by: Process ID: 143440 (prm_server_ty), Target node: 0/0/CPU0, CondID: 1048, Fault Reason: An unrecoverable error has been detected. The linecard will be reloaded.
LC/0/0/CPU0:Aug 23 21:36:09.111 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335:
LC/0/0/CPU0:Aug 23 21:36:09.118 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335: reboot_internal timeout 30 is graceful no
LC/0/0/CPU0:Aug 23 21:36:09.118 EDT: syslog_dev[89]: pfm_node_lc[309] PID-139335:
RP/0/RSP0/CPU0:Aug 23 21:36:09.210 EDT: shelfmgr[437]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/0/CPU0 indicates it is doing a kernel dump.

Can you help me fix that?

Best Regards

Younes

4 Replies

%PLATFORM-NP-3-ECC : prm_ser_check: Completed NP fast reset to successfully recover from soft error on NP 1. No further corrective action is required.

...

%PLATFORM-NP-0-LC_RELOAD: NP1 had 3 fast resets within an hour, initiating NPdatalog collection and automatic LC reboot
LC/0/0/CPU0:Aug 23 21:36:09.060 EDT: prm_server_ty[322]: %PLATFORM-NP-0-LC_RELOAD : Too many fast reset attempts, LC reboot needed to recover the NP

 

The symptoms are similar to CSCux16959, which results from a parity error ("ECC" stands for error-correcting code, a memory protection scheme that detects bit errors and corrects them where possible). From the CSCux16959 description:

"An NP fast reset is used to recover from soft errors in KMEM memory. In rare situations, it's possible that the memory corruption will not be cleared by the fast reset or another memory access, so the error could continue to recur until 3 NP fast resets are hit which would cause the linecard to reload."

 

However, the fix for CSCux16959 was integrated into releases earlier than your 6.7.3, so this may not be exactly the same defect. A TAC case can determine whether this is a new or old issue.
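If it helps to have numbers ready for the TAC case, the escalation described above (three NP fast resets within an hour leading to a linecard reload) can be checked against a saved syslog file. The following is only a rough sketch, not a Cisco tool: the function name `find_reset_bursts` and the regular expression are my own, written against the message format shown in this thread, and the threshold of 3 and the one-hour window are taken from the `LC_RELOAD` message text. The syslog timestamps carry no year, so the caller has to supply one.

```python
import re
from datetime import datetime, timedelta

# Assumed format, based only on the ECC line quoted in this thread:
# LC/0/0/CPU0:Aug 23 21:36:03.763 EDT: prm_server_ty[322]: %PLATFORM-NP-3-ECC : ... on NP 1. ...
FAST_RESET = re.compile(
    r"^(?P<node>\S+?):(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\.\d+ \S+: "
    r".*%PLATFORM-NP-3-ECC\s*:.*Completed NP fast reset.*on NP (?P<np>\d+)"
)


def find_reset_bursts(lines, year, threshold=3, window=timedelta(hours=1)):
    """Return (node, np, time, count) for each fast reset that brings the
    number of resets on that NP within `window` up to `threshold` or more."""
    history = {}  # (node, np) -> reset times still inside the window
    bursts = []
    for line in lines:
        match = FAST_RESET.match(line)
        if not match:
            continue
        # %b parsing assumes English month names (the default C locale).
        when = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        key = (match["node"], int(match["np"]))
        times = history.setdefault(key, [])
        times.append(when)
        # Drop resets that are older than the window relative to this one.
        times[:] = [t for t in times if when - t <= window]
        if len(times) >= threshold:
            bursts.append((key[0], key[1], when, len(times)))
    return bursts


# Example use (the file name is hypothetical):
# with open("router-syslog.txt") as f:
#     for node, np, when, count in find_reset_bursts(f, year=2023):
#         print(f"{node} NP{np}: {count} fast resets within an hour at {when}")
```

The sketch keeps a separate history per linecard and NP, so resets on NP 0 do not count toward NP 1. It only reports what the logs already say; it cannot tell a hardware fault from a software defect.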

Disclaimers: I am long in CSCO. Bad answers are my own fault as they are not AI generated.

Hello Ramblin,

Thank you for your reply.

I found this SMU:

Reboot/Optional SMU, Continuous fib_mgr crash observed on 9902 causing lc to reload.
asr9k-px-6.7.3.CSCwb71982.tar
Is that related to my issue?
 
Best regards
Younes

 

 

Hi @ezzaariyouness 

I do not have access to any details for CSCwb71982, just its title: "Continuous fib_mgr crash observed on 9902 causing lc to reload". However, I do not see an obvious connection between your syslog messages and a fib_mgr crash.

If you have a service contract, TAC is your best path forward on this issue to determine whether your hardware is faulty or your linecard reloads are being caused by a fixable software defect.

Disclaimers: I am long in CSCO. Bad answers are my own fault as they are not AI generated.

Thanks for your reply, Ramblin.

I will open a ticket for that.

Best Regards

Younes