08-20-2016 06:29 AM - edited 03-05-2019 04:32 AM
Hello!
I am using an ASR1001-X as an Internet gateway, and it crashes every few days with the following problem:
ROM: IOS-XE ROMMON
asr1001x uptime is 4 hours, 58 minutes
Uptime for this control processor is 4 hours, 59 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System restarted at 10:00:41 CEST Sat Aug 20 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134
The router has a redundant layer 2 connection to two Juniper Routers on the interfaces Gi0/0/0 and Gi0/0/1 and a common layer 3 interface on BDI302.
<<< snippet >>>
asr1001x#sh run int gi0/0/0
Building configuration...
Current configuration : 202 bytes
!
interface GigabitEthernet0/0/0
description Internet Uplink
mtu 1600
no ip address
negotiation auto
service instance 302 ethernet
encapsulation untagged
bridge-domain 302
!
end
asr1001x#sh run int gi0/0/1
Building configuration...
Current configuration : 202 bytes
!
interface GigabitEthernet0/0/1
description Internet Uplink
mtu 1600
no ip address
negotiation auto
service instance 302 ethernet
encapsulation untagged
bridge-domain 302
!
end
asr1001x#sh run int bdi302
Building configuration...
Current configuration : 164 bytes
!
interface BDI302
description Internet Uplink
ip address <MY IPv4 ADDRESS>
ip nat outside
ipv6 address <MY IPv6 ADDRESS>
ipv6 enable
end
asr1001x#
<<< snippet end >>>
Does anyone have an idea where this comes from and how to prevent it?
Thanks in advance for any answers
08-20-2016 07:39 AM
Was anything relevant captured in sh log or syslog history leading up to the times of the crashes?
08-21-2016 02:31 AM
Hi There!
Thanks a lot for your quick reply. The router crashed again:
<<< snippet >>>
ROM: IOS-XE ROMMON
asr1001x uptime is 4 hours, 27 minutes
Uptime for this control processor is 4 hours, 28 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System restarted at 06:37:09 CEST Sun Aug 21 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134
<<< snippet end >>>
Yes, there are core dumps from 2 modules, and there is some interesting info in the tracelogs from these modules just before the crash!
These are some of the suspect messages from cpp_cp_F0-0.log.27026.20160821063453:
<<< snippet >>>
08/21 06:33:32.636 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:063 TS:00000073990571644692 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:02.439 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:010 TS:00000074020419229602 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:02.639 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:105 TS:00000074020575052006 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:23.895 [buginf]: [27026]: (debug):
cpp_dal_rsrc_read_generic: We're in LOCKDOWN!! Process doesn't have key!!
08/21 06:34:23.897 [buginf]: [27026]: (debug):
-Traceback=1#7e71e38960af0756153706fb4c79c525 cpp_common_os:7F6EBCAC0000+11875 cpp_dmap:7F6EC9024000+3A5F8 cpp_dmap:7F6EC9024000+3EFD2 cpp_palci_svr_lib:7F6EC62C9000+585A evlib:7F6EBB82C000+BAD0 evlib:7F6EBB82C000+E200 cpp_common_os:7F6EBCAC0000+13E42 :400000+5F76 c:7F6EAA198000+1E514 :400000+5BD9
08/21 06:34:23.898 [errmsg]: [27026]: (ERR): %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
08/21 06:34:23.898 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
08/21 06:34:24.896 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
08/21 06:34:25.896 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
<<< snippet end >>>
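To pin down which message came first in a long tracelog, the lines can be filtered by severity. Below is a minimal Python sketch, assuming the line layout seen in the excerpt above (timestamp, module, PID, level, message). The regex, the `Entry` type, and the helper names are illustrative assumptions, not part of any Cisco tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed layout, inferred only from the excerpt quoted in this thread:
#   MM/DD HH:MM:SS.mmm [module]: [pid]: (level): message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<module>[^\]]+)\]: \[(?P<pid>\d+)\]: "
    r"\((?P<level>[^)]+)\):\s?(?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    module: str
    pid: int
    level: str
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Return a parsed entry, or None for continuation lines (e.g. tracebacks)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["ts"], m["module"], int(m["pid"]), m["level"], m["msg"])


def first_of_level(lines: Iterable[str], levels=("ERR", "warn")) -> Optional[Entry]:
    """Return the first entry whose level is in `levels`, in file order."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level in levels:
            return entry
    return None


# Sample lines copied from the excerpt above.
SAMPLE = [
    "08/21 06:34:02.639 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:105 "
    "TS:00000074020575052006 L2CP Handling: unsupported PDU with Cisco MAC "
    "but unexpected type: 0x2004.",
    "08/21 06:34:23.895 [buginf]: [27026]: (debug):",
    "cpp_dal_rsrc_read_generic: We're in LOCKDOWN!! Process doesn't have key!!",
    "08/21 06:34:23.898 [errmsg]: [27026]: (ERR): %CPPDRV-3-LOCKDOWN: QFP0.0 "
    "CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).",
    "08/21 06:34:23.898 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL "
    "rsrc_read failed - 'cpp/driver' detected the 'fatal' condition "
    "'CPP driver clientlib error': Device or resource busy",
]

hit = first_of_level(SAMPLE)
if hit is not None:
    print(hit.ts, hit.level, hit.msg)
```

Run against the full log file, the same filter can show whether anything at ERR or warn level precedes the %CPPDRV-3-LOCKDOWN message, which itself refers to a "previous fatal error".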
And here are the messages from fman_fp_image_pmanlog_F0-0.log.28047.20160821063453:
<<< snippet >>>
08/21 06:34:23.775 [fman_fp_image_pmanlog]: [28047]: (std):
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): fman_onefw_primary_init
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): HW primary init done fman_cpp_tcp_init: startedfman_cpp_tcp_aom_init: aom init for glb config objectfman_cpp_tcp_init: doneCWS:HW primary init done.IDENTITY:HW primary init done.IDENTITY: primary init is successful
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std):
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): [cpp_otv_client_bind:615] fman_cidb_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)fman_cpp_tcp_secondary_init: startedfman_cpp_tcp_secondary_init: doneHW Secondary init started.fman_cws_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)HW Secondary init done.IDENTITY: Secondary init is called
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): HW Secondary init started.fman_identity_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)IDENTITY: Secondary init is done
08/21 06:34:23.778 [fman_fp_image_pmanlog]: [28047]: (note): SIGUSR1 ignore, /tmp/fp/pvp/work/switchover_done_sentinel exists
08/21 06:34:23.778 [fman_fp_image_pmanlog]: [28047]: (note): Do SIGUSR1 return: 1
08/21 06:34:23.779 [fman_fp_image_pmanlog]: [28047]: (note): SIGUSR1, exit handler
08/21 06:34:23.779 [fman_fp_image_pmanlog]: [28047]: (note): Wait for signal or process exit: 28360
08/21 06:34:42.651 [fman_fp_image_pmanlog]: [28047]: (std): /tmp/sw/fp/0/0/fp/mount/usr/binos/conf/pman.sh: line 809: 28360 Aborted (core dumped) $PREPROC $eval_preproc $RESOLVED_PROCESS $PROCESS_ARGUMENTS
08/21 06:34:42.651 [fman_fp_image_pmanlog]: [28047]: (note): Not SIGUSR1/SIGTERM or not trapped, break from while loop
08/21 06:34:42.653 [fman_fp_image_pmanlog]: [28047]: (std): 27596: old priority -6, new priority 0
08/21 06:34:42.654 [fman_fp_image_pmanlog]: [28047]: (note): Process EXIT
08/21 06:34:42.654 [fman_fp_image_pmanlog]: [28047]: (std): /tmp/sw/fp/0/0/fp/mount/usr/binos/conf/pman.sh: line 844: cbr_pman_check_restart_type: command not found
08/21 06:34:42.655 [fman_fp_image_pmanlog]: [28047]: (note): Exited due to signal received 6
08/21 06:34:42.674 [fman_fp_image_pmanlog]: [28047]: (note): /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_state marked helddown
08/21 06:34:42.800 [fman_fp_image_pmanlog]: [28047]: (note): Exiting pman.sh for fman_fp_image
08/21 06:34:42.801 [fman_fp_image_pmanlog]: [28047]: (note): gdb port 9909 released
08/21 06:34:42.802 [fman_fp_image_pmanlog]: [28047]: (note): stored exit state helddown in /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_state
08/21 06:34:42.802 [fman_fp_image_pmanlog]: [28047]: (note): stored exit code 134 in /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_exitcode
08/21 06:34:42.803 [fman_fp_image_pmanlog]: [28047]: (note): Cleanup process scoreboard /tmp/fp/process/fman_fp_image%fp_0_0%0
08/21 06:34:42.807 [fman_fp_image_pmanlog]: [28047]: (note): Cleanup /tmp/fp/pvp/process/fman_fp_image%fp_0_0%0#27596
<<< snippet end >>>
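A note on the numbers in this log: "stored exit code 134" and "Exited due to signal received 6" look consistent with each other, assuming pman.sh follows the usual shell convention that a status above 128 means 128 plus the signal number. Signal 6 is SIGABRT on Linux, which also fits the "Aborted (core dumped)" line, so rc=134 in the reload reason probably only says that fman_fp_image aborted, not why. A small Python sketch of that decoding (the function name is hypothetical):

```python
import signal


def decode_exit_status(rc: int) -> str:
    """Interpret a shell-style exit status.

    Assumes the common convention that values above 128 encode
    128 + the number of the signal that terminated the process.
    """
    if rc > 128:
        signum = rc - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {rc}"


print(decode_exit_status(134))
```

The core dump and the cpp_cp tracelog are therefore more likely to hold the actual cause than the exit code itself.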
I attached a zip file with 2 tar archives containing the core dumps and the traces.
I also configured an external syslog server to hopefully capture more information before the next system crash. I also enabled "cdp" on the router because of the "unexpected type: 0x2004" messages.
I would be glad if somebody has further ideas.
Thanks and regards
08-20-2016 07:43 AM
Is the router generating any 'crashinfo' file?
Try looking for that file with the 'dir' command.
You might need to open a case with the Cisco TAC; the corresponding team can analyze and decode the crashinfo file to determine whether the unexpected reboot was due to a software or hardware failure.
Best Regards.
08-21-2016 02:39 AM
Hi Hector Gustavo,
Please see my previous answer to the comment from "wayfaring".
It would be nice if you have some further ideas.
Thanks and regards
08-21-2016 02:58 AM
2nd crash today!
ROM: IOS-XE ROMMON
asr1001x uptime is 6 minutes
Uptime for this control processor is 6 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134
Please help!
Thanks and regards.
08-21-2016 03:05 AM
I forgot to mention, there is NO relevant syslog info right before the crash.
Regards