Cisco ASR1001-X crashes every few days

peterklapper1
Level 1

Hello!

I am using an ASR1001-X as an Internet gateway, and it crashes every few days with the following problem:


ROM: IOS-XE ROMMON

asr1001x uptime is 4 hours, 58 minutes
Uptime for this control processor is 4 hours, 59 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System restarted at 10:00:41 CEST Sat Aug 20 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134


The router has a redundant layer 2 connection to two Juniper Routers on the interfaces Gi0/0/0 and Gi0/0/1 and a common layer 3 interface on BDI302.


<<< snippet >>>

asr1001x#sh run int gi0/0/0
Building configuration...

Current configuration : 202 bytes
!
interface GigabitEthernet0/0/0
 description Internet Uplink
 mtu 1600
 no ip address
 negotiation auto
 service instance 302 ethernet
  encapsulation untagged
  bridge-domain 302
 !
end

asr1001x#sh run int gi0/0/1
Building configuration...

Current configuration : 202 bytes
!
interface GigabitEthernet0/0/1
 description Internet Uplink
 mtu 1600
 no ip address
 negotiation auto
 service instance 302 ethernet
  encapsulation untagged
  bridge-domain 302
 !
end

asr1001x#sh run int bdi302
Building configuration...

Current configuration : 164 bytes
!
interface BDI302
 description Internet Uplink
 ip address <MY IPv4 ADDRESS>
 ip nat outside
 ipv6 address <MY IPv6 ADDRESS>
 ipv6 enable
end

asr1001x#

<<< snippet end >>>


Does anyone have an idea what causes this and how to prevent it?


Thanks in advance for any answers

6 Replies

wayfaring
Level 1

Is anything relevant captured in sh log or the syslog history leading up to the times of the crashes?

Hi There!

Thanks a lot for your quick reply. The router crashed again:

<<< snippet >>>

ROM: IOS-XE ROMMON

asr1001x uptime is 4 hours, 27 minutes
Uptime for this control processor is 4 hours, 28 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System restarted at 06:37:09 CEST Sun Aug 21 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134

<<< snippet end >>>

Yes, there are core dumps from two modules, and the tracelogs from these modules contain some interesting information from just before the crash!

These are some of the suspect messages from: cpp_cp_F0-0.log.27026.20160821063453:

<<< snippet >>>

08/21 06:33:32.636 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:063 TS:00000073990571644692 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:02.439 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:010 TS:00000074020419229602 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:02.639 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:105 TS:00000074020575052006 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:23.895 [buginf]: [27026]: (debug):
cpp_dal_rsrc_read_generic: We're in LOCKDOWN!! Process doesn't have key!!

08/21 06:34:23.897 [buginf]: [27026]: (debug):
 -Traceback=1#7e71e38960af0756153706fb4c79c525   cpp_common_os:7F6EBCAC0000+11875 cpp_dmap:7F6EC9024000+3A5F8 cpp_dmap:7F6EC9024000+3EFD2 cpp_palci_svr_lib:7F6EC62C9000+585A evlib:7F6EBB82C000+BAD0 evlib:7F6EBB82C000+E200 cpp_common_os:7F6EBCAC0000+13E42 :400000+5F76 c:7F6EAA198000+1E514 :400000+5BD9

08/21 06:34:23.898 [errmsg]: [27026]: (ERR): %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
08/21 06:34:23.898 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
08/21 06:34:24.896 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
08/21 06:34:25.896 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy

<<< snippet end >>>
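For anyone who wants to sift through a tracelog like this offline, here is a minimal Python sketch that splits each entry into timestamp, module, PID, level and message, then reports the first entry at a severe level. The line layout is only inferred from the excerpt above, and the choice of "warn" and "ERR" as the severe levels is my assumption, not something Cisco documents here. The names parse_tracelog and first_severe are made up for illustration.

```python
import re

# Layout inferred from the excerpt: "MM/DD HH:MM:SS.mmm [module]: [pid]: (level): message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<module>[^\]]+)\]: \[(?P<pid>\d+)\]: \((?P<level>\w+)\):\s*(?P<msg>.*)$"
)

# Assumption: these two level names mark the entries worth looking at first.
SEVERE = {"warn", "ERR"}


def parse_tracelog(text):
    """Return one dict per log entry; unprefixed lines are joined to the previous entry."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # Continuation line (for example a buginf message printed on its own line).
            joined = entries[-1]["msg"] + " " + line.strip()
            entries[-1]["msg"] = joined.strip()
    return entries


def first_severe(entries):
    """Return the first entry whose level is in SEVERE, or None if there is none."""
    for entry in entries:
        if entry["level"] in SEVERE:
            return entry
    return None


SAMPLE = """
08/21 06:34:02.639 [cpp-dp]: [27026]: (verbose): QFP:0.0 Thread:105 TS:00000074020575052006 L2CP Handling: unsupported PDU with Cisco MAC but unexpected type: 0x2004.
08/21 06:34:23.895 [buginf]: [27026]: (debug):
cpp_dal_rsrc_read_generic: We're in LOCKDOWN!! Process doesn't have key!!
08/21 06:34:23.898 [errmsg]: [27026]: (ERR): %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
08/21 06:34:23.898 [cpp-palcmn]: [27026]: (warn): PALCI periodic: DAL rsrc_read failed - 'cpp/driver' detected the 'fatal' condition 'CPP driver clientlib error': Device or resource busy
"""

if __name__ == "__main__":
    hit = first_severe(parse_tracelog(SAMPLE))
    if hit:
        print(hit["ts"], hit["module"], hit["msg"])
```

Run against the full file, this should point at the %CPPDRV-3-LOCKDOWN entry, which is the message that mentions a previous fatal QFP hardware interrupt and is probably the most useful line to quote in a TAC case.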

And here are the messages from: fman_fp_image_pmanlog_F0-0.log.28047.20160821063453:

<<< snippet >>>

08/21 06:34:23.775 [fman_fp_image_pmanlog]: [28047]: (std):

08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std):  fman_onefw_primary_init
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): HW primary init done fman_cpp_tcp_init: startedfman_cpp_tcp_aom_init: aom init for glb config objectfman_cpp_tcp_init: doneCWS:HW primary init done.IDENTITY:HW primary init done.IDENTITY: primary init is successful
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std):

08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): [cpp_otv_client_bind:615] fman_cidb_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)fman_cpp_tcp_secondary_init: startedfman_cpp_tcp_secondary_init: doneHW Secondary init started.fman_cws_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)HW Secondary init done.IDENTITY: Secondary init is called
08/21 06:34:23.776 [fman_fp_image_pmanlog]: [28047]: (std): HW Secondary init started.fman_identity_hw_secondary_init(): HW Layer:: Setting Event Context successful (0 - Success)IDENTITY: Secondary init is done
08/21 06:34:23.778 [fman_fp_image_pmanlog]: [28047]: (note): SIGUSR1 ignore, /tmp/fp/pvp/work/switchover_done_sentinel exists
08/21 06:34:23.778 [fman_fp_image_pmanlog]: [28047]: (note): Do SIGUSR1 return: 1
08/21 06:34:23.779 [fman_fp_image_pmanlog]: [28047]: (note): SIGUSR1, exit handler
08/21 06:34:23.779 [fman_fp_image_pmanlog]: [28047]: (note): Wait for signal or process exit: 28360
08/21 06:34:42.651 [fman_fp_image_pmanlog]: [28047]: (std): /tmp/sw/fp/0/0/fp/mount/usr/binos/conf/pman.sh: line 809: 28360 Aborted                 (core dumped) $PREPROC $eval_preproc $RESOLVED_PROCESS $PROCESS_ARGUMENTS
08/21 06:34:42.651 [fman_fp_image_pmanlog]: [28047]: (note): Not SIGUSR1/SIGTERM or not trapped, break from while loop
08/21 06:34:42.653 [fman_fp_image_pmanlog]: [28047]: (std): 27596: old priority -6, new priority 0
08/21 06:34:42.654 [fman_fp_image_pmanlog]: [28047]: (note): Process EXIT
08/21 06:34:42.654 [fman_fp_image_pmanlog]: [28047]: (std): /tmp/sw/fp/0/0/fp/mount/usr/binos/conf/pman.sh: line 844: cbr_pman_check_restart_type: command not found
08/21 06:34:42.655 [fman_fp_image_pmanlog]: [28047]: (note): Exited due to signal received 6
08/21 06:34:42.674 [fman_fp_image_pmanlog]: [28047]: (note): /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_state marked helddown
08/21 06:34:42.800 [fman_fp_image_pmanlog]: [28047]: (note): Exiting pman.sh for fman_fp_image
08/21 06:34:42.801 [fman_fp_image_pmanlog]: [28047]: (note): gdb port 9909 released
08/21 06:34:42.802 [fman_fp_image_pmanlog]: [28047]: (note): stored exit state helddown in /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_state
08/21 06:34:42.802 [fman_fp_image_pmanlog]: [28047]: (note): stored exit code 134 in /tmp/fp/pvp/process_state/fman_fp_image%fp_0_0%0#27596_exitcode
08/21 06:34:42.803 [fman_fp_image_pmanlog]: [28047]: (note): Cleanup process scoreboard /tmp/fp/process/fman_fp_image%fp_0_0%0
08/21 06:34:42.807 [fman_fp_image_pmanlog]: [28047]: (note): Cleanup /tmp/fp/pvp/process/fman_fp_image%fp_0_0%0#27596

<<< snippet end >>>
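A side note on the numbers in this log: "rc=134" in the reload reason and "Exited due to signal received 6" describe the same event. pman.sh is a shell script, and POSIX shells report a child killed by signal N as exit status 128 + N, so 134 = 128 + 6. On Linux, signal 6 is SIGABRT, which matches the "Aborted (core dumped)" line. A minimal Python sketch of that decoding follows. The helper name decode_exit_status is made up for illustration, and a status above 128 can in principle also be a plain exit value, so treat the result as a hint.

```python
import signal


def decode_exit_status(rc):
    """Split a shell-style exit status into (kind, value, name).

    Statuses above 128 are read as "killed by signal rc - 128",
    following the usual shell convention.
    """
    if rc > 128:
        signum = rc - 128
        try:
            # Signal names are platform specific; this lookup uses the local platform's table.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return ("signal", signum, name)
    return ("exit", rc, None)


if __name__ == "__main__":
    kind, value, name = decode_exit_status(134)
    print(kind, value, name)
```

So fman_fp_image did not exit on its own with an error code. It called abort() or hit a failed assertion, which is consistent with the process giving up after the CPP driver went into LOCKDOWN.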

I attached a zip file containing two tar archives with the core dumps and the traces.

I also configured an external syslog server to possibly capture more information before the next system crash. In addition, I enabled "cdp" on the router because of the "unexpected type: 0x2004" messages.

I would be grateful for any further ideas.

Thanks and regards

Is the router generating a 'crashinfo' file?

Try looking for that file with the 'dir' command.

You might need to open a case with Cisco TAC. The corresponding team can analyze and decode the crashinfo file to determine whether the unexpected reboot was due to a software or hardware failure.

Best Regards.

Hi Hector Gustavo,

Please see my previous answer to the comment from "wayfaring".

It would be great if you had any further ideas.

Thanks and regards

peterklapper1
Level 1

2nd crash today!

ROM: IOS-XE ROMMON

asr1001x uptime is 6 minutes
Uptime for this control processor is 6 minutes
System returned to ROM by reload at 08:51:41 CEST Tue Aug 16 2016
System image file is "bootflash:asr1001x-universalk9.03.17.01.S.156-1.S1-std.SPA.bin"
Last reload reason: critical process fault, fman_fp_image, fp_0_0, rc=134

Please help!

Thanks and regards.

I forgot to mention, there is NO relevant syslog info right before the crash.

Regards
