
ASR 1001-X QFP/ESP crash - CPPHA-3-FAULT: CPP:0.0

paul amaral
Level 4

Hi, I have a Cisco ASR 1001-X that has been solid for years. Suddenly it's crashing: it has crashed 3 times in the last 5 hours. From what I can tell it's an error with the ESP, although I'm not sure whether it's a remote vulnerability or a bug. If it's a bug, it's one I have never hit before in years of running IOS XE 16.12.5. I have searched but cannot find what is causing this issue. Below is all the relevant information, including the error. If anyone has any ideas, it would be tremendously helpful.

TIA, P

 

 

System returned to ROM by Reload reason not captured at 00:05:00 est Fri Feb 26 2021
System restarted at 21:07:36 est Tue Feb 14 2023
System image file is "bootflash:asr1001x-universalk9_noli.16.12.05.SPA.bin"
Last reload reason: Reload reason not captured
Directory of bootflash:/core/

696257  drwx             4096  Jan 22 2019 01:53:43 -05:00  modules
680066  -rw-                1  Feb 14 2023 21:53:05 -05:00  .callhome
680067  -rw-        216677893  Feb 14 2023 20:26:52 -05:00  EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-202648-est.tar.gz
680068  -rw-            20115  Feb 14 2023 20:27:09 -05:00  EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-202648-est-info.txt
680069  -rw-        131926947  Feb 14 2023 21:03:05 -05:00  EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-210302-est.tar.gz
680070  -rw-            16610  Feb 14 2023 21:03:17 -05:00  EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-210302-est-info.txt

 

-- Logs begin at Tue 2023-02-14 20:28:46 est, end at Tue 2023-02-14 21:03:19 est. --
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-FAULT: CPP:0.0 desc:JTB_CSR32_JTB_ERR_JTB_LEAF_INT__INT_SPI4_PLL_LOSTLOCK det:DRVR(interrupt) class:OTHER sev:FATAL id:6628 cppstate:STOPPED res:UNKNOWN flags:0x7 cdmflags:0x0
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPOSLIB-3-ERROR_NOTIFY: cpp_ha encountered an error -Traceback= 1#32e24f4e1e5bb3c64e4c8160ec8a858f   errmsg:7FA514D74000+A80 cpp_common_os:7FA5199AB000+DB8C cpp_common_os:7FA5199AB000+1BA6E cpp_drv_cmn:7FA517453000+47C37 :400000+28849 :400000+2830C :400000+27D64 :400000+15ABD :400000+1499C cpp_common_os:7FA5199AB000+11DF0 cpp_common_os:7FA5199AB000+124E6 evlib:7FA512E5C000+8F37 evlib:7FA512E5C000+997C cpp_common_os:7FA5199AB000+14142 :400000+FD7F c:7FA508D27000+209B2 :400000+A629
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-FAULTCRASH: CPP 0.0 unresolved fault detected, initiating crash dump.
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-FAULTCRASH: CPP 0.0 unresolved fault detected, initiating crash dump.
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cdm[25456]: CPP crashed, collecting state.
Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_driver[25086]: %CPPDRV-6-INTR: Luke(0) Interrupt : 23-Feb-14 21:02:02.715253 UTC-0500:FIRST:HALT:JTB_CSR32_JTB_ERR_JTB_LEAF_INT__INT_SPI4_PLL_LOSTLOCK
Feb 14 21:02:03 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cp[25318]: %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
Feb 14 21:02:03 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
Feb 14 21:02:03 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 fman_fp_image[24173]: %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).
Feb 14 21:02:03 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 audispd[498]: type=ANOM_ABEND msg=audit(1676426523.365:100): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=24173 comm="fman_fp_image" exe="/tmp/sw/mount/asr1001x-espbase.16.12.05.SPA.pkg/usr/binos/bin/fman_fp_image" sig=6
Feb 14 21:02:05 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cdm[25456]: CPP crashed, generating core file.
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-CDMDONE: CPP 0 microcode crashdump creation completed.
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cdm[25456]: Shutting down CPP MDM while client(s) still connected
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: Shutting down CPP MDM while client(s) still connected
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: Shutting down CPP CDM while client(s) still connected
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: QFP0.0: Fatal Fault: HW reported: QFP interrupt
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 25456: Unregistered all subdevices for access error
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 24905: Unregistered all subdevices for access error
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 root[13748]: %PMAN-3-PROCHOLDDOWN: The process cpp_ha_top_level_server has been helddown (rc 69)
Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 root[13749]: %PMAN-3-PROCHOLDDOWN: The process cpp_cdm_svr has been helddown (rc 69)
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 xinetd[13835]: execve /usr/bin/rsync
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cp[25318]: Shutting down CPP MDM while client(s) still connected
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_stats[24561]: %CPPOSLIB-3-ERROR_NOTIFY: cpp_stats encountered an error -Traceback= 1#0086d91d8299ea2d9c5e950517ddceba   errmsg:7F67393F1000+A80 cpp_common_os:7F673B715000+DB8C cpp_common_os:7F673B715000+1BA6E cpp_cdm:7F674A29F000+248D cpp_cdm:7F674A29F000+1EFD cpp_cdm:7F674A29F000+1C99 cpp_cdm:7F674A29F000+1B0C
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 25086: Unregistered all subdevices for access error
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_sp[24724]: Shutting down CPP MDM while client(s) still connected
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 24724: Unregistered all subdevices for access error
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 24561: Unregistered all subdevices for access error
Feb 14 21:02:36 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 25318: Unregistered all subdevices for access error
Feb 14 21:02:46 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 fman_rp[18031]: %FMANRP-3-PEER_IPC_STUCK: IPC to fman-log-bay0-peer0 is stuck for more than 30 seconds
Feb 14 21:02:49 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: CPP 0 pid 24173: Unregistered all subdevices for access error
Feb 14 21:02:49 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 root[14237]: %PMAN-3-PROCHOLDDOWN: The process fman_fp_image has been helddown (rc 134)
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 pvp[14270]: %PMAN-5-EXITACTION: Process manager is exiting: process exit with reload fru code
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: WARNING: SPA 1 is not EBFC, ignore
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[14747]: %SERVICES-3-INVALID_CHASFS: Thread 0x7f5d13320380 has no global chasfs context
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[14747]: %SERVICES-2-NORESOLVE_ACTIVE: Error resolving active FRU: BINOS_FRU_RP
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[14877]: %SERVICES-3-INVALID_CHASFS: Thread 0x7f476600d380 has no global chasfs context
Feb 14 21:02:50 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[14877]: %SERVICES-2-NORESOLVE_ACTIVE: Error resolving active FRU: BINOS_FRU_RP
Feb 14 21:02:51 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 xinetd[14950]: execve /usr/bin/rsync
Feb 14 21:02:51 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 xinetd[14954]: execve /usr/bin/rsync
Feb 14 21:02:52 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 xinetd[14959]: execve /usr/bin/rsync
Feb 14 21:02:58 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: cpp_pdma_proc_thp_chan_info: len: 2856
Feb 14 21:02:58 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: cpp_pdma_proc_thp_chan_info: len: 2783
Feb 14 21:02:58 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: cpp_pdma_proc_thp_chan_info: len: 2839
Feb 14 21:02:58 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: cpp_pdma_proc_thp_chan_info: len: 2835
Feb 14 21:03:16 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 fman_rp[18031]: %FMANRP-3-PEER_IPC_RESUME: IPC to fman-log-bay0-peer0 has returned to normal after previous stuck
Feb 14 21:03:17 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 pvp[16246]: %PMAN-3-PROCESS_NOTIFICATION: System report core/EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-210302-est.tar.gz (size: 128835 KB) generated and System report info at core/EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-210302-est-info.txt
Feb 14 21:03:18 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: LSMPI: Deregister dual stack diverter
Feb 14 21:03:19 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 pvp[17464]: %PMAN-5-EXITACTION: Process manager is exiting: reload fru action requested
Feb 14 21:03:19 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[17544]: %SERVICES-3-INVALID_CHASFS: Thread 0x7f0e6a0ba380 has no global chasfs context
Feb 14 21:03:19 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 btman_rotate_immediate[17544]: %SERVICES-2-NORESOLVE_ACTIVE: Error resolving active FRU: BINOS_FRU_RP
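For anyone wading through a journal dump like the one above, a small script can tally the `%FACILITY-SEVERITY-MNEMONIC` codes so the first fatal message stands out from the follow-on noise. This is only a rough sketch: the regular expression and the `summarize` helper are my own assumptions, not a Cisco tool, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed pattern for Cisco-style message codes such as %CPPHA-3-FAULTCRASH:
MSG_RE = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")

def summarize(lines):
    """Count each %FACILITY-SEVERITY-MNEMONIC code found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            counts["%{}-{}-{}".format(*match.groups())] += 1
    return counts

# Lines copied from the journal output in this thread
sample_lines = [
    "Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-FAULTCRASH: CPP 0.0 unresolved fault detected, initiating crash dump.",
    "Feb 14 21:02:02 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_ha[24905]: %CPPHA-3-FAULTCRASH: CPP 0.0 unresolved fault detected, initiating crash dump.",
    "Feb 14 21:02:03 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 cpp_cp[25318]: %CPPDRV-3-LOCKDOWN: QFP0.0 CPP Driver LOCKDOWN encountered due to previous fatal error (HW: QFP interrupt).",
    "Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 kernel: QFP0.0: Fatal Fault: HW reported: QFP interrupt",
    "Feb 14 21:02:35 EDGE-FRV-MA-ASR1001X-EDGE_RP_0 root[13748]: %PMAN-3-PROCHOLDDOWN: The process cpp_ha_top_level_server has been helddown (rc 69)",
]

if __name__ == "__main__":
    for code, count in summarize(sample_lines).most_common():
        print(f"{count:3d}  {code}")
```

Lines without a message code (the kernel line here) are simply skipped, so the same function can be fed a whole saved journal file read with `open(...).read().splitlines()`.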

 

18 Replies

AmitPal
Level 1

Do you have NAT configured on this device?

Looks like the device is hitting a bug, please contact TAC

CSCvf11949 

 

No NAT at all.

Can I see the output to the following command:  dir bootflash:tracelogs/*.log

Just the basic shutdown log files. The journal log does show the CPPHA fault posted above.

Ok, one last trick:  Can I see the output to the command "sh log on uptime"?

Just "reset local software"; I don't see anything strange. I have isolated the router now and it's still rebooting. I'm slowly taking out the SPA, RAM, etc. to see what happens, and I will report back. The latest reboot reason actually showed "critical process fman_fp_image fault on fp_0_0 (rc=134)". I could only find this: https://bst.cisco.com/bugsearch/bug/CSCvt04864

No, I want to see the output.  

I want to determine if the router has a hardware failure to the ESP or not.  

The error message code of "rc=134" is a generic way of saying "I have no friggin idea why I have to crash so I will just generate this number '134'".
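A side note on the number itself: on Linux, an exit code above 128 commonly means "128 + signal number", and the journal in the first post shows `fman_fp_image` dying with `sig=6` just before PMAN reports `rc 134`. Assuming the `rc` value follows that usual convention (an assumption on my part, not something Cisco documents in this thread), it can be decoded like this. The `decode_rc` helper is hypothetical.

```python
import signal

def decode_rc(rc):
    """Return the signal name if rc looks like 128 + signal number, else None.

    Assumes the common Unix shell convention for exit codes; whether the
    process manager on the router follows it is an assumption.
    """
    if rc > 128:
        try:
            return signal.Signals(rc - 128).name
        except ValueError:
            return None
    return None

if __name__ == "__main__":
    for rc in (134, 69):  # both values appear in the PROCHOLDDOWN messages above
        print(rc, decode_rc(rc))
```

On a Linux host, 134 decodes to SIGABRT, which is consistent with "the process aborted itself" and so does not by itself say whether the trigger was hardware or software.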

See below for command output. Since I removed a DIMM it has been up for 5 hours. I first removed the SPA and it continued to reboot, but since I removed the DIMM it looks like it's holding steady. I will know more if it continues to stay up. I guess the original error

 

 

%CPPHA-3-FAULT: CPP:0.0

can be related to anything, not just the ESP, since almost all packets run through it, and of course RAM plays an important role on the ESP and almost all hardware components.

 

 

 

Slot            Reset reason       Power On
-----------------------------------------------------------------
  R0          reset power on       02/16/23 11:23:47
  R0    reset local software       02/16/23 05:55:40
  R0    reset local software       02/16/23 05:31:30
  R0    reset local software       02/16/23 04:40:39
  R0    reset local software       02/16/23 03:51:09
  R0    reset local software       02/16/23 03:44:59
  R0    reset local software       02/16/23 03:40:09
  R0    reset local software       02/16/23 03:28:34
  R0    reset local software       02/16/23 03:21:34
  R0    reset local software       02/16/23 03:16:19
  R0    reset local software       02/16/23 03:06:04
  R0    reset local software       02/16/23 02:29:06
  R0    reset local software       02/16/23 02:21:32
  R0    reset local software       02/16/23 02:00:56
  R0    reset local software       02/16/23 01:54:48
  R0    reset local software       02/16/23 01:47:41
  R0    reset local software       02/16/23 01:39:00
  R0    reset local software       02/16/23 01:33:33
  R0    reset local software       02/16/23 00:42:54
  R0    reset local software       02/15/23 20:16:32
  R0          reset power on       02/15/23 14:39:41
  R0    reset local software       02/15/23 12:59:28
  R0    reset local software       02/15/23 09:28:54
  R0    reset local software       02/15/23 08:15:40
  R0    reset local software       02/15/23 06:02:44
  R0    reset local software       02/15/23 05:24:49
  R0    reset local software       02/15/23 04:04:29
  R0    reset local software       02/15/23 03:54:41
  R0    reset local software       02/15/23 03:27:48
  R0    reset local software       02/15/23 02:50:22
  R0    reset local software       02/15/23 02:45:06
  R0    reset local software       02/15/23 02:30:17
  R0    reset local software       02/15/23 01:45:04
  R0    reset local software       02/14/23 23:16:19
  R0    reset local software       02/14/23 21:06:42
  R0    reset local software       02/14/23 20:30:41
  R0          reset power on       05/07/21 00:14:08
  R0     upgrade flash reset       02/26/21 00:33:55
  R0    reset local software       02/26/21 00:08:35
  R0    reset local software       03/20/20 00:22:57
  R0    reset local software       03/17/20 00:07:03
  R0          reset power on       01/19/19 02:47:06
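To make the pattern in a reset history like this easier to see, the rows can be grouped by date and reset reason. A minimal sketch, assuming the column layout shown above (slot, reason, date, time); the `resets_per_day` helper and its regular expression are my own and only the sample rows come from this thread.

```python
import re
from collections import Counter

# Assumed row layout: slot, free-text reason, MM/DD/YY date, HH:MM:SS time
ROW_RE = re.compile(
    r"^\s*(\S+)\s+(.+?)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s*$"
)

def resets_per_day(text):
    """Count rows per (date, reason); header and separator lines are ignored."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            _slot, reason, date, _time = match.groups()
            counts[(date, reason)] += 1
    return counts

# A few rows copied from the output above
history = """\
Slot            Reset reason       Power On
-----------------------------------------------------------------
  R0          reset power on       02/16/23 11:23:47
  R0    reset local software       02/16/23 05:55:40
  R0    reset local software       02/16/23 05:31:30
  R0    reset local software       02/15/23 20:16:32
  R0          reset power on       05/07/21 00:14:08
"""

if __name__ == "__main__":
    for (date, reason), count in sorted(resets_per_day(history).items()):
        print(f"{date}  {reason:<22} {count}")
```

Run against the full table, this kind of tally makes it obvious how the "reset local software" entries cluster on 02/14/23 through 02/16/23 compared with the quiet years before.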

 

 

 

RP is toast.  Contact Cisco and organize an RMA.

Leo, did you conclude that from the number of resets? I'm assuming those are always full router reloads? It has actually been up for 8 hours straight since the DIMM was removed. Do you think bad RAM could be the cause of the RP resetting?

Thanks for the information you have been giving me so far. 
Paul


@paul amaral wrote:
Leo, did you conclude that from the number of resets?

No, I based my response on the last time the router was "properly" rebooted, 07 May 2021.  

And if the router "suddenly" combusted then I am leaning towards a hardware failure.  

Another test is to wipe the config of the router.  If I am right, the router will still continue to crash even without any config.

Leo Laohoo
Hall of Fame

@paul amaral wrote:
EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-210302-est-info.txt
EDGE-FRV-MA-ASR1001X-EDGE_RP_0-system-report_20230214-202648-est-info.txt

Please attach both files.

Hi Leo, thanks for taking the time to look at this. 

P

marce1000
VIP

 

 - Upgrade to the latest advisory release, https://software.cisco.com/download/home/284932298/type/282046477/release/Bengaluru-17.6.3a , and check whether that helps. You can also connect to the ASR with https://cway.cisco.com/cli/ , then press 'Crashdump Analyzer' at the top right.

 M.


