
ISR4431 reboot loop with %CPPHA-3-INITFAIL (0x43)

d_sergienko

Hi all,

One of our two ISR4431 routers started hitting errors at boot after a power outage. I can enter ROMMON and boot many different IOS images (including an external TFTP download via the Gi0 interface), but all of them get stuck and reboot with the messages below (more or less verbose depending on the IOS version).
I can't upgrade ROMMON because I'm unable to boot any IOS. ROMMON is 15.x, so only IOS-XE 3S is supported; the output below is from 3.15.01c. I'm not sure whether this is a hardware fault or a software bug. I've tried replacing the DIMMs for both CP and FFP with ones taken from a working unit, but it did not help. Replacing both PSUs also had no effect.

I've seen similar threads here, like this one, but found no particular resolution.

Appreciate your help.

 

 

Press RETURN to get started!

*Jul 28 11:36:50.411: %IOS_LICENSE_IMAGE_APPLICATION-6-LICENSE_LEVEL: Module name = esg Next reboot level = ipbasek9 and License = ipbasek9
*Jul 28 11:36:51.455: %ISR_THROUGHPUT-6-LEVEL: Throughput level has been set to 500000 kbps
*Jul 28 11:36:59.699: %SPANTREE-5-EXTENDED_SYSID: Extended SysId enabled for type vlan
*Jul 28 11:37:01.031: %LINK-3-UPDOWN: Interface Lsmpi0, changed state to up
*Jul 28 11:37:01.031: %LINK-3-UPDOWN: Interface EOBC0, changed state to up
*Jul 28 11:37:01.031: %LINK-3-UPDOWN: Interface GigabitEthernet0, changed state to down
*Jul 28 11:37:01.031: %LINK-3-UPDOWN: Interface LIIN0, changed state to up
*Jul 28 11:36:32.178: %CMRP-3-PFU_MISSING:cmand: The platform does not detect a power supply in slot 1
*Jul 28 11:36:53.873: %CMLIB-6-THROUGHPUT_VALUE:cmand: Throughput license found, throughput set to 500000 kbps
*Jul 28 11:36:55.575: %CPPHA-7-START:cpp_ha: CPP 0 preparing ucode
*Jul 28 11:36:55.605: %CPPHA-7-START:cpp_ha: CPP 0 startup init
*Jul 28 11:37:03.313: %IOSXE_MGMTVRF-6-CREATE_SUCCESS_INFO: Management vrf Mgmt-intf created with ID 1, ipv4 table-id 0x1, ipv6 table-id 0x1E000001
*Jul 28 11:37:03.365: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down
*Jul 28 11:37:03.365: %LINEPROTO-5-UPDOWN: Line protocol on Interface Lsmpi0, changed state to up
*Jul 28 11:37:03.366: %LINEPROTO-5-UPDOWN: Line protocol on Interface EOBC0, changed state to up
*Jul 28 11:37:03.366: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0, changed state to down
*Jul 28 11:37:03.366: %LINEPROTO-5-UPDOWN: Line protocol on Interface LIIN0, changed state to up
*Jul 28 11:37:06.707: %LINK-5-CHANGED: Interface GigabitEthernet0/0/0, changed state to administratively down
*Jul 28 11:37:06.711: %LINK-5-CHANGED: Interface GigabitEthernet0/0/1, changed state to administratively down
*Jul 28 11:37:06.712: %LINK-5-CHANGED: Interface GigabitEthernet0/0/2, changed state to administratively down
*Jul 28 11:37:06.714: %LINK-5-CHANGED: Interface GigabitEthernet0/0/3, changed state to administratively down
*Jul 28 11:37:06.722: %LINK-5-CHANGED: Interface Vlan1, changed state to administratively down
*Jul 28 11:37:07.708: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0/0, changed state to down
*Jul 28 11:37:07.710: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0/1, changed state to down
*Jul 28 11:37:07.713: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0/2, changed state to down
*Jul 28 11:37:07.714: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0/3, changed state to down
*Jul 28 11:37:08.690: %CPPHA-3-INITFAIL:cpp_ha: CPP 0 initialization failed - startup init (0x43)
*Jul 28 11:37:08.690: %CPPHA-3-INITFAIL:cpp_ha: CPP 0 initialization failed - start CPP (0x43)
*Jul 28 11:37:08.690: %CPPHA-3-FAULT:cpp_ha: CPP:0.0 desc:Platform Collection Error: Message STARTUP_INIT to client cpp_driver0 failed with error [Link has been severed] det:HA class:CLIENT_SW sev:FATAL id:2 cppstate:RUNNING res:UNKNOWN flags:0x0 cdmflags:0x0
*Jul 28 11:37:08.692: %IOSXE-6-PLATFORM:cpp_ha: Shutting down CPP MDM while client(s) still connected
*Jul 28 11:37:09.000: %PMAN-3-PROCHOLDDOWN:pman.sh: The process cpp_driver has been helddown (rc 255)
*Jul 28 11:37:09.003: %PMAN-3-PROCHOLDDOWN:pman.sh: The process cpp_ha_top_level_server has been helddown (rc 69)
*Jul 28 11:37:09.265: %PMAN-0-PROCFAILCRIT:pvp.sh: A critical process cpp_driver has failed (rc 255)
*Jul 28 11:37:13.321: %IOSXE-6-PLATFORM:cpp_cdm: Shutting down CPP MDM while client(s) still connected
*Jul 28 11:3
Jul 28 11:37:34.163 R0/0: %PMAN-5-EXITACTION: Process manager is exiting: reload fru action requested

Initializing Hardware ...
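Cisco IOS and IOS-XE syslog messages carry a %FACILITY-SEVERITY-MNEMONIC tag, with severity running from 0 (emergencies) to 7 (debugging). Filtering a long boot log by severity makes the failure chain above (%CPPHA-3-INITFAIL, then %PMAN-3-PROCHOLDDOWN, then %PMAN-0-PROCFAILCRIT) easier to spot. The following is a minimal Python sketch, assuming the console output has been split into one message per line; the names `serious_messages`, `SEVERITY_NAMES` and `SAMPLE_LOG` are hypothetical and not part of any Cisco tool.

```python
import re

# Cisco syslog severity levels: a lower number means more severe.
SEVERITY_NAMES = {
    0: "emergencies", 1: "alerts", 2: "critical", 3: "errors",
    4: "warnings", 5: "notifications", 6: "informational", 7: "debugging",
}

# Captures FACILITY, SEVERITY, MNEMONIC and the text after the
# "%FACILITY-SEVERITY-MNEMONIC:" tag, e.g. "%CPPHA-3-INITFAIL:cpp_ha: ...".
TAG_RE = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+):\s*(.*)")


def serious_messages(log_text, max_severity=3):
    """Return (severity, 'FACILITY-MNEMONIC', text) tuples, in log order,
    for every message whose severity number is <= max_severity.

    Assumption: one syslog message per line.
    """
    found = []
    for line in log_text.splitlines():
        m = TAG_RE.search(line)
        if m is None:
            continue  # untagged line, e.g. "Initializing Hardware ..."
        facility, severity, mnemonic, text = m.groups()
        if int(severity) <= max_severity:
            found.append((int(severity), f"{facility}-{mnemonic}", text.strip()))
    return found


# A few lines copied from the boot log above.
SAMPLE_LOG = """\
*Jul 28 11:37:06.707: %LINK-5-CHANGED: Interface GigabitEthernet0/0/0, changed state to administratively down
*Jul 28 11:37:08.690: %CPPHA-3-INITFAIL:cpp_ha: CPP 0 initialization failed - startup init (0x43)
*Jul 28 11:37:09.000: %PMAN-3-PROCHOLDDOWN:pman.sh: The process cpp_driver has been helddown (rc 255)
*Jul 28 11:37:09.265: %PMAN-0-PROCFAILCRIT:pvp.sh: A critical process cpp_driver has failed (rc 255)
"""

for sev, tag, text in serious_messages(SAMPLE_LOG):
    print(f"{SEVERITY_NAMES[sev]:<12} {tag}: {text}")
```

With the default threshold of 3, running this over the full log would also return routine messages such as %LINK-3-UPDOWN and %CMRP-3-PFU_MISSING, so in practice you may want to exclude known mnemonics before reading the result.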

 

Regards,

Dmitry

 

 


Try upgrading the firmware. 

If the same error messages crop up, RMA the router.

Hello @d_sergienko 

Thanks again for those outputs and your feedback.

The logs indicate deep system-level mismatches: either missing binaries, miscompiled code, or absent IPC structures expected by the platform.

The platform could remain unstable even after an upgrade if the underlying hardware and software are mismatched, or if the upgrade does not address the specific incompatibility or corruption that is causing the CPP HA subsystem to fail.

 

Best regards
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.

Hello @d_sergienko 

You have also this:

no valid BOOT image found

 

Best regards

Hello M02@rt37 

This is because the BOOT variable is not set. ROMMON then found the correct image:

Located isr4400-universalk9.03.15.01c.S.155-2.S1c-std.SPA.bin

and booted it, as shown above. This is definitely not the root cause of the issue.

OK.

What about your suggestion yesterday about the thermal paste?

 

Best regards

That did not help. I re-seated both heatsinks, but the result is unchanged. Next we'll try leaving the router unplugged for a couple of hours and then retry; it looks like some capacitor may be affecting this.

Thanks for that feedback

Best regards

More than 2 hours of being unplugged changed nothing. Leaving the router powered off until tomorrow.

Hi,

Unfortunately, the box still did not start today.

d_sergienko

Meanwhile, I did a complete reassembly, leaving the router powered off for about 20 minutes. That did not work either.
You can find high-resolution photos of both sides of the mainboard attached.

 

d_sergienko

Dear all,

Thanks @Leo Laohoo, @M02@rt37 and @MHM Cisco World for your advice.

At this point I can confirm the problem is 100% hardware-related. We found a way to revive the box: disassemble the router, pull out the motherboard, connect the power supply and fans, and start it without the enclosure, taking care that the heatsinks do not overheat. It booted successfully about 10 times with no more CPP errors. After that, ROMMON and IOS were upgraded to 16.12(2r) and 17.9.5f respectively.
My guess is that the root cause is a slightly bent motherboard, causing a bad contact somewhere on the board or inside one of the chips. Cases like this are definitely RMA-only; otherwise you may have a huge PITA once the unit is in production. It can still serve as backup/testing hardware, which is our case.

 

rtr#sh ver
Cisco IOS XE Software, Version 17.09.05f
Cisco IOS Software [Cupertino], ISR Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 17.9.5f, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2025 by Cisco Systems, Inc.
Compiled Sun 16-Feb-25 16:29 by mcpre


Cisco IOS-XE software, Copyright (c) 2005-2025 by cisco Systems, Inc.
All rights reserved.  Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY.  You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0.  For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.


ROM: 16.12(2r)

rtr uptime is 5 minutes
Uptime for this control processor is 7 minutes
System returned to ROM by Reload Command
System image file is "bootflash:isr4400-universalk9.17.09.05f.SPA.bin"
Last reload reason: Reload Command



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.



Suite License Information for Module:'esg'

--------------------------------------------------------------------------------
Suite                 Suite Current         Type           Suite Next reboot
--------------------------------------------------------------------------------
FoundationSuiteK9     None                  Smart License  None
securityk9
appxk9

AdvUCSuiteK9          None                  Smart License  None
uck9
cme-srst
cube


Technology Package License Information:

-----------------------------------------------------------------
Technology    Technology-package           Technology-package
              Current       Type           Next reboot
------------------------------------------------------------------
appxk9           None             Smart License    appxk9
uck9             None             Smart License    None
securityk9       None             Smart License    securityk9
ipbase           ipbasek9         Smart License    ipbasek9

The current throughput level is 500000 kbps


Smart Licensing Status: Smart Licensing Using Policy

cisco ISR4431/K9 (JUNO-1RU) processor with 7797598K/6147K bytes of memory.
Processor board ID FJC********
Router operating mode: Autonomous
4 Gigabit Ethernet interfaces
32768K bytes of non-volatile configuration memory.
16777216K bytes of physical memory.
30584831K bytes of flash memory at bootflash:.

Configuration register is 0x2102

rtr#sh platform
Chassis type: ISR4431/K9

Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
0         ISR4431/K9          ok                    00:06:13
 0/0      ISR4431-X-4x1GE     ok                    00:04:47
R0        ISR4431/K9          ok, active            00:06:13
F0        ISR4431/K9          ok, active            00:06:13
P0        PWR-4430-AC         ok                    00:05:16
P1        Unknown             empty                 never
P2        ACS-4430-FANASSY    ok                    00:05:16

Slot      CPLD Version        Firmware Version
--------- ------------------- ---------------------------------------
0         15010638            16.12(2r)
R0        15010638            16.12(2r)
F0        15010638            16.12(2r)

rtr#
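After reviving a box like this, one quick sanity check is that every populated slot in `show platform` reports `ok`. The following is a minimal Python sketch of such a check, assuming the column layout of the ISR4431 output above (columns separated by at least two spaces, slot table ending at the first blank line); `parse_show_platform`, `not_ok`, `SlotStatus` and `SHOW_PLATFORM` are hypothetical names, not a Cisco API.

```python
import re
from dataclasses import dataclass


@dataclass
class SlotStatus:
    slot: str
    card_type: str
    state: str
    insert_time: str


def parse_show_platform(output):
    """Parse the first slot table of 'show platform' output.

    Assumes a header line starting with 'Slot' that contains 'State',
    a dashed separator line, then one row per slot until a blank line.
    """
    lines = output.splitlines()
    i = 0
    while i < len(lines) and not (lines[i].startswith("Slot") and "State" in lines[i]):
        i += 1
    i += 2  # skip the header line and the dashed separator
    rows = []
    while i < len(lines) and lines[i].strip():
        # Split on runs of 2+ spaces so "ok, active" stays one column.
        cols = re.split(r"\s{2,}", lines[i].strip())
        rows.append(SlotStatus(*cols[:4]))
        i += 1
    return rows


def not_ok(rows):
    """Return slots whose state does not start with 'ok' (e.g. 'empty')."""
    return [r for r in rows if r.state.split(",")[0].strip() != "ok"]


# Slot table copied from the output above.
SHOW_PLATFORM = """\
Chassis type: ISR4431/K9

Slot      Type                State                 Insert time (ago)
--------- ------------------- --------------------- -----------------
0         ISR4431/K9          ok                    00:06:13
 0/0      ISR4431-X-4x1GE     ok                    00:04:47
R0        ISR4431/K9          ok, active            00:06:13
F0        ISR4431/K9          ok, active            00:06:13
P0        PWR-4430-AC         ok                    00:05:16
P1        Unknown             empty                 never
P2        ACS-4430-FANASSY    ok                    00:05:16
"""

for row in not_ok(parse_show_platform(SHOW_PLATFORM)):
    print(f"{row.slot}: {row.card_type} is '{row.state}' (inserted: {row.insert_time})")
```

On this output the only slot flagged is P1, the empty second power-supply bay, which is consistent with the %CMRP-3-PFU_MISSING message in the earlier boot log.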

Rgds,
D

Thanks for that feedback @d_sergienko 

Best regards