09-24-2013 03:18 AM
Hi
After upgrading an ASR9006 from 4.2.0 to 4.3.2, the line cards do not boot up properly.
I tried to upgrade the FPD on a line card, but had no success. What can I do now?
RP/0/RSP0/CPU0:LAB-9k-440#sh plat
Tue Sep 24 17:53:20.083 UTC
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0     A9K-RSP440-TR(Active)     IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        A9K-40GE-L                BRINGDOWN        PWR,NSHUT,MON
0/2/CPU0        A9K-MOD80-TR              IN-RESET         PWR,NSHUT,MON
RP/0/RSP0/CPU0:LAB-9k-440(admin)#upgrade hw-module fpd rommon location 0/0/CPU0
Tue Sep 24 17:57:56.274 UTC
***** UPGRADE WARNING MESSAGE: *****
* This upgrade operation has a maximum timout of 160 minutes. *
* If you are executing the cmd for one specific location and *
* card in that location reloads or goes down for some reason *
* you can press CTRL-C to get back the RP's prompt. *
* If you are executing the cmd for _all_ locations and a node *
* reloads or is down please allow other nodes to finish the *
* upgrade process before pressing CTRL-C. *
% RELOAD REMINDER:
- The upgrade operation of the target module will not interrupt its normal
operation. However, for the changes to take effect, the target module
will need to be manually reloaded after the upgrade operation. This can
be accomplished with the use of "hw-module <target> reload" command.
- If automatic reload operation is desired after the upgrade, please use
the "reload" option at the end of the upgrade command.
- The output of "show hw-module fpd location" command will not display
correct version information after the upgrade if the target module is
not reloaded.
NOTE: Chassis CLI will not be accessible while upgrade is in progress.
Continue? [confirm]
09-24-2013 05:51 AM
Hi,
FPDs cannot be updated when a card is booting.
BRINGDOWN just means the card is reloading.
IN-RESET means the card has failed to boot too many times so the system disables it. You can get out of this state via manual intervention such as the hw-mod reload command.
What is the highest node state the cards reach before resetting? Do the cards hit present, rommon, mbi-boot, mbi-run, or xr-run?
This will help to determine what other commands to look at and why the cards do not boot up all the way.
Can you also send the output of 'show log' snipped for card related messages? Something like 'show log | i 0/2/CPU0'
Thanks,
Sam
09-25-2013 01:07 AM
Hi Sam,
Thank you for your attention,
Node 0/2/CPU0 passed through the following states:
mbi-boot => mbi-run => ios XR PREP => rommon => MBI-boot => in-reset
Node 0/0/CPU0 passed through the following states:
0/0/CPU0
PRESENT
ROMMON
BRINGDOWN
Log:
RP/0/RSP0/CPU0:Sep 26 15:22:08.813 : config[65744]: %MGBL-SYS-5-CONFIG_I : Configured from console by admin
RP/0/RSP0/CPU0:Sep 26 15:23:10.636 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:PRESENT
RP/0/RSP0/CPU0:Sep 26 15:23:10.692 : config[65861]: %MGBL-CONFIG-6-DB_COMMIT_ADMIN : Configuration committed by user 'admin'. Use 'show configuration commit changes 2000000016' to view the changes.
RP/0/RSP0/CPU0:Sep 26 15:23:12.855 : config[65861]: %MGBL-SYS-5-CONFIG_I : Configured from console by admin
RP/0/RSP0/CPU0:Sep 26 15:25:40.912 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 1 Timeout ID: 10
RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:ROMMON
RP/0/RSP0/CPU0:Sep 26 15:28:11.214 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 3 Timeout ID: 10
RP/0/RSP0/CPU0:Sep 26 15:28:11.236 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000)
RP/0/RSP0/CPU0:Sep 26 15:28:11.237 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:BRINGDOWN
RP/0/RSP0/CPU0:Sep 26 15:28:11.238 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN
RP/0/RSP0/CPU0:Sep 26 15:30:41.513 : shelfmgr[389]: %PLATFORM-SHELFMGR-3-FSMTIMEOUT_RESET : Node 0/0/CPU0 is reset due to failed bootup. Node state was: 7 Timeout ID: 10
RP/0/RSP0/CPU0:Sep 26 15:30:41.537 : canb-server[150]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/0/CPU0 , Power Cycle (0x05000000)
I attached the log file for node 0/2/CPU0.
Nicolay.
PS: I did my upgrade following this link:
http://www.cisco.com/web/Cisco_IOS_XR_Software/pdf/ASR9000_Upgrade_Procedure_432.pdf
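For a long 'show log' capture, the per-node state history described above can be pulled out with a short script rather than read by eye. The following is only a minimal sketch in Python: it assumes the shelfmgr NODE_STATE_CHANGE line format seen in the log excerpt in this post, and the function name state_history is made up for illustration.

```python
import re
from collections import defaultdict

# Matches shelfmgr state-change lines in the format seen in this thread, e.g.
# "... %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:PRESENT"
# Assumption: other releases may format this message differently.
STATE_CHANGE = re.compile(
    r"%PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : (\S+) (\S+) state:(.+?)\s*$"
)


def state_history(log_lines):
    """Return {node: [state, ...]} in the order the states were logged."""
    history = defaultdict(list)
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match:
            node, _card_type, state = match.groups()
            history[node].append(state)
    return dict(history)


# Lines copied from the log excerpt in this post.
sample = [
    "RP/0/RSP0/CPU0:Sep 26 15:23:10.636 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:PRESENT",
    "RP/0/RSP0/CPU0:Sep 26 15:25:40.935 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:ROMMON",
    "RP/0/RSP0/CPU0:Sep 26 15:28:11.237 : shelfmgr[389]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-40GE-L state:BRINGDOWN",
    # The invmgr copy of the same event is ignored, so states are not counted twice.
    "RP/0/RSP0/CPU0:Sep 26 15:28:11.238 : invmgr[255]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN",
]

print(state_history(sample))
# {'0/0/CPU0': ['PRESENT', 'ROMMON', 'BRINGDOWN']}
```

The last entry in each list is the state the card ended in, and the list as a whole shows how far it got before being reset.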
10-02-2013 12:51 PM
Hi Nicolay,
Sorry for the delay, I was on vacation until today.
The following logs are of interest to me, mostly the NP init failure.
lda_server[65]: %L2-SPA-5-STATE_CHANGE : SPA in bay 0 type A9K-MPA-4x10GE Initing
LC/0/2/CPU0:Sep 25 15:33:37.422 : prm_server_ty[303]: %PLATFORM-NP-0-INIT_ERR : In spite of 3 Cold restarts, NP init unsuccessful...exitting!!
LC/0/2/CPU0:Sep 25 15:33:38.502 : sysmgr[91]: %OS-SYSMGR-3-ERROR : prm_server_ty(1) (jid 303) exited, will be respawned with a delay (slow-restart)
LC/0/2/CPU0:Sep 25 15:33:38.501 : sysmgr[91]: prm_server_ty(1) (jid 303) (pid 524413) (fail_count 2) abnormally terminated, restart scheduled
LC/0/2/CPU0:Sep 25 15:33:38.504 : sysmgr[91]: %OS-SYSMGR-3-ERROR : prm_server_ty(303) (fail count 2) will be respawned in 10 seconds
LC/0/2/CPU0:Sep 25 15:33:38.504 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : prm_server_ty[303] (pid 524413) has not sent proc-ready within 45 seconds
LC/0/2/CPU0:Sep 25 15:33:48.484 : pifibm_server_lc[292]: %OS-PLATFORM_LPTS_PIFIB-7-ERR_CONN_INIT : Failed to connect to PRM sever: Improper link
LC/0/2/CPU0:Sep 25 15:33:48.655 : sysmgr[91]: %OS-SYSMGR-3-ERROR : inline_service_proc(1) (jid 209) exited, will be respawned with a delay (slow-restart)
LC/0/2/CPU0:Sep 25 15:33:48.659 : sysmgr[91]: %OS-SYSMGR-3-ERROR : inline_service_proc(209) (fail count 1) will be respawned in 10 seconds
LC/0/2/CPU0:Sep 25 15:33:48.651 : dumper[56]: %OS-DUMPER-7-DUMP_REQUEST : Dump request for process pkg/bin/pifibm_server_lc
LC/0/2/CPU0:Sep 25 15:33:48.662 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : inline_service_proc(1) (jid 209) did not signal end of initialization
LC/0/2/CPU0:Sep 25 15:33:48.653 : sysmgr[91]: inline_service_proc(1) (jid 209) (pid 524400) (fail_count 1) abnormally terminated, restart scheduled
LC/0/2/CPU0:Sep 25 15:33:48.727 : pm[294]: %PLATFORM-VKG_PM-3-ERROR_INIT : PM: initialization error encountered, reason=failed to initialize prm stats, pm exits!
LC/0/2/CPU0:Sep 25 15:33:48.941 : sysmgr[91]: pm(1) (jid 294) (pid 524371) (fail_count 1) abnormally terminated, restart scheduled
LC/0/2/CPU0:Sep 25 15:33:48.941 : sysmgr[91]: %OS-SYSMGR-3-ERROR : pm(1) (jid 294) exited, will be respawned with a delay (slow-restart)
LC/0/2/CPU0:Sep 25 15:33:48.942 : sysmgr[91]: %OS-SYSMGR-3-ERROR : pm(294) (fail count 1) will be respawned in 10 seconds
LC/0/2/CPU0:Sep 25 15:33:48.998 : fib_mgr[176]: %PLATFORM-PLAT_FIB_HAL-3-ERR_INFO : fib HAL failed to initialize engine hardware : 18 : pkg/bin/fib_mgr : (PID=524398) : -Traceback= 4db19210 4d8f2b4c 40003f38 40001da4 4ba73a44 4ba71554 400003f0 4000211c 40003078 400000e4 40172470
LC/0/2/CPU0:Sep 25 15:33:49.003 : fib_mgr[176]: %ROUTING-FIB-2-INIT : FIB initialization failed on this node. Reason: Platform init returned hard error. Decoded error reason: Improper link
LC/0/2/CPU0:Sep 25 15:33:49.162 : sysmgr[91]: fib_mgr(1) (jid 176) (pid 524398) (fail_count 1) abnormally terminated, restart scheduled
LC/0/2/CPU0:Sep 25 15:33:49.163 : sysmgr[91]: %OS-SYSMGR-3-ERROR : fib_mgr(1) (jid 176) exited, will be respawned with a delay (slow-restart)
LC/0/2/CPU0:Sep 25 15:33:49.164 : sysmgr[91]: %OS-SYSMGR-3-ERROR : fib_mgr(176) (fail count 1) will be respawned in 10 seconds
LC/0/2/CPU0:Sep 25 15:33:49.164 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : fib_mgr(1) (jid 176) did not signal end of initialization
LC/0/2/CPU0:Sep 25 15:33:49.324 : prm_server_ty[303]: %PLATFORM-NP-0-INIT_ERR : In spite of 3 Cold restarts, NP init unsuccessful...exitting!!
LC/0/2/CPU0:Sep 25 15:33:49.655 : ipv6_mfwd_partner[245]: %ROUTING-IPV4_MFWD-3-ERR_MLIB_INIT : Failed to initialize Multicast Library Improper link
Can you open a TAC case for this?
This typically indicates a HW failure.
Thanks,
Sam
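When a line card log is as noisy as the one quoted above, it can help to keep only the high-severity messages before reading it. The sketch below is one possible way to do that in Python. It assumes the message tag layout visible in these logs (a category and group, then a severity digit from 0 to 7, then a mnemonic); the cut-off of severity 3 and the function name severe_messages are illustrative choices, not anything prescribed by Cisco.

```python
import re

# Tag layout as seen in the logs in this thread:
#   %CATEGORY-GROUP-SEVERITY-MNEMONIC, e.g. %PLATFORM-NP-0-INIT_ERR
# SEVERITY is a single digit, 0 (most severe) to 7 (debug).
TAG = re.compile(r"%([A-Z0-9_]+(?:-[A-Z0-9_]+)*?)-([0-7])-([A-Z0-9_]+)")


def severe_messages(log_lines, max_severity=3):
    """Return (severity, tag) pairs for lines at or below max_severity."""
    hits = []
    for line in log_lines:
        match = TAG.search(line)
        if match and int(match.group(2)) <= max_severity:
            hits.append((int(match.group(2)), match.group(0)))
    return hits


# Lines copied from the log excerpt quoted above.
sample = [
    "lda_server[65]: %L2-SPA-5-STATE_CHANGE : SPA in bay 0 type A9K-MPA-4x10GE Initing",
    "LC/0/2/CPU0:Sep 25 15:33:37.422 : prm_server_ty[303]: %PLATFORM-NP-0-INIT_ERR : In spite of 3 Cold restarts, NP init unsuccessful...exitting!!",
    "LC/0/2/CPU0:Sep 25 15:33:38.504 : sysmgr[91]: %OS-SYSMGR-7-DEBUG : prm_server_ty[303] (pid 524413) has not sent proc-ready within 45 seconds",
    "LC/0/2/CPU0:Sep 25 15:33:49.003 : fib_mgr[176]: %ROUTING-FIB-2-INIT : FIB initialization failed on this node. Reason: Platform init returned hard error. Decoded error reason: Improper link",
]

for severity, tag in severe_messages(sample):
    print(severity, tag)
# 0 %PLATFORM-NP-0-INIT_ERR
# 2 %ROUTING-FIB-2-INIT
```

Sorting the result by severity puts the NP init failure first, which is the message the later failures (PRM, FIB, multicast) follow from in this log.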
10-09-2013 11:53 PM
Hi Sam,
I have opened a TAC case and initiated the RMA procedure, but I still don't understand why this happened.
Did I need to remove the line card before the upgrade?
Thank you.
Nicolay.
10-15-2013 07:18 AM
Hi Nicolay,
This basically means faulty HW. Based on the above logs, nothing you did caused it.
Thanks,
Sam
01-16-2015 04:32 AM
Hi,
We had a failure on one LC. Is this a SW or HW failure?
We are running ASR9010 with 4.3.1 and LC is A9K-8T-L.
Here are the logs:
LC/0/0/CPU0:Jan 15 03:56:40.775 : prm_server_tr[292]: %PLATFORM-NP-4-FAULT : prm_process_parity_tm_cluster: 1 Unrecoverable error(s) found. Reset NP4 now
LC/0/0/CPU0:Jan 15 03:56:42.858 : ipv4_mfwd_partner[230]: %ROUTING-IPV4_MFWD-4-FROM_MRIB_UPDATE : MFIB couldn't process update from MRIB : failed to create route 0xe0000000:(10.120.3.77,239.192.4.40/32) - 'asr9k-ipmcast' detected the 'warning' condition 'Platform MFIB: Platform Lib not ready; NP Not running'
LC/0/0/CPU0:Jan 15 03:56:52.185 : pfm_node_lc[282]: %PLATFORM-NP-0-TMB_CLUSTER_PARITY : Set|prm_server_tr[155731]|Network Processor Unit(0x1008004)|TMb cluster parity interrupt. Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, mask=0x2000000, PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1
LC/0/0/CPU0:Jan 15 03:56:52.187 : pfm_node_lc[282]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 155731 (prm_server_tr), Fault Sev: 0, Target node: 0/0/CPU0, CompId: 0x1f, Device Handle: 0x1008004, CondID: 1008, Fault Reason: TMb cluster parity interrupt. Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, mask=0x2000000, PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1
RP/0/RSP1/CPU0:Jan 15 03:56:52.380 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/0/CPU0 indicates it is doing a kernel dump.
RP/0/RSP1/CPU0:Jan 15 03:56:52.381 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:IOS XR FAILURE
RP/0/RSP1/CPU0:Jan 15 03:56:52.384 : ospf[1011]: %ROUTING-OSPF-5-ADJCHG : Process 8000, Nbr 10.100.96.204 on TenGigE0/0/0/1 in area 0 from FULL to DOWN, Neighbor Down: BFD session down, vrf default vrfid 0x60000000
RP/0/RSP1/CPU0:Jan 15 03:56:52.397 : shelfmgr[394]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/0/CPU0 A9K-8T-L state:BRINGDOWN
01-16-2015 06:45 AM
To "close" on this and make sure that it is addressed, I'm pasting comments from the other discussion on the same item:
ah this: PLATFORM-NP-4-FAULT : prm_process_parity_tm_cluster: 1 Unrecoverable error(s) found.
It means that NP number 4 on the linecard in slot 0 incurred a memory parity error in the traffic manager portion of the NPU (the portion that handles queuing and scheduling). It could not correct that error and therefore decided to reinit and crash.
Generally with memory parity errors we always advise to catch it once, monitor it, and replace the card if it happens again.
If you are uncomfortable "waiting" until a next event, you could decide to replace it now, but many times parity errors are transient and caused by what we used to call "cosmic radiation", which is merely a collection of uncommon, unlikely events such as a power spike or drop, or other intangible events.
cheers
xander
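The "catch it once, monitor it, replace on recurrence" advice above can be turned into a small log check. This is only a sketch in Python, assuming the TMB_CLUSTER_PARITY message format shown in the log earlier in this thread; the threshold of two events and the names parity_counts and repeat_offenders are illustrative assumptions, and other parity-related messages are not covered.

```python
import re
from collections import Counter

# Matches the parity message format seen in this thread and captures the
# reporting line card and the NP number from "NP=<n>".
PARITY = re.compile(
    r"^LC/(\d+/\d+/CPU\d+):.*%PLATFORM-NP-0-TMB_CLUSTER_PARITY : Set\|.*\bNP=(\d+)"
)


def parity_counts(log_lines):
    """Count TMb cluster parity events per (node, NP number)."""
    counts = Counter()
    for line in log_lines:
        match = PARITY.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def repeat_offenders(counts, threshold=2):
    """Return (node, NP) pairs seen at least `threshold` times (assumed cut-off)."""
    return sorted(key for key, seen in counts.items() if seen >= threshold)


# Line copied from the log posted above.
sample = [
    "LC/0/0/CPU0:Jan 15 03:56:52.185 : pfm_node_lc[282]: %PLATFORM-NP-0-TMB_CLUSTER_PARITY : "
    "Set|prm_server_tr[155731]|Network Processor Unit(0x1008004)|TMb cluster parity interrupt. "
    "Indicates an internal SRAM problem in TMb cluster, NP=4 memId=6, mask=0x2000000, "
    "PMask=0x2000000 SRAMLine=166 Rec=1 Rewr=1",
]

counts = parity_counts(sample)
print(dict(counts))              # {('0/0/CPU0', 4): 1}
print(repeat_offenders(counts))  # []
```

A single event, as in the log above, produces no repeat offenders. Feeding in logs collected over a longer period would show whether the same NP on the same card is hit again.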
03-22-2016 10:44 AM
Hello All,
I get the following messages, and the A9K-40GE-B card keeps cycling through IOS XR PREP, MBI-BOOTING and MBI-RUNNING, until it is finally put in the IN_RESET state. Any help is truly appreciated.
0/1/CPU0 A9K-40GE-B MBI-BOOTING PWR,NSHUT,MON
RP/0/RSP0/CPU0:Router(admin)#LC/0/1/CPU0:Mar 22 17:31:53.057 : prm_server_tr[305]: %PLATFORM-NP-0-INIT_ERR : (0x8000B002) : Setting up NP0 Failed
LC/0/1/CPU0:Mar 22 17:32:50.031 : pfm_node_lc[293]: %PLATFORM-NP-0-NP_INIT_FAILURE : Set|prm_server_tr[151634]|Network Processor Unit(0x1008000)|Persistent Initialization Failure.
LC/0/1/CPU0:Mar 22 17:32:50.036 : pfm_node_lc[293]: %PLATFORM-PFM-0-CARD_RESET_REQ : pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 151634 (prm_server_tr), Fault Sev: 0, Target node: 0/1/CPU0, CompId: 0x1f, Device Handle: 0x1008000, CondID: 1027, Fault Reason: Persistent Initialization Failure.
--------------------------------------------------------------------------------
RP/0/RSP0/CPU0:Router(admin)#RP/0/RSP0/CPU0:Mar 22 17:42:43.665 : shelfmgr[410]: %PLATFORM-SHELFMGR-0-MAX_BOOTREQ_BRINGDOWN : Node 0/1/CPU0 A9K-40GE-B has reset itself in multiple (11) unsuccessful boot attempts, putting it IN_RESET state. The probable cause is an unexpected event on the node. Please Refer to the Cisco ASR 9000 System Error Message Reference Guide for further information if needed.
RP/0/RSP0/CPU0:Router(admin)#sh ver
Tue Mar 22 17:44:02.089 UTC
Cisco IOS XR Software, Version 5.1.0[Default]
Copyright (c) 2013 by Cisco Systems, Inc.
ROM: System Bootstrap, Version 1.06(20120210:003513) [ASR9K ROMMON],
Router uptime is 55 minutes
System image file is "bootflash:disk0/asr9k-os-mbi-5.1.0/0x100000/mbiasr9k-rp.vm"
cisco ASR9K Series (MPC8641D) processor with 4194304K bytes of memory.
MPC8641D processor at 1333MHz, Revision 2.2
ASR 9006 AC Chassis with PEM Version 2
2 Management Ethernet
219k bytes of non-volatile configuration memory.
975M bytes of compact flash card.
67988M bytes of hard disk.
1605616k bytes of disk0: (Sector size 512 bytes).
1605616k bytes of disk1: (Sector size 512 bytes).
Configuration register on node 0/RSP0/CPU0 is 0x2102
03-22-2016 10:46 AM
Hi!!
The card has an NP init problem: it was trying to set itself up and test its attached memory, and that failed. After a few tries it gave up and put itself in IN-RESET.
You would want to RMA this board and have it replaced.
xander
03-22-2016 11:06 AM
NP init problem = no power initialization problem?
I apologize for not knowing this, I'm new with the ASR line.
Thanks for your response.
03-22-2016 11:15 AM
Oh, it means an NP initialization error. When the NP boots, it tests its memory for search, stats, TCAM and packet buffers; if these fail, it is called an NP init error.
From the logs you provided I can't tell precisely which memory failed, but regardless, it can't be repaired or salvaged without a HW replacement.
And oh, if you're new and want to see some more, check out Cisco Live ID 2904 from Orlando, San Francisco and San Diego. Possibly also the BRKARC ID 2003 for some good stuff on the A9K. And of course here on the forums! :)
cheers
xander
03-22-2016 12:12 PM
Hi,
NP is Network Processor and your
Correct me if I am wrong about NP number.
04-13-2016 11:33 PM
Hi Alexander
How can I run install add for a tar file from 0/RSP1/CPU0?
Or do I need to reload the RP to install the tar file?
04-14-2016 12:12 AM
Hi,
Do you want to upgrade to a newer IOS-XR version?