08-15-2022 06:43 AM
Hi experts,
I am seeing a strange issue with an ASR9K MOD80 line card. The card cannot power up and keeps logging the following:
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:MBI-RUNNING
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/1/CPU0 indicates it is doing a kernel dump.
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:IOS XR FAILURE
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:BRINGDOWN
invmgr[249]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/1/CPU0, state: BRINGDOWN
canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/1/CPU0 , Power Cycle (0x05000000)
canb-server[151]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/1/CPU0 , Power Cycle (0x05000000)
...
I installed the card in another A9K (running the same version as this one), and there the MOD80 boots up and reaches IOS XR RUN.
Has anyone run into this issue? Please help me check it. Many thanks.
Attached is the logging collected after the MOD80 OIR.
08-19-2022 07:36 AM
I had the same problem. The LC tested OK on another ASR9000, but it can't power up in any slot on this A9K chassis. I tried a redundancy switchover, and then the LC booted up.
shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/5/CPU0 A9K-MOD80-SE state:MBI-RUNNING
shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/6/CPU0 indicates it is doing a kernel dump.
shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-MOD80-SE state:IOS XR FAILURE
shelfmgr[427]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/6/CPU0 A9K-MOD80-SE state:BRINGDOWN
invmgr[264]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/6/CPU0, state: BRINGDOWN
canb-server[158]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/6/CPU0 , Power Cycle (0x05000000)
canb-server[158]: %PLATFORM-CANB_SERVER-7-CBC_PRE_RESET_NOTIFICATION : Node 0/6/CPU0 , Power Cycle (0x05000000)
ASR9010 with RSP440-TR
08-31-2022 04:06 AM
@DUNG Thanks for the case you mentioned. I also tried an RSP switchover followed by an OIR of the line card, and the issue has been resolved.
08-15-2022 06:17 PM
Hi @Rps-Cheers,
This LC is having kernel crashes:
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/1/CPU0 indicates it is doing a kernel dump.
Open a TAC case to get these dumps decoded.
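If you want to pick this pattern out of a longer saved log first, here is a minimal Python sketch. It assumes the syslog text has the same shape as the lines pasted in this thread. The names summarize, crashed_during_boot and SAMPLE_LOG are made up for illustration and are not part of any Cisco tool.

```python
import re

# Assumed line shapes, taken from the log lines pasted in this thread:
#   shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:MBI-RUNNING
#   shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/1/CPU0 indicates it is doing a kernel dump.
STATE_RE = re.compile(
    r"SHELFMGR-6-NODE_STATE_CHANGE : (?P<node>\S+) (?P<card>\S+) state:(?P<state>.+?)\s*$"
)
DUMP_RE = re.compile(r"SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node (?P<node>\S+) ")

# Hypothetical sample built from the lines in the first post.
SAMPLE_LOG = """\
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:MBI-RUNNING
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_KERNEL_DUMP_EVENT : Node 0/1/CPU0 indicates it is doing a kernel dump.
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:IOS XR FAILURE
shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:BRINGDOWN
invmgr[249]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/1/CPU0, state: BRINGDOWN
"""


def summarize(log_text):
    """Return {node: {"states": [...], "kernel_dumps": int}} from shelfmgr lines."""
    summary = {}
    for line in log_text.splitlines():
        dump = DUMP_RE.search(line)
        if dump:
            entry = summary.setdefault(dump["node"], {"states": [], "kernel_dumps": 0})
            entry["kernel_dumps"] += 1
            continue
        state = STATE_RE.search(line)
        if state:
            entry = summary.setdefault(state["node"], {"states": [], "kernel_dumps": 0})
            entry["states"].append(state["state"])
    return summary


def crashed_during_boot(entry):
    """True if the node reported a kernel dump and then went to IOS XR FAILURE."""
    return entry["kernel_dumps"] > 0 and "IOS XR FAILURE" in entry["states"]


if __name__ == "__main__":
    for node, entry in summarize(SAMPLE_LOG).items():
        print(node, entry["states"], "kernel dumps:", entry["kernel_dumps"],
              "crashed during boot:", crashed_during_boot(entry))
```

This only counts events per node so you can see which slots loop through MBI-RUNNING, kernel dump and BRINGDOWN. It does not decode the dumps themselves; that still needs TAC.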
Eduardo
08-15-2022 07:08 PM
Yes, I noticed this crash event and checked it. I found some crashinfo files ... I will try to get them.
What I find most strange is that the LC runs fine when inserted into another A9K, but it cannot power up in any slot of this A9K chassis.
08-15-2022 08:44 PM
This could be related to differences between the two ASR 9000s. Can you please share the output of show platform and show install active summary from both?
Eduardo
08-17-2022 03:17 PM
Hi @Rps-Cheers,
What about the FPD versions on both ASRs?
Eduardo
08-17-2022 06:34 PM
Normal one:
                                      HW                        Current SW  Upg/
Location     Card Type                Version Type Subtype Inst Version     Dng?
============ ======================== ======= ==== ======= ==== =========== ====
0/RSP0/CPU0  A9K-RSP440-TR            1.0     lc   cbc     0    16.116      No
                                              lc   fpga1   0    0.10        No
                                              lc   fpga3   0    4.09        No
                                              lc   fpga2   0    1.06        Yes
                                              lc   rommon  0    0.73        No
--------------------------------------------------------------------------------
0/0/CPU0     A9K-MOD80-TR             1.0     lc   cbc     0    20.118      No
                                              lc   fpga2   0    1.04        No
                                              lc   fpga4   0    1.05        No
                                              lc   rommon  0    3.02        Yes
--------------------------------------------------------------------------------
0/2/CPU0     A9K-MOD80-SE             1.0     lc   cbc     0    20.118      No
                                              lc   fpga2   0    1.04        No
                                              lc   fpga4   0    1.05        No
                                              lc   rommon  0    3.03        Yes
--------------------------------------------------------------------------------
<omit>
Abnormal one:
                                      HW                        Current SW  Upg/
Location     Card Type                Version Type Subtype Inst Version     Dng?
============ ======================== ======= ==== ======= ==== =========== ====
0/RSP0/CPU0  A9K-RSP440-TR            1.0     lc   cbc     0    16.116      No
                                              lc   fpga3   0    4.09        No
                                              lc   fpga2   0    1.06        Yes
                                              lc   fpga1   0    0.10        No
                                              lc   rommon  0    0.73        No
--------------------------------------------------------------------------------
0/0/CPU0     A9K-MOD80-SE             1.0     lc   cbc     0    20.118      No
                                              lc   fpga2   0    1.04        No
                                              lc   fpga4   0    1.05        No
                                              lc   rommon  0    3.03        Yes
--------------------------------------------------------------------------------
0/1/CPU0     A9K-MOD80-TR             1.0     lc   cbc     0    20.118      No
                                              lc   fpga4   0    1.05        No
                                              lc   fpga2   0    1.04        No
                                              lc   rommon  0    3.02        Yes
--------------------------------------------------------------------------------
<omit>
08-17-2022 09:19 PM
The FPDs seem to be OK in both setups. I recommend opening a case with TAC to get these dumps decoded; it will be very useful to share this info with the TAC engineer.
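For anyone who wants to repeat the FPD comparison on longer output, here is a rough Python sketch. It assumes the whitespace-separated row layout seen in the FPD listings in this thread (a full row with eight fields, then continuation rows starting with "lc"). It compares by card type and FPD subtype rather than by slot, since the cards sit in different slots on the two chassis. parse_fpd, diff_fpd and the two sample strings are hypothetical names for illustration only.

```python
# Hypothetical samples: the A9K-MOD80-SE rows from the two listings in this thread.
NORMAL = """\
0/2/CPU0 A9K-MOD80-SE 1.0 lc cbc 0 20.118 No
lc fpga2 0 1.04 No
lc fpga4 0 1.05 No
lc rommon 0 3.03 Yes
"""
ABNORMAL = """\
0/0/CPU0 A9K-MOD80-SE 1.0 lc cbc 0 20.118 No
lc fpga2 0 1.04 No
lc fpga4 0 1.05 No
lc rommon 0 3.03 Yes
"""


def parse_fpd(text):
    """Map (card type, FPD subtype) -> set of current SW versions."""
    versions = {}
    card = None
    for line in text.splitlines():
        tokens = line.split()
        # Full row: location, card type, HW version, then the five FPD fields.
        if len(tokens) == 8 and tokens[3] == "lc":
            card = tokens[1]
            tokens = tokens[3:]
        # FPD fields: type, subtype, instance, current SW version, upgrade flag.
        if len(tokens) == 5 and tokens[0] == "lc" and card is not None:
            _type, subtype, _inst, version, _upg = tokens
            versions.setdefault((card, subtype), set()).add(version)
    return versions


def diff_fpd(a, b):
    """Return {(card, subtype): (versions_a, versions_b)} where the two differ."""
    return {
        key: (a.get(key, set()), b.get(key, set()))
        for key in sorted(set(a) | set(b))
        if a.get(key, set()) != b.get(key, set())
    }


if __name__ == "__main__":
    print(diff_fpd(parse_fpd(NORMAL), parse_fpd(ABNORMAL)))
```

Header, separator and <omit> lines are skipped because they do not have the expected field count. An empty result only means the listed versions match; it says nothing about whether an upgrade is pending.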
Eduardo
08-17-2022 09:22 PM
OK, I will. Many thanks.
08-19-2022 11:43 AM
So, to answer you and the original poster (OP): there are many reasons why a card may not transition from MBI RUNNING to XR RUN. In short, the main reason is a missing XR heartbeat message. That can be due to the control Ethernet (CE, also known as EOBC) dropping the packet, the XR VM not sending the heartbeat, and so on. One possibility is CSCuv65231, a bug from way back that was very difficult to detect. It affects all versions before 5.3.3 and is caused by a slow leak in the mqueue process. As the bug says, your best bet is to connect to the console of the LC while it is booting to see what is going on. On the ASR9K you can do that with attachCon.
RP/0/RSP1/CPU0:ASR9006-B#run attachCon 0/0/cpu0
Wed Aug 29 14:02:41.699 UTC
attachCon: Starting console session to node 0/0/cpu0
attachCon: To quit console session type 'detach'
Current Baud 115200
Setting Baud to 9600
ksh: e: not found
#
In the above example the card is already up and running IOS XR, so this takes me to the ksh of the line card.
Below is the console output of a line card starting to boot up normally.
Selecting ROMMON Image... B
DDR in Interleaved mode
POST 1 : PASSED : code 0 : DDR2 Memory
System Bootstrap, Version 1.03(20100211:014208) [ASR9K ROMMON],
Copyright (c) 1994-2010 by Cisco Systems, Inc.
Compiled Wed 10-Feb-10 17:42 by
CPU Reset Reason = 0x0005
PPC 8641D (partnum 0x0003), Revision 0.2, (Core Version 2.20136)
M8641 CLKIN: 66 Mhz
Core Clock: 1333 Mhz
MPX Clock: 533 Mhz
LBC Clock: 33 Mhz
Daughter Board Present: yes
Daughter Board ID: 7
Memory Option: 1
Board Type: 40221
POST 3 : PASSED : code 0 : Slot ID/Board Type
Loading Field Programmable Devices:
FPGA 0-B PROGRAMMED : image: 0xfd800028 - 0xfe1dbb28, et: 480ms
FPGA 1-B PROGRAMMED : image: 0xfd800028 - 0xfe1dbb28, et: 930ms
FPGA 2-B PROGRAMMED : image: 0xc0300028 - 0xc039f270, et: 29ms
Main Board: 0x40221, rev 0x3
PLD: 1.2
Bridge0: 1.2
Bridge1: 1.2
CBC: 6.2
Daughter Board: 0x7, rev 0x2, bom 0x2
DB Main PLD: 1.3
DB PLD: 0.8
IO FPGA: 0.11
PCI-E1: Ready as Root Complex
PCI-E2: Ready as Root Complex
ASR9K (8641D PPC) platform with 4096 Mb of main memory
CARD_SLOT_NUMBER: 2
CPU_INSTANCE: 1
MBI Validation starts ...
tsec_init_hw: configuring TSEC (port 1) for: 1GB, Full Duplex
tsec_init_interface: hardware initialization completed
Interface link changed state to UP.
Interface link state up.
MBI validation sending request.
HIT CTRL-C to abort
.
mbi_val_process_packet: received repsonse
Remote image to boot : tftp:/disk0/asr9k-os-mbi-4.2.1/lc/mbiasr9k-lc.vm
IP_ADDRESS: 127.0.1.2
IP_SUBNET_MASK: 255.255.0.0
DEFAULT_GATEWAY: 127.0.1.1
TFTP_SERVER: 127.0.1.1
TFTP_FILE: /disk0/asr9k-os-mbi-4.2.1/lc/mbiasr9k-lc.vm
Performing tftpdnld
tsec_init_hw: configuring TSEC (port 1) for: 1GB, Full Duplex
Sam
08-20-2022 02:32 AM
Our device is running 6.2.3. I want to know whether this is an LC hardware problem or a software bug. It doesn't hit bug CSCuv65231. Thanks!
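The version check behind that statement can be sketched in a few lines of Python. It relies only on what was said earlier in this thread, that the bug affects all versions before 5.3.3; that boundary has not been verified here against the bug's release note, and parse_version and is_affected are hypothetical helper names.

```python
def parse_version(text):
    """Turn a dotted release string such as '6.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Assumption from this thread: CSCuv65231 affects all versions before 5.3.3.
FIRST_UNAFFECTED = parse_version("5.3.3")


def is_affected(running):
    """True if the running release is older than the first unaffected one."""
    return parse_version(running) < FIRST_UNAFFECTED


if __name__ == "__main__":
    for release in ("4.2.1", "5.3.3", "6.2.3"):
        print(release, "affected" if is_affected(release) else "not affected")
```

Comparing tuples of integers avoids the trap of comparing release strings as text.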
08-20-2022 03:22 AM
I guess it's a software issue. As you describe, it recovers after an RSP switchover, just as it would after an RSP reload.