04-26-2012 03:42 AM
Hi all,
One of our ACE-20s crashed recently with little info as to why. Fortunately it was the FT standby module, so service wasn't impacted, but we're obviously keen to determine the cause of the crash and a potential resolution.
Running A2(3.5).
last boot reason: NP 1 Failed : NP Core Reset - Cause Unknown
There is nothing obvious from the switch perspective:
Apr 17 14:52:35.775 bst: SP: The PC in slot 9 is shutting down. Please wait ...
Apr 17 14:52:45.780 bst: SP: PC shutdown completed for module 9
510497: Apr 17 14:52:55.781 bst: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset)
510498: Apr 17 14:57:58.277 bst: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...
510499: Apr 17 14:57:58.537 bst: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics
510500: Apr 17 14:57:59.213 bst: %OIR-SP-6-INSCARD: Card inserted in slot 9, interfaces are now online
510501: Apr 17 14:58:06.974 bst: %SVCLC-5-FWTRUNK: Firewalled VLANs configured on trunks
Has anyone come across this issue before? Is there any particular way to further diagnose the fault?
Any help is appreciated.
Thanks,
Anthony
04-26-2012 05:00 AM
Found an ixp1_crash.txt in the core: filesystem.
Most of it doesn't mean much to me, but I did find a reference to:
Shutdown[0,0] S/C/F=40/4/0 C/D=fe005fec/fe051ed8
[0]PID-TID=172052-11 P/T FL=00000010/85020000 "proc/boot/loadBalance_g_ns"
armbe context[fee16abc]:
0000: 00000000 3b300000 0005a3b2 0005a3b3 83900000 0017a9e0 00000000 83324200
0020: 00183900 4f8d758f 0017ae60 0179cfbc 849eb190 0179cf40 000003b3 00104c68
0040: 2000001f
instruction[00104c68]:
e5 0b c0 2c e1 a0 e3 20 e2 0e 20 01 e3 52 00 01 e5 0b 20 40 0a 00 00 14 e5 1b
stack[0179cf40]:
0000: 00000000 00000000 849eb190 00000000 00000000 00000000 00000000 00000002
0020: 00000010 b0c00000 00000001 00000004 00010000 00000001 00040000 00000000
0040: 0005a3b3 00200000 b0c05040 82e000a0 849eb190 00000000 00000000 00000000
0060: 00000000 00000000 00000000 00000000 00000000 0179cfc0 0012d2a8 00104b58
System image version: A2(3.5) 3.0(0)A2(3.5) adbuild_16:16:16-2011/08/04_/auto/adbure_nightly4/renumber/rel_a2_3_5_throttle/REL_3_0_0_A2_3_5
IXP CAUSE = NP Core Reset - Cause Unknown
And
<3>% IXP 1 XScale Core reset detected !!
Will probably raise a ticket, and see what comes from that.
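For anyone who ends up with several of these crash files, here is a minimal Python sketch for pulling out the few readable lines. It assumes the file has already been copied off the module, and the patterns are based only on the lines quoted above, so other releases may word them differently. The function name `summarize_crash` is just something made up for this example.

```python
import re

# Patterns are taken from the lines quoted in this thread (an assumption:
# other ACE releases may format these messages differently).
CAUSE_RE = re.compile(r"IXP CAUSE\s*=\s*(.+)")
RESET_RE = re.compile(r"IXP (\d+) XScale Core reset detected")
IMAGE_RE = re.compile(r"System image version:\s*(\S+)")


def summarize_crash(text):
    """Return the cause, NP number and image version found in crash-file text."""
    summary = {"cause": None, "np": None, "image": None}
    for line in text.splitlines():
        if (m := CAUSE_RE.search(line)):
            summary["cause"] = m.group(1).strip()
        elif (m := RESET_RE.search(line)):
            summary["np"] = int(m.group(1))
        elif (m := IMAGE_RE.search(line)):
            summary["image"] = m.group(1)
    return summary


# Excerpt of the lines quoted in this post (build string shortened to its first two fields).
sample = """\
System image version: A2(3.5) 3.0(0)A2(3.5)
IXP CAUSE = NP Core Reset - Cause Unknown
<3>% IXP 1 XScale Core reset detected !!
"""

if __name__ == "__main__":
    print(summarize_crash(sample))
```

This only summarizes what is already in the text; the register, instruction and stack dumps still need TAC to decode.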
04-30-2013 03:22 AM
Hi
Did you open a TAC case on this? If yes, what result did you get?
Thanks,
jonagy
04-30-2013 04:11 AM
Hi Anthony,
This seems like a known defect in A2(3.5).
Please read the release notes:
—When the configuration manager sends a message to TCP and the message has a proxy ID that is out of bounds, the network processor microengine (ME) becomes unresponsive and the ACE reloads with a last boot reason of "NP 1 Failed : NP ME Hung" or "NP 2 Failed : NP ME Hung". Workaround: None.
The defect is fixed in A2(3.6a)
regards,
Ajay Kumar
04-30-2013 04:33 AM
Hi
Many thanks for your advice. I had found this bug, but I doubt it is exactly what we hit, because
I see:
last boot reason: NP 2 Failed : NP Core Reset - Cause Unknown
and not:
last boot reason: NP 2 Failed : NP ME Hung
There was no core file. Instead we found two new files on "core:", created exactly at the time of the reload:
file "ixp2_crash.txt"
IXP CAUSE = NP Core Reset - Cause Unknown
NO Parity Error DETECTED #regarding SRAM
************************************************************
Kernel Message Ring Buffer Start:
************************************************************
...
<2>Warning:- MTS queue is full for opcode 4062 sap 25137 pid 2455. This warning can be ignored. If you want to recover - close all debug plugin sessions and terminate command execution in all telnet/ssh connections.
<3>% IXP 2 XScale Core reset detected !!
<4>sending signal 17 to SME, pid 954
file "outstanding_syslogs"
which contains:
Sun Apr 28 09:30:17 2013
snmpget reqID : 16167790 ctxId 0 -v 2c -c zeus124 -m all xx.xx.yy.zz iso.3.6.1.2.1.1.3.0
Sun Apr 28 09:31:41 2013
snmpget msgID : 1229056 ctxId 0 -v 3 -u NNMI_USER -l authPriv -m all xx.xx.yy.zz iso.3.6.1.4.1.9.9.109.1.1.1.1.8.1 iso.3.6.1.2.1.1.3.0 iso.3.6.1.4.1.9.9.117.1.2.1.1.2.1 iso.3.6.1.2.1.2.2.1.7.16777224 iso.3.6.1.2.1.2.2.1.8.16777224 iso.3.6.1.2.1.2.2.1.7.16777226 iso.3.6.1.2.1.2.2.1.8.16777226
There was no configuration change at the time of the crash.
BTW, this system runs:
build 3.0(0)A2(3.5)
Moreover, this software version is not listed on the bug description page.
Anyway, an upgrade is advisable.
BR:
jonagy
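To make the mismatch explicit, here is a small Python sketch that compares an observed "last boot reason" string against the two signatures quoted in the release note above. It is only a string comparison under the assumption that the release note text is the complete signature; it says nothing about whether the same root cause could show up with a different reason string. The names `DOCUMENTED`, `normalize` and `matches_documented` are invented for this example.

```python
import re

# The two signatures quoted in the release-note entry earlier in this thread.
DOCUMENTED = {
    "NP 1 Failed : NP ME Hung",
    "NP 2 Failed : NP ME Hung",
}


def normalize(reason):
    """Strip an optional 'last boot reason:' prefix and collapse whitespace."""
    text = reason.strip()
    prefix = "last boot reason:"
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    return re.sub(r"\s+", " ", text).strip()


def matches_documented(reason):
    """True only if the reason is one of the documented signatures."""
    return normalize(reason) in DOCUMENTED


if __name__ == "__main__":
    observed = "last boot reason: NP 2 Failed : NP Core Reset - Cause Unknown"
    print(matches_documented(observed))
```

With the reason string from our module the comparison fails, which is why I am not convinced this is the same defect.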
04-27-2012 09:16 PM
Hello Anthony,
You can take a look at the output of #dir core:, as you can see in this link:
You may need to copy the core dumps off the box and open a TAC case to determine which software defect was hit.
Usually we require all the core dumps from when the issue happened, the show tech-support output from both the switch and the ACE module, and the syslog messages.
Hope this helps!!!
Jorge