06-12-2013 12:10 PM - edited 03-07-2019 01:51 PM
I have a 4900m with IOS 12.2(53) SG2.
Over the past 6 months, there have been about 5 instances where the switch has rebooted itself. Most of them occurred within the last few weeks, so it seems to be getting worse.
Dumping the log data showed this at the end:
Jawa Crash Data:
Interrupt Mask: 0xE100
Interrupt: 0x2000
Forerunner CRC Error
Is this telling me I am having possible hardware failures, such as bad RAM?
06-12-2013 02:41 PM
You should open a ticket with TAC and send them the crash file and any other info you have. It may be a memory issue.
HTH
06-12-2013 03:12 PM
Over the past 6 months, there have been about 5 instances where the switch has rebooted itself.
Sounds like an IOS issue.
Can you attach/post the crashinfo files?
06-12-2013 07:57 PM
Hi,
Please send me the output of show tech, or of show ver and show platform crashdump.
Regards
Inayath
06-13-2013 09:01 AM
Here is the crash dump. Sorry, I have it only as scanned images.
06-13-2013 01:03 PM
I have a report from a second crash.
It is similar to the previous dump, with a few exceptions:
Machine Check Interrupt Count: 1c9910b
L1 Instruction Cache Parity Errors: 0
L1 Instruction Cache Parity Errors (CPU30): 0
L1 Data Cache Parity Errors: 1c9910b
Jawa Crash Data:
Interrupt Mask: 0xe100
Interrupt: 0x1000
Is the L1 info related to the CPU's L1 cache? It looks like it's a hardware issue, not a software issue.
I found this in the 12.2(54)SG release notes:
Parity errors in the CPU's cache cause IOS to crash with a crashdump file like the following:
Switch# show platform crashdump
VECTOR 0
*** CRASH DUMP ***
02/09/2009 10:10:30
Last crash: 02/09/2009 10:10:30
Build: 12.2(20090206:234053) IPBASE
buildversion addr: 13115584
MCSR: 40000000 <--- non-zero value!
.
The key pieces of data are "VECTOR 0" and a MCSR value of 40000000, 20000000, or 10000000.
Workaround: Enter the show platform cpu cache command to launch an IOS algorithm that
detects and recovers from parity errors in the CPU's cache. You will obtain a running count of the
number of CPU cache parity errors that have been successfully detected and corrected on a running
system:
Switch# show platform cpu cache
L1 Instruction Cache: ENABLED
L1 Data Cache: ENABLED
L2 Cache: ENABLED
Machine Check Interrupts: 5
L1 Instruction Cache Parity Errors: 3
L1 Instruction Cache Parity Errors (CPU30): 1
L1 Data Cache Parity Errors: 1
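The two checks from the release note excerpt above can be sketched in Python for anyone who saves these outputs to text files. This is only a rough sketch: the function names are hypothetical, the parsing assumes the outputs look exactly like the excerpts quoted in this thread, and the counters are kept as strings because the thread shows both "5" and "1c9910b" without stating the number base.

```python
import re

# MCSR values that the quoted release note lists as cache parity indicators.
PARITY_MCSR_VALUES = {"40000000", "20000000", "10000000"}


def matches_cache_parity_signature(crashdump):
    """True if the text has a 'VECTOR 0' line and one of the listed MCSR values."""
    has_vector_0 = re.search(r"^\s*VECTOR 0\s*$", crashdump, re.MULTILINE) is not None
    mcsr = re.search(r"^\s*MCSR:\s*([0-9A-Fa-f]+)", crashdump, re.MULTILINE)
    return has_vector_0 and mcsr is not None and mcsr.group(1) in PARITY_MCSR_VALUES


def parse_cache_counters(output):
    """Collect 'name: value' lines whose value looks numeric (decimal or hex digits)."""
    counters = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skips status lines such as 'L2 Cache: ENABLED' and the command prompt.
        if sep and re.fullmatch(r"[0-9A-Fa-f]+", value):
            counters[name.strip()] = value
    return counters


def nonzero_counters(counters):
    """Names of counters whose value is not all zeros."""
    return [name for name, value in counters.items() if value.strip("0") != ""]
```

Running parse_cache_counters on two captures taken some time apart and comparing the results would show whether the parity counters are still climbing.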
I get similar output, with non-zero parity error counts. Is the workaround saying that running the command "show platform cpu cache" will fix the errors? Or is that a temporary thing related to the IOS?
I am wondering if I should update the IOS to solve this, or if the issue is really a hardware problem.
Thanks
06-13-2013 07:15 PM
Hi,
I just finished analyzing your data, and it is related to a hardware issue. Kindly go ahead and raise an RMA for it.
By any chance, do you see the following messages in the logs?
%C4K_L3HWFORWARDING-4-PROFILEIDMAPTABLEPARITYERROR: Parity error detected and corrected at profileIdMapTable
HTH
Regards
Inayath
*Please rate if this info is helpful.
08-12-2013 07:45 AM
So I ended up replacing the 4900M unit that had crashed at least 3 times with one from the lab that had never reported this issue. Now the replacement has also crashed in the same manner, with a self reboot.
I find it hard to believe it is a hardware issue, given that it has happened to different units.
Could this possibly be an IOS bug? I am thinking of upgrading to 15, but it is next to impossible to reproduce the problem.
Any ideas?
08-12-2013 09:16 AM
I have seen software parity errors cause 4900Ms to reload, and I have also seen a positive impact from upgrading code. Unless you're set on moving to the 15 train, you may find that some of the later versions still in your train, like 12.2(53)SG8, are more stable. I am typically a bit timid when it comes to the latest releases.
In short, parity issues are often caused by software rather than hardware. If that is the case here, notably when the release notes and Bug Toolkit indicate as much, a software upgrade may be beneficial.
Good luck!
Matt
02-17-2014 06:44 AM
Hi
Did you ever find a solution for this?
I seem to have a similar problem: a 4900M rebooting out of the blue, no syslog entries, and a Forerunner CRC Error in the crash dump.
Beat
02-17-2014 01:44 PM
Post the output of the following commands:
1. sh version
2. dir
02-17-2014 11:48 PM
switch#sho vers
Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500e-ENTSERVICESK9-M), Version 12.2(53)SG2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Tue 16-Mar-10 04:50 by prod_rel_team
Image text-base: 0x10000000, data-base: 0x12794974
ROM: 12.2(44r)SG10
Darkside Revision 0, Jawa Revision 11, Tatooine Revision 140, Forerunner Revision 1.78
switch uptime is 2 days, 5 hours, 6 minutes
System returned to ROM by power-on
System restarted at 03:39:43 MET Sun Feb 16 2014
System image file is "bootflash:cat4500e-entservicesk9-mz.122-53.SG2.bin"
This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
export@cisco.com.
cisco WS-C4900M (MPC8548) processor (revision 2) with 1048576K bytes of memory.
Processor board ID JAE154709HC
MPC8548 CPU at 1.33GHz, Cisco Catalyst 4900M
Last reset from PowerUp
2 Virtual Ethernet interfaces
16 Gigabit Ethernet interfaces
16 Ten Gigabit Ethernet interfaces
511K bytes of non-volatile configuration memory.
Configuration register is 0x2101
switch#dir
Directory of bootflash:/
6 -rw- 25646261 Dec 1 2013 15:22:16 +01:00 cat4500e-entservicesk9-mz.122-53.SG2.bin
131436544 bytes total (98320384 bytes free)
switch#
02-18-2014 02:44 PM
System returned to ROM by power-on
System restarted at 03:39:43 MET Sun Feb 16 2014
According to your "sh version", your chassis went down/up because of power.
The output of your "dir" command shows no crashinfo file in bootflash, which also suggests that the chassis did not "crash".
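For anyone who wants to run the same check against saved "sh version" output, here is a minimal Python sketch. The function names are hypothetical, and it only recognizes the "power-on" string that appears in this thread; any other reason is returned as-is for manual review.

```python
import re


def reload_reason(show_version):
    """Return the text after 'System returned to ROM by', or None if the line is absent."""
    match = re.search(r"^System returned to ROM by (.+)$", show_version, re.MULTILINE)
    return match.group(1).strip() if match else None


def was_power_cycle(show_version):
    """True only when the reported reason is exactly 'power-on' (case-insensitive)."""
    reason = reload_reason(show_version)
    return reason is not None and reason.lower() == "power-on"
```

Calling was_power_cycle() on the output posted above would classify that restart as a power event rather than a software crash.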
02-21-2014 04:10 AM
You're absolutely right. I didn't fully grasp the meaning of "by power-on".
So there's not really a problem, but I'll have to check with our data center people to find the reason for this power cut.
Thanks, Beat