4900m crash/reboot issue

enpingado · ‎06-12-2013

I have a 4900M running IOS 12.2(53)SG2.

Over the past 6 months, there have been about 5 instances where the switch has rebooted itself. Most of them occurred within the last few weeks, so it seems to be getting worse.

Dumping the log data showed this at the end:

Jawa Crash Data:

Interrupt Mask: 0xE100

Interrupt: 0x2000

Forerunner CRC Error

Is this telling me I may have a hardware failure, such as bad RAM?

Reza Sharifi · ‎06-12-2013

You should open a ticket with TAC and send them the crash file and any other info you have. It may be a memory issue.

HTH

Leo Laohoo · ‎06-12-2013

Over the past 6 months, there have been about 5 instances where the switch has rebooted itself.

Sounds like an IOS issue.

Can you attach/post the crashinfo files?

InayathUlla Sharieff · ‎06-12-2013

Hi,

Please send me the output of show tech, or of show ver and show platform crashdump.

Regards

Inayath

enpingado · ‎06-13-2013

Here is the crash dump. Sorry, I only have it as scanned images.

enpingado · ‎06-13-2013

I have a report from a second crash.

It is similar to the previous dump, with a few exceptions:

Machine Check Interrupt Count: 1c9910b

L1 Instruction Cache Parity Errors: 0

L1 Instruction Cache Parity Errors (CPU30): 0

L1 Data Cache Parity Errors: 1c9910b

Jawa Crash Data:

Interrupt Mask: 0xe100

Interrupt: 0x1000

Is the L1 info related to the CPU's L1 cache? It looks like it's a hardware issue, not a software issue.

I found this in the 12.2(54)SG release notes:

Parity errors in the CPU's cache cause IOS to crash with a crashdump file like the following:
Switch# show platform crashdump
VECTOR 0
*** CRASH DUMP ***
02/09/2009 10:10:30
Last crash: 02/09/2009 10:10:30
Build: 12.2(20090206:234053) IPBASE
buildversion addr: 13115584
MCSR: 40000000 <--- non-zero value!
.
The key pieces of data are "VECTOR 0" and a MCSR value of 40000000, 20000000, or 10000000.
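
The two signatures named above can be checked mechanically. The following is a minimal Python sketch, not a Cisco tool: the function name and the regular expressions are my own assumptions about how the "show platform crashdump" text is laid out, based only on the excerpt quoted here.

```python
import re

# MCSR values the quoted release note lists as signatures of a cache parity crash.
PARITY_MCSR_VALUES = {"40000000", "20000000", "10000000"}


def looks_like_cache_parity_crash(crashdump_text):
    """Return True if the text contains 'VECTOR 0' and one of the listed MCSR values.

    Hypothetical helper: the line layout is assumed from the release-note excerpt.
    """
    has_vector_0 = re.search(
        r"^\s*VECTOR\s+0\s*$", crashdump_text, re.MULTILINE
    ) is not None
    mcsr = re.search(r"^\s*MCSR:\s*([0-9A-Fa-f]+)", crashdump_text, re.MULTILINE)
    has_parity_mcsr = mcsr is not None and mcsr.group(1).lower() in PARITY_MCSR_VALUES
    return has_vector_0 and has_parity_mcsr


# Sample lines copied from the release-note excerpt.
sample = """VECTOR 0
*** CRASH DUMP ***
MCSR: 40000000 <--- non-zero value!
"""
print(looks_like_cache_parity_crash(sample))
```

A dump that lacks either signature returns False, which only means this particular bug signature is absent, not that the hardware is healthy.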

Workaround: Enter the show platform cpu cache command to launch an IOS algorithm that
detects and recovers from parity errors in the CPU's cache. You will obtain a running count of the
number of CPU cache parity errors that have been successfully detected and corrected on a running
system:

Switch# show platform cpu cache
L1 Instruction Cache: ENABLED
L1 Data Cache: ENABLED
L2 Cache: ENABLED
Machine Check Interrupts: 5
L1 Instruction Cache Parity Errors: 3
L1 Instruction Cache Parity Errors (CPU30): 1
L1 Data Cache Parity Errors: 1

CSCsx15372
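
To keep an eye on those counters over time, the output can be parsed with a few lines of Python. This is only a sketch under assumptions: the helper names are hypothetical, and the values are kept as strings because the thread does not say whether the counters are printed in decimal or hex.

```python
def parse_cpu_cache_counters(output):
    """Map each 'label: value' line to its value, kept as a string."""
    counters = {}
    for line in output.splitlines():
        label, sep, value = line.rpartition(":")
        if sep and value.strip():
            counters[label.strip()] = value.strip()
    return counters


def nonzero_parity_counters(counters):
    """Return the parity-error counters whose value is not all zeros."""
    return {
        label: value
        for label, value in counters.items()
        if "Parity Errors" in label and value.strip("0") != ""
    }


# Sample output copied from the release-note excerpt.
sample = """L1 Instruction Cache: ENABLED
L1 Data Cache: ENABLED
L2 Cache: ENABLED
Machine Check Interrupts: 5
L1 Instruction Cache Parity Errors: 3
L1 Instruction Cache Parity Errors (CPU30): 1
L1 Data Cache Parity Errors: 1
"""
for label, value in nonzero_parity_counters(parse_cpu_cache_counters(sample)).items():
    print(f"{label}: {value}")
```

Comparing two snapshots taken some hours apart would show whether the counts are still climbing.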

I get similar output, with non-zero parity error counts. Is the workaround saying that running the command "show platform cpu cache" will fix the errors? Or is that only a temporary measure related to the IOS?

I am wondering if I should update the IOS to solve this, or if the issue is really a hardware problem.

Thanks

InayathUlla Sharieff · ‎06-13-2013

Hi,

I just finished analyzing your data, and it points to a hardware issue. Kindly go ahead and raise an RMA for it.

By any chance, do you see the following message in the logs?

%C4K_L3HWFORWARDING-4-PROFILEIDMAPTABLEPARITYERROR: Parity error detected and corrected at profileIdMapTable
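
If the logs are saved to a text file, a short script can search them for that message. The sketch below is an illustration only: it assumes the usual %FACILITY-SEVERITY-MNEMONIC: tag layout of IOS syslog lines, the function name is hypothetical, and the sample log lines are made up for the example apart from the message text quoted above.

```python
import re

# General shape of an IOS syslog tag: %FACILITY-SEVERITY-MNEMONIC:
SYSLOG_TAG = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")


def find_messages(log_text, mnemonic):
    """Return the log lines whose syslog mnemonic equals the given one."""
    hits = []
    for line in log_text.splitlines():
        match = SYSLOG_TAG.search(line)
        if match and match.group(3) == mnemonic:
            hits.append(line)
    return hits


# Illustrative log text; only the parity message is taken from this thread.
sample_log = """%SYS-5-CONFIG_I: Configured from console by console
%C4K_L3HWFORWARDING-4-PROFILEIDMAPTABLEPARITYERROR: Parity error detected and corrected at profileIdMapTable
"""
for line in find_messages(sample_log, "PROFILEIDMAPTABLEPARITYERROR"):
    print(line)
```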

HTH

Regards

Inayath

*Please rate if this info is helpful.

enpingado · ‎08-12-2013

So I ended up replacing the 4900M that had crashed at least 3 times with one from the lab that had never reported this issue. Now the replacement has also crashed in the same manner, rebooting itself.

I now find it hard to believe it is a hardware issue, given that it has happened on different units.

Could this possibly be an IOS bug? I am thinking of upgrading to 15, but it is next to impossible to reproduce the problem.

Any ideas?

mgalazka · ‎08-12-2013

I have seen software parity errors cause 4900Ms to reload, and I have also seen a code upgrade help. Unless you're set on moving to the 15 train, you may find that some of the later versions still in your train, like 12.2(53)SG8, are more stable. I am typically a bit timid when it comes to the latest releases.

In short, parity issues are often software rather than hardware. If that is the case, notably when the release notes and Bug Toolkit indicate as much, a software upgrade may be beneficial.

Good luck!

Matt

Beat.Traber · ‎02-17-2014

Hi

Did you ever find a solution for this?

I seem to have a similar problem: a 4900M rebooting out of the blue, no syslog entries, and a Forerunner CRC Error in the crash dump.

Beat

Leo Laohoo · ‎02-17-2014

Post the output of the following commands:

1. sh version

2. dir

Beat.Traber · ‎02-17-2014

switch#sho vers
Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500e-ENTSERVICESK9-M), Version 12.2(53)SG2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Tue 16-Mar-10 04:50 by prod_rel_team
Image text-base: 0x10000000, data-base: 0x12794974

ROM: 12.2(44r)SG10
Darkside Revision 0, Jawa Revision 11, Tatooine Revision 140, Forerunner Revision 1.78

switch uptime is 2 days, 5 hours, 6 minutes
System returned to ROM by power-on
System restarted at 03:39:43 MET Sun Feb 16 2014
System image file is "bootflash:cat4500e-entservicesk9-mz.122-53.SG2.bin"

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

cisco WS-C4900M (MPC8548) processor (revision 2) with 1048576K bytes of memory.
Processor board ID JAE154709HC
MPC8548 CPU at 1.33GHz, Cisco Catalyst 4900M
Last reset from PowerUp
2 Virtual Ethernet interfaces
16 Gigabit Ethernet interfaces
16 Ten Gigabit Ethernet interfaces
511K bytes of non-volatile configuration memory.

Configuration register is 0x2101

switch#dir
Directory of bootflash:/

6 -rw- 25646261 Dec 1 2013 15:22:16 +01:00 cat4500e-entservicesk9-mz.122-53.SG2.bin

131436544 bytes total (98320384 bytes free)
switch#

Leo Laohoo · ‎02-18-2014

System returned to ROM by power-on

System restarted at 03:39:43 MET Sun Feb 16 2014

According to your "sh version", your chassis went down and came back up because of power.

The output of your "dir" command also indicates that the chassis did not crash.
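
For anyone checking several switches, the reload reason can be pulled out of saved "sh version" output with a small script. This is a rough Python sketch: the function names are hypothetical, and only the "power-on" string seen in this thread is treated as a power event; any other reason is simply reported as-is.

```python
import re


def reload_reason(show_version_text):
    """Return the text after 'System returned to ROM by', or None if absent."""
    match = re.search(
        r"^System returned to ROM by (.+?)\s*$", show_version_text, re.MULTILINE
    )
    return match.group(1) if match else None


def is_power_event(reason):
    """True only for the 'power-on' reason shown in this thread."""
    return reason is not None and reason.lower() == "power-on"


# Lines copied from the "sh version" output posted above.
sample = """switch uptime is 2 days, 5 hours, 6 minutes
System returned to ROM by power-on
System restarted at 03:39:43 MET Sun Feb 16 2014
"""
reason = reload_reason(sample)
print(reason, is_power_event(reason))
```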

Beat.Traber · ‎02-21-2014

You're absolutely right. I didn't fully grasp the meaning of "by power-on".

So there's not really a problem, but I'll have to check with our data center people about the reason for this power cut.

Thanks, Beat