
C3750 crash - every few days

g.leonard
Level 1

Hi

The 2nd switch in our 2-switch stack keeps crashing every few days or so. Please see the latest crash file below:

Cisco IOS Software, C3750 Software (C3750-IPBASE-M), Version 12.2(25)SEE2, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2006 by Cisco Systems, Inc.
Compiled Fri 28-Jul-06 08:46 by yenanh

Debug Exception (Could be NULL pointer dereference) Exception (0x2000)!

SRR0 = 0x0056B958  SRR1 = 0x00029210  SRR2 = 0x0056A1E8  SRR3 = 0x00021000
ESR = 0x00000000  DEAR = 0x00000000  TSR = 0x8C000000  DBSR = 0x10000000

CPU Register Context:
Vector = 0x00002000  PC = 0x00906338  MSR = 0x00029210  CR = 0x30000005
LR = 0x00906260  CTR = 0x00000000  XER = 0x80000047
R0 = 0x00000000  R1 = 0x026EB788  R2 = 0x00000000  R3 = 0x00000000
R4 = 0xFFFFFFFE  R5 = 0x00000000  R6 = 0x026EB760  R7 = 0x00000000
R8 = 0x00029210  R9 = 0x01660000  R10 = 0x0197F080  R11 = 0x00000000
R12 = 0xA0000000  R13 = 0x00110000  R14 = 0x00578E94  R15 = 0x00000000
R16 = 0x00000000  R17 = 0x00000000  R18 = 0x00000000  R19 = 0x00000000
R20 = 0x00000000  R21 = 0x00000000  R22 = 0x00000000  R23 = 0x026E9F60
R24 = 0x00000000  R25 = 0x00000001  R26 = 0x004DD7D8  R27 = 0x00000000
R28 = 0x02635FB0  R29 = 0x00000030  R30 = 0x00000000  R31 = 0x0000000F

Stack trace:
PC = 0x00906338, SP = 0x026EB788
Frame 00: SP = 0x026EB798    PC = 0x00906238
Frame 01: SP = 0x026EB7A0    PC = 0x005747F4
Frame 02: SP = 0x026EB7D8    PC = 0x00578C98
Frame 03: SP = 0x026EB7F8    PC = 0x00578F48
Frame 04: SP = 0x026EB800    PC = 0x00908064
Frame 05: SP = 0x00000000    PC = 0x008FE62C

Can anybody tell me what is causing this?
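For reference, the crash details above came from the crashinfo file the switch wrote when it died. I pulled it from the master with something like the following (the directory and file name are from memory, so check what your switch actually created):

dir flash2:crashinfo
more flash2:crashinfo/crashinfo_1
show version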

21 Replies

"Besides, the OP did mention that he can't reboot the switch due to the  importance of the clients so what are the chances of the user upgrading  the IOS?"

Well, if you are going to replace the switch, you would still have to remove it from the stack. How about removing the switch and upgrading the IOS? Takes much less time than having to wait for a replacement.

I agree with this recommendation.

Thanks guys for your input.

The switch crashed again on Saturday.

I have been monitoring it since then and have noticed that the Hulc LED Process is constantly being allocated memory without freeing any of it:

PID     TTY     Allocated      Freed    Holding

110     0         769811656    192       6904

The amount of freed memory has remained at 192 since I started monitoring the switch after its crash on Saturday.

I'm guessing this is a memory leak, as I would expect the Freed counter to rise if the process is constantly being allocated memory while the amount it is holding stays the same.

Just checked again and it's increased to:

PID     TTY     Allocated      Freed    Holding

110     0         772866152    192       6904
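For anyone else keeping an eye on this, the checks I've been repeating are roughly the following, nothing clever, just re-running them and comparing the Allocated/Freed/Holding counters over time:

show processes memory sorted
show processes memory | include Hulc
show memory statistics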

Well,

This looks like CSCsj29588

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsj29588&from=summary

Memory leak on Hulc LED Process and RedEarth Rx Mana


A Catalyst 3560 or 3750 might show a memory leak where memory is being consumed by the Hulc LED Process and RedEarth Rx Manager processes.

The memory leak has been resolved via an internally found bug.

An upgrade to version 12.2(35)SE1 or later will resolve the memory leak.

Now, 12.2(25)SEE2 was a release known to be affected by this. It is stated that 12.2(25)SEE3 does not see the issue, so there is a good chance that 12.2(25)SEE4 might not be affected either, even though it is listed as affected. I would still look into the possibility of upgrading to a newer release.
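If an upgrade becomes an option, the usual way to do it on a 3750 stack is to let archive download-sw push the tar image to all members in one step; roughly like this, where the TFTP server address and image name are only placeholders:

archive download-sw /overwrite /reload tftp://192.0.2.10/c3750-ipbase-tar.122-35.SE1.tar

The /overwrite option replaces the existing image and /reload restarts the stack once the download has verified, so it is best run inside a maintenance window.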

Looks like we have found the culprit! :-D

OK, the amount of memory the process is holding has now started to increase as well:

PID     TTY     Allocated      Freed          Holding      Process
0       0       48433856       13958000       30762100     *Init*
46      0       911048         1952           782888       Stack Mgr Notifi
110     0       786444352      192            667928       Hulc LED Process
41      0       136772520      241294648      587960       RedEarth Rx Mana

These are the top 4 memory users from the "sh proc mem sort" output. RedEarth Rx is in there too. I've seen this before but assumed it wasn't relevant, as I had previously read it was fixed in 12.2(25)SEE2.

I fear the switch is going to crash again at some point today, causing an outage for users. I'm hoping to attempt an upgrade to 12.2(25)SEE4, as I should be able to copy the image over from the master switch.
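The rough plan, assuming the .bin image already on the master is good (the file name below is illustrative), is to copy it across to the second member's flash and repoint the boot statement; I still need to double-check the stack-aware boot syntax against the command reference before the window:

copy flash:c3750-ipbase-mz.122-25.SEE4.bin flash2:
configure terminal
 boot system switch all flash:/c3750-ipbase-mz.122-25.SEE4.bin
end
write memory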

I've noticed we are running 12.2(25)SEB4 on another floor with the same hardware (C3750G-48PS) in a two-switch stack. Funnily enough that switch has never crashed, and this version does not appear in the list of affected releases for the CSCsj29588 bug. (The switch was deployed at the same time as the one in question.)

We do run 12.2(35)SE5 on some newer C3750E switches in another building.

I have been given an outage period tonight, so I will move both members of the problem switch stack to SEB4, with a view to taking the rest of the building to 12.2(35) or later once I have had time to test in a lab.

OK, changed the image to 12.2(25)SEB4 to be in line with the other floor switches.

This didn't work, so SEB4 should also appear in the list of affected releases for the CSCsj29588 bug.

Went to 12.2(35)SE5, as this has run on two switches in one of our newer buildings without problems for a couple of years.

This did sort the memory issue and the switch ran fine for a week with no evidence of memory leakage.

However, one morning the switch decided it didn't like some of its ports and they just stopped working. I ran various diagnostics on the switch and checked that the ports hadn't been administratively disabled through port security, spanning tree or anything else, and ruled out a cabling issue by plugging a laptop directly into the switch. A restart of the switch solved the problem, but my confidence in the switch had diminished, so it was swapped out last week.
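For the record, the checks I ran on the dead ports were along these lines (the interface shown is just an example; I went through each affected port):

show interfaces status
show interfaces status err-disabled
show port-security interface gigabitethernet2/0/1
show spanning-tree interface gigabitethernet2/0/1 detail
show logging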

I'm guessing there was perhaps always an underlying hardware/interface issue which was tripping up the Hulc LED Process and that newer versions of the IOS just deal with the issue better. So I think several people are correct in different ways in this thread. Thanks to all who have helped.
