Serious C3750 memory leaking problems

vladakoci
Level 1

This is fyi.

We have serious memory leak problems on our C3750 stacks. We have hundreds of such stacks, but the problem appears only on those with many features enabled, such as many VLANs, many subnets, HSRP, STP root bridge and so on.

The first symptom is that we lose SSH, and an error message like this can be found in the log:

Feb 13 13:36:29.336: %AAA-3-ACCT_LOW_MEM_UID_FAIL: AAA unable to create UID for incoming calls due to insufficient processor memory

Then we lose telnet (we do not normally use telnet, but it is enabled on those stacks).

Then we are no longer able to log in through the console, and we get errors like this on the console:

%% Low on memory; try again later

And then the switch loses its L3 and L2 functionality and needs to be restarted.

The whole process takes some time, about two weeks. It develops slowly; it is not a sudden strike.
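Since the problem builds up slowly, it can help to watch collected logs for the two messages quoted above before access is lost completely. The following is only a minimal sketch in Python, assuming the logs are already available as plain text (for example from a syslog server). The function name and the sample text are made up for illustration, and the %LINK-3-UPDOWN line is just filler to show a non-matching line.

```python
import re

# Patterns copied from the two messages quoted in this post.
LOW_MEMORY_PATTERNS = [
    re.compile(r"%AAA-3-ACCT_LOW_MEM_UID_FAIL"),
    re.compile(r"%% Low on memory; try again later"),
]


def find_low_memory_lines(log_text):
    """Return the log lines that contain either quoted low-memory message."""
    hits = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in LOW_MEMORY_PATTERNS):
            hits.append(line.strip())
    return hits


# Illustrative sample only; the middle line is filler that should not match.
sample = """Feb 13 13:36:29.336: %AAA-3-ACCT_LOW_MEM_UID_FAIL: AAA unable to create UID for incoming calls due to insufficient processor memory
Feb 13 13:40:00.000: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/1, changed state to up
%% Low on memory; try again later"""

for hit in find_low_memory_lines(sample):
    print(hit)
```

A match on the AAA message would be the earliest warning described here, so that is the point at which a planned reboot or downgrade is still possible over SSH.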

We reported this to Cisco about a month ago, and they tried to match it to these bugs:

CSCtt96255

CSCuc03649

but it looks like it is a new bug.

We tried various IOS v15 versions, but that did not help. We provided a lot of information from running switches and the Cisco development team is involved, but the root cause is not known yet. Cisco's advice was to reboot the switch regularly.

Internally we decided to downgrade IOS on one of these switches. We went to 122-58.SE2 on 23 Feb and have not had any issues since. Of course there is no guarantee that we will not, but so far we are happy and are going to downgrade the other ones as well.


Hello,

Is there any update on this case? We did an update to the latest IOS v15 from v12 and now we have exactly the same issue.

Do we need to downgrade, or is there a real solution available? I cannot believe Cisco hasn't provided a solution yet, and we don't want to downgrade if it is not necessary.

Tnx!

Unless you have an operational feature requirement to run 15.0, avoid this version.

I would recommend using 12.2(55)SE7.

If you have 802.1x, avoid 15.0 at all costs.

Sent from Cisco Technical Support Nintendo App

Sad to see you are still dealing with this type of issue, as this has been going on in the 3750 since early 12.x code. We had this issue with something like 12.2.35 SE about 5 years ago.

Hi Glen,

Just want to let you (and anyone else reading this thread) know that I've started rolling BACK my fleet of 3560 and 3750 from 15.0(2)SE2 to 12.2(55)SE7.

I'm hitting multiple bugs in this version with our implementation of 802.1x. 

Cisco has been working on the TAC case since February and has not given us a clear answer on what the root cause is or how to remedy it.

It definitely depends on the number of MAC addresses: one of our stacks never had a problem until we added three more members and connected more devices, and then it hit the issue very quickly. We do not use dot1x, but one of the processes Cisco suspected, and maybe still suspects, is the HULC DOT1X Process.

We took our own internal measures and downgraded a couple of the switches we had the problem with from v15 to 122-58.SE2, and we have not had any trouble since. Generally speaking, if we experience this trouble on a C3750 stack with v15, we collect the info from it and send it to Cisco, but immediately downgrade it to 122-58.SE2, so honestly Cisco does not have much time to dig into it more deeply. They tried to simulate it in their lab but were not able to reproduce it. I understand that, as the issue surely develops only under very specific conditions in our environment: we have hundreds of similar systems and have hit the issue on only 6 of them so far.

We have reported several other issues with v15 on the C3750 to Cisco, some cosmetic, some more serious. I think v12 is an older software train for which many bugs have been reported and fixed, so it is a very mature version of IOS. The v15 versions, on the other hand, are younger and will follow new business demands with new features and enhancements. But for the time being we are fine with what is available in v12, so we can use it.

      

The fix below has fixed all our problems so far:
 
Apparently, in IOS there is an "Auth Manager" that can monitor all sessions in a switch. Starting from the 15.0 IOS stream this feature is enabled by default. As the bug describes:
"Auth Manager continues to hold more memory in Processor Pool locking out access to the 3750 stack unless switch is rebooted"

To prevent this leak of processor memory, you have to disable the session monitoring by issuing the following command:
 
no macro auto monitor

Memory and CPU have been very stable so far.
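One way to check whether a change like this really stopped the leak is to record free processor memory at intervals and look at the trend. The sketch below is an illustration only, in Python, and assumes you already have the readings as numbers (for example noted by hand from the switch's memory statistics output). How the samples are collected is left out, and the sample values are invented for the example.

```python
def leak_rate(samples):
    """Least-squares slope of free memory over time.

    samples: list of (hours_since_start, free_bytes) pairs.
    Returns bytes per hour; a negative value means free memory is shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def hours_until_exhausted(samples):
    """Extrapolate linearly from the last sample.

    Returns None when free memory is flat or growing.
    """
    rate = leak_rate(samples)
    if rate >= 0:
        return None
    last_free = samples[-1][1]
    return last_free / -rate


# Invented readings: one per day, free memory dropping steadily.
leaking = [(0, 40_000_000), (24, 37_600_000), (48, 35_200_000)]
# Invented readings: free memory holding steady.
stable = [(0, 40_000_000), (24, 40_000_000), (48, 40_000_000)]

print(leak_rate(leaking), hours_until_exhausted(leaking))
print(leak_rate(stable), hours_until_exhausted(stable))
```

A straight-line fit is a simplification. A real leak may not be linear, so the estimate is only a rough indication of how long a stack has before it needs attention.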

Thanks arseus001, the scenario you provided solved my trouble with C3750X memory leakage.

mopepwilliams
Level 1

I had memory leaks and high CPU issues with both v15 and 12.2.58 on our 3750/3750X stacks. Followed Leo's recommendation (12.2.55) a couple of months back and have been solid since.

'no macro auto monitor' did not help in our case.

We have the latest news from Cisco TAC:


---------

I am writing to update you on my findings. Yesterday I worked on another memory leak on 3750 switches and found further bugs related to memory leaks in the v15 software for this platform:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCud60602

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCuf32893

Both of them have their root cause in the bugs below (the DE confirmed in other SRs that they are duplicates of these):

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCub85948

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCue92705

 

When we add the previously found bugs CSCuc03649 and CSCtt96255, I believe the risk of a memory leak on 15.0.2-SE is quite high. The strange thing is that these bugs are recorded as found in v15 for the 3750 but are not fixed in those versions (they are fixed only in 15.3 and 15.2, which are not available for the 3750).

I am asking the development team why the fix is not present in the 15.0.2-SE software train.


Vlad,

I stand by my recommendation.  Avoid using 12.2(58)SE, 15.0(1)SE and 15.0(2)SE.  Downgrade to 12.2(55)SE7 instead.

If you really, really have an operational feature requirement in 15.0, then consider 15.0(2)SE2.

WARNING:  If you plan to use 15.0(2)SE2 on 3560 and 3750, make sure you are not using Dot1x.  If you are using Dot1x, then 15.0(2)SE2 will not be a good version for you.
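This thread writes versions in two forms, the image-name form (122-58.SE2) and the release form (12.2(55)SE7), which makes the advice harder to compare. As a rough sketch only, the Python below converts the image-name form to the release form and looks a version up in a small table built from the recommendation in this post. The table reflects one poster's opinion at this point in the thread, not an official Cisco position, and the function and table names are invented for the example.

```python
import re

# Image-name form as used earlier in the thread, e.g. "122-58.SE2".
_IMAGE_FORM = re.compile(r"^(\d{2})(\d)-(\d+)\.([A-Za-z]+\d*)$")


def normalize(version):
    """Convert '122-58.SE2' to '12.2(58)SE2'; leave other strings unchanged."""
    text = version.strip()
    match = _IMAGE_FORM.match(text)
    if not match:
        return text
    major, minor, maintenance, train = match.groups()
    return f"{major}.{minor}({maintenance}){train}"


# Taken from the recommendation in this post only. Exact-match on purpose:
# whether "avoid 12.2(58)SE" also covers rebuilds such as SE2 is not stated,
# so rebuilds are not inferred here.
THREAD_ADVICE = {
    "12.2(58)SE": "avoid",
    "15.0(1)SE": "avoid",
    "15.0(2)SE": "avoid",
    "12.2(55)SE7": "recommended",
    "15.0(2)SE2": "only if 15.0 features are required and Dot1x is not used",
}


def advice(version):
    """Return this post's advice for a version, in either notation."""
    return THREAD_ADVICE.get(normalize(version), "not mentioned in this post")


for candidate in ["122-55.SE7", "150-2.SE2", "15.0(1)SE", "122-58.SE2"]:
    print(candidate, "->", normalize(candidate), "->", advice(candidate))
```

Later replies in the thread report on newer releases, so a table like this would need updating as the discussion moves on.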

Cisco informs us that the bugs CSCud60602, CSCuf32893, CSCub85948 and CSCue92705 are fixed in the most recent version, 15.0(2)SE3, released on 5 June.

We will give it a try and will upgrade one of the problematic switches to this version.


BREAKING NEWS:  DO NOT, under any circumstances, upgrade to 15.0(2)SE3.  If you have TACACS configured and your switch boots up to this level, you will NOT be able to access your switch via Telnet, SSH or console.

The only way is to "break-in" via the "Mode" button.

There are two methods to take back control of your switch.  And they are:

1.  Easy Method:

  • See if you can find a spare USB thumb drive (not a portable HDD); size can be between 128 MB and 16 GB;
  • Format with FAT16;
  • Copy a lower IOS version (BIN file extension) onto the USB stick;
  • Insert it into your 2960S;
  • Boot into ROMmon (hold down the "Mode" button);
  • Enter command "flash_init";
  • In ROMmon, enter the command "boot usbflash0:OLD_IOS.BIN"
  • Once you've booted up, DOWNGRADE the IOS.

2.  Slightly Easy Method

  • Boot into ROMmon;
  • Enter command "flash_init";
  • Rename "config.text" to something else, like BLAH.text
  • Enter command "boot";
  • When the switch has booted up with no config, in enable mode, enter this:  copy flash:BLAH.text run
  • DOWNGRADE the IOS.

Hope this helps.

I'm seeing this and just upgraded to 15.0(2)SE4. Looks like the bug is still around.      


It shouldn't be re-appearing.  I've got fleets of 15.0(2)SE4 and I can confirm the issue has been fixed.
