02-22-2013 08:59 AM - edited 03-07-2019 11:53 AM
Hi,
I updated a switch (C3560G-48PS-S) from IOS 12.2(58)SE2 to IOS 15.0(2)SE1. Some time after the upgrade, the switch started sending memory allocation error messages to our syslog server at regular intervals (every 30 seconds).
After that, I couldn't connect to the switch over SSH anymore. It wasn't even possible to access the CLI via the console cable (error message: "Low on memory; try again later"). After a reboot, the switch ran fine for a few hours, but then the error appeared again. Switching itself still seems to work fine, since no users have complained about network issues.
Important: I installed the same IOS version on another switch, a C2960-24PC-L, and the memory allocation errors appeared there as well after a few hours. I thought the issue might be solved in the newest IOS release for that device, but even with IOS 15.0(2)SE2 the C2960-24PC-L hits the memory allocation error again. Only the traceback is slightly different.
Does anyone else have the same issue with IOS 15.0(2)? Could someone give me a hint on how to solve it?
Thanks.
Error Message of C3560G-48PS-S with IOS 15.0(2)SE1
031513: Feb 22 11:00:26.848: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x2C13A88, alignment 0
Pool: Processor Free: 693180 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "CDP Protocol", ipl= 0, pid= 205
-Traceback= 1FBAAF4z 2BF7F08z 2BFEAE4z 2C13A8Cz 1EB4DF4z 1EB8A0Cz 1EB8B00z 1E82974z 1A2ABF0z 1A2F1B0z 12EA4FCz 12EE85Cz 19EDC84z 19E83D8z
Error Message of C2960-24PC-L with IOS 15.0(2)SE1
113163: Feb 21 08:53:33.218: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x1441F44, alignment 0
Pool: Processor Free: 421908 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "CDP Protocol", ipl= 0, pid= 179
-Traceback= E08B40z 14263C4z 142CFA0z 1441F48z D02E40z D06A58z D06B4Cz CD09C0z 880BF0z 87E4C4z 87E600z 8845F0z C53660z C5397Cz C4FB20z 14A10ACz
Error Message of C2960-24PC-L with IOS 15.0(2)SE2
011537: Feb 22 17:01:49.415: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed from 0x1442328, alignment 0
Pool: Processor Free: 329472 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "CDP Protocol", ipl= 0, pid= 179
-Traceback= E08F18z 14267A8z 142D384z 144232Cz D0321Cz D06E34z D06F28z CD0D
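If your syslog server collects a lot of these messages, it can help to pull the key fields out of each one. Below is a minimal Python sketch that does that, assuming the message layout shown above (other IOS releases may format it slightly differently); `parse_mallocfails` and `MallocFail` are just illustrative names. In all three messages the pool still had several hundred KB free while a 65536-byte request failed, which is why IOS reports the cause as memory fragmentation rather than an exhausted pool: there is enough free memory in total, but apparently no single free block large enough.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one %SYS-2-MALLOCFAIL message in the layout quoted in this thread
# (assumed; adjust if your IOS release formats the lines differently).
MALLOCFAIL_RE = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (?P<size>\d+) bytes failed[^\n]*\n"
    r"\s*Pool: (?P<pool>\S+)\s+Free: (?P<free>\d+)\s+Cause: (?P<cause>[^\n]*?)\s*\n"
    r"\s*Alternate Pool:[^\n]*\n"
    r"\s*-Process= \"(?P<process>[^\"]*)\""
)


@dataclass
class MallocFail:
    size: int      # bytes requested
    pool: str      # pool the request was made from
    free: int      # total free bytes IOS reported in that pool
    cause: str     # cause string as reported by IOS
    process: str   # process that made the request

    @property
    def enough_free_in_total(self) -> bool:
        # True when the pool had more free memory overall than was requested,
        # which points to fragmentation rather than a completely full pool.
        return self.free >= self.size


def parse_mallocfails(log_text: str) -> List[MallocFail]:
    """Return one MallocFail per %SYS-2-MALLOCFAIL message found in log_text."""
    return [
        MallocFail(int(m["size"]), m["pool"], int(m["free"]), m["cause"], m["process"])
        for m in MALLOCFAIL_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "113163: Feb 21 08:53:33.218: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes "
        "failed from 0x1441F44, alignment 0\n"
        "Pool: Processor Free: 421908 Cause: Memory fragmentation\n"
        "Alternate Pool: None Free: 0 Cause: No Alternate pool\n"
        '-Process= "CDP Protocol", ipl= 0, pid= 179\n'
    )
    for event in parse_mallocfails(sample):
        print(event.process, event.size, event.free, event.enough_free_in_total)
```

Pointing it at an exported syslog file gives you one record per failure, which makes it easy to see which process is failing and how much free memory was left each time.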
02-22-2013 12:45 PM
02-22-2013 01:19 PM
Unless you critically need features found in these two versions, try 12.2(55)SE6 or SE7.
02-25-2013 11:48 AM
I don't really need any of the additional features of IOS 15.x, but the latest 12.x release was published back in July 2011. Due to my company's security policy, I have to keep the operating systems of our switches up to date. Judging by the download section, Cisco seems to publish only newer 15.x releases, and 12.x won't be supported anymore.
In the meantime, I reverted the C3560G-48PS-S to IOS 12.2(58)SE2, and it works fine again. I'll try to isolate the issue on the C2960-24PC-L, which still runs the newest IOS. Maybe I'll manage to capture the output of the commands Nick suggested after all.
02-25-2013 11:16 AM
Stefan,
Let's focus on one of these, as the issue on all of them is likely the same. Could you please upload the following outputs from one of these devices when it is seeing these memory allocation failures:
-show version
-show mem all totals
-show mem sum
-show proc mem
What version was previously running on these devices? Based on your description, this issue is only seen AFTER the upgrade, correct?
-Nick
02-25-2013 11:40 AM
Nick,
Correct, the issue happens after the upgrade. I have now reverted the C3560G-48PS-S to IOS 12.2(58)SE2, and it works fine again. So it definitely seems that the memory allocation errors are caused by the IOS upgrade. The C2960-24PC-L still runs IOS 15.0(2)SE2 and produces the errors mentioned above.
Unfortunately, I'm not able to access the switch remotely to run the commands, and I can't even connect via the console: when I try the console cable, the message "Low on memory; try again later" appears. The only way to get to the command line would be to reboot the switch and connect immediately afterwards. Unfortunately, at that point the memory allocation errors don't occur yet, since the switch runs for a few hours without any issues.
I guess the output of the commands after a reboot wouldn't be useful, right?
Thanks. Stefan
02-25-2013 12:09 PM
Nick,
I can provide at least some information, since we save the output of 'show tech-support' weekly. The upgrade happened a few days before the last save, and the switch was already logging memory allocation errors at that time. While looking through the output, I found that the log is also filled with another message:
010225: Feb 22 16:13:50.670: AAA/ATTR(00000000): cannot alloc new sublist
Any idea about that error? Did something change in the authentication handling between IOS 12.x and 15.x?
Stefan
02-25-2013 12:22 PM
Stefan,
Running the commands immediately after a reboot would provide us with a baseline. After the reboot, how many hours does it take before you start seeing memory allocation failures? These low-end switches operate within very tight tolerances with regard to memory usage, and a small leak can result in this type of behavior. Looking at the output you provided, there are two processes that I am concerned with:
PID TTY Allocated Freed Holding Getbufs Retbufs Process
152 0 321956340 318071124 3874320 0 0 Auth Manager
179 0 8375196 3826736 1471724 0 0 CDP Protocol
If we knew that it was, say, 4 hours before the issue occurred after a reboot, and we could capture the commands immediately after a reboot and then every hour after that until the problem occurred, that would help.
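Once hourly captures exist, comparing the Holding column between snapshots shows which process keeps growing. Here is a rough Python sketch, assuming the `show processes memory` column order used in this thread (PID TTY Allocated Freed Holding Getbufs Retbufs Process). Rows are keyed by process name, rows that share a name (for example the two HL2MCM processes) are summed, and the per-hour figure is a simple linear average. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_holding(output: str) -> Dict[str, int]:
    """Map process name -> Holding bytes from 'show processes memory' output."""
    holding: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with seven numeric columns; the name may contain spaces.
        if len(fields) >= 8 and all(f.isdigit() for f in fields[:7]):
            name = " ".join(fields[7:])
            holding[name] = holding.get(name, 0) + int(fields[4])  # 5th column = Holding
    return holding


def holding_growth_per_hour(before: str, after: str, hours: float) -> Dict[str, float]:
    """Average change in Holding bytes per hour between two captures."""
    old, new = parse_holding(before), parse_holding(after)
    return {name: (new[name] - old.get(name, 0)) / hours for name in new}


def top_growers(growth: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """The n processes whose Holding grew fastest."""
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Feed it the baseline captured right after the reboot as `before` and a later hourly capture as `after`; a process that shows up at the top of `top_growers` in every interval is the leak candidate.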
-Nick
02-27-2013 02:02 AM
Nick,
Good idea to have a baseline. I gathered the corresponding output yesterday: I rebooted the C2960-24PC-L again, and as before the switch didn't produce any error messages for a couple of hours. Because the switch can no longer be accessed once the memory allocation errors start, I used a scheduled task to save the outputs to our server every hour. So here we go...
Please find attached all outputs of the requested commands. It took around four and a half hours for the error messages to appear again: the reboot took place at 6:32am and the first error appeared at 11:12am (local time). The attached graphs also show the steady increase of used memory in the I/O and processor pools of that switch. Based on the output of 'show memory process', I also created charts showing the memory usage of 'Auth Manager' and 'CDP Protocol' over time (see PDF).
Please let me know if I shall provide additional information. Thanks a lot for your help.
Stefan
[Attached graphs: C2960-24PC-L]
02-28-2013 04:37 PM
Stefan,
The issue definitely appears to be a leak. If you look at the memory there is a large increase (relatively speaking) in:
PC Total Count Name
0x00D03218 4065960 62 AAA AttrL Sub
In the capture at hour 0 this is holding about 65K; after 6 hours it has risen to 4 MB. As I mentioned before, a small leak can have a large impact on a switch like this because it doesn't have much free memory to begin with. This issue looks very similar to one that we fixed in 15.0(2)SE:
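To put those numbers in perspective, here is the rough arithmetic, treating "about 65K" as 65,000 bytes and assuming the growth is linear (both are approximations):

$$
r \approx \frac{4\,065\,960\ \text{B} - 65\,000\ \text{B}}{6\ \text{h}} = \frac{4\,000\,960\ \text{B}}{6\ \text{h}} \approx 666\,827\ \text{B/h} \approx 0.67\ \text{MB/h}
$$

$$
r \times 24\ \text{h} \approx 16\,003\,840\ \text{B} \approx 16\ \text{MB per day}
$$

On a switch whose Processor pool is only reporting a few hundred KB free in the MALLOCFAIL messages, growth on that scale leaves very little headroom.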
EAP Framework and AAA AttrL Sub Uses All Process Memory
This bug deals with AAA and dot1x authentication. However, this bug is already fixed in your release. We need to get a new bug opened for this issue and attempt to reproduce it in our labs. Whenever it is convenient for you, I would suggest opening a TAC case; you can do that here:
http://tools.cisco.com/ServiceRequestTool/create/launch.do
When you go through that process it will ask you for your CCO user id, and you will have to select a tech and subtech that describe the problem. Please use:
TECH: Router and IOS Architecture
SUBTECH: Memory Allocation Failure
PROBLEM CODE: Software Failure
If you have any problems in this process, let me know.
-Nick
03-01-2013 05:54 AM
Hi Nick,
Thanks for analyzing the outputs and for the information about the memory leak. Unfortunately, I'm not able to open a TAC case myself due to our maintenance contract, but I have already forwarded the information to our Cisco partner with a reference to this forum thread. As soon as I get an update about the issue, I'll let you know. There are probably others out there with the same issue on their switches.
Stefan
03-18-2013 11:18 AM
This also appears to be an issue in 15.0(2)SE2 on a 3750-E. I was able to get this output yesterday:
sh processes memory sorted
Processor Pool Total: 175321384 Used: 163314948 Free: 12006436
I/O Pool Total: 16777216 Used: 12939452 Free: 3837764
Driver te Pool Total: 4194304 Used: 106740 Free: 4087564
PID TTY Allocated Freed Holding Getbufs Retbufs Process
211 0 345172288 149244784 89995048 25380 0 Auth Manager
0 0 119855256 53253192 60774588 0 0 *Init*
0 0 683544756 670127884 6809520 14732139 1472786 *Dead*
93 0 5137024 1504012 2914280 44196 0 Stack Mgr Notifi
214 0 1248266660 331598488 2437228 635292072 0 CDP Protocol
435 0 2170332 214372 1962324 0 0 EIGRP-IPv4
282 0 978660 1476 952616 0 0 IPC LC Message H
0 0 0 0 656760 0 0 *MallocLite*
288 0 549328 7932 535484 0 0 IP RIB Update
343 0 180616076 178111144 439804 0 0 hulc running con
1 0 76233364 75865200 397164 0 0 Chunk Manager
64 0 368236 600 377796 0 0 EEM ED Identity
205 0 297412 10660 295704 0 0 HL2MCM
204 0 294980 8236 294980 0 0 HL2MCM
370 0 265100 0 275260 100548 0 EEM ED Syslog
30 0 290688 0 268584 0 0 IPC Seat RX Cont
384 0 196344 0 203504 0 0 EEM Server
298 0 289752 76232 190556 2268 0 DHCPD Receive
247 0 1056852 432180 169488 55836 0 802.1x switch
415 0 15259740 13604 167108 10152 0 IGMP Input
Now I can no longer SSH or console into the switch. I get login prompts, but neither local nor AAA authentication works, and no TACACS request is sent to ACS. We upgraded three stacks for testing before deployment, and this is the only one that seems to have the issue. It had been up for right about a week. I opened a TAC case a little while ago.
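The pool summary at the top of that output already shows how close the switch was to running out: on this 3750-E roughly 93% of the Processor pool is in use, with Auth Manager alone holding about 90 MB. A small Python sketch that turns those summary lines into percentages, assuming the "<name> Pool Total: / Used: / Free:" layout quoted above (the helper names are made up for illustration):

```python
import re
from typing import Dict, NamedTuple

# One summary line per pool, e.g.
# "Processor Pool Total: 175321384 Used: 163314948 Free: 12006436"
POOL_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+Pool\s+Total:\s*(?P<total>\d+)"
    r"\s+Used:\s*(?P<used>\d+)\s+Free:\s*(?P<free>\d+)",
    re.MULTILINE,
)


class PoolUsage(NamedTuple):
    total: int
    used: int
    free: int

    @property
    def pct_used(self) -> float:
        return 100.0 * self.used / self.total


def pool_usage(output: str) -> Dict[str, PoolUsage]:
    """Map pool name -> Total/Used/Free bytes parsed from the summary lines."""
    return {
        m["name"]: PoolUsage(int(m["total"]), int(m["used"]), int(m["free"]))
        for m in POOL_RE.finditer(output)
    }


if __name__ == "__main__":
    summary = (
        "Processor Pool Total: 175321384 Used: 163314948 Free: 12006436\n"
        "I/O Pool Total: 16777216 Used: 12939452 Free: 3837764\n"
    )
    for name, usage in pool_usage(summary).items():
        print(f"{name}: {usage.pct_used:.1f}% used, {usage.free} bytes free")
```

Run against each hourly capture, the Processor pool percentage gives a simple trend line to compare with the per-process Holding values.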
03-18-2013 02:23 PM
I created a TAC case last week after I got three different types of tracebacks from a 3750G. The TAC engineer finally confirmed on Friday that they found the bug I was hitting.
No ETA as to when an engineering fix is going to appear.
03-19-2013 09:55 AM
Thanks for your replies. I'm glad to hear that I'm not the only one with this issue.
I have opened a TAC case in the meantime as well. Unfortunately, I haven't been able to provide Cisco with the requested outputs yet, as the memory allocation errors haven't reappeared since the switch was rebooted. Nevertheless, the memory usage increases steadily, so I guess it's just a matter of time until it happens again.
Stefan
03-26-2013 08:45 AM
We are also having this issue with 15.0(2)SE2 on two different 3750 switch stacks. We upgraded our 3 HQ stacks over the weekend, and now I cannot remotely connect to two of them. The third works fine. Very strange.
We will be rolling back this weekend.
Any suggestions on a good, working IOS to go to?