SOLUTION Disabled the logging

Asim Ali · ‎09-15-2014

Hello

I am unable to telnet to a Cisco 2811 router, yet all the links connected to it are operational, which means the router itself is working fine. Sometimes, however, all the links connected to that router go down even though the router does not restart. I have 2 fiber links from 2 different ISPs, and they cannot both be failing at the same time, this many times.

This has happened to me many times. All the LEDs are on, and I have to restart the router to solve the problem.

Is there any permanent solution to this problem?

InayathUlla Sharieff · ‎09-15-2014

What firmware? Can you try upgrading the firmware?

Richard Burts · ‎09-16-2014

I find the description of the problem a bit confusing. Is it that telnet never works or that telnet sometimes does work and sometimes does not work? It would also be helpful if the original poster would post some details of the router configuration.

HTH

Rick

Leo Laohoo · ‎09-16-2014

I agree with Rick. I can't see any relevant information.

Post the error messages you are seeing when you attempt to telnet into the appliance. Post the configuration (just take away sensitive information like IP addresses and passwords).

Asim Ali · ‎09-17-2014

To clarify further:

Telnet usually works.

But sometimes it stops working, and it only starts working normally again after I hard reboot the router.

I will post the config as well.

Richard Burts · ‎09-17-2014

Thank you for the clarification. Based on this additional information I will make a guess about the problem. Do you have exec-timeout 0 configured under the vty lines?

exec-timeout 0 may be ok for the console port but it can create problems when configured on vty. With no inactivity timeout there is no way to clear sessions that have gone inactive. Someone may telnet to the router and might have a problem that disconnects them but does not terminate their telnet session. This results in hanging vty sessions and at some point all of the vty lines are used and there is no vty available for new sessions and you get the kind of symptom that is described here.
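
A minimal sketch of how this could be checked and corrected, assuming the default five vty lines (0 through 4) and a 10-minute timeout chosen purely for illustration; the line range and the value on this router may differ. The commands are run from the console, since telnet is refused when all vty lines are taken.

```
! From the console: list active sessions and see which vty lines are in use
show users
show line
! Clear one hung session (vty 0 is only an example line number)
clear line vty 0
! Restore an inactivity timeout on the vty lines (10 minutes, 0 seconds)
configure terminal
line vty 0 4
 exec-timeout 10 0
end
```

If show users lists sessions that have been idle for days, that would support the hung-session theory; if the vty lines are free and telnet still fails, the cause lies elsewhere.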

HTH

Rick

Joseph Nelson · ‎09-17-2014

Hi Ali,

More than likely the CPU is spiking on that box. When you have the issue, try connecting with a console cable. Is the console sluggish (slow response when typing keys, slow paging of text, etc.)?

Also investigate/evaluate the following common culprits of high cpu utilization:

An excessive number of ACEs in your ACLs, or extended ACLs with high ports
Excessive ARP (proxy ARP), SNMP, or ssh/telnet traffic. These traffic types go directly to the CPU
Oversubscribing the hardware with your current configuration and feature use
Console and monitor logging left enabled (make sure they are off, i.e. no logging console, no logging monitor)
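
The checks in the list above could start with something like the following sketch, run from the console. It uses only standard IOS commands; whether logging is actually the culprit on this router is an assumption to be verified against the output.

```
! Show which processes are using the CPU, busiest first
show processes cpu sorted
! Show whether console, monitor, and buffer logging are enabled
show logging
! Turn off logging to the console and to terminal (vty) sessions
configure terminal
no logging console
no logging monitor
end
```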

HTH

Joe

Edit: Added bit about logging to the console/monitor sessions

Asim Ali · ‎09-18-2014

The configuration is attached; please have a look.

This is a snapshot of some error logs that appeared when I connected via the console.

The output of show processes cpu is also attached.

Richard Burts · ‎09-18-2014

Thank you for the additional information. This message indicates that there is a memory problem:

%SYS-2-MALLOCFAIL: Memory allocation of 928 bytes failed

From your description, the router runs ok after a reboot and then at some point the problem shows up, which sounds like there might be a memory leak in your version of IOS. If the router is covered under a maintenance contract you could open a case with Cisco TAC to verify what the problem is.

HTH

Rick

Asim Ali · ‎09-23-2014

Thank you very much, Richard, for pointing out the problem, but this device is not covered under a maintenance contract. Is there any way to overcome this issue?

Regards

Ali

Joseph Nelson · ‎09-23-2014

Asim,

Unfortunately, it literally can be any number of things:

Bad memory DIMM
Configuration driven issue (too many ACEs, QoS policy)
Software bug caused by transit traffic (i.e. malformed packet)

To name a few exotic kinds of issues. The next best step would be for you to try to replicate the condition. You would need to be able to consistently reproduce the problem. If it's configuration driven, we may be able to help with workarounds.

Also, have you seen the below doc:

http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/6507-mallocfail.html

I've already verified that you have the right IOS version for your device/memory loadout.

Still, come back and post your results, as I'm very much interested in what you find.

Good luck!

Joe

Asim Ali · ‎09-25-2014

Thanks a million, Nelson, for getting closer to the root cause; the document is really helpful.

Now, looking at my router logs:

****************************************************************

1w2d: %SYS-2-MALLOCFAIL: Memory allocation of 928 bytes failed from 0x4036E1B0, alignment 0
Pool: Processor Free: 257336 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Pool Manager", ipl= 0, pid= 5, -Traceback= 0x40EE6418 0x4008182C 0x4009A5D0 0x4211B5B0 0x4036E1B8 0x400BAAA0 0x400BAC5C 0x41C636AC 0x41C63690
1w2d: %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for ADJ: request r. No memory available -Process= "Chunk Manager", ipl= 4, pid= 1, -Traceback= 0x40EE6418 0x400B3F3C 0x41C636AC 0x41C63690

************************************************************************************************

It is the Pool Manager that is affected by the lack of memory.

What exactly is the Pool Manager?

And in CHUNKEXPANDFAIL, what is the ADJ: request?

Kind Regards

Ali

Joseph Nelson · ‎09-25-2014

Hi Ali,

I don't work for Cisco, but I'm guessing that the logs are just a symptom of the general memory problem. I believe the Pool Manager is just a memory watchdog responsible for managing the IO buffers. Maybe someone from Cisco can chime in here.

I'm assuming the device is not under a maintenance contract, so it may be hard to prove that a bug is causing your issue. You may have to engineer around the issue by reducing the BGP table size to something that's manageable for that device. Or you may have to replace it.

Asim Ali · ‎09-29-2014

Dear experts and enthusiasts, I have collected some memory and CPU logs to dig deeper into the issue.

Please have a look and see what culprit could be causing the loss of telnet.

Attached files:

show memory allocating-process totals

show processes memory sorted

show memory statistics

show memory fail

show memory dead

show processes cpu sorted

Asim Ali · ‎09-30-2014

SOLUTION

Disabled the logging; much of the processor memory has been freed now.

The router has been working fine since yesterday. Let me monitor it for some days.
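
The post does not say which form of logging was disabled. If it was the in-memory log buffer, the change and a before/after comparison might look like this sketch (an assumption for illustration, not the poster's actual commands):

```
! Record free processor memory before the change
show memory statistics
! Check the configured log buffer size and logging destinations
show logging
configure terminal
! Assumption: buffered logging was what consumed processor memory
no logging buffered
end
! Compare free processor memory after the change
show memory statistics
```

Comparing the Processor pool's free bytes in the two show memory statistics outputs would indicate how much memory the change actually released.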

Cant telnet 2811 Router, but all links are operational