Cisco 4451 High Memory Utilization

djoseph18
Level 1

We currently have a Cisco 4451 that has been showing high memory utilization (97%), to the point where we could not log in to the router via SSH.

Since we were not able to remote in to the machine, we got someone to reboot it manually. Memory utilization dropped to 20%, but I would like to know what caused this.

The engineer who did the reboot was able to capture some show commands and logs beforehand (attached in the discussion). I also learned that one of the main reasons we could not log in via SSH was that authentication is rejected once free memory is at 3% or lower. This threshold has been changed to its lowest setting, 2%, so as long as utilization stays below 98% we should still be able to remote in.
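To make the threshold arithmetic concrete, here is a minimal sketch. It assumes, as described above, that logins are rejected when free memory is at or below the configured percentage; the function name and the exact boundary behaviour are illustrative assumptions, not taken from Cisco documentation.

```python
def ssh_login_allowed(used_pct: float, reject_free_pct: float) -> bool:
    """Return True if free memory is still above the reject threshold.

    Assumption: authentication is rejected once free memory is at or
    below `reject_free_pct` (the "3% and lower" behaviour described above).
    """
    free_pct = 100.0 - used_pct
    return free_pct > reject_free_pct


if __name__ == "__main__":
    # Compare the old threshold (3%) with the new one (2%).
    for threshold in (3, 2):
        for used in (96, 97, 98):
            verdict = "allowed" if ssh_login_allowed(used, threshold) else "rejected"
            print(f"threshold={threshold}% used={used}% -> login {verdict}")
```

Under these assumptions, 97% utilization is rejected with the 3% threshold but allowed with the 2% threshold, which only buys one extra percentage point of headroom.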

For now I would still like to know how we can verify the cause. Can anyone assist?

8 Replies

Philip D'Ath
VIP Alumni

There is not enough information to determine anything.

Let's start with the obvious.  What IOS-XE version are you running?

You obviously have a BGP peer.  How much memory is this using, or how many prefixes are you learning?

I lie, you have OSPF, not BGP.  This makes me most suspicious of the IOS-XE version you are using.

The current version is 15.5(3)S0c. This is one of the two routers that act as our Group routers in a data centre.

Hmm, you have code at the start of a train.  Always a recipe for an issue.  I wouldn't do anything until you have moved to a gold star release like 3.16.5S.

https://software.cisco.com/download/release.html?mdfid=284389362&catid=268437899&softwareid=282046477&release=3.13.7S&relind=AVAILABLE&rellifecycle=MD&reltype=latest

That is currently being proposed for the near future; however, for now we would like to see what the cause could be on the current version.

You are about 5 patch releases behind.  There could be 1,000 bugs fixed between the code you are on and the current recommended code.

If you are keen, you can start reading all those release notes - but really, it's not worth investing the time.  You should just move onto code that Cisco recommends you use.

Or to put it another way - why would you choose to stay running code that is not recommended by Cisco?

This would require some level of change management to be approved, however duly noted.

Currently we are just monitoring the situation, since memory utilization has not risen above 20%.

However, once the upgrade has been performed, it would be hard to say that the upgrade fixed the error unless it happened again, right?

And that won't give the answer as to why it happened now?

From my experience, you can spend hours and hours of time investigating an issue that was resolved long ago in the current recommended code.  These days I tend to move directly to Cisco's current recommended code release and then spend time diagnosing any issue that is left.  The other thing is that, in your case, it is likely to resolve 1,000 bugs that you don't know about yet - or are yet to run into.

What's more important, resolving the fault in an expedient manner, or spending a long time trying to find a documented cause of the issue while experiencing outage after outage?

In your case, you can commence monitoring the memory consumption.  You can take regular (say daily) printouts of the top process memory consumers, and see if one of them is growing.
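The daily comparison described above can be scripted. The sketch below diffs two saved captures of per-process memory output and lists the processes whose held memory grew. It assumes the captures come from something like `show processes memory sorted` and that each row has the columns PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs, Process; check that layout against your own output. The process names and byte counts in the sample data are made up for illustration.

```python
import re

# Assumed row layout: PID TTY Allocated Freed Holding Getbufs Retbufs Process
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$"
)


def parse_holding(text):
    """Map process name -> bytes currently held, summing rows that share a name."""
    holding = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and summary lines do not match and are skipped
            name = match.group(8)
            holding[name] = holding.get(name, 0) + int(match.group(5))
    return holding


def top_growers(old_text, new_text, limit=5):
    """Return (process, growth_in_bytes) pairs, largest growth first."""
    old, new = parse_holding(old_text), parse_holding(new_text)
    deltas = [(name, new[name] - old.get(name, 0)) for name in new]
    growers = [d for d in deltas if d[1] > 0]
    return sorted(growers, key=lambda d: d[1], reverse=True)[:limit]


# Hypothetical captures taken one day apart (illustrative values only).
DAY1 = """
 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0     100000      50000      40000          0          0 *Init*
  12   0     900000     100000     800000          0          0 Example Proc A
  34   0     300000     250000      50000          0          0 Example Proc B
"""
DAY2 = """
 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0     100000      50000      40000          0          0 *Init*
  12   0    1400000     200000    1200000          0          0 Example Proc A
  34   0     350000     300000      50000          0          0 Example Proc B
"""

if __name__ == "__main__":
    for name, growth in top_growers(DAY1, DAY2):
        print(f"{name}: +{growth} bytes held")
```

A process that shows up at the top of this list day after day, with Holding never coming back down, is the likely leak candidate to quote when searching release notes or opening a TAC case.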

If memory consumption is consistent then you probably have resolved the issue.

Sometimes it is not low memory that is the issue, but memory fragmentation, or simply a specific buffer pool running out.
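One rough way to tell fragmentation apart from plain exhaustion is to compare total free memory with the largest contiguous free block. The sketch below assumes you can read both figures from the device's memory summary (for example the Free and Largest columns of `show memory statistics`, if your release reports them). The metric is a common rule of thumb, not a Cisco-defined value, and the sample numbers are invented.

```python
def fragmentation_pct(free_bytes: int, largest_free_block: int) -> float:
    """Share of free memory that is NOT in the largest contiguous block.

    Near 0   -> free memory is mostly one big block (little fragmentation).
    Near 100 -> plenty of free memory overall, but only in small pieces.
    """
    if free_bytes <= 0:
        return 0.0  # nothing free at all: exhaustion, not fragmentation
    return 100.0 * (1.0 - largest_free_block / free_bytes)


if __name__ == "__main__":
    # Hypothetical readings: 200 MB free, but the largest block is only 2 MB.
    pct = fragmentation_pct(free_bytes=200_000_000, largest_free_block=2_000_000)
    print(f"fragmentation indicator: {pct:.1f}%")
```

A high value alongside a comfortable amount of free memory points at fragmentation, while a low value with very little free memory points at genuine exhaustion or a leak.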