I would appreciate your comments and advice on a situation we have encountered recently.
Our customer’s ASR1001-X router worked fine until it had to be rebooted for maintenance.
The router refused to boot up (we suspect this is due to the large configuration file), and our customer finally managed to bring it up only after several hours, by uploading the configuration manually in small portions.
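For reference, preparing those small portions can be scripted. The sketch below is a hypothetical Python helper, not a Cisco tool; the names `split_config` and `max_stanzas` are illustrative. It assumes an IOS-style text configuration in which every top-level command starts at column 0 and its sub-commands are indented. It groups the file into chunks of at most `max_stanzas` top-level blocks, in the original order, so each chunk can be pasted to the router one at a time.

```python
def split_config(text: str, max_stanzas: int = 50) -> list[str]:
    """Hypothetical helper: split IOS-style config text into chunks of
    at most max_stanzas top-level blocks (stanzas), preserving order."""
    if max_stanzas < 1:
        raise ValueError("max_stanzas must be at least 1")
    stanzas: list[list[str]] = []
    for line in text.splitlines():
        # Skip blank lines and '!' separator/comment lines.
        if not line.strip() or line.strip() == "!":
            continue
        if line[0].isspace() and stanzas:
            # Indented line: a sub-command of the current top-level block.
            stanzas[-1].append(line)
        else:
            # Column-0 line: starts a new top-level block.
            stanzas.append([line])
    chunks: list[str] = []
    for i in range(0, len(stanzas), max_stanzas):
        group = stanzas[i:i + max_stanzas]
        chunks.append("\n".join(l for s in group for l in s) + "\n")
    return chunks


if __name__ == "__main__":
    import sys
    # Usage (assumption): python split_config.py running-config.txt
    with open(sys.argv[1]) as f:
        for n, chunk in enumerate(split_config(f.read()), start=1):
            print(f"! ---- portion {n} ----")
            print(chunk, end="")
```

The helper keeps stanzas in their original order, on the assumption that the saved configuration is already ordered so that later commands can rely on earlier ones. It only automates the manual workaround; it does not address the underlying memory shortage at boot.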
Here are some details:
While the ASR is running, “sh cpu” and “sh memory” look fine (25% CPU, 5 GB of free memory) under a moderate traffic load (mostly telemetry).
After a reboot, however, the router apparently runs short of resources, and we see messages of this type:
%SYS-2-MALLOCFAIL: Memory allocation
Pool: Processor Free: Cause: Memory fragmentation
Alternate Pool: Cause: No Alternate pool
%SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for Packet Elements. No memory available -Process= "Chunk Manager"
%SYS-2-CFORKMEM: Process creation of BGP Open failed (no memory). -Process= "BGP Router",
We know that we are exceeding the ASR1001-X datasheet limits (the datasheet states “Up to 4,000 tunnels GRE are supported”), but the router works fine under load; the problem occurs only during boot-up.
The questions are:
Unfortunately, we cannot raise this with TAC because our service package has expired.
Also, I am under an NDA and cannot upload the full configuration, detailed logs, etc.
Upgrading to the latest software, asr1001x-universalk9.16.12.05.SPA, has not helped.
Having a second ASR1001-X is a clear option.
(highest suggested release)
Thank you for your reply, but it does not look like our case.
The router does not crash; it just cannot boot up and process the full configuration.
We tried the latest 16.x release (16.12.05) and it did not help. I doubt 17.3.3 would help. Do you believe it could?
Hello @kozharov,
>> Having a second ASR1001-X is a clear option.
Adding a second ASR1001-X and dividing the load (GRE tunnels and BGP sessions) between the two is probably your best option, because you want a solution that can survive a reboot.
The system is not able to restart all the GRE tunnels and BGP sessions at once, but it can support them once it has reached steady state.
As you have noted, you have gone beyond the published performance figures.
Hope this helps
Dear Giuseppe, your message is clear.
The current idea is to find a single-box solution that can support this number of GRE tunnels and BGP peers, and to buy a redundant box as well.
Simply splitting the current load between two ASRs would solve the problem only halfway: if one router fails, half of the tunnels will go down.
What model(s) would you advise if nothing can be done with the ASR1001-X?
Hello @kozharov,
the new Cisco platform has the misleading name Catalyst 8000, but it is meant to be the replacement for the ASR 1000.
See the attached presentation.
Of course, since it is a new platform, there are some concerns about it.
Alternatively, you could look at higher-end models in the ASR 1000 family.
Hope this helps
Dear Giuseppe, thank you very much for your comments!
“higher models in ASR 1000 family”: it is unclear whether they would help (they have more memory and more powerful CPUs) or not (all the higher models have the same “up to 4000 GRE tunnels” limit).
8000 platform: it states “Up to 8000 SD-WAN IPsec Tunnels” but does not mention GRE, so there is no 100% guarantee it will help with GRE.
So I’m still in limbo about which way to go…
It depends on what you want to see... “show tech-support” or “sh run” definitely cannot be shown in full, but specific outputs can be shared if needed…
Kindly provide the complete output of the following commands:
I believe the router runs out of memory during boot-up, which is why we see all these allocation failures, but the ASR1001-X can only have 8 GB or 16 GB on board, and we already have 16 GB.