07-03-2017 09:38 PM
Hi All.
One of my customers is currently replacing his infrastructure with ASR9K routers, which are used as PEs. Right after they were put into production, their memory utilization rose to 60-70%.
I've been reviewing this for the past few weeks. I suspected a memory leak, but the memory compare report did not show anything abnormal, the thresholds do not indicate any potential issue, and the watchdog memory state also shows no current issue.
!
RP/0/RSP0/CPU0:AGU_R9006_01D#show memory compare report
Fri Jun 30 17:23:11.161 CDT
JID   name              mem before  mem after   difference  mallocs  restart/exit/new
---   ----              ----------  ---------   ----------  -------  ----------------
256   instdir           487342216   492028648   4686432     84444
335   mibd_interface    3209772     3319556     109784      76
379   pm_collector      13308584    13324880    16296       291
469   wdsysmon          3897320     3908208     10888       73
441   sysdb_svr_local   6671040     6678652     7612        150
323   lpts_pa           1387976     1390952     2976        51
1198  mpls_ldp          12730628    12732888    2260        49
1018  ospf              8125696     8127680     1984        79
437   sysdb_shared_nc   2662544     2663184     640         14
383   ppp_ma            3509912     3510328     416         13
410   sconbkup          125456      125872      416         26
440   sysdb_svr_admin   2058212     2058556     344         7
262   ntpd              1432704     1432832     128         2
112   aib               1887012     1887060     48          1
!
RP/0/RSP0/CPU0:AGU_R9006_01D#show watchdog memory-state
Mon Jul 3 14:30:57.262 CDT
Memory information:
Physical Memory: 6144 MB
Free Memory: 1673.371 MB
Memory State: Normal
!
RP/0/RSP0/CPU0:AGU_R9006_01D#sh watchdog threshold memory configured location 0/RSP0/CPU0
Mon Jul 3 14:33:02.231 CDT
Configured memory thresholds:
Minor: 614 MB,
Severe: 491 MB
Critical: 307.199 MB
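As a rough way to read the memory compare report earlier in this post, the rows can be ranked by how much each process grew between the two snapshots. The sketch below is only an illustration: the function names `parse_report` and `top_growers` are made up for this example, the sample rows are a subset copied from the report, and the parser assumes whitespace-separated columns in the order shown there.

```python
# Sketch: rank processes from "show memory compare report" by heap growth.
# Assumption: columns are JID, name, mem before, mem after, difference, mallocs.
REPORT = """\
256 instdir 487342216 492028648 4686432 84444
335 mibd_interface 3209772 3319556 109784 76
379 pm_collector 13308584 13324880 16296 291
1198 mpls_ldp 12730628 12732888 2260 49
"""


def parse_report(text):
    """Turn report rows into dicts; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6 or not parts[0].isdigit():
            continue
        before, after, diff, mallocs = (int(p) for p in parts[2:6])
        rows.append({"jid": int(parts[0]), "name": parts[1], "before": before,
                     "after": after, "diff": diff, "mallocs": mallocs})
    return rows


def top_growers(rows, n=3):
    """Return the n rows with the largest absolute growth in bytes."""
    return sorted(rows, key=lambda r: r["diff"], reverse=True)[:n]


rows = parse_report(REPORT)
for r in top_growers(rows):
    pct = 100.0 * r["diff"] / r["before"]
    print(f'{r["name"]:<16} +{r["diff"]:>9} bytes ({pct:.2f}% of starting size)')
```

A single comparison only shows growth between two snapshots; repeating it over several intervals is what would show whether the same process keeps leading the list.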
I checked per-process consumption and did not find any process using more than 200M.
My concern is that this might be expected behavior and that the chassis was not dimensioned according to the customer's needs.
Does anyone know another way to rule out abnormal behavior, or to detect a defect pattern?
Thank you.
07-04-2017 02:32 AM
With 1.6GB of free memory in steady state, I don't really see a reason for concern. Were all the services already enabled? Especially BGP peering, as most memory on ISP routers tends to be consumed by BGP.
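To put numbers on that, here is a small worked calculation using the values from the watchdog output in the original post. It assumes the configured thresholds refer to free memory (which is how the output reads); the helper names `used_percent` and `headroom` are just for this sketch.

```python
# Sketch: how far is the router from each watchdog memory state?
# Values are taken from the outputs posted above. Assumption: the
# thresholds are expressed as remaining free memory in MB.
PHYSICAL_MB = 6144.0
FREE_MB = 1673.371
THRESHOLDS_MB = {"Minor": 614.0, "Severe": 491.0, "Critical": 307.199}


def used_percent(physical_mb, free_mb):
    """Share of physical memory currently in use, in percent."""
    return 100.0 * (physical_mb - free_mb) / physical_mb


def headroom(free_mb, thresholds_mb):
    """Free memory (MB) that can still be consumed before each state is reached."""
    return {name: free_mb - limit for name, limit in thresholds_mb.items()}


print(f"used: {used_percent(PHYSICAL_MB, FREE_MB):.1f}%")
for name, mb in headroom(FREE_MB, THRESHOLDS_MB).items():
    print(f"{name}: {mb:.1f} MB left before this state")
```

With these inputs, roughly 1 GB would still have to be consumed before the Minor state is reached.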
This router seems to have an RSP440-TR. For deployments that require more RAM, there is RSP440-SE with 12GB of RAM. Alternatively, RSP880-TR comes with 16GB and RSP880-SE comes with 32GB RAM.
/Aleksandar
07-04-2017 10:52 AM
Hi Aleksandar,
I've seen memory utilization rise by 5% over the past few weeks. My concern is that it keeps increasing over the coming months, but I did not find any process whose usage is growing significantly.
BGP is the process that consumes the most, 200M. I'll keep an eye on it.
Thank you for the suggested HW.
07-06-2017 03:29 AM
If you observe a monotonically increasing trend of BGP memory consumption, that would indeed be an indication of a memory leak. If this happens, let us know.
You should also check how close Dynamic is to Dyn-Limit in the output of "sh processes memory detail". If BGP consumes 200MB, you should still be very far from the Dyn-Limit.
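The "monotonically increasing" check can be sketched as follows, assuming you collect the process's memory figure at regular intervals (for example from periodic CLI captures). The sample values below are hypothetical and only illustrate the two cases; `is_monotonic_increase` is a name invented for this example.

```python
# Sketch: tell a steadily growing series apart from one that fluctuates.
# All sample values are hypothetical, in MB.
def is_monotonic_increase(samples, tolerance=0):
    """True if no sample drops by more than `tolerance` below the previous
    one and the last sample is higher than the first."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a - tolerance for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


leak_like = [200, 204, 209, 215, 222]   # grows at every sample
steady = [200, 203, 199, 201, 200]      # moves up and down around 200

print(is_monotonic_increase(leak_like))
print(is_monotonic_increase(steady))
```

Memory that rises and then falls back as routes churn would not match this pattern; a series that never comes back down is the one worth reporting.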
regards,
/Aleksandar
07-10-2017 03:02 PM
Hi Aleksandar,
The BGP process has the highest consumption, but its utilization is not trending upward.
However, I've seen that the instdir process varies over time. Last week this process peaked in utilization and the ASR9K reset some BGP sessions to protect itself.
I've been looking for information about this process but have not found much. I believe it is indeed experiencing a memory leak, and the pattern is present on different ASR9K boxes.
Do you know of any known issue with this process?
Thank you.
07-10-2017 11:49 PM
Hi,
What version are you running?
We are running 6.1.3, and last week BGP crashed on one of our IGWs.
This is the message logged before the crash.
RP/0/RP1/CPU0:Jul 5 16:27:49.385 : sysdb_shared_nc[447]: %SYSDB-SYSDB-6-TIMEOUT_EDM : EDM request for 'oper/ip-bgp/gl/instance/default/act/shared/vrf/Internet/afi/' from 'bgp_show' (jid 65930, node 0/RP1/CPU0). No response from 'bgp' (jid 1058, node 0/RP1/CPU0) within the timeout period (100 seconds)
This is the output from sh processes memory detail.
JID   Text  Data  Stack  Dynamic  Dyn-Limit  Shm-Tot  Phy-Tot  Process
----  ----  ----  -----  -------  ---------  -------  -------  -------
1058  1M    6M    564K   1458M    1978M      106M     1465M    bgp
1166  436K  488K  376K   589M     2176M      49M      590M     ipv4_rib
349   608K  284K  164K   175M     2048M      137M     175M     mpls_lsd
61    120K  60K   152K   148M     300M       32M      148M     eth_server
1168  436K  648K  264K   39M      2176M      49M      40M      ipv6_rib
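For the Dynamic vs. Dyn-Limit comparison suggested earlier in the thread, the ratio can be computed straight from this output. This is a minimal sketch: `to_bytes` and `dyn_usage` are illustrative names, the K/M suffixes are assumed to be 1024-based, and the parser assumes the nine whitespace-separated columns shown above.

```python
# Sketch: Dynamic as a percentage of Dyn-Limit per process.
# Assumption: K/M/G suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

OUTPUT = """\
1058 1M 6M 564K 1458M 1978M 106M 1465M bgp
1166 436K 488K 376K 589M 2176M 49M 590M ipv4_rib
349 608K 284K 164K 175M 2048M 137M 175M mpls_lsd
61 120K 60K 152K 148M 300M 32M 148M eth_server
1168 436K 648K 264K 39M 2176M 49M 40M ipv6_rib
"""


def to_bytes(field):
    """Convert a size such as '564K' or '1458M' to bytes."""
    if field[-1] in UNITS:
        return int(field[:-1]) * UNITS[field[-1]]
    return int(field)


def dyn_usage(text):
    """Map process name to Dynamic / Dyn-Limit in percent."""
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 9 or not parts[0].isdigit():
            continue  # skip header, separator, and malformed lines
        dynamic, limit = to_bytes(parts[4]), to_bytes(parts[5])
        usage[parts[8]] = 100.0 * dynamic / limit
    return usage


usage = dyn_usage(OUTPUT)
for name, pct in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {pct:5.1f}% of Dyn-Limit")
```

In this output bgp is the process closest to its limit, at a little under three quarters of Dyn-Limit.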
07-13-2017 09:51 AM
Hi Smail, we are running 5.3.3.
The symptom was a little different; in fact, we hit a bug.
This was the process restart that caused the memory leak:
RP/0/RSP0/CPU0:Jul 10 12:39:17.437 CDT: sysmgr[94]: instdir(1) (jid 256) (pid 204867) (fail_count 1) abnormally terminated, restart scheduled
When the box ran into the memory issue, it protected itself by shutting down some BGP sessions, which freed some memory.
The bug we experienced is CSCvc99542.
Judging by your logs, the behavior is different; however, I've seen that 6.1.3 is also affected by the instdir memory leak. If memory compare shows instdir as the top process, I suggest installing the SMU asr9k-px-6.1.3.CSCvc99542.pie, which is hitless. Hope this helps.