
ASR9001 Not enough memory

savolyukdmitry
Level 1

Hi, an ASR9001 rebooted and generated a dump, and the logs contain "Not enough memory" messages. Which memory does this refer to? Over SNMP, ciscoMemoryPoolName (1.3.6.1.4.1.9.9.48.1.1.1.2) reports a pool named "processor".

According to SNMP, utilization of that pool reached 100% at that moment. After the reboot more memory was free, but it keeps decreasing. What is this memory, and how can I view it? I could not find any information about this for IOS XR. Please advise. Thank you!
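A minimal Python sketch of how a utilization percentage like the one on the SNMP graph can be derived, assuming the graph is built from the ciscoMemoryPoolUsed (.5) and ciscoMemoryPoolFree (.6) columns of the same CISCO-MEMORY-POOL-MIB table that ciscoMemoryPoolName (.2) belongs to. The SNMP polling itself is left out, and the warn/crit thresholds in the hypothetical pool_state helper are illustrative values, not Cisco defaults. Whether the "processor" pool is the same memory the "Not enough memory" messages refer to is exactly the open question here, so treat the number as an indicator only.

```python
# Columns of ciscoMemoryPoolEntry in CISCO-MEMORY-POOL-MIB (row index = pool type).
POOL_NAME_OID = "1.3.6.1.4.1.9.9.48.1.1.1.2"  # ciscoMemoryPoolName, e.g. "processor"
POOL_USED_OID = "1.3.6.1.4.1.9.9.48.1.1.1.5"  # ciscoMemoryPoolUsed, bytes in use
POOL_FREE_OID = "1.3.6.1.4.1.9.9.48.1.1.1.6"  # ciscoMemoryPoolFree, bytes free


def pool_utilization(used_bytes, free_bytes):
    """Utilization in percent, computed as used / (used + free)."""
    total = used_bytes + free_bytes
    if total <= 0:
        raise ValueError("pool size must be positive")
    return 100.0 * used_bytes / total


def pool_state(util_percent, warn=80.0, crit=95.0):
    """Classify utilization; warn/crit are hypothetical thresholds."""
    if util_percent >= crit:
        return "critical"
    if util_percent >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical values as they might be read from the two columns above.
    used, free = 3_400_000_000, 600_000_000
    util = pool_utilization(used, free)
    print(f"processor pool: {util:.1f}% ({pool_state(util)})")
```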

Part of the dump:

Local Syslog Messages:

RP/0/RSP0/CPU0:Jul 3 00:53:38.077 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.177 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.177 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.277 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.277 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.377 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.377 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.477 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.477 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.577 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.577 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.677 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.677 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.777 : wdsysmon[487]: wd_get_system_pulseq_count:open failed
RP/0/RSP0/CPU0:Jul 3 00:53:38.777 : wdsysmon[487]: %HA-HA_WD-4-PULSEQ_POLL : wd_memory_poll: Failed to poll for pulseQ usage: 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511) : : (PID=143426) : -Traceback= 40003f10 f10feec8 f10fcf84 400049dc f118241c
RP/0/RSP0/CPU0:Jul 3 00:53:38.777 : fib_mgr[229]: %OS-SHMWIN-6-OPEN_FAILED : SHMWIN: shm open for /dev/shmem/shmwin/ifc-ipv4/0xa8e31000 failed, error: Not enough memory[12]
RP/0/RSP0/CPU0:Jul 3 00:53:38.794 : [95]: fib_mgr[229] PID-356529: PID 356529: _dl_abort. libfib_mpls_format.dll:dllmgr: Failed to allocate data segment for dll libfib_mpls_format.dll [Not enough memory]

RP/0/RSP0/CPU0:Jul 3 00:54:12.640 : sysmgr[97]: fib_mgr(1) (jid 229) (pid 1608458417) (fail_count 5) abnormally terminated, restart scheduled

Crash Reason: Cause: maximum restart attempts exceeded for fib_mgr
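To see which processes report allocation failures, exported log text (for example a saved copy of the local syslog messages from the dump) can be scanned for "Not enough memory" and grouped by process name. This is only a sketch: the regular expression assumes the "node:timestamp : process[pid]:" layout seen in the lines above, and any line that does not fit it is counted as "unknown".

```python
import re
from collections import Counter

# Matches the process name in lines laid out like the dump above, e.g.
#   "RP/0/RSP0/CPU0:Jul 3 00:53:38.777 : fib_mgr[229]: %OS-SHMWIN-6-OPEN_FAILED : ..."
#   "RP/0/RSP0/CPU0:Jul 3 00:53:38.794 : [95]: fib_mgr[229] PID-356529: ..."
PROCESS_RE = re.compile(r" : (?:\[\d+\]: )?([A-Za-z_][\w-]*)\[\d+\]")


def count_oom_by_process(lines):
    """Count 'Not enough memory' lines per process; unmatched lines go to 'unknown'."""
    counts = Counter()
    for line in lines:
        if "Not enough memory" not in line:
            continue
        match = PROCESS_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the dump above plus the SSH error quoted later in the thread.
    sample = [
        "RP/0/RSP0/CPU0:Jul 3 00:53:38.777 : fib_mgr[229]: %OS-SHMWIN-6-OPEN_FAILED : SHMWIN: shm open for /dev/shmem/shmwin/ifc-ipv4/0xa8e31000 failed, error: Not enough memory[12]",
        "RP/0/RSP0/CPU0:Jul 3 00:53:38.794 : [95]: fib_mgr[229] PID-356529: PID 356529: _dl_abort. libfib_mpls_format.dll:dllmgr: Failed to allocate data segment for dll libfib_mpls_format.dll [Not enough memory]",
        " %SECURITY-SSHD-3-ERR_ERRNO Error in spawnp Not enough memory",
    ]
    for process, hits in count_oom_by_process(sample).most_common():
        print(f"{process}: {hits}")
```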

 

ciscoMemoryPoolName (1.3.6.1.4.1.9.9.48.1.1.1.2) - "processor":

[Screenshot: SNMP graph of the "processor" memory pool]

Now:

[Screenshot: current SNMP graph of the same pool]

 

 

3 Replies

smilstea
Cisco Employee
According to the fib_mgr syslog messages, you have run out of shared memory, specifically for an MPLS label operation.
Which software version is this, and can you share the features you are using for static, IGP and BGP routing? If I had to guess, you are running L3VPN and need to change your label allocation mode to reduce the number of routes/labels.

Sam

 

Hello Sam.
I have similar routers running at 80-85% utilization. As can be seen above, there is enough memory, yet utilization keeps increasing. At the time of the incident there were no messages like MEMORY_STATE_CHANGE : New memory state: Severe in the logs, and our monitoring did not show the memory decreasing.
How can I tell whether I need to change the label allocation mode to reduce the number of routes/labels, and whether that is what causes my problem?

Cisco IOS XR Software, Version 6.4.2[Default]

show route summary

Route Source    Routes  Backup  Deleted  Memory(bytes)
connected           56       1        0           9120
local               57       0        0           9120
local LSPV           1       0        0            160
static              31       2        0           6216
ospf 1             574       5        0          92640
ospf 5               7       0        0           1120
ospf 100             0       0        0              0
bgp 65299         4773     538        0         849760
dagr                 0       0        0              0
Total             5499     546        0         968136

show route vrf all summary

VRF: XXXXX
Route Source    Routes  Backup  Deleted  Memory(bytes)
local                2       0        0            320
connected            0       2        0            320
Total                2       2        0            640

VRF: XXXXX
Route Source    Routes  Backup  Deleted  Memory(bytes)
bgp 65299        43274       0        2        7286824
connected           18       0        0           2880
local               18       0        0           2880
static               1       0        0            160
dagr                 0       0        0              0
Total            43311       0        2        7292744

VRF: XXXXX
Route Source    Routes  Backup  Deleted  Memory(bytes)
bgp 65299        63846       0        2       10864400
local                3       0        0            480
connected            2       1        0            480
static               1       0        0            160
dagr                 0       0        0              0
Total            63852       1        2       10865520

VRF: **eint
Route Source    Routes  Backup  Deleted  Memory(bytes)
local                1       0        0            160
connected            1       0        0            160
Total                2       0        0            320
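The route counts and the Memory(bytes) column above can be totalled with a small parser, which makes it easier to compare routers or track the numbers over time. This sketch assumes every data row ends with the four integer columns Routes, Backup, Deleted and Memory(bytes) and skips Total rows; it only sums what the command itself reports, which is presumably not the same thing as the shared memory fib_mgr failed to allocate.

```python
def parse_route_summary(text):
    """Parse 'show route summary' output into (source, routes, backup, deleted, mem) rows.

    Assumes each data row ends with four integer columns (Routes, Backup,
    Deleted, Memory(bytes)); headers, 'VRF:' lines and 'Total' rows are skipped.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] == "Total":
            continue
        tail = tokens[-4:]
        if not all(tok.isdigit() for tok in tail):
            continue
        routes, backup, deleted, mem = (int(tok) for tok in tail)
        rows.append((" ".join(tokens[:-4]), routes, backup, deleted, mem))
    return rows


def totals(rows):
    """Sum the Routes and Memory(bytes) columns."""
    return sum(r[1] for r in rows), sum(r[4] for r in rows)


if __name__ == "__main__":
    # One VRF table from the output above.
    sample = """VRF: XXXXX
Route Source    Routes  Backup  Deleted  Memory(bytes)
bgp 65299        43274       0        2        7286824
connected           18       0        0           2880
local               18       0        0           2880
static               1       0        0            160
dagr                 0       0        0              0
Total            43311       0        2        7292744"""
    routes, mem = totals(parse_route_summary(sample))
    print(routes, mem)  # 43311 7292744, matching the Total row
```

Summing the Total rows of all the outputs above gives 112666 routes and 19127360 bytes of reported route memory, of which the two VRF BGP tables (43274 and 63846 routes) account for most.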

savolyukdmitry
Level 1

The same problem occurs on several other ASR9001 routers: first they become unreachable over SSH, then they reboot, the memory is freed, and they become reachable again.

Addition: they run 32-bit IOS XR.

Can anyone tell me how to track this memory metric and how to find out why it keeps decreasing?

 %SECURITY-SSHD-3-ERR_ERRNO Error in spawnp Not enough memory

Reboot history:

 0x2C000010 Cause: Missed deadline, client: dsc, timeout: 15 Process: wd-critical-mon Traceback: f58e7d58 f58e8f7c f58e93f8 40000878 40001f70 f598941c
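One way to track the decrease is to fit a straight line to periodic free-memory samples (for example ciscoMemoryPoolFree polled over SNMP) and extrapolate when it reaches a floor, and to compare two per-process memory snapshots taken some time apart to see which process keeps growing. This is a minimal sketch under assumptions: a leak is not necessarily linear, how the per-process figures are collected is left open, and the sample values and process names below are hypothetical.

```python
def linear_trend(samples):
    """Least-squares fit of free_bytes = slope * t + intercept.

    samples: list of (t_seconds, free_bytes) tuples, oldest first.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    slope = cov / var_t
    return slope, mean_f - slope * mean_t


def seconds_until_floor(samples, floor_bytes=0):
    """Extrapolate how long after the last sample free memory reaches floor_bytes.

    Returns None when the fitted trend is flat or rising.
    """
    slope, intercept = linear_trend(samples)
    if slope >= 0:
        return None
    t_floor = (floor_bytes - intercept) / slope
    return max(0.0, t_floor - samples[-1][0])


def top_growers(before, after, n=5):
    """Processes whose memory grew most between two {name: bytes} snapshots."""
    growth = {name: after[name] - before.get(name, 0) for name in after}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(name, delta) for name, delta in ranked[:n] if delta > 0]


if __name__ == "__main__":
    # Hypothetical daily samples of free memory in bytes (not from this router).
    day = 86_400
    samples = [(0, 1_000_000_000), (day, 900_000_000),
               (2 * day, 800_000_000), (3 * day, 700_000_000)]
    remaining = seconds_until_floor(samples)
    print(f"about {remaining / day:.1f} days until free memory reaches 0")
    # Hypothetical per-process snapshots in bytes.
    print(top_growers({"proc_a": 100, "proc_b": 200},
                      {"proc_a": 150, "proc_b": 190, "proc_c": 30}))
```

A steady negative slope that survives reboots, combined with one process that keeps appearing at the top of top_growers, would point at a leak rather than at normal growth of the routing tables.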