on 02-08-2016 01:41 PM
Shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Depending on context, programs may run on a single processor or on multiple separate processors. Using memory for communication inside a single program, for example among its multiple threads, is also referred to as shared memory. In computer hardware, shared memory refers to a (typically large) block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system. POSIX provides a standardized API for using shared memory.
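As a concrete illustration of the POSIX shared memory model described above, here is a minimal sketch using Python's `multiprocessing.shared_memory` module, which on Linux is backed by `shm_open`/`mmap` and files under `/dev/shm`. The segment name and contents are illustrative only.

```python
from multiprocessing import shared_memory

# Create a 64-byte POSIX shared memory segment (backed by /dev/shm on Linux).
shm = shared_memory.SharedMemory(create=True, size=64)
try:
    shm.buf[:5] = b"hello"          # writer side stores data in the segment
    # A second handle attached by name stands in for another process
    # mapping the same segment.
    peer = shared_memory.SharedMemory(name=shm.name)
    data = bytes(peer.buf[:5])       # reader side sees the same bytes
    peer.close()
finally:
    shm.close()
    shm.unlink()                     # remove the segment from /dev/shm
print(data)  # b'hello'
```

In a real multi-process setup the second attach would happen in a different process that learns the segment name through some out-of-band channel.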
Shared memory windows are used to store dynamic shared data between multiple processes. The Shared Memory Window Server (shmwin) is designed to meet the scalability requirements of its protocol clients. With shmwin, memory is allocated dynamically by the server, which allocates a virtual memory area to map the shared memory files. The server maintains groups, each comprising the set of all windows that have at least one common participant. All windows inside a group share the same virtual memory space. A window ID is assigned to each shared memory window, and the server keeps track of the window-to-group-ID mapping.
The shared memory window library (libshmwin) provides the malloc and free interfaces to the shared memory.
1. What is the difference between shared memory and cache memory?
Cached represents the size of the Linux page cache, minus the memory in the swap cache.
The memory in the swap cache is represented by SwapCached.
Total page cache size = Cached + SwapCached.
Example:
[host:~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15902       9800       6101          0        269         45
-/+ buffers/cache:       9485       6417
Swap:            0          0          0
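The Cached + SwapCached relationship above can be checked directly against `/proc/meminfo` (the same source `free` reads). This is a Linux-specific sketch; field names follow the standard `/proc/meminfo` format.

```python
# Parse /proc/meminfo to reproduce the page-cache figures shown by `free`.
def meminfo_kb():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])  # values are reported in kB
    return fields

m = meminfo_kb()
page_cache_kb = m["Cached"] + m["SwapCached"]   # total page cache size
print(page_cache_kb, "kB of page cache")
```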
2. Free memory decreases continuously on a production system. Is this a memory leak scenario?
No, this is not classified as a memory leak scenario. On other IOS XR platforms a decrease in free memory over time can indicate a slow memory leak, but on the NCS6000 platform it simply indicates that cached memory utilization is increasing. This is by design, and there is no risk of a leak because memory utilization will not increase further once the buffer limit is reached.
3. Is there a way to verify the cached memory utilization?
A number of show commands expose related counters, but a dedicated OID or CLI for cached memory utilization is still required.
4. Is it possible that the slow memory leak may be due to fragmentation?
In theory, yes. The Linux kernel manages and tracks memory and does all of the associated housekeeping. The bigger concern in this situation is actually memory fragmentation, which may be related to system uptime.
When a system has been up for a long time and kernel memory has gone through many allocations and de-allocations, fragmentation is likely. However, the uptime of the router must be taken into account: fragmentation develops over several weeks of malloc/free activity, and it shows up as overall memory utilization that fluctuates while trending upwards.
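One way to observe external fragmentation on a Linux kernel is `/proc/buddyinfo`, which lists the count of free blocks at each order (order n = 2**n contiguous pages) per memory zone; a shortage of high-order blocks indicates fragmentation. A small Linux-specific sketch:

```python
# Sum free-block counts across all zones in /proc/buddyinfo, keyed by order.
# Few (or zero) blocks at high orders suggests external fragmentation.
def free_blocks_by_order():
    totals = {}
    with open("/proc/buddyinfo") as f:
        for line in f:
            counts = line.split()[4:]   # columns after "Node N, zone NAME"
            for order, count in enumerate(counts):
                totals[order] = totals.get(order, 0) + int(count)
    return totals

print(free_blocks_by_order())
```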
5. Is there a utility to monitor the per-process shmwin memory limits?
6. How can I figure out the total system memory and the application memory?
Use "show mem" CLI from XR VM.
7. Which command do I use to figure out the heap memory?
Use "malloc_dump" utility from XR VM.
8. Are there any thresholds for system memory utilization?
The default thresholds are as follows:
State 3 (Severe): 10% system memory remaining
State 4 (Critical): 5% system memory remaining
Cache is freed each time a threshold is crossed (Normal (1) -> Minor (2) -> Severe (3) -> Critical (4)).
If the system remains in critical state then the watchdog (wdsysmon) will kick in to free the cache.
9. Is this “cache” memory not reclaimable at the 10% threshold?
The shared memory utilization of processes stops growing once the processes hit their respective limits. Shared memory is not freed even if the owning processes are not actively using it; only transient/unused memory is freed. Processes retain their shared memory for reuse.
10. Explain why there is a consistent free memory decrease, and at what threshold it will be freed, or stabilize?
This trend will stop once the shared memory limit is reached; the free memory shown in the CLI output will not decrease any further at that point. How soon the limit is reached depends on router operations (e.g. mcast route joins and leaves). The per-process thresholds are defined in code to prevent low-memory conditions. The memory is pre-carved/allocated by the system, and what is being observed here is simply how shared memory is managed on this system.
11. What is the total maximum aggregate shared memory on the system?
The shmwin shared memory size is limited to a maximum of 2 GB.
The maximum size for shared memory as a whole is not limited. For example, regular POSIX shared memory is not limited, and applications may use as much as they need.
12. What is the minor threshold?
We should consider the watchdog thresholds only for router health monitoring. Use the standard CLI “show watchdog threshold memory defaults” to verify the available free memory and memory state from the watchdog perspective.
13. What type of linux kernel is used in the NCS6000 platform?
14. How is the linux kernel memory managed on the system?
15. How can I figure out the total memory that is being consumed by a given process?
Process (example):
RP/0/RP0/CPU0:NCS6k#show process memory 2852
JID     Text     Data     Stack    Dynamic       Process
------  -------  -------  -------  ------------  -------------
187     16MB     376KB    136KB    136MB         mpls_lsd
Shared Memory (example):
RP/0/RP0/CPU0:NCS6k#run df -k /dev /dev/shm /var /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
none 6061112 151268 5909844 3% /dev
none 6061112 937536 5123576 16% /dev/shm
none 65536 48 65488 1% /var
none 65536 72 65464 1% /tmp
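The per-process counters that the platform CLI summarizes are also visible in `/proc/<pid>/status` on the underlying Linux kernel. A minimal Linux-specific sketch, shown here inspecting the current process:

```python
import os

# Read the resident, virtual, and data-segment sizes of a process
# from /proc/<pid>/status (values are in kB).
def proc_mem_kb(pid):
    out = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmRSS", "VmSize", "VmData")):
                key, value = line.split(":", 1)
                out[key] = int(value.split()[0])
    return out

print(proc_mem_kb(os.getpid()))
```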
16. What is the temporary file system?
Part of the physical memory can be allocated as a filesystem partition, which can then be used to read/write files to memory similar to the way files are read/written to disk.
Files located in TMPFS are lost if the system is reloaded.
The size of a TMPFS partition can be specified when it is created and it is not possible for the partition to grow larger than the specified limit. TMPFS uses swap if/when required.
There are 4 TMPFS partitions on an NCS6000 system.
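The size and usage of a TMPFS mount, as shown by `df -k` above, can also be queried programmatically with `statvfs`. A small sketch; the `/tmp` path in the demo call is chosen only because it exists on any Linux system, and `/dev/shm` could be passed instead on the router.

```python
import os

# Report a mount point's total and free space in kB, mirroring `df -k`.
def tmpfs_usage_kb(path="/dev/shm"):
    st = os.statvfs(path)
    total_kb = st.f_blocks * st.f_frsize // 1024
    free_kb = st.f_bfree * st.f_frsize // 1024
    return total_kb, free_kb

total_kb, free_kb = tmpfs_usage_kb("/tmp")  # /tmp used here purely for portability
print(total_kb, free_kb)
```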
17. What is implied by the linux cached memory?
18. What are the resource monitor memory states?
The watchdog memory states are as follows:
NORMAL
MINOR (10% of system memory remaining)
- Cached re-claimable memory is flushed
SEVERE (8% of system memory remaining)
- Cached re-claimable memory is flushed
- Top non-OOM registered processes are killed
- Top OOM registered processes are notified, allowing them to gracefully shut down
CRITICAL (5% of system memory remaining)
- Cached re-claimable memory is flushed
Nice document !
Would you know which MIBs are most valuable for monitoring memory stats? And most importantly, have you tested the integrity of the values returned?
Thanks in Advance
/Samir
Hi Samir
Thanks for the feedback.
For memory monitoring on the NCS6k, I would suggest that you look at the CISCO-MEMORY-POOL-MIB.
ftp://ftp.cisco.com/pub/mibs/supportlists/ncs6000/ncs6k-supportlist.html
You can also use the show memory compare CLI to understand the memory utilization on the box.
Hi Osman,
That MIB was tested a while ago for devices running 5.2.4, and unfortunately it returns values not matching show commands, hence my original question.
5.2.1 had similar issues with regards to certain MIBs and inventory status reporting.
Regards
/Samir