on 09-27-2011 06:15 AM
One of the most frequently asked questions is how to monitor CPU utilization on the RP, RSP, PRP and line cards of IOS-XR based devices using SNMP tools such as MRTG.
The few easy steps described below explain which OIDs to poll and how to tell the RP, RSP and line cards apart on different platforms.
All examples below are taken from IOS-XR based devices, i.e. CRS, XR12000 and ASR9000, running XR release 4.0.1 with SNMPv2.
Step 1.
An snmpwalk of the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.2 (object "cpmCPUTotalPhysicalIndex") gives the PhysicalIndex mapping of the cards:
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 2359704
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.18 = INTEGER: 10154515
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.34 = INTEGER: 33511382
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.50 = INTEGER: 48351593
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.514 = INTEGER: 24635790
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.530 = INTEGER: 38114433
RP/0/RP0/CPU0:CRS#sh platform
Node Type PLIM State Config State
------------- ----------------- ---------------- --------------- -----------------------------------------------
0/0/CPU0 MSC 4OC192-POS/DPT IOS XR RUN PWR,NSHUT,MON
0/1/CPU0 MSC 8-10GbE IOS XR RUN PWR,NSHUT,MON
0/2/CPU0 MSC Jacket Card IOS XR RUN PWR,NSHUT,MON
0/3/CPU0 MSC-140G 14-10GbE IOS XR RUN PWR,NSHUT,MON
0/RP0/CPU0 RP(Active) N/A IOS XR RUN PWR,NSHUT,MON
0/RP1/CPU0 RP(Standby) N/A IOS XR RUN PWR,NSHUT,MON
Step 2.
It is now possible to figure out which card is which by polling the OID 1.3.6.1.2.1.47.1.1.1.1.7 (object "entPhysicalName") with the
values received in step 1:
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.2359704
SNMPv2-SMI::mib-2.47.1.1.1.1.7.2359704 = STRING: "0/0/* - cpu"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.10154515
SNMPv2-SMI::mib-2.47.1.1.1.1.7.10154515 = STRING: "0/1/* - cpu"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.33511382
SNMPv2-SMI::mib-2.47.1.1.1.1.7.33511382 = STRING: "0/2/* - cpu"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.48351593
SNMPv2-SMI::mib-2.47.1.1.1.1.7.48351593 = STRING: "0/3/* - cpu"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.24635790
SNMPv2-SMI::mib-2.47.1.1.1.1.7.24635790 = STRING: "0/RP0/* - host"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.38114433
SNMPv2-SMI::mib-2.47.1.1.1.1.7.38114433 = STRING: "0/RP1/* - host"
So, based on this example, we can match each RP and line card to its PhysicalIndex.
Step 3.
An snmpwalk of the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.7 (object "cpmCPUTotal1minRev") gives the
one-minute CPU utilization percentage for the indexes above. For RP0 and RP1, for example,
we should look at indexes 514 and 530:
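The lookup in steps 1 and 2 can also be scripted on the NMS. The following is only a minimal sketch, assuming the snmpwalk output has been captured as text in the numeric form shown above; the helper names `parse_walk` and `cpu_index_to_name` are hypothetical and not part of any tool.

```python
def parse_walk(text):
    """Map the last OID sub-identifier of each output line to its value string."""
    result = {}
    for line in text.strip().splitlines():
        oid, _, rest = line.partition(" = ")
        # rest looks like 'INTEGER: 2359704' or 'STRING: "0/0/* - cpu"'
        _, _, value = rest.partition(": ")
        index = int(oid.rsplit(".", 1)[1])
        result[index] = value.strip().strip('"')
    return result


def cpu_index_to_name(physical_index_walk, name_walk):
    """Join cpmCPUTotalPhysicalIndex (step 1) with entPhysicalName (step 2)."""
    phys = parse_walk(physical_index_walk)  # cpmCPUTotalIndex -> entPhysicalIndex
    names = parse_walk(name_walk)           # entPhysicalIndex -> entPhysicalName
    return {cpu: names.get(int(p), "unknown") for cpu, p in phys.items()}


# Two lines from each walk in the CRS example, used as sample input.
STEP1 = """
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 2359704
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.514 = INTEGER: 24635790
"""
STEP2 = """
SNMPv2-SMI::mib-2.47.1.1.1.1.7.2359704 = STRING: "0/0/* - cpu"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.24635790 = STRING: "0/RP0/* - host"
"""

mapping = cpu_index_to_name(STEP1, STEP2)
print(mapping)
```

The resulting dictionary is keyed by the short index (2, 514, ...), which is the value appended to the CPU utilization OIDs when polling.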
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.514
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.514 = Gauge32: 2
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.530
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.530 = Gauge32: 1
Corresponding data from the router:
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp0/cpu0
CPU utilization for one minute: 2%; five minutes: 3%; fifteen minutes: 3%
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp1/cpu0
CPU utilization for one minute: 1%; five minutes: 1%; fifteen minutes: 2%
For the line cards:
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 3
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.18
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.18 = Gauge32: 3
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.34
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.34 = Gauge32: 5
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.50
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.50 = Gauge32: 2
Corresponding data from the router:
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/0/cpu0
CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/1/cpu0
CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/2/cpu0
CPU utilization for one minute: 5%; five minutes: 4%; fifteen minutes: 4%
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/3/cpu0
CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 2%
Step 4.
Polling the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.8 (object "cpmCPUTotal5minRev") gives the
five-minute CPU utilization percentage for the indexes above. Again, for RP0 and RP1
we should look at indexes 514 and 530:
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.8.514
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.514 = Gauge32: 2
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.8.530
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.530 = Gauge32: 1
And corresponding data from the router:
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp0/cpu0
CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 2%
RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp1/cpu0
CPU utilization for one minute: 1%; five minutes: 1%; fifteen minutes: 1%
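For an NMS template it can help to derive the exact OID instances to poll from a card name. Below is a small sketch of that reverse lookup. The table is the CRS mapping from steps 1 and 2 above; the function name `oids_for` and the prefix-matching convention are assumptions made for illustration only.

```python
CPM_1MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.7"  # cpmCPUTotal1minRev
CPM_5MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.8"  # cpmCPUTotal5minRev

# cpmCPUTotalIndex -> entPhysicalName, as established in steps 1 and 2
CRS_CARDS = {
    2: "0/0/* - cpu",
    18: "0/1/* - cpu",
    34: "0/2/* - cpu",
    50: "0/3/* - cpu",
    514: "0/RP0/* - host",
    530: "0/RP1/* - host",
}


def oids_for(card_prefix, cards=CRS_CARDS):
    """Return (1-minute OID, 5-minute OID) for the card whose name starts with card_prefix."""
    matches = [idx for idx, name in cards.items() if name.startswith(card_prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected one card for {card_prefix!r}, found {len(matches)}")
    idx = matches[0]
    return f"{CPM_1MIN}.{idx}", f"{CPM_5MIN}.{idx}"


print(oids_for("0/RP0/"))
```

The index values differ per chassis, so the table has to be rebuilt from the walks in steps 1 and 2 for every device rather than copied between routers.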
The same approach works for XR12000 routers, as shown in the following example.
-Obtaining PhysicalIndex mapping
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.17 = INTEGER: 26932192
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.33 = INTEGER: 16733769
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.49 = INTEGER: 65129206
RP/0/1/CPU0:XR12000#sh platform
Node Type PLIM State Config State
------------------------------------------------------------------------------------------------------------
0/1/CPU0 PRP(Active) N/A IOS XR RUN PWR,NSHUT,MON
0/2/CPU0 L3LC Eng 5+ Jacket Card IOS XR RUN PWR,NSHUT,MON
0/3/CPU0 L3LC Eng 5+ Jacket Card IOS XR RUN PWR,NSHUT,MON
-Verifying which card should be used for polling
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.26932192
SNMPv2-SMI::mib-2.47.1.1.1.1.7.26932192 = STRING: "0/1/CPU0 - host"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.16733769
SNMPv2-SMI::mib-2.47.1.1.1.1.7.16733769 = STRING: "0/2/CPU0 - host"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.65129206
SNMPv2-SMI::mib-2.47.1.1.1.1.7.65129206 = STRING: "0/3/CPU0 - host"
-Verifying CPU utilization for one minute (as an example)
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.17
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.17 = Gauge32: 2
Corresponding data from the router, taken on the active PRP and therefore without the "location" keyword:
RP/0/1/CPU0:XR12000#sh proc cpu
CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 1%
And finally, an example for ASR9000:
-Obtaining PhysicalIndex mapping
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 52690955
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2082 = INTEGER: 35271015
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2098 = INTEGER: 8695772
RP/0/RSP0/CPU0:ASR9000#sh platform
Node Type State Config State
----------------------------------------------------------------------------------------------------------
0/RSP0/CPU0 A9K-RSP-4G(Active) IOS XR RUN PWR,NSHUT,MON
0/0/CPU0 A9K-4T-E IOS XR RUN PWR,NSHUT,MON
0/1/CPU0 A9K-40GE-E IOS XR RUN PWR,NSHUT,MON
-Verifying which card should be used for polling
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.52690955
SNMPv2-SMI::mib-2.47.1.1.1.1.7.52690955 = STRING: "module 0/RSP0/CPU0"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.35271015
SNMPv2-SMI::mib-2.47.1.1.1.1.7.35271015 = STRING: "module 0/0/CPU0"
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.8695772
SNMPv2-SMI::mib-2.47.1.1.1.1.7.8695772 = STRING: "module 0/1/CPU0"
-Verifying CPU utilization for one minute (as an example)
NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 3
Corresponding data from the router:
RP/0/RSP0/CPU0:ASR9000#sh proc cpu
CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%
So, the OIDs mentioned above should be used on the NMS for polling IOS-XR based devices to get the CPU utilization of the line cards and of the RP, RSP and PRP.
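Since MRTG was mentioned as the polling tool, here is a rough sketch of how a target stanza for one card could be generated. It assumes MRTG's documented "OID&OID:community@host" target form and the standard MaxBytes/Options keywords; the stanza name, host address and the function `mrtg_target` are made up for illustration, and the result should be checked against the MRTG version in use.

```python
def mrtg_target(name, host, community, cpu_index, title):
    """Build one MRTG stanza graphing 1-minute and 5-minute CPU for a card."""
    one_min = f"1.3.6.1.4.1.9.9.109.1.1.1.1.7.{cpu_index}"   # cpmCPUTotal1minRev
    five_min = f"1.3.6.1.4.1.9.9.109.1.1.1.1.8.{cpu_index}"  # cpmCPUTotal5minRev
    return "\n".join([
        # Assumption: the trailing ':::::2' selects SNMPv2c in MRTG's target syntax.
        f"Target[{name}]: {one_min}&{five_min}:{community}@{host}:::::2",
        f"MaxBytes[{name}]: 100",                      # values are percentages
        f"Options[{name}]: gauge, nopercent, growright",
        f"YLegend[{name}]: CPU %",
        f"LegendI[{name}]: 1 min",
        f"LegendO[{name}]: 5 min",
        f"Title[{name}]: {title}",
    ])


# Hypothetical host and names; 514 is the RP0 index from the CRS example.
stanza = mrtg_target("crs-rp0-cpu", "192.0.2.1", "public", 514, "CRS RP0 CPU")
print(stanza)
```

The "gauge" option matters here because both objects report a current percentage, not an ever-increasing counter.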
Hi Vadim,
Very good topic, it helped me a lot.
I have another question: how can the same be done for memory?
Thanks
The same approach can be used to monitor the used and free memory of the different CPUs.
Step 1: The snmpwalk of OID 1.3.6.1.2.1.47.1.1.1.1.7 gives us the mapping between index and line card CPU:
SNMPv2-SMI::mib-2.47.1.1.1.1.7.16203662 = STRING: "module 0/0/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.38557239 = STRING: "module 0/RSP0/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.56744940 = STRING: "module 0/RSP1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.59453759 = STRING: "module 0/1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.159516845 = STRING: "module 1/RSP1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.168504586 = STRING: "module 1/RSP0/CPU0"
Step 2: Use the OIDs below to retrieve used and free memory:
1.3.6.1.4.1.9.9.221.1.1.1.1.18 gives the used memory:
http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.18&translate=Translate&submitValue=SUBMIT&submitClicked=true
1.3.6.1.4.1.9.9.221.1.1.1.1.20 gives the free memory:
http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.20&translate=Translate&submitValue=SUBMIT&submitClicked=true
Please note ASR9k uses pool type 1 (other) as defined here:
http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.2&translate=Translate&submitValue=SUBMIT&submitClicked=true
Example for RSP0 in chassis 0 (index = 38557239):
SNMPv2-SMI::enterprises.9.9.221.1.1.1.1.18.38557239.1 = Counter64: 1439284872
SNMPv2-SMI::enterprises.9.9.221.1.1.1.1.20.38557239.1 = Counter64: 4734349312
When looking at the CLI output, we see:
RP/1/RSP0/CPU0:ASR9010#sh memory location 0/RSP0/CPU0
Mon Jun 23 21:51:14.298 SGT
node: node0_RSP0_CPU0
------------------------------------------------------------------
Physical Memory: 6144M total
Application Memory : 5887M (4515M available)
Image: 63M (bootram: 63M)
...
We see it roughly matches:
Used mem: 5887 + 63 - 4515 = 1435 MB (1.43 GB)
Free mem: 6144 - 1435 = 4709 MB (4.7 GB)
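For graphing, the two counters are usually turned into a percentage. The sketch below does that with the RSP0 values quoted above and repeats the CLI arithmetic for comparison. The function name `memory_summary` is hypothetical, and treating 1 MB as 10^6 bytes is an assumption (the CLI may well count in units of 2^20 bytes), which is one reason the figures only roughly agree.

```python
def memory_summary(used_bytes, free_bytes):
    """Summarise the used/free counters polled for one CPU and pool type."""
    total = used_bytes + free_bytes
    if total <= 0:
        raise ValueError("used + free must be positive")
    return {
        "used_mb": used_bytes / 1e6,   # assumption: 1 MB = 10**6 bytes
        "free_mb": free_bytes / 1e6,
        "used_pct": 100.0 * used_bytes / total,
    }


# Counter64 values from the RSP0 example (index 38557239, pool type 1)
summary = memory_summary(1439284872, 4734349312)
print(f"used {summary['used_mb']:.0f} MB, "
      f"free {summary['free_mb']:.0f} MB, "
      f"{summary['used_pct']:.1f}% used")

# The CLI figures quoted above, in MB
cli_used = 5887 + 63 - 4515
cli_free = 6144 - cli_used
print(cli_used, cli_free)
```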
Hi Vadim,
thanks for sharing. I have a question.
According to the OID navigator,
the CPU utilization OID enterprises.9.9.109.1.1.1.1.7 belongs to the CISCO-PROCESS-MIB, but I cannot find this MIB in the ASR9k supported list:
ftp://ftp.cisco.com/pub/mibs/supportlists/asr9000/asr9000-supportlist.html#_IOS_XR_5.1.1
Where is it?
Hi Nan,
As far as I can see, it works:
RP/0/RSP0/CPU0:kino#sh ip int brie | i Mg
MgmtEth0/RSP0/CPU0/0 10.48.32.19 Up Up
RP/0/RSP0/CPU0:kino#sh inst ac sum
Default Profile:
SDRs:
Owner
Active Packages:
disk0:asr9k-mpls-px-5.2.2
disk0:asr9k-mini-px-5.2.2
disk0:asr9k-mgbl-px-5.2.2
disk0:asr9k-mcast-px-5.2.2
disk0:asr9k-k9sec-px-5.2.2
disk0:asr9k-fpd-px-5.2.2
RP/0/RSP0/CPU0:kino#sh platform
Node Type State Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0 ASR9001-RP(Active) IOS XR RUN PWR,NSHUT,MON
0/0/CPU0 ASR9001-LC IOS XR RUN PWR,NSHUT,MON
0/0/0 A9K-MPA-20X1GE OK PWR,NSHUT,MON
RP/0/RSP0/CPU0:kino#sh proc cpu
CPU utilization for one minute: 1%; five minutes: 2%; fifteen minutes: 2%
VZHOVTAN-M-16M9:$ snmpwalk -v2c -c cisco 10.48.32.19 1.3.6.1.4.1.9.9.109.1.1.1.1.7
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 1
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2082 = Gauge32: 2 <<<
ASR-9000 supports cpmCPUTotalTable defined in CISCO-PROCESS-MIB.
cpmCPUTotalIndex .1.3.6.1.4.1.9.9.109.1.1.1.1.1
cpmCPUTotalPhysicalIndex .1.3.6.1.4.1.9.9.109.1.1.1.1.2
cpmCPUTotal1minRev .1.3.6.1.4.1.9.9.109.1.1.1.1.7
cpmCPUTotal5minRev .1.3.6.1.4.1.9.9.109.1.1.1.1.8
I checked the following example and it works
snmpwalk -c public -v 2c <ip_addr> .1.3.6.1.4.1.9.9.109.1.1.1.1.7
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2 = Gauge32: 1 percent <--------------------
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.18 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2082 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2098 = Gauge32: 1 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2114 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2130 = Gauge32: 0 percent
Now you need to know which line card is associated with each index.
Look up the entity whose entPhysicalIndex equals the cpmCPUTotalPhysicalIndex value.
cpmCPUTotalPhysicalIndex .1.3.6.1.4.1.9.9.109.1.1.1.1.2
This will give you:
snmpwalk -c public -v 2c <ip_addr> .1.3.6.1.4.1.9.9.109.1.1.1.1.2
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2 = INTEGER: 52690955 <----------------
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.18 = INTEGER: 26932192
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2082 = INTEGER: 35271015
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2098 = INTEGER: 8695772
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2114 = INTEGER: 36631989
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2130 = INTEGER: 31344434
Now, you look up these indexes with entPhysicalName:
entPhysicalName 1.3.6.1.2.1.47.1.1.1.1.7
snmpwalk -c public -v 2c <ip_addr> ENTITY-MIB::entPhysicalName.52690955
ENTITY-MIB::entPhysicalName.52690955 = STRING: module 0/RSP0/CPU0
wbr
/vadim
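The thread shows snmpwalk output in two forms: numeric ("SNMPv2-SMI::enterprises...") and symbolic with the MIB loaded ("CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2 = Gauge32: 1 percent"). A script that post-processes the output has to cope with both. Below is a minimal sketch of a tolerant line parser; the name `parse_line` and the regular expression are illustrative assumptions, and enumerated INTEGER values such as "up(1)" are deliberately not handled.

```python
import re

# '<oid> = <TYPE>: <value>', tolerating a missing space after the colon
LINE = re.compile(r"^(?P<oid>\S+)\s*=\s*(?P<type>\w+):\s*(?P<value>.*?)\s*$")

NUMERIC_TYPES = ("INTEGER", "Gauge32", "Counter64")


def parse_line(line):
    """Return (index, value) for one snmpwalk output line, or None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    index = int(m.group("oid").rsplit(".", 1)[1])  # last sub-identifier
    value = m.group("value").strip('"')
    if m.group("type") in NUMERIC_TYPES:
        value = int(value.split()[0])  # drop a trailing unit such as 'percent'
    return index, value


samples = [
    "CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2 = Gauge32: 1 percent",
    "SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2082 = INTEGER:35271015",
    'SNMPv2-SMI::mib-2.47.1.1.1.1.7.52690955 = STRING: "module 0/RSP0/CPU0"',
    "ENTITY-MIB::entPhysicalName.52690955 = STRING: module 0/RSP0/CPU0",
]
for s in samples:
    print(parse_line(s))
```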
Is it also possible to send out SNMP traps for CPU load and memory utilization? I can not find the commands for this.
Hi Tom,
Not directly, because CPU and memory utilisation are not events. You can use the "performance-mgmt" feature in IOS XR to trigger a syslog message when the CPU or memory utilisation exceeds a certain threshold. Then use EEM/Tcl with that syslog message as the trigger to generate an event_register_snmp_notification.
Aleksandar
Thank you for your quick reply. This sounds like a good alternative, but I will need time to figure out how to implement it, as my experience with IOS-XR is still very limited.
If I'm not mistaken, normal IOS does support these SNMP traps. Does IOS treat this differently from IOS-XR?
Hi Tom,
Last time I checked on IOS, EEM was still required for this purpose, because to turn CPU or memory utilisation into an event one needs to set the threshold somehow. Someone wants a notification at 80%, someone else at 90%, etc. The difference between IOS and IOS XR is that IOS supports EEM applets, which simplifies the final configuration.
Hope this helps,
Aleksandar
You are probably right, although in my search today I did encounter the following command which appears to set these thresholds:
Router(config)# process cpu threshold type total rising 80 interval 5 falling 20 interval 5
I stand corrected, we have indeed made that connection between threshold configuration in IOS and SNMP trap. In IOS XR the "performance-mgmt thresholds" currently triggers a syslog message, but not an SNMP trap.
Aleksandar
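Until IOS XR sends such a trap itself, the rising/falling logic can also be applied on the NMS to the values polled from cpmCPUTotal1minRev. The sketch below is only an illustration of that idea, using the 80/20 levels from the IOS command above. The function name `threshold_events` and the exact comparison rules (>= for rising, < for falling) are assumptions, not a description of how IOS implements its thresholds.

```python
def threshold_events(samples, rising=80, falling=20):
    """Return (position, 'rising' | 'falling') events with hysteresis between the two levels."""
    above = False
    events = []
    for i, value in enumerate(samples):
        if not above and value >= rising:
            above = True                      # crossed the rising threshold
            events.append((i, "rising"))
        elif above and value < falling:
            above = False                     # dropped back below the falling threshold
            events.append((i, "falling"))
    return events


# Hypothetical series of polled 1-minute CPU percentages
polled = [10, 85, 90, 50, 85, 15, 30, 95]
print(threshold_events(polled))
```

Keeping the falling level well below the rising level avoids a burst of alternating notifications when the CPU hovers around a single threshold.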
I wanted to add that, as part of XR usability, we are tracking CSCut68455.
This enhancement will send triggers when the memory consumption of a process is reaching the rLIMIT, i.e. the maximum amount of memory a process is allowed to use for data (not stack or text).
xander
I have a question about memory monitoring for the CRS family. A customer asked to exclude the image and reserved memory areas from memory monitoring, because they are always going to be critical and cause false memory out-of-range alarms. The device model in question is ciscoCRS8S (.1.3.6.1.4.1.9.1.643). Can Cisco confirm that it is OK to exclude the image and reserved areas for CRS devices?
Yes. Unlike IOS, where the memory is not indexed to the same extent as in XR, you don't have to monitor reserved and image memory at all.
These are just static reserves, and as you indicated both always report critical due to the "high use", which is what we want to see anyway.
If, for instance, image memory couldn't hold the image, the image wouldn't even install.
This is just an artifact of how the process MIB was organized; XR perhaps provides too much detail in it.
You probably just want to look at available memory. Even there, unlike in IOS, it is not necessarily bad to see usage go above x%: XR is a Unix-based OS (like Mac OS X), so a process can free memory but keep it ready to use again if it needs to. That freed memory, while still showing as "used" or unavailable, can be grabbed by another process that needs it.
I recognize that the memory management could be simplified; the problem is that the process MIB follows a monolithic approach, so the detail one can provide is limited.
Let me see what to do.
xander
Hello,
Worked perfectly on both ASR9001 and ASR9010, just had to play with templates. Thank you very much!
Hi Farid, great, thanks for letting us know. If you can and want to, maybe you can share your template here on the forums for others to leverage. It seems to be a topic of much discussion :)
cheers!
xander