Monitoring CPU utilization on IOS-XR based platforms using SNMP tools

Vadim Zhovtanyuk · ‎09-27-2011

One of the frequently asked questions is how to monitor CPU utilization on RP, RSP, PRP and Line Cards on IOS-XR based devices using SNMP tools, like MRTG.

The few easy steps described below explain which OIDs have to be polled and how to differentiate RP, RSP and Line Cards on different platforms.

All examples below are taken from IOS-XR based devices (CRS, XR12000 and ASR9000) running XR release 4.0.1 with SNMPv2c.

Step 1.

An snmpwalk of the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.2 (object "cpmCPUTotalPhysicalIndex") gives the PhysicalIndex mapping of the cards:

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 2359704

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.18 = INTEGER: 10154515

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.34 = INTEGER: 33511382

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.50 = INTEGER: 48351593

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.514 = INTEGER: 24635790

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.530 = INTEGER: 38114433

RP/0/RP0/CPU0:CRS#sh platform

Node Type PLIM State Config State

------------- ----------------- ---------------- --------------- -----------------------------------------------

0/0/CPU0 MSC 4OC192-POS/DPT IOS XR RUN PWR,NSHUT,MON

0/1/CPU0 MSC 8-10GbE IOS XR RUN PWR,NSHUT,MON

0/2/CPU0 MSC Jacket Card IOS XR RUN PWR,NSHUT,MON

0/3/CPU0 MSC-140G 14-10GbE IOS XR RUN PWR,NSHUT,MON

0/RP0/CPU0 RP(Active) N/A IOS XR RUN PWR,NSHUT,MON

0/RP1/CPU0 RP(Standby) N/A IOS XR RUN PWR,NSHUT,MON
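
If the walk is collected by a script rather than read by eye, the output above can be turned into a lookup table. The following is only a minimal sketch: it assumes the numeric output format shown in this post, and the helper name parse_physical_index is made up for illustration.

```python
import re

# Sample lines copied from the CRS snmpwalk output above.
WALK_OUTPUT = """\
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 2359704
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.18 = INTEGER: 10154515
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.34 = INTEGER: 33511382
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.50 = INTEGER: 48351593
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.514 = INTEGER: 24635790
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.530 = INTEGER: 38114433
"""

# Captures the last OID component (the table row index) and the INTEGER value.
LINE_RE = re.compile(r"\.(\d+)\s*=\s*INTEGER:\s*(\d+)\s*$")


def parse_physical_index(walk_text):
    """Return {cpmCPUTotalIndex: entPhysicalIndex} (hypothetical helper)."""
    mapping = {}
    for line in walk_text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            mapping[int(match.group(1))] = int(match.group(2))
    return mapping


cpu_to_physical = parse_physical_index(WALK_OUTPUT)
for cpu_index, physical_index in sorted(cpu_to_physical.items()):
    print(cpu_index, physical_index)
```

The keys (2, 18, ... 530) are the row indexes that are appended to the CPU utilization OIDs later on; the values are the entPhysicalIndex numbers used in step 2.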

Step 2.

Now it is possible to figure out which card is which by polling the OID 1.3.6.1.2.1.47.1.1.1.1.7 (object "entPhysicalName") with the values received in step 1:

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.2359704

SNMPv2-SMI::mib-2.47.1.1.1.1.7.2359704 = STRING: "0/0/* - cpu"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.10154515

SNMPv2-SMI::mib-2.47.1.1.1.1.7.10154515 = STRING: "0/1/* - cpu"

NMS2% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.33511382

SNMPv2-SMI::mib-2.47.1.1.1.1.7.33511382 = STRING: "0/2/* - cpu"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.48351593

SNMPv2-SMI::mib-2.47.1.1.1.1.7.48351593 = STRING: "0/3/* - cpu"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.24635790

SNMPv2-SMI::mib-2.47.1.1.1.1.7.24635790 = STRING: "0/RP0/* - host"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.38114433

SNMPv2-SMI::mib-2.47.1.1.1.1.7.38114433 = STRING: "0/RP1/* - host"

So, following the given example, we can match each RP and/or Line Card to its PhysicalIndex.
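
The two lookups from steps 1 and 2 can be joined so that every row index gets a readable card name. This is a sketch using the CRS values from this post; the function names label_cpu_indexes and find_cpu_index are illustrative, not part of any tool.

```python
# Values copied from the CRS walks in steps 1 and 2.
CPU_TO_PHYSICAL = {
    2: 2359704, 18: 10154515, 34: 33511382,
    50: 48351593, 514: 24635790, 530: 38114433,
}
PHYSICAL_TO_NAME = {
    2359704: "0/0/* - cpu",
    10154515: "0/1/* - cpu",
    33511382: "0/2/* - cpu",
    48351593: "0/3/* - cpu",
    24635790: "0/RP0/* - host",
    38114433: "0/RP1/* - host",
}


def label_cpu_indexes(cpu_to_physical, physical_to_name):
    """Return {cpmCPUTotalIndex: entPhysicalName}; unknown entities get a placeholder."""
    labels = {}
    for cpu_index, physical_index in cpu_to_physical.items():
        labels[cpu_index] = physical_to_name.get(
            physical_index, "unknown (%d)" % physical_index
        )
    return labels


def find_cpu_index(labels, node_prefix):
    """Return the lowest row index whose name starts with node_prefix, or None."""
    for cpu_index, name in sorted(labels.items()):
        if name.startswith(node_prefix):
            return cpu_index
    return None


labels = label_cpu_indexes(CPU_TO_PHYSICAL, PHYSICAL_TO_NAME)
print(find_cpu_index(labels, "0/RP0/"))  # row index to poll for RP0
```

Note that the name strings differ per platform (compare the XR12000 and ASR9000 examples further down), so any prefix matching has to be adapted to what the device actually returns.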

Step 3.

An snmpwalk of the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.7 (object "cpmCPUTotal1minRev") gives the one-minute CPU utilization percentage for the indexes above. For example, for RP0 and RP1 we should look at indexes 514 and 530:

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.514

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.514 = Gauge32: 2

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.530

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.530 = Gauge32: 1

Corresponding data from the router:

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp0/cpu0

CPU utilization for one minute: 2%; five minutes: 3%; fifteen minutes: 3%

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp1/cpu0

CPU utilization for one minute: 1%; five minutes: 1%; fifteen minutes: 2%

For the line cards:

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.2

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 3

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.18

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.18 = Gauge32: 3

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.34

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.34 = Gauge32: 5

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.50

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.50 = Gauge32: 2

Corresponding data from the router:

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/0/cpu0

CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/1/cpu0

CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/2/cpu0

CPU utilization for one minute: 5%; five minutes: 4%; fifteen minutes: 4%

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/3/cpu0

CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 2%
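
To get a per-card overview instead of reading each value separately, the one-minute readings above can be combined with the names from step 2. A minimal sketch, assuming the output format shown in this post; parse_gauges and cpu_report are hypothetical helper names.

```python
import re

# One-minute readings copied from the step 3 output above (CRS example).
ONE_MIN_WALK = """\
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.514 = Gauge32: 2
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.530 = Gauge32: 1
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 3
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.18 = Gauge32: 3
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.34 = Gauge32: 5
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.50 = Gauge32: 2
"""

# Row index -> card name, as established in steps 1 and 2.
CARD_NAMES = {
    2: "0/0/* - cpu", 18: "0/1/* - cpu", 34: "0/2/* - cpu",
    50: "0/3/* - cpu", 514: "0/RP0/* - host", 530: "0/RP1/* - host",
}

GAUGE_RE = re.compile(r"\.(\d+)\s*=\s*Gauge32:\s*(\d+)")


def parse_gauges(walk_text):
    """Return {row index: percent} from Gauge32 lines."""
    values = {}
    for line in walk_text.splitlines():
        match = GAUGE_RE.search(line)
        if match:
            values[int(match.group(1))] = int(match.group(2))
    return values


def cpu_report(values, names):
    """Return (name, percent) rows, busiest card first, ties sorted by name."""
    rows = [(names.get(i, "index %d" % i), pct) for i, pct in values.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


report = cpu_report(parse_gauges(ONE_MIN_WALK), CARD_NAMES)
for name, pct in report:
    print("%-16s %3d%%" % (name, pct))
```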

Step 4.

Polling the OID 1.3.6.1.4.1.9.9.109.1.1.1.1.8 (object "cpmCPUTotal5minRev") gives the five-minute CPU utilization percentage for the indexes above. Again, for RP0 and RP1 we should look at indexes 514 and 530:

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.8.514

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.514 = Gauge32: 2

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.8.530

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.530 = Gauge32: 1

And corresponding data from the router:

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp0/cpu0

CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 2%

RP/0/RP0/CPU0:CRS#sh proc cpu loc 0/rp1/cpu0

CPU utilization for one minute: 1%; five minutes: 1%; fifteen minutes: 1%
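
For an NMS, what finally has to be configured is the full instance OID per card: the column OID from steps 3 and 4 with the row index from step 1 appended. The sketch below only builds those strings; poll_oids is an illustrative name and the card list is the CRS example from this post.

```python
# Column OIDs used in steps 3 and 4.
CPU_1MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.7"  # cpmCPUTotal1minRev
CPU_5MIN = "1.3.6.1.4.1.9.9.109.1.1.1.1.8"  # cpmCPUTotal5minRev

# Row index per card, as found in steps 1 and 2 (CRS example).
CARDS = {"0/RP0": 514, "0/RP1": 530, "0/0": 2, "0/1": 18, "0/2": 34, "0/3": 50}


def poll_oids(cpu_index):
    """Return the 1-minute and 5-minute instance OIDs for one CPU row."""
    return {
        "1min": "%s.%d" % (CPU_1MIN, cpu_index),
        "5min": "%s.%d" % (CPU_5MIN, cpu_index),
    }


for card, cpu_index in CARDS.items():
    oids = poll_oids(cpu_index)
    print(card, oids["1min"], oids["5min"])
```

Because the instance OID is exact, these strings can be used with a plain SNMP get instead of a walk.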

The same approach works for XR12000 routers, as shown in the following example:

-Obtaining PhysicalIndex mapping

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.17 = INTEGER: 26932192

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.33 = INTEGER: 16733769

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.49 = INTEGER: 65129206

RP/0/1/CPU0:XR12000#sh platform

Node Type PLIM State Config State

------------------------------------------------------------------------------------------------------------

0/1/CPU0 PRP(Active) N/A IOS XR RUN PWR,NSHUT,MON

0/2/CPU0 L3LC Eng 5+ Jacket Card IOS XR RUN PWR,NSHUT,MON

0/3/CPU0 L3LC Eng 5+ Jacket Card IOS XR RUN PWR,NSHUT,MON

-Verifying which card should be used for polling

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.26932192

SNMPv2-SMI::mib-2.47.1.1.1.1.7.26932192 = STRING: "0/1/CPU0 - host"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.16733769

SNMPv2-SMI::mib-2.47.1.1.1.1.7.16733769 = STRING: "0/2/CPU0 - host"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.65129206

SNMPv2-SMI::mib-2.47.1.1.1.1.7.65129206 = STRING: "0/3/CPU0 - host"

-Verifying CPU utilization for one minute (as an example)

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.17

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.17 = Gauge32: 2

Corresponding data from the router, taken on the Active PRP and therefore without the "location" keyword:

RP/0/1/CPU0:XR12000#sh proc cpu

CPU utilization for one minute: 2%; five minutes: 2%; fifteen minutes: 1%

And finally, an example for ASR9000:

-Obtaining PhysicalIndex mapping

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.2

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2 = INTEGER: 52690955

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2082 = INTEGER: 35271015

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.2.2098 = INTEGER: 8695772

RP/0/RSP0/CPU0:ASR9000#sh platform

Node Type State Config State

----------------------------------------------------------------------------------------------------------

0/RSP0/CPU0 A9K-RSP-4G(Active) IOS XR RUN PWR,NSHUT,MON

0/0/CPU0 A9K-4T-E IOS XR RUN PWR,NSHUT,MON

0/1/CPU0 A9K-40GE-E IOS XR RUN PWR,NSHUT,MON

-Verifying which card should be used for polling

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.52690955

SNMPv2-SMI::mib-2.47.1.1.1.1.7.52690955 = STRING: "module 0/RSP0/CPU0"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.35271015

SNMPv2-SMI::mib-2.47.1.1.1.1.7.35271015 = STRING: "module 0/0/CPU0"

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.2.1.47.1.1.1.1.7.8695772

SNMPv2-SMI::mib-2.47.1.1.1.1.7.8695772 = STRING: "module 0/1/CPU0"

-Verifying CPU utilization for one minute (as an example)

NMS% snmpwalk -v2c -c <community_name> <router's IP address> 1.3.6.1.4.1.9.9.109.1.1.1.1.7.2

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 3

Corresponding data from the router:

RP/0/RSP0/CPU0:ASR9000#sh proc cpu

CPU utilization for one minute: 3%; five minutes: 3%; fifteen minutes: 3%

So, the OIDs mentioned above should be used on the NMS to poll IOS-XR based devices for CPU utilization on the different Line Cards and on the RP, RSP and PRP.

luiz.polli · ‎02-01-2013

Hi Vadim,

Very good topic, it helped me a lot.

I have another question: how do I do the same for memory?

Thanks

Fabrice Ducomble · ‎06-25-2014

The same approach can be used to monitor used & free memory of the different CPUs.

Step 1 : The snmpwalk of OID 1.3.6.1.2.1.47.1.1.1.1.7 gives us the mapping between index and line card CPUs :

SNMPv2-SMI::mib-2.47.1.1.1.1.7.16203662 = STRING: "module 0/0/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.38557239 = STRING: "module 0/RSP0/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.56744940 = STRING: "module 0/RSP1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.59453759 = STRING: "module 0/1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.159516845 = STRING: "module 1/RSP1/CPU0"
SNMPv2-SMI::mib-2.47.1.1.1.1.7.168504586 = STRING: "module 1/RSP0/CPU0"

Step 2 : use the OIDs below to retrieve used & free memory :

1.3.6.1.4.1.9.9.221.1.1.1.1.18 gives the used memory :

http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.18&translate=Translate&submitValue=SUBMIT&submitClicked=true

1.3.6.1.4.1.9.9.221.1.1.1.1.20 gives free memory :

http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.20&translate=Translate&submitValue=SUBMIT&submitClicked=true

Please note ASR9k uses pool type 1 (other) as defined here :

http://tools.cisco.com/Support/SNMP/do/BrowseOID.do?objectInput=1.3.6.1.4.1.9.9.221.1.1.1.1.2&translate=Translate&submitValue=SUBMIT&submitClicked=true


Example for RSP0 in chassis 0 (index = 38557239):

SNMPv2-SMI::enterprises.9.9.221.1.1.1.1.18.38557239.1 = Counter64: 1439284872

SNMPv2-SMI::enterprises.9.9.221.1.1.1.1.20.38557239.1 = Counter64: 4734349312


When looking at CLI output, we see :

RP/1/RSP0/CPU0:ASR9010#sh memory location 0/RSP0/CPU0
Mon Jun 23 21:51:14.298 SGT

node:      node0_RSP0_CPU0
------------------------------------------------------------------
Physical Memory: 6144M total
  Application Memory : 5887M (4515M available)
  Image: 63M (bootram: 63M)
...

We see it roughly matches :

Used mem : 5887 + 63 - 4515 = 1435 MB (1.43GB)
Free mem : 6144 - 1435 = 4709 MB (4.7GB)
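
The comparison above can also be done in a script. The sketch below assumes the two Counter64 values are byte counts (the unit is not stated in this post) and converts them both to decimal megabytes and to MiB; the helper names are made up for illustration.

```python
# Counter64 values copied from the RSP0 example above (assumed to be bytes).
USED_BYTES = 1439284872  # ...221.1.1.1.1.18.38557239.1
FREE_BYTES = 4734349312  # ...221.1.1.1.1.20.38557239.1

MIB = 1024 * 1024


def to_mib(byte_count):
    """Bytes to binary megabytes (MiB)."""
    return byte_count / MIB


def to_mb(byte_count):
    """Bytes to decimal megabytes (10^6 bytes)."""
    return byte_count / 1_000_000


def used_percent(used, free):
    """Share of the pool in use, in percent of used + free."""
    return 100.0 * used / (used + free)


print("used: %.1f MB / %.1f MiB" % (to_mb(USED_BYTES), to_mib(USED_BYTES)))
print("free: %.1f MB / %.1f MiB" % (to_mb(FREE_BYTES), to_mib(FREE_BYTES)))
print("used share: %.1f%%" % used_percent(USED_BYTES, FREE_BYTES))
```

Under that assumption, the free counter comes out at about 4515 MiB, the same figure as the "4515M available" line in the CLI output, and the used counter at about 1372 MiB, which is 5887 - 4515. This is offered as an observation on the numbers rather than a statement about how the router computes them.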

Nan Bai · ‎12-11-2014

Hi Vadim,

Thanks for sharing. I have a question.

According to the OID navigator, the CPU utilization OID enterprises.9.9.109.1.1.1.1.7 belongs to CISCO-PROCESS-MIB, but I cannot find this MIB in the A9K supported list:

ftp://ftp.cisco.com/pub/mibs/supportlists/asr9000/asr9000-supportlist.html#_IOS_XR_5.1.1

Where is it?

Vadim Zhovtanyuk · ‎12-13-2014

Hi Nan,

As far as I can see, it works:

RP/0/RSP0/CPU0:kino#sh ip int brie | i Mg
MgmtEth0/RSP0/CPU0/0 10.48.32.19 Up Up

RP/0/RSP0/CPU0:kino#sh inst ac sum
Default Profile:
SDRs:
    Owner
Active Packages:
    disk0:asr9k-mpls-px-5.2.2
    disk0:asr9k-mini-px-5.2.2
    disk0:asr9k-mgbl-px-5.2.2
    disk0:asr9k-mcast-px-5.2.2
    disk0:asr9k-k9sec-px-5.2.2
    disk0:asr9k-fpd-px-5.2.2

RP/0/RSP0/CPU0:kino#sh platform
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0     ASR9001-RP(Active)        IOS XR RUN       PWR,NSHUT,MON
0/0/CPU0        ASR9001-LC                IOS XR RUN       PWR,NSHUT,MON
0/0/0           A9K-MPA-20X1GE            OK               PWR,NSHUT,MON

RP/0/RSP0/CPU0:kino#sh proc cpu
CPU utilization for one minute: 1%; five minutes: 2%; fifteen minutes: 2%

VZHOVTAN-M-16M9:$ snmpwalk -v2c -c cisco 10.48.32.19 1.3.6.1.4.1.9.9.109.1.1.1.1.7
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2 = Gauge32: 1
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2082 = Gauge32: 2 <<<

ASR-9000 supports cpmCPUTotalTable defined in CISCO-PROCESS-MIB.

cpmCPUTotalIndex          .1.3.6.1.4.1.9.9.109.1.1.1.1.1
cpmCPUTotalPhysicalIndex .1.3.6.1.4.1.9.9.109.1.1.1.1.2
cpmCPUTotal1minRev        .1.3.6.1.4.1.9.9.109.1.1.1.1.7
cpmCPUTotal5minRev        .1.3.6.1.4.1.9.9.109.1.1.1.1.8

I checked the following example and it works

snmpwalk -c public -v 2c <ip_addr> .1.3.6.1.4.1.9.9.109.1.1.1.1.7
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2 = Gauge32: 1 percent <--------------------
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.18 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2082 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2098 = Gauge32: 1 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2114 = Gauge32: 0 percent
CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2130 = Gauge32: 0 percent

Now you need to know which line card is associated with each index.
To do that, look up the entity whose entPhysicalIndex equals the cpmCPUTotalPhysicalIndex value.

cpmCPUTotalPhysicalIndex .1.3.6.1.4.1.9.9.109.1.1.1.1.2

This will give you:
snmpwalk -c public -v 2c <ip_addr> .1.3.6.1.4.1.9.9.109.1.1.1.1.2
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2 = INTEGER: 52690955 <----------------
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.18 = INTEGER: 26932192
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2082 = INTEGER: 35271015
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2098 = INTEGER: 8695772
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2114 = INTEGER: 36631989
CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2130 = INTEGER: 31344434

Now, you look up these indexes with entPhysicalName:

entPhysicalName 1.3.6.1.2.1.47.1.1.1.1.7

snmpwalk -c public -v 2c <ip_addr> ENTITY-MIB::entPhysicalName.52690955
ENTITY-MIB::entPhysicalName.52690955 = STRING: module 0/RSP0/CPU0
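
The outputs in this reply appear in two forms: numeric (SNMPv2-SMI::enterprises...) and symbolic (CISCO-PROCESS-MIB::...), the latter with a trailing "percent". A script that reads snmpwalk output has to cope with both. The following is a minimal sketch that handles integer-valued lines only; parse_walk_line is a hypothetical helper, not part of any SNMP toolkit.

```python
import re

# <object>.<row index> = <type>: <integer>, with anything after the number ignored.
LINE_RE = re.compile(
    r"^(?P<obj>\S+)\.(?P<index>\d+)\s*=\s*(?P<type>\w+):\s*(?P<value>\d+)"
)


def parse_walk_line(line):
    """Return (object, row index, type, value), or None if the line does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("obj"),
        int(match.group("index")),
        match.group("type"),
        int(match.group("value")),
    )


# Lines copied from the outputs in this reply.
samples = [
    "SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.7.2082 = Gauge32: 2 <<<",
    "CISCO-PROCESS-MIB::cpmCPUTotal1minRev.2098 = Gauge32: 1 percent",
    "CISCO-PROCESS-MIB::cpmCPUTotalPhysicalIndex.2 = INTEGER: 52690955",
]
for sample in samples:
    print(parse_walk_line(sample))
```

String-valued lines such as the entPhysicalName result are deliberately not matched and return None; they would need a separate pattern.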

wbr

/vadim

Tom Marcoen · ‎06-05-2015

Is it also possible to send out SNMP traps for CPU load and memory utilization? I cannot find the commands for this.

Aleksandar Vidakovic · ‎06-05-2015

Hi Tom,

Not directly, because CPU and memory utilisation are not events. You can use the "performance-mgmt" feature in IOS XR to trigger a syslog message when the CPU or memory utilisation exceeds a certain threshold. Then use EEM/Tcl with that syslog message as the trigger and generate an event_register_snmp_notification.

Aleksandar

Tom Marcoen · ‎06-05-2015

Thank you for your quick reply. This sounds like a good alternative; however, I will need time to figure out how to implement this, as my experience with IOS-XR is still very limited.

If I'm not mistaken, normal IOS does support these SNMP traps. Does IOS treat this differently than IOS-XR?

Aleksandar Vidakovic · ‎06-05-2015

Hi Tom,

Last time I checked on IOS, EEM was still required for this purpose, because to make CPU or memory utilisation an event, one needs to set the threshold somehow. Some want a notification at 80%, some at 90%, etc. The difference between IOS and IOS XR is that IOS supports EEM applets, which simplifies the final configuration.

Hope this helps,

Aleksandar

Tom Marcoen · ‎06-05-2015

You are probably right, although in my search today I did encounter the following command which appears to set these thresholds:

Router(config)# process cpu threshold type total rising 80 interval 5 falling 20 interval 5

Source: http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/bsm/configuration/15-mt/bsm-15-mt-book/bsm-cpu-thresh-notif.html
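
As an illustration of what a rising/falling threshold pair means, the same idea can be applied on the NMS side to values that are already being polled. This is only a sketch of the hysteresis logic using the 80/20 values from the command above; it ignores the interval parameters and does not claim to reproduce how IOS evaluates the threshold. The function name threshold_events is made up.

```python
def threshold_events(samples, rising=80, falling=20):
    """Return (position, kind) events for a list of CPU percentages.

    A "rising" event fires when a sample reaches the rising threshold while
    the state is normal; a "falling" event fires when a sample drops to the
    falling threshold while the state is high. Values in between change nothing.
    """
    events = []
    high = False
    for position, value in enumerate(samples):
        if not high and value >= rising:
            high = True
            events.append((position, "rising"))
        elif high and value <= falling:
            high = False
            events.append((position, "falling"))
    return events


# Hypothetical polled one-minute CPU values, oldest first.
polled = [10, 85, 90, 50, 85, 15, 30, 95]
print(threshold_events(polled))
```

The gap between the two thresholds is what keeps a value hovering around 80% from generating a notification on every poll.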

Aleksandar Vidakovic · ‎06-05-2015

I stand corrected, we have indeed made that connection between the threshold configuration in IOS and the SNMP trap. In IOS XR the "performance-mgmt thresholds" feature currently triggers a syslog message, but not an SNMP trap.

Aleksandar

xthuijs · ‎06-05-2015

I wanted to add that, as part of the XR usability effort, we are tracking CSCut68455.

This enhancement will send triggers when the memory consumption of a process is reaching the rLIMIT. This is the maximum amount of memory that a process is allowed to use for data (not stack or text).

xander

nmccontent · ‎11-16-2015

I have a question about memory monitoring for the CRS family. A customer asked to exclude the image and reserved memory areas from memory monitoring, as they are always going to be critical and cause false memory out-of-range alarms. The device model in question is ciscoCRS8S (.1.3.6.1.4.1.9.1.643). Can Cisco confirm whether it is OK to exclude the image and reserved areas for CRS devices?

xthuijs · ‎11-16-2015

Yeah, unlike IOS, where the memory is not indexed (to the same extent as XR), you don't have to monitor reserved and image memory at all.

These are just static reserves, and as you indicated both always report critical due to the "high use", which is what we want to see anyway.

If, for instance, the image memory couldn't hold it, the image itself wouldn't even install.

This is just an artifact of how the process MIB was organized, and XR provided maybe too much detail on it.

You probably just want to look at available memory, and even there it is not necessarily bad to see usage going above x % as it would be on IOS. Since XR is a Unix-based OS (like Mac OS X), a process can hold memory, free it, but keep it ready to use again if it needs to. The freed memory, while still showing as "used" or unavailable, is usable by another process if it needs it.

I recognize and realize the memory management could be a bit more simplified; the problem is that the process MIB follows a monolithic approach, hence the detail one can provide is limited.

let me see what to do.

xander

Farid Akhundov · ‎02-27-2016

Hello,

Worked perfectly on both ASR9001 and ASR9010, just had to play with templates. Thank you very much!

xthuijs · ‎02-28-2016

Hi Farid, great, thanks for letting us know. If you can and want to, maybe you can share your template here on the forums for others to leverage. It seems to be a topic of much discussion :)

cheers!

xander