Troubleshooting high CPU under interrupts on 7600 and 6500 boxes using the "debug netdr" tool

Dejan Puhar · 11-24-2010

When dealing with high CPU on the box where the CPU cycles are spent processing interrupts, the question is always what kind of packets are being sent to the CPU. In this case we need a way to look into these packets to see why they are being sent to the CPU to be processed.

On these platforms there is a great tool for troubleshooting high CPU under interrupts: “debug netdr”.

This tool can be used even on a very busy box running at 100% CPU, as long as you can still access it.

This tool captures the packets going to and from the CPU in a circular buffer (which can store 4K packets), and it does not cause any additional overhead on the system.

There are too many possible reasons for high CPU under interrupts to cover here, so I will just give you a short example of how this tool can be used and what kind of information we can get from it.

First of all, once we have determined that the CPU is high under interrupts, we can check the output of “show ibc”, which gives us the inband channel (IBC) statistics. The main thing we can see here is the number of packets going to and leaving the CPU.

Example:

7600#show ibc

Interface information:

Interface IBC0/0(idb 0x1CC84028)

5 minute rx rate 14000 bits/sec, 21 packets/sec

5 minute tx rate 91000 bits/sec, 82 packets/sec

…..

From here we can tell whether there are too many packets coming in but not so many going out, or whether there is a ratio such as 1:1 or 1:2. This can give us some indication of what is happening. If the ratio is close to 1:1, most likely those are packets that need to be process-switched for some reason. If the ratio is 1:2, then maybe we are doing some fragmentation, and so on.
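The ratio check above is simple arithmetic on the two "packets/sec" figures. The sketch below is a hypothetical Python helper (not a Cisco tool) that pulls those figures out of saved "show ibc" text with a regular expression. It assumes the line wording shown in the example above, and it assumes rx counts packets arriving at the CPU while tx counts packets the CPU sends.

```python
import re

# The two rate lines from the "show ibc" example above; the rest is omitted.
SHOW_IBC = """
5 minute rx rate 14000 bits/sec, 21 packets/sec
5 minute tx rate 91000 bits/sec, 82 packets/sec
"""

# Assumes the exact wording shown above; other IOS releases may word it differently.
RATE_RE = re.compile(r"5 minute (rx|tx) rate \d+ bits/sec, (\d+) packets/sec")


def ibc_packet_rates(text):
    """Return {'rx': pps, 'tx': pps} parsed from 'show ibc' output."""
    return {direction: int(pps) for direction, pps in RATE_RE.findall(text)}


def tx_per_rx(text):
    """Packets leaving the CPU per packet arriving at it (None if rx is 0)."""
    rates = ibc_packet_rates(text)
    rx, tx = rates.get("rx", 0), rates.get("tx", 0)
    return None if rx == 0 else tx / rx


print(ibc_packet_rates(SHOW_IBC))            # {'rx': 21, 'tx': 82}
print(f"tx/rx = {tx_per_rx(SHOW_IBC):.2f}")  # tx/rx = 3.90
```

A value near 1 would match the process-switching hint above and a value near 2 the fragmentation hint; treat these as rough heuristics rather than rules.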

The next step is to start a capture with “debug netdr capture”.

Just type in “debug netdr capture”, wait 1 or 2 seconds, and then stop the capture with “undebug netdr capture”.

To display the captured packets, use “show netdr captured-packets”.

Example:

Once the packets are captured, we will see something like this:

------- dump of incoming inband packet -------

interface Po10, routine mistral_process_rx_packet_inlin

dbus info: src_vlan 0x12C(300), src_indx 0x340(832), len 0x42(66)

bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x7F07(32519)

C8000400 012C0000 03400000 42080000 00064532 16F259CB 86820000 7F072000

mistral hdr: req_token 0x0(0), src_index 0x340(832), rx_offset 0x76(118)

requeue 0, obl_pkt 0, vlan 0x12C(300)

destmac 00.00.0C.07.AC.C8, srcmac 00.22.90.5F.A4.C0, protocol 0800

protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 48, identifier 5229

df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77

tcp src 56662, dst 2525, seq 3178192631, ack 0, win 16384 off 7 checksum 0x327 syn

There are a couple of things we can look at in this output.

First, we can tell whether this is an incoming or an outgoing packet from the line:

------- dump of incoming inband packet -------

Then we can see the source and destination addresses. This tells us which flow is going to the CPU, and we can then check the routing for the destination to see whether all the routing information is there.
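When a capture holds hundreds of dumps, a quick tally of source/destination pairs shows which flow dominates. A minimal Python sketch, assuming the capture has been saved as text and that each dump contains the "src ..., dst ..." line in the format shown above. The sample's first two lines repeat the line from the dump above; the third is a made-up extra flow using a documentation address.

```python
import re
from collections import Counter

# Lines 1-2 repeat the dump above; line 3 is a made-up flow for illustration.
CAPTURE = """
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
df 0, mf 0, fo 0, ttl 64, src 192.0.2.10, dst 77.77.77.77
"""

# Only dotted-quad addresses match, so the "tcp src 56662, dst 2525" port line is skipped.
FLOW_RE = re.compile(r"\bsrc (\d+\.\d+\.\d+\.\d+), dst (\d+\.\d+\.\d+\.\d+)")


def top_flows(text, n=5):
    """Return the n most common (src, dst) pairs with their packet counts."""
    return Counter(FLOW_RE.findall(text)).most_common(n)


for (src, dst), count in top_flows(CAPTURE):
    print(f"{src} -> {dst}: {count} packet(s)")
```

The destination at the top of this list is the first one to look up in the routing table.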

We can also see the incoming interface and VLAN for these packets, and then check whether the incoming interface has some configuration that would cause the packets to be process-switched.
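In the same spirit, grouping dumps by ingress interface and source VLAN points at where to look for configuration that forces process switching. A sketch, assuming each dump starts with the "------- dump of" banner and carries the "interface ..." and "dbus info: src_vlan ..." lines shown above; the helper name is hypothetical.

```python
import re
from collections import Counter

# One dump excerpted from the capture above; a real capture holds many.
CAPTURE = """
------- dump of incoming inband packet -------
interface Po10, routine mistral_process_rx_packet_inlin
dbus info: src_vlan 0x12C(300), src_indx 0x340(832), len 0x42(66)
"""

IFACE_RE = re.compile(r"^interface (\S+),", re.MULTILINE)
VLAN_RE = re.compile(r"src_vlan 0x[0-9A-Fa-f]+\((\d+)\)")


def ingress_summary(text):
    """Count dumps per (interface, source VLAN); missing fields become None."""
    counts = Counter()
    # Text before the first banner is ignored.
    for dump in text.split("------- dump of")[1:]:
        iface = IFACE_RE.search(dump)
        vlan = VLAN_RE.search(dump)
        key = (iface.group(1) if iface else None, int(vlan.group(1)) if vlan else None)
        counts[key] += 1
    return counts


for (iface, vlan), count in ingress_summary(CAPTURE).most_common():
    print(f"{iface} vlan {vlan}: {count} packet(s)")
```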

We can also see whether the flood bit is set and whether this is the reason the packets are sent to the CPU.
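Checking the flood bit by eye is tedious across thousands of dumps. The sketch below counts how many dumps have it set, assuming the "bpdu ..., index_dir ..., flood ..., dont_lrn ..." line format from the example above. The first sample line comes from that dump; the second is made up to show a flooded packet.

```python
import re

# Line 1 is from the dump above (flood 0); line 2 is a made-up flooded packet.
CAPTURE = """
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x7F07(32519)
bpdu 0, index_dir 0, flood 1, dont_lrn 0, dest_indx 0x7F07(32519)
"""

FLOOD_RE = re.compile(r"\bflood (\d+),")


def flood_count(text):
    """Return (dumps with the flood bit set, dumps that carry a flood field)."""
    flags = [int(bit) for bit in FLOOD_RE.findall(text)]
    return sum(1 for bit in flags if bit), len(flags)


flooded, total = flood_count(CAPTURE)
print(f"{flooded} of {total} captured packets have the flood bit set")
```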

Then we can check the TTL value to see whether the packets are punted to the CPU because of TTL=1. If too many packets with TTL=1 are punted to the CPU, we need to find out why, but in the meantime we can protect the device with a rate limiter for this kind of packets. We can configure this limiter with “mls rate-limit all ttl-failure 100 10”; choose the values that you think are appropriate for your network.
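A packet that arrives with TTL 1 (or 0) cannot be forwarded once the router decrements it, so it is a candidate for a TTL-failure punt. The sketch below builds a TTL histogram from the captured text and counts those packets, assuming the "ttl N," field format shown in the dump above. The first sample line comes from that dump; the other two are made up.

```python
import re
from collections import Counter

# Line 1 is from the dump above; lines 2-3 are made-up TTL=1 packets.
CAPTURE = """
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
df 0, mf 0, fo 0, ttl 1, src 192.0.2.10, dst 77.77.77.77
df 0, mf 0, fo 0, ttl 1, src 192.0.2.10, dst 77.77.77.77
"""

TTL_RE = re.compile(r"\bttl (\d+),")


def ttl_histogram(text):
    """Map each TTL value seen in the capture to its packet count."""
    return Counter(int(ttl) for ttl in TTL_RE.findall(text))


def ttl_expiring(text):
    """Count packets arriving with TTL <= 1 (they expire when decremented)."""
    return sum(count for ttl, count in ttl_histogram(text).items() if ttl <= 1)


print(sorted(ttl_histogram(CAPTURE).items()))  # [(1, 2), (51, 1)]
print(ttl_expiring(CAPTURE))                   # 2
```

If that count is a large share of the capture, the ttl-failure rate limiter above is the protective step while the source of the low-TTL traffic is tracked down.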

If this is an MPLS packet, the payload will be shown in the output as well. You then need to decode the hex values to see how many labels there are, and then check the label forwarding table to see whether all the forwarding information is present on the box. We can also see the TTL value of those packets.
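Decoding the labels by hand follows the MPLS label stack entry layout from RFC 3032: each entry is 32 bits, with a 20-bit label, 3 traffic-class (EXP) bits, a bottom-of-stack bit S, and an 8-bit TTL. The sketch below decodes entries until it reaches S = 1. It assumes you have already located where the label stack begins in the netdr hex dump (right after an Ethernet header with ethertype 0x8847); the article does not show that offset, and the sample words are made up.

```python
def decode_label_stack(hex_words):
    """Decode MPLS label stack entries from hex text, stopping at bottom-of-stack."""
    data = bytes.fromhex("".join(hex_words.split()))
    entries = []
    # Walk complete 4-byte entries only; a trailing partial entry is ignored.
    for offset in range(0, len(data) - 3, 4):
        word = int.from_bytes(data[offset:offset + 4], "big")
        entry = {
            "label": word >> 12,        # top 20 bits
            "tc": (word >> 9) & 0x7,    # 3 traffic-class (EXP) bits
            "s": (word >> 8) & 0x1,     # bottom-of-stack flag
            "ttl": word & 0xFF,         # low 8 bits
        }
        entries.append(entry)
        if entry["s"]:
            break
    return entries


# Made-up two-label stack: label 16, then label 3000 with S=1, both with TTL 254.
for entry in decode_label_stack("000100FE 00BB81FE"):
    print(entry)
```

The number of entries returned is the number of labels; each label value can then be checked against the label forwarding table, and the ttl field shows whether the packets may be punted on MPLS TTL expiry.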

These are only a few general things we can look at, but this command is an excellent starting point for troubleshooting high CPU under interrupts on the 7600 and 6500 platforms.

Tip1:

Use the options under the debug netdr command to limit the scope of the captured packets; this makes the capture easier to analyze.

lan-7600-1#debug netdr capture ?

acl (11) Capture packets matching an acl

and-filter (3) Apply filters in an and function: all must match

continuous (1) Capture packets continuously: cyclic overwrite

destination-ip-address (10) Capture all packets matching ip dst address

dstindex (7) Capture all packets matching destination index

ethertype (8) Capture all packets matching ethertype

interface (4) Capture packets related to this interface

or-filter (3) Apply filters in an or function: only one must match

rx (2) Capture incoming packets only

source-ip-address (9) Capture all packets matching ip src address

srcindex (6) Capture all packets matching source index

tx (2) Capture outgoing packets only

vlan (5) Capture packets matching this vlan number

<cr>

Tip2:

When the capture is complete, use the “pipe” to filter the output based on the information you need.

For instance:

show netdr captu | i ttl

This shows only the lines containing the TTL and the source and destination addresses. This way you can quickly check whether the same flow is going to the CPU or whether there is a variety of flows.

show netdr capture | i interface

This shows whether all packets are coming from the same interface.
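If you save the capture to a file for later analysis, the same "| include" filtering can be repeated offline, with a count of identical lines added so that one dominant flow or interface stands out immediately. A sketch in Python; like the IOS filter it treats the pattern as a regular expression, although Python's regex dialect is not identical to the one IOS uses. The helper names are hypothetical.

```python
import re
from collections import Counter


def include(text, pattern):
    """Offline stand-in for '| include <pattern>': keep lines matching the regex."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def line_counts(text, pattern):
    """Identical matching lines with their counts, most frequent first."""
    return Counter(include(text, pattern)).most_common()


# Excerpt of the single dump above; a saved capture would hold many.
CAPTURE = """\
------- dump of incoming inband packet -------
interface Po10, routine mistral_process_rx_packet_inlin
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
tcp src 56662, dst 2525, seq 3178192631, ack 0, win 16384 off 7 checksum 0x327 syn
"""

for line, count in line_counts(CAPTURE, "interface"):
    print(count, line)
```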

Tip3:

The packets are captured in a circular buffer, so we will have only the latest 4K packets. If the CPU is high under interrupts, we need to run "debug netdr capture" for only about 1 second to capture valuable information.
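The reasoning behind the short capture window is simple arithmetic: collecting 4096 packets takes 4096 divided by the punt rate. In the sketch below, 21 pps is the rx rate from the "show ibc" example above, and 50,000 pps is an arbitrary stand-in for a CPU pegged by interrupts.

```python
BUFFER_PACKETS = 4096  # the 4K-packet buffer described above


def seconds_to_fill(pps, buffer_packets=BUFFER_PACKETS):
    """Seconds needed to collect buffer_packets packets at pps packets/sec."""
    if pps <= 0:
        raise ValueError("packet rate must be positive")
    return buffer_packets / pps


for pps in (21, 50_000):
    print(f"{pps} pps -> {seconds_to_fill(pps):.2f} s")
```

On a lightly loaded box the buffer needs minutes to fill, while at interrupt-level punt rates it fills in a fraction of a second, which is why a short run is enough.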

Xiang Hong Angela YAN · 02-12-2012

Excellent, very helpful. Just a quick question: what is the difference between high CPU utilization caused by IP Input and by interrupts?

kthned · 09-25-2012

I hope the following helps...

"There are two types of CPU utilization within IOS: interrupt and process.

CPU utilization caused by a process:

Process-switched traffic. This is traffic that hits a specific process in order to be forwarded OR processed by the CPU. An example of each would be traffic being forwarded via the "IP Input" process OR control-plane traffic hitting the "PIM process".
A process trying to clean up tables/previous actions performed. This can be seen in processes such as "CEF Scanner" or "BGP Scanner", which are used to clean/update the CEF and BGP tables.

Interrupt-based CPU utilization:
CPU utilization caused by an interrupt is always traffic based. Interrupt-switched traffic is traffic that does not match a specific process but still needs to be forwarded."
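To tell the two cases apart on a live box, the first line of "show processes cpu" reports the five-second utilization as two numbers, total/interrupt (for example "CPU utilization for five seconds: 99%/85%"), so the process share is the difference. A sketch that splits that line, assuming that standard format; the numbers in the sample are made up and the helper name is hypothetical.

```python
import re

CPU_RE = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")


def five_second_split(output):
    """Split the five-second CPU figure into total, interrupt and process parts."""
    match = CPU_RE.search(output)
    if match is None:
        return None
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


# Made-up sample line in the standard "show processes cpu" format.
sample = "CPU utilization for five seconds: 99%/85%; one minute: 97%; five minutes: 96%"
print(five_second_split(sample))  # {'total': 99, 'interrupt': 85, 'process': 14}
```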

See further details in the following reference:

https://supportforums.cisco.com/message/3741720#3741720

staniulian · 09-09-2013

Hello,

I have a question related to Tip3.

You said the packets are captured in a circular way, but I believe this only happens when you also provide the continuous (cyclic overwrite) argument.

If you put only "debug netdr capture", the capture will stop automatically after 4096 packets are captured. Correct?

Best regards,

Iulian

shoikhan · 03-31-2016

Just so simply driven, awesome.