on 11-24-2010 05:32 AM
When a box shows high CPU and the cycles are being spent on processing interrupts, the question is always what kind of packets are being sent to the CPU. We need a way to look into these packets and see why they are being sent to the CPU for processing.
On the 7600 and 6500 platforms there is a great tool for troubleshooting high CPU under interrupts: “debug netdr”.
This tool can be used even on a very busy box, with 100% CPU, as long as access is still possible.
It captures the packets going to and from the CPU in a circular buffer (which can store 4K packets), and it will not cause any additional overhead to the system.
There are too many possible reasons for high CPU under interrupts to cover here, so I will just give a short example of how this tool can be used and what kind of information we can get from it.
First of all, once we have determined that the CPU is high under interrupts, we can check the output of “show ibc”, which gives us the inband statistics. The main thing we can see here is the number of packets going to and leaving the CPU.
Example:
7600#show ibc
Interface information:
Interface IBC0/0(idb 0x1CC84028)
5 minute rx rate 14000 bits/sec, 21 packets/sec
5 minute tx rate 91000 bits/sec, 82 packets/sec
…..
From here we can tell whether many packets are coming in but not many are leaving the box, or whether the ratio is around 1:1 or 1:2; this gives us some indication of what is happening. If the ratio is close to 1:1, most likely these are packets that need to be process switched for some reason. If the ratio is 1:2, then maybe we are doing some fragmentation, and so on.
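As a rough illustration of the ratio check, here is a minimal Python sketch that reads the two rate lines from saved “show ibc” output and computes how many packets leave the CPU for every packet that arrives. The function names are made up for this example, and the regular expression only assumes the line format shown in the sample above.

```python
import re

# Matches lines such as "5 minute rx rate 14000 bits/sec, 21 packets/sec".
RATE_RE = re.compile(r"5 minute (rx|tx) rate (\d+) bits/sec, (\d+) packets/sec")


def ibc_packet_rates(show_ibc_text):
    """Return {'rx': pps, 'tx': pps} parsed from saved 'show ibc' output."""
    rates = {}
    for direction, _bits, pps in RATE_RE.findall(show_ibc_text):
        rates[direction] = int(pps)
    return rates


def tx_per_rx(rates):
    """Packets sent by the CPU per packet received; None if rx is missing or zero."""
    if not rates.get("rx"):
        return None
    return rates.get("tx", 0) / rates["rx"]


sample = """Interface IBC0/0(idb 0x1CC84028)
5 minute rx rate 14000 bits/sec, 21 packets/sec
5 minute tx rate 91000 bits/sec, 82 packets/sec"""

rates = ibc_packet_rates(sample)
print(rates, tx_per_rx(rates))
```

A value near 1 would correspond to the 1:1 case described above, and a value near 2 to the 1:2 case.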
The next step is to start the capture with “debug netdr capture”.
Just type “debug netdr capture”, wait 1 or 2 seconds, and then stop the capture with “undebug netdr capture”.
To display the captured packets, use “show netdr captured-packets”.
Example:
Once we capture the packets we will see something like this
------- dump of incoming inband packet -------
interface Po10, routine mistral_process_rx_packet_inlin
dbus info: src_vlan 0x12C(300), src_indx 0x340(832), len 0x42(66)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x7F07(32519)
C8000400 012C0000 03400000 42080000 00064532 16F259CB 86820000 7F072000
mistral hdr: req_token 0x0(0), src_index 0x340(832), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0x12C(300)
destmac 00.00.0C.07.AC.C8, srcmac 00.22.90.5F.A4.C0, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 48, identifier 5229
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
tcp src 56662, dst 2525, seq 3178192631, ack 0, win 16384 off 7 checksum 0x327 syn
In this output there are a couple of things that we can look at.
First, we can see whether this is an incoming or outgoing packet based on the line
------- dump of incoming inband packet -------
Then we can see the source and destination addresses. This tells us which flow is going to the CPU, and we can then check the routing for the destination to see if all the routing information is there.
We can also see the incoming interface and VLAN for these packets, and then check whether the incoming interface has some configuration on it that would cause the packets to be process switched.
We can see if the flood bit is set and whether this is the reason the packets are sent to the CPU.
Then we can check the TTL value to see if the packets are punted to the CPU because TTL=1. If too many packets with TTL=1 are punted to the CPU, we need to find the reason for this, but in the meantime we can protect the device by configuring a rate limiter for this kind of packet, for example “mls rate-limit all ttl-failure 100 10”. Choose the values that you think are appropriate for your network.
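To get a quick feel for how much of a capture is TTL related, a small Python sketch like the following could be run against saved “show netdr captured-packets” output. It only assumes the “ttl N, src A, dst B” line format shown in the example dump; the function name and the sample text are made up for illustration.

```python
import re

# Matches the IP line of a dump, e.g.
# "df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77"
IP_LINE_RE = re.compile(r"ttl (\d+), src ([\d.]+), dst ([\d.]+)")


def low_ttl_packets(netdr_text, threshold=1):
    """Return (total, low): IP packets seen and those with ttl <= threshold."""
    total = 0
    low = 0
    for match in IP_LINE_RE.finditer(netdr_text):
        total += 1
        if int(match.group(1)) <= threshold:
            low += 1
    return total, low


# Hypothetical two-packet capture: one normal packet, one with TTL=1.
sample = """df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
df 0, mf 0, fo 0, ttl 1, src 88.88.88.88, dst 77.77.77.77"""

total, low = low_ttl_packets(sample)
print(f"{low} of {total} captured IP packets have TTL <= 1")
```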
If this is an MPLS packet, the payload is shown in the output as well. You then need to decode the hex values to see how many labels are there, and check the label forwarding table to see if all the forwarding information is present on the box. We can also see the TTL value of those packets.
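Decoding the label stack by hand is error prone, so here is a minimal Python sketch of it. It relies on the standard MPLS label stack entry layout (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag, 8-bit TTL). It assumes you pass in the hex starting at the first label entry; where exactly that sits in the netdr dump is not covered here, and the sample value is made up for illustration.

```python
def decode_label_stack(hex_payload):
    """Decode MPLS label stack entries from a hex string (spaces allowed).

    The string must start at the first 4-byte label stack entry.
    Decoding stops at the entry with the bottom-of-stack bit set.
    """
    data = bytes.fromhex(hex_payload.replace(" ", ""))
    stack = []
    for offset in range(0, len(data) - 3, 4):
        entry = int.from_bytes(data[offset:offset + 4], "big")
        bottom = bool((entry >> 8) & 0x1)
        stack.append({
            "label": entry >> 12,        # top 20 bits
            "exp": (entry >> 9) & 0x7,   # 3 EXP / traffic class bits
            "bottom": bottom,            # bottom-of-stack flag
            "ttl": entry & 0xFF,         # low 8 bits
        })
        if bottom:
            break
    return stack


# Hypothetical two-label stack: label 16 (TTL 255) on top of label 24 (TTL 1).
stack = decode_label_stack("000100FF 00018101")
for entry in stack:
    print(entry)
```

The number of entries returned is the label count, and each label can then be looked up in the label forwarding table.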
These are only a few general things that we can look into, but this command is an excellent starting point for troubleshooting high CPU under interrupts on the 7600 and 6500 platforms.
Tip1:
Use the options under the debug netdr capture command to limit the scope of the captured packets; this makes them easier to analyze.
lan-7600-1#debug netdr capture ?
acl (11) Capture packets matching an acl
and-filter (3) Apply filters in an and function: all must match
continuous (1) Capture packets continuously: cyclic overwrite
destination-ip-address (10) Capture all packets matching ip dst address
dstindex (7) Capture all packets matching destination index
ethertype (8) Capture all packets matching ethertype
interface (4) Capture packets related to this interface
or-filter (3) Apply filters in an or function: only one must match
rx (2) Capture incoming packets only
source-ip-address (9) Capture all packets matching ip src address
srcindex (6) Capture all packets matching source index
tx (2) Capture outgoing packets only
vlan (5) Capture packets matching this vlan number
<cr>
Tip2:
When the capture is complete, use the “pipe” to filter the output based on the information that you need.
For instance
show netdr captured-packets | i ttl
This gives you only the lines with the TTL and the source and destination addresses. This way you can quickly check whether the same flow is going to the CPU or whether there is a variety of flows.
show netdr captured-packets | i interface
This way you can see whether all packets are coming from the same interface.
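If the capture has been saved to a text file, the same two checks can be done offline. The following Python sketch tallies packets per incoming interface and per source/destination pair; it assumes only the line formats shown in the example dump earlier, and the function name is made up for this example.

```python
import re
from collections import Counter

# "interface Po10, routine ..." at the start of a line
IFACE_RE = re.compile(r"^[ \t]*interface (\S+),", re.MULTILINE)
# Dotted-quad addresses only, so the "tcp src 56662, dst 2525" port line is skipped
FLOW_RE = re.compile(r"src (\d+\.\d+\.\d+\.\d+), dst (\d+\.\d+\.\d+\.\d+)")


def summarize(netdr_text):
    """Return (interfaces, flows) as Counters of captured packets."""
    interfaces = Counter(IFACE_RE.findall(netdr_text))
    flows = Counter(FLOW_RE.findall(netdr_text))
    return interfaces, flows


sample = """------- dump of incoming inband packet -------
interface Po10, routine mistral_process_rx_packet_inlin
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
tcp src 56662, dst 2525, seq 3178192631, ack 0, win 16384 off 7 checksum 0x327 syn
------- dump of incoming inband packet -------
interface Po10, routine mistral_process_rx_packet_inlin
df 1, mf 0, fo 0, ttl 51, src 88.88.88.88, dst 77.77.77.77
"""

interfaces, flows = summarize(sample)
print(interfaces.most_common(5))
print(flows.most_common(5))
```

A single dominant entry in either counter points at one interface or one flow; a long flat list suggests a variety of flows.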
Tip3:
The packets are captured in a circular buffer, so we will have only the latest 4K packets. If the CPU is high under interrupts, we need to run "debug netdr capture" for only about 1 second to capture valuable information.
Excellent, very helpful. Just a quick question: what is the difference between high CPU utilization caused by IP Input and by interrupts?
I hope the below helps...
"There are two types of CPU utilization within IOS: interrupt and process.
CPU utilization caused by a process
Interrupt based CPU utilization:
CPU utilization caused by an interrupt is always traffic based. Interrupt-switched traffic is traffic that does not match a specific process but still needs to be forwarded."
See further detail in the following reference:
Hello,
I have a question related to Tip3.
You said the packets are captured in a circular way, but I believe this happens only when you also provide the continuous argument.
If you enter only "debug netdr capture", the capture will stop automatically after 4096 packets are captured. Correct?
Best regards,
Iulian
Just so simply driven, awesome.