xthuijs
Cisco Employee

 

Introduction

After reading the packet troubleshooting guide for the ASR9000 there still may be some questions left open as to what the precise packet is that was subject to that drop counter.

Until now there was no good way to capture packets outside the PARSE stage of the NPU, and even a PARSE stage capture required heavy engineering commands.

With the Packet Capture capability that will come in XR 4.3.1 (spring of 2013) you will be able to capture packets for a variety of counters.

 

 

At a glance

In a few words, these are the key things you need to know to get started with the packet capture capability.

 

What is it

It allows you to capture packets that are subject to a particular counter (not necessarily a drop counter) when the packet traverses the various stages of processing inside the NPU.

How does it work

Once the packet is determined to match a counter that is set as a trap, the packet is sent to the CPU for formatting and display.

The output is in hexadecimal format, but it is easy to convert into a Wireshark format for evaluation and printing (discussed later).

 

Note that a captured packet will be DROPPED!

That means that if you would trigger on a counter such as PARSE_FABRIC_RECEIVE, which means packets received from the fabric for processing in the egress direction, these packets potentially could have been forwarded. Because of the capture, we are diverting the packet from the normal forwarding path for display and we are not reinjecting the packet back.

So be careful when selecting a particular counter to trigger on.

When triggering on a true drop counter, this is obviously not an issue.

How to use

With one simple command you can enable the packet capture:

 

RSP# monitor np counter <COUNTER_NAME> <NPU> count <N>

 

We'll discuss the precise COUNTER_NAME and NPU values separately.

Note: In some XR releases the NP reset after the execution of "monitor np counter" is optional. We strongly recommend always selecting the reset option after running the monitoring.

Limitations

 

While this packet capture is GREAT and something we all have been waiting for for a long time now, you need to be aware of the following limitations:

 

1) After the requested number of packets (N) has been captured, or when you exit the capture mode, the NPU needs to be reset. This is a simple internal reset of the NPU that frees the resources used for the capture, but during this fast reset the NPU will not be forwarding. You should expect a forwarding loss of about 50 msec.

This happens regardless of whether any packets were captured: every time you quit or exit the capture mode, this reset is UNAVOIDABLE, and you will be warned before starting the capture that it is going to happen.

 

2) Nothing in life is free and neither is this. When you capture a large number of packets on a counter that is very active, the CPU will be busier than normal. We recommend NOT using this via console, but only via TELNET or sync connections.

 

3) When using this facility, make sure you exit the capture facility properly before closing your telnet connection. If you have an exec timeout configured it is recommended to disable that while the capture facility is running as a good practice. If your exec dies while a capture is enabled, it will not drop out of the capture mode!

To recover from this situation without an LC reload:

–Step 1: Issue another “monitor np counter” command then press Ctrl-C quickly to send a kill signal to cause monitor to detach from NP.

–Step 2: Issue a third “monitor np counter” command then press Ctrl-C right away to cause a Fast Reset to clean up.

 

 

4) This feature is for Typhoon based linecards ONLY. There is no plan to support this on Trident linecards.

 

5) Not all NP counters are supported. For instance, PUNT counters can't be enabled for capture (but we have SPP/NETIO debugs for those anyway). If a counter is not supported, the CLI will return a message saying so.

 

6) The CLI option "noreset" should not be used, as it is for internal development use only. Using this option will leave the system in an undefined state with potentially leaked buffers, and you may see suboptimal performance.

 

7) The maximum number of captured packets is 100. For slow speed interfaces (e.g. 1G) you should not capture more than 20. (This is because the capture buffers are shared packet buffers, which are in turn shared by multiple interfaces.)
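The packet count limits in item 7 are easy to check in a script before a capture is started. The following is a minimal Python sketch of such a check. The function name and the `slow_interface` flag are hypothetical and only illustrate the two limits stated above; nothing here is part of the XR CLI.

```python
# Limits taken from limitation 7 above.
MAX_CAPTURE_PACKETS = 100      # maximum number of captured packets
SLOW_INTERFACE_PACKETS = 20    # suggested ceiling for slow speed (e.g. 1G) interfaces


def check_capture_count(count, slow_interface=False):
    """Return count unchanged if it respects the limits, else raise ValueError.

    slow_interface is a hypothetical flag the caller sets when the NPU
    serves slow speed interfaces such as 1G ports.
    """
    if count < 1:
        raise ValueError("capture count must be at least 1")
    limit = SLOW_INTERFACE_PACKETS if slow_interface else MAX_CAPTURE_PACKETS
    if count > limit:
        raise ValueError(f"capture count {count} exceeds the limit of {limit}")
    return count
```

A caller would run `check_capture_count(n, slow_interface=True)` before filling in the `count <N>` part of the monitor command.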

 

Detailed how to step by step guide

 

Step 1 : Complaint of data loss

Determine the interface that is currently experiencing this loss

 

Step 2 : Correlate interface to NPU

From the packet troubleshooting guides you may remember that first you need to link the interface to the NPU via the command:

show controller np ports all location 0/X/cpu0

whereby X is the slot that holds the interface in question.

 

Step 3 : Attempt to identify the counter that is associated with the traffic loss

Knowing the NP that is used for forwarding traffic on this interface, we can view the NP counters with the following command:

show controllers np counters npY location 0/X/CPU0

 

Where X is the same slot ID as before and Y is the NP number that we found via the command in step 2.

 

Step 4 : Capturing the packets associated with a counter

Now that we know the counter we're interested in from step 3, we can enable the packet capture facility and capture some of those packets. For example, let's assume that the drop counter was DROP_IN_UIDB_DOWN and the associated NPU is np2 on a linecard located in slot 1.

 

Command to use will be:

 

monitor np counter DROP_IN_UIDB_DOWN np2 loc 0/1/CPU0

 

Step 5 : Confirm the warning

 

Warning: A manditory NP reset will be done after monitor to clean up.

         This will cause ~50ms traffic outage. Links will stay Up.

Proceed y/n [y] > y

 

 

The capture will proceed and the system will respond with a line similar to the one below:

 

Monitor DROP_IN_UIDB_DOWN on NP2 ... (Ctrl-C to quit)

 

Step 6 : Wait for a capture

 

From TengigE 0/2/0/1:  48 byte packet

  0000: 00 65 7A 00 00 00 00 70 72 00 00 02 00 B0 00 00

  0010: 80 00 00 00 00 00 0F 40 00 00 00 01 3F 00 00 00

  0020: 00 00 BD 18 65 EF 4A 80 2E 04 09 08 01 03 00 05

  ---

(count: 1 of 1)

NP reset now. Starting fast reset for NP 0

 

 

Because we didn't specify a number of packets to capture, we default to ONE.

Note that after the capture was made, the fast reset is automatically induced!
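If you want to post-process captures, the display format above is simple to parse. The following Python sketch is one possible way to do it, assuming the output always looks like the sample in step 6: a "From <interface>: <N> byte packet" header followed by lines of a 4-digit hex offset, a colon, and the bytes. The function name and the returned dictionary keys are hypothetical.

```python
import re

# Header line, e.g. "From TengigE 0/2/0/1:  48 byte packet"
HEADER_RE = re.compile(r"^From\s+(?P<intf>.+?):\s+(?P<length>\d+) byte packet")
# Data line, e.g. "  0000: 00 65 7A 00 ..."
HEX_LINE_RE = re.compile(
    r"^\s*[0-9A-Fa-f]{4}:\s+(?P<hexbytes>(?:[0-9A-Fa-f]{2}\s*)+)$"
)


def parse_captures(text):
    """Parse 'monitor np counter' display text into a list of packet dicts."""
    packets = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            current = {
                "interface": header["intf"],
                "length": int(header["length"]),
                "data": bytearray(),
            }
            packets.append(current)
            continue
        hex_line = HEX_LINE_RE.match(line)
        if hex_line and current is not None:
            # bytes.fromhex ignores the spaces between byte values
            current["data"] += bytes.fromhex(hex_line["hexbytes"])
    for packet in packets:
        packet["data"] = bytes(packet["data"])
        # True when the number of parsed bytes matches the announced length
        packet["complete"] = len(packet["data"]) == packet["length"]
    return packets


SAMPLE = """
From TengigE 0/2/0/1:  48 byte packet
  0000: 00 65 7A 00 00 00 00 70 72 00 00 02 00 B0 00 00
  0010: 80 00 00 00 00 00 0F 40 00 00 00 01 3F 00 00 00
  0020: 00 00 BD 18 65 EF 4A 80 2E 04 09 08 01 03 00 05
  ---
(count: 1 of 1)
"""

if __name__ == "__main__":
    for pkt in parse_captures(SAMPLE):
        print(pkt["interface"], pkt["length"], pkt["complete"], pkt["data"].hex())
```

The `complete` flag is a cheap sanity check that no lines were lost when copying the output out of the terminal session.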

 

Step 7: Allow the fast reset to happen

Make sure that you allow the fast reset to complete. This is very important to release the buffers that were previously in use.

See above for disaster recovery (telnet limitation) in case needed.

Make sure you do NOT use the noreset option. It is only useful for development, where it prevents the NP reset to allow some debugging on NP structures in this undefined state.

Decoding the capture

 

The format in which the captures are displayed is very easy to convert into a PCAP file, which can then be decoded with Wireshark or the Wireshark CLI tool.

We're working on an offline tool to parse the output and display it in a nice format for you, and/or to embed that in the XR shell.
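Until such a tool is available, the conversion can be scripted. The following Python sketch writes packet bytes into the classic libpcap file format using only the standard library. It is an illustration under assumptions, not an official tool: it assumes the captured bytes start at the Ethernet header (link type 1), which may not hold for every counter, and since the capture output carries no timestamps, every record gets a placeholder timestamp of 0. The function name is hypothetical.

```python
import struct

LINKTYPE_ETHERNET = 1  # assumption: the dump starts at the Ethernet header


def build_pcap(packets, linktype=LINKTYPE_ETHERNET, snaplen=65535):
    """Return the bytes of a little-endian classic pcap file for the packets."""
    # Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, linktype
    out = bytearray(
        struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, linktype)
    )
    for payload in packets:
        # Record header: ts_sec, ts_usec (placeholders), captured length, original length
        out += struct.pack("<IIII", 0, 0, len(payload), len(payload))
        out += payload
    return bytes(out)


# The 48 byte packet shown in step 6
SAMPLE_PACKET = bytes.fromhex(
    "00657A000000007072000002 00B00000"
    "8000000000000F40000000013F000000"
    "0000BD1865EF4A802E04090801030005"
)

if __name__ == "__main__":
    with open("capture.pcap", "wb") as handle:
        handle.write(build_pcap([SAMPLE_PACKET]))
```

The resulting capture.pcap can be opened in Wireshark. If the decode looks wrong, the link type assumption is the first thing to revisit.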

 

 

 

 

 

Xander Thuijs, CCIE #6775

Principal Engineer ASR9000

Comments
Bryan Garland
Cisco Employee

One thing you may want to mention or highlight: the NP reset causes a 50ms loss on all traffic through this NP. So that is multiple interfaces, and it can affect things like BFD, OAM, etc.

Other than that, great feature and been needed for some time now.  :-)

sfaulkne
Community Member

A defect has been found in this feature. Some bytes of the captured packet are displayed incorrectly: the first 4 bytes of the packet (the MAC-DA), and the 4 bytes starting at offset 0x40 (generally the destination IP address).

CSCuh41940 has been opened to track and resolve this issue.
