Steps to debug Input Packet drops:
Introduction:
In this article, we will discuss the common reasons for input packet drops and how the drops can be captured in general.
This document is an ongoing effort to understand input drops and how to capture them, so if anything is still unclear after reading this article, please add comments on the article and answers will be provided.
Core Issue:
Why do input drops occur?
There are several reasons why a packet drop can occur at the ingress of an interface. There are about 500-800 drop counters, depending on the line card type and NP generation. Typing "show drops verbose location <>" lists all the relevant drops. In general, any NP input drop can be captured with "show drops verbose location <> | include <counter name>", where the counter name is identified from the type of drop observed at the time it occurred.
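For example (0/1/CPU0 here is just an illustrative line-card location, not a fixed value), all the relevant drop counters on a card can be listed at once with:
show drops verbose location 0/1/CPU0
and the output can then be filtered with "| include <counter name>" as shown in the sections below.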
Some of the common drop reasons at ingress are:
1) Out of buffer size: There are several buffers within the NP itself that can overflow or underflow. Taking the case of a Traffic Manager ingress buffer and an egress queue buffer, if either of those runs out of buffers the TM complains, which might lead to NP issues and result in packet drops.
E.g. show drops verbose location 0/1/CPU0 | include BUF_EXCD
Mon Feb 10 18:32:50.249 EDT
PUNT_DIAGS_RX_BUFF_EXCD 0
PUNT_DIAGS_RX_BUFF_EXCD 0
2) Link issue between PHY & NP: This could be due to SERDES errors and could turn out to be either a hardware or a software issue.
e.g. NP-MAC--------------PHY----------------Optics
NP-MAC is the first entry point for data. The following CLI provides MAC-level ingress and egress statistics.
show controller tengig <> stat
Further ingress drops may happen in the NP ucode.
The following CLI may provide more details. Since many ports are mapped to one NP, it may not give per-port ucode drop statistics.
show controller np counters all location <>
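As a concrete illustration (TenGigE0/1/0/0 and location 0/1/CPU0 are hypothetical values used only for this example), the two commands above could be run as:
show controllers tenGigE 0/1/0/0 stats
show controllers np counters all location 0/1/CPU0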
3) NP Back pressure: This is typically an oversubscription case, e.g. if the ingress LC is sending 30 Gbps of traffic towards a 10 Gbps egress, the FIA might apply backpressure to the ingress NP and we therefore see ingress drops.
e.g. show drops verbose location <> | include <counter_name>
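As a concrete illustration (slot 1 is a hypothetical ingress LC location), the ingress FIA drops and the NP drop counters on that card could be checked with:
show controllers fabric fia drops ingress location 0/1/CPU0
show drops verbose location 0/1/CPU0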
4) NP Lockup: NP lockups can occur in all sorts of situations. An NP lockup happens when the microcode has a bug, or when a low-level resource specific to the NP, such as the TM, ICFD, or an NP engine, gets blocked. If the hardware detects that the NP has stopped forwarding, this scenario is termed an NP lockup.
In general, when an NP lockup happens, it raises a syslog message containing NP-DIAG. Soon after this, the local ping will fail if the NP is locked up, and the LC will automatically reload, which subsequently results in heavy traffic loss.
There is no CLI as such to determine an NP lockup situation; the user has to monitor and observe the situation described above.
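One rough check (a sketch, assuming the message is still in the logging buffer) is to search the syslog for the NP-DIAG keyword mentioned above:
show logging | include NP-DIAG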
5) Unrecognized upper-level protocol: This could also be termed "input drops other". For example, when a router receives a new LSP, it floods this LSP to its neighbors, except the neighbor that sent the new LSP. On point-to-point links, the neighbors acknowledge the new LSP with a PSNP, which holds the LSP ID, sequence number, checksum, and remaining lifetime. When the acknowledgment PSNP is received from a neighbor, the originating router stops sending the new LSP to that particular neighbor, although it may continue to send the new LSP to other neighbors that have not yet acknowledged it.
Symptoms: Layer 3 packets, for example ISIS 64-byte PSNP packets, are counted as "input drops other", though they are not really dropped.
Conditions: ISIS router with links configured as point-to-point.
Workaround: This problem is merely cosmetic; the PSNPs are still processed, just reported incorrectly due to an accumulation discrepancy in the software.
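To confirm this cosmetic case on a given port (TenGigE0/1/0/0 is only an example interface), check whether the input drops show up under the "drops for unrecognized upper-level protocol" line of the interface statistics:
show interfaces tenGigE 0/1/0/0 | include unrecognized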
6) RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT should not be in total drops: The counter RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT should not be included in the calculation for total drops seen under an interface.
Ex)
RP/0/RSP0/CPU0:BEL1MN#sh int te 0/6/0/0
Tue Jan 31 10:39:39.047 ARG
TenGigE0/6/0/0 is up, line protocol is up
...
Last clearing of "show interface" counters never
30 second input rate 962000 bits/sec, 702 packets/sec
30 second output rate 10000 bits/sec, 19 packets/sec
519481709 packets input, 587232457343 bytes, 19412 total input drops
RP/0/RSP0/CPU0:BEL1MN#sh controller np counters np4 location 0/6/cpu0
Tue Jan 31 10:39:52.210 ARG
Node: 0/6/CPU0:
----------------------------------------------------------------
Show global stats counters for NP4, revision v3
Read 44 non-zero NP counters:
Offset Counter FrameValue Rate
(pps)
----------------------------------------------------------------------------------------------------------------
...
447 RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT 19411 0
There are 19411 drops for RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT and 19412 total drops for the interface.
This was duplicated in the lab, and there is a one-to-one increase in the NP counter and the total drops counter.
This confuses the customer, who thinks they have a traffic loss issue, and impedes troubleshooting when trying to identify the source of the drops. The steps to identify the issue are as stated below (see the example after the list) -
0) Clear both the interface counters and the NP counters.
1) Show interface xxx; if there is a non-zero generic input drop counter, then
2) Show controller np ports all location yyy; this will display which NP the interface in question is on
3) Show controller np counters npz location yyy, where z is the NP number shown in step 2
In the output, find RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT, RESOLVE_INGRESS_DROP_CNT, and RESOLVE_EGRESS_DROP_CNT.
If the generic input drop count matches (RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT - RESOLVE_EGRESS_DROP_CNT), there are ingress drops due to a loop; please figure out the cause of the loop.
If the input drop count is more than this delta, other input drops also exist; try to find the other DROP counters under the NP.
If the delta is 0, the input drops are NOT due to the loop condition; try to find the other DROP counters under the NP.
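Applied to the example above (TenGigE0/6/0/0, which in that example is on NP4), the sequence is:
show interfaces tenGigE 0/6/0/0
show controllers np ports all location 0/6/CPU0
show controllers np counters np4 location 0/6/CPU0
In that sample output, the 19411 increments of RESOLVE_VPLS_REFLECTION_FILTER_DROP_CNT account for essentially all of the 19412 total input drops on the interface, which is exactly the one-to-one pattern the delta check above is meant to expose.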
7) l2-tcam Invalid DA Drops
8) Controller drops: runts, FCS, aborts, FIFO overflows, giants
9) Unknown DMAC or dot1q vlan
CLI to capture packet loss.
Before beginning to debug traffic issues, please clear all counters and start afresh.
Clear Interface counters
RP/0/RSP0/CPU0:ROSH06_jetfire#clear counters all
Clear "show interface" counters on all interfaces [confirm]
RP/0/RSP0/CPU0:ROSH06_jetfire#
Clear NP counters
RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller np counters all
Clear Fabric counters
To clear FIA counters on LC and RSP:
RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller fabric fia location <>
To clear all fabric crossbar counters:
RP/0/RSP0/CPU0:ROSH06_jetfire#clear controller fabric crossbar-counters location <>
To clear bridge counters on LC
Check all the relevant traffic counters
After clearing the counters, start the traffic pattern that caused the drop.
Check the counters at the input interface: This is the first place to check. Here the user can observe the packet count increment each time this command is executed, and identify whether a drop is occurring by comparing the input and output packet rates.
RP/0/RSP0/CPU0:ROSH06#show interfaces tenGigE 0/1/0/0
Thu Jan 1 01:10:01.908 UTC
TenGigE0/1/0/0 is up, line protocol is up
Interface state transitions: 1
Hardware is TenGigE, address is 001e.bdfd.1736 (bia 001e.bdfd.1736)
Layer 2 Transport Mode
MTU 1514 bytes, BW 10000000 Kbit
reliability 255/255, txload 0/255, rxload 0/255
Encapsulation ARPA,
Full-duplex, 10000Mb/s, LR, link type is force-up
output flow control is off, input flow control is off
loopback not set,
Maintenance is enabled,
ARP type ARPA, ARP timeout 04:00:00
Last clearing of "show interface" counters never
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 total input drops
0 drops for unrecognized upper-level protocol
Received 0 broadcast packets, 0 multicast packets
0 runts, 0 giants, 0 throttles, 0 parity
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 packets output, 0 bytes, 0 total output drops
Output 0 broadcast packets, 0 multicast packets
0 output errors, 0 underruns, 0 applique, 0 resets
0 output buffer failures, 0 output buffers swapped out.
Check NPU counters
show controllers np counters all location <>
Fields of interest in NPU counters from data path standpoint:
800 PARSE_ENET_RECEIVE_CNT -- Num of packets received from external interface
970 MODIFY_FABRIC_TRANSMIT_CNT -- Num of packets sent to fabric
801 PARSE_FABRIC_RECEIVE_CNT -- Num of packets received from fabric
971 MODIFY_ENET_TRANSMIT_CNT -- Num of packets sent to external interface
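As a sketch of how these counters are typically compared (np1 and 0/6/CPU0 are simply the values used in the examples further below), the ingress-path pair can be pulled out with:
show controllers np counters np1 location 0/6/CPU0 | include RECEIVE_CNT
show controllers np counters np1 location 0/6/CPU0 | include TRANSMIT_CNT
If PARSE_ENET_RECEIVE_CNT keeps incrementing while MODIFY_FABRIC_TRANSMIT_CNT does not, the packets are most likely being dropped or punted inside the ingress NP, and the drop counters on that NP should be examined next.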
The following CLI, grepping on "Ingress", displays the drop counters for any particular feature the user is interested in, i.e. GRE, L2VPN, MPLS, etc.
The first form, as indicated, only shows non-zero counters (counters that have a non-zero value):
show drops np all loc <0/5/CPU0> | inc Ingress
e.g.
Sat Feb 1 14:22:05.158 UTC
Node: 0/5/CPU0:
----------------------------------------------------------------
NP 0 Drops:
----------------------------------------------------------------
MODIFY_PUNT_REASON_MISS_DROP 1
----------------------------------------------------------------
NP 1 Drops:
----------------------------------------------------------------
MODIFY_PUNT_REASON_MISS_DROP 1
show drops np all verbose loc <> | inc INGRESS
show drops verbose | include INGRESS
*******************************************************************
GRE_ING_DECAP_P2_GRE_KEY_PRESENT_DROP 0
GRE_ING_DECAP_P2_GRE_SEQ_PRESENT_DROP 0
GRE_ING_DECAP_P2_GRE_NONZERO_VER_DROP 0
GRE_ING_DECAP_P2_GRE_NONZERO_RSVD0_DROP 0
GRE_ING_DECAP_P2_PROT_UNSUPPORTED 0
GRE_ING_DECAP_P2_NESTED_GRE_DROP 0
GRE_ING_DECAP_P2_CLNS_NO_ISIS_DROP 0
GRE_ING_DECAP_P2_NO_UIDB_DROP 0
**************************************
Commands to monitor packet counts at various places:
Ingress interface
Ex: show interfaces gigabitEthernet 0/6/0/25
show controller gigabitEthernet 0/6/0/25 internal
Ingress NP
Ex: show controllers np counters np1 location 0/6/CPU0
Ingress NP fabric counters
Ex: show controllers np fabric-counters tx np1 location 0/6/CPU0
Ingress bridge
Ex: show controllers fabric fia bridge stats location 0/6/CPU0
Ingress FIA
Ex:
show controllers fabric fia stats location 0/6/CPU0
show controllers fabric fia drops ingress location 0/6/CPU0
show controllers fabric fia q-depth location 0/6/CPU0
Ingress crossbar
Ex:
show controllers fabric crossbar statistics instance 0 location 0/6/CPU0
show controllers fabric crossbar statistics instance 1 location 0/6/CPU0