12-03-2020 06:23 AM
Hi to all,
I'm seeing the output errors counter incrementing on a 10G interface:
Ethernet1/1 is up
admin state is up, Dedicated Interface
Hardware: 100/1000/10000/25000 Ethernet, address: 2c4f.525c.ba07 (bia 2c4f.525c.ba08)
Description:xxxxxxxx
Internet Address is xxxxxxxxxxxxxxx
MTU 9150 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 13/255, rxload 7/255
Encapsulation ARPA, medium is broadcast
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 29week(s) 1day(s)
Last clearing of "show interface" counters never
17 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 301003400 bits/sec, 55200 packets/sec
30 seconds output rate 532395640 bits/sec, 53879 packets/sec
input rate 301.00 Mbps, 55.20 Kpps; output rate 532.40 Mbps, 53.88 Kpps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 297893384 bits/sec, 56022 packets/sec
300 seconds output rate 424997712 bits/sec, 53892 packets/sec
input rate 297.89 Mbps, 56.02 Kpps; output rate 425.00 Mbps, 53.89 Kpps
Load-Interval #3: 5 seconds
5 seconds input rate 367465200 bits/sec, 62072 packets/sec
5 seconds output rate 632914320 bits/sec, 65310 packets/sec
input rate 367.46 Mbps, 62.07 Kpps; output rate 632.91 Mbps, 65.31 Kpps
RX
1834246582655 unicast packets 283418861 multicast packets 45 broadcast packets
1834530417310 input packets 1626688888225625 bytes
659091065800 jumbo packets 0 storm suppression bytes
0 runts 0 giants 35 CRC 0 no buffer
35 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
2051751641869 unicast packets 7338181449 multicast packets 41 broadcast packets
2059093817881 output packets 2210037874377512 bytes
1057043746858 jumbo packets
3466622 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 302895 output discard
0 Tx pause
sh ver
Cisco Nexus Operating System (NX-OS) Software
Software
BIOS: version 07.65
NXOS: version 7.0(3)I7(6)
BIOS compile time: 09/04/2018
NXOS image file is: bootflash:///nxos.7.0.3.I7.6.bin
NXOS compile time: 3/5/2019 13:00:00 [03/05/2019 23:04:55]
Hardware
cisco Nexus9000 93180YC-EX chassis
Intel(R) Xeon(R) CPU @ 1.80GHz with 24633600 kB of memory.
Processor Board ID FDO23190AP9
What might be the cause of this behaviour?
Many thanks
12-03-2020 08:31 AM
Hello!
Nexus 9000 series switches utilize cut-through switching by default. This means that if a malformed frame enters the switch, the switch is unable to validate that the FCS field of the Ethernet frame is valid through a CRC check before portions of the frame have already been forwarded out of an egress interface. As a result, cut-through switches generally increment two counters when they receive a malformed frame: the input errors/CRC counter on the ingress interface, and the output errors counter on the egress interface the frame is forwarded out of.
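As a toy illustration of those two counters (plain Python with hypothetical interface names - a model of the behaviour, not NX-OS code):

```python
# Toy model of cut-through counter attribution: one corrupt frame
# increments a counter on BOTH the ingress and the egress interface.
from collections import Counter

counters = {"Eth1/54": Counter(), "Eth1/1": Counter()}

def forward_cut_through(ingress: str, egress: str, fcs_valid: bool) -> None:
    """In cut-through mode the frame is already leaving the egress port by
    the time the trailing FCS can be checked, so the switch can only count
    the error - it cannot drop the frame."""
    if not fcs_valid:
        counters[ingress]["input_errors"] += 1   # CRC error seen on ingress
        counters[egress]["output_errors"] += 1   # same frame counted on egress

# One corrupt frame entering Eth1/54 and leaving Eth1/1:
forward_cut_through("Eth1/54", "Eth1/1", fcs_valid=False)
print(counters["Eth1/54"]["input_errors"], counters["Eth1/1"]["output_errors"])
# → 1 1
```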
In your scenario, you've provided output from Ethernet1/1 showing a non-zero (and reportedly incrementing) output errors counter. Next, we need to validate whether there are non-zero input errors counters on any other interfaces. In NX-OS, the easiest way to identify these interfaces is the show interface counters errors non-zero command - can you provide the output of this command?
As a side note, a detailed explanation of how Nexus 9000 series switches with the Cloud Scale ASIC (which is what the Nexus 93180YC-EX has) perform cut-through switching and react to CRCs can be found in the Nexus 9000 Cloud Scale ASIC CRC Identification & Tracing Procedure document.
I hope this helps - thank you!
-Christopher
12-03-2020 09:41 AM
Hi Christopher,
here is the output of the show command you suggested:
show interface counters errors non-zero
--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/1 0 35 3466730 35 0 302895
Eth1/10 0 1196 0 1196 0 0
Eth1/14 0 1 0 1 0 0
Eth1/47 0 9 0 9 0 0
Eth1/48 0 23 0 23 0 0
Eth1/53 0 258 0 258 0 0
Eth1/54 0 66817 0 66817 0 0
Po1 0 32 0 32 0 0
Po501 0 1 0 1 0 0
--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth1/1 0 -- 0 3466730 0 0
Many thanks for your feedback!
12-09-2020 05:57 AM
Hello!
Based on this output, it appears that we have non-zero CRC error counters on a handful of interfaces (Eth1/1, Eth1/10, Eth1/14, Eth1/47, Eth1/48, Eth1/53, Eth1/54, Po1, and Po501).
The total number of CRC errors does not add up to the total number of output errors on Ethernet1/1 - however, it's possible that counters were cleared on some interfaces in the recent past. To confirm that these CRC errors are incrementing and directly correlated with the output errors on Ethernet1/1, we need to observe non-zero counters multiple times within a short period of time.
To do this, can you provide the output of the below commands?
terminal width 511 ; terminal length 0 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero
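Once you have the repeated snapshots, the tables can be diffed by hand, or with a short script. Here is a rough sketch (plain Python, assuming a table layout matching the output above) that parses each data row and reports which counters moved between two snapshots:

```python
# Rough sketch (assumed table layout): parse each data row of
# "show interface counters errors non-zero" and diff two snapshots
# to see which counters actually moved during the 30-second intervals.
import re

def parse_errors(text):
    """Return {port: [counter values]} for every Eth/Po data row."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Eth[\d/]+|Po\d+)\s+(.*)", line)
        if m:
            # "--" (unsupported counter) is treated as zero
            vals = [0 if v == "--" else int(v) for v in m.group(2).split()]
            rows.setdefault(m.group(1), []).extend(vals)
    return rows

def delta(before, after):
    """Per-port difference between two snapshots (after minus before)."""
    return {port: [b - a for a, b in zip(before.get(port, []), vals)]
            for port, vals in after.items()}

snap1 = "Eth1/1    0    35   3466730   35    0   302895"
snap2 = "Eth1/1    0    36   3466980   36    0   302895"
print(delta(parse_errors(snap1), parse_errors(snap2)))
# → {'Eth1/1': [0, 1, 250, 1, 0, 0]}
```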
Thank you!
-Christopher
12-09-2020 04:00 PM
Eh, this is usually a bad transceiver, or something like using an unsupported DAC length. In my experience, seeing xmit MAC errors without any other related errors usually points to that. The switch sees the error in the transceiver, but the frame is already out of the switch, so it can't drop it at that point. The other side of the link should be incrementing FCS or CRC errors if you have access to check it.
12-19-2020 12:00 PM
Oh, I forgot to mention: if the switch is in cut-through switching mode, input errors on other interfaces whose traffic egresses out the erroring interface will also increment this counter. To debug this, change the forwarding mode to store-and-forward and see if it still increments that counter. Doing this will also give you more insight into which interfaces are taking CRC errors on ingress. It might not be the ideal solution depending on your traffic environment, but it's one way to diagnose. I've noticed the counters are not always correct in cut-through mode, possibly due to software bugs. Maybe there is a way to see real framing errors even in cut-through mode with the new telemetry code, but I haven't seen one.
The issue is that the CRC/FCS is at the end of the Ethernet frame, while the header is at the beginning, and cut-through switching only looks at the header. The egress port then has to re-calculate the FCS before it transmits so it can append it to the end of the frame; this is where it notices something is wrong and increments the counter you are seeing. There are mechanisms to 'stomp' the CRC, where the switch purposely mangles the FCS on egress so the next switch the frame hits drops it. The downside of cut-through is that your NMS will report errors on every interface along the path until the frame is dropped, and it's hard to track where the corruption originates (that's why the stomp mechanism was put in place).
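To illustrate the stomping idea in code (illustration only, not what the ASIC actually does): the FCS is just a CRC-32 over the frame, and a stomped FCS is typically the computed CRC deliberately inverted, so no downstream receiver can ever validate it.

```python
# Illustration only (not ASIC code): the Ethernet FCS is a CRC-32 computed
# over the frame; "stomping" deliberately corrupts the FCS (commonly by
# inverting it) so downstream store-and-forward devices drop the frame.
import zlib

def fcs(frame: bytes) -> int:
    # Ethernet's FCS uses the same CRC-32 algorithm as zlib
    return zlib.crc32(frame) & 0xFFFFFFFF

def stomp(crc: int) -> int:
    # Bitwise inversion guarantees the FCS check fails downstream
    return crc ^ 0xFFFFFFFF

frame = bytes(60)           # minimal dummy payload for illustration
good = fcs(frame)
assert stomp(good) != good  # a stomped frame never validates downstream
```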
So, TL;DR: 99% of the time it's either a transceiver error, or corrupt frames getting passed through from other ports that are doing cut-through switching.