12-03-2020 06:23 AM
Hi to all,
I'm seeing the output errors counter incrementing on a 10G interface:
Ethernet1/1 is up
admin state is up, Dedicated Interface
Hardware: 100/1000/10000/25000 Ethernet, address: 2c4f.525c.ba07 (bia 2c4f.525c.ba08)
Description:xxxxxxxx
Internet Address is xxxxxxxxxxxxxxx
MTU 9150 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 13/255, rxload 7/255
Encapsulation ARPA, medium is broadcast
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 29week(s) 1day(s)
Last clearing of "show interface" counters never
17 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 301003400 bits/sec, 55200 packets/sec
30 seconds output rate 532395640 bits/sec, 53879 packets/sec
input rate 301.00 Mbps, 55.20 Kpps; output rate 532.40 Mbps, 53.88 Kpps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 297893384 bits/sec, 56022 packets/sec
300 seconds output rate 424997712 bits/sec, 53892 packets/sec
input rate 297.89 Mbps, 56.02 Kpps; output rate 425.00 Mbps, 53.89 Kpps
Load-Interval #3: 5 seconds
5 seconds input rate 367465200 bits/sec, 62072 packets/sec
5 seconds output rate 632914320 bits/sec, 65310 packets/sec
input rate 367.46 Mbps, 62.07 Kpps; output rate 632.91 Mbps, 65.31 Kpps
RX
1834246582655 unicast packets 283418861 multicast packets 45 broadcast packets
1834530417310 input packets 1626688888225625 bytes
659091065800 jumbo packets 0 storm suppression bytes
0 runts 0 giants 35 CRC 0 no buffer
35 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
2051751641869 unicast packets 7338181449 multicast packets 41 broadcast packets
2059093817881 output packets 2210037874377512 bytes
1057043746858 jumbo packets
3466622 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 302895 output discard
0 Tx pause
sh ver
Cisco Nexus Operating System (NX-OS) Software
Software
BIOS: version 07.65
NXOS: version 7.0(3)I7(6)
BIOS compile time: 09/04/2018
NXOS image file is: bootflash:///nxos.7.0.3.I7.6.bin
NXOS compile time: 3/5/2019 13:00:00 [03/05/2019 23:04:55]
Hardware
cisco Nexus9000 93180YC-EX chassis
Intel(R) Xeon(R) CPU @ 1.80GHz with 24633600 kB of memory.
Processor Board ID FDO23190AP9
What might be the cause of this behaviour?
Many thanks
12-03-2020 08:31 AM
Hello!
Nexus 9000 series switches utilize cut-through switching by default. This means that if a malformed frame enters the switch, the switch is unable to validate that the FCS field of the Ethernet frame is valid through a CRC check before portions of the frame have already been forwarded out of an egress interface. As a result, cut-through switches generally increment two counters when they receive a malformed frame: the input errors/CRC counter on the ingress interface, and the output errors counter on the egress interface the frame is forwarded out of.
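As a toy illustration of those two counters (plain Python with hypothetical interface names - a model of the behaviour, not NX-OS code):

```python
# Toy model of cut-through counter attribution: one corrupt frame
# increments a counter on BOTH the ingress and the egress interface.
from collections import Counter

counters = {"Eth1/54": Counter(), "Eth1/1": Counter()}

def forward_cut_through(ingress: str, egress: str, fcs_valid: bool) -> None:
    """In cut-through mode the frame is already leaving the egress port by
    the time the trailing FCS can be checked, so the switch can only count
    the error - it cannot drop the frame."""
    if not fcs_valid:
        counters[ingress]["input_errors"] += 1   # CRC error seen on ingress
        counters[egress]["output_errors"] += 1   # same frame counted on egress

# One corrupt frame entering Eth1/54 and leaving Eth1/1:
forward_cut_through("Eth1/54", "Eth1/1", fcs_valid=False)
print(counters["Eth1/54"]["input_errors"], counters["Eth1/1"]["output_errors"])
# → 1 1
```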
In your scenario, you've provided output from Ethernet1/1 showing a non-zero (and reportedly incrementing) output errors counter. Next, we need to validate whether there are non-zero input errors counters on any other interfaces. In NX-OS, the easiest way to identify these interfaces is the show interface counters errors non-zero command - can you provide the output of this command?
As a side note, a detailed explanation of how Nexus 9000 series switches with the Cloud Scale ASIC (which is what the Nexus 93180YC-EX has) perform cut-through switching and react to CRCs can be found in the Nexus 9000 Cloud Scale ASIC CRC Identification & Tracing Procedure document.
I hope this helps - thank you!
-Christopher
12-03-2020 09:41 AM
Hi Christopher,
here is the output of the show command you suggested:
show interface counters errors non-zero
--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/1 0 35 3466730 35 0 302895
Eth1/10 0 1196 0 1196 0 0
Eth1/14 0 1 0 1 0 0
Eth1/47 0 9 0 9 0 0
Eth1/48 0 23 0 23 0 0
Eth1/53 0 258 0 258 0 0
Eth1/54 0 66817 0 66817 0 0
Po1 0 32 0 32 0 0
Po501 0 1 0 1 0 0
--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth1/1 0 -- 0 3466730 0 0
Many thanks for your feedback!
12-09-2020 05:57 AM
Hello!
Based on this output, it appears that we have non-zero CRC error counters on a handful of interfaces (Eth1/1, Eth1/10, Eth1/14, Eth1/47, Eth1/48, Eth1/53, Eth1/54, Po1, and Po501).
The total number of CRC errors does not add up to the total number of output errors on Ethernet1/1 - however, it's possible that counters were cleared on some interfaces in the recent past. To confirm that these CRC errors are incrementing and directly correlated with the output errors on Ethernet1/1, we need to observe non-zero counters multiple times within a short period of time.
To do this, can you provide the output of the below commands?
terminal width 511 ; terminal length 0 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero ; sleep 30 ; show interface counters errors non-zero
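Once you have the repeated snapshots, the tables can be diffed by hand, or with a short script. Here is a rough sketch (plain Python, assuming a table layout matching the output above) that parses each data row and reports which counters moved between two snapshots:

```python
# Rough sketch (assumed table layout): parse each data row of
# "show interface counters errors non-zero" and diff two snapshots
# to see which counters actually moved during the 30-second intervals.
import re

def parse_errors(text):
    """Return {port: [counter values]} for every Eth/Po data row."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Eth[\d/]+|Po\d+)\s+(.*)", line)
        if m:
            # "--" (unsupported counter) is treated as zero
            vals = [0 if v == "--" else int(v) for v in m.group(2).split()]
            rows.setdefault(m.group(1), []).extend(vals)
    return rows

def delta(before, after):
    """Per-port difference between two snapshots (after minus before)."""
    return {port: [b - a for a, b in zip(before.get(port, []), vals)]
            for port, vals in after.items()}

snap1 = "Eth1/1    0    35   3466730   35    0   302895"
snap2 = "Eth1/1    0    36   3466980   36    0   302895"
print(delta(parse_errors(snap1), parse_errors(snap2)))
# → {'Eth1/1': [0, 1, 250, 1, 0, 0]}
```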
Thank you!
-Christopher
12-09-2020 04:00 PM
Eh, this is usually a bad transceiver, or something like using an unsupported DAC length. In my experience, seeing xmit MAC errors without any other related errors usually points to that. The switch sees the error in the transceiver, but the frame is already out of the switch, so it can't drop it at that point. The other side of the link should be incrementing FCS or CRC errors if you have access to check it.
12-19-2020 12:00 PM
Oh, I forgot to mention: if the switch is in cut-through switching mode, input errors on other interfaces whose traffic egresses out the erroring interface will also increment this counter. To debug this, change the forwarding mode to store-and-forward and see if it still increments that counter. Doing this will also give you more insight into which interfaces are taking CRC errors on ingress. It might not be the ideal solution depending on your traffic environment, but it's one way to diagnose. I've noticed the counters are not always correct in cut-through mode, possibly due to software bugs. Maybe there is a way to see real framing errors even in cut-through mode with the new telemetry code, but I haven't seen one.
The issue is that the CRC/FCS is at the end of the Ethernet frame, while the header is at the beginning, and cut-through switching only looks at the header. The egress port then has to re-calculate the FCS before it transmits so it can append it to the end of the frame; this is where it notices something is wrong and increments the counter you are seeing. There are mechanisms to 'stomp' the CRC, where the switch purposely mangles the FCS on egress so the next switch the frame hits drops it. The downside of cut-through is that your NMS will report errors on every interface along the path until the frame is dropped, and it's hard to track where the corruption originates (that's why the stomp mechanism was put in place).
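To illustrate the stomping idea in code (illustration only, not what the ASIC actually does): the FCS is just a CRC-32 over the frame, and a stomped FCS is typically the computed CRC deliberately inverted, so no downstream receiver can ever validate it.

```python
# Illustration only (not ASIC code): the Ethernet FCS is a CRC-32 computed
# over the frame; "stomping" deliberately corrupts the FCS (commonly by
# inverting it) so downstream store-and-forward devices drop the frame.
import zlib

def fcs(frame: bytes) -> int:
    # Ethernet's FCS uses the same CRC-32 algorithm as zlib
    return zlib.crc32(frame) & 0xFFFFFFFF

def stomp(crc: int) -> int:
    # Bitwise inversion guarantees the FCS check fails downstream
    return crc ^ 0xFFFFFFFF

frame = bytes(60)           # minimal dummy payload for illustration
good = fcs(frame)
assert stomp(good) != good  # a stomped frame never validates downstream
```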
So, TL;DR: 99% of the time it's either a transceiver error, or corrupt frames getting passed through from other ports that are doing cut-through switching.