Nexus 5672 output (to fex) errors

from88
Level 4

Hello,

We're receiving lots of output errors.


As you can see, it's strange that the Xmit-Err (transmit) and IntMacRx-Er (receive) counters are the same.

 

show interface ethernet 1/38 counters errors
--------------------------------------------------------------------------------
Port       Align-Err     FCS-Err    Xmit-Err     Rcv-Err   UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/38            0           1     2519847           1           0           0
--------------------------------------------------------------------------------
Port      Single-Col   Multi-Col    Late-Col   Exces-Col   Carri-Sen       Runts
--------------------------------------------------------------------------------
Eth1/38            0           0           0           0           0           0
--------------------------------------------------------------------------------
Port          Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er  Symbol-Err
--------------------------------------------------------------------------------
Eth1/38            0          --           0           0     2519847           0
TEONET01B# show interface ethernet 1/38 counters detailed
Ethernet1/38
Rx Packets: 3449696305
Rx Unicast Packets: 3448862564
Rx Multicast Packets: 33897
Rx Broadcast Packets: 799843
Rx Bytes: 3606293562403
Tx Packets: 2507449425
Tx Unicast Packets: 2497426110
Tx Multicast Packets: 1925368
Tx Broadcast Packets: 5578100
Tx Bytes: 1365596235452
Input CRC Errors: 1
Input Errors: 1
Output Errors: 2519847
Output Errors: 2519847


That said, we could assume that some source is generating these errors and that they are being propagated because NX-OS uses cut-through switching.
But I can't find any host port that would generate such errors.

 

Device:
Nexus 5672UP
version 7.1(4)N1(1)


gabbyradu
Level 1
hello,
I have the same problem. Did you find the problem/solution?
thx

Peter Paluch
Cisco Employee

Hello,

Hmmm. It looks like the errored frames may be coming from the FEX and received by the switch.

Would you be so kind as to grab and post the outputs of the following commands?

terminal length 0
show fex
show interface fex-fabric
show hardware internal bigsur port e1/38

I would like to have a more in-depth look at the FEX type, and at the interface counters on the parent switch as recorded by the ASIC handling the port e1/38. The outputs will be fairly large, so it might be useful to attach them here as a separate text file.

Thank you!

Best regards,
Peter

Hi Peter,

I have the same problem, but with an N9K and a single-homed FEX.

I have attached the output of the commands you suggested. I would appreciate it if you could take a look and give me your thoughts.

Thx,

 

Hello,

I believe we need to check the FEX counters themselves - I still have doubts whether these are errored frames exiting the N9K or entering the N2K.

I would like to ask you for the outputs from the following commands (ignore any of them that are refused as unknown):

attach fex 103
dbgexec tib
fp
show oper
show sts
show stats all
show stats all all
show new_ints
<Ctrl><C>
exit

This will hopefully allow us to see the counters from the FEX side and understand what the source of those errors is.

Thank you!

Best regards,
Peter

Hi Peter,

 

Thank you for taking interest in my issue.

You will find attached the output from your commands.

 

Best regards,

Gabby

Hello Gabby,

You are very much welcome!

I have had a look at the interface counters on the FEX, and I can see RX CRC errors reported on the FEX uplinks - and all of these RX CRC errors are what we call stomped CRC errors.

Let me explain for a moment: The N9K-C93180YC-EX switch is a cut-through switch, meaning it starts forwarding a frame before it is fully received. If any corruption is detected while the frame is already being forwarded, the switch can no longer abort the transmission. Instead, the switch changes the frame's CRC to a special value called the stomped CRC. This special value indicates to the next receiver that the frame was already corrupt when it arrived at the switch in the first place, and did not get corrupted on the link toward that receiver.

There are multiple reasons why the N9K can declare the frame as corrupt, and stomp its CRC, and some of them are:

  • The frame's initial CRC already did not match the computed CRC (meaning the frame was corrupt during arrival)
  • The frame's initial CRC was already stomped (meaning that the corruption occurred somewhere upstream, and the immediately preceding switch is also a cut-through switch performing the stomping)
  • The frame's size exceeds the ingress interface's MTU
  • The frame uses the 802.2 LLC format, and the Length field value does not match the real size of the frame

These errored frames reported as Rx frames with stomped CRC on the FEX are the ones that are causing the output errors shown on your parent N9K switch.
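To make the stomping idea a bit more concrete, here is a minimal Python sketch. It is an illustration only: it assumes that the stomped value is the bitwise inverse of the correct CRC-32, which is a common convention but is not stated anywhere in this thread, and the function names (fcs, stomp, classify, cut_through_forward) are made up for this example.

```python
import zlib

def fcs(frame_without_fcs: bytes) -> int:
    # zlib.crc32 implements the same CRC-32 algorithm used for the Ethernet FCS.
    return zlib.crc32(frame_without_fcs) & 0xFFFFFFFF

def stomp(fcs_value: int) -> int:
    # ASSUMPTION: stomping is modelled as inverting every bit of the FCS.
    return fcs_value ^ 0xFFFFFFFF

def classify(frame_without_fcs: bytes, received_fcs: int) -> str:
    """Classify a received frame the way a receiver's MAC might."""
    good = fcs(frame_without_fcs)
    if received_fcs == good:
        return "good"
    if received_fcs == stomp(good):
        return "stomped"   # corrupted upstream, not on this link
    return "corrupt"       # corrupted on the link into this receiver

def cut_through_forward(frame_without_fcs: bytes, received_fcs: int) -> int:
    """Return the FCS a cut-through switch would send out.

    The frame body is already on the wire by the time the FCS is checked,
    so a bad frame cannot be dropped; only its FCS can be marked.
    """
    if classify(frame_without_fcs, received_fcs) == "good":
        return received_fcs
    return stomp(fcs(frame_without_fcs))

original = b"example payload" * 4
damaged = b"X" + original[1:]  # one byte corrupted in transit

# First hop sees a plain CRC error and stomps the FCS on egress.
sent_fcs = cut_through_forward(damaged, fcs(original))
print(classify(damaged, fcs(original)))  # corrupt
print(classify(damaged, sent_fcs))       # stomped
```

The point of the sketch is the distinction in classify: the first switch sees an ordinary CRC error and counts it as an input error, while every device after it sees the stomped value and knows that the damage happened further upstream.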

The stomped errors did not increase during your data collection, and so I assume that they only increase relatively infrequently. My suggestion is to carefully go through the output of show interface counters errors and check for any input errors - especially input CRC errors, symbol errors, alignment errors, and giants. If such frames were received by your N9K and happened to be forwarded to the FEX, they would be counted as output errors on the fex-fabric ports, and as stomped CRC errors on the FEX.
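If there are many ports to go through, the check can be scripted. The following is a rough Python sketch that parses the text of the counters error output (in the layout pasted at the top of this thread) and lists ports with non-zero FCS, symbol, alignment, or giant counters. The function names and the way the text is obtained are assumptions for illustration; the parsing is based only on the sample shown in this thread and may need adjusting for other platforms or releases.

```python
SAMPLE = """\
--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/38 0 1 2519847 1 0 0
--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth1/38 0 0 0 0 0 0
--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth1/38 0 -- 0 0 2519847 0
"""

# Input-side counters named in the post above.
INPUT_COUNTERS = ("FCS-Err", "Symbol-Err", "Align-Err", "Giants")

def parse_error_counters(text):
    """Return {port: {counter_name: int or None}} from the CLI text."""
    counters = {}
    columns = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or set(line.strip()) == {"-"}:
            continue  # blank line or dashed separator
        if fields[0] == "Port":
            columns = fields[1:]  # header row starts a new block of columns
            continue
        values = fields[1:]
        if not columns or len(values) != len(columns):
            continue  # not a data row for the current header
        if not all(v.isdigit() or v == "--" for v in values):
            continue  # e.g. a prompt or command echo
        row = counters.setdefault(fields[0], {})
        for name, value in zip(columns, values):
            row[name] = int(value) if value.isdigit() else None  # "--" -> None
    return counters

def ports_with_input_errors(counters):
    """Keep only ports where at least one input-side counter is non-zero."""
    flagged = {}
    for port, values in counters.items():
        nonzero = {n: values[n] for n in INPUT_COUNTERS if values.get(n)}
        if nonzero:
            flagged[port] = nonzero
    return flagged

print(ports_with_input_errors(parse_error_counters(SAMPLE)))
# {'Eth1/38': {'FCS-Err': 1}}
```

Run against the output for all interfaces, this would narrow the search down to the ingress ports worth a closer look.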

The next steps would then depend on the nature of the input errors. If they are related to giants, then the port's MTU and the attached device's MTU have to be brought into agreement. If it was an input CRC error, symbol error, or an alignment error, you would need to perform physical-layer troubleshooting on that port. Corruption in the Length field of 802.2 LLC frames is impossible to reliably prove without performing a packet capture; however, such frames would also cause the IntMacRx-Er counter on the ingress interface to increase. I recall once seeing the IntMacRx-Er counter increase when an Alcatel-Lucent switch connected to this port was sending out frames with the Alcatel-Lucent Mapping Adjacency Protocol (AMAP). These frames originated by the Alcatel switch had an incorrect value put into their Length field, and caused the IntMacRx-Er counter to increase.
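For anyone who does end up with a packet capture, the Length field check can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the frame is untagged, the FCS has already been stripped by the capture, and the function name length_field_status is hypothetical. Exactly which mismatches a given ASIC counts as IntMacRx-Er is not something this sketch claims to reproduce.

```python
import struct

ETH_HEADER_LEN = 14        # destination MAC + source MAC + Type/Length
MIN_PAYLOAD_LEN = 46       # shorter payloads are padded up to this size
MAX_LENGTH_VALUE = 1500    # values up to 1500 are lengths (802.3)
MIN_ETHERTYPE = 0x0600     # values from 1536 are EtherTypes (Ethernet II)

def length_field_status(frame: bytes) -> str:
    """Check the Type/Length field of an untagged frame without FCS."""
    if len(frame) < ETH_HEADER_LEN:
        return "too short"
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    if type_or_len >= MIN_ETHERTYPE:
        return "ethertype"          # not a length, nothing to compare
    if type_or_len > MAX_LENGTH_VALUE:
        return "undefined"          # 1501..1535 is neither
    payload_len = len(frame) - ETH_HEADER_LEN
    if type_or_len == payload_len:
        return "length ok"
    if type_or_len < MIN_PAYLOAD_LEN and payload_len == MIN_PAYLOAD_LEN:
        return "length ok"          # short payload plus padding
    return "length mismatch"

macs = bytes(12)  # placeholder destination and source addresses
print(length_field_status(macs + struct.pack("!H", 100) + bytes(100)))
print(length_field_status(macs + struct.pack("!H", 100) + bytes(60)))
```

The first call describes a consistent 802.3 frame; the second claims 100 bytes of payload but carries only 60, which is the kind of frame described in the AMAP example above.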

Either way, please check the input error counters on all your interfaces. We will take it further based on your findings.

Thank you!

Best regards,
Peter

Hi Peter,

 

I must say that you are awesome :).

Thank you a lot for the detailed explanation. It was more than helpful. 

I checked all the interface counters on the N9K for input errors and jumbo packets, and I found that many interfaces had an MTU of 1500 and a lot of RX/TX jumbo packets.

 

Thank you again for your help!

 

Gabby
