
Output errors on port Nexus 5020

JGFR
Level 1

Hello everyone,

 

We have just migrated our 1G CWDM network to 10G DWDM with Cisco Nexus 3000 and 5000.

I notice errors on a port-channel made up of two member ports that constantly carries about 6 Gbps of traffic: roughly 8,000 output errors per second on the port-channel and about 4,000 on each of the two members.

I increased the MTU end to end to 9216 following this Cisco document, https://www.cisco.com/c/en/us/support/docs/switches/nexus-5000-series-switches/112080-config-mtu-nexus.html, but that didn't change anything.
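For reference, on the 5000 that document raises the MTU globally through a network-qos policy, roughly like this (the policy name "jumbo" is just an example):

policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo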

The optical signal quality seems to be fine, between -15 and -16 dBm RX power.
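For reference, the Rx levels come from the transceiver DOM readout on each member, e.g.:

show interface ethernet 1/15 transceiver details
show interface ethernet 1/16 transceiver details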

9 Replies

Andrea Testino
Cisco Employee

Hi Jordan,

 

What kind of interfaces are you seeing these output errors on? Ethernet interfaces on the parent chassis, or interfaces off of a FEX perhaps?

 

Could you share the following, assuming these are interfaces on the parent chassis (the 5020 itself):

 

show hardware internal gatos all-ports | egrep "x/y|x/y"

Example on my device:

5020-A# show hardware internal gatos all-ports | egrep "1/3|1/4|1/38"
xgb1/3 |2  |0  |2  |b7  |en |up |1:2:2:f|2  |56 |0   |4  |1a002000|pass
xgb1/4 |3  |0  |3  |b7  |en |up |1:3:3:f|3  |57 |0   |6  |1a003000|pass
<snip>
xgb1/38|37 |9  |2  |b7  |en |up |1:2:2:f|2  |14 |11  |2  |1a025000|pass

debug hardware internal gatos clear-counters interrupt 
show hardware internal gatos asic 0 counters interrupt match err
show hardware internal gatos asic 9 counters interrupt match err
In my case it is 0 and 9, taken from the gatos instance column of the all-ports output above (xgb1/3 and xgb1/4 sit on instance 0, xgb1/38 on instance 9); please modify this number to the relevant value depending on which interfaces are impacted on your switch.

Thanks!


- Andrea, CCIE #56739 R&S

Hi

 

Thanks for your reply. I have these errors on the parent chassis.

 

NXXXXXX001# show hardware internal gatos all-ports | egrep "1/15|1/16"
xgb1/16|15 |1  |0  |b7  |en |up |0:0:0:f|0  |50 |1   |2  |1a00f000|pass
xgb1/15|14 |1  |1  |b7  |en |up |0:1:1:f|1  |51 |1   |0  |1a00e000|pass
NXXXXXX001# debug hardware internal gatos clear-counters interrupt
Done.
NXXXXXX001# show hardware internal gatos asic 1 counters interrupt match err

Gatos 1 interrupt statistics:
Interrupt name                                 |Count   |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err           |1       |0       |1       |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp           |1       |0       |1       |0
gat_fw0_INT_eg_pkt_err_ip_pyld_len_err         |1       |0       |1       |0
gat_fw1_INT_eg_parse_unexp_ipv4_ver_err        |1       |0       |1       |0
gat_fw1_INT_eg_parse_unexp_ipv4_hl_err         |1       |0       |1       |0
gat_fw1_INT_eg_parse_unexp_ipv4_csum_err       |1       |0       |1       |0
gat_fw1_INT_eg_pkt_err_cb_bm_eof_err           |1       |0       |1       |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp           |1       |0       |1       |0
gat_fw1_INT_eg_pkt_err_ip_pyld_len_err         |1       |0       |1       |0
gat_mm0_INT_rlp_tx_pkt_crc_err                 |1       |0       |1       |0
gat_mm1_INT_rlp_tx_pkt_crc_err                 |1       |0       |1       |0
Done.
NXXXXXX001#

Jordan,

 

Could you run "show hardware internal gatos asic 1 counters interrupt match err" again and post the output? Since the counters were cleared yesterday, I'm curious to see whether any of them have incremented drastically (assuming the output errors did as well).

 

Thanks!


- Andrea, CCIE #56739 R&S

Hello Andrea,

 

Here are the results this morning:

NETVELLAGG001# show hardware internal gatos asic 1 counters interrupt match err

Gatos 1 interrupt statistics:
Interrupt name                                 |Count   |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_parse_unexp_ipv4_ver_err        |34      |0       |4       |0
gat_fw0_INT_eg_parse_unexp_ipv4_hl_err         |d       |0       |1       |0
gat_fw0_INT_eg_parse_unexp_ipv4_csum_err       |5d4     |0       |4       |0
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err           |af7     |0       |3       |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp           |af7     |0       |3       |0
gat_fw0_INT_eg_pkt_err_e802_3_len_err          |2       |0       |2       |0
gat_fw0_INT_eg_pkt_err_ip_pyld_len_err         |af4     |0       |4       |0
gat_fw1_INT_eg_parse_unexp_ipv4_ver_err        |3e4     |0       |4       |0
gat_fw1_INT_eg_parse_unexp_ipv4_hl_err         |221     |0       |1       |0
gat_fw1_INT_eg_parse_unexp_ipv4_csum_err       |af3     |0       |3       |0
gat_fw1_INT_eg_pkt_err_cb_bm_eof_err           |af7     |0       |3       |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp           |af7     |0       |3       |0
gat_fw1_INT_eg_pkt_err_e802_3_len_err          |19      |0       |1       |0
gat_fw1_INT_eg_pkt_err_ip_pyld_len_err         |af6     |0       |2       |0
gat_mm0_INT_rlp_tx_pkt_crc_err                 |af7     |0       |3       |0
gat_mm1_INT_rlp_tx_pkt_crc_err                 |af7     |0       |3       |0

Jordan,

 

Looks like you may have some CRC errors crawling around. The Nexus 5Ks/6Ks are cut-through switches, so a frame that arrives with a bad CRC is stomped and forwarded rather than dropped; ingress CRCs on one port would show up as output errors on that port-channel, which is more than likely what is happening here. But let's verify.

 

Quoting your output:

NETVELLAGG001# show hardware internal gatos asic 1 counters interrupt match err
Gatos 1 interrupt statistics:
Interrupt name                                 |Count   |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_parse_unexp_ipv4_ver_err        |34      |0       |4       |0
gat_fw0_INT_eg_parse_unexp_ipv4_hl_err         |d       |0       |1       |0
gat_fw0_INT_eg_parse_unexp_ipv4_csum_err       |5d4     |0       |4       |0
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err           |af7     |0       |3       |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp           |af7     |0       |3       |0
<snip>

0xaf7 hex = 2807 decimal => about 2,807 CRCs at the time the command was run.

Could you do the following:

sh clock; clear counters interface all 

* Wait 1-2 minutes for the output errors to occur, then *

term width 511
sh clock ; sh int counter err | egrep "Port|--|\B [1-9]" | egrep -v "\ 0\ *--\ *0\ *0\ *0\ *0"

(The egreps just keep the header/separator lines plus any row with a non-zero counter, and drop the all-zero rows.)

Note: If you have FEX interfaces, this command may take a minute or two to complete. This is OK and does not impact the switch.

Thanks!

- Andrea, CCIE #56739 R&S

Hi,

Please see the output:
Fri Mar 2 15:08:17 UTC 2018
--------------------------------------------------------------------------------
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/17              17     110243          0     110260          0           0
mgmt0                --         --         --         --         --          --
--------------------------------------------------------------------------------
Port         Single-Col  Multi-Col   Late-Col  Exces-Col  Carri-Sen      Runts
--------------------------------------------------------------------------------
Eth1/17               0          0          0          0          0         17
mgmt0                --         --         --         --         --         --
--------------------------------------------------------------------------------
Port             Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
mgmt0                --          --          --          --          --         --

Jordan,

 

As you can see from the output, we are taking tons of CRCs in on Eth1/17 (FCS-Err/Rcv-Err) - Does this lead to another switch, or is this a host port?

 

If it's a host port, try swapping the fibre/cables and see if the CRCs persist. If they do, it could be a bad NIC on the end device.

 

If this leads to another Nexus 5K/6K, you can repeat the clearing of the counters and then run the long regex command I previously shared on that neighboring device to see which interface is the offender.

 

Thanks!

- Andrea, CCIE #56739 R&S

On this port I have an old Cisco Catalyst 6500 router.

Jordan,

 

Awesome! Thanks for checking. Could you swap the cable/transceiver on both ends of that connection (if there's a patch panel in between, you may have to try a different patch panel port as well) and see if the N5K keeps getting hit with thousands of CRCs?

 

As a quick test you could shut down Eth1/17 on the N5K and see if the output errors stop on your other port-channel - this assumes Eth1/17 isn't playing a vital role in the design.
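Something like this, with the port-channel number substituted for your own:

configure terminal
interface ethernet 1/17
  shutdown

...then watch whether the output errors stop incrementing with "show interface port-channel <po#> counters errors".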

 

If swapping the cable/transceiver does not resolve the CRCs, we can consider a bad port on the Cat6500, but that's the least likely possibility. It happens, though.

 

Thanks!

- Andrea, CCIE #56739 R&S