02-28-2018 01:49 AM - edited 03-05-2019 10:00 AM
Hello everyone,
We have just migrated our 1G CWDM network to 10G DWDM with Cisco Nexus 3000 and 5000.
I notice errors on a port-channel made up of two member ports that constantly carries about 6 Gbps of traffic: roughly 8,000 output errors per second on the port-channel and about 4,000 on each of the two members.
I increased the MTU end to end to 9216 using the following Cisco documentation, https://www.cisco.com/c/en/us/support/docs/switches/nexus-5000-series-switches/112080-config-mtu-nexus.html, but that didn't change anything.
The optical signal quality seems fine, between -15 and -16 dBm RX power.
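For reference, the change I applied follows the jumbo-frame approach from that document; it looks roughly like this (the policy name "jumbo" is just what I chose):

policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo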
02-28-2018 04:18 PM
Hi Jordan,
What kind of interfaces are you seeing these output errors on? Ethernet interfaces on the parent chassis, or interfaces off of a FEX perhaps?
Could you share the following assuming these are interfaces on the parent chassis (the 5020 itself):
show hardware internal gatos all-ports | egrep x/y|x/y

Example on my device:

5020-A# show hardware internal gatos all-ports | egrep 1/3|1/4|1/38
xgb1/3 |2 |0 |2 |b7 |en |up |1:2:2:f|2 |56 |0 |4 |1a002000|pass
xgb1/4 |3 |0 |3 |b7 |en |up |1:3:3:f|3 |57 |0 |6 |1a003000|pass
<snip>
xgb1/38|37 |9 |2 |b7 |en |up |1:2:2:f|2 |14 |11 |2 |1a025000|pass

debug hardware internal gatos clear-counters interrupt
show hardware internal gatos asic 0 counters interrupt match err
show hardware internal gatos asic 9 counters interrupt match err
In my case the gatos instances are 0 and 9, based on the ports I highlighted above. Please change that number to whichever instance serves the impacted interfaces on your switch.
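In other words, the general pattern is the following, where <instance> is a placeholder for the gatos ASIC number you read out of the all-ports output for your affected ports:

show hardware internal gatos all-ports | egrep x/y|x/y
debug hardware internal gatos clear-counters interrupt
show hardware internal gatos asic <instance> counters interrupt match err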
Thanks!
03-01-2018 03:25 AM
Hi
Thanks for your reply. I have these errors on the parent chassis.
NXXXXXX001# show hardware internal gatos all-ports | egrep 1/15|1/16
xgb1/16|15 |1 |0 |b7 |en |up |0:0:0:f|0 |50 |1 |2 |1a00f000|pass
xgb1/15|14 |1 |1 |b7 |en |up |0:1:1:f|1 |51 |1 |0 |1a00e000|pass
NXXXXXX001# debug hardware internal gatos clear-counters interrupt
Done.
NXXXXXX001# show hardware internal gatos asic 1 counters interrupt match err
Gatos 1 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |1 |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp |1 |0 |1 |0
gat_fw0_INT_eg_pkt_err_ip_pyld_len_err |1 |0 |1 |0
gat_fw1_INT_eg_parse_unexp_ipv4_ver_err |1 |0 |1 |0
gat_fw1_INT_eg_parse_unexp_ipv4_hl_err |1 |0 |1 |0
gat_fw1_INT_eg_parse_unexp_ipv4_csum_err |1 |0 |1 |0
gat_fw1_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |1 |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp |1 |0 |1 |0
gat_fw1_INT_eg_pkt_err_ip_pyld_len_err |1 |0 |1 |0
gat_mm0_INT_rlp_tx_pkt_crc_err |1 |0 |1 |0
gat_mm1_INT_rlp_tx_pkt_crc_err |1 |0 |1 |0
Done.
NXXXXXX001#
03-01-2018 04:30 AM
Jordan,
Could you run "show hardware internal gatos asic 1 counters interrupt match err" again and post the output? Since they were cleared yesterday, Im curious to see if any of the counters incremented drastically (assuming the output errors did as well)
Thanks!
03-02-2018 02:47 AM
Hello Andrea,
Here are the results this morning:

NETVELLAGG001# show hardware internal gatos asic 1 counters interrupt match err
Gatos 1 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_parse_unexp_ipv4_ver_err |34 |0 |4 |0
gat_fw0_INT_eg_parse_unexp_ipv4_hl_err |d |0 |1 |0
gat_fw0_INT_eg_parse_unexp_ipv4_csum_err |5d4 |0 |4 |0
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err |af7 |0 |3 |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp |af7 |0 |3 |0
gat_fw0_INT_eg_pkt_err_e802_3_len_err |2 |0 |2 |0
gat_fw0_INT_eg_pkt_err_ip_pyld_len_err |af4 |0 |4 |0
gat_fw1_INT_eg_parse_unexp_ipv4_ver_err |3e4 |0 |4 |0
gat_fw1_INT_eg_parse_unexp_ipv4_hl_err |221 |0 |1 |0
gat_fw1_INT_eg_parse_unexp_ipv4_csum_err |af3 |0 |3 |0
gat_fw1_INT_eg_pkt_err_cb_bm_eof_err |af7 |0 |3 |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp |af7 |0 |3 |0
gat_fw1_INT_eg_pkt_err_e802_3_len_err |19 |0 |1 |0
gat_fw1_INT_eg_pkt_err_ip_pyld_len_err |af6 |0 |2 |0
gat_mm0_INT_rlp_tx_pkt_crc_err |af7 |0 |3 |0
gat_mm1_INT_rlp_tx_pkt_crc_err |af7 |0 |3 |0
03-02-2018 05:53 AM
Jordan,
Looks like you may have some CRCs crawling around. The Nexus 5Ks/6Ks are cut-through switches, so they are more than likely forwarding those errored frames out that port-channel, but let's verify.
NETVELLAGG001# show hardware internal gatos asic 1 counters interrupt match err
Gatos 1 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_parse_unexp_ipv4_ver_err |34 |0 |4 |0
gat_fw0_INT_eg_parse_unexp_ipv4_hl_err |d |0 |1 |0
gat_fw0_INT_eg_parse_unexp_ipv4_csum_err |5d4 |0 |4 |0
gat_fw0_INT_eg_pkt_err_cb_bm_eof_err |af7 |0 |3 |0
gat_fw0_INT_eg_pkt_err_eth_crc_stomp |af7 |0 |3 |0
<snip>

0xaf7 in decimal is 2807, so roughly 2807 CRCs at the time the command was run.

Could you do the following:

sh clock; clear counters interface all
* Wait 1-2 minutes for the output errors to occur, then *
term width 511
sh clock ; sh int counter err | egrep "Port|--|\B [1-9]" | egrep -v "\ 0\ *--\ *0\ *0\ *0\ *0"

Note: If you have FEX interfaces, this command may take a minute or two. This is OK and is not impacting the switch.
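For clarity, those gatos counters are hexadecimal, so the CRC-stomp count works out as:

0xAF7 = (10 * 256) + (15 * 16) + 7 = 2560 + 240 + 7 = 2807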
Thanks!
03-02-2018 06:45 AM
03-02-2018 06:49 AM - edited 03-02-2018 06:49 AM
Jordan,
As you can see from the output, we are seeing tons of CRCs coming in on Eth1/17. Does this lead to another switch, or is this a host port?
If it's a host port, try swapping the fibre/cables and see if the CRCs persist. If they do, it could be a bad NIC on the end device.
If this leads to another Nexus 5K/6K, you can repeat the same steps on that neighboring device (clear the counters, then run the long egrep command I shared earlier, recapped below) to find the offending interface on that side.
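For convenience, the sequence to run on the neighbor is the same as above:

sh clock; clear counters interface all
* Wait 1-2 minutes *
term width 511
sh clock ; sh int counter err | egrep "Port|--|\B [1-9]" | egrep -v "\ 0\ *--\ *0\ *0\ *0\ *0"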
Thanks!
03-02-2018 06:54 AM
03-02-2018 07:05 AM
Jordan,
Awesome! Thanks for checking. Could you swap the cable/transceiver on both ends of that connection (if there's a patch panel in between, you may have to try a different patch panel port as well) and see if the N5K keeps getting hit with thousands of CRCs?
As a quick test you could shut down Eth1/17 on the N5K and see if the output errors stop on your other port-channel. This is assuming Eth1/17 isn't playing a vital role in the design.
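If you do try that, taking the port down is just the following ("no shutdown" brings it back once you've checked the counters):

configure terminal
interface ethernet 1/17
  shutdown
* check whether the output errors on the port-channel stop, then *
  no shutdown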
If swapping the cable/transceiver does not resolve the CRCs, we can consider a bad port on the Cat6500, but that is the least likely possibility. It does happen, though.
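Also worth a look while you have hands on the optics: the DOM readings on both ends (assuming the transceivers support digital diagnostics), for example:

show interface ethernet 1/17 transceiver details

That shows the TX/RX power for the link, so you can compare it against the -15 to -16 dBm RX you mentioned at the start of the thread.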
Thanks!