
Nexus 5K massive interface/CRC errors

meieltsptx
Level 1

Hello,

I'm having some issues on a Nexus 5K with storage appliances.

The whole switch is getting errors, but the main problem, I think, is on the fiber towards the datacenter and on the two storage appliances.

A VM in the Cisco UCS (see attachment) is trying to copy from Stor1 to Stor2, so the UCS is requesting the traffic and then sending the same traffic back over the same fiber channel from the datacenter to the office. I'm getting loads and loads of CRC errors and input errors on the red switch (datacenter port). The datacenter switch shows NO errors.


RX
7485577331 unicast packets 240182380 multicast packets 8779090 broadcast packets
7735003673 input packets 8305417669706 bytes
5113054770 jumbo packets 0 storm suppression packets
0 runts 0 giants 464872 CRC 0 no buffer
464872 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
15134909252 unicast packets 144902 multicast packets 217191 broadcast packets
15135271345 output packets 21450829077674 bytes
13785569567 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause

On both switch ports towards the storage appliances, I receive a lot of output errors.

RX
1930336700 unicast packets 10 multicast packets 3313 broadcast packets
1930340023 input packets 2675456250533 bytes
1724901392 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
1537143681 unicast packets 357350 multicast packets 382784 broadcast packets
1538005196 output packets 2020346047476 bytes
1295596131 jumbo packets
121381 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
0 interface resets

In addition, ALL interfaces on the red switch are getting a lot of errors; see the output below:

sw0# show interface | i error
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
7953 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
5530 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
121536 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
4957 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
106 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
78 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
616 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
31658 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
28663 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
10728 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
10728 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
112980 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
67163 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
12317 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
10729 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
10 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
1 output errors 0 collision 0 deferred 0 late collision
466410 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
115805 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
5757 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
13485 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
31669 output errors 0 collision 0 deferred 0 late collision
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
28665 output errors 0 collision 0 deferred 0 late collision

So I'm wondering what the issue could be here. I can't easily check the cables, especially the one towards the datacenter: it's a single link, so I can't detach it.

The speed/duplex settings on the storage devices are AUTO. On the switch they are 10G/full duplex.

These are also hard to change (to fixed, or both to auto) without a maintenance window because of possible network hiccups.
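For reference, a minimal sketch of what hard-setting the speed would look like during the window (Ethernet1/10 is a placeholder; substitute the real storage-facing port):

sw0# configure terminal
sw0(config)# interface ethernet 1/10
sw0(config-if)# speed 10000
sw0(config-if)# shutdown
sw0(config-if)# no shutdown
sw0(config-if)# end

Worth noting: 10G fiber links don't negotiate speed/duplex the way 100M/1G copper does, and 10G is always full duplex, so a classic duplex mismatch is an unlikely cause here.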

I was thinking: maybe the switch itself has problems, since all interfaces are generating errors. Has anyone had similar problems, or does anyone have an idea of what may be going on here?

We will plan a maintenance window to check and clean the cables and to change the speed/duplex settings for testing. But could it also be a problem in the switch itself, given that ALL interfaces are having problems?

Software version: 5.2(1)N1(7)

Thanks in advance.

8 Replies

pdub206
Level 1

It may be good to also check the detailed version of your error counters to find out what type of errors you are seeing.

Try show int eth x/y counters detailed all
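For example (Ethernet1/1 is a placeholder; use the datacenter-facing port), plus the switch-wide per-port error summary:

sw0# show interface ethernet 1/1 counters detailed all
sw0# show interface counters errors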

Throwing packets since 2012

  All Port Counters:
0. InPackets = 8580659635
1. InOctets = 9106076378825
2. InUcastPkts = 8267468591
3. InMcastPkts = 302050032
4. InBcastPkts = 10623961
5. InJumboPkts = 5587949439
6. StormSuppressPkts = 0
7. OutPackets = 16712156211
8. OutOctets = 23640636566339
9. OutUcastPkts = 16711679170
10. OutMcastPkts = 196322
11. OutBcastPkts = 280719
12. OutJumboPkts = 15185168925
13. rxHCPkts64Octets = 107271997
14. rxHCPkts65to127Octets = 2013223481
15. rxHCPkts128to255Octets = 467848282
16. rxHCPkts256to511Octets = 115297663
17. rxHCpkts512to1023Octets = 109259029
18. rxHCpkts1024to1518Octets = 179809744
19. rxHCpkts1519to1548Octets = 0
20. txHCPkts64Octets = 100532608
21. txHCPkts65to127Octets = 691635067
22. txHCPkts128to255Octets = 239252823
23. txHCPkts256to511Octets = 44116301
24. txHCpkts512to1023Octets = 317144019
25. txHCpkts1024to1518Octets = 134306468
26. txHCpkts1519to1548Octets = 0
27. ShortFrames = 0
28. Collisions = 0
29. SingleCol = 0
30. MultiCol = 0
31. LateCol = 0
32. ExcessiveCol = 0
33. LostCarrier = 0
34. NoCarrier = 0
35. Runts = 0
36. Giants = 0
37. InErrors = 517051
38. OutErrors = 0
39. InputDiscards = 0
40. BadEtypeDrops = 0
41. IfDownDrops = 0
42. InUnknownProtos = 0
43. txErrors = 0
44. rxCRC = 517051
45. Symbol = 0
46. txDropped = 0
47. TrunkFramesTx = 16712132625
48. TrunkFramesRx = 8579916571
49. WrongEncap = 0
50. Babbles = 0
51. Watchdogs = 0
52. ECC = 0
53. Overruns = 0
54. Underruns = 0
55. Dribbles = 0
56. Deferred = 0
57. Jabbers = 0
58. NoBuffer = 0
59. Ignored = 0
60. bpduOutLost = 0
61. cos0OutLost = 0
62. cos1OutLost = 0
63. cos2OutLost = 0
64. cos3OutLost = 0
65. cos4OutLost = 0
66. cos5OutLost = 0
67. cos6OutLost = 0
68. cos7OutLost = 0
69. RxPause = 0
70. TxPause = 0
71. Resets = 0
72. SQETest = 0
73. InLayer3Routed = 0
74. InLayer3RoutedOctets = 0
75. OutLayer3Routed = 0
76. OutLayer3RoutedOctets = 0
77. OutLayer3Unicast = 0
78. OutLayer3UnicastOctets = 0
79. OutLayer3Multicast = 0
80. OutLayer3MulticastOctets = 0
81. InLayer3Unicast = 0
82. InLayer3UnicastOctets = 0
83. InLayer3Multicast = 0
84. InLayer3MulticastOctets = 0
85. InLayer3AverageOctets = 0
86. InLayer3AveragePackets = 0
87. OutLayer3AverageOctets = 0
88. OutLayer3AveragePackets = 0

Above is the relevant info, I guess. Nothing new when I look at counters 37 (InErrors) and 44 (rxCRC)... anything strange about the rest?

I'm guessing you have a port-channel between your two switches, right? If so, you can silently remove one of the member ports by shutting down the link on both sides.  The port-channel should continue to operate as normal and there should be no downtime.  The same could be done with a server if it is dual-homed to the same switch.

In this way, I would proceed to swap in new SFPs on either side, bring the ports back up, and verify whether anything has changed (clear your counters beforehand). If not, try replacing the fiber cabling as well, using the same shut/no shut procedure.
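As a rough sketch, the per-member procedure could look like this (Ethernet1/5 stands in for the member port; repeat the same steps on the peer switch):

sw0# show port-channel summary
sw0# configure terminal
sw0(config)# interface ethernet 1/5
sw0(config-if)# shutdown
! swap the SFP and/or fiber on both ends while the port is down
sw0(config-if)# no shutdown
sw0(config-if)# end
sw0# clear counters interface ethernet 1/5
sw0# show interface ethernet 1/5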

If you can validate that your layer 1 medium is fine, or it shows the errors no matter what you swap, I'd open a case with Cisco. At that point it's more likely the switch itself has issues, but it's better to open the case with a lot of troubleshooting already complete, to help the TAC engineer quickly diagnose your problem.

Throwing packets since 2012

Thanks, but the fiber to the DC is a direct single link. :(

I guess we'll have to bring it down in a maintenance window and check whether L1 is indeed OK. That's what we're planning to do.

Since all the interfaces are having trouble, I was hoping to find someone who has encountered similar problems.

Hi,

I am also having the same problem. I have replaced all the fiber and SFPs on both sides, but I am still getting errors on all the interfaces.

Hello,

If you have already replaced the fiber cables and transceivers, try bouncing the port: shut it, wait about 10 seconds, then no shut the interface. That should reset the link; afterwards, monitor the interface every 20 minutes to check for errors. Try that.
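Roughly (Ethernet1/5 being a placeholder port):

sw0(config)# interface ethernet 1/5
sw0(config-if)# shutdown
! wait about 10 seconds
sw0(config-if)# no shutdown

Then clear the counters (clear counters interface ethernet 1/5) and watch the CRC and error lines of show interface at your monitoring interval.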

Vishal Pathak
Level 1

The CRC errors are increasing in the RX (receive) direction on the red Nexus switch, so the frames reaching the red Nexus switch are themselves corrupted. Remember that the Nexus (at 10 Gbps ingress bandwidth) is a cut-through switch, which means it starts forwarding a frame as soon as it has enough information to do so.

So, in this case, even before the red Nexus switch checks the FCS (at the end of the frame received from the datacenter Nexus switch), it has started forwarding it. It therefore stomps the frame (corrupts the CRC) so that store-and-forward downstream switches will drop it. The TX CRC/output error counters increase because the switch is sending out frames with a corrupt CRC.

We need to trace the source of these CRC errors. Now that it is clear the datacenter Nexus switch is sending out the corrupt frames, we need to find out whether that Nexus is corrupting the frames itself or whether it is also a victim, like the red Nexus switch.

You will need to check the interface connected to the UCS. If you find receive CRC errors on that interface, it would indicate that the datacenter Nexus switch is also receiving corrupt frames.
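A quick way to walk the path hop by hop is the per-port error table; on the datacenter switch, check the UCS-facing port (Ethernet1/20 below is a placeholder) for receive-side CRC:

dc-sw# show interface counters errors
dc-sw# show interface ethernet 1/20 | include CRC

The first device whose ingress ports are clean while devices downstream of it still see CRC errors is the likely point where the corruption is introduced.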

Locating the source of the CRC errors is very important for applying the right remedial action. If we don't trace the errors back to their source, replacing physical links and SFPs elsewhere will not solve the problem.

CRC errors are indicative of a layer 1 issue. If there is a patch panel in between, try connecting the devices directly. Clear the interface counters and then monitor.
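Concretely, clearing everything and re-using the filter from the original post makes it easy to see which counters are still actively incrementing:

sw0# clear counters interface all
sw0# show interface | include error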

As a corrective measure, the physical link needs to be replaced first. If that does not remove the errors, the SFPs need to be changed next.

Hope that helps.

lalit.6789
Level 1

Hi,

Did you find a solution for this issue? I am also having the same issue in our environment.

If so, please let me know as well.
