11-01-2019 09:19 AM
Here's one for you guys to tinker with:
I recently replaced a 3750X in our datacenter with a 3850. The switch in question is connected to a 7k via 2 trunk ports (Te1/1/3-4) configured as a port-channel. The second port (Te1/1/4) is showing numerous input and CRC errors, with reliability fluctuating somewhere between 230 and 250. I've swapped cables (literally swapped the fiber between Te1/1/3 and 1/1/4, both in parallel and crossing over), switched out multiple SFPs, even replaced the network module. I considered the possibility that the port-channel itself desync'd when I swapped devices, so I shut the faulty ports and rebuilt the port-channel from scratch. A look at sho controllers on the 3850 reveals a combination of SymbolErr and FcsErr frames (about 1 Symbol per every 6 Fcs), with no collisions. The 7k displays no errors.
Any ideas before I start building another 3850 to replace this one?
11-04-2019 05:07 AM
So it was definitely something with the 3850 itself, either hardware or software, I don't know which. I came in early this morning and the fault was still there so I swapped it out with another switch from storage. Interesting point, the link was flapping constantly overnight (and, I assume, over the weekend), which leads me to believe that the keepalive packets may have been at least partly to blame for the input and CRC errors? We had no issues with the link dropping when the interface was passing production traffic.
Anyway, thanks to everyone for your input on the matter. I'll just chalk this one up to being a ghost in the machine.
11-01-2019 09:29 AM
I would start with the patch cable and with re-seating the SFPs.
I'd also like to see the config and the show output (since you mentioned reliability between 230 and 250).
Please run show controller ten x/x on the connected interfaces on both sides so we can compare.
11-01-2019 10:19 AM
On the 3850:
sho int t1/1/4
TenGigabitEthernet1/1/4 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is XXXX.XXXX.XXXX (bia XXXX.XXXX.XXXX)
MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
reliability 245/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Auto-duplex, Auto-speed, link type is auto, media type is SFP-10GBase-SR
input flow-control is on, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:11, output hang never
Last clearing of "show interface" counters 03:37:29
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 88000 bits/sec, 28 packets/sec
5 minute output rate 76000 bits/sec, 103 packets/sec
445802 packets input, 149529421 bytes, 0 no buffer
Received 234109 broadcasts (152246 multicasts)
0 runts, 0 giants, 0 throttles
77397 input errors, 65569 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 152246 multicast, 0 pause input
0 input packets with dribble condition detected
1194737 packets output, 203413847 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
sho controll e t1/1/4
Transmit TenGigabitEthernet1/1/4 Receive
1201731303 Total bytes 1057111238 Total bytes
9109580 Unicast frames 944941 Unicast frames
1195755314 Unicast bytes 886849428 Unicast bytes
50145 Multicast frames 1329693 Multicast frames
5975649 Multicast bytes 108151039 Multicast bytes
5 Broadcast frames 684291 Broadcast frames
340 Broadcast bytes 62110771 Broadcast bytes
0 System FCS error frames 0 IpgViolation frames
7 MacUnderrun frames 0 MacOverrun frames
0 Pause frames 0 Pause frames
0 Cos 0 Pause frames 0 Cos 0 Pause frames
0 Cos 1 Pause frames 0 Cos 1 Pause frames
0 Cos 2 Pause frames 0 Cos 2 Pause frames
0 Cos 3 Pause frames 0 Cos 3 Pause frames
0 Cos 4 Pause frames 0 Cos 4 Pause frames
0 Cos 5 Pause frames 0 Cos 5 Pause frames
0 Cos 6 Pause frames 0 Cos 6 Pause frames
0 Cos 7 Pause frames 0 Cos 7 Pause frames
0 Oam frames 0 OamProcessed frames
0 Oam frames 0 OamDropped frames
3478 Minimum size frames 46569 Minimum size frames
8215756 65 to 127 byte frames 2568356 65 to 127 byte frames
346748 128 to 255 byte frames 185193 128 to 255 byte frames
163730 256 to 511 byte frames 214853 256 to 511 byte frames
126922 512 to 1023 byte frames 1000076 512 to 1023 byte frames
301029 1024 to 1518 byte frames 263090 1024 to 1518 byte frames
2060 1519 to 2047 byte frames 217276 1519 to 2047 byte frames
0 2048 to 4095 byte frames 2265 2048 to 4095 byte frames
0 4096 to 8191 byte frames 24 4096 to 8191 byte frames
0 8192 to 16383 byte frames 0 8192 to 16383 byte frames
0 16384 to 32767 byte frame 0 16384 to 32767 byte frame
0 > 32768 byte frames 0 > 32768 byte frames
0 Late collision frames 99105 SymbolErr frames
0 Excess Defer frames 0 Collision fragments
0 Good (1 coll) frames 0 ValidUnderSize frames
0 Good (>1 coll) frames 0 InvalidOverSize frames
0 Deferred frames 0 ValidOverSize frames
0 Gold frames dropped 539769 FcsErr frames
0 Gold frames truncated
0 Gold frames successful
0 1 collision frames
0 2 collision frames
0 3 collision frames
0 4 collision frames
0 5 collision frames
0 6 collision frames
0 7 collision frames
0 8 collision frames
0 9 collision frames
0 10 collision frames
0 11 collision frames
0 12 collision frames
0 13 collision frames
0 14 collision frames
0 15 collision frames
0 Excess collision frames
LAST UPDATE 776 msecs AGO
sho run int t1/1/4
interface TenGigabitEthernet1/1/4
switchport trunk native vlan 999
switchport mode trunk
channel-group 11 mode active
On the 7009:
sho int e6/30
Ethernet6/30 is up
admin state is up, Dedicated Interface
Belongs to Po11
Hardware: 1000/10000 Ethernet, address: XXXX.XXXX.XXXX (bia XXXX.XXXX.XXXX)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is p2p
Port mode is trunk
full-duplex, 10Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on
Input flow-control is off, output flow-control is off
Auto-mdix is turned on
Rate mode is dedicated
Switchport monitor is off
Ethertype is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 00:04:14
Last clearing of "show interface" counters 03:21:38
19 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 13048 bits/sec, 15 packets/sec
30 seconds output rate 68536 bits/sec 29 packets/sec
input rate 13.05 Kbps, 15pps; output rate 102.93 Kbps, 24 pps
RX
1126152 unicast packets 5347 multicast packets 91821 broadcast packets
567001 output packets 167062101 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC/FCS 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input disregard
0 Rx pause
TX
291096 unicast packets 184123 multicast packets 91821 broadcast packets
567001 output packets 167062101 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
sho run int e6/30
interface Ethernet6/30
switchport
switchport mode trunk
switchport trunk native vlan 999
channel-group 11 mode active
no shutdown
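To put the 3850 counters above in proportion, here is a minimal Python sketch that pulls the figures out of the pasted output and turns them into rates. The `counter` helper is a hypothetical name used only for illustration, and using "packets input" as the denominator is an assumption (it is not clear from the output alone whether errored frames are included in that count). The show interface and show controllers counters may also cover different time windows, so the two sets of numbers are not mixed.

```python
import re

# Counter lines copied from the "sho int t1/1/4" output above.
SHOW_INT = """\
445802 packets input, 149529421 bytes, 0 no buffer
77397 input errors, 65569 CRC, 0 frame, 0 overrun, 0 ignored
"""


def counter(text, label):
    """Return the integer that immediately precedes `label` in the CLI text."""
    match = re.search(r"(\d+) " + re.escape(label), text)
    if match is None:
        raise ValueError(f"counter {label!r} not found")
    return int(match.group(1))


packets_in = counter(SHOW_INT, "packets input")
input_errors = counter(SHOW_INT, "input errors")
crc = counter(SHOW_INT, "CRC")

# Assumption: "packets input" is a reasonable denominator for an error rate.
input_error_pct = 100 * input_errors / packets_in
crc_pct = 100 * crc / packets_in

# "reliability 245/255" from the same output, expressed as a percentage.
reliability_pct = 100 * 245 / 255

# SymbolErr and FcsErr frames from the "sho controll e t1/1/4" output.
symbol_err, fcs_err = 99105, 539769
fcs_per_symbol = fcs_err / symbol_err

print(f"input errors: {input_error_pct:.1f}% of packets input")
print(f"CRC errors:   {crc_pct:.1f}% of packets input")
print(f"reliability:  {reliability_pct:.1f}%")
print(f"FcsErr per SymbolErr: {fcs_per_symbol:.1f}")
```

On these figures roughly one input packet in six is counted as errored on Te1/1/4, while the 7009 side shows zero CRC/FCS errors over a similar period.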
11-01-2019 09:34 AM
Is it just connected via fiber jumpers, or are you using "facility" fiber to get from a closet somewhere in the building back to the core location? If you're using the facility fiber then you could have a marginal strand. If you have extra pairs, try moving the link to a different pair. Also, if it is 10gig the fiber should be OM3 50 micron. If it is 62.5 micron then you'd need the LRM SFPs.
Of course, if you're just using fiber jumpers then it sounds like you've covered everything.
11-01-2019 10:28 AM
It's 50 micron fiber running through a patch panel between two racks. That said, as I previously mentioned, the problem persists on T1/1/4, even if I swap the cables between the two trunk ports. If it were the cable, then logically the fault would shift to the other interface. T1/1/3 is working perfectly.
11-01-2019 03:54 PM
You have swapped the cables and tested. Have you tried a new SFP?
11-01-2019 09:36 AM - edited 11-01-2019 09:38 AM
Hello,
you might be hitting the bug below:
CRC Errors on Uplink interface of WS-C3650 and WS-C3850 switches.
CSCvc44041
Description
Symptom:
CRC errors may increment on the 10g uplink port or a front panel 10g port (if your switch supports it) that are not related to actual layer 1 issues. The interface must be a native 10g interface but can be running any speed (such as 1g)
Conditions:
A small number of unique file types being transmitted through the switch may increment the CRC counter. This is seen on both 3850 and 3650 for native 10g interfaces only. 1g interfaces are not impacted.
Seen on 16.3.1 and 16.3.2, as well as the 3.7.x train.
Workaround:
No workarounds.
Further Problem Description:
If you have CRCs on the interface, there is no easy way to tell if they are the result of a genuine cable issue or this condition. Generally speaking, real CRC errors will increment at a rate proportional to the volume of traffic on the interface.
CRCs from this condition may not increment for a long period of time, and then suddenly start incrementing when a file transfer is taking place.
If historical interface CRC monitoring is available, CRCs that increment in a stepping motion (if graphed out) may be this issue. Slow or arching lines (if graphed out) may point to a legitimate layer 1 issue.
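The distinction described above (CRCs that track traffic volume versus CRCs that jump in steps) can be roughly checked against polled counter history. The following is only a sketch under stated assumptions: the function name `crc_growth_pattern`, the sample data, and the 0.5 cutoff are all hypothetical and not taken from the bug notes, and counter clears or wraps between polls are not handled.

```python
from statistics import mean, pstdev


def crc_growth_pattern(packets, crc, max_cv=0.5):
    """Classify how a CRC counter grows relative to input packets.

    `packets` and `crc` are cumulative counter samples taken at the same
    polling times. For each interval the CRC-per-packet ratio is computed;
    if that ratio stays roughly constant the errors track traffic volume,
    and if it is mostly zero with occasional bursts the growth is stepped.
    `max_cv` (coefficient of variation cutoff) is an arbitrary assumption.
    """
    ratios = []
    for i in range(1, len(packets)):
        d_pkts = packets[i] - packets[i - 1]
        d_crc = crc[i] - crc[i - 1]
        if d_pkts > 0:  # skip idle intervals to avoid dividing by zero
            ratios.append(d_crc / d_pkts)
    if not ratios or mean(ratios) == 0:
        return "no CRC growth"
    cv = pstdev(ratios) / mean(ratios)
    return "tracks traffic" if cv <= max_cv else "stepped"


# Hypothetical samples: errors rise in step with traffic in every interval.
steady = crc_growth_pattern([0, 1000, 3000, 3500, 6000], [0, 10, 30, 35, 60])

# Hypothetical samples: nothing for a while, then one sudden jump.
burst = crc_growth_pattern(
    [0, 1000, 2000, 3000, 4000, 5000], [0, 0, 0, 400, 400, 400]
)

print(steady)
print(burst)
```

A "tracks traffic" result would point toward a genuine layer 1 problem and a "stepped" result toward the condition in the bug, but with only a few samples this is a hint rather than a diagnosis.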
11-01-2019 10:23 AM
I'm running Everest-16.6.6
11-01-2019 03:49 PM
11-02-2019 07:10 PM
I’m not sure how the SFP on the 7k could be the culprit, because the fault persists on T1/1/4 regardless of which port it’s connected to. E6/30 is just the original interface; when I swap cables, E5/30 becomes the source instead, with no change to the error rate.
...Regardless, I’ve scheduled a reload on the 3850 for early Monday morning as a last ditch attempt at clearing the issue.
11-01-2019 10:01 AM