08-16-2011 06:19 PM - edited 03-07-2019 01:44 AM
I am having an issue with intersite links via a given service provider. The affected links are point-to-point Ethernet over a DWDM mux. Each end of a given link is a 3560 switch running IP Services.
Whenever the link comes under 10% or more load, packets start to get lost. We can replicate the fault at any time by running a decent-sized file transfer and then pinging across the link. At low utilisation the link experiences no problems.
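To put a number on the loss during a test like the one described above, the summary line of an IOS ping can be parsed. This is only a minimal sketch: the function name and the two sample lines are made up for illustration and are not output from these switches.

```python
import re

# IOS ends a ping with a summary line such as:
#   Success rate is 99 percent (990/1000), round-trip min/avg/max = 1/2/9 ms
SUMMARY = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")

def ping_loss_percent(output: str) -> float:
    """Return packet loss in percent from an IOS ping summary line."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    received, sent = int(match.group(2)), int(match.group(3))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent

# Hypothetical sample lines (not taken from the thread).
idle = "Success rate is 100 percent (1000/1000), round-trip min/avg/max = 1/1/4 ms"
loaded = "Success rate is 99 percent (990/1000), round-trip min/avg/max = 1/2/9 ms"

print(ping_loss_percent(idle))    # 0.0
print(ping_loss_percent(loaded))  # 1.0
```

Running the same ping with a large repeat count once with the link idle and once during the file transfer gives two figures that can be compared directly.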
This problem seems to occur on all links from this service provider (with different end sites and end devices). However, we cannot replicate the fault on our links from a different service provider: those get the full 1Gbps transfer to the same devices with zero errors. Initially I thought it might be a CPU overload issue, but the equipment is barely utilised even when we run our file transfer test, and you'd expect the device to show the same problem on the other service provider if it were a CPU issue.
We've checked the patches and cables. I've checked the CPU usage during the problem and it doesn't seem to go terribly high. The service provider has run tests and they've all come back clean. I'd suspect an IOS bug, but then surely the problem would occur on both providers' links (the same reasoning applies to the CPU usage)?
Does anyone have any ideas of how to troubleshoot this further?
08-16-2011 06:22 PM
For some reason this forum won't let me post my show commands in full, so here are the relevant parts:
GigabitEthernet0/1 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet
Internet address is x.x.x.x/30
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
Last clearing of "show interface" counters 6d19h
Input queue: 1/75/0/0 (size/max/drops/flushes); Total output drops: 321
311 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 347042 multicast, 0 pause input
0 input packets with dribble condition detected
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
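For anyone who wants to track these counters across several links, here is a rough sketch that pulls the interesting numbers out of pasted "show interface" text. The function name and the choice of counters are my own assumptions. The sample text is copied from the output above.

```python
import re

# Counters of interest and the text that surrounds them in "show interface".
PATTERNS = {
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_drops": r"Total output drops: (\d+)",
    "output_errors": r"(\d+) output errors",
}

def parse_counters(text: str) -> dict:
    """Return the counters found in a block of "show interface" output."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = int(match.group(1))
    return counters

sample = """\
Input queue: 1/75/0/0 (size/max/drops/flushes); Total output drops: 321
311 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 output errors, 0 collisions, 0 interface resets
"""

print(parse_counters(sample))
# {'input_errors': 311, 'crc': 0, 'output_drops': 321, 'output_errors': 0}
```

The combination it reports here (input errors with zero CRC errors) is the pattern discussed later in the thread.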
08-16-2011 08:59 PM
Hi Dan,
Input and CRC errors indicate a physical-layer issue, i.e. a bad cable, bad port, bad GBIC, or bad patch panel.
Please check the above if the input errors or CRC errors are incrementing.
Also, please let me know if you have QoS configured on the switch. Please post the output of "show run int gig 0/1" and "sh mls qos".
Regards,
Swati Dheer
08-16-2011 09:42 PM
We can replicate this fault across three different pairs of separate devices (at different locations), so it is exceedingly unlikely to be a physical issue on a port, device, or cable. Also, a poor physical connection or faulty card would most likely generate CRC errors in addition to the input errors. This problem is much harder to pinpoint than a single physical fault. I'm guessing it has to be related to configuration, software, or vendor incompatibility?
#sh mls qos
QoS is disabled
QoS ip packet dscp rewrite is enabled
08-16-2011 10:04 PM
Latest test conducted: we induced about a 50% load on one of the other circuits experiencing the issue (a different one from the one I first posted about, with different devices at either end), then ran ping tests during the load. Every ping test showed roughly 1% packet loss.
Relevant show output below:
GigabitEthernet1/0/48
Input queue: 1/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet0/23
Input queue: 4/75/0/0 (size/max/drops/flushes); Total output drops: 0
66 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
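To check whether counters like these actually increment during a load test, rather than being left over from earlier, two snapshots can be compared. A minimal sketch: the "before" values are hypothetical, and only the 66 input errors in "after" come from the output above.

```python
def incrementing_counters(before: dict, after: dict) -> dict:
    """Return {counter: increase} for counters that grew between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }

# Hypothetical snapshot taken before the load test.
before = {"input_errors": 40, "crc": 0, "output_drops": 0}
# Snapshot after the test; 66 input errors matches GigabitEthernet0/23 above.
after = {"input_errors": 66, "crc": 0, "output_drops": 0}

print(incrementing_counters(before, after))  # {'input_errors': 26}
```

Clearing the counters before each test gives the same information without a "before" snapshot.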
08-16-2011 10:52 PM
Show controllers is interesting:
  Transmit GigabitEthernet1/0/48      Receive
2567310152 Bytes                   1834651575 Bytes
 115349883 Unicast frames           120413962 Unicast frames
  15897212 Multicast frames           8672424 Multicast frames
    468900 Broadcast frames             87379 Broadcast frames
         0 Too old frames           719393324 Unicast bytes
         0 Deferred frames          602872684 Multicast bytes
         0 MTU exceeded frames        6001341 Broadcast bytes
         0 1 collision frames               0 Alignment errors
         0 2 collision frames               0 FCS errors
         0 3 collision frames               0 Oversize frames
         0 4 collision frames               0 Undersize frames
         0 5 collision frames               0 Collision fragments
         0 6 collision frames
         0 7 collision frames         1620645 Minimum size frames
         0 8 collision frames        44888557 65 to 127 byte frames
         0 9 collision frames        21856250 128 to 255 byte frames
         0 10 collision frames         394208 256 to 511 byte frames
         0 11 collision frames        6862595 512 to 1023 byte frames
         0 12 collision frames        2088574 1024 to 1518 byte frames
         0 13 collision frames              0 Overrun frames
         0 14 collision frames              0 Pause frames
         0 15 collision frames
         0 Excessive collisions          2968 Symbol error frames
         0 Late collisions                  0 Invalid frames, too large
         0 VLAN discard frames       51465899 Valid frames, too large
         0 Excess defer frames              5 Invalid frames, too small
    341874 64 byte frames                   0 Valid frames, too small
  54154981 127 byte frames
  21702502 255 byte frames                  0 Too old frames
    831414 511 byte frames                  0 Valid oversize frames
  10821956 1023 byte frames                 0 System FCS error frames
   6957711 1518 byte frames                 0 RxPortFifoFull drop frame
  36905557 Too large frames
         0 Good (1 coll) frames
         0 Good (>1 coll) frames
  Transmit GigabitEthernet0/23        Receive
1945821251 Bytes                   2200873797 Bytes
 120371704 Unicast frames           114989573 Unicast frames
   7252636 Multicast frames          12296837 Multicast frames
    148373 Broadcast frames            310873 Broadcast frames
         0 Too old frames           824422628 Unicast bytes
         0 Deferred frames          843903821 Multicast bytes
         0 MTU exceeded frames       23333023 Broadcast bytes
         0 1 collision frames               0 Alignment errors
         0 2 collision frames               0 FCS errors
         0 3 collision frames               0 Oversize frames
         0 4 collision frames               0 Undersize frames
         0 5 collision frames               0 Collision fragments
         0 6 collision frames
         0 7 collision frames          268670 Minimum size frames
         0 8 collision frames        50270535 65 to 127 byte frames
         0 9 collision frames        21604623 128 to 255 byte frames
         0 10 collision frames         814573 256 to 511 byte frames
         0 11 collision frames       10809888 512 to 1023 byte frames
         0 12 collision frames        6938455 1024 to 1518 byte frames
         0 13 collision frames              0 Overrun frames
         0 14 collision frames              0 Pause frames
         0 15 collision frames
         0 Excessive collisions            68 Symbol error frames
         0 Late collisions                  0 Invalid frames, too large
         0 VLAN discard frames       36890607 Valid frames, too large
         0 Excess defer frames              0 Invalid frames, too small
   1304817 64 byte frames                   0 Valid frames, too small
  43695120 127 byte frames
  21835495 255 byte frames                  0 Too old frames
    387587 511 byte frames                  0 Valid oversize frames
   6838776 1023 byte frames                 0 System FCS error frames
   2068885 1518 byte frames                 0 RxPortFifoFull drop frame
  51642033 Too large frames
         0 Good (1 coll) frames
         0 Good (>1 coll) frames
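The counters that stand out in this output are the non-zero "Symbol error frames" on both ports alongside zero "FCS errors". A small sketch for pulling a named counter out of pasted "show controllers" text follows. The helper name is an assumption of mine. The three sample lines are copied from the GigabitEthernet1/0/48 output above.

```python
import re

def controller_counter(text: str, label: str):
    """Return the number printed directly before `label`, or None if absent.

    Only the first match is returned, so a label that appears in both the
    Transmit and Receive columns (e.g. "Too old frames") yields the first one.
    """
    match = re.search(r"(\d+)\s+" + re.escape(label), text)
    return int(match.group(1)) if match else None

sample = """\
0 Excessive collisions 2968 Symbol error frames
0 Late collisions 0 Invalid frames, too large
0 VLAN discard frames 51465899 Valid frames, too large
"""

print(controller_counter(sample, "Symbol error frames"))      # 2968
print(controller_counter(sample, "Valid frames, too large"))  # 51465899
```

The match is case-sensitive, so "Valid frames, too large" does not pick up the "Invalid frames, too large" line.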
08-17-2011 02:53 AM
Try hardcoding your interface (speed/duplex) and have your service provider hardcode theirs. It may be irrelevant, but do you know what the service provider's equipment is, i.e. the one connected to your device?
08-17-2011 09:48 PM
We tried the double-hardcoding to no effect.
The service provider equipment is Huawei WDM.
Further testing has revealed that the errors only occur with certain traffic types: our traffic generators get zero errors even at 800Mbps, but a single Robocopy generates the errors.
We think we've managed to at least work around the problem: if you install an SFP and use that instead of the onboard Gigabit Ethernet interfaces, the problem goes away. Whether this is an incompatibility between Huawei and Cisco onboard ports, or something to do with differences in how the SFPs and the onboard ports handle bursty traffic, we aren't sure.
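As an illustration of why bursty traffic can behave differently from a smooth traffic generator at the same average load, here is a toy model of a small FIFO buffer. It is purely an assumption-laden sketch: the buffer size, drain rate, and traffic patterns are invented, and it does not model the 3560, the SFPs, or the Huawei equipment.

```python
def simulate_drops(arrivals, capacity, drain_per_tick):
    """Count frames dropped by a finite FIFO buffer.

    arrivals:       frames arriving in each time tick
    capacity:       maximum number of frames the buffer can hold
    drain_per_tick: frames forwarded out of the buffer each tick
    """
    queued = 0
    dropped = 0
    for arriving in arrivals:
        accepted = min(arriving, capacity - queued)
        dropped += arriving - accepted
        queued += accepted
        queued = max(0, queued - drain_per_tick)
    return dropped

TICKS = 100
smooth = [1] * TICKS                                        # steady stream
bursty = [10 if t % 10 == 0 else 0 for t in range(TICKS)]  # same total, in bursts

print(sum(smooth), sum(bursty))                               # 100 100
print(simulate_drops(smooth, capacity=5, drain_per_tick=2))   # 0
print(simulate_drops(bursty, capacity=5, drain_per_tick=2))   # 50
```

Both patterns offer the same number of frames, but only the bursty one overflows the buffer. That matches the symptom in shape only, and it says nothing about where in the path any such buffer would be.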
08-18-2011 10:14 PM
Hmm... as I remember, Huawei is a bit flaky and has compatibility issues. Did you have the service provider try to rebuild the VLANs/circuits?
08-19-2011 02:30 AM
Hi,
Ever checked frame size?
Your counters show lots of too large frames.
08-21-2011 06:51 PM
jjtanner wrote:
Hi,
Ever checked frame size?
Your counters show lots of too large frames.
MTUs are standard, so frame size should not be a problem. That particular link is a dot1q trunk; the other links we've replicated the errors on are standard layer 3 point-to-point circuits, which show no large frame counters.
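The arithmetic behind the trunk explanation, as a small sketch: an 802.1Q tag adds 4 bytes, so a full-size tagged frame is 1522 bytes rather than 1518. That these tagged frames are what the "too large" counters are counting is my assumption, based on the counters only appearing on the trunk link.

```python
ETH_HEADER = 14     # destination MAC + source MAC + EtherType
MAX_PAYLOAD = 1500  # standard Ethernet MTU
FCS = 4             # frame check sequence
DOT1Q_TAG = 4       # 802.1Q VLAN tag inserted on a trunk

def max_frame_size(tagged: bool) -> int:
    """Largest standard Ethernet frame in bytes, with or without a VLAN tag."""
    size = ETH_HEADER + MAX_PAYLOAD + FCS
    if tagged:
        size += DOT1Q_TAG
    return size

print(max_frame_size(tagged=False))  # 1518
print(max_frame_size(tagged=True))   # 1522
```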
did you have the service provider try and rebuild the VLANs/circuits?
Yes the service provider did rebuild the circuits, but it had no effect.