10-05-2016 08:47 AM - edited 03-08-2019 07:41 AM
Does anyone have experience with the above switches? I have multiple stacks (no more than 5 switches per stack) and am seeing a very high number of output errors on a lot of ports. The total number of output drops exactly matches the number of output errors, e.g.:
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 17890212
17890212 output errors, 0 collisions, 1 interface resets
Ports are running full duplex at 1 Gbps.
Checking the CPU history, I also see that the switch has spiked to 100% on multiple occasions over a 72-hour period.
I have read that this could be a buffering issue, possibly due to bursty traffic, and that not much can actually be done about it.
Does this sound like a potential switch performance problem?
10-06-2016 07:40 AM
Hi,
Same issue here, on every 3650 I have access to. The output error counter also counts output discards (which seems wrong; I have never seen that before on Cisco gear, and I have deployed plenty). Example from SNMP:
IF-MIB::ifOutDiscards.16 = Counter32: 652000252
IF-MIB::ifOutDiscards.17 = Counter32: 3619677524
IF-MIB::ifOutDiscards.18 = Counter32: 3116752094
IF-MIB::ifOutDiscards.19 = Counter32: 4169394348
IF-MIB::ifOutDiscards.20 = Counter32: 1617800266
IF-MIB::ifOutDiscards.21 = Counter32: 3103643133
IF-MIB::ifOutDiscards.22 = Counter32: 3295355258
IF-MIB::ifOutDiscards.23 = Counter32: 4029261027
[...]
IF-MIB::ifOutErrors.16 = Counter32: 652000252
IF-MIB::ifOutErrors.17 = Counter32: 3619677524
IF-MIB::ifOutErrors.18 = Counter32: 3116752094
IF-MIB::ifOutErrors.19 = Counter32: 4169394348
IF-MIB::ifOutErrors.20 = Counter32: 1617800266
IF-MIB::ifOutErrors.21 = Counter32: 3103643133
IF-MIB::ifOutErrors.22 = Counter32: 3295355258
IF-MIB::ifOutErrors.23 = Counter32: 4029261027
Please note how ifOutErrors exactly follows the ifOutDiscards counters on these ports. I consider this unnatural: discards are not errors, and they have their own counters for a reason. It's also unclear whether the rate of discards (and resulting errors) is really correct. My monitoring sometimes reports that loaded interfaces have an error rate of 50% or so, which would be entirely fatal, but at the same time a ping going through that port is perfectly loss-free.
I'm seeing this on standalone 3650s as well as stacked ones, on port-channel members as well as individual switchports, and on ports manually set to a lower speed (e.g. 100 Mbps) as well as on loaded 1G ports. IOS-XE is 03.07.04.E.
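To check a larger set of ports for the same pattern, the comparison can be scripted. The following is only a minimal sketch: it assumes snmpwalk-style text in the format shown above, and the function names are made up for illustration.

```python
import re

# Matches entries such as "IF-MIB::ifOutDiscards.16 = Counter32: 652000252"
LINE_RE = re.compile(
    r"IF-MIB::(ifOutDiscards|ifOutErrors)\.(\d+) = Counter32: (\d+)"
)

def parse_counters(text):
    """Return {counter_name: {ifIndex: value}} from snmpwalk-style text."""
    counters = {"ifOutDiscards": {}, "ifOutErrors": {}}
    for name, index, value in LINE_RE.findall(text):
        counters[name][int(index)] = int(value)
    return counters

def mirrored_ports(counters):
    """List ifIndexes whose ifOutErrors value equals ifOutDiscards exactly."""
    discards = counters["ifOutDiscards"]
    errors = counters["ifOutErrors"]
    return sorted(
        i for i in discards if i in errors and discards[i] == errors[i]
    )

# Two of the ports from the output above.
sample = """
IF-MIB::ifOutDiscards.16 = Counter32: 652000252
IF-MIB::ifOutDiscards.17 = Counter32: 3619677524
IF-MIB::ifOutErrors.16 = Counter32: 652000252
IF-MIB::ifOutErrors.17 = Counter32: 3619677524
"""
print(mirrored_ports(parse_counters(sample)))  # [16, 17]
```

A port that shows up in this list on every poll is most likely affected by the counter behaviour described here rather than by genuine transmit errors, although that conclusion is an inference and not something the counters prove.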
According to show int Gi1/0/X controller the only error counter that goes upward here is Excess Defer frames:
[...]
0 Late collision frames            0 SymbolErr frames
14502702154 Excess Defer frames    0 Collision fragments
0 Good (1 coll) frames             0 ValidUnderSize frames
0 Good (>1 coll) frames            0 InvalidOverSize frames
0 Deferred frames                  0 ValidOverSize frames
0 Gold frames dropped              0 FcsErr frames
[...]
I can't find any documentation on what excess defer might be (at least in the context of non-CSMA/CD aka Full Duplex Ethernet). Is this counter abused to count queue tail drops? This drives us crazy due to the monitoring false positives (interface errors are something serious, output discards usually aren't).
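As a possible workaround on the monitoring side, the poller could subtract the discard delta from the error delta before alerting. This is a rough sketch only: it assumes that ifOutErrors on affected ports includes every discard (which matches the counters above but is not documented behaviour), and the function names and sample values are hypothetical.

```python
COUNTER32_MOD = 2 ** 32

def counter32_delta(previous, current):
    """Difference between two Counter32 samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MOD

def classify_interval(prev, curr):
    """prev and curr are dicts holding 'ifOutErrors' and 'ifOutDiscards'."""
    d_err = counter32_delta(prev["ifOutErrors"], curr["ifOutErrors"])
    d_disc = counter32_delta(prev["ifOutDiscards"], curr["ifOutDiscards"])
    # Assumption: errors include discards on these ports, so subtract them.
    # Clamp at zero in case the two counters were sampled slightly apart.
    return {
        "discards": d_disc,
        "errors_excluding_discards": max(d_err - d_disc, 0),
    }

# Hypothetical samples where both counters wrap between two polls.
prev = {"ifOutErrors": 4294967000, "ifOutDiscards": 4294967000}
curr = {"ifOutErrors": 704, "ifOutDiscards": 704}
print(classify_interval(prev, curr))
# {'discards': 1000, 'errors_excluding_discards': 0}
```

The modulo handles a single Counter32 wrap between polls. With counters this busy, the polling interval has to be short enough that a counter cannot wrap twice.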
What's wrong here? Is this just this year's version of Cisco Counter Fun, or am I missing something? Replacing 3560s with 3650s has become harder now that we have new failure modes...
TIA & HTH,
Andre.
12-17-2016 09:44 AM
Hi,
I can confirm this issue. IMHO this happens because we have massive traffic going FROM 10G out to 1G.
The funny thing is that this happens only on a stack running 3.7.3, but not on a stack running 3.3.5.
If you have a test machine, try downgrading it.
There's also a known bug where output drops and output discards rise by the same values, but it is only fixed in 16.5 or a 3.6.5 interim release.
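To illustrate why traffic going from 10G to 1G can cause real tail drops even at low average load, here is a back-of-the-envelope sketch. It uses a simple fluid model (one burst arriving at line rate, drained at the egress rate) and a hypothetical burst size; it says nothing about the actual buffer sizes or queueing on the 3650.

```python
def peak_queue_bytes(burst_bytes, ingress_bps, egress_bps):
    """Bytes that must be buffered while one burst arrives at ingress_bps
    and drains at egress_bps (simple fluid model, integer arithmetic)."""
    if ingress_bps <= egress_bps:
        return 0  # the egress port keeps up, so nothing queues
    return burst_bytes * (ingress_bps - egress_bps) // ingress_bps

TEN_GIG = 10_000_000_000
ONE_GIG = 1_000_000_000

# Hypothetical 1 MB burst arriving on a 10G uplink for a single 1G port.
print(peak_queue_bytes(1_000_000, TEN_GIG, ONE_GIG))  # 900000
```

Under this model, 90% of the burst has to sit in the egress queue. Any per-port buffer smaller than that drops the remainder, and those drops would show up as output discards.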
08-09-2018 07:07 PM