09-23-2024 11:10 AM
I've got a pair of 6880-Xs in VSS running 15.2(1)SY6 as the network core switch, connecting to various IDFs, each over a pair of fibers in a port-channel. We also have monitoring software that pings the IDF switches periodically to see if they're up.
Lately, we've been getting a lot of false alarms; it looks like pings are getting dropped. No reports of network performance issues yet.
I've found no errors on any interfaces, but a lot of the interfaces on the core switch side show bursts of output drops. I'm not sure what's causing them. I don't think the traffic is anywhere near the maximum (10G transceivers and 10G port speeds over multimode fiber).
Any ideas what to check?
I'm not sure the number of output drops is significant compared to the number of packets output (maybe 6 to 7%); we're just looking to have fewer false alarms. QoS is in use, mainly to support the Cisco VoIP phones (using AutoQoS, I think).
09-23-2024 12:18 PM
A couple of things you can check on:
Check the SNMP process on the core switches. "sh process cpu sort | exc 0.00" should show you whether SNMP is causing high CPU. The other way to test is to run a continuous ping to one of the IDF switches and watch for ping drops.
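For example (the IDF address 10.1.1.10 below is a placeholder; substitute one of your IDF management addresses):

```
! Show only processes actually consuming CPU
show processes cpu sorted | exclude 0.00

! Long ping toward an IDF switch; watch for missed replies
ping 10.1.1.10 repeat 1000 timeout 1
```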
HTH
09-25-2024 08:38 AM
SNMP doesn't seem to be taking much CPU time at all. I've checked several times and the top process at around 10% is SpanTree Flush. Could this be an indication of spanning-tree problems?
09-23-2024 12:20 PM
Are all the interfaces showing the same issue? How long has the device been up?
Have you cleared the counters and checked how fast the errors are growing?
You mentioned it's all Layer 2. How is your spanning tree? Is any reconvergence taking time?
Can you post the ping loss output and an interface output so we can look at what is wrong?
How big is the network?
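A rough sequence for the counter and spanning-tree checks above might look like this (interface name is just an example from your core; adjust to your topology):

```
! Reset counters, wait a known interval, then see how fast drops grow
clear counters TenGigabitEthernet1/5/8
show interface TenGigabitEthernet1/5/8 | include drops

! Show per-VLAN topology-change counts, when the last change occurred,
! and which port it came in on
show spanning-tree detail | include ieee|occurr|from|is exec
```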
09-25-2024 08:43 AM
Not all of them, but a lot of them. Uptime is about 32 weeks now. I have cleared the counters and they increase in bursts, no real pattern that I can see. How do I tell if spanning tree reconvergence is taking time?
09-25-2024 11:51 PM
Can you post the output for one of the interfaces having the issue?
09-26-2024 05:40 AM
Sure, here is one of them:
TenGigabitEthernet1/5/8 is up, line protocol is up (connected)
Hardware is C6k 10000Mb 802.3, address is a89d.21c0.2520 (bia a89d.21c0.2520)
MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 10Gb/s, media type is 10Gbase-SR
input flow-control is on, output flow-control is off
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output never, output hang never
Last clearing of "show interface" counters 5d21h
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 5510
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 94000 bits/sec, 8 packets/sec
5 minute output rate 6352000 bits/sec, 1307 packets/sec
12371039 packets input, 3252167756 bytes, 0 no buffer
Received 78429 broadcasts (73724 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
314370665 packets output, 170894264887 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
09-26-2024 06:21 AM
"Last clearing of "show interface" counters 5d21h"
"Total output drops: 5510"
"314370665 packets output,"
That's 5,510 drops out of 314,370,665 packets output in under six days, roughly 0.002%. If this interface's stats are representative, it's unsurprising there are no reported performance issues, and somewhat surprising you're seeing lots of monitoring issues. Hmm, you did mention AutoQoS might be active, correct?
09-26-2024 12:53 PM
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 5510
What other devices are connected to this switch? Is this a transit interface in the path?
Some guidelines to troubleshoot:
Try moving the configuration to a different interface and test again (as a last option).
09-23-2024 12:55 PM
Hello @Steve Pfister ,
you can compare the output of
show interface type x/y
with that of
show policy-map interface x/y
or, depending on the linecards involved, other hardware-specific show commands, so that you can determine how many output drops are reported by QoS in each traffic class and compare total QoS drops with output drops.
6-7% of drops needs your attention.
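The comparison Giuseppe describes would look something like this (interface name taken from the output posted above; the policy-map command only returns output if a service-policy is attached):

```
! Total output drops from the interface counters
show interface TenGigabitEthernet1/5/8 | include output drops

! Per-class QoS drop counters, if a service-policy is attached
show policy-map interface TenGigabitEthernet1/5/8
```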
Hope to help
Giuseppe
09-25-2024 08:49 AM
show policy-map for each interface returns nothing. Do you know offhand what command shows QoS drops?
09-23-2024 03:30 PM
". . . but a lot of the interfaces on the core switch side have bursts of output drops."
"Lately, we've been getting a lot of false alarms, like pings are getting dropped."
Well, the latter is likely due to the former, i.e. NOT false alarms.
"I'm not sure what's causing them."
Generally egress queue overflow.
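If memory serves, on the 6500/6880 family you can also see the per-queue transmit drop counters directly, which shows which egress queue is overflowing:

```
! Per-queue drop counters on Catalyst 6500/6880-class hardware
show queueing interface TenGigabitEthernet1/5/8
```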
"I don't think the traffic is anywhere near the max (10G transceivers and 10G port speeds over MM fiber)"
A very, very, very common thought.
You might want to read:
https://notalwaysthenetwork.com/tag/output-drops/
09-25-2024 08:55 AM
When I said false alarms, I had more in mind "switches are being reported down by the monitoring software when they aren't" and not "pings occasionally fail because of packet drops". I will study the links you sent.
09-26-2024 03:49 AM
"When I said false alarms, I had more in mind "switch are being reported down by the monitoring software when they aren't" and not "pings occasionally fail because of packet drops"."
Likely the same cause, i.e. monitoring communication packets are also being dropped.
BTW, a possible fix for monitoring is using QoS to try to avoid dropping its packets.
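A minimal sketch of that idea, assuming a hypothetical NMS address of 10.1.1.5 (the ACL, class, and policy names here are made up, and this would need to be reconciled with whatever AutoQoS already applied):

```
! Match traffic from the monitoring station
ip access-list extended NMS-TRAFFIC
 permit icmp host 10.1.1.5 any
 permit udp host 10.1.1.5 any eq snmp

class-map match-all NMS
 match access-group name NMS-TRAFFIC

! Mark NMS traffic so egress queueing can prefer it over best-effort
policy-map MARK-NMS
 class NMS
  set dscp cs2
```

The marking alone only helps if the egress queue maps that DSCP to a less congested queue, so the queue-map configuration would need checking too.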