I noticed in my monitoring software (Solarwinds NPM) the other day that the interface connecting our main switch stack and our router has over 20 million (million!) transmit discards, and the count is climbing. My network has about 150-200 devices on it, so this discard count seems ridiculously high to me. We aren't having any noticeable network slowness, and had I not looked at this particular screen in the monitoring software I wouldn't have known there was an issue at all. That being said, it was time to investigate.
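A raw total like 20 million is hard to judge without knowing how many packets were sent over the same period. Here is a minimal Python sketch, assuming two readings of the standard IF-MIB counters (ifOutDiscards plus the transmitted packet counters) taken some time apart. The `Sample` class and the numbers in the usage lines are made up for illustration, not taken from my switch.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    out_packets: int   # packets transmitted on the interface at poll time
    out_discards: int  # ifOutDiscards at poll time


def counter_delta(old, new, bits=32):
    """Difference between two readings of a counter that wraps at 2**bits."""
    return (new - old) % (1 << bits)


def discard_percent(first, second, bits=32):
    """Percentage of offered packets that were discarded between two polls."""
    sent = counter_delta(first.out_packets, second.out_packets, bits)
    dropped = counter_delta(first.out_discards, second.out_discards, bits)
    offered = sent + dropped
    return 0.0 if offered == 0 else 100.0 * dropped / offered


# Hypothetical readings taken a few minutes apart.
earlier = Sample(out_packets=1_000_000, out_discards=20_000_000)
later = Sample(out_packets=1_990_000, out_discards=20_010_000)
print(discard_percent(earlier, later))
```

Working from deltas rather than totals also sidesteps the fact that the counters may have been accumulating since the last reload or counter clear.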
I ran a "sh int" on the switch-side interface (gig speed) on my stack of 4 Cisco 3750 switches. Nothing looked unusual. I ran the same command on the inside interface (gig speed) of my core router (Cisco 2821). One thing stood out: 1237956 unknown protocol drops and rising. It's not 20 million, but it's a good start.
I tracked down this article: https://supportforums.cisco.com/docs/DOC-15490 and made sure DTP, CDP, LLDP, and VTP are not applied/enabled on either device, where applicable. I checked again, and the unknown protocol drop count was still going up.
I SPAN'd (mirrored) the switch port we're having problems with to sniff the traffic with Wireshark. The results show I'm getting a lot of "TCP Retransmission", "TCP Out-of-order", and "TCP Dup ACK". From my research, these errors point to a bottleneck somewhere, but I'm unsure where that bottleneck is, as everything is gigabit speed leading from the PC out to the internet. PC <--gig--> Switch <--gig--> Router <--gig--> Firewall.
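One way to narrow this down is to see which host pairs account for most of the flagged segments. This is a rough Python sketch, assuming the capture has been exported from Wireshark as CSV with the default "Source", "Destination", and "Info" columns, and that the Info text carries the usual bracketed labels. The sample rows and addresses are invented for illustration.

```python
import csv
import io
from collections import Counter

# Bracketed labels as Wireshark's TCP analysis normally prints them (assumed).
FLAGS = ("TCP Retransmission", "TCP Out-Of-Order", "TCP Dup ACK")


def tally_tcp_flags(csv_text):
    """Count flagged segments per (source, destination, flag)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for flag in FLAGS:
            # "[TCP Dup ACK 12#1]" carries a suffix, so match on the prefix only.
            if f"[{flag}" in row["Info"]:
                counts[(row["Source"], row["Destination"], flag)] += 1
    return counts


# Invented sample export, not real capture data.
sample = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","10.0.0.5","192.0.2.10","TCP","1514","[TCP Retransmission] 50000 > 443 [ACK] Seq=1 Ack=1"
"2","0.001","192.0.2.10","10.0.0.5","TCP","66","[TCP Dup ACK 7#1] 443 > 50000 [ACK] Seq=1 Ack=1"
"3","0.002","10.0.0.5","192.0.2.10","TCP","1514","[TCP Out-Of-Order] 50000 > 443 [ACK] Seq=1461 Ack=1"
"4","0.003","10.0.0.5","192.0.2.10","TCP","1514","[TCP Retransmission] 50000 > 443 [ACK] Seq=1 Ack=1"
'''
for key, n in tally_tcp_flags(sample).most_common():
    print(n, key)
```

If a handful of hosts or one direction dominates the tally, that points at a specific path rather than at the switch-to-router link as a whole.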
My question: where do I go from here as far as figuring out where this bottleneck is? Or alternatively if it's not a bottleneck, what settings need to be changed to fix the unknown protocol drops?
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
The TCP stats sound like packets are being delivered out of sequence, which can hurt performance and waste bandwidth. Anywhere along the path, are there multiple links that are "load balanced" on a packet-by-packet basis?
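To illustrate why packet-by-packet load balancing can produce out-of-order delivery, here is a small Python simulation. It is only a sketch: the two link delays (1 ms and 5 ms), the 1 ms send interval, and the function names are assumptions chosen for the example, not measurements from any real path.

```python
def arrival_order(num_packets, link_delays_ms, pick_link, send_interval_ms=1.0):
    """Return sequence numbers in the order they reach the receiver."""
    arrivals = []
    for seq in range(num_packets):
        link = pick_link(seq)
        arrival_time = seq * send_interval_ms + link_delays_ms[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count packets that arrive after a higher sequence number was seen."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


delays = [1.0, 5.0]  # assumed one-way delays of two parallel links, in ms

# Per-packet balancing: alternate links for consecutive packets of one flow.
per_packet = arrival_order(8, delays, lambda seq: seq % 2)
# Per-flow balancing: every packet of the flow stays on the same link.
per_flow = arrival_order(8, delays, lambda seq: 0)

print(per_packet, count_out_of_order(per_packet))
print(per_flow, count_out_of_order(per_flow))
```

With unequal link delays, alternating packets lets later packets overtake earlier ones, while keeping a flow on one link preserves order. The receiver answers each overtaken segment with duplicate ACKs, which is what a capture would then report.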
The only thing like that we have is a tagging policy that's applied to an interface on the router heading out to a remote office. The link with the problem, again, is the one between the switch and the router. Since the numbers are increasing on the router interface, I'm assuming the packets being bounced would need to come from the switch.
To test, I pulled the tagging policy off of the interface and did a "sh int" on the other router interface, the one actually affected. No change.