Alberto Seg · 06-23-2016

Hi all,

We have a problem with four Catalyst 3750 switches: we have suddenly detected a lot of packet loss on several point-to-point connections between them.

I have attached a JPG with the network diagram.

Red lines are the connections with problems; the green line has no problems.

For example:

P28TAMERO05#ping 10.25.110.6 repeat 100 size 500 source 10.25.110.5

Type escape sequence to abort.

Sending 100, 500-byte ICMP Echos to 10.25.110.6, timeout is 2 seconds:

Packet sent with a source address of 10.25.110.5

...!!!!...!!.!!!.!!.!.!!!!.!!!!.!!!.!.!!.!!!.!!!.!.!.!!!!.!!!!.!!!.!!!

!.!!.!!.!!!!..!!!!..!!!..!!!.!

Success rate is 68 percent (68/100), round-trip min/avg/max = 8/11/42 ms

P28TAMERO05#

P28TAMERO05#ping 10.25.110.10 repeat 100 size 500 source 10.25.110.9

Type escape sequence to abort.

Sending 300, 500-byte ICMP Echos to 10.25.110.10, timeout is 2 seconds:

...........................!!..........!!!.!!!!...!!.......!!!.!!!.!..

.!...!.

Success rate is 25 percent (20/77), round-trip min/avg/max = 8/12/25 ms

P28TAMERO05#

P28TAMERO05#ping 10.25.110.58 repeat 100 size 500 source 10.25.110.57

Type escape sequence to abort.

Sending 100, 500-byte ICMP Echos to 10.25.110.58, timeout is 2 seconds:

Packet sent with a source address of 10.25.110.57

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Success rate is 100 percent (100/100), round-trip min/avg/max = 8/13/26 ms

P28TAMERO05#

We have similar outputs in the rest of the switches.
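One way to compare the links is to turn each ping result line into numbers. The sketch below is a hypothetical Python helper (summarize_ping is an illustrative name, not a Cisco tool). It treats '!' as a reply and every other glyph as a failure, and it reproduces the integer success rate IOS prints (20/77 is shown as 25 percent). It also reports the longest run of consecutive failures, because bursts of loss point to a different cause than scattered drops.

```python
def summarize_ping(glyphs: str) -> dict:
    """Summarize the result glyphs of a Cisco IOS extended ping.

    '!' means a reply was received. Every other glyph ('.', 'U', 'Q', ...)
    is counted as a failure, which is a simplifying assumption of this sketch.
    """
    # Remove spaces and line breaks so wrapped ping output can be pasted in as-is.
    glyphs = "".join(glyphs.split())
    sent = len(glyphs)
    received = glyphs.count("!")
    # IOS prints an integer percentage, truncated (20/77 is shown as 25).
    rate = (100 * received) // sent if sent else 0

    # Longest run of consecutive failures: long bursts suggest the path went
    # silent for a while, rather than random individual drops.
    longest = current = 0
    for glyph in glyphs:
        current = 0 if glyph == "!" else current + 1
        longest = max(longest, current)

    return {"sent": sent, "received": received,
            "rate_percent": rate, "longest_loss_burst": longest}


if __name__ == "__main__":
    # The two result lines of the first ping above (10.25.110.5 -> 10.25.110.6).
    first_link = ("...!!!!...!!.!!!.!!.!.!!!!.!!!!.!!!.!.!!.!!!.!!!.!.!.!!!!.!!!!.!!!.!!!"
                  "!.!!.!!.!!!!..!!!!..!!!..!!!.!")
    print(summarize_ping(first_link))
```

For the first ping above this counts 68 replies out of 100, matching the switch's own summary. Comparing longest_loss_burst between the red links and the green link shows whether the loss arrives in bursts.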

The circuits are of different types: two of them are PDH circuits and the others are Carrier Ethernet E-Lines. We have reviewed the circuits and they don't show any problems.

The user VLANs behind the switches don't have connectivity problems, for example:

P28TAMERO05#ping 10.25.111.111 size 500 source 10.25.115.1 repeat 300

Type escape sequence to abort.

Sending 300, 500-byte ICMP Echos to 10.25.111.111, timeout is 2 seconds:

Packet sent with a source address of 10.25.115.1

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!!!!!!!!!!!!!!!!!!!!

Success rate is 100 percent (300/300), round-trip min/avg/max = 8/13/34 ms

P28TAMERO05#

We think the problem is in the control plane or in how the switch handles the traffic (or it has nothing to do with that and we are wrong), but we don't know how to check this or what the best solution would be. Meanwhile, managing the devices has become very difficult, because the management sessions are constantly interrupted.

Alberto Seg · 09-26-2016

Hi all,

In the end, the cause was bug CSCub04965 (TCP Session hung causing Packet Loss).

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCub04965/?reffering_site=dumpcr

Regards


pwwiddicombe · 06-23-2016

Do you have any messages in your log about MAC address flapping?

Have you enabled spanning tree, and consistently? Don't intermix rapid-stp and mstp.

Have client ports got bpduguard enabled?

Are any of the WAN circuits (or internal links) running into capacity issues?

Have you checked the interfaces between switches for errors (crc, framing, drops)?
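When there are many inter-switch links to check, the interface checks above can be scripted. The following is a minimal Python sketch, assuming the counter wording of Cisco IOS "show interfaces" output (for example "0 input errors, 0 CRC, 0 frame"). That wording varies between platforms and releases, and the helper name nonzero_counters is hypothetical.

```python
import re

# Counter wording as printed by Cisco IOS "show interfaces"; it can differ
# between platforms and releases, so treat these patterns as assumptions.
PATTERNS = {
    "input_errors": re.compile(r"(\d+) input errors"),
    "crc": re.compile(r"(\d+) CRC"),
    "frame": re.compile(r"(\d+) frame"),
    "overrun": re.compile(r"(\d+) overrun"),
    "output_errors": re.compile(r"(\d+) output errors"),
    # "Input queue: size/max/drops/flushes": the third field is input drops.
    "input_drops": re.compile(r"Input queue: \d+/\d+/(\d+)/\d+"),
    "output_drops": re.compile(r"Total output drops: (\d+)"),
}


def nonzero_counters(show_interface_text: str) -> dict:
    """Return the watched counters that are non-zero; absent ones are skipped."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(show_interface_text)
        if match and int(match.group(1)) > 0:
            found[name] = int(match.group(1))
    return found


if __name__ == "__main__":
    # Illustrative excerpt, not output from the switches in this thread.
    sample = (
        "  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1532\n"
        "     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored\n"
        "     0 output errors, 0 collisions, 1 interface resets\n"
    )
    print(nonzero_counters(sample))
```

An empty result means none of the watched counters is non-zero. These counters accumulate since the last "clear counters", so comparing two readings taken a few minutes apart says more than a single snapshot.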

Alberto Seg · 06-23-2016

Hello, here are my answers to your questions:

Do you have any messages in your log about MAC address flapping? No, there aren't any such messages.

Have you enabled spanning tree, and consistently? Don't intermix rapid-stp and mstp. Yes, it is consistent; we only run RSTP.

Have client ports got bpduguard enabled? Yes.

Are any of the WAN circuits (or internal links) running into capacity issues? No.

Have you checked the interfaces between switches for errors (crc, framing, drops)? Yes, and there aren't any errors on the interfaces between the switches.

pwwiddicombe · 06-23-2016

Do you see this behavior all the time, most of the time, or occasionally?

What is the circuit, and the subscribed bandwidth, for 10.25.110.10? Can you post "show interface" for that end and for the .9 end of the link? Is there any possibility that high-volume replication tasks are running when you run into the issues? Remember that ping traffic is typically treated as low priority; however, that much loss indicates congestion or overload of some kind.
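To answer the capacity question with numbers, the averaged rates from "show interface" can be compared with the subscribed bandwidth of the circuit. This is a hypothetical Python sketch (utilization_percent is an illustrative name). The subscribed bandwidth has to be supplied by hand, because on a Carrier Ethernet E-Line the port speed and the contracted rate can differ.

```python
import re

# Matches lines such as "5 minute input rate 2000 bits/sec, 3 packets/sec".
# The interval (5 minutes by default) depends on the interface load-interval.
RATE_RE = re.compile(r"\d+ (?:minute|second) (input|output) rate (\d+) bits/sec")


def utilization_percent(show_interface_text: str, subscribed_bps: int) -> dict:
    """Return input/output rate as a percentage of the subscribed bandwidth.

    subscribed_bps is the contracted rate of the circuit, supplied by hand;
    it is an assumption here, not something "show interface" reports.
    """
    if subscribed_bps <= 0:
        raise ValueError("subscribed_bps must be positive")
    return {
        direction: round(100 * int(bps) / subscribed_bps, 1)
        for direction, bps in RATE_RE.findall(show_interface_text)
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from the thread.
    sample = (
        "  5 minute input rate 1800000 bits/sec, 160 packets/sec\n"
        "  5 minute output rate 9600000 bits/sec, 850 packets/sec\n"
    )
    print(utilization_percent(sample, subscribed_bps=10_000_000))
```

Keep in mind that these are averages over the load interval: short bursts above the contracted rate may still be dropped by the carrier while the average looks low.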

Running sh mem and sh proc on both ends might also shed some light on CPU and memory use. If you haven't reloaded the switches for a while, you might have a memory leak and insufficient memory remaining for buffers.
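For the sh proc suggestion, the first line of "show processes cpu" on IOS has the form "CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%", where Y is the share spent at interrupt level. Below is a minimal Python sketch, assuming that exact line format, that splits the five-second figure into process-level and interrupt-level load. The function name parse_cpu and the 50 percent threshold are hypothetical.

```python
import re
from typing import Optional

CPU_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu(show_proc_text: str, threshold: int = 50) -> Optional[dict]:
    """Parse the CPU summary line; return None if it is not found.

    `threshold` is an arbitrary example value, not a Cisco recommendation.
    """
    match = CPU_RE.search(show_proc_text)
    if match is None:
        return None
    total_5s, interrupt_5s, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total_5s": total_5s,
        # Time spent at interrupt level, mostly handling packets sent to the CPU.
        "interrupt_5s": interrupt_5s,
        # The remainder is spent in scheduled processes.
        "process_5s": total_5s - interrupt_5s,
        "one_min": one_min,
        "five_min": five_min,
        "busy": five_min >= threshold,
    }
```

Because the switch CPU answers pings addressed to its own interfaces, reading these figures while the loss is happening can help show whether the control plane is involved.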


packet loss between point to point connections in Catalyst 3750