TCP Drops - Trying to troubleshoot

d.hodgson
Level 1

Hi all,

Can you please help me troubleshoot an issue we are seeing? We appear to be getting TCP drops, but I don't know where. Wireshark captures show the drops, and you can see the TCP window sizes being affected by them.

I’ve done some testing with large amounts of data, copying files from Site A (APGRPRT0001) to Site B (AFURX0001). Sanitized router configs are attached. The routers at Site A and Site B have all interfaces connected at 1Gbps. The WAN link at Site A is 100Mbps; the WAN link at Site B is 200Mbps.

 

How the testing was performed…

 

Used Robocopy to transfer a 330MB file at a limit of 30Mbps (30Mbps is well under our QoS shapers, with no congestion on the link)

Packets are being marked as AF41 as per MQC.

Installed an ACL to capture the number of packets leaving Site A router

Installed an ACL to capture the number of packets arriving at Site B router

 

Here’s what I’ve noticed…

 

With QoS configured on the Interface, from Site A to Site B about 2000 packets get dropped during the robocopy.

Without QoS configured on the Interface, from Site A to Site B about 600 packets get dropped during the robocopy.

 

With QoS Applied…

 

APGRPRT0001#show ip access-list 199

Extended IP access list 199

    10 permit ip host 10.30.128.26 host 10.96.20.149 (231652 matches)

    20 permit ip any any (66946 matches)

 

APGRPRT0001#sh policy-map interface | in af41

        ip dscp af41

        ip dscp af41

          Match: ip dscp af41 (34) af42 (36) af43 (38)

            af41      232370/346533738      59/89326       2537/3834638          182           208  1/10

 

AFURX0001#show ip access-list 199

Extended IP access list 199

    6 permit ip host 10.30.16.163 host 10.96.20.149

    10 permit ip host 10.30.128.26 host 10.96.20.149 (229064 matches)

    20 permit ip any any (288686 matches)

AFURX0001#sh policy-map interface | in af41

        ip dscp af41

        ip dscp af41

        ip dscp af41

        ip dscp af41

        ip dscp af41

        af41          87/5661            0/0              0/0                 16            32  1/10

        af41        8941/4301327         0/0              0/0                104           208  1/10

        af41        3125/506129          0/0              0/0                104           208  1/10

        af41        4573/2135822         0/0              0/0                 83           166  1/10

        ip dscp af41

 

Without QoS applied

 

APGRPRT0001#show ip access-list 199

Extended IP access list 199

    10 permit ip host 10.30.128.26 host 10.96.20.149 (229175 matches)

    20 permit ip any any (89484 matches)

 

AFURX0001#show ip access-list 199

Extended IP access list 199

    6 permit ip host 10.30.16.163 host 10.96.20.149

    10 permit ip host 10.30.128.26 host 10.96.20.149 (228516 matches)

    20 permit ip any any (582056 matches)

What can I do to troubleshoot this issue?

thanks

Dave

9 Replies

I take it the two sites are not directly connected, as I don't see any matching IP addresses. Is this on some sort of telco-supplied MPLS?

Hi Richard,

Yes, there is an MPLS cloud between the sites. I'm trying to determine whether the fault is on our end and, if so, what we can do to fix it, or whether it's on the telco's side, in which case we can hand the problem over to them.

thanks

Dave

Joseph W. Doherty
Hall of Fame

As you're counting packets on both sides, to reveal lost packets, how do those lost packet counts compare with your egress interface drop counts?  I.e. comparing those values, you should be able to determine where the drops are happening.
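As a rough illustration of that comparison, here is a small Python sketch using the counters quoted in the original post. The ACL match counts are copied as posted. The reading of the Site A af41 row (59 random drops, 2537 tail drops) is an assumption about the column order of the filtered `show policy-map interface` output, and the function and variable names are made up for this example.

```python
# ACL 199 match counts for the test flow, copied from the original post.
ACL_MATCHES = {
    "with_qos": {"site_a": 231652, "site_b": 229064},
    "without_qos": {"site_a": 229175, "site_b": 228516},
}

# ASSUMPTION: in the Site A af41 row, the 2nd and 3rd column pairs are
# random drops and tail drops (packets/bytes). Only packet counts are used.
SITE_A_AF41_DROPS = {"random": 59, "tail": 2537}


def missing_packets(sent: int, received: int) -> int:
    """Packets counted at Site A that were never counted at Site B."""
    return sent - received


def summarize() -> dict:
    """Return end-to-end loss per test plus the local drops at Site A."""
    summary = {
        test: missing_packets(c["site_a"], c["site_b"])
        for test, c in ACL_MATCHES.items()
    }
    summary["site_a_local_drops"] = sum(SITE_A_AF41_DROPS.values())
    return summary


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

With these numbers the end-to-end difference is 2588 packets with QoS and 659 without, while the Site A af41 row would account for 2596 local drops if the assumed column reading is right. The class counter may include other af41 traffic, so the two figures are not expected to match exactly, but being that close would suggest most of the loss in the QoS test happens on the local router.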

I see you're shaping for 100 Mbps, which is good, but I believe most Cisco shapers don't count L2, they count L3.  If that's true for you, you'll need to shape slower than the L2 CIR rate.  Unfortunately, L2 overhead varies based on packet size, but I've found allowing for about 15% L2 overhead usually works well.
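To make the overhead allowance concrete, here is a minimal Python sketch of the arithmetic. The 15% allowance and the 100 Mbps CIR come from this thread. The 38-byte per-frame Ethernet figure (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap, untagged) is an assumption, since which of those bytes a given provider's policer actually counts varies. The function names are hypothetical.

```python
# ASSUMPTION: untagged Ethernet, counting header, FCS, preamble/SFD and
# inter-frame gap. A provider may count fewer (or more) bytes than this.
ETHERNET_OVERHEAD_BYTES = 38


def l2_overhead_fraction(ip_packet_bytes: int,
                         overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """L2 overhead relative to the L3 packet size the shaper is assumed to count."""
    return overhead_bytes / ip_packet_bytes


def shaper_rate_bps(l2_cir_bps: int, allowance_percent: int) -> int:
    """L3 shaper rate that leaves allowance_percent of the L2 CIR for overhead."""
    return l2_cir_bps * (100 - allowance_percent) // 100


if __name__ == "__main__":
    print("shaper rate:", shaper_rate_bps(100_000_000, 15), "bps")
    for size in (1500, 576, 200, 100):
        print(f"{size}-byte packets: {l2_overhead_fraction(size):.1%} overhead")
```

For a 100 Mbps CIR and a 15% allowance this gives 85 Mbps. Under the assumed 38 bytes per frame, full-size 1500-byte packets carry only about 2.5% overhead, so the 15% figure mainly protects against mixes with many small packets.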

I also noticed you're using random-detect.  Unless you're a QoS expert, I recommend you don't use it.  Instead I recommend using fair-queue.  Also, either WRED or FQ often needs parameter tuning when used with high-speed WANs, as their defaults are more suitable for low-speed WANs or LANs.

Hi Joseph,

Thanks for your help and for those tips. I've configured fair-queue and dropped the shaper to 85Mbps.

I ran the copy again and am still getting drops.

Here's the output...

interface GigabitEthernet0/0/3
 description TO TELSTRA PE
 bandwidth 100000
 ip address xxxxx
 ip wccp 62 redirect in
 ip flow monitor NTA_MONITOR input
 ip flow monitor NTA_MONITOR output
 ip access-group 199 out
 negotiation auto
 service-policy output QoS_SHAPE_85
end

GigabitEthernet0/0/3

Service-policy output: QoS_SHAPE_85

Class-map: class-default (match-any)
345974 packets, 372149552 bytes
5 minute offered rate 8610000 bps, drop rate 31000 bps
Match: any
Queueing
queue limit 353 packets
(queue depth/total drops/no-buffer drops) 0/980/0
(pkts output/bytes output) 356235/372507070
shape (average) cir 85000000, bc 850000, be 850000
target shape rate 85000000

Service-policy : QoS_WAN_OUT

queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 1451/261131

Class-map: RealTime_EF (match-any)
1451 packets, 261131 bytes
5 minute offered rate 4000 bps, drop rate 0000 bps
Match: ip dscp ef (46)
Match: ip precedence 5
Priority: 250 kbps, burst bytes 6250, b/w exceed drops: 0

Class-map: AF4x (match-any)
238989 packets, 348341215 bytes
5 minute offered rate 8071000 bps, drop rate 29000 bps
Match: ip precedence 4
Match: ip dscp af41 (34) af42 (36) af43 (38)
Queueing
queue limit 353 packets
(queue depth/total drops/no-buffer drops/flowdrops) 0/980/0/980
(pkts output/bytes output) 238893/346941621
bandwidth remaining 20%
Fair-queue: per-flow queue limit 88 packets

APGRPRT0001#sh int g0/0/3
GigabitEthernet0/0/3 is up, line protocol is up
Hardware is ISR4451-X-4x1GE, address is a0e0.afd4.0023 (bia a0e0.afd4.0023)
Description: TO TELSTRA PE
Internet address is 172.24.70.62/29
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 43/255, rxload 18/255
Encapsulation ARPA, loopback not set
Keepalive not supported
Full Duplex, 1000Mbps, link type is auto, media type is RJ45
output flow-control is off, input flow-control is off
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:02:17, output 00:00:32, output hang never
Last clearing of "show interface" counters 00:02:56
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 980
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 7320000 bits/sec, 1801 packets/sec
5 minute output rate 17057000 bits/sec, 2289 packets/sec
355966 packets input, 202384986 bytes, 0 no buffer
Received 1 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
435274 packets output, 389681587 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

thanks

Dave

Also, if I remove the service-policy from the G0/0/3 interface, the transfer rate decreases from about 80Mbps to about 30Mbps, yet there are no drops on the interface. Our SP also doesn't see any drops on their network.

Uh-huh - w/o the service policy, you don't see any interface drops (which makes sense) but SP doesn't see any drops (which doesn't make sense) although your ACL counters show missing packets end-to-end.  One might wonder about the abilities of your SP.

When you remove your service policy and the transfer rate decreases, that generally indicates oversubscription is causing issues in the SP network, which your service policy otherwise precludes.

We ran the tests again and did see drops on the SP's shape policy after we removed our shaper. The SP's shaper is set to 100Mbps.

So I need to find out why with fair queue or WRED configured we are dropping packets on our router when there is no congestion.

thanks

Dave

When I increase the queue limit on AF41 queue the drops disappear.

Then you're dealing with burst congestion. The common fix for that is to increase the queue size.
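A rough way to see why a link with a low average rate can still drop packets is to model a back-to-back burst arriving faster than the shaper drains it. The Python sketch below uses the figures from the output earlier in the thread (85 Mbps shaper, 353-packet class queue limit, 88-packet per-flow limit, 1 Gbps LAN interfaces). The fluid model, the 1500-byte packet size, and the assumption that bursts arrive at full 1 Gbps line rate are simplifications, and the function names are made up for this example.

```python
def max_burst_without_drop(queue_packets: int,
                           arrival_bps: float,
                           drain_bps: float) -> int:
    """Largest back-to-back burst (in packets) an empty queue can absorb.

    Fluid approximation: while the burst arrives, the queue grows by
    (1 - drain/arrival) packets for every packet that comes in.
    """
    growth_per_packet = 1 - drain_bps / arrival_bps
    return int(queue_packets / growth_per_packet)


def drain_time_ms(queue_packets: int, packet_bytes: int, drain_bps: float) -> float:
    """Time to empty a full queue at the shaped rate, in milliseconds."""
    return queue_packets * packet_bytes * 8 / drain_bps * 1000


if __name__ == "__main__":
    LAN_BPS = 1_000_000_000   # ASSUMPTION: bursts arrive at 1 Gbps line rate
    SHAPER_BPS = 85_000_000   # shaper rate from the thread
    PACKET_BYTES = 1500       # ASSUMPTION: full-size packets

    for limit in (88, 353):   # per-flow and class queue limits from the output
        burst = max_burst_without_drop(limit, LAN_BPS, SHAPER_BPS)
        hold = drain_time_ms(limit, PACKET_BYTES, SHAPER_BPS)
        print(f"queue limit {limit}: absorbs ~{burst} packets, "
              f"holds ~{hold:.1f} ms of traffic")
```

Under these assumptions a single flow can send only about 96 full-size packets back to back (roughly 144 KB) before the 88-packet per-flow limit overflows, which a TCP sender with a large window can exceed even when its average rate is far below the shaper. A larger queue limit absorbs bigger bursts at the cost of more queuing delay.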