02-21-2017 04:47 PM - edited 03-05-2019 08:04 AM
Hi all,
Can you please help me troubleshoot an issue we are seeing? We appear to be getting TCP drops, but I don't know where. A Wireshark capture shows the drops, and you can see where the TCP window sizes are affected by them.
I’ve done some testing with large amounts of data, copying files from Site A (APGRPRT0001) to Site B (AFURX0001). Sanitized router configs are attached. The routers at Site A and Site B have all interfaces connected at 1Gbps. The WAN link at Site A is 100Mbps; at Site B it is 200Mbps.
How the testing was performed…
Used Robocopy to transfer a 330MB file at a limit of 30Mbps (30Mbps is well under our QoS shapers, with no congestion on the link)
Packets are being marked as AF41 as per MQC.
Installed an ACL to capture the number of packets leaving Site A router
Installed an ACL to capture the number of packets arriving at Site B router
Here’s what I’ve noticed…
With QoS configured on the Interface, from Site A to Site B about 2000 packets get dropped during the robocopy.
Without QoS configured on the Interface, from Site A to Site B about 600 packets get dropped during the robocopy.
With QoS Applied…
APGRPRT0001#show ip access-list 199
Extended IP access list 199
10 permit ip host 10.30.128.26 host 10.96.20.149 (231652 matches)
20 permit ip any any (66946 matches)
APGRPRT0001#sh policy-map interface | in af41
ip dscp af41
ip dscp af41
Match: ip dscp af41 (34) af42 (36) af43 (38)
af41 232370/346533738 59/89326 2537/3834638 182 208 1/10
AFURX0001#show ip access-list 199
Extended IP access list 199
6 permit ip host 10.30.16.163 host 10.96.20.149
10 permit ip host 10.30.128.26 host 10.96.20.149 (229064 matches)
20 permit ip any any (288686 matches)
AFURX0001#sh policy-map interface | in af41
ip dscp af41
ip dscp af41
ip dscp af41
ip dscp af41
ip dscp af41
af41 87/5661 0/0 0/0 16 32 1/10
af41 8941/4301327 0/0 0/0 104 208 1/10
af41 3125/506129 0/0 0/0 104 208 1/10
af41 4573/2135822 0/0 0/0 83 166 1/10
ip dscp af41
Without QoS applied
APGRPRT0001#show ip access-list 199
Extended IP access list 199
10 permit ip host 10.30.128.26 host 10.96.20.149 (229175 matches)
20 permit ip any any (89484 matches)
AFURX0001#show ip access-list 199
Extended IP access list 199
6 permit ip host 10.30.16.163 host 10.96.20.149
10 permit ip host 10.30.128.26 host 10.96.20.149 (228516 matches)
20 permit ip any any (582056 matches)
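The ACL match counters above give the end-to-end loss directly: packets counted leaving Site A minus packets counted arriving at Site B. A minimal Python sketch of that arithmetic, using only the line 10 match counts quoted in this post (the function name is just illustrative):

```python
def lost_packets(sent, received):
    """Return (packets lost, loss percentage) for one test run."""
    lost = sent - received
    return lost, 100.0 * lost / sent

# ACL 199, line 10 match counts: Site A (egress) vs Site B (arrival).
with_qos = lost_packets(sent=231652, received=229064)
without_qos = lost_packets(sent=229175, received=228516)

print(f"with QoS:    {with_qos[0]} lost ({with_qos[1]:.2f}%)")
print(f"without QoS: {without_qos[0]} lost ({without_qos[1]:.2f}%)")
```

By these counters the difference is 2588 packets with QoS applied and 659 without.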
What can I do to troubleshoot this issue?
thanks
Dave
02-21-2017 10:08 PM
I take it the two sites are not directly connected, as I don't see any matching IP addresses. Is this on some sort of telco-supplied MPLS?
02-22-2017 07:39 PM
Hi Richard,
Yes, there is an MPLS cloud between the sites. I'm trying to determine whether the fault is on our end, and if so what we can do to fix it, or whether it's the telco's, in which case I can hand the problem over to them.
thanks
Dave
02-23-2017 05:42 AM
As you're counting packets on both sides, to reveal lost packets, how do those lost packet counts compare with your egress interface drop counts? I.e. comparing those values, you should be able to determine where the drops are happening.
I see you're shaping for 100 Mbps, which is good, but I believe most Cisco shapers don't count L2, they count L3. If that's true for you, you'll need to shape slower than the L2 CIR rate. Unfortunately, L2 overhead varies based on packet size, but I've found allowing for about 15% L2 overhead usually works well.
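To illustrate why a shaper that counts only L3 bytes needs headroom, here is a rough Python sketch. It assumes plain Ethernet framing with 18 bytes of L2 header and FCS per packet, and optionally 20 more bytes of preamble and inter-frame gap; whether the provider's policer counts those is an assumption you would need to confirm. The function and constant names are hypothetical.

```python
ETH_HEADER_FCS = 18  # assumption: 14-byte Ethernet header + 4-byte FCS, no VLAN tag
PREAMBLE_IFG = 20    # assumption: 8-byte preamble/SFD + 12-byte inter-frame gap

def l3_shape_rate(l2_cir_bps, ip_packet_bytes, count_preamble_ifg=False):
    """L3 rate that exactly fills an L2 CIR, for a single IP packet size."""
    overhead = ETH_HEADER_FCS + (PREAMBLE_IFG if count_preamble_ifg else 0)
    return l2_cir_bps * ip_packet_bytes / (ip_packet_bytes + overhead)

for size in (64, 256, 1500):
    rate = l3_shape_rate(100_000_000, size)
    print(f"{size:>5}-byte IP packets: shape at about {rate / 1e6:.1f} Mbps")

# The flat 15% allowance suggested above, applied to a 100 Mbps CIR.
print(f"15% allowance: {100_000_000 * 0.85 / 1e6:.0f} Mbps")
```

Small packets carry proportionally more L2 overhead than large ones, which is why a single flat allowance is only a rule of thumb for mixed traffic.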
I also noticed you're using random-detect. Unless you're a QoS expert, I recommend you don't use it; I recommend fair-queue instead. Also, either WRED or FQ often needs parameter tuning when used with high-speed WANs, as their defaults are more suitable for low-speed WANs or LANs.
02-23-2017 11:00 PM
Hi Joseph,
Thanks for your help and for those tips. I've configured fair-queue and dropped the shaper to 85Mbps.
I ran the copy again and am still getting drops.
Here's the output...
interface GigabitEthernet0/0/3
description TO TELSTRA PE
bandwidth 100000
ip address xxxxx
ip wccp 62 redirect in
ip flow monitor NTA_MONITOR input
ip flow monitor NTA_MONITOR output
ip access-group 199 out
negotiation auto
service-policy output QoS_SHAPE_85
end
GigabitEthernet0/0/3
Service-policy output: QoS_SHAPE_85
Class-map: class-default (match-any)
345974 packets, 372149552 bytes
5 minute offered rate 8610000 bps, drop rate 31000 bps
Match: any
Queueing
queue limit 353 packets
(queue depth/total drops/no-buffer drops) 0/980/0
(pkts output/bytes output) 356235/372507070
shape (average) cir 85000000, bc 850000, be 850000
target shape rate 85000000
Service-policy : QoS_WAN_OUT
queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 1451/261131
Class-map: RealTime_EF (match-any)
1451 packets, 261131 bytes
5 minute offered rate 4000 bps, drop rate 0000 bps
Match: ip dscp ef (46)
Match: ip precedence 5
Priority: 250 kbps, burst bytes 6250, b/w exceed drops: 0
Class-map: AF4x (match-any)
238989 packets, 348341215 bytes
5 minute offered rate 8071000 bps, drop rate 29000 bps
Match: ip precedence 4
Match: ip dscp af41 (34) af42 (36) af43 (38)
Queueing
queue limit 353 packets
(queue depth/total drops/no-buffer drops/flowdrops) 0/980/0/980
(pkts output/bytes output) 238893/346941621
bandwidth remaining 20%
Fair-queue: per-flow queue limit 88 packets
APGRPRT0001#sh int g0/0/3
GigabitEthernet0/0/3 is up, line protocol is up
Hardware is ISR4451-X-4x1GE, address is a0e0.afd4.0023 (bia a0e0.afd4.0023)
Description: TO TELSTRA PE
Internet address is 172.24.70.62/29
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 43/255, rxload 18/255
Encapsulation ARPA, loopback not set
Keepalive not supported
Full Duplex, 1000Mbps, link type is auto, media type is RJ45
output flow-control is off, input flow-control is off
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:02:17, output 00:00:32, output hang never
Last clearing of "show interface" counters 00:02:56
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 980
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 7320000 bits/sec, 1801 packets/sec
5 minute output rate 17057000 bits/sec, 2289 packets/sec
355966 packets input, 202384986 bytes, 0 no buffer
Received 1 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
435274 packets output, 389681587 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
thanks
Dave
02-23-2017 11:29 PM
Also, if I remove the service-policy from the G0/0/3 interface, the transfer rate decreases from about 80Mbps to about 30Mbps, yet there are no drops on the interface. Our SP also doesn't see any drops on their network.
02-24-2017 04:50 AM
Uh-huh - w/o the service policy, you don't see any interface drops (which makes sense) but SP doesn't see any drops (which doesn't make sense) although your ACL counters show missing packets end-to-end. One might wonder about the abilities of your SP.
When you remove your service policy and the transfer rate decreases, that generally indicates oversubscription is causing issues in the SP network, which your service policy otherwise prevents.
02-27-2017 05:09 PM
We ran the tests again and did see drops on the SP's shaping policy after we removed our shaper. The SP's shaper is set to 100Mbps.
So I need to find out why with fair queue or WRED configured we are dropping packets on our router when there is no congestion.
thanks
Dave
02-27-2017 11:02 PM
When I increase the queue limit on the AF41 queue, the drops disappear.
02-28-2017 02:04 AM
Then you're dealing with burst congestion. The common fix for that is increasing the queue size.
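As a rough illustration of why a deeper queue helps, the Python sketch below estimates how large a back-to-back burst a tail-drop queue can absorb when packets arrive at LAN speed and drain at the shaped rate. It assumes equal-sized 1500-byte packets, a single flow, and the 1 Gbps LAN, 85 Mbps shaper, and 88-packet per-flow limit quoted earlier in the thread. It is a simplification, not a model of the router's actual scheduler, and the function names are hypothetical.

```python
def max_burst_without_drops(queue_limit_pkts, arrival_bps, drain_bps):
    """Largest back-to-back burst (in packets) that fits without tail drop.

    While the burst arrives, the queue grows by (1 - drain/arrival)
    packets per arriving packet, so it overflows once
    burst * (1 - drain/arrival) exceeds the queue limit.
    """
    if drain_bps >= arrival_bps:
        return float("inf")  # the queue never builds
    return int(queue_limit_pkts / (1 - drain_bps / arrival_bps))

def drain_time_ms(queue_pkts, packet_bytes, drain_bps):
    """Time to empty a full queue at the drain rate, in milliseconds."""
    return queue_pkts * packet_bytes * 8 / drain_bps * 1000

for limit in (88, 353, 1024):  # 1024 is an arbitrary larger value for comparison
    burst = max_burst_without_drops(limit, 1_000_000_000, 85_000_000)
    delay = drain_time_ms(limit, 1500, 85_000_000)
    print(f"limit {limit}: ~{burst}-packet burst absorbed, up to {delay:.1f} ms queuing delay")
```

The trade-off is visible in the second number: a larger queue absorbs bigger bursts, but a full queue also adds more delay for the packets waiting in it.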