12-18-2015 05:55 AM - edited 03-05-2019 02:58 AM
Hi all!
We run a fairly large network infrastructure with 100+ offices on all continents. We recently performed a network hardware upgrade and installed two pairs of Cisco ASR1001 routers in two key offices separated by the Atlantic Ocean. We use point-to-point GRE tunnels for inter-office routing and run OSPF over them.
The issue is as follows. These two offices (network latency between them is ~150 ms) have 500 Mbps Internet access lines. According to the graphs in our monitoring system, we barely reach half of the allocated bandwidth of the Internet connections at both locations; nevertheless, we can barely reach an aggregated* speed of 200 Mbps on the tunnel interfaces between them. That is, the Internet access lines on both sides are not overloaded with traffic, yet on the tunnel interfaces we cannot attain an aggregated speed close to the CIR of the WAN link, neither in Office1 nor in Office2.
Several tests were performed; the results we obtained are below.
1st setup: laptop - switch - ISP - Internet - ISP - switch - laptop
Both laptops run Linux. Using an FTP download between these two locations, we were able to get 300+ Mbps file transfer speed after tuning some of the sysctl variables that the TCP congestion window depends on, because we have a TCP path with a high Bandwidth x Delay Product. The task was to get the maximum possible data transfer speed between very distant locations using only one TCP connection.
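To put numbers on that, here is a rough back-of-the-envelope sketch (using the figures from this thread: 500 Mbps line, ~150 ms RTT) of the window a single TCP connection needs in order to fill the pipe, and why the untuned defaults fall far short:

```python
# Figures from this thread: 500 Mbps access line, ~150 ms RTT.
link_bps = 500_000_000
rtt_s = 0.150

# Bandwidth-Delay Product: bytes that must be "in flight" to fill the pipe.
bdp_bytes = int(link_bps / 8 * rtt_s)
print(bdp_bytes)                          # -> 9375000 (~9.4 MB)

# Without window scaling (RFC 7323), the receive window is capped at 64 KiB,
# which bounds a single connection's throughput to RWIN / RTT:
max_tput_no_ws = 65_535 * 8 / rtt_s
print(round(max_tput_no_ws / 1e6, 1))     # -> 3.5 (Mbps)
```

This is why window scaling and large socket buffers are mandatory on such a path: a ~9.4 MB window is needed per connection to saturate the line.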
                                --------------------- GRE Tunnel ---------------------
                               |                                                       |
2nd setup: laptop - switch - ASR1001 - switch - ISP - Internet - ISP - switch - ASR1001 - switch - laptop
The same two laptops showed ~30 Mbps maximum transfer speed for the same file.
I created a parallel GRE tunnel that was not used by users in either office, and added two static host routes for these two laptops pointing at the newly created, empty GRE tunnel. Using the traceroute command, I verified that traffic for the remote host goes through the intended tunnel. The maximum speed achieved in this case was again ~30 Mbps.
Could someone explain why I could not get higher data transfer speeds, given that the access lines were utilized at no more than half of their CIR?
______________________________________________________________
* I mean the speed of all flows generated at that moment by all user applications in one direction.
Thank you!
12-18-2015 06:17 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
Two common issues to watch for when dealing with GRE and LFNs: you want to ensure there's as little IP fragmentation as possible for the former, and that TCP hosts have their RWIN sized for the BDP for the latter.
Oh, also keep in mind that TCP packet loss, which causes TCP to go into congestion avoidance, tends to slow the transfer rate very much on a high-latency (long-distance) network. It also slows TCP slow start, but the impact isn't as bad as it is for CA.
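To illustrate how hard loss hits at this latency, here is a sketch using the Mathis et al. approximation for loss-limited, Reno-style TCP throughput; the loss rate below is an assumed illustrative figure, not a measurement from this thread:

```python
import math

# Mathis et al. approximation for loss-limited, Reno-style TCP throughput:
#   rate ~ (MSS / RTT) * (C / sqrt(p)),  with C ~ 1.22.
def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob, c=1.22):
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))

# Illustrative figures: 1360-byte MSS (as on the tunnel), 150 ms RTT,
# 0.01% segment loss (assumed for illustration).
print(round(mathis_throughput_bps(1360, 0.150, 1e-4) / 1e6, 1))  # -> 8.8 (Mbps)
```

Even one loss in ten thousand segments caps a Reno-style flow near single-digit Mbps at this RTT, which is in the same ballpark as the ~30 Mbps observed.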
12-18-2015 07:00 AM
MTU?
Path MTU Discovery?
&
CPU on both boxes after you run the tunnel?
Regards,
Saurabh Gera
12-23-2015 05:53 AM
Hi Saurabh!
MTU? - No, there is no fragmentation; I have double-checked it.
Path MTU Discovery? - We don't need it, because at the TCP level the two hosts agree on an MSS that, after the IP and TCP headers are added, produces a packet size less than or equal to the IP MTU set on the tunnel interface.
&
CPU on both boxes after you run the tunnel? - less than 1%
Interface Tunnel configuration is as follows (sensitive data hidden):
interface Tunnel264
description Backup Data 'One' - 'Two'
ip address x.x.x.x 255.255.255.252
ip mtu 1400
ip tcp adjust-mss 1360
ip ospf cost 15
keepalive 5 4
tunnel source x.x.x.x
tunnel destination x.x.x.x
tunnel bandwidth transmit 300000
tunnel bandwidth receive 300000
end
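For what it's worth, the MTU/MSS values in this config can be sanity-checked arithmetically. A minimal sketch, assuming a standard 1500-byte WAN MTU and plain GRE over IPv4 (both assumptions, not stated in the thread):

```python
# Plain GRE over IPv4 adds 24 bytes: a 20-byte outer IP header plus a
# 4-byte GRE header. With a 1500-byte WAN MTU the tunnel could carry up
# to 1476 bytes; the config uses a more conservative 1400. The adjust-mss
# value should equal the tunnel IP MTU minus 40 bytes (20 IP + 20 TCP)
# so TCP segments never fragment inside the tunnel.
wan_mtu = 1500
gre_overhead = 20 + 4
tunnel_ip_mtu = 1400
tcp_adjust_mss = 1360

assert tunnel_ip_mtu <= wan_mtu - gre_overhead   # 1400 <= 1476: fits, no fragmentation
assert tcp_adjust_mss == tunnel_ip_mtu - 40      # 1360 == 1400 - 40: MSS matches the MTU
```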
Thanks!
12-23-2015 06:54 AM
Hi, Josef!
Thank you for reply!
I have checked:
1) There is no fragmentation, thanks to the "ip tcp adjust-mss 1360" command issued in interface Tunnel configuration mode.
2) Following are tcp parameters after tuning on both sides:
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.core.wmem_default = 284284
net.core.rmem_default = 229376
net.ipv4.tcp_wmem = 4096 284284 16777216
net.ipv4.tcp_rmem = 4096 284284 16777216
Other sysctl parameters were not adjusted.
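A quick sanity check that these tuned buffer maxima actually cover the path's BDP (figures from this thread):

```python
# Do the tuned socket-buffer maxima cover the path's BDP?
rmem_max = 16_777_216                  # net.core.rmem_max from the post: 16 MiB
bdp_bytes = int(500e6 / 8 * 0.150)     # ~9.4 MB for 500 Mbps x 150 ms
print(rmem_max >= bdp_bytes)           # -> True
```

Headroom above the raw BDP, as here, is useful: Linux reserves part of the socket buffer for bookkeeping overhead rather than payload.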
I will check how congestion avoidance operates and how frequent TCP losses are. Currently one of these two algorithms is used:
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_available_congestion_control = cubic reno
net.ipv4.tcp_allowed_congestion_control = cubic reno
Also we use selective acknowledgments:
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_sack = 1
Thank you!
03-08-2019 04:44 AM
Hi,
Have you resolved this issue? Was any solution found?
I am also facing a similar issue in our environment.
04-03-2019 12:07 AM - edited 04-03-2019 12:18 AM
Yes. The issue wasn't related to GRE tunnel performance; in fact, we have since moved to IPsec VTIs. The issue was mostly due to the long fat pipe between the offices, so we needed to tune TCP parameters to achieve somewhat better throughput. Some teams still report low throughput, but they use Windows platforms to send data over the ocean, and I have not found a congestion avoidance algorithm there that shows better figures. It was only on Linux that I achieved 300+ Mbps end-to-end over the Atlantic Ocean.
BTW: if the average traffic load you see in the monitoring system exceeds one third of the interface bandwidth, packet drops start to occur; if it exceeds one half, drops become constant and quite intense, because bulk traffic is bursty by nature. So if you lose a TCP segment in the outgoing direction to the remote office across the ocean, your local server learns about it only after 150+ ms, and that huge delay makes reliable data transfer much slower. That's why you need TCP Selective ACKs, TCP Window Scaling, and a more aggressive congestion avoidance algorithm that adjusts the sender's window by analyzing changes in the delay of incoming TCP acknowledgments rather than relying on drops, and that sends larger bursts of data.
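To illustrate why loss-based congestion avoidance is so painful on this path: after a single loss, Reno-style CA halves cwnd and then grows it back by roughly one MSS per RTT. A sketch with the figures from this thread (CUBIC recovers faster, but the latency penalty is the same in kind):

```python
# After one loss, Reno-style congestion avoidance halves cwnd, then grows
# it by ~1 MSS per RTT. Time to climb back to a full window on this path:
bdp_bytes = int(500e6 / 8 * 0.150)    # ~9.4 MB window needed to fill the pipe
mss = 1360                            # as set by ip tcp adjust-mss
rtt_s = 0.150

rtts_to_recover = (bdp_bytes / 2) / mss
seconds = rtts_to_recover * rtt_s
print(round(seconds / 60, 1))         # -> 8.6 (minutes per loss event)
```

Minutes of below-line-rate transfer per loss event is exactly why delay-based or pacing-oriented congestion control does so much better on long fat pipes.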
Hope that helps
04-03-2019 04:59 AM