12-18-2015 05:55 AM - edited 03-05-2019 02:58 AM
Hi all!
We run a fairly large network infrastructure with 100+ offices on all continents. We recently performed a network hardware upgrade and installed two pairs of Cisco ASR1001 routers in two key offices separated by the Atlantic Ocean. We use point-to-point GRE tunnels for inter-office routing and run OSPF over them.
The issue is as follows. These two offices (network latency between them is ~150 ms) have 500 Mbps Internet access lines. According to the graphs in our monitoring system, we barely reach half of the allocated bandwidth of the Internet connections at both locations; nevertheless, we can barely reach an aggregated* speed of 200 Mbps on the tunnel interfaces between them. That is, the Internet access lines on both sides are not overloaded with traffic, yet on the tunnel interfaces we cannot attain an aggregated speed close to the CIR of the WAN link, neither in Office1 nor in Office2.
Several tests were performed; the results we obtained are below.
1st setup: laptop - switch - ISP - Internet - ISP - switch - laptop
Both laptops run Linux. Using an FTP download between these two locations, we were able to get 300+ Mbps file transfer speed after tuning some of the sysctl variables that the TCP congestion window depends on, because we have a TCP path with a high Bandwidth x Delay Product. The task was to get the maximum possible data transfer speed between very distant locations using only one TCP connection.
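To put numbers on that, here is a rough back-of-the-envelope sketch (using the figures from this thread: 500 Mbps line, ~150 ms RTT) of the window a single TCP connection needs in order to fill the pipe, and why the untuned defaults fall far short:

```python
# Figures from this thread: 500 Mbps access line, ~150 ms RTT.
link_bps = 500_000_000
rtt_s = 0.150

# Bandwidth-Delay Product: bytes that must be "in flight" to fill the pipe.
bdp_bytes = int(link_bps / 8 * rtt_s)
print(bdp_bytes)                          # -> 9375000 (~9.4 MB)

# Without window scaling (RFC 7323), the receive window is capped at 64 KiB,
# which bounds a single connection's throughput to RWIN / RTT:
max_tput_no_ws = 65_535 * 8 / rtt_s
print(round(max_tput_no_ws / 1e6, 1))     # -> 3.5 (Mbps)
```

This is why window scaling and large socket buffers are mandatory on such a path: a ~9.4 MB window is needed per connection to saturate the line.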
                                --------------------- GRE Tunnel ---------------------
                               |                                                       |
2nd setup: laptop - switch - ASR1001 - switch - ISP - Internet - ISP - switch - ASR1001 - switch - laptop
The same two laptops showed ~30 Mbps maximum transfer speed for the same file.
I created a parallel GRE tunnel that was not used by users in either office, and added two static host routes for these two laptops pointing at the newly created, empty GRE tunnel. Using the traceroute command, I verified that traffic for the remote host goes through the intended tunnel. The maximum speed achieved in this case was again ~30 Mbps.
Could someone explain why I could not get higher data transfer speeds, given that the access lines were utilized at no more than half of their CIR?
______________________________________________________________
* I mean the speed of all flows generated at that moment by all user applications in one direction.
Thank you!
12-18-2015 06:17 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
Two common issues to watch for when dealing with GRE and LFNs: you want to ensure there's as little IP fragmentation as possible for the former, and that TCP hosts have their RWIN sized for the BDP for the latter.
Oh, also keep in mind that TCP packet loss, which causes TCP to go into congestion avoidance, tends to slow the transfer rate very much on a high-latency (long-distance) network. It also slows TCP slow start, but the impact isn't as bad as it is for CA.
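To illustrate how hard loss hits at this latency, here is a sketch using the Mathis et al. approximation for loss-limited, Reno-style TCP throughput; the loss rate below is an assumed illustrative figure, not a measurement from this thread:

```python
import math

# Mathis et al. approximation for loss-limited, Reno-style TCP throughput:
#   rate ~ (MSS / RTT) * (C / sqrt(p)),  with C ~ 1.22.
def mathis_throughput_bps(mss_bytes, rtt_s, loss_prob, c=1.22):
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))

# Illustrative figures: 1360-byte MSS (as on the tunnel), 150 ms RTT,
# 0.01% segment loss (assumed for illustration).
print(round(mathis_throughput_bps(1360, 0.150, 1e-4) / 1e6, 1))  # -> 8.8 (Mbps)
```

Even one loss in ten thousand segments caps a Reno-style flow near single-digit Mbps at this RTT, which is in the same ballpark as the ~30 Mbps observed.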
12-18-2015 07:00 AM
MTU?
Path MTU Discovery?
&
CPU on both boxes after you run the tunnel?
Regards,
Saurabh Gera
12-23-2015 05:53 AM
Hi Saurabh!
MTU? - No, there is no fragmentation; I have double-checked it.
Path MTU Discovery? - We don't need it, because at the TCP level the two hosts agree on an MSS that, after the IP and TCP headers are added, produces a packet size less than or equal to the IP MTU set on the tunnel interface.
&
CPU on both boxes after you run the tunnel? - less than 1%
Interface Tunnel configuration is as follows (sensitive data hidden):
interface Tunnel264
description Backup Data 'One' - 'Two'
ip address x.x.x.x 255.255.255.252
ip mtu 1400
ip tcp adjust-mss 1360
ip ospf cost 15
keepalive 5 4
tunnel source x.x.x.x
tunnel destination x.x.x.x
tunnel bandwidth transmit 300000
tunnel bandwidth receive 300000
end
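For what it's worth, the MTU/MSS values in this config can be sanity-checked arithmetically. A minimal sketch, assuming a standard 1500-byte WAN MTU and plain GRE over IPv4 (both assumptions, not stated in the thread):

```python
# Plain GRE over IPv4 adds 24 bytes: a 20-byte outer IP header plus a
# 4-byte GRE header. With a 1500-byte WAN MTU the tunnel could carry up
# to 1476 bytes; the config uses a more conservative 1400. The adjust-mss
# value should equal the tunnel IP MTU minus 40 bytes (20 IP + 20 TCP)
# so TCP segments never fragment inside the tunnel.
wan_mtu = 1500
gre_overhead = 20 + 4
tunnel_ip_mtu = 1400
tcp_adjust_mss = 1360

assert tunnel_ip_mtu <= wan_mtu - gre_overhead   # 1400 <= 1476: fits, no fragmentation
assert tcp_adjust_mss == tunnel_ip_mtu - 40      # 1360 == 1400 - 40: MSS matches the MTU
```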
Thanks!
12-23-2015 06:54 AM
Hi, Josef!
Thank you for reply!
I have checked:
1) There is no fragmentation, thanks to the "ip tcp adjust-mss 1360" command issued in interface Tunnel configuration mode.
2) Following are tcp parameters after tuning on both sides:
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.core.wmem_default = 284284
net.core.rmem_default = 229376
net.ipv4.tcp_wmem = 4096 284284 16777216
net.ipv4.tcp_rmem = 4096 284284 16777216
Other sysctl parameters were not adjusted.
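A quick sanity check that these tuned buffer maxima actually cover the path's BDP (figures from this thread):

```python
# Do the tuned socket-buffer maxima cover the path's BDP?
rmem_max = 16_777_216                  # net.core.rmem_max from the post: 16 MiB
bdp_bytes = int(500e6 / 8 * 0.150)     # ~9.4 MB for 500 Mbps x 150 ms
print(rmem_max >= bdp_bytes)           # -> True
```

Headroom above the raw BDP, as here, is useful: Linux reserves part of the socket buffer for bookkeeping overhead rather than payload.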
I will check how congestion avoidance operates and how frequent TCP losses are. Currently one of these two algorithms is used:
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_available_congestion_control = cubic reno
net.ipv4.tcp_allowed_congestion_control = cubic reno
Also we use selective acknowledgments:
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_sack = 1
Thank you!
03-08-2019 04:44 AM
Hi,
Have you resolved this issue? Was any solution found?
I am also facing a similar issue in our environment.
04-03-2019 12:07 AM - edited 04-03-2019 12:18 AM
Yes. The issue wasn't related to GRE tunnel performance; in fact, we have since moved to IPsec VTIs. The issue was mostly due to the long fat pipe between the offices, so we needed to tune TCP parameters to achieve somewhat better throughput. Some teams still report low throughput, but they use Windows platforms to send data over the ocean, and I have not found a congestion avoidance algorithm there that shows better figures. It was only on Linux that I achieved 300+ Mbps end-to-end over the Atlantic Ocean.
BTW: if the average traffic load you see in the monitoring system exceeds one third of the interface bandwidth, packet drops start to occur; if it exceeds one half, drops become constant and quite intense, because bulk traffic is bursty by nature. So if you lose a TCP segment in the outgoing direction to the remote office across the ocean, your local server learns about it only after 150+ ms, and that huge delay makes reliable data transfer much slower. That's why you need TCP Selective ACKs, TCP Window Scaling, and a more aggressive congestion avoidance algorithm that adjusts the sender's window by analyzing changes in the delay of incoming TCP acknowledgments rather than relying on drops, and that sends larger bursts of data.
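To illustrate why loss-based congestion avoidance is so painful on this path: after a single loss, Reno-style CA halves cwnd and then grows it back by roughly one MSS per RTT. A sketch with the figures from this thread (CUBIC recovers faster, but the latency penalty is the same in kind):

```python
# After one loss, Reno-style congestion avoidance halves cwnd, then grows
# it by ~1 MSS per RTT. Time to climb back to a full window on this path:
bdp_bytes = int(500e6 / 8 * 0.150)    # ~9.4 MB window needed to fill the pipe
mss = 1360                            # as set by ip tcp adjust-mss
rtt_s = 0.150

rtts_to_recover = (bdp_bytes / 2) / mss
seconds = rtts_to_recover * rtt_s
print(round(seconds / 60, 1))         # -> 8.6 (minutes per loss event)
```

Minutes of below-line-rate transfer per loss event is exactly why delay-based or pacing-oriented congestion control does so much better on long fat pipes.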
Hope that helps
04-03-2019 04:59 AM