Re: Throughput over DMVPN IPsec tunnel

nec82 · ‎09-30-2022

Hello colleagues! It might be a silly question, but I've encountered some strange behavior. I tested throughput with iperf over a DMVPN IPsec tunnel and got some counterintuitive results. For instance, a single-session TCP test yields no more than 11 Mbit/s, while a 10-session TCP test yields about 70-80 Mbit/s. The same test with UDP can generate a 50-70 Mbit/s flow, but with some packet loss (about 6-10 percent). Here is the topology:

Server - ISR4221 (GRE over IPsec (MTU 1400) over PPPoE Dialer with MTU 1492) - Internet - (GRE over IPsec (MTU 1400) over Eth) ISR4331 - Server

All tunnels have TCP adjust-MSS 1360 and IP MTU 1400. A ping with a total size of 1400 bytes and the DF bit set works fine between the servers.
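For reference, the arithmetic behind these two values can be sketched in a few lines of Python. This is only an illustration: it assumes IPv4 and TCP headers without options (20 bytes each), and the function name is made up for the example.

```python
# Assumption: IPv4 and TCP headers carry no options (20 bytes each).
IPV4_HEADER = 20
TCP_HEADER = 20
ICMP_HEADER = 8


def max_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of ip_mtu bytes."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


tunnel_mtu = 1400
# MSS that fits the tunnel MTU; this is the value used for adjust-mss.
print("MSS for tunnel MTU:", max_mss(tunnel_mtu))
# ICMP payload of a ping whose total IP packet size equals the tunnel MTU.
print("ICMP payload for a full-size ping:", tunnel_mtu - IPV4_HEADER - ICMP_HEADER)
```

With a tunnel IP MTU of 1400, the result agrees with the configured adjust-MSS of 1360.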

While testing, I haven't seen any errors on either side, such as CERM limit messages or overloaded PPEs. In addition, I ran packet-trace on the ISR 4221 and didn't see any packet drops. Here are the statistics:

show platform packet-trace statistics
Packets Summary
  Matched  80661
  Traced   50400
Packets Received
  Ingress  30261
  Inject   0
Packets Processed
  Forward  55461
  Punt     0
  Drop     0
  Consume  0

show platform hardware qfp active datapath utilization
CPP 0: Subdev 0               5 secs       1 min       5 min      60 min
Input:   Priority (pps)            0           0           0           0
                  (bps)            0           0           0           0
     Non-Priority (pps)         5359        5131        4314        4098
                  (bps)     33659800    32960664    22664464    18163528
            Total (pps)         5359        5131        4314        4098
                  (bps)     33659800    32960664    22664464    18163528
Output:  Priority (pps)            0           0           0           0
                  (bps)            0           0           0           0
     Non-Priority (pps)         5292        5071        4256        4076
                  (bps)     32410936    31830096    22183672    18079072
            Total (pps)         5292        5071        4256        4076
                  (bps)     32410936    31830096    22183672    18079072
Processing: Load  (pct)           20          17          14          13
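One thing that can be read from this output is the average packet size, which is just the bit rate divided by the packet rate. A small Python sketch using the 5-second input counters above (the function name is only illustrative):

```python
def avg_packet_bytes(bps: float, pps: float) -> float:
    """Average packet size in bytes, given a bit rate and a packet rate."""
    return bps / pps / 8


# 5-second input counters from the datapath utilization output above.
size = avg_packet_bytes(33659800, 5359)
print(f"average input packet size: {size:.0f} bytes")
```

This only gives an average over all traffic crossing the router in that interval, so it says nothing about individual flows.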

So the question is: why does a single-session TCP flow have such low throughput? Is there a way to overcome this?

Georg Pauwen · ‎09-30-2022

Hello,

are you using the values below on your tunnel interfaces? If not, can you configure them and test again?

interface Tunnel0
 ip mtu 1400
 ip tcp adjust-mss 1360

nec82 · ‎09-30-2022

I'm using exactly the same values

Georg Pauwen · ‎09-30-2022

Hello,

my apologies, I saw that you already mentioned that you are using these values in your original post...

Either way, what you are experiencing might just be related to the way TCP and UDP work. I have found the article below that explains it quite well I think:

--> TCP is a connection-oriented protocol, whereas UDP is a connectionless protocol. A key difference between TCP and UDP is speed, as TCP is comparatively slower than UDP. Overall, UDP is a much faster, simpler, and efficient protocol, however, retransmission of lost data packets is only possible with TCP.

https://www.lifesize.com/en/blog/tcp-vs-udp/#:~:text=TCP%20is%20a%20connection%2Doriented,is%20only%20possible%20with%20TCP.

Joseph W. Doherty · ‎09-30-2022

BTW, I would take some of the information provided in the Lifesize blog with a very large grain of salt.

TCP vs. UDP, in actual usage and in their impact on networks, is just a bit more complex than what the article describes.

Without going into a full dissertation on TCP vs. UDP: if you believe the foregoing article's "But because UDP avoids the unnecessary overheads of TCP transport, it’s incredibly efficient in terms of bandwidth, and much less demanding of poor performing networks, as well.", I would recommend starting with https://en.wikipedia.org/wiki/Network_congestion#Congestive_collapse. You might also look deeper into applications that use UDP: although they don't really need all of TCP's features, they often "re-implement" some key features of TCP. UDP usage can easily lead to "poor performing networks" and, without some key TCP features, to congestive collapse.

Basically, "buyer beware" when reading any vendor's literature touting the benefits of their application when it uses UDP and/or their "better than TCP" proprietary protocol. (NB: I'm not saying all UDP applications are bad, but TCP's features also include avoiding network collapse. Also, TCP isn't quite as bad as it is often purported to be; often one only needs to understand it better [see research subjects, mentioned in my prior posting].)

Giuseppe Larosa · ‎09-30-2022

Hello @nec82 ,

>> For instance, one session tcp test generates throughput no more 11 mbit/s, the 10 session tcp about 70-80 mbit/s. The same test with UDP might generate 50-70 mbit/s flow but with some packet loss (about 6-10 percent).

>> So the question is why one session tcp flow has so low throughput ? Is there a way how to overcome such an issue?

A TCP sender has to wait for ACKs from the receiver, so the TCP window size plays a role here. Better results can be achieved if both TCP endpoints implement the extended TCP window (window scaling).

I would say your results are reasonable for a standard TCP window size.
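As a rough illustration of the window effect: a sender can have at most one window of unacknowledged data in flight per round trip, so a single flow is capped at window / RTT. The sketch below assumes a 65535-byte window (the maximum without window scaling). The RTT values are hypothetical, since no latency figure has been posted in this thread.

```python
def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


WINDOW = 65535  # bytes; the largest window without window scaling

# Hypothetical RTTs, for illustration only (not measured in this thread).
for rtt_ms in (10, 25, 50, 100):
    cap = window_limited_mbps(WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> at most {cap:.1f} Mbit/s per flow")
```

Because the cap applies per flow, several parallel sessions can add up to a much higher aggregate rate than one session, which fits the pattern of the 1-session vs. 10-session tests.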

Hope to help

Giuseppe

nec82 · ‎09-30-2022

Thanks for the explanation. Today I ran the same tests with no encryption on the routers (just GRE). Here are the results:

A single TCP session reaches about 17 Mbit/s, compared with 7-10 Mbit/s for the same test with encryption.

I checked PPE utilization on both routers while testing with encryption. Utilization was not high (about 30 percent) and there were no CERM limit messages. It looks like router performance is not the bottleneck, so why would throughput differ by a factor of two?

MHM Cisco World · ‎09-30-2022

router#show platform cerm-information | include pkt

Can you share the output of this?

Joseph W. Doherty · ‎09-30-2022

There is insufficient information to really say what's causing your issues, but in general, what @Giuseppe Larosa mentions about TCP window sizes and ACKs is very important when it comes to TCP transfer rate (a few other variables come into play, too). (More on this subject can be found by searching the Internet for BDP [bandwidth-delay product] and LFN [long fat network], though the latter is possibly not quite as impactful in your case.)
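For anyone unfamiliar with BDP, it is simply bandwidth multiplied by round-trip time, and it tells you how much data must be in flight (that is, how large the TCP window must be) to keep a path full. A minimal Python sketch follows; the rates and the 50 ms RTT are hypothetical example values, not measurements from this thread.

```python
def bdp_bytes(rate_bps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_ms / 1000 / 8


# Hypothetical example values: target rates in Mbit/s at an assumed 50 ms RTT.
RTT_MS = 50
for rate_mbps in (10, 50, 80):
    need = bdp_bytes(rate_mbps * 1e6, RTT_MS)
    print(f"{rate_mbps} Mbit/s at {RTT_MS} ms RTT needs a window of about {need / 1024:.0f} KiB")
```

If the required window exceeds what the endpoints actually advertise, a single flow cannot reach the target rate regardless of how lightly loaded the routers are.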

There are other variables, too, that would impact your transfer rate. For example, your UDP loss rate is interesting. (TCP is slowed considerably by packet loss, especially if it triggers timeouts.) Also interesting is your PPPoE (with its reduced MTU).
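To give a feel for how strongly loss limits TCP, the widely quoted Mathis approximation estimates steady-state throughput as (MSS / RTT) * C / sqrt(p), where p is the loss probability and C is about 1.22. The Python sketch below is a simplified model only: it is meant for low loss rates, the 50 ms RTT is a hypothetical value, and the loss rates are examples rather than anything measured for TCP in this thread.

```python
import math


def mathis_mbps(mss_bytes: int, rtt_seconds: float, loss: float) -> float:
    """Mathis approximation of loss-limited TCP throughput, in Mbit/s."""
    c = math.sqrt(3 / 2)  # constant from the simple periodic-loss model
    return (mss_bytes * 8 / rtt_seconds) * c / math.sqrt(loss) / 1e6


MSS = 1360   # matches the adjust-mss value used on the tunnels
RTT = 0.05   # hypothetical 50 ms round-trip time

for loss in (0.0001, 0.001, 0.01):
    print(f"loss {loss:.2%}: roughly {mathis_mbps(MSS, RTT, loss):.1f} Mbit/s")
```

The point is the shape of the curve: throughput falls with the square root of the loss rate, so even a small amount of loss on the path can cap a single flow well below the link rate.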

Given your UDP transfer rate stats, I'm wondering whether you might need a shaper. It seems counterintuitive, but by slowing the transmission rate from your router, the effective transfer rate might increase.

Regarding the lack of information, to start with: what are the physical interface bandwidths at your two sites? Does either site's physical Internet connection have an ISP bandwidth cap, in and/or out? What's the usual latency between your two routers?

nec82 · ‎12-12-2022

Hello folks! The solution is to change ISP :). I don't know why, but the old ISP somehow reduces the throughput of IPsec traffic.

Joseph W. Doherty · ‎12-12-2022

Yea, changing ISP (and/or SP) can sometimes do wonders. (Laugh - explicitly showing where their network is not delivering as "promised" sometimes does wonders too.)

An ISP can cause issues, even when they (almost always) say it's not them. (Don't misunderstand, most of the time it's not them, but not always.)

Because of possible ISP issues, I earlier wrote "Do either of your site's physical Internet connections have a ISP bandwidth cap, in and/or out?"

This wasn't meant to imply an actual issue with your ISP yet, just an initial item to open the door to considering the ISP aspect.

langhel18 · ‎05-21-2023

We were about to go the route of switching ISPs as well, but after our 3rd ticket with the carrier I finally got a good technician who worked with us and found the following:

https://www.lumen.com/help/en-us/network/resolving-throughput-issues.html

If your ISP uses a separate carrier for the last mile, there is a chance this could be happening. For us, it was causing packet reordering due to a hashing issue between provider protocols. Very strange indeed, but finding this answer helped my sanity!

Cisco TAC pretty much confirmed the reordering with some captures and ruled out our equipment.
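For readers wondering why reordering alone hurts a single TCP flow: a receiver sends a duplicate ACK for every segment that arrives ahead of a missing one, and three duplicate ACKs typically make the sender fast-retransmit and shrink its congestion window, even though nothing was actually lost. Below is a toy Python model of that behavior. It is purely illustrative: it uses segment numbers instead of byte sequence numbers, ignores SACK and delayed ACKs, and the function name is made up.

```python
FAST_RETRANSMIT_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


def longest_dup_ack_run(arrival_order):
    """Longest run of duplicate ACKs a cumulative-ACK receiver would emit."""
    expected = 0        # next in-order segment the receiver is waiting for
    buffered = set()    # segments that arrived ahead of the missing one
    run = longest = 0
    for seq in arrival_order:
        if seq == expected:
            expected += 1
            # Drain any buffered segments that are now in order.
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            run = 0
        elif seq > expected:
            buffered.add(seq)
            run += 1    # each early segment produces one duplicate ACK
            longest = max(longest, run)
        # seq < expected: stale duplicate, ignored in this toy model
    return longest


for order in ([0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [1, 2, 3, 0, 4]):
    dups = longest_dup_ack_run(order)
    hit = dups >= FAST_RETRANSMIT_THRESHOLD
    print(order, "-> duplicate ACKs:", dups, "| spurious fast retransmit:", hit)
```

In this model a segment only has to be overtaken by three later ones to look like a loss, which is why reordering in a carrier network can throttle one TCP session while leaving the aggregate of many sessions less affected.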