Confusion with the TCP Window Scaling Option

Hi, I have a confusion. I'd appreciate it if you could clarify.

Suppose I set my TCP window size to 17 MB (I changed the default TCP value of 64 KB) on both my server ends, using TCP window scaling. Now I want to know:

1> To achieve full throughput, does TCP window scaling need to be done only on the endpoints, or do all intermediate switches and routers also have to be set to this scaled value of 17 MB?

2> Is the TCP window size per connection, or shared? Meaning, if a first client connects, gets a window size of 17 MB, and starts transferring data at full rate, and another client connects at the same time on the same TCP port, what window size will it get? Will it get the leftover window size of the first connection, or will another fresh 17 MB window be allocated to the new connection, with the transfer done on the best available bandwidth?

3> Suppose my server has three 1G network cards; should I set the window value to 17 MB, or 17 MB x 3?

4> Is the window size applicable per connection, per port, or per interface?


7 Replies

Peter Paluch
Cisco Employee

Hello,

1> To achieve full throughput, does TCP window scaling need to be done only on the endpoints, or do all intermediate switches and routers also have to be set to this scaled value of 17 MB?

The window scaling has to be configured only on the endpoints. Intermediate devices do not participate in the TCP sessions they carry.
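To illustrate the endpoint side of this, here is a minimal sketch of how a host application typically influences its advertised window: by requesting a larger socket receive buffer. This is an illustration, not a recipe for any particular OS; the kernel may clamp or adjust the granted size (e.g. by `net.core.rmem_max` on Linux), and window scaling itself is negotiated automatically during the TCP handshake when the buffer exceeds 64 KB.

```python
import socket

# Sketch: request a large receive buffer on an endpoint socket. The OS
# derives the advertised (scaled) TCP window from this buffer; the
# kernel may clamp the request, so read back what was actually granted.
REQUESTED_RWIN = 17 * 1024 * 1024  # 17 MB, as in the question

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED_RWIN)

# The value granted can differ from the value requested (Linux, for
# example, doubles the request for bookkeeping and caps it).
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {REQUESTED_RWIN} bytes, granted {granted} bytes")
sock.close()
```

Nothing has to be configured on the switches and routers in between for this negotiation to work.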

2> Is the TCP window size per connection, or shared?

This is a great question. To be honest, I believe this depends on the implementation of the TCP/IP driver in the particular operating system. Technically, the TCP receive window is the size of the memory buffer in the operating system that is used to store and reassemble TCP segments. Whether this buffer is shared or unique per connection is up to the implementor of the operating system.
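As a quick way to probe this on your own OS (a sketch only; as noted above, the behavior is implementation-dependent), you can ask for the same buffer size on two separate sockets and compare what each one is granted:

```python
import socket

# Sketch: on common OSes each socket gets its own receive buffer, so
# two sockets granted the same size suggests per-connection allocation
# rather than a shared pool handing out "leftovers".
def granted_rcvbuf(request):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, request)
    granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    s.close()
    return granted

a = granted_rcvbuf(4 * 1024 * 1024)
b = granted_rcvbuf(4 * 1024 * 1024)
print(a, b)  # equal grants hint at per-socket buffers on this host
```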

3> Suppose my server has three 1G network cards; should I set the window value to 17 MB, or 17 MB x 3?

My choice would be 17M.

4> Is the window size applicable per connection, per port, or per interface?

Same as 2>.

I hope Joe Doherty joins this thread. Joe, I would love to hear your opinion on this!

Best regards,

Peter

Joseph W. Doherty
Hall of Fame

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

1> To achieve full throughput, does TCP window scaling need to be done only on the endpoints, or do all intermediate switches and routers also have to be set to this scaled value of 17 MB?

Well first, from your diagram, if RTT is 30 ms and bandwidth is gig, you only need a receive buffer of 3,750,000 bytes, not 17 MB.  (This is the bandwidth-delay product, or BDP; the optimal value.)

Assuming your end-to-end bandwidth is gig, intermediate switches would need no buffer resources.  (gig in, gig out)

If there's a bandwidth mismatch, then you need to allow for buffering at the bottleneck.  For example, say your two servers have 10g links to the routers, and the routers only have the gig link.  The servers, then, can send at 10x the rate the routers can send.  So you need to allow for the biggest burst and the speed difference.  If RWIN allows up to 3.75 MB in flight, and if TCP slow start doubles its transmission window every successful transmission-window round trip, the biggest burst should be 1.875 MB.  As the router drains the burst at gig speed while it arrives at 10g, the first-hop router should allow for about 90% of the biggest burst, or about 1.7 MB.
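The arithmetic in the two paragraphs above can be written out as a short worked example (same assumptions: 1 Gb/s end-to-end, 30 ms RTT, a 10G-to-1G bottleneck at the first hop):

```python
# Worked example of the sizing above.
bandwidth_bps = 1_000_000_000   # 1 Gb/s end-to-end path
rtt_s = 0.030                   # 30 ms round-trip time

# Bandwidth-delay product: the optimal receive window in bytes.
bdp_bytes = int(bandwidth_bps / 8 * rtt_s)
print(bdp_bytes)                # 3,750,000 bytes, not 17 MB

# Slow start doubles the window each round trip, so the largest
# single burst is about half the maximum in-flight data.
biggest_burst = bdp_bytes / 2   # 1,875,000 bytes

# While the burst arrives at 10g, the gig egress drains about 10%
# of it, so the first-hop router needs roughly 90% of the burst.
buffer_needed = biggest_burst * 0.9
print(int(buffer_needed))       # 1,687,500 bytes, i.e. about 1.7 MB
```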

Regarding questions 2, 3 and 4, I believe most hosts offer the RWIN allocation to each TCP session.  Newer TCP implementations will hopefully back RWIN with "sparse" virtual memory.

What is the normal buffering value of routers?


What is the normal buffering value of routers?

Believe it varies based on platform and interface.  Many of the older, smaller router platforms often defaulted to 40 packets per interface.

Hi Joe

Thanks for your replies. How do we change the default buffering value? The ip tcp window-size command, or some other way?

BR

Nandi

Taking your example, if I need to allocate 1.7 MB of buffer size, what should I do? I mean, what additional configs are required to achieve this value?


The ip tcp window-size command, on a Cisco device, sets the TCP RWIN for the device when it's acting as a TCP host.  The newer IOSs also support scaled windows (i.e. >64 KB).  However, that's not the command for adjusting interface egress buffering.  (NB: increasing this value from its default may increase the rate of TCP downloads to the device, or it may increase the BGP routing information transfer rate.)

Egress buffering is interface and feature set dependent.  Usually there's some command to set the maximum number of packets.  If you "know" what your expected packet size will be, you divide your desired buffer size by the packet size.  E.g. 1.7 MB / 1500 = about 1,134 packets.
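As a sketch of what that might look like in IOS (untested, and hedged: command availability, queue limits, and defaults vary by platform and feature set, and the interface name here is hypothetical):

```
! Sketch only -- verify against your platform's documentation.
!
! Router's own TCP receive window, for sessions where the router
! itself is the TCP host (e.g. BGP). This does NOT change interface
! egress buffering.
ip tcp window-size 1700000
!
! Egress queue depth is set in packets: 1.7 MB / 1500 B = ~1,134.
! GigabitEthernet0/0 is a hypothetical interface name.
interface GigabitEthernet0/0
 hold-queue 1134 out
```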

When working with very large BDPs, you might find the Cisco device won't support the optimal buffer depth.  One workaround, if your application supports it, is to use multiple TCP flows whose start times are staggered.  This is because you only need the maximum egress buffer during TCP slow start.  Once a flow has its maximum RWIN "in flight" it self-clocks, i.e. it will only send packets as it receives ACKs.

The "self-clocking" will also allow a TCP flow to increase its transmission rate, without requiring huge egress buffers, while in congestion-avoidance mode.  Unfortunately, when working with a large BDP due to latency, congestion avoidance increases the transmission rate very slowly.