TCP reassembly queue overflow?

emphillips00 · 09-18-2007

Hi all,

I am getting the following message about once every 30 seconds:

09-18-2007 14:12:03 Syslog.Warning 10.2.3.254 145322: Sep 18 19:12:39.339: %FW-4-TCP_OoO_SEG: Dropping TCP Segment: seq:357141655 1500 bytes is out-of-order; expected seq:357116835. Reason: TCP reassembly queue overflow - session 10.1.4.4:52255 to 64.15.119.173:80

It is always from a different source and to a different destination.
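To see how far ahead of the expected sequence number each dropped segment is, the message can be pulled apart with a short script. This is only a sketch in Python: the regular expression is written against the single sample line above, the function name `parse_ooo` is made up for illustration, and other IOS releases may format the message differently.

```python
import re

# Sample message copied from the syslog line above (timestamp prefix trimmed).
LINE = ("%FW-4-TCP_OoO_SEG: Dropping TCP Segment: seq:357141655 1500 bytes "
        "is out-of-order; expected seq:357116835. Reason: TCP reassembly "
        "queue overflow - session 10.1.4.4:52255 to 64.15.119.173:80")

# Pattern derived from this one sample only (an assumption, not a Cisco spec).
PATTERN = re.compile(
    r"seq:(?P<seq>\d+) (?P<length>\d+) bytes is out-of-order; "
    r"expected seq:(?P<expected>\d+)\. Reason: (?P<reason>.+?) - "
    r"session (?P<src>\S+) to (?P<dst>\S+)"
)

def parse_ooo(line):
    """Return the fields of one TCP_OoO_SEG message, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    seq = int(m.group("seq"))
    expected = int(m.group("expected"))
    return {
        "seq": seq,
        "expected": expected,
        "length": int(m.group("length")),
        # TCP sequence numbers are 32-bit, so take the difference modulo 2**32.
        "gap": (seq - expected) % 2**32,
        "reason": m.group("reason"),
        "src": m.group("src"),
        "dst": m.group("dst"),
    }

if __name__ == "__main__":
    print(parse_ooo(LINE))
```

For the sample line the gap works out to 24820 bytes, so the dropped segment was well ahead of the data the router was still waiting for. Counting the `src` and `dst` values over a day of logs would also confirm whether the sessions really are spread across many hosts.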

Googling that error provided only one reply, and that was to a Chinese website.

What does TCP reassembly queue overflow mean? I am assuming it is referring to resequencing out of sequence TCP packets. Do you all see this too on your Internet-facing routers? And how do I stop it from filling up my syslogs?

CPU usage is at ~10%, so it is not a CPU problem. The router is a 2821. The connection that is reporting the drops is a multilink PPP with two T1s in it. Here is a display from the past month of a "show int multi 1"

Input queue: 1/75/359/0 (size/max/drops/flushes)

I am using fifo, not any sort of special queuing on this port.

Any thoughts?

Thanks,

Eric

tdrais · ‎09-18-2007

It basically means that a later TCP segment arrived before an earlier one. Since the router must hold the later segment until the missing one arrives so it can put them back in order, it uses memory. This message means that the router hit the limit on how much it could hold and had to discard the segment.
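To make the idea concrete, here is a rough Python model of a bounded out-of-order queue. It is not how IOS implements it: the class name, the `max_segments` limit, and the return strings are all invented for illustration, and sequence number wraparound is ignored.

```python
class ReassemblyQueue:
    """Toy model of a per-session out-of-order queue with a fixed size."""

    def __init__(self, expected_seq, max_segments):
        self.expected = expected_seq      # next in-order sequence number
        self.max_segments = max_segments  # hypothetical queue limit
        self.held = {}                    # seq -> length of held segments

    def receive(self, seq, length):
        if seq == self.expected:
            # In-order segment: pass it on, then release any held
            # segments that have now become contiguous.
            self.expected += length
            while self.expected in self.held:
                self.expected += self.held.pop(self.expected)
            return "delivered"
        if seq < self.expected or seq in self.held:
            return "duplicate"
        if len(self.held) >= self.max_segments:
            # No room left to hold another out-of-order segment.
            return "dropped: queue overflow"
        self.held[seq] = length
        return "held"


if __name__ == "__main__":
    q = ReassemblyQueue(expected_seq=1000, max_segments=2)
    for seq in (1200, 1100, 1300, 1000):
        print(seq, q.receive(seq, 100))
```

With a limit of two, the segments at 1200 and 1100 are held, the one at 1300 is dropped because the queue is full, and the late arrival of 1000 releases everything that was held. The more segments that get ahead of a missing one, the more likely the queue fills before the gap closes.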

You may be able to adjust this memory, but you are better off trying to fix the cause.

Since this is an ISP connection, it is a little harder to troubleshoot than if you controlled both ends, but a good ISP will work with you.

First, verify that both lines are clean: you should be getting no errors on the underlying T1 lines. You will need to check both ends of both lines, clear the counters, and verify that the error counters do not change when you receive one of these messages.

Next, verify that the latency on both lines is the same. This is hard to do once you are up and running. I normally do this at install by running the lines as separate point-to-point connections and testing them. With your ISP's help you could take a line out of the multilink bundle, redefine it as a single point-to-point link to test, and then put it back. It does not take a lot of difference in latency to cause the problem you are having. Unfortunately, if you find this issue it is almost impossible to fix, since it is related to line provisioning. We always request that both lines be provisioned identically when we plan to run MPPP. Even then I have still gotten lines with huge differences.

The lazy way to fix this is to work with your ISP and have them turn off the packet fragmentation. It will then more or less load balance by packet, the way Cisco CEF per-packet load sharing does. The downside is that more packets will be delivered out of order to the client machines.

Most machines tolerate this better than a router does, but it will depend on the application.
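To show why a small latency difference matters once packets alternate between the two lines, here is a simplified Python sketch. It assumes strict round-robin across the links, a fixed send interval, and constant per-link latency. The function name and all the numbers are made up for illustration and are not measurements from any real T1 pair.

```python
def count_reordered(num_packets, send_interval_ms, latency_ms):
    """Count packets that arrive after a later-sent packet.

    Packets are sent every send_interval_ms and assigned to links
    round-robin; latency_ms holds one fixed latency per link.
    """
    arrivals = []
    for i in range(num_packets):
        link = i % len(latency_ms)
        arrivals.append((i * send_interval_ms + latency_ms[link], i))
    arrivals.sort()  # order in which the far end sees the packets

    reordered = 0
    highest_seen = -1
    for _, index in arrivals:
        if index < highest_seen:
            reordered += 1   # a later-sent packet already arrived
        else:
            highest_seen = index
    return reordered


if __name__ == "__main__":
    print(count_reordered(8, 1, (10, 10)))  # matched lines
    print(count_reordered(8, 1, (10, 14)))  # 4 ms difference
```

With matched latencies nothing is reordered. With a 4 ms difference and packets sent 1 ms apart, three of the eight packets arrive behind packets that were sent after them, which is the kind of traffic that fills a reassembly queue.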