CSS clearing the flows before the communication ends?

sajithvmg · 08-17-2005

We use four Oracle Application Servers, which serve JSP/JS pages. They are currently load balanced, and stickiness is maintained using arrowpoint-cookie.

We get intermittent '408 -' errors in the server access log. Analysing with tcpdump, we found that these '408 -' errors appear exactly 5 minutes after the GET request reaches the server. This may be a simple GET request for a .js or .svg file. 5 minutes is the Timeout period [not the keepalive timeout] set in the httpd.conf of the Application server.
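As a rough sketch of the timing check described above: given the time a GET was seen in the capture and the time the matching 408 was logged, test whether the gap equals the 5-minute Timeout. The function name, the 2-second tolerance, and the sample timestamps are assumptions for illustration and are not taken from the real trace.

```python
from datetime import datetime, timedelta

# Timeout directive value from httpd.conf (5 minutes, as stated in the post).
SERVER_TIMEOUT = timedelta(seconds=300)
# Assumed slack for log granularity and clock differences (hypothetical value).
TOLERANCE = timedelta(seconds=2)


def matches_timeout(get_seen: datetime, logged_408: datetime) -> bool:
    """Return True when the 408 was logged about one Timeout after the GET arrived."""
    gap = logged_408 - get_seen
    return abs(gap - SERVER_TIMEOUT) <= TOLERANCE


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    get_seen = datetime(2005, 8, 17, 10, 0, 0)
    logged_408 = datetime(2005, 8, 17, 10, 5, 0)
    print(matches_timeout(get_seen, logged_408))
```

A gap close to 15 seconds instead would point at the keepalive timeout rather than the Timeout setting.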

Is it possible that the flow is deleted by the CSS [due to its garbage collection] while the server is still using the flow to send the data back?

Keepalive timeouts on our web/app servers are currently set to 15 seconds.

These 408 errors are currently causing intermittent timeout/hang issues for the users.

An interesting point to note is that these 408 errors initially appeared on only one of the servers, serverA. Thinking that the server had an issue, we shut it down, and after almost 20 days we found that the 408 errors had shifted to a different server, serverB.

We brought serverA back up after a few days, but the 408 errors stayed on serverB.

We compared just about everything across these servers, including TCP parameters, network statistics, config files, and the Application server installation, but could not find anything conclusive.

We were able to reproduce the 408 error by pulling the network cable at the PC end just after the GET request had reached the app server, which led us to think that the CSS must be deleting the flows before they are formally closed.

Since these errors are happening on only one of the servers, it is a bit confusing.

Please give us your thoughts.

Gilles Dufour · 08-17-2005

When a CSS deletes a flow, there is no communication possible between client and server.

The client should even get a RESET if it tries to send requests to the server.

The 408 is generated by the server, so it means the flow is still there.

Did you see this 408 error with only one server?

If this is the case, I don't see how it could be a load-balancing issue.

Can we see the sniffer trace and your config?

Thanks,

Gilles.