08-07-2006 03:23 PM
Hi All,
I recently moved the backend of the CSS and associated servers to a new vlan. 2 days prior to that, there was a database upgrade for the same application.
Ever since the vlan change, customers are complaining that they are forced out of the application automatically.
SSL termination is done on the CSS, and I am using advanced-balance sticky-srcip for stickiness. Packet captures show that users stick to one server, then suddenly perform an SSL handshake with the second server, after which I get a reset from the client. So it looks like stickiness based on source IP is not working.
I do not know why it worked for 7 months and then stopped working when I made the VLAN change. When we bypassed the CSS so that users accessed the server directly, they were not forced out. I have the following config:
content www.sm.com
vip address x.x.26.11
add service smweb02-80
add service smweb01-80
protocol tcp
advanced-balance sticky-srcip
port 81
active
content www.sm.com-decrypt
vip address x.x.26.11
add service redirect-sm
add service smweb02-80
add service smweb01-80
advanced-balance sticky-srcip
protocol tcp
port 81
url "/SM_Web_Net*"
active
The packet capture shows that the application is using SSLv3. Can I simply replace sticky-srcip with ssl in the content rules above?
thanks,
Meena
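For readers unfamiliar with source-IP stickiness, the idea behind advanced-balance sticky-srcip can be pictured with a minimal Python sketch. This is a conceptual model only, not the CSS implementation: the class name StickySrcIp, the round-robin choice for new clients, and the 192.0.2.x client addresses are all assumptions made for illustration. Only the service names come from the config above.

```python
from itertools import cycle


class StickySrcIp:
    """Toy model: remember which service each client source IP was sent to."""

    def __init__(self, services):
        # Assumption: new clients are assigned round-robin.
        self._next_service = cycle(services)
        self._table = {}  # source IP -> service name

    def pick(self, src_ip):
        # A known source IP always returns to the same service.
        if src_ip not in self._table:
            self._table[src_ip] = next(self._next_service)
        return self._table[src_ip]


lb = StickySrcIp(["smweb01-80", "smweb02-80"])
first = lb.pick("192.0.2.10")    # hypothetical client address
again = lb.pick("192.0.2.10")    # same client, same service
other = lb.pick("192.0.2.20")    # a different client may land elsewhere
```

In this model a client can only change servers if its source IP changes (for example behind a proxy farm) or if its traffic never passes through the table at all.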
08-08-2006 02:44 AM
We'll need a complete trace showing the issue.
Are the clients who are complaining going through a proxy?
You can't use SSL stickiness because the backend is not doing SSL, only plain HTTP.
Moreover, I'm 100% sure that sticky-srcip works. There must be something else breaking.
Most of the time, issues arise when sessions stay idle.
So I would suggest blindly increasing the idle timeout with the command 'flow-timeout-multiplier 30'. Let me know if it improves things.
If not, you will need to capture a sniffer trace on the front end and backend showing the issue.
Gilles.
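To illustrate why an idle session can look like a stickiness failure, here is a rough Python sketch of a flow table that forgets idle flows. It is a generic model, not a description of CSS internals: the FlowTable class, the "forward"/"reset" outcomes, and the 16-second scaling unit are assumptions for illustration, so check the CSS documentation for the real meaning of the multiplier.

```python
from dataclasses import dataclass, field

# Assumption: the multiplier is scaled by a fixed unit to get the idle
# timeout in seconds. 16 is used here purely for illustration.
ASSUMED_UNIT_SECONDS = 16


@dataclass
class FlowTable:
    """Toy model of a load balancer that drops state for idle flows."""

    multiplier: int
    last_seen: dict = field(default_factory=dict)  # flow key -> last timestamp

    @property
    def idle_timeout(self) -> int:
        return self.multiplier * ASSUMED_UNIT_SECONDS

    def packet(self, flow, now, syn=False):
        """Return 'forward' or 'reset' for a packet seen at time `now`."""
        seen = self.last_seen.get(flow)
        expired = seen is not None and now - seen > self.idle_timeout
        if syn:
            # A new connection always creates (or recreates) the flow.
            self.last_seen[flow] = now
            return "forward"
        if seen is None or expired:
            # Mid-connection packet with no live flow state.
            self.last_seen.pop(flow, None)
            return "reset"
        self.last_seen[flow] = now
        return "forward"


flow = ("192.0.2.10", 40000, "x.x.26.11", 81)  # hypothetical client
table = FlowTable(multiplier=1)
opened = table.packet(flow, now=0, syn=True)
active = table.packet(flow, now=10)   # within the idle timeout
stale = table.packet(flow, now=60)    # idle longer than the timeout
```

In this model, raising the multiplier only lengthens how long an idle connection survives; it does not change which server a client is sent to.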
08-08-2006 07:54 AM
Gilles,
I added "flow-timeout-multiplier 30" to both of my content rules above, and now customers say their sessions hang after about 5 minutes, so they often have to kill the session and start a new one:
9:56 am logged in
10:01 am Application "hung" - "Error on Page" and "javascript:voidProcs_Close()"
10:05 am logged in
10:09 am timed out
10:10 am logged in
10:12 am timed out
10:13 am logged in
10:18 am timed out
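As a quick check on the timeline above, the following Python sketch computes how long each session lasted. The event list is transcribed from the log; the function name session_minutes and the shortened event labels are just for illustration.

```python
from datetime import datetime

# Events transcribed from the timeline above (all times are a.m.).
events = [
    ("9:56", "logged in"), ("10:01", "hung"),
    ("10:05", "logged in"), ("10:09", "timed out"),
    ("10:10", "logged in"), ("10:12", "timed out"),
    ("10:13", "logged in"), ("10:18", "timed out"),
]


def session_minutes(events):
    """Pair each login with the next failure and return durations in minutes."""
    durations = []
    start = None
    for stamp, what in events:
        t = datetime.strptime(stamp, "%H:%M")
        if what == "logged in":
            start = t
        elif start is not None:
            durations.append(int((t - start).total_seconds() // 60))
            start = None
    return durations


durations = session_minutes(events)
```

The sessions last between 2 and 5 minutes, so the failures do not line up with a single fixed timeout.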
08-08-2006 05:45 PM
One of the servers' database instances had been switched to the dev environment, which caused the problem.
The problem has gone away now, and I do not know whether the flow timeout or the application caused it.
I have a feeling that the database caused it to begin with.
08-09-2006 02:14 AM
What about stickiness?
Is it working now?
Gilles.
08-09-2006 04:19 AM
Looks like it is working. I ran a packet capture for more than an hour and customers never had a problem; after 2 retries I decided that everything is back to normal.
Like you said, I think stickiness was never the problem. It was just the application DB upgrade combined with the dev instance that caused the whole thing. Because the CSS and servers had been moved to a new VLAN, it was initially assumed to be a CSS problem.
Thanks for your help.
Meena
08-10-2006 07:48 PM
The problem appeared again today.
Packet captures on both sides of the CSS showed that the client sticks to server1 (x.x.235.11) and then initiates a connection to server2 (x.x.235.12), which I see only on the backend and not on the front end. It looks like the client is bypassing the CSS.
The client-side packet capture did not show any traffic bypassing the CSS, but users were still forced out. However, the 2 packet captures showed additional traffic on the backend that I could not match up on the front end.
I am not sure whether source group NAT would help here, since it is the client that is bypassing the CSS and not the server.
I can't explain the reason for the asymmetric routing, though.
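The comparison of the two captures can be sketched in Python as follows. This assumes the connections have already been extracted from the captures into simple records, and that the client source IP and port are preserved across the CSS (which may not hold with SSL termination or source NAT). The Conn type, the function name, and the 192.0.2.10 client address are hypothetical; only the VIP and server addresses come from this thread.

```python
from typing import NamedTuple


class Conn(NamedTuple):
    client_ip: str
    client_port: int
    dst_ip: str  # VIP on the front end, real server on the backend


def unmatched_backend(frontend, backend):
    """Return backend connections with no front-end counterpart."""
    # Assumption: client IP and source port survive the trip through the CSS.
    seen = {(c.client_ip, c.client_port) for c in frontend}
    return [c for c in backend if (c.client_ip, c.client_port) not in seen]


frontend = [Conn("192.0.2.10", 40000, "x.x.26.11")]
backend = [
    Conn("192.0.2.10", 40000, "x.x.235.11"),  # matches the front-end flow
    Conn("192.0.2.10", 40001, "x.x.235.12"),  # seen only on the backend
]
suspects = unmatched_backend(frontend, backend)
```

Any connection left in suspects reached a server without a matching flow on the front end, which is the pattern described above.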
08-25-2006 11:48 AM
Looks like we may have found the problem.
When the IP addresses on the servers were changed, the dual NICs on the servers were both made active and configured for load balancing. Before that, only one NIC was active at any time. I am pretty sure that dual NICs with load balancing are causing the issue with the upstream CSS.
The servers were moved out from behind the CSS so that customers won't complain, and I am in the process of deploying a couple of test servers behind the CSS to test this theory. Just curious to see if anyone else has experienced this before.