Re: SSL stickiness with src-ip does not work

mchockalingam · ‎08-07-2006

Hi All,

I recently moved the backend of the CSS and associated servers to a new vlan. 2 days prior to that, there was a database upgrade for the same application.

Ever since the vlan change, customers are complaining that they are forced out of the application automatically.

SSL termination is done on the CSS, and I am using advanced-balance sticky-srcip for stickiness. Packet captures show that users stick to one server, then suddenly perform an SSL handshake with the second server, after which I get a reset from the client. So it looks like stickiness based on source IP is not working.

I do not know why it worked for 7 months and stopped working when I made the vlan change. When we bypassed the CSS and users accessed the server directly, they were not forced out. I have the following config:

content www.sm.com
  vip address x.x.26.11
  add service smweb02-80
  add service smweb01-80
  protocol tcp
  advanced-balance sticky-srcip
  port 81
  active

content www.sm.com-decrypt
  vip address x.x.26.11
  add service redirect-sm
  add service smweb02-80
  add service smweb01-80
  advanced-balance sticky-srcip
  protocol tcp
  port 81
  url "/SM_Web_Net*"
  active

The packet capture shows that the application is using SSLv3. Can I simply replace sticky-srcip with ssl in the content rules above?

thanks,

Meena

Gilles Dufour · ‎08-08-2006

We'll need a complete trace showing the issue.

Are the clients who complain about the issue going through a proxy?

You can't use SSL stickiness because on the backend you are not doing SSL but plain HTTP.

Moreover, I'm 100% sure that sticky-srcip works. There must be something else breaking.
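To make the proxy question concrete, here is a minimal sketch of what source-IP stickiness does in principle. It is a toy model, not the CSS implementation: the class name, the round-robin choice for new clients, and the client addresses are all assumptions for illustration. It shows that a client with a stable source IP always lands on the same service, while a client whose requests leave through several proxy addresses can be spread across services.

```python
from typing import Dict, List


class SrcIpSticky:
    """Toy model of source-IP stickiness (hypothetical, not the CSS code).

    The first request from an IP is assigned a server round-robin;
    later requests from the same IP reuse that server.
    """

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.table: Dict[str, str] = {}  # source IP -> chosen server
        self._next = 0

    def pick(self, client_ip: str) -> str:
        if client_ip not in self.table:
            self.table[client_ip] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.table[client_ip]


lb = SrcIpSticky(["smweb01-80", "smweb02-80"])

# One user with a stable source address: always the same server.
direct = [lb.pick("198.51.100.5") for _ in range(3)]

# One user behind a proxy farm that alternates source addresses
# (addresses are made up): requests can reach different servers.
proxied = [lb.pick(ip) for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.1")]

print(direct)
print(proxied)
```

If the affected users turn out to share this pattern, the sticky table is behaving as designed and the source address is what changes.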

Most of the time, issues arise when sessions stay idle.

So I would suggest blindly increasing the idle timeout with the command 'flow-timeout-multiplier 30'. Let me know if it improves things.
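For a rough idea of what that value means in wall-clock time, the sketch below assumes the multiplier is applied in 16-second units, as the CSS documentation describes it; that unit is not stated in this thread, so check it against the documentation for your software version. The function name is hypothetical.

```python
# Assumption: the CSS multiplies the configured value by 16 seconds.
SECONDS_PER_UNIT = 16


def flow_idle_timeout_seconds(multiplier: int) -> int:
    """Idle time before a flow is torn down, under the 16-second assumption."""
    return multiplier * SECONDS_PER_UNIT


timeout = flow_idle_timeout_seconds(30)
print(f"{timeout} s = {timeout // 60} min")  # prints "480 s = 8 min"
```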

If not, you will need a sniffer trace front-end and backend showing the issue.

Gilles.

mchockalingam · ‎08-08-2006

Gilles,

I added "flow-timeout-multiplier 30" to both of my content rules above, and now customers say their session hangs after 5 minutes and they often have to kill it and start a new one:

9:56 am logged in

10:01 am Application "hung" - "Error on Page" and "javascript:voidProcs_Close()"

10:05 am logged in

10:09 am timed out

10:10 am logged in

10:12 am timed out

10:13 am logged in

10:18 am timed out

mchockalingam · ‎08-08-2006

The database instance on one of the servers had been switched to the dev environment, which caused the problem.

The problem has gone away now, and I do not know whether the flow timeout or the application caused it.

I have a feeling that the database caused it to begin with.

Gilles Dufour · ‎08-09-2006

What about stickiness?

Is it working now?

Gilles.

mchockalingam · ‎08-09-2006

Looks like it is working. I ran a packet capture for more than an hour and customers never had a problem; after 2 retries I decided that everything was back to normal.

Like you said, I think stickiness was never the problem. The application DB upgrade in combination with the dev instance caused the whole thing. Because the CSS and servers had just been moved to a new vlan, it was initially thought to be a CSS problem.

Thanks for your help.

Meena

mchockalingam · ‎08-10-2006

The problem appeared again today.

Packet captures on both sides of the CSS showed that the client sticks to server1 (x.x.235.11) and then initiates a connection to server2 (x.x.235.12), which I see only on the backend and not on the front end. It looks like the client is bypassing the CSS.

The client-side packet capture did not show any traffic bypassing the CSS, but the users were still forced out. However, 2 packet captures showed more traffic on the backend that I could not match up on the front end.
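One way to narrow this down is to summarize the backend capture per client and list the clients that reached more than one real server. The sketch below assumes the connection records have already been exported from the capture as (client IP, server IP) pairs and that the client source address is preserved on the backend (no source group NAT). The function names and all addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Flow = Tuple[str, str]  # (client_ip, server_ip), one entry per backend connection


def servers_per_client(backend_flows: Iterable[Flow]) -> Dict[str, Set[str]]:
    """Group the real servers seen for each client IP."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, server_ip in backend_flows:
        seen[client_ip].add(server_ip)
    return dict(seen)


def sticky_violations(backend_flows: Iterable[Flow]) -> Dict[str, Set[str]]:
    """Clients that reached more than one real server."""
    grouped = servers_per_client(backend_flows)
    return {c: s for c, s in grouped.items() if len(s) > 1}


# Made-up sample data using documentation address ranges.
flows = [
    ("198.51.100.7", "192.0.2.11"),
    ("198.51.100.7", "192.0.2.11"),
    ("198.51.100.7", "192.0.2.12"),
    ("198.51.100.9", "192.0.2.12"),
]
print(sticky_violations(flows))
```

Any client listed here is worth following in the front-end capture at the same timestamps to see whether the CSS ever forwarded that connection.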

I am not sure if source group NAT would work here since the client is bypassing the CSS and not the server.

I can't explain the reason for the asymmetric routing, though.

mchockalingam · ‎08-25-2006

Looks like we may have found the problem.

When the IP addresses on the servers were changed, the dual NICs on the servers were both made active and configured for load balancing. Before that, only one NIC was active at any time. I am pretty sure that dual NICs with load balancing are causing the issue with the upstream CSS.
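A quick check for this theory, assuming the teaming mode sends frames from each NIC's own MAC address (not confirmed here), is to look for server IPs that appear with more than one source MAC in a backend capture. The sketch takes (source MAC, source IP) pairs already exported from the capture; the function name and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ips_with_multiple_macs(frames: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return source IPs that were seen with more than one source MAC."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for src_mac, src_ip in frames:
        macs_by_ip[src_ip].add(src_mac.lower())  # normalize case before comparing
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up sample: the first server answers from two different MACs.
frames = [
    ("00:00:5e:00:53:01", "192.0.2.11"),
    ("00:00:5E:00:53:02", "192.0.2.11"),
    ("00:00:5e:00:53:10", "192.0.2.12"),
]
suspects = ips_with_multiple_macs(frames)
print(suspects)
```

An empty result does not rule the theory out, since some teaming modes present a single shared MAC.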

The servers were moved out from behind the CSS so that customers won't complain, and I am in the process of deploying a couple of test servers behind the CSS to test this theory. Just curious to see if anyone else has experienced this before.