
Newly Occurring CSS SSL Issue in Chrome, FF10, IE9 with L5 rules; 3 second delay, loss of L5 stickiness

joekislo
Level 1

We recently started suffering an issue with our CSS11501S-K9 units not performing URL stickiness on our SSL-wrapped L5 rules. I've spent dozens of man-hours working on the problem, and have quite a bit of information to report, including a solution. There is a high probability that anybody who uses SSL to an L5 rule on a CSS unit will be affected by this problem over the next few weeks/months as users update their browsers with new SSL patches.

We hadn't made any changes to our config in months, and eliminated hardware problems by testing a second unit. 

Here are the exact symptoms we saw:

  Browsers affected: Firefox 10, Chrome, IE9, others (and some earlier versions of IE depending on patch levels)

  Browsers not affected: Firefox 3.5, w3m 0.5.2, curl 7.19.7

  Impact 1: For SSL rules backed by L5 rules, the response to the first request would be delayed by 3 seconds. Further requests on the same TCP connection would not be delayed.

  Impact 2: L5 rules accessed via SSL would no longer perform any URL-based stickiness. Accessing the same rule without SSL would work fine.

I focused on the 3 second delay, since that was a new issue and was easier to debug than monitoring multiple servers to see if stickiness was broken.  This is what I found when a client tries to connect to an SSL rule that ultimately is routed to a L5 HTTP rule:

1. Client/CSS perform initial TLS handshake, crypto ciphers determined (nearly instantly)

2. Client sends HTTP 1.1 request for resource (nearly instantly)

3. 3 seconds of no traffic in or out of the CSS related to this request

4. CSS opens an HTTP connection to backend webserver, backend webserver responds (nearly instantly)

5. The CSS seems to route to the backend server using the balance method (round-robin) instead of the advanced-balance method (url)

6. Response is sent to the client with the resource (nearly instantly)

7. Future requests sent from the browser on the same TCP connection have no delay, but the advanced-balance continues to be ignored

The 3 seconds is quite an exact figure (within a few milliseconds) and appears to be entirely happening inside of the CSS unit itself, since it does not connect to the backend server until after the 3 seconds elapse.  3 seconds smelled like some sort of internal timeout set in the CSS unit after it gives up waiting for something.
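One way to spot a stall like this in a capture is to look for the largest idle gap between consecutive events. The sketch below is only an illustration: the `largest_gap` helper is hypothetical, and the millisecond timestamps are made-up stand-ins that mirror the sequence of steps listed above, not values from my actual trace.

```python
def largest_gap(events):
    """Return (gap_ms, label_before, label_after) for the longest idle period.

    `events` is a list of (timestamp_ms, label) tuples sorted by time.
    """
    best = (0, None, None)
    for (t0, before), (t1, after) in zip(events, events[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, before, after)
    return best


# Illustrative timeline only (assumed numbers, in milliseconds).
timeline = [
    (0, "TLS handshake complete"),
    (50, "client sends HTTP request"),
    (3050, "CSS opens connection to backend"),
    (3100, "response sent to client"),
]

gap_ms, before, after = largest_gap(timeline)
print(f"{gap_ms} ms idle between '{before}' and '{after}'")
```

With a real trace, the timestamps would come from the capture itself; a gap that lands on a round number like this is a hint that a timer is expiring rather than that something is slow.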

Looking at the packets from affected browsers, I discovered that the GET /foobar HTTP/1.1 request was being broken into two separate TLSv1 application messages: the first was 24 bytes and the second was 400 bytes. Decrypting these messages, I found the first message was:

G

and the second message was:

ET /foobar HTTP/1.1

This essentially splits the initial request the client is sending into two pieces. This confuses Wireshark so much that it doesn't decode this as an HTTP request, and just decodes it as "continuation or non-HTTP traffic".

On the working browsers I saw only one TLSv1 application message, decrypting it I saw:

GET /foobar HTTP/1.1

(obviously I'm simplifying the contents of the request, there were lots of headers and stuff)
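To make the difference concrete, here is a minimal sketch of the split seen in the trace and of why a parser that looks at one message at a time misses the URL. This is an illustration under assumptions, not the CSS's actual internals: the function names are hypothetical, and the `Host` header is a placeholder.

```python
METHODS = (b"GET", b"POST", b"HEAD")


def split_first_byte(plaintext):
    """Split a payload into a 1-byte message plus the remainder, as seen in the trace."""
    if len(plaintext) <= 1:
        return [plaintext]
    return [plaintext[:1], plaintext[1:]]


def url_from_single_message(message):
    """Naive parser: expects the whole request line inside one message."""
    request_line = message.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    if len(parts) == 3 and parts[0] in METHODS:
        return parts[1]
    return None


def url_from_buffered_messages(messages):
    """Buffering parser: joins messages until the request line is complete."""
    buffer = b""
    for message in messages:
        buffer += message
        if b"\r\n" in buffer:
            return url_from_single_message(buffer)
    return None


request = b"GET /foobar HTTP/1.1\r\nHost: example.com\r\n\r\n"
messages = split_first_byte(request)

print(messages[0])                                      # the lone first byte
print([url_from_single_message(m) for m in messages])   # neither message parses alone
print(url_from_buffered_messages(messages))             # the URL is recovered after joining
```

Neither "G" nor "ET /foobar HTTP/1.1" looks like a valid request line on its own, which would be consistent with a URL-based rule having nothing to match on until the pieces are reassembled.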

I am aware that the CSS can't handle L5 rules appropriately if requests get fragmented, so I suspected this was the problem. I pulled a packet trace from a few years ago and confirmed that at that time we never saw a request split across two TLSv1 application messages.

A number of openssl vulnerabilities were recently fixed: http://www.ubuntu.com/usn/usn-1357-1

and browsers may have been recently updated to fix some of these issues, changing the way they encode their traffic. 

Solution:

Our ssl config looked something like this:

ssl-proxy-list SSL_ACCEL

  ssl-server 10 vip address XX.XX.XX.XX

  ssl-server 10 rsakey XXXX

  ssl-server 10 cipher rsa-with-3des-ede-cbc-sha XX.XX.XX.XX 80

  ssl-server 10 cipher rsa-with-rc4-128-sha XX.XX.XX.XX 80

  ssl-server 10 cipher rsa-with-rc4-128-md5 XX.XX.XX.XX 80

  ssl-server 10 unclean-shutdown

  ssl-server 10 rsacert XXXXXX

Removing:

  ssl-server 10 cipher rsa-with-3des-ede-cbc-sha XX.XX.XX.XX 80

Solves the problem. After that's removed, the browsers will no longer fragment the first character of their request into a separate TLSv1 message. The 3 second delay goes away, and L5 stickiness is fixed. The "CBC" in the cipher name refers to Cipher Block Chaining (a great article here:

http://en.wikipedia.org/wiki/Cipher-block_chaining), and breaking the payload into multiple messages may have been an attempt to initialize the IV for encryption, although I'm really just guessing. I stopped researching once I verified this solution was acceptable.
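If you have several proxy lists to clean up, the same change can be scripted against a saved copy of the config. The sketch below is a hypothetical helper, not a Cisco tool: it assumes cipher lines contain the word `cipher` and that CBC suites carry `-cbc-` in their name, as in the config shown above, and it only edits text. Review the result before applying anything to a CSS.

```python
def drop_cbc_ciphers(config_text):
    """Return config_text without the cipher lines that name a CBC suite."""
    kept = []
    for line in config_text.splitlines():
        words = line.split()
        is_cipher_line = "cipher" in words
        uses_cbc = any("-cbc-" in word for word in words)
        if is_cipher_line and uses_cbc:
            continue  # skip e.g. rsa-with-3des-ede-cbc-sha
        kept.append(line)
    return "\n".join(kept)


# Same placeholder values as the config in the post.
original = """ssl-proxy-list SSL_ACCEL
  ssl-server 10 vip address XX.XX.XX.XX
  ssl-server 10 rsakey XXXX
  ssl-server 10 cipher rsa-with-3des-ede-cbc-sha XX.XX.XX.XX 80
  ssl-server 10 cipher rsa-with-rc4-128-sha XX.XX.XX.XX 80
  ssl-server 10 cipher rsa-with-rc4-128-md5 XX.XX.XX.XX 80
  ssl-server 10 unclean-shutdown
  ssl-server 10 rsacert XXXXXX"""

print(drop_cbc_ciphers(original))
```

The output keeps the two RC4 cipher lines and everything else, and drops only the 3DES CBC line.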

This issue became serious enough for us to notice first on Monday Feb 13th 2012. We believe a number of our large customers distributed workstation updates over the weekend.  The customers affected were using IE7, although my personal IE7 test workstation did not appear to be affected.  It's quite possible our customers were going through an SSL proxy.  I suspect as more people upgrade their browsers, this will become a more serious issue for CSS users, and I hope this saves somebody a huge headache and problems with their production environment.

-Joe

2 Replies

Daniel Arrondo Ostiz
Cisco Employee

Hi Joe,

That's a very good analysis you did.

As you already suspected, the issue comes from the TLS record fragmentation feature that was introduced in the latest browser versions to overcome an SSL vulnerability (http://www.kb.cert.org/vuls/id/864643). Unfortunately, similar issues are happening with multiple products.

For CSS, the bug tracking this issue is CSCtx68270. The development team is actively working on a fix for it, which should be available in the next couple of weeks (in an interim software release, so to get it you will have to go through TAC).

In the meantime, as a workaround, you can configure the CSS to use only RC4 ciphers (which is what you were suggesting also). These are not affected by the vulnerability, so browsers don't apply the record fragmentation when they are in use. This workaround has been tested by several customers already, and the results seem to be very positive.

Regards

Daniel

hostmaster
Level 1

Excellent post Joe. I just ran into the same issue. I initially thought the SSL proxy somehow prevented L5 rule matching from working, then I tried disabling rule persistence before finally finding this post via another one that talked about that persistence. In this post, you don't mention disabling rule persistence, but in

https://supportforums.cisco.com/thread/2128682 you do and tie it back to this post.

Did you find that disabling rule persistence was really necessary after removing that CBC cipher in the SSL proxy list?
