Newly Occurring CSS SSL Issue in Chrome, FF10, IE9 with L5 rules; 3 second delay, loss of L5 stickiness
We recently started suffering an issue with our CSS11501S-K9 units not performing URL stickiness on our SSL-wrapped L5 rules. I've spent dozens of man-hours working on the problem and have quite a bit of information to report, including a solution. There is a high probability that anybody who uses SSL in front of an L5 rule on a CSS unit will be affected by this problem over the next few weeks or months as users update their browsers with new SSL patches.
We hadn't made any changes to our config in months, and eliminated hardware problems by testing a second unit.
Here are the exact symptoms we saw:
Browsers affected: Firefox 10, Chrome, IE9, others (and some earlier versions of IE depending on patch levels)
Browsers not affected: Firefox 3.5, w3m 0.5.2, curl 7.19.7
Impact 1: For SSL rules backed by L5 rules, the response to the first request would be delayed by 3 seconds. Further requests on the same TCP connection would not be delayed.
Impact 2: L5 rules accessed via SSL would no longer perform any URL-based stickiness. Accessing the same rule directly, skipping SSL, would work fine.
I focused on the 3 second delay, since that was a new issue and was easier to debug than monitoring multiple servers to see if stickiness was broken. This is what I found when a client tries to connect to an SSL rule that is ultimately routed to an L5 HTTP rule:
2. Client sends HTTP 1.1 request for resource (nearly instantly)
3. 3 seconds of no traffic in or out of the CSS related to this request
4. CSS opens an HTTP connection to backend webserver, backend webserver responds (nearly instantly)
5. The CSS seems to route to the backend server using the balance method (round-robin) instead of the advanced-balance method (url)
6. Response is sent to the client with the resource (nearly instantly)
7. Future requests sent from the browser on the same TCP connection have no delay, but the advanced-balance continues to be ignored
The 3 seconds is quite an exact figure (within a few milliseconds) and appears to happen entirely inside the CSS unit itself, since it does not connect to the backend server until after the 3 seconds elapse. 3 seconds smelled like some sort of internal timeout in the CSS unit, after which it gives up waiting for something.
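For anybody who wants to look for the same pattern in their own capture, here is a minimal sketch of how the idle period can be located. It is not the tooling I used; `largest_gap` is a hypothetical helper, and the timestamps are made-up values that only mimic the shape described above (request arrives, roughly 3 seconds of silence, then the backend connection).

```python
def largest_gap(timestamps):
    """Return (gap_seconds, index) for the longest idle interval.

    `timestamps` is a sorted list of packet times in seconds; `index` is the
    position of the packet that precedes the longest silence.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [
        (later - earlier, i)
        for i, (earlier, later) in enumerate(zip(timestamps, timestamps[1:]))
    ]
    return max(gaps)


# Hypothetical packet times: client request, then silence, then backend traffic.
packet_times = [0.000, 0.012, 3.013, 3.020, 3.031]
gap, index = largest_gap(packet_times)
```

A gap that sits at almost exactly 3 seconds on every fresh connection, and always right after the client's request, points at a timeout rather than a slow backend.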
Looking at the packets from affected browsers, I discovered that the GET /foobar HTTP/1.1 request was being broken into two separate TLSv1 application messages: the first was 24 bytes and the second was 400 bytes. Decrypting these messages, I found the first message was a single character:
G
and the second message was:
ET /foobar HTTP/1.1
This essentially splits the initial request the client is sending into two pieces. This confuses Wireshark so much that it doesn't decode this as an HTTP request and just labels it "continuation or non-HTTP traffic".
On the working browsers I saw only one TLSv1 application message, decrypting it I saw:
GET /foobar HTTP/1.1
(obviously I'm simplifying the contents of the request, there were lots of headers and stuff)
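To make the split concrete, here is a small sketch of what the decrypted payloads look like and why a parser that inspects each message on its own cannot find the URL. This is only a model under my own assumptions, not the CSS's actual logic: `split_first_byte` and `parse_request_line` are hypothetical helpers, and the Host header is a placeholder.

```python
def split_first_byte(plaintext):
    """Split a payload into a 1-byte message followed by the remainder."""
    if len(plaintext) < 2:
        return [plaintext]
    return [plaintext[:1], plaintext[1:]]


KNOWN_METHODS = {b"GET", b"HEAD", b"POST"}


def parse_request_line(data):
    """Return (method, path) if data starts with a complete request line, else None."""
    line, sep, _rest = data.partition(b"\r\n")
    if not sep:
        return None  # no complete line in this chunk
    parts = line.split(b" ")
    if len(parts) != 3 or parts[0] not in KNOWN_METHODS:
        return None  # e.g. the method arrived as "ET" instead of "GET"
    return parts[0].decode("ascii"), parts[1].decode("ascii")


request = b"GET /foobar HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
records = split_first_byte(request)

first_only = parse_request_line(records[0])          # just b"G"
second_only = parse_request_line(records[1])         # starts with b"ET /foobar"
reassembled = parse_request_line(b"".join(records))  # full request line
```

Neither message parses by itself; only the reassembled byte stream yields the method and URL that URL-based balancing would need.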
I am aware that the CSS can't handle L5 rules appropriately if the request gets fragmented, so I suspected this was the problem. I pulled a packet trace from a few years ago and confirmed that at that time we never saw the request split across two TLSv1 application messages.
Removing the CBC ciphers from the CSS SSL configuration solves the problem. Once they are removed, the browsers no longer fragment the first character of their request into a separate TLSv1 message. The 3 second delay goes away, and L5 stickiness is fixed. The "CBC" in the cipher name refers to cipher-block chaining (a great article here: http://en.wikipedia.org/wiki/Cipher-block_chaining), and breaking the payload into multiple messages may have been an attempt to initialize the IV for encryption, although I'm really just guessing; I stopped researching once I verified this solution was acceptable.
This issue became serious enough for us to notice first on Monday Feb 13th 2012. We believe a number of our large customers distributed workstation updates over the weekend. The customers affected were using IE7, although my personal IE7 test workstation did not appear to be affected. It's quite possible our customers were going through an SSL proxy. I suspect as more people upgrade their browsers, this will become a more serious issue for CSS users, and I hope this saves somebody a huge headache and problems with their production environment.
As you already suspected, the issue comes from the TLS record fragmentation feature that was introduced in the latest browser versions to overcome an SSL vulnerability (http://www.kb.cert.org/vuls/id/864643). Unfortunately, similar issues are happening with multiple products.
For CSS, the bug tracking this issue is CSCtx68270. The development team is actively working on a fix for it, which should be available in the next couple of weeks (in an interim software release, so to get it you will have to go through TAC).
In the meantime, as a workaround, you can configure the CSS to use only RC4 ciphers (which is what you were suggesting also). These are not affected by the vulnerability, so browsers don't apply the record fragmentation when they are in use. This workaround has been tested by several customers already, and the results seem to be very positive.
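As a rough illustration of the selection rule behind this workaround, the sketch below keeps only suites with no CBC component. It is an assumption-laden example: `non_cbc_suites` is a hypothetical helper, and the names are IANA-style suite identifiers used for illustration, not the cipher keywords of the CSS configuration.

```python
def non_cbc_suites(suites):
    """Keep only cipher suites whose name has no CBC component."""
    return [name for name in suites if "_CBC_" not in name]


# Illustrative IANA-style names; a real device lists its ciphers under its own keywords.
offered = [
    "TLS_RSA_WITH_RC4_128_MD5",
    "TLS_RSA_WITH_RC4_128_SHA",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
kept = non_cbc_suites(offered)
```

With the list above, only the two RC4 suites remain, which matches the RC4-only configuration described in the workaround.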
Excellent post Joe. I just ran into the same issue. I initially thought the SSL proxy somehow prevented L5 rule matching from working, then I tried disabling rule persistence before finally finding this post via another one that talked about that persistence. In this post, you don't mention disabling rule persistence, but in