CSS 11503 resetting connections

eclipsefnd
Level 1

A CSS 11503 recently started resetting connections for a specific service set after the TCP three-way handshake (TWHS) is completed.

INTERNET   <---->    ASA 5540   <--->   CSS 11503  <---> Catalyst 2970 switches  <--> Real servers

All the real servers use the CSS as their Default Gateway. Everything works normally for the dozen or so content/service rules that are on the CSS, some of which are much busier than the "faulty" one.

Here's a capture from the ASA's outside interface:

   1: 11:36:16.526172 X.X.X.X.41809 > Y.Y.Y.Y.80: S 3023827538:3023827538(0) win 14224 <mss 1460,sackOK,timestamp 992102125 0,nop,wscale 7>
   2: 11:36:16.526202 Y.Y.Y.Y.80 > X.X.X.X.41809: S 2506723382:2506723382(0) ack 3023827539 win 0 <mss 1460>
   3: 11:36:16.560472 X.X.X.X.41809 > Y.Y.Y.Y.80: . ack 2506723383 win 14224
   4: 11:36:16.561753 Y.Y.Y.Y.80 > X.X.X.X.41809: R 2506723383:2506723383(0) win 0
   5: 11:36:17.596328 X.X.X.X.41810 > Y.Y.Y.Y.80: S 280453289:280453289(0) win 14224 <mss 1460,sackOK,timestamp 992103195 0,nop,wscale 7>
   6: 11:36:17.596358 Y.Y.Y.Y.80 > X.X.X.X.41810: S 2292305502:2292305502(0) ack 280453290 win 0 <mss 1460>
   7: 11:36:17.630369 X.X.X.X.41810 > Y.Y.Y.Y.80: . ack 2292305503 win 14224
   8: 11:36:17.631025 Y.Y.Y.Y.80 > X.X.X.X.41810: . ack 280453290 win 8760
   9: 11:36:17.664745 X.X.X.X.41810 > Y.Y.Y.Y.80: P 280453290:280453403(113) ack 2292305503 win 14224
  10: 11:36:17.666896 Y.Y.Y.Y.80 > X.X.X.X.41810: . ack 280453403 win 5840

As you can see, we start with a SYN and a SYN+ACK, which is ACK'ed in line 3, and then a RST arrives in line 4.  The retry succeeds immediately afterwards. On the real server, a tcpdump does not show packets 1, 2, 3 or 4.
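The pattern described above (SYN, SYN+ACK, ACK, then a RST from the server side) can be picked out of a longer capture mechanically. The following is only a minimal sketch, assuming the text format of the ASA capture shown in this post; the names `LINE_RE`, `kind`, `resets_after_handshake` and `SAMPLE` are illustrative and do not come from any Cisco tool.

```python
import re
from collections import defaultdict

# Assumed line format (taken from the capture in this post):
#   1: 11:36:16.526172 X.X.X.X.41809 > Y.Y.Y.Y.80: S 3023827538:... win 14224 <...>
LINE_RE = re.compile(
    r"^\s*\d+:\s+\S+\s+"
    r"(?P<src>\S+)\s+>\s+(?P<dst>\S+):\s+"
    r"(?P<flags>\S+)\s*(?P<rest>.*)$"
)


def kind(flags, rest):
    """Reduce the tcpdump-style flag field to a coarse packet type."""
    if "R" in flags:
        return "RST"
    if "S" in flags:
        return "SYN+ACK" if re.search(r"\back\b", rest) else "SYN"
    if "P" in flags or "F" in flags:
        return "OTHER"
    return "ACK"  # "." means no flag other than ACK


def resets_after_handshake(capture_text):
    """Return client endpoints whose handshake was followed by a server-side RST."""
    flows = defaultdict(list)  # flow key -> [(sender, packet kind), ...]
    for line in capture_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        src, dst = m["src"], m["dst"]
        key = tuple(sorted((src, dst)))  # same key for both directions
        flows[key].append((src, kind(m["flags"], m["rest"])))

    suspects = []
    for packets in flows.values():
        kinds = [k for _, k in packets[:4]]
        if kinds == ["SYN", "SYN+ACK", "ACK", "RST"]:
            client, server = packets[0][0], packets[1][0]
            if packets[3][0] == server:
                suspects.append(client)
    return suspects


# First eight packets of the capture above.
SAMPLE = """
   1: 11:36:16.526172 X.X.X.X.41809 > Y.Y.Y.Y.80: S 3023827538:3023827538(0) win 14224 <mss 1460,sackOK,timestamp 992102125 0,nop,wscale 7>
   2: 11:36:16.526202 Y.Y.Y.Y.80 > X.X.X.X.41809: S 2506723382:2506723382(0) ack 3023827539 win 0 <mss 1460>
   3: 11:36:16.560472 X.X.X.X.41809 > Y.Y.Y.Y.80: . ack 2506723383 win 14224
   4: 11:36:16.561753 Y.Y.Y.Y.80 > X.X.X.X.41809: R 2506723383:2506723383(0) win 0
   5: 11:36:17.596328 X.X.X.X.41810 > Y.Y.Y.Y.80: S 280453289:280453289(0) win 14224 <mss 1460,sackOK,timestamp 992103195 0,nop,wscale 7>
   6: 11:36:17.596358 Y.Y.Y.Y.80 > X.X.X.X.41810: S 2292305502:2292305502(0) ack 280453290 win 0 <mss 1460>
   7: 11:36:17.630369 X.X.X.X.41810 > Y.Y.Y.Y.80: . ack 2292305503 win 14224
   8: 11:36:17.631025 Y.Y.Y.Y.80 > X.X.X.X.41810: . ack 280453290 win 8760
"""
```

Calling `resets_after_handshake(SAMPLE)` should flag only the flow from client port 41809, since the second flow continues with a window update instead of a RST.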

From the Internet client, here's what I'm seeing:

$ wget -O /dev/null -S http://www.Y.Y.Y.Y/
--2013-05-07 11:36:16--  http://www.Y.Y.Y.Y/
Resolving www.Y.Y.Y.Y... Y.Y.Y.Y
Connecting to www.Y.Y.Y.Y|Y.Y.Y.Y|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

--2013-05-07 11:36:17--  (try: 2)  http://www.Y.Y.Y.Y/
Connecting to www.Y.Y.Y.Y|Y.Y.Y.Y|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Tue, 07 May 2013 15:36:17 GMT
[snip]

I've suspended all but one of the services that are serving this content to facilitate debugging.  On the real server, I am not seeing these RESETS in tcpdumps.  I've tried suspending and activating various servers, as perhaps one was faulty, but it does not change anything. 

The active service itself is not under any particular strain, is not flapping between state transitions and has plenty of available connections:

CSS11503(config)# show service www-vm3
Name: www-vm3           Index: 20   
  Type: Local            State: Alive
  Rule ( 172.X.X.7  ANY  ANY )
  Session Redundancy: Disabled
  Redirect Domain: 
  Redirect String:
  Keepalive: (TCP-80   5   3   5 )
  Keepalive Encryption:      Disabled
  Last Clearing of Stats Counters: 05/06/2013 12:51:13
  Mtu:                       1500        State Transitions:            6
  Total Local Connections:   91742691    Total Backup Connections:     0
  Current Local Connections: 324         Current Backup Connections:   0
  Total Connections:         91742691    Max Connections:              65534
  Total Reused Conns:        2912169     Weight Reporting:             None
  Weight:                    1           Load:                         9

Portions of my show run:

!************************** SERVICE **************************
service www-vm1
  ip address 172.X.X.5
  keepalive type tcp

service www-vm2
  ip address 172.X.X.6
  keepalive type tcp

service www-vm3
  ip address 172.X.X.7
  keepalive type tcp
  active

!*************************** OWNER ***************************
owner Y.org
  content www
    protocol tcp
    port 80
    add service www-vm1
    add service www-vm2
    add service www-vm3
    flow-timeout-multiplier 40
    primarySorryServer moving
    advanced-balance sticky-srcip
    vip address Y.Y.Y.Y
    active

Any hints on what I should be looking at, or how I can get the CSS to tell me why it's RST'ing so many connections?  This happens on roughly 1 in every 12 wget runs.
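For anyone wanting to put a number on an intermittent reset like this rather than counting wget retries by hand, here is a rough sketch using only the Python standard library. The function names `probe` and `reset_rate` are hypothetical, and the plain `GET /` request is an assumption about what the site accepts; it is not what wget sends byte for byte.

```python
import socket


def probe(host, port=80, timeout=5.0):
    """Make one HTTP request and classify the outcome.

    Returns "ok" if any response bytes arrive, "closed" if the peer closes
    without sending data, "reset" on a TCP reset, and "error" otherwise.
    """
    request = (
        f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            data = sock.recv(4096)
            return "ok" if data else "closed"
    except ConnectionResetError:
        # Same condition wget reports as "Connection reset by peer".
        return "reset"
    except OSError:
        # Timeouts, refused connections, DNS failures, and so on.
        return "error"


def reset_rate(attempt, count):
    """Run attempt() count times and return the fraction that were resets."""
    if count <= 0:
        raise ValueError("count must be positive")
    results = [attempt() for _ in range(count)]
    return results.count("reset") / count
```

A call such as `reset_rate(lambda: probe("www.example.org"), 120)` (the host name here is a placeholder) would return a value near 0.08 if about 1 in 12 connections is reset. Keeping `attempt` as a callable makes it easy to swap in a different probe, for example one against port 443.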



chrhiggi
Level 3

Hello Denis!

  It's a bit vague; can you supply all of the configuration that uses IP address Y.Y.Y.Y on the CSS? Based on what you are describing, it looks like the VIP is matched by an L5 rule.

Regards,

Chris Higgins

Hi Christopher,

I'm not sure what else you're looking for -- there is only one other content rule for Y.Y.Y.Y:

  content www-https
    protocol tcp
    primarySorryServer moving
    port 443
    add service www-vm1
    add service www-vm2
    add service www-vm3
    flow-timeout-multiplier 4
    vip address Y.Y.Y.Y
    active

There are other content rules for other services, but nothing else for that VIP on the CSS.

Denis-

  Then you might have a rule that has no VIP IP but is in service. Or there was an outbound connection sent from the CSS that your capture did not reflect. The reason I say this: for an L4 rule that matches only on port, the CSS receives a SYN and forwards it to the server IP. It only terminates the TCP session when the rule is L5 (meaning an HTTP URL or header match of some sort). If you saw a SYN,ACK from the CSS but never saw a SYN on the server, that would mean either the CSS is terminating it, or the SYN the CSS is forwarding is going somewhere else.

  I would check the ARP table and verify that the MAC addresses of the servers in the CSS are what you expect. Also, span the physical interface(s) the CSS connects from and verify the packets in/out instead of relying on a trace on the server. If you can't do that, try opening another VIP rule on a different port, with the same VIP IP and the same servers. Telnet to the VIP on that port number and see whether the CSS terminates the connection or forwards the SYN to the server. If the SYN does not show up on the server, you know you have some sort of routing issue.

Regards,

Chris

Christopher,

Thanks for your reply.  You're right, I did have an L5 rule in there, referencing the same vip:port using url ''.  I've suspended then removed it, and triple-checked that the only rules affecting that vip are the L4 ones in this post.  I waited about an hour yet the CSS is still terminating the TWHS, and every 12th or so connection I'm still getting RSTs yet the real server is not seeing the TWHS in those cases.

I am seeing the keepalive test connection from the CSS to the real server every 5 seconds, which is a TWHS + RST.

Very odd.  All my other services work fine, so I'll start tracing my way back, examining the arp tables on the CSS, the switches and the server.

Denis-

Gather a trace on the links the CSS uses to connect to the upstream device (in other words, all traffic in and out of the box). The CSS doesn't have a full TCP stack and literally can't terminate traffic randomly.  Unless there is an L5 rule that applies to the traffic in question, the CSS can't terminate it. So it's likely there is something else at play.

Chris

Thanks, Chris.  I believe we've found the problem.

While sniffing the CSS outside interface, we noticed it was occasionally receiving only the third packet of the TWHS.  The CSS was understandably RST'ing the rogue connection.

After sniffing the ASA's outside connection, we noticed the ASA was SYN+ACK'ing some connections to Y.Y.Y.Y.  As it turns out, we are hitting the embryonic connection count for IP Y.Y.Y.Y on our ASA.  While the docs state that if the client ACK's a SYN+ACK it will be patched through transparently, this is obviously failing in many cases.
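In hindsight, the capture from the ASA's outside interface earlier in the thread is consistent with this: in both flows the SYN+ACK leaves about 30 microseconds after the SYN, while the RST in line 4 follows the client's ACK by roughly 1.3 ms. A reply that fast, with `win 0` and only an `mss` option, suggests (but does not prove) that the device where the capture was taken answered the SYN itself. The sketch below computes that delay from a capture in the same text format; `syn_ack_delays_us` and `SAMPLE` are made-up names, and how small a delay counts as "answered locally" is a judgment call, not a documented threshold.

```python
import re

# Only SYN and SYN+ACK lines match: the flag field must be exactly "S".
# Assumes timestamps with six fractional digits, as in the capture above.
SYN_RE = re.compile(
    r"^\s*\d+:\s+(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\.(?P<us>\d{6})\s+"
    r"(?P<src>\S+)\s+>\s+(?P<dst>\S+):\s+S\s+(?P<rest>.*)$"
)


def syn_ack_delays_us(capture_text):
    """Map each client endpoint to its SYN -> SYN+ACK delay in microseconds."""
    syn_time = {}  # client endpoint -> timestamp of its SYN
    delays = {}
    for line in capture_text.splitlines():
        m = SYN_RE.match(line)
        if not m:
            continue  # not a SYN or SYN+ACK line
        seconds = (int(m["h"]) * 60 + int(m["m"])) * 60 + int(m["s"])
        t = seconds * 1_000_000 + int(m["us"])
        if re.search(r"\back\b", m["rest"]):
            client = m["dst"]  # SYN+ACK travels server -> client
            if client in syn_time:
                delays[client] = t - syn_time[client]
        else:
            syn_time[m["src"]] = t  # plain SYN travels client -> server
    return delays


# SYN and SYN+ACK lines of the two flows in the capture above.
SAMPLE = """
   1: 11:36:16.526172 X.X.X.X.41809 > Y.Y.Y.Y.80: S 3023827538:3023827538(0) win 14224 <mss 1460,sackOK,timestamp 992102125 0,nop,wscale 7>
   2: 11:36:16.526202 Y.Y.Y.Y.80 > X.X.X.X.41809: S 2506723382:2506723382(0) ack 3023827539 win 0 <mss 1460>
   3: 11:36:16.560472 X.X.X.X.41809 > Y.Y.Y.Y.80: . ack 2506723383 win 14224
   5: 11:36:17.596328 X.X.X.X.41810 > Y.Y.Y.Y.80: S 280453289:280453289(0) win 14224 <mss 1460,sackOK,timestamp 992103195 0,nop,wscale 7>
   6: 11:36:17.596358 Y.Y.Y.Y.80 > X.X.X.X.41810: S 2292305502:2292305502(0) ack 280453290 win 0 <mss 1460>
"""
```

Comparing these delays against the time between a client ACK and the first packet that clearly comes from behind the firewall gives a quick hint about which device produced the SYN+ACK.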

So we'll dramatically increase our connection counts and decrease our timeouts on the ASA.  It's odd that this condition wasn't being logged.

Thanks for your help.