11-22-2013 07:53 AM
Hi,
I have a problem with our ACE load balancers. We run a public FTP server farm which is load balanced using the ACEs. I have come across a problem which is very peculiar and is only affecting a single host, one of our offices in Poland.
Basically, the servers support Passive and Active FTP. We have clients that access the FTP service every other second so we know the existing configuration works just fine. However, our office in Poland, which sits behind a public NAT on a firewall, is unable to access our FTP service.
I have taken good and bad traces and noticed that the final TCP ACK from the client, which should complete the handshake, never makes it past the ACE. The ACE shows me the connection in SYNSEEN and SYNACK, but it never makes it to the ESTABLISHED state.
PERSLOACE1/FRONT_END# show conn address x.x.x.x netmask 255.255.255.255 detail | i :21
1770432 1 in TCP 201 x.x.x.x:30951 87.83.27.52:21 SYNSEEN
934881 1 out TCP 200 y.y.y.y:21 x.x.x.x:30951 SYNACK
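If there are many rows to go through, the half-open entries can be picked out with a small script. This is only a sketch in Python: the column order (conn-id, np, dir, proto, vlan, source, destination, state) is assumed from the two rows above, and the names parse_conn_line and half_open are made up for illustration.

```python
# Column order is an assumption based on the two sample rows above.
FIELDS = ("conn_id", "np", "dir", "proto", "vlan", "source", "destination", "state")

# States that indicate the three-way handshake has not completed.
HALF_OPEN_STATES = {"SYNSEEN", "SYNACK"}


def parse_conn_line(line):
    """Return a dict for one connection row, or None if the row does not fit."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def half_open(lines):
    """Keep only rows whose state shows an unfinished handshake."""
    rows = (parse_conn_line(line) for line in lines)
    return [row for row in rows if row and row["state"] in HALF_OPEN_STATES]


sample = [
    "1770432 1 in TCP 201 x.x.x.x:30951 87.83.27.52:21 SYNSEEN",
    "934881 1 out TCP 200 y.y.y.y:21 x.x.x.x:30951 SYNACK",
]

for row in half_open(sample):
    print(row["dir"], row["source"], "->", row["destination"], row["state"])
```

Both sample rows are reported, which matches what is seen here: the inbound and outbound halves of the flow are both stuck before ESTABLISHED.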
The ACE is carved into contexts and the same is seen on all of them.
The office in Poland can access other FTP sites on the Internet (e.g. Mozilla), so we know the problem is not localised there. Also, various customers access our FTP service, so we know there is nothing wrong there either.
I have spent hours trying to find related issues on the Internet but haven't found any!
In looking at my traces, the only difference I can see is that healthy packets have the DF bit set on the IP header, whereas packets from Poland do not. Could it be related? Something to do with fragmentation and/or normalisation?
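For reference when comparing the traces: the DF (Don't Fragment) flag is bit 0x4000 of the 16-bit flags/fragment-offset field at bytes 6-7 of the IPv4 header. A minimal Python sketch of that check follows. The helper names ip_flags and sample_header, and the addresses in the sample headers, are illustrative assumptions and are not taken from the real captures.

```python
import struct

DF_MASK = 0x4000      # Don't Fragment
MF_MASK = 0x2000      # More Fragments
OFFSET_MASK = 0x1FFF  # fragment offset, in units of 8 bytes


def ip_flags(header):
    """Decode the flags/fragment-offset field of a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("IPv4 header needs at least 20 bytes")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    (flags_frag,) = struct.unpack("!H", header[6:8])
    return {
        "df": bool(flags_frag & DF_MASK),
        "mf": bool(flags_frag & MF_MASK),
        "frag_offset": flags_frag & OFFSET_MASK,
    }


def sample_header(flags_frag):
    """Build a 20-byte header for illustration (checksum left at zero)."""
    return (
        bytes([0x45, 0x00, 0x00, 0x28])       # version/IHL, TOS, total length
        + struct.pack("!HH", 1, flags_frag)   # identification, flags+offset
        + bytes([64, 6, 0, 0])                # TTL, protocol TCP, checksum
        + bytes([192, 0, 2, 1])               # documentation source address
        + bytes([198, 51, 100, 2])            # documentation destination address
    )


print(ip_flags(sample_header(0x4000)))  # header with DF set
print(ip_flags(sample_header(0x0000)))  # header with DF clear
```

Running the same check over the working and failing SYN and ACK packets would confirm whether DF is really the only header difference between them.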
When issuing the show serverfarm command, I can see the failure counters incrementing, but I am unable to tie them to a cause. The 'show np' outputs are not very clear.
PERSLOACE1/FRONT_END# show serverfarm Download_FTP detail
serverfarm : Download_FTP, type: HOST
total rservers : 3
active rservers: 3
description : -
state : ACTIVE
predictor : ROUNDROBIN
failaction : -
back-inservice : 0
partial-threshold : 0
num times failover : 0
num times back inservice : 0
total conn-dropcount : 0
Probe(s) :
FTP_DL, type = FTP
---------------------------------
----------connections-----------
real weight state current total failures
---+---------------------+------+------------+----------+----------+---------
rserver: FT Server 1
x.x.x.x:0 8 OPERATIONAL 13 83477 15
description : -
max-conns : - , out-of-rotation count : -
min-conns : -
conn-rate-limit : - , out-of-rotation count : -
bandwidth-rate-limit : - , out-of-rotation count : -
retcode out-of-rotation count : -
load value : 0
rserver: FTP Server 2
x.x.x.x:0 8 OPERATIONAL 17 1269 10
description : -
max-conns : - , out-of-rotation count : -
min-conns : -
conn-rate-limit : - , out-of-rotation count : -
bandwidth-rate-limit : - , out-of-rotation count : -
retcode out-of-rotation count : -
load value : 0
rserver: FTP Server 3
x.x.x.x:0 8 OPERATIONAL 21 2378 23
description : -
max-conns : - , out-of-rotation count : -
min-conns : -
conn-rate-limit : - , out-of-rotation count : -
bandwidth-rate-limit : - , out-of-rotation count : -
retcode out-of-rotation count : -
load value : 0
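To put the failure counters in proportion to the total connections, the rserver rows can be parsed with a short script. This is a rough Python sketch only: the row layout (address:port, weight, state, current, total, failures) is assumed from the output above, and parse_rserver_rows and failure_ratio are hypothetical helper names.

```python
import re

# Assumed row layout: <ip>:<port> <weight> <state> <current> <total> <failures>
ROW = re.compile(r"^\s*(\S+:\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_rserver_rows(text):
    """Extract the per-rserver counter rows from 'show serverfarm ... detail' text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            addr, weight, state, current, total, failures = match.groups()
            rows.append({
                "addr": addr,
                "weight": int(weight),
                "state": state,
                "current": int(current),
                "total": int(total),
                "failures": int(failures),
            })
    return rows


def failure_ratio(row):
    """Failures as a fraction of total connections (0.0 when total is zero)."""
    return row["failures"] / row["total"] if row["total"] else 0.0


sample = """\
x.x.x.x:0 8 OPERATIONAL 13 83477 15
description : -
x.x.x.x:0 8 OPERATIONAL 17 1269 10
load value : 0
x.x.x.x:0 8 OPERATIONAL 21 2378 23
"""

for row in parse_rserver_rows(sample):
    print(row["addr"], row["failures"], "/", row["total"], f"{failure_ratio(row):.4%}")
```

Comparing two such snapshots taken a few minutes apart, while the Poland office retries, would show whether the failures counter moves in step with the failed attempts.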
Here is the config below:
probe ftp FTP_DL
description FTP Probe
interval 60
passdetect interval 60
expect status 220 220
rserver host FTP Server 1
ip address x.x.x.x
inservice
rserver host FTP Server 2
ip address x.x.x.x
inservice
rserver host FTP Server 3
ip address x.x.x.x
inservice
serverfarm host Download_FTP
probe FTP_DL
rserver FTP Server 1
inservice
rserver FTP Server 2
inservice
rserver FTP Server 3
inservice
sticky ip-netmask 255.255.255.255 address both FTP_DL
timeout 20
replicate sticky
serverfarm Download_FTP
class-map match-any FTP_DL
2 match virtual-address 87.83.27.52 tcp eq ftp
3 match virtual-address 87.83.27.52 tcp eq ftp-data
4 match virtual-address 87.83.27.52 any
policy-map type loadbalance first-match FTP_DL
class class-default
sticky-serverfarm FTP_DL
policy-map type loadbalance first-match FTP_DL_Active
class class-default
sticky-serverfarm FTP_DL
policy-map multi-match FTP_Download
class FTP_DL
loadbalance vip inservice
loadbalance policy FTP_DL
class FTP_DL_Active
loadbalance vip inservice
loadbalance policy FTP_DL_Active
inspect ftp
interface vlan 201
description ACE-FW Vlan for FWLB
ip address "gw ip' x.x.x.x /y'
alias x.x.x.x /y
peer ip address x.x.x.x /y
mac-sticky enable
access-group input ANY
access-group output ANY
service-policy input FTP_Download
What is happening??! Please help. Happy to provide more show outputs
Many thanks
11-22-2013 11:06 AM
Hi,
So it seems that the client ACK is not making it to the ACE or to the servers. Have you got pcaps which show that the ACK packet reached the ACE and that the ACE didn't forward it to the server?
If the ACE is forwarding the SYN, I am not sure why it would have a problem with the ACK.
Normalization could cause a problem if the ACE saw an ACK without having seen the corresponding SYN, but that is not the case here, so this doesn't look like a normalization issue.
Can you take a simultaneous front end and back end pcap which establishes the fact that ACK packet was given to ACE but ACE didn't send it back to server?
Regards,
Kanwal
11-25-2013 01:01 AM
Hi Kanwal
Thanks for your input. Yes, I have taken traces in both the incoming vlan (inbound) and outgoing vlan (inbound) and I can see the client ACK being lost somewhere in the ACE. The packet needs to switch contexts to make it from the client (perimeter side) to the server (back end side). Could this be related?
As you can see from the show serverfarm output, the failures count is incrementing so I am convinced the ACE is dropping the ACK for some unknown reason.
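The comparison between the two traces can be sketched as below, in Python. This is an illustration only: each capture is reduced to (client source port, handshake step) pairs, it assumes the ACE does not translate the client's source port, and the sample data is hypothetical, simply mirroring the symptom described above rather than reproducing the real pcaps.

```python
from collections import defaultdict

HANDSHAKE = ("SYN", "SYN-ACK", "ACK")


def steps_by_port(packets):
    """Group the handshake steps seen in one capture by client source port."""
    seen = defaultdict(set)
    for client_port, step in packets:
        seen[client_port].add(step)
    return seen


def lost_in_between(front, back):
    """Return steps present in the front-end capture but absent from the back-end one."""
    front_seen = steps_by_port(front)
    back_seen = steps_by_port(back)
    lost = {}
    for port, steps in front_seen.items():
        behind = back_seen.get(port, set())
        missing = [s for s in HANDSHAKE if s in steps and s not in behind]
        if missing:
            lost[port] = missing
    return lost


# Hypothetical data: the final ACK appears on the front-end vlan only.
front_end = [(30951, "SYN"), (30951, "SYN-ACK"), (30951, "ACK")]
back_end = [(30951, "SYN"), (30951, "SYN-ACK")]

print(lost_in_between(front_end, back_end))
```

A working client run through the same comparison should produce an empty result, which gives a clean good-versus-bad pair to show alongside the pcaps.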
Regards,
Simon
11-25-2013 02:54 PM
Hi Simon,
Inter-context traffic could be a problem, but it is working for every other host, just not this one (I assume you have proper routing in place for inter-context communication), correct?
I would suggest opening a TAC case for detailed investigation.
Regards,
Kanwal
11-26-2013 01:02 AM
Hi Kanwal,
Yes, routing is correct, as the initial SYN and SYN-ACK packets make it correctly to the end hosts.
I am waiting for our support contracts to be mapped onto our account so I can't open a TAC case yet, but if there are any more suggestions, especially with show command outputs, they'd be welcome.
Thanks for your input
Regards,
Simon
11-26-2013 04:45 AM
Hi Simon,
The Connections Failures counter for a real server in a server farm may increment for one of the following reasons:
Now, the strange thing is that this is happening for only one host, and you have a pcap which shows that the client ACK was sent to the ACE.
Can you send me the front end as well as backend pcap which shows that SYN from ACE made it to the backend but client ACK didn't?
Is it possible for you to take an ACE backplane capture during the time of the issue and send it to me for analysis?
Do grab two instances of show tech with a time difference of 5 minutes during the time of issue from the affected context.
Do indicate which IPs I need to look at, such as the client IP, VIP, etc.
I will have a look and see what comes out. I have a feeling that this might need to go to development, and without a TAC case that won't be possible. If we do see the ACK making it to the ACE and getting lost inside the ACE, then we will need all the information requested above in order to open a TAC case and go to development for analysis.
But let's see.
You can also try taking a capture on the ACE itself. One for a working client and one for the non-working client would be very helpful.
Regards,
Kanwal