Cisco ACE 4710 ACE20-MOD-K9 Established sessions torn down when a Probe fails

Joana Manzano
Level 1

Hi,

I have a question about how the load balancers behave when a probe fails. I know there are other discussions about the same issue, but they are not very helpful in my scenario. Hopefully you can help me.

According to Cisco documentation "The default behaviour of the ACE is to do nothing with existing connections if a real server fails." So I assumed that if a probe configured for Server1 in a serverfarm fails, already established connections would be maintained and new sessions would go to Server2, which is Operational in the same serverfarm.

I have tested this scenario by making a probe fail for Server1, and the sessions were torn down. However, new sessions went to Server2.

We don't want to tear down user sessions already established because we don't want any impact for end users. I don't know why our load balancers are not following the default behaviour. These are the details:

- 2 ACE 4710 A4(2.0) in Active/Standby configuration.

- 2 ACE20-MOD-K9 A2(1.6a) in Active/Standby configuration.

I have done the testing for both the ACE 4710 and the ACE20-MOD-K9 and got the same results: established connections to Server1 were torn down after the probe failed.

I know that these versions are old, so could that be the reason they are not working as expected? Do I need to include some special configuration or commands that are not enabled by default in these versions?

Thank you in advance.

Joana.

10 Replies

Cesar Roque
Level 4

Hi Joana,

How did you confirm that existing connections are torn down? Do you have simultaneous captures on both sides of the ACE showing this behavior?

How did you cause the probe failure?

To confirm the behavior you mentioned, we need to gather simultaneous captures on both sides (client side and server side).

---------------------
Cesar R
ANS Team


Hello,

First, I would recommend working with version A2(3.6a), as this will give you more stability.

You do not mention whether your load balancers run in FT or just in manual redundancy. With FT, make sure that the command (show ft group summary) always shows the peer of the active load balancer as STANDBY_HOT.

I would enable persistence (stickiness) on the load balancer, although for HTTPS traffic this means you would need the service certificates on the load balancer so that it can open the communication channel and insert the persistence cookie.

It would also help to have (replicate sticky) in the configuration.

That way, at least, persistence is kept at the connection level, because the cookie entries are replicated to the peer when one of the units fails.

But to reach the point you want, with no impact on end users, the application level should also ensure that the surviving server maintains the session state of its partner. That is not entirely the load balancer's job.

Hi,

We have the load balancers configured for High Availability, one is ACTIVE and the other is STANDBY. Therefore we have configured FT groups in the Admin context and in case of failure, the failover is done automatically, we don’t need to do it manually.

Testing Environment:

  • I configured a probe to poll TCP port 7 on the servers that are part of the SERVERFARM. There is also another probe configured on TCP port 80 for the SERVERFARM. Because we didn’t want to stop the HTTP service, we used this new probe (TCP port 7) to make a server OPERATIONAL or not.
  • SERVER1 and SERVER2 had a script running and listening on that port (TCP 7).
  • Both servers appeared OPERATIONAL (show serverfarm SERVERFARM) and they had client sessions established. A web application runs on the servers (SERVER1 and SERVER2), so we used the application as normal users to generate sessions on both servers through the load balancers.
  • Then we stopped the script running on TCP port 7 on SERVER1.
  • We kept using the web application while my colleague stopped the script, to check whether we could still use it with no service interruption.
  • At that point, we needed to log in again. Sessions were torn down for users connected to SERVER1, and I confirmed that using: show serverfarm SERVERFARM

ACE-Module-/context# show serverfarm SERVERFARM
 serverfarm     : SERVER-farm, type: HOST
 total rservers : 2
 ---------------------------------
                                                ----------connections-----------
       real                  weight state        current    total      failures
   ---+---------------------+------+------------+----------+----------+---------
   rserver: SERVER1
       192.168.1.1:0         8      PROBE-FAILED 0          953550     4324
   rserver: SERVER2
       192.168.1.2:0         8      OPERATIONAL  3          941003     4421

The number of current connections on SERVER1 was 0, and we got new connections on SERVER2 after logging in again.
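For anyone repeating this test, the per-rserver counters can be pulled out of that output with a short script and compared before and after the probe fails. This is only a minimal Python sketch that assumes the row layout shown above; `parse_serverfarm` and the sample text are illustrative names, not part of any Cisco tool.

```python
import re

# Sample rows in the same layout as the "show serverfarm" output above.
SAMPLE = """\
   rserver: SERVER1
       192.168.1.1:0           8      PROBE-FAILED  0          953550     4324
   rserver: SERVER2
       192.168.1.2:0           8      OPERATIONAL   3          941003     4421
"""

# One data row: address:port, weight, state, current, total, failures.
ROW = re.compile(
    r"^\s*(?P<addr>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)\s+(?P<weight>\d+)\s+"
    r"(?P<state>[A-Z][A-Z-]*)\s+(?P<current>\d+)\s+(?P<total>\d+)\s+"
    r"(?P<failures>\d+)\s*$"
)


def parse_serverfarm(text):
    """Return {rserver name: counters} from 'show serverfarm' style text."""
    servers = {}
    name = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("rserver:"):
            # Remember the name; the counters are on a following line.
            name = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match and name is not None:
            servers[name] = {
                "address": match.group("addr"),
                "state": match.group("state"),
                "current": int(match.group("current")),
                "total": int(match.group("total")),
                "failures": int(match.group("failures")),
            }
            name = None
    return servers


farm = parse_serverfarm(SAMPLE)
for rserver, info in farm.items():
    print(rserver, info["state"], info["current"])
```

Running it against output captured at several points in time shows whether the "current" counter on the failed rserver drops to 0 at once or drains gradually.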

Context Configuration:

probe tcp tcp7_probe1
  port 7
  interval 20
  passdetect interval 10
  passdetect count 2
  open 1

rserver host SERVER1
  description UAT server 1
  ip address 192.168.1.1
  inservice
rserver host SERVER2
  description UAT server 2
  ip address 192.168.1.2
  inservice

serverfarm host SERVER-Farm
  predictor leastconns
  probe probe1
  rserver SERVER1
    inservice
  rserver SERVER2
    inservice

sticky ip-netmask 255.255.255.255 address source SERVERFARM-Sticky
  timeout 720
  timeout activeconns
  replicate sticky
  serverfarm SERVER-Farm

class-map match-all L4VIPSERVERFARM
  2 match virtual-address 192.168.2.10 tcp eq www

policy-map multi-match SERVERFARM-VIPs
  class L4VIPSERVERFARM
    loadbalance vip inservice
    loadbalance policy SERVERFARM-Web-policy
    loadbalance vip icmp-reply active
    loadbalance vip advertise active

interface vlan 15
  description Servers
  ip address 192.168.1.250 255.255.255.0
  alias 192.168.1.251 255.255.255.0
  peer ip address 192.168.1.249 255.255.255.0
  no normalization
  access-group input any
  nat-pool 240 192.168.1.240 192.168.1.240 netmask 255.255.255.255 pat
  service-policy input SERVERFARM-VIPs
  no shutdown

interface vlan 16
  description ACE Public Client Side
  ip address 192.168.2.9 255.255.255.0
  alias 192.168.2.7 255.255.255.0
  peer ip address 192.168.2.8 255.255.255.0
  no normalization
  access-group input any
  service-policy input SERVERFARM-VIPs
  no shutdown

ip route 0.0.0.0 0.0.0.0 192.168.2.1

Thank you very much for your help.

Joana.

Hi Joana,

Try with this command:

serverfarm host SERVER-Farm
  failaction reassign   <=== add this line
  predictor leastconns
  probe probe1
  rserver SERVER1
    inservice
  rserver SERVER2
    inservice

---------------------
Cesar R
ANS Team


Hi,

I will try the "failaction reassign" command in the next few days and I will let you know if it makes any difference.

Thanks,

Joana.

Hi Joana,

Actually, you need to test with "failaction purge"; reassign works only when there is a backup-rserver configured.

---------------------
Cesar R
ANS Team


Hi People,

I need help fast.

I have 2 ACE-4710-K9 in active/standby mode and one of them failed.

I opened an RMA and received a new ACE-4710-K9.

Now I need to configure it and connect it so that the pair is in active/standby again.

Does anyone know what I need to do to configure HA again (any documents)?

Will it receive the config from the currently active unit, and will it affect production?

KR

VZ

chrhiggi
Level 3

Joana-

  What you are looking for is actually failaction reassign. However, you cannot use it unless you meet strict criteria, so most people choose not to use it. Usually, you would use reassign only with firewall load balancing.

  Your basic issue is this ->

  By default, when a probe fails, ACE leaves all "active/established" connections on the failed server.  All new connections (whether they have a sticky entry that matches the failed server or not) go to the remaining servers left in the serverfarm, based on the load-balancing predictor configured.  Sticky entries are updated with the new server IP.  In your test you shut down the port, so for your users the server is either not going to respond to the next packet the client sends, or it will trigger a reset.  Either way, that is not graceful for the client.

  With failaction purge, ACE sends a reset to both the client and the server IP for the failed rserver within the serverfarm it failed in.  As with the default behavior, all new connections are load balanced to the remaining servers.

  With failaction reassign, ACE sends any packets that would have gone to the failed server on to whatever servers are left in the farm.  The moment the probe fails, ACE takes the existing connections for that server and rewrites the flow information to point to the remaining servers. This is not graceful for your client either.
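  The three behaviors described above can be modeled in a few lines. This is only a toy Python sketch of the semantics as stated in this post, not of how ACE is implemented. The function name `on_probe_failure` is hypothetical, and the round-robin choice of a new server under reassign is an assumption made for illustration.

```python
def on_probe_failure(connections, failed, remaining, failaction=None):
    """Model established connections when `failed` goes PROBE-FAILED.

    connections: dict mapping connection id -> rserver name
    remaining:   rservers still OPERATIONAL in the serverfarm
    failaction:  None (default behavior), "purge", or "reassign"
    Returns (connections_after, ids_reset).
    """
    if failaction not in (None, "purge", "reassign"):
        raise ValueError("unknown failaction: %r" % (failaction,))
    targets = sorted(remaining)
    if failaction == "reassign" and not targets:
        raise ValueError("reassign needs at least one remaining rserver")

    after = {}
    reset = []
    moved = 0
    for conn_id, server in connections.items():
        if server != failed:
            after[conn_id] = server      # untouched in every mode
        elif failaction is None:
            after[conn_id] = server      # default: left on the failed server
        elif failaction == "purge":
            reset.append(conn_id)        # reset sent to client and server
        else:
            # reassign: flow rewritten to a remaining server
            # (round-robin here is an illustrative assumption)
            after[conn_id] = targets[moved % len(targets)]
            moved += 1
    return after, reset


conns = {1: "SERVER1", 2: "SERVER2", 3: "SERVER1"}
for mode in (None, "purge", "reassign"):
    print(mode, on_probe_failure(conns, "SERVER1", ["SERVER2"], mode))
```

  Note that in the default and reassign cases the load balancer keeps the flow, yet the user may still lose the session if the server behind the flow no longer answers or has no state for it.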

  It sounds to me like you are looking for reassign in order for the users to not see a reset and gracefully handle a failed server.  Given that, you will need to check the guidelines under reassign located here:

http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA5_1_0/configuration/slb/guide/rsfarms.html#wp1117375

Regards,

Chris Higgins

Technical Leadership

ANS Loadbalancing Technologies

Hi,

First of all, thanks for your help!

After some more testing, it turns out I was partly wrong about the load balancer behaviour. I have done more extensive testing, with more users using the web applications behind the load balancers. These are the results:

  • 2 ACE 4710: After the probe on TCP port 7 failed, web sessions were maintained until the user ended the session, and new sessions were directed to the other Operational servers. This is exactly what I wanted.

  • 2 ACE20-MOD-K9: From the point of view of the load balancers I got exactly the same behaviour; however, user sessions were torn down. “show serverfarm SERVERFARM” showed the Probe-Failed status and the same number of sessions that the server had before the probe (TCP 7) failed. New sessions were directed to the other Operational servers, and the number of sessions on the server with Probe-Failed status decreased progressively. But my colleague, who works with the web application servers behind, could see that all user sessions were torn down after the Probe failed although Load Balancers seemed to maintain the session already established. Users had to log in again.

Now I am confused. Why does the server decide to close user sessions when the probe fails, if the ACEs still maintain the sessions established through them? The script listening on TCP port 7 doesn’t have any impact on the service running on TCP 80 on the same server; they are completely independent. Could it be something in the web application itself? Or is there some configuration on the serverfarm that specifically tells the server to end the sessions when a probe fails? Sticky sessions? Why does it work fine in the first scenario and not in the second one? (I have to say that the two web applications are completely different.)

I also tried the “failaction reassign” command, but it didn’t make any difference. I think you need a backup server configured in the serverfarm to get it working.

I really appreciate your help.

Cheers,

Joana.

Joana-

"But my colleague, who works with the web application servers behind,  could see that all user sessions were torn down after the Probe failed  although Load Balancers seemed to maintain the session already  established."

Who tore down the session? Was it a reset or a FIN, and did it come from the client or did the server initiate it?
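One way to answer that from a packet capture is to look at the TCP flags and source port of the closing segment. This is only a minimal Python sketch, assuming you have already extracted the raw TCP header bytes from the capture. `who_closed` and `build_header` are hypothetical helper names, and server port 80 is taken from the configuration earlier in this thread.

```python
import struct

FIN = 0x01  # TCP FIN flag bit
RST = 0x04  # TCP RST flag bit


def who_closed(tcp_header, server_port=80):
    """Return (side, kind) for a closing segment, or None if it is neither.

    side is "server" when the source port equals server_port, else "client";
    kind is "RST" or "FIN".
    """
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port,) = struct.unpack("!H", tcp_header[:2])
    flags = tcp_header[13]  # flags live in byte 13 of the TCP header
    if flags & RST:
        kind = "RST"
    elif flags & FIN:
        kind = "FIN"
    else:
        return None
    side = "server" if src_port == server_port else "client"
    return side, kind


def build_header(src_port, dst_port, flags):
    """Build a minimal 20-byte TCP header, used here only as test input."""
    # src, dst, seq, ack, data offset (5 words), flags, window, checksum, urgent
    return struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0, 5 << 4, flags, 0, 0, 0)


print(who_closed(build_header(80, 50000, 0x14)))   # RST+ACK from port 80
print(who_closed(build_header(50000, 80, 0x11)))   # FIN+ACK from the client
```

The source port only shows the direction the segment travelled. A reset that appears in the client-side capture but not in the server-side one was generated by a device in between, which is why simultaneous captures on both sides of the ACE are needed.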

Regards,

Chris
