Re: ACE is forwarding to wrong server farm

Unknown. · ‎10-29-2010

All,

Situation overview:

I am talking about an ACE design with several contexts defined, where each context load-balances traffic for heavily used news websites (1.5 Gbps on average).

For some time now we have seen that a request is occasionally delivered to the wrong server farm: a request for website abc.com is received on a web server for website xyz.com.

Initially we thought this was an issue with our Oracle webcache servers, but after a long period of troubleshooting we could not find a problem there. We now have an environment without the Oracle webcache servers, with only the ACE module, and we see the same behaviour... reason to suspect that something is wrong in the load balancing.

I know you guys will ask for some traces, but that is hard since we are under heavy load all the time and it only happens in about 1 out of 10,000 connections. We are also talking about sticky environments, but everything is configured correctly and there are no duplicate names or anything like that.

Is anyone else experiencing the same issue? I have no idea how or where to look for the cause...

Thanks for your replies,

Ben

Diego Vargas · ‎10-29-2010

Ben,

Sounds like an HTTP persistence problem. The balancer parses the first Layer 7 packet and makes the balancing decision. If another GET arrives within the same persistent connection and matches a different rule, we might send it to the wrong server because that flow is already mapped.

Make sure you have persistence rebalance enabled so that the LB parses every GET and remaps the flow on the back end if there is a better match.

Diego Vargas · ‎10-29-2010

Ben,

Forgot to add: if the traffic comes from a proxy with many clients behind it, you may want to use the strict option, which forces the balancer to create a new back-end flow for every request even if it matches the same class map.

Unknown. · ‎10-29-2010

Diego,

What do you mean by the "strict option"?

Ben

Unknown. · ‎10-29-2010

Diego,

That was also my first idea, but persistence rebalance is enabled and every GET is matched against our policy again... It happens only occasionally, but our web developers are starting to complain...

Regards,

Ben

Diego Vargas · ‎10-29-2010

Ben,

Look at this explanation; it refers to the strict option, which is an enhancement. I would however say that you should get sniffer traces and find out exactly what is going on with the requests that are balanced to the wrong server before you move on to configuration changes. Otherwise you would be shooting in the dark.

Strict:

Currently, with persistence rebalance enabled, when successive GET requests result in load balancing that chooses the same Layer 7 class in the load-balancing policy, the ACE sends the request to the real server used for the last GET request over the existing server-side connection. This behavior prevents the ACE from load balancing every request and recreating the server-side connection on every GET request, producing less overhead and better performance.

In the past, with persistence rebalance enabled ACE load balanced every request after establishing a connection (receiving the first GET). In other words, if a successive request chose the same L7 class in the policy as the previous request, the successive request was still load balanced.

Even though the current behavior of load-balancing the first request only results in better performance (since there is no overhead of tearing down the existing and establishing a new connection), some of our customers would prefer that ACE load-balanced every request.  ACE, therefore, should be able to switch between these two behaviors. 

The HTTP persistence rebalance feature is enabled or disabled by the "persistence-rebalance" or "no persistence-rebalance" commands in an HTTP parameter map. We can augment this command with the optional "strict" extension.
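The three behaviors described in this thread (no rebalance, persistence rebalance, and persistence rebalance with strict) can be compared with a small simulation. This is only an illustrative sketch, not ACE code: the farm names, server names, Host-based rules and the round-robin predictor are all hypothetical, and the `Balancer` class is invented for the example.

```python
from itertools import cycle

# Hypothetical server farms and Host-based L7 rules (not from any real ACE config).
FARMS = {
    "farm-abc": ["abc-1", "abc-2"],
    "farm-xyz": ["xyz-1", "xyz-2"],
}
RULES = {"abc.com": "farm-abc", "xyz.com": "farm-xyz"}


class Balancer:
    """Toy model of one load balancer with a round-robin predictor per farm."""

    def __init__(self, mode):
        # "off": decide on the first request only
        # "rebalance": re-evaluate every request, remap only if the L7 class changes
        # "strict": load balance every request, even within the same L7 class
        assert mode in ("off", "rebalance", "strict")
        self.mode = mode
        self._rr = {farm: cycle(servers) for farm, servers in FARMS.items()}

    def _pick(self, farm):
        return next(self._rr[farm])

    def handle_connection(self, hosts):
        """Return the (farm, server) used for each request on one persistent connection."""
        result = []
        current = None
        for host in hosts:
            farm = RULES[host]
            if current is None:
                current = (farm, self._pick(farm))   # first GET: always balanced
            elif self.mode == "off":
                pass                                  # flow is already mapped, keep it
            elif self.mode == "strict" or farm != current[0]:
                current = (farm, self._pick(farm))   # remap the back-end flow
            result.append(current)
        return result


requests = ["abc.com", "abc.com", "xyz.com"]
for mode in ("off", "rebalance", "strict"):
    print(mode, Balancer(mode).handle_connection(requests))
```

In this model only the "off" mode sends the xyz.com request to the abc farm. The "rebalance" and "strict" modes both reach the right farm and differ only in whether the second abc.com request stays on the same real server.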

Unknown. · ‎11-04-2010

Configuration changes (strict option) have been applied to some of our VIPs... Evaluating now to see if there is some improvement, but this can take some time...

Gilles Dufour · ‎11-04-2010

I'm afraid the strict option will have no impact here.

The command will force a rebalance even when it is not necessary.

What could be happening is an HTTP parse error, resulting in traffic falling back to the default class-map.

switch/Admin# show np 1 me-stats "-shttp -v" | i arse
Parse result LB msgs sent:                    59188             0
Parse result Inspect msgs sent:                   0             0
Static parse errors:                              0             0
Max parselen errors:                              0             0

Check the error counters above.

If one is not zero, you have parse errors (illegal HTTP cookies) causing the connection to be marked illegal and to fall back to the default class-map.

You could force those connections to be dropped instead of using the wrong class-map by configuring HTTP inspection.
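Since the errors are rare, it can help to capture the command output periodically and flag any error counter that is non-zero. A minimal sketch, assuming the output has been saved as text in the layout shown above; the function names are made up for this example, and the meaning of the two numeric columns is not stated in this thread, so both are kept as-is.

```python
import re

# Sample text copied from the output above.
SAMPLE = """\
Parse result LB msgs sent:                    59188             0
Parse result Inspect msgs sent:                   0             0
Static parse errors:                              0             0
Max parselen errors:                              0             0
"""

# "<counter name>: <number> <number>"
LINE_RE = re.compile(r"^\s*(?P<name>[^:]+):\s+(?P<a>\d+)\s+(?P<b>\d+)\s*$")


def parse_counters(text):
    """Map each counter name to its two numeric columns."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counters[m.group("name").strip()] = (int(m.group("a")), int(m.group("b")))
    return counters


def nonzero_error_counters(text):
    """Return only the counters whose name contains 'error' and that are not all zero."""
    return {
        name: values
        for name, values in parse_counters(text).items()
        if "error" in name.lower() and any(values)
    }


print(nonzero_error_counters(SAMPLE))
```

Comparing two captures taken some time apart shows whether a counter is still increasing, which is more telling than a single non-zero value.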

Gilles.

Unknown. · ‎11-04-2010

Gilles,

I was expecting that the strict option would not help, but we are at the point where we will try anything to solve this issue :-)

I just checked the counters you mentioned and indeed I see a slight increase in the Static parse errors counter.

So if I understand correctly:

The Cisco ACE blade parses the HTTP header to find some cookie information, but sees an illegal character and falls back to the default class-map.
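To help narrow down which cookies might be involved, logged Cookie headers can be screened for unusual characters. Which characters the ACE parser actually rejects is not stated in this thread, so the sketch below uses the RFC 6265 cookie grammar as a stand-in assumption; the function name is hypothetical.

```python
import re

# Assumption: RFC 6265 grammar is used as a proxy for "legal" cookies.
# Cookie names are HTTP tokens; values are cookie-octets, optionally double-quoted.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
VALUE_RE = re.compile(r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*")


def suspicious_cookie_pairs(cookie_header):
    """Return the name=value pairs in a Cookie header value that break the grammar."""
    bad = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        # A value may be wrapped in double quotes; check what is inside them.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if not sep or not TOKEN_RE.fullmatch(name) or not VALUE_RE.fullmatch(value):
            bad.append(pair)
    return bad


print(suspicious_cookie_pairs("session=abc123; note=hello world; bad,name=1"))
```

A pair flagged here is only a candidate to look at, since the balancer's parser may be stricter or more lenient than this grammar.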

I am taking this info to our web development team to see what they say. Will update you all soon!

Regards,

Ben