
How many concurrent connections can an ACS 4.2 server (latest patch) handle?

cciesec2011
Participant

I have about 50 routers and layer-3 switches that authenticate via TACACS+. The AAA server used to be a Linux machine running open-source tacacs+ that I built myself. I have a multi-threaded Perl script that logs into all 50 devices at the same time to collect statistics. Until recently everything worked fine.

I recently outsourced the AAA function to a 3rd-party company, not by my choice. The 3rd party uses Cisco ACS version 4.2 with the latest patch, running on Windows 2003 Enterprise Server with 16GB RAM and four quad-core processors on IBM x3650-M2 hardware. The connectivity between the 3rd party and my company is a DS-3 connection. Maximum bandwidth over this DS-3 connection is less than 10 Mbps.

For the past 3 months I have had multiple failures with this Perl script due to authentication failures against the ACS server. If I run the script against just a few routers/switches, there are no issues; however, whenever I start the script to log into all 50 devices at the same time, it fails. If I configure all routers/switches to point back to the old open-source tacacs+ server, the issue goes away. The minute I switch back to the new ACS server, the issue comes back. If I modify the script to hit one device at a time, it works fine. I think the ACS server cannot handle a lot of AAA requests at the same time.


Does anyone know how many concurrent connections ACS 4.2, with the latest patches on Windows 2003 Enterprise Server with plenty of memory and CPU power, can handle? I can't seem to find this anywhere on the Cisco website.

Thanks in advance.

10 REPLIES

dhananjoy chowdhury
Contributor

Hi,

Is there any kind of host IPS or Cisco Security Agent installed on the Cisco ACS server running on Windows?

If so, check whether there are any alerts for connections being blocked.

You may get some idea of concurrent connections from this link.

http://www.cisco.com/en/US/products/sw/secursw/ps2086/products_white_paper09186a00801495a1.shtml

There is a known issue with authentications per second in v4.1.4, but I am not aware of anything in v4.2.

Bug ID: CSCsd46457.

No IPS or CSA agent is installed on the box; only Norton AV is installed.

darpotter
Contributor

I think the ACS TACACS+ server by default has a limited number of connections... my memory is failing, but I think it might be about 40. So if you are doing 50 concurrently, that might be an issue. With the old ACS (pre 4.0) there were numerous registry tweaks we could use to increase the max concurrency, but I'm not sure what Cisco do now, as it's all in SQL Anywhere and/or locked within the appliance.
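If a limit of roughly 40 does apply, one script-side workaround is to cap how many logins run at once rather than dropping to one device at a time. Below is a minimal Python sketch of that idea, not the original Perl script: `collect_stats`, the device names, and the cap of 30 are all hypothetical placeholders, and the real limit on ACS 4.2 is unconfirmed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: stay comfortably below the ~40 connections recalled above.
MAX_CONCURRENT = 30


def collect_stats(device):
    """Hypothetical stand-in for the real login-and-collect step."""
    time.sleep(0.01)  # simulate the time spent in the device session
    return f"stats from {device}"


def run_all(devices, max_concurrent=MAX_CONCURRENT, worker=collect_stats):
    """Run worker on every device with at most max_concurrent in flight."""
    # The pool never starts more than max_workers threads, so no more than
    # max_concurrent device logins (and AAA requests) are active at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return dict(zip(devices, pool.map(worker, devices)))


if __name__ == "__main__":
    devices = [f"router{i:02d}" for i in range(1, 51)]  # made-up names
    results = run_all(devices)
    print(len(results), "devices polled")
```

Lowering `max_concurrent` step by step until the failures stop would also give a rough measurement of where the server starts rejecting sessions.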

Out of interest, what's the timeout/retry config on the devices? Maybe it needs to be a little looser. It could just be that the devices are timing out too quickly. You should allow at least 10 seconds.

Retry is 20 seconds.

If the number of connections is limited to 40, how would ACS scale in a service provider environment where there are hundreds of customers and thousands of routers and switches that require AAA authentication? Are you saying that ACS 4.2 performs worse than the open-source tacacs+ that Cisco released years ago?

Does Cisco have a fix for this? Thanks in advance.

No, I'm not saying ACS cannot cope.

Concurrency and latency are very different things. ACS CSTacacs can handle many 100s of simple authentications/authorisations per second with users in the internal database. If 1000s of devices all send traffic in the same instant it would take some seconds to work through the backlog of traffic.
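To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The rate of 300 authentications per second is an assumed figure picked only to match "many 100s"; it is not a measured or documented ACS value.

```python
def backlog_seconds(requests, rate_per_second):
    """Seconds needed to drain a burst of requests at a steady service rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return requests / rate_per_second


ASSUMED_RATE = 300  # assumption: authentications per second, illustrative only

# A burst from 50 devices, as in the original post.
print(f"50 requests:   {backlog_seconds(50, ASSUMED_RATE):.2f} s")

# A burst from 3000 devices arriving in the same instant.
print(f"3000 requests: {backlog_seconds(3000, ASSUMED_RATE):.2f} s")
```

Under that assumption, a burst of 50 requests clears in well under a second, far inside a 10 to 20 second device timeout, while a burst of thousands takes several seconds.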

It is also worth considering that a limited number of tasks (or threads) within ACS can actually handle a much greater number of "logins", because logins are generally multi-message exchanges, which lets ACS keep lots of plates spinning.

If users are in an external database, the latency per authentication can increase depending on where the users are (e.g. Windows AD), and if bad enough it can have a serious effect on the overall authentication rate. At that point customers normally turn to load balancing.

If your device timeouts are 20 seconds (totally reasonable) I suggest the issue is more likely to be something else... a bug, perhaps specific to v4.2?