We recently purchased 2 new servers to replace our older ACS servers. The new servers are running Windows 2003 SP1 and we purchased ACS 3.3.2 for these 2 servers.
Primarily they authenticate dial-in clients through AS5300s. The servers are set up for ACS replication and appear to be replicating properly. However, the primary ACS server has recently begun having issues, similar to ones I have seen other people report here without solutions. I'm hoping that by bringing this up again, somebody will now have a solution.
First of all, when the servers first boot up, not all of the services start correctly. Quite often the CSAuth service will continually start and stop and fail to stay started, and the CSTacacs and CSRadius services do the same. This problem has also occurred at random on the server before.
I have also received emails from the Monitoring service:
"Service CSAuth has been stopped or paused by the system. Monitoring will suspend until the service is restarted"
These are not the usual emails I get when DB replication occurs; those indicate it has been suspended for a configuration function to proceed.
Our staff have also noted that quite often the CSAdmin HTTP interface "hangs" or is extremely slow. I have noticed that during these times the services are restarting on the server for no apparent reason. This often happens when our staff go to look up or save user changes.
I've turned on full ACS logging to see if I can find anything useful. I found this in the TCS.log
"TCS 08/26/2005 15:59:04 I 0687 4428 Single Connect thread 27 waiting for work
TCS 08/26/2005 15:59:04 I 0344 4100 NC-NAS1: fd 1224 eof (connection closed)
TCS 08/26/2005 15:59:04 I 1433 4100 Thread 7 waiting for work
TCS 08/26/2005 15:59:04 I 1442 4100 Thread 7 allocated work
TCS 08/26/2005 15:59:04 I 0696 4428 Single Connect thread 27 allocated work
TCS 08/26/2005 15:59:04 I 0758 4428 thread 7 sock: 4c8 session_id 0x3d62fe56 seq no 1 AUTHEN:START login ascii login tty85 XXXXXXXXXX/YYYYYYYYYY
TCS 08/26/2005 15:59:04 A 0197 2508 API: Tcp_Connect: Failed to connect to 127.0.0.1, sock error 10061
TCS 08/26/2005 15:59:04 A 0197 2508 API: Transport connect failed
TCS 08/26/2005 15:59:04 A 0197 2500 API: Tcp_Connect: Failed to connect to 127.0.0.1, sock error 10061
TCS 08/26/2005 15:59:04 A 0197 2500 API: Transport connect failed
TCS 08/26/2005 15:59:04 I 0687 2500 Single Connect thread 30 waiting for work
TCS 08/26/2005 15:59:04 A 0197 5808 API: Tcp_Connect: Failed to connect to 127.0.0.1, sock error 10061
TCS 08/26/2005 15:59:04 A 0197 0232 API: Tcp_Connect: Failed to connect to 127.0.0.1, sock error 10061
I also start getting messages like this in the RDS.log
RDS 08/26/2005 15:56:51 P 2253 6052 User:boxing - CSAuth client cannot get a connection to server - no free connections
RDS 08/26/2005 15:56:51 P 2253 4076 User:carsa - CSAuth client cannot get a connection to server - no free connections
RDS 08/26/2005 15:57:02 E 2285 6052 Error -16 authenticating boxing - no NAS response sent
Unfortunately I don't have a piece of the CSAuth log, but it shows similar errors. It shows the regular loading messages, then repeats TCP connect errors to localhost over and over, I believe.
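For anyone wanting to see how often these errors occur, here is a rough Python sketch that counts the two error messages quoted above per service and per minute. The line layout in the regex is only inferred from the excerpts in this post, and the names LINE_RE, PATTERNS and summarize are made up for illustration. Winsock error 10061 is "connection refused", which would fit with nothing listening on the local port while CSAuth is down, though that is an inference rather than something the logs state.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpts in this thread:
# <service> <MM/DD/YYYY> <HH:MM:SS> <level> <4 digits> <4 digits> <message>
# The meaning of the two 4-digit fields is not known here; they are just skipped.
LINE_RE = re.compile(
    r'^"?(?P<svc>[A-Z]+) (?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]) \d{4} \d{4} (?P<msg>.*)$'
)

# Substrings taken verbatim from the TCS.log and RDS.log excerpts.
PATTERNS = {
    "connect_refused": "sock error 10061",
    "no_free_connections": "no free connections",
}


def summarize(lines):
    """Count matching errors keyed by (service, date, HH:MM, error name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that do not fit the assumed layout
        for name, needle in PATTERNS.items():
            if needle in m.group("msg"):
                minute = m.group("time")[:5]
                counts[(m.group("svc"), m.group("date"), minute, name)] += 1
    return counts
```

Feeding it the lines of TCS.log and RDS.log (for example via open(path).readlines()) should show whether the bursts of refused connections line up with the times CSAuth restarts.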
The only workaround I have found to get Cisco out of the loop is to manually stop all of the CS services, then start CSAdmin and CSAuth and wait about 2 minutes, watching the CSAuth log file until it appears to have finished loading. I then start the other services and everything works as expected, until the problem begins occurring again, which seems to happen at random.
If anybody has any possible fixes for this problem, I'd greatly appreciate any help.
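The manual sequence above could be scripted roughly as follows. This is only a sketch under assumptions: it uses the four service names mentioned in this thread as if they were the Windows service names, it replaces "watching the log file" with a fixed wait, and build_plan and run_plan are hypothetical helper names. Any other ACS services on the box would need adding to the lists.

```python
import subprocess
import time

# Service names as they appear in this thread (assumed to match the
# Windows service names on the server).
CORE = ["CSAdmin", "CSAuth"]
DEPENDENT = ["CSTacacs", "CSRadius"]


def build_plan(settle_seconds=120):
    """Return the ordered steps: stop everything, start core, wait, start the rest."""
    plan = [("stop", s) for s in DEPENDENT + CORE]
    plan += [("start", s) for s in CORE]
    # Fixed pause standing in for watching the CSAuth log finish loading.
    plan.append(("wait", settle_seconds))
    plan += [("start", s) for s in DEPENDENT]
    return plan


def run_plan(plan, runner=subprocess.run, sleep=time.sleep):
    """Execute the plan with the Windows 'net stop' / 'net start' commands."""
    for action, arg in plan:
        if action == "wait":
            sleep(arg)
        else:
            # check=False: stopping an already-stopped service returns non-zero.
            runner(["net", action, arg], check=False)
```

Calling run_plan(build_plan()) from an administrator prompt on the ACS server would run the sequence; the 120-second default simply mirrors the "about 2 minutes" above and may need adjusting.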
Check your log files, for example authorization failure logs, successful logins, failed attempts, etc. These logs tend to grow over time unless ACS is configured to clear them at regular intervals. There are configuration options to set a maximum size for these logs.
We have our logs set to roll over daily. But I don't believe a log file growing too large is causing ACS to restart itself anyway.
Anyhow, the problem is still continuing, and I managed to get a piece of the csauth.log file that shows the service restarting. There does not appear to be any indication in the log file as to why it restarted, though.
The same problem started occurring on one of my machines the other day.
After a reboot the CSTacacs and CSRadius services don't start. The Win2003 Service Control Manager reports:
Timeout (30000 milliseconds) waiting for the CSRadius service to connect.
The CSRadius service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
The services are set to start automatically, and there's no problem starting them manually after each reboot.
The patches that caused the first reboot have been backed out of, so that hopefully isn't the problem...
Any ideas are greatly appreciated!
After some investigation I found bug CSCsb81671, which describes exactly this problem.
Since this bug is still open, I opened a TAC case and received a positive answer.
The engineer provided a workaround with a replacement DLL (ccmp.dll in c:\windows\system32).
To get the file, do the same: open a TAC case and mention the bug ID.