10-30-2013 11:38 PM
Hi!
We have a problem with two ACE20 modules installed in Cisco 6509 switches and running in fault-tolerant mode.
Both nodes keep working and load balancing, but it is not possible to configure them, show the config, change context, and so on.
It looks like some process in the Admin context stops working, or something like that.
Below is the login process when I try to connect to the ACE:
Escape character is '^]'.
Process did not respond within the expected timeframe, using defaults
Password:
Login incorrect
ACE1 login: admin
Password:
Unable to retrieve context name, using default
Unable to obtain config mode lock info
Disabling config mode
NOTE: Configuration mode has been disabled on all sessions
Unable to get role information, using default
Cisco Application Control Software (ACSW)
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software are covered under the GNU Public
License. A copy of the license is available at
http://www.gnu.org/licenses/gpl.html.
ACE1/Admin# sh run
Generating configuration....
Acfg: Device or resource busy
ACE1/Admin# changeto VC_SERVERS
Error: Called API timed out
ACE1/Admin#
So I have to reload the module by resetting the appropriate slot in the 6509. Then it works for another couple of weeks.
What could it be?
Thanks
10-31-2013 06:14 AM
Hi Anatoly,
Looking at the output, it seems the device is running out of resources. Please run "show resource usage" and check whether the denied counter is increasing. Also check the current and peak connections to see whether they have hit the maximum limit.
You might have to increase the resources reserved for managing the device, using "limit-resource mgmt-connections". You have to make that change in the "Admin" context, and since you can access the Admin context, you can give it a shot.
Let me know if that fixes it.
Regards,
Kanwal
10-31-2013 08:18 AM
Hi Kanwal,
Thanks for your reply, but it doesn't work. I can't even see resource usage:
ACE1/Admin# sh resource usage
Error: Transport error
Maybe after I reset the ACE I'll see something, but as I mentioned in my previous message, the ACE works for a relatively long time after a hardware reset. As for increasing resources for mgmt: there should actually be no mgmt connections at all, since we log in only when we have to reconfigure load balancing.
10-31-2013 08:41 AM
Hi Anatoly,
A quick internal search for "Transport error" shows that this error usually appears while configuring FT, or when some process related to it has shut down. In your case, however, you cannot even execute basic show commands. It seems we have no choice but to reload to get out of this situation, but I would also recommend opening a TAC case to get this investigated if you keep hitting the issue. There may be a known bug in the version you are running. I see some bugs here, but none matches your symptoms exactly.
Also, regarding resource utilization: if you haven't allocated dedicated resources to management, you can manage the device only while a resource is free. If it is not free because other traffic is using it, you will not get access. That's why it is always a good idea to have some dedicated resources allocated for management. We have seen this issue before: even a single connection to the ACE is denied or fails if the ACE has no free resources.
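As a rough sketch of what a dedicated management allocation could look like, entered from configuration mode in the Admin context: the resource-class name MGMT_RC and the 5.00 percent figure are placeholders I made up for illustration, not values taken from your setup, so please size them for your environment and check the syntax against the configuration guide for your software version.

```
! Hypothetical resource class name and percentage; adjust both for your setup.
resource-class MGMT_RC
  ! Guarantee a share of management connections to members of this class.
  limit-resource mgmt-connections minimum 5.00 maximum equal-to-min

! Make the Admin context a member so the guarantee applies to it.
context Admin
  member MGMT_RC
```

A context can be a member of only one resource class, and any minimum you guarantee here is taken away from what the other contexts can use. Once the device responds normally again, "show resource usage" should show whether the denied counter for mgmt-connections stops increasing.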
Regards,
Kanwal
10-31-2013 08:57 AM
Hi Kanwal,
Thanks a lot, it's good to know we should allocate dedicated resources for management! Maybe it's the root of the problem. Maybe not. We'll do it after resetting the modules anyway.
And on the second ACE (active for all contexts; the previous one was standby) we get a different error:
ACE2/Admin# sh resource usage
Error: Called API timed out
Not sure it's the same problem. The software version is A2(3.1) [build 3.0(0)A2(3.1)].
10-31-2013 09:15 AM
Hi Anatoly,
This is what I found:
When processing large configurations, subsequent commands may encounter an "API timeout" error, until the processing ends. The way to know when the processing ends is to perform a "show processor cpu" and verify that the load has reduced to low single-digits, or 0. The queue stall messages are harmless, unless they do not stop appearing.
Unfortunately, I don't see any workaround for this problem. The suggestion is to let the processing complete, after which the error should go away. If it doesn't, I would suggest opening a TAC case.
You might also consider upgrading your device to the latest code, as the version you are running is pretty old.
Let me know if you have any questions.
Regards,
Kanwal