Introduction
Cisco ACE modules support a virtualized architecture to increase data center scalability. You can create up to 250 virtual contexts on an ACE module. Each context behaves like an independent ACE appliance with its own policies, interfaces, domains, server farms, real servers, and administrators. You can divide each context into multiple partitions called domains, which let you manage user access to the objects within a context. ACE modules are usually deployed in a failover configuration to increase reliability.
Problem
Failover does not work properly. Both ACE modules in a failover pair may end up in the active state for the same context. Layer 2 connectivity, such as ARP resolution, works, but Layer 3 connectivity is affected.
Stateful Failover
Each peer appliance in a redundant group can contain one or more fault-tolerant (FT) groups. Each FT group consists of two members: one active context and one standby context. When a switchover occurs, the active member in the FT group becomes the standby member and the original standby member becomes the active member. The ACE uses the heartbeat to probe the peer ACE rather than probing each context. The ACE replicates flows from the active FT group member to the standby member on a per-connection basis for each context. Note that the ACE does not replicate SSL and other terminated (proxied) connections from the active context to the standby context.
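For orientation, the following is a minimal sketch of how an FT group for the Admin context might be defined on one peer. The VLAN number, IP addresses, heartbeat values, group and peer IDs, and priorities are illustrative assumptions and do not come from the affected system; the other peer would carry the mirror-image configuration.
ft interface vlan 100
  ip address 192.168.1.1 255.255.255.0
  peer ip address 192.168.1.2 255.255.255.0
  no shutdown
ft peer 1
  heartbeat interval 300
  heartbeat count 10
  ft-interface vlan 100
ft group 1
  peer 1
  priority 110
  peer priority 100
  associate-context Admin
  inservice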
Explanation
A surge in normal user traffic may cause the resource manager to drop Admin context traffic if no resources are reserved for the Admin context. When Admin traffic is dropped, the secondary ACE assumes that the primary has failed and becomes active, even though the primary is still active and has not failed. The drops appear in the Denied column of the following output:
host1/Admin# show resource usage all
Context: Admin
Resource              Current       Peak        Min        Max     Denied
-------------------------------------------------------------------------------
bandwidth                 156   28333162          0  125000000   46288360
throughput                  0   27365654          0          0   46288360
mgmt-traffic rate         156     967508          0  125000000
Solution
Check the resources allotted to the Admin context. The problem occurs when few or no resources are allotted to the Admin context, which causes failures under heavy load. When the members of the resource group reserve all of the resources on the ACE, the Admin context, which is not configured in a resource group, is left without resources. Allocating resources to the Admin context resolves the issue.
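A minimal sketch of such an allocation follows. The resource class name ADMIN-RC is hypothetical and the 10 percent reservation is only an illustrative value; size it for your own deployment.
host1/Admin# config
host1/Admin(config)# resource-class ADMIN-RC
host1/Admin(config-resource)# limit-resource all minimum 10.00 maximum equal-to-min
host1/Admin(config-resource)# exit
host1/Admin(config)# context Admin
host1/Admin(config-context)# member ADMIN-RC
The minimum reservations of all resource classes together cannot exceed 100 percent, so if the existing classes already reserve everything, their minimums probably need to be lowered before the Admin reservation is accepted. After the change, run show resource usage all again and confirm that the Denied counters for the Admin context stop increasing under load.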
Related Documents
Pointers for Locating ACE Module and ACE Appliance User Documentation
ACE behavior with static sticky and rserver down situation