Sandeep Singh
Level 7

 

 

Introduction

This document describes how to troubleshoot high availability (HA) configuration issues on the ACE. High availability (also called fault tolerance, or FT) uses a maximum of two ACE appliances to keep your network operational even if one of the appliances becomes unresponsive, so that network services and applications remain available. Each FT group consists of two members: one active context and one standby context. One virtual MAC address (VMAC) is associated with each FT group, and each FT group acts as an independent redundancy instance. When a switchover occurs, the active member in the FT group becomes the standby member and the original standby member becomes the active member. The ACE sends and receives all redundancy-related traffic (protocol packets, configuration data, heartbeats, and state replication packets) on a dedicated FT VLAN.

 

Issues

 

Show Interface command shows FT Interface is Down

 

Verify that the FT interface is not shut down

If it is shut down, issue the “no shutdown” command to enable the interface

 

Verify that the FT VLAN is configured on the MSFC

Issue the “show vlan” command to list the configured VLANs

 

Verify that the FT VLAN is assigned to the module

Issue “show svclc group” command

 

Verify that the FT VLAN is trunked across to the other Catalyst chassis

Issue “show interface trunk” command

 

Show FT Peer command shows PEER_DOWN

 

Check that the local and peer IP addresses are configured correctly on both modules

 

Verify that a ping or Telnet to the peer IP address works

 

If the ping fails, check whether the interface is up

 

Verify that the FT interface is up and that the FT VLAN is assigned to the module

 

Enter the “show conn” command on both sides to check whether the HA connections have been set up. If the connections have not been set up, check the HA DP manager log. The setup could have failed because the IXP was hung and did not respond.

 

Enter the “show ft stats” command on both modules to see whether heartbeats are being sent and received. If the missed-heartbeats counter is incrementing, the heartbeats could be getting dropped in the fastpath.

 

Check the Fastpath counters.

Enter command "show np <1 or 2> me-stats -sfp" to check the counters.

 

Show FT Peer command shows TL_RETRY

 

This state means that the HA “CP to CP” Telnet connection is not being established, even though the heartbeats are flowing normally.

 

Verify that heartbeats are flowing by checking “show ft stats”.

 

Verify that a Telnet or ping to the FT peer IP address works. If the ping fails, the Telnet will most likely fail as well.

 

Check the IXP statistics for the fastpath and the ICM. The Telnet request is most likely being dropped there.

Enter command "show np <1 or 2> me-stats -sfp"

Check whether the following counters are incrementing:

Packets forward to CM  :  2

DROP: RX Interface miss:  2

 

Enter command "show np <1 or 2> me-stats -sicm"

Check whether the following counters are incrementing:

Drop [ACL deny]            :   0

Drop [IF FT Standby]       :   0

Drop [Encap Miss Msg stat] :   0               

 

These counters indicate that the Telnet requests are being punted to the ICM. The ICM then drops them because of an ACL deny, an encap miss, or an interface state showing that the interface is in standby mode.
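Deciding whether a counter is "incrementing" means capturing the same output twice and comparing the values. The following Python sketch shows one way to do that offline on two saved captures. It assumes every counter appears as a "name : value" line, as in the samples above; the function names `parse_counters` and `incrementing` are hypothetical helpers and are not part of any ACE tooling.

```python
import re

# Matches lines such as "Drop [ACL deny]            :   0".
# The "name : value" layout is assumed from the sample lines in this
# document; lines with any other shape are skipped.
COUNTER_LINE = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return {counter name: integer value} for every 'name : value' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def incrementing(before, after):
    """Return {name: delta} for counters that grew between two captures."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and new[name] > old[name]
    }
```

Paste the output of the same command taken a few seconds apart into `before` and `after`; any drop counter that appears in the result is worth investigating.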

 

Show FT Peer command shows FT_VLAN_DOWN

 

This normally occurs when the FT VLAN goes down while the configured query interface is up. Heartbeats fail immediately, so the ACE starts a continuous ICMP ping on the query interface. If the ping succeeds, the peer state is declared FT_VLAN_DOWN rather than PEER_DOWN.
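The decision described above can be summarized in a few lines of Python. This is only an illustration of the logic as stated in this document, not the ACE implementation; the function name and its parameters are hypothetical.

```python
def classify_heartbeat_loss(heartbeats_ok, query_ping_ok):
    """Return the peer state suggested by the description above, or None.

    heartbeats_ok -- True while heartbeats still arrive on the FT VLAN
    query_ping_ok -- True if the ICMP ping on the query interface succeeds
                     (passing False when no query interface is configured
                     is an assumption of this sketch)
    """
    if heartbeats_ok:
        return None  # no heartbeat loss, nothing to classify
    if query_ping_ok:
        # The peer is alive but unreachable over the FT VLAN.
        return "FT_VLAN_DOWN"
    return "PEER_DOWN"
```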

 

To resolve this, restore connectivity on the FT VLAN. Ping or Telnet to the peer IP address on the FT VLAN to verify.

 

HA stuck in STANDBY_CONFIG

 

This state is seen on the standby module while it is receiving the configuration from the active module. Whether the configuration is being rolled back on the standby or synced from the active, this can take longer than 30 minutes. If the standby eventually moves to STANDBY_COLD, refer to the next symptom.

 

Show FT group command shows STANDBY_COLD

 

If the standby context is in the STANDBY_COLD state, one of the following could be the cause:

a) The two ACE modules do not have the same SSL keys and certificates

b) The two ACE modules do not have the same script files

c) The two ACE modules have different licenses

d) The configuration sync failed

 

A configuration sync failure is indicated when the peer state shows “Compatible” and the FT group shows “STANDBY_COLD”. To find the reason for the failure:

 

Enter the “show ft history cfg_cntlr” command to see where the failure occurred. Failures typically produce messages such as the following:

 

“error: could not rollback configuration file /tmp/Admin-cfgcntlr-rollback-cfg  log file name Admin-cfgcntlr-rollback-cfg-863-1.log context Admin.”

“error: could not apply peer running configuration (file /tmp/005_Admin_0_cfgcntlr-peerbulk-cfg ) for context Admin”

 

e) Connectivity was lost on the FT VLAN. The standby moved to the STANDBY_COLD state because the active is still reachable on the query interface

 

To quickly verify whether STANDBY_COLD is due to the FT VLAN going down, check the peer state. It should show “FT_VLAN_DOWN”. “show ft stats” will also show that heartbeats are being missed. Restore connectivity on the FT VLAN to recover.
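The checks for causes a) through e) can be ordered into a short checklist. The Python sketch below does this from the two state strings discussed in this section. The function name is hypothetical, and the state strings are matched as substrings because the exact text printed by the device may differ from the short forms used here; treat that matching as an assumption.

```python
def standby_cold_hints(peer_state, group_state):
    """Return an ordered list of things to check for a STANDBY_COLD group."""
    group = group_state.strip().upper()
    peer = peer_state.strip().upper()

    if "STANDBY_COLD" not in group:
        return []  # this checklist only applies to STANDBY_COLD

    if "FT_VLAN_DOWN" in peer:
        # Cause e): the FT VLAN lost connectivity.
        return ["FT VLAN connectivity lost: restore the FT VLAN"]

    hints = []
    # "INCOMPATIBLE" contains "COMPATIBLE", so exclude it explicitly.
    if "COMPATIBLE" in peer and "INCOMPATIBLE" not in peer:
        # Cause d): look at the config sync history first.
        hints.append("Config sync failed: check show ft history cfg_cntlr")
    # Causes a) to c): compare the two modules.
    hints.append("Compare SSL keys and certificates on both modules")
    hints.append("Compare script files on both modules")
    hints.append("Compare licenses on both modules")
    return hints
```

For example, `standby_cold_hints("FT_VLAN_DOWN", "STANDBY_COLD")` returns only the FT VLAN hint, while a compatible peer yields the config sync check followed by the three comparisons.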

 

Connections Table Not Replicated to Standby

 

Possible reasons for this behavior:

 

a) The peer is not up. To verify, enter “show ft peer status”

b) The FT group is not up. To verify, enter “show ft group status”

c) The config sync did not complete. Check the HA status

d) The encap(s) were unknown to the standby. A connection cannot be created without known encaps, so the first sync may cause an encap lookup

 

Check the following counters in the output of “show ft stats <ft group id>”: replicate connection sent stat and replicate connection recv stat. Both should be incrementing. They may not be equal, because these messages are sent over UDP, which is an unreliable transport.

 

Also check whether the connection uses any feature that disqualifies it from connection replication. For example, connections subject to HTTP inspection are not eligible.

 

Syslog message shows "Peer is incompatible due to error str. Cannot be Redundant"

Make sure that the software version and license details are identical on both ACE devices in the pair.

 

Related Information

Configuring High Availability on Cisco ACE

ACE module Failover pair in active/active situation
