ACE30 is running but not allowing management access

KURT HILLIG
Level 1

We've got pairs of ACE30s in our data centers set up with active/standby FT. Some time yesterday the active ACE in one data center started refusing management traffic - it accepts SSH connections but fails authentication (local password; no RADIUS/TACACS is configured), and ANM reports it as down (no XML connectivity):

Desktop > ssh -a admin@ace-macc-1
Password: ********
Password: ********
Password: ********
admin@ace-macc-1's password: ********
Received disconnect from 192.168.255.100: 2: Too many authentication failures for admin

r-MACC-A#show module 8
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  8    1  Application Control Engine Module      ACE30-MOD-K9       SAL1549XG39

Mod MAC addresses                       Hw    Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  8  e05f.b9a1.fb4c to e05f.b9a1.fb53   1.0   ace2t_main_d A5(1.2)      Ok

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
8/0 ACE Expansion Card  1       ACEMOD-EXPN-DC     SAL1549XAA9  1.1    Ok
8/1 ACE Expansion Card  2       ACEMOD-EXPN-DC     SAL1549XA9G  1.1    Ok

Mod  Online Diag Status
---- -------------------
  8  Pass
8/0 Pass
8/1 Pass

r-MACC-A#session slot 8 processor 0
The default escape character is Ctrl-^, then x.
You can also type 'exit' at the remote prompt to end the session
Trying 127.0.0.80 ... Open

ACE-MACC-1 login: admin
Password: ********
Login incorrect

ACE-MACC-1 login:
Login timed out after 60 seconds.

[Connection to 127.0.0.80 closed by foreign host]

However, it's still load-balancing traffic properly, and log messages (mostly health probe failures) are still showing up in the Sup720 syslog. The standby ACE also seems to be perfectly happy:

ACE-MACC-2/Admin# show ft gr br

FT Group ID: 1  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: Admin     Context Id: 0
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 2  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: UM-AAA    Context Id: 5
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 3  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: AIGWEB    Context Id: 1
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 4  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: UMCE-MAILSVCS     Context Id: 7
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 5  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: UMCE-DNSTEST      Context Id: 6
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 6  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: IAM-NONPROD       Context Id: 2
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 7  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: TL-PROD-MACC      Context Id: 4
                Running Cfg Sync Status:Running configuration sync has completed

FT Group ID: 8  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: TL-NONPROD-MACC   Context Id: 3
                Running Cfg Sync Status:Running configuration sync has completed
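
With eight FT groups it is easy to miss one that is out of line, so here is a rough sketch of how the pasted output above could be checked in bulk. It assumes the text has been saved from the CLI by hand and that every group follows the same three-line layout shown here; the function names `parse_ft_brief` and `unexpected_groups` are made up for this illustration and are not part of any Cisco tool.

```python
import re

# Patterns written against the layout pasted in this thread (an assumption;
# other software releases may print the fields differently).
GROUP_RE = re.compile(
    r"FT Group ID:\s*(\d+)\s+My State:\s*(\S+)\s+Peer State:\s*(\S+)"
)
CONTEXT_RE = re.compile(r"Context Name:\s*(\S+)\s+Context Id:\s*(\d+)")


def parse_ft_brief(text):
    """Return one dict per FT group found in pasted 'show ft group brief' text."""
    groups = []
    for line in text.splitlines():
        match = GROUP_RE.search(line)
        if match:
            groups.append({
                "group": int(match.group(1)),
                "my_state": match.group(2),
                "peer_state": match.group(3),
                "context": None,
            })
            continue
        match = CONTEXT_RE.search(line)
        if match and groups:
            # The context line belongs to the most recent group header.
            groups[-1]["context"] = match.group(1)
    return groups


def unexpected_groups(groups,
                      expected=("FSM_FT_STATE_STANDBY_HOT", "FSM_FT_STATE_ACTIVE")):
    """Return groups whose (my_state, peer_state) pair differs from `expected`."""
    return [g for g in groups if (g["my_state"], g["peer_state"]) != tuple(expected)]


# Two groups copied from the output above, as seen from the standby ACE.
sample = """\
FT Group ID: 1  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: Admin     Context Id: 0
                Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 2  My State:FSM_FT_STATE_STANDBY_HOT       Peer State:FSM_FT_STATE_ACTIVE
                Context Name: UM-AAA    Context Id: 5
                Running Cfg Sync Status:Running configuration sync has completed
"""

for group in unexpected_groups(parse_ft_brief(sample)):
    print("check group", group["group"], group["context"])
```

On the standby side every group should show the STANDBY_HOT/ACTIVE pair, so nothing is printed for the sample; any group that prints would be worth a closer look.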

We haven't opened a TAC case yet - someone's on his way over to see whether we can get in through the serial port first - but I'm wondering whether there are any other diagnostics we can gather (will resetting the module from the Sup force a coredump?) before we do.

2 Replies

Kanwaljeet Singh
Cisco Employee

Hi Kurt,

It looks like the ACE control plane has hung, which is why you are not getting any management access. Traffic is still being load-balanced because there is no problem with the data plane.

Reloading the ACE should fix this. It can be due to low memory conditions. Please check the logs from before the issue happened; you may have had some low-memory warnings.
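
If the syslog has been exported to a text file, a simple keyword filter can narrow down where to look. This is only a sketch: the keywords below are guesses, not actual ACE message text, and the sample lines are placeholders invented for the example, so adjust both to whatever your logs really contain. `find_memory_hints` is a hypothetical helper name.

```python
def find_memory_hints(lines, keywords=("memory", "malloc")):
    """Return (line_number, line) pairs containing any keyword, ignoring case.

    The default keywords are assumptions, not real ACE syslog strings.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder lines, NOT real device output.
sample_log = [
    "placeholder: health probe failed for some server\n",
    "placeholder: low Memory condition reported\n",
    "placeholder: health probe failed for another server\n",
]

for number, line in find_memory_hints(sample_log):
    print(number, line)
```

For a real export, pass the open file object instead of `sample_log`; the function only needs an iterable of lines.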

Regards,

Kanwal

Jorge Bejarano
Level 4

Hello Kurt,

Before reloading, try to run #show scp stats and collect a #show tech-support. If you reload without collecting any output, it will be hard for Cisco TAC to determine anything. Please gather those outputs both before and after the reload; you can also run #show tech-support from the switch before and after.
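
Once the before and after captures are saved as text, comparing them by eye is tedious. A minimal sketch using Python's standard `difflib` module is below; it assumes you saved the captures yourself, and the helper name `diff_captures`, the file labels, and the sample text are placeholders for illustration, not real device output.

```python
import difflib


def diff_captures(before_text, after_text,
                  before_name="before-reload.txt",
                  after_name="after-reload.txt"):
    """Return unified-diff lines between two saved CLI captures."""
    return list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",
    ))


# Placeholder captures standing in for saved command output.
before = "placeholder header\nplaceholder counter 1\n"
after = "placeholder header\nplaceholder counter 2\n"

for line in diff_captures(before, after):
    print(line)
```

Counters and timestamps will naturally differ between captures, so expect some noise; the diff is just a way to spot which sections changed.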

Jorge
