09-10-2012 10:05 AM
We've got pairs of ACE30s in our data centers set up with active/standby FT. Some time yesterday the active ACE in one data center started refusing management traffic - it accepts SSH connections but fails authentication (local password, no RADIUS/TACACS is configured); and ANM reports it as down (no XML connectivity):
Desktop > ssh -a admin@ace-macc-1
Password: ********
Password: ********
Password: ********
admin@ace-macc-1's password: ********
Received disconnect from 192.168.255.100: 2: Too many authentication failures for admin
r-MACC-A#show module 8
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
8 1 Application Control Engine Module ACE30-MOD-K9 SAL1549XG39
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
8 e05f.b9a1.fb4c to e05f.b9a1.fb53 1.0 ace2t_main_d A5(1.2) Ok
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
8/0 ACE Expansion Card 1 ACEMOD-EXPN-DC SAL1549XAA9 1.1 Ok
8/1 ACE Expansion Card 2 ACEMOD-EXPN-DC SAL1549XA9G 1.1 Ok
Mod Online Diag Status
---- -------------------
8 Pass
8/0 Pass
8/1 Pass
r-MACC-A#session slot 8 processor 0
The default escape character is Ctrl-^, then x.
You can also type 'exit' at the remote prompt to end the session
Trying 127.0.0.80 ... Open
ACE-MACC-1 login: admin
Password: ********
Login incorrect
ACE-MACC-1 login:
Login timed out after 60 seconds.
[Connection to 127.0.0.80 closed by foreign host]
However it's still load-balancing traffic properly, and log messages (mostly health probe failures) are still showing up in the Sup720 syslog; and the standby ACE seems to be perfectly happy:
ACE-MACC-2/Admin# show ft gr br
FT Group ID: 1 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: Admin Context Id: 0
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 2 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: UM-AAA Context Id: 5
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 3 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: AIGWEB Context Id: 1
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 4 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: UMCE-MAILSVCS Context Id: 7
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 5 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: UMCE-DNSTEST Context Id: 6
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 6 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: IAM-NONPROD Context Id: 2
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 7 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: TL-PROD-MACC Context Id: 4
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 8 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: TL-NONPROD-MACC Context Id: 3
Running Cfg Sync Status:Running configuration sync has completed
We haven't opened a TAC case yet - someone's on his way over to see whether we can get in through the serial port first - but I'm wondering whether there are any other diagnostics we can gather (will resetting the module from the Sup force a coredump?) before we do.
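For anyone who wants to script the standby-side check rather than eyeball it, here is a minimal sketch in Python that parses output in the format of the show ft gr br listing pasted above and flags any FT group that is not standby-hot with a completed config sync. The function names (parse_ft_brief, unhealthy_groups) are made up for illustration, and the parser assumes the exact three-line layout seen in this post; other software versions may format the output differently.

```python
import re

# Patterns assume the three-line-per-group layout shown in this thread.
GROUP_RE = re.compile(
    r"FT Group ID:\s*(\d+)\s+My State:\s*(\S+)\s+Peer State:\s*(\S+)"
)
CONTEXT_RE = re.compile(r"Context Name:\s*(\S+)\s+Context Id:\s*(\d+)")
SYNC_RE = re.compile(r"Running Cfg Sync Status:\s*(.+)")

SYNC_OK = "Running configuration sync has completed"


def parse_ft_brief(text):
    """Return one dict per FT group found in the pasted output."""
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = GROUP_RE.search(line)
        if m:
            # A new group header starts a new record.
            current = {
                "group": int(m.group(1)),
                "my_state": m.group(2),
                "peer_state": m.group(3),
                "context": None,
                "sync": None,
            }
            groups.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first group header
        m = CONTEXT_RE.search(line)
        if m:
            current["context"] = m.group(1)
            continue
        m = SYNC_RE.search(line)
        if m:
            current["sync"] = m.group(1).strip()
    return groups


def unhealthy_groups(groups):
    """Groups where this unit is not standby-hot, the peer is not active,
    or the running-config sync has not completed."""
    return [
        g for g in groups
        if g["my_state"] != "FSM_FT_STATE_STANDBY_HOT"
        or g["peer_state"] != "FSM_FT_STATE_ACTIVE"
        or g["sync"] != SYNC_OK
    ]


# First two groups copied from the output in this post.
SAMPLE = """\
FT Group ID: 1 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: Admin Context Id: 0
Running Cfg Sync Status:Running configuration sync has completed
FT Group ID: 2 My State:FSM_FT_STATE_STANDBY_HOT Peer State:FSM_FT_STATE_ACTIVE
Context Name: UM-AAA Context Id: 5
Running Cfg Sync Status:Running configuration sync has completed
"""

if __name__ == "__main__":
    for g in unhealthy_groups(parse_ft_brief(SAMPLE)):
        print("Check group", g["group"], g["context"], g["my_state"], g["sync"])
```

With the sample above nothing is printed, since both groups are standby-hot and synced. Note this only tells you what the standby believes about its peer; it says nothing about why the active unit's control plane is refusing logins.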
09-10-2012 05:30 PM
Hi Kurt,
It looks like the ACE control plane has hung, which is why you have no management access. Traffic is still being load-balanced because the data plane is unaffected.
Reloading the ACE should fix this. The hang can be due to a low-memory condition. Please check the logs from before the issue started; you may find low-memory warnings there.
Regards,
Kanwal
09-10-2012 08:18 PM
Hello Kurt,
Could you run # show scp stats and try to collect a # show tech-support before reloading? If you reload without collecting any output, it will be hard for Cisco TAC to determine anything. Please get those outputs before and after the reload; you can also run # show tech-support from the switch before and after.
Jorge