04-16-2023 06:25 AM
Dears,
I've got a strange problem with a pair of 5520s running AireOS 8.5.182.0:
Two weeks ago I rebooted the cluster; machine 2 failed to boot and is dead.
We had a spare machine and installed it today. Updating the software and configuring the interfaces was no big deal. Both units see each other via redundancy, and if the active one reboots, the standby takes over. Fine so far.
But then:
The spare (WLC2, new) is able to manage APs; the old one (WLC1) is not, or no longer is, when it is the active unit.
What I did during the process was change the "Redundancy Unit" of WLC1 from secondary to primary, including 2 reboots because of the SSO disable/enable, because this machine had the good running configuration. It works: WLC2 joins as secondary/standby and receives the config. I do not know if this was needed.
On the CLI of WLC1, "show ap summary" gives a fluctuating AP count: 3, 20, 35, 6, ... The web interface shows nothing. We run 440 APs.
On WLC2 they all join in a few seconds.
Any ideas?
BR
Siggi
04-16-2023 07:00 AM
Hi
It's really hard to say anything without logs. But maybe whatever caused the problem on one of your WLCs also affected the other one of the pair. If you are able to, I would recommend an RMA for both the dead one and the half-dead one. After all, a WLC in a cluster that is not able to manage APs is worth nothing.
04-16-2023 07:54 AM - edited 04-16-2023 07:55 AM
- Start with a review of the current configuration on the active controller (use this procedure):
https://community.cisco.com/t5/networking-knowledge-base/show-the-complete-configuration-without-breaks-pauses-on-cisco/ta-p/3115114#toc-hId-1039672820
Have the output analyzed with : https://cway.cisco.com/wireless-config-analyzer/
Check the basic redundancy setup with (CLI):
show redundancy summary
show redundancy detail
The following commands may be useful too:
show redundancy infra statistics
show redundancy transport statistics
show redundancy keepalive statistics
show redundancy gw-reachability statistics
show redundancy config-sync statistics
show redundancy ap-sync statistics
show redundancy client-sync statistics
Also check the (active) controller logs when an AP cannot join, and check the AP boot process (e.g. through the console);
also review the output of: show ap summary
M.
04-16-2023 08:03 AM
(Also:) If possible, go for https://software.cisco.com/download/home/286284738/type/280926587/release/8.10.183.0 ,
according to https://www.cisco.com/c/en/us/support/docs/wireless/wireless-lan-controller-software/200046-tac-recommended-aireos.html
M.
04-16-2023 08:38 AM
When the initial secondary died, all the APs joined the primary controller fine, correct? You took a spare controller, configured the basics for network connectivity, then added it, and now the primary doesn't work at all? If you remove the secondary, what happens? It seems like the initial configuration might have been wrong. If the primary works with the secondary removed, then factory reset the secondary and try again. Also validate the license on each unit.
04-17-2023 12:52 AM
@scott: When we talk about primary and secondary, we both mean the "Redundancy Unit", right?
Then: no. The primary died 2 weeks ago; the secondary took over and ran without problems. Yesterday we added the spare machine. I changed the old secondary to primary to be sure its config would be copied to the new one, and set the new one to secondary, of course. I do not know if this was really needed, but I feel better with it.
The new secondary is able to manage the APs; the primary is not. If the secondary is the active unit, all is fine; when the redundancy state changes, it is not.
(Cisco Controller-Standby) >show redundancy summary
Redundancy Mode = SSO ENABLED
Local State = STANDBY HOT
Peer State = ACTIVE
Unit = Primary
Unit ID = 00:C8:8B:99:63:C9
Redundancy State = SSO
Mobility MAC = 00:C8:8B:99:63:C9
Redundancy Port = UP
Average Redundancy Peer Reachability Latency = 1141 Micro Seconds
Average Management Gateway Reachability Latency = 658 Micro Seconds
(Cisco Controller-Standby) >show redundancy det
Redundancy Management IP Address................. 172.24.13.152
Peer Redundancy Management IP Address............ 172.24.13.151
Redundancy Port IP Address....................... 169.254.13.152
Peer Redundancy Port IP Address.................. 169.254.13.151
Redundancy Timeout Values.....:
----------------------------------------------------
Keep Alive Timeout : 100 msecs
Peer Search Timeout : 120 secs
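For anyone comparing this output between the two units, here is a minimal Python sketch that turns the "Key = Value" lines of "show redundancy summary" into a dict and prints which role the unit currently has. The function names (parse_summary, describe_role) are made up for illustration, and the sample text is simply copied from the output above; how the text is collected from the controller is left out on purpose.

```python
# Sketch only: parses pasted "show redundancy summary" text, nothing more.
SAMPLE = """\
Redundancy Mode = SSO ENABLED
Local State = STANDBY HOT
Peer State = ACTIVE
Unit = Primary
Unit ID = 00:C8:8B:99:63:C9
Redundancy State = SSO
Mobility MAC = 00:C8:8B:99:63:C9
Redundancy Port = UP
"""


def parse_summary(text):
    """Collect 'Key = Value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # only lines that actually contain " = "
            fields[key.strip()] = value.strip()
    return fields


def describe_role(fields):
    """One-line description of the configured unit and its current state."""
    unit = fields.get("Unit", "unknown")
    local = fields.get("Local State", "unknown")
    peer = fields.get("Peer State", "unknown")
    return f"{unit} unit is {local}, peer is {peer}"


summary = parse_summary(SAMPLE)
print(describe_role(summary))
```

Running the same function on the output of both controllers makes it easy to see that the configured unit (Primary/Secondary) and the current role (active/standby) are two separate things.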
I found a few errors:
Failed to acquire license from the licensing module
could not process TSPEC Cac Stats Update Msg failed
could not process Dynamic Core Add Msg
could not process HREAP/OEAP dynamic core add msg failed
able to add AP *XYZ* entry in the temporary AP database used for CAPWAP HA while Processing capwap shadow core add chkpt message
On the topic of licensing:
(Cisco Controller-Standby) >show license all
Feature name: ap_count
License type: Evaluation
License Eula: Not Accepted
Evaluation total period: 12 weeks 6 days
Evaluation period left: 89 days
License state: Active, Not-In-Use
License Nodelocked: Yes
RTU License Count: 1500
==================================
Total available count : 1500
Total inuse count : 0
(Cisco Controller-Standby) >show license summ
Feature name: ap_count
License type: Evaluation
License Eula: Not Accepted
Evaluation total period: 12 weeks 6 days
License state: Active, Not-In-Use
RTU License Count: 1500
And the same from the active unit:
(Cisco Controller) >show license all
This is a Controller with HA-SKU license.
The AP base count license was inherited from the Primary Controller.
Any AP base count license on HA-SKU controller is disregarded.
(Cisco Controller) >show license summ
This is a Controller with HA-SKU license.
The AP base count license was inherited from the Primary Controller.
Any AP base count license on HA-SKU controller is disregarded.
(Cisco Controller) >
And from Webinterface:
Any ideas?
04-17-2023 12:57 AM
(Cisco Controller-Standby) >show redundancy infra statistics
RF Client brief
--------------------------------------------------------------
clientID = 0 clientSeq = 0 RF_INTERNAL_MSG
clientID = 4105 clientSeq = 1 SIM_INTERFACE_COMPONENT
clientID = 25 clientSeq = 69 CHKPT RF
clientID = 35 clientSeq = 177 History RF Client
clientID = 4100 clientSeq = 272 RF_CAPWAP client
clientID = 4101 clientSeq = 273 RF_RRM Client
clientID = 4108 clientSeq = 274 RF_APFSEC client
clientID = 4109 clientSeq = 275 RF_DOT1X client
clientID = 4107 clientSeq = 278 RF_ConfigSync client
clientID = 4110 clientSeq = 279 RF_MOBILITY HA client
clientID = 4113 clientSeq = 280 rf_ha_sso client
clientID = 4111 clientSeq = 331 PMIPv6 HA client
clientID = 4112 clientSeq = 332 mDNS HA client
clientID = 14 clientSeq = 333 CHKPT_HA_DHCP_CLIENT_ID
clientID = 15 clientSeq = 335 CHKPT_HA_SLEEP_CLIENT_SYNC_CLI
clientID = 16 clientSeq = 336 CHKPT_HA_DOT11V_DMS_DB_SYNC_CL
clientID = 17 clientSeq = 337 CHKPT_HA_APF_PROFILER_ID
clientID = 18 clientSeq = 338 CHKPT_HA_STANDBY_TO_ACTIVE_MSG
clientID = 19 clientSeq = 339 CHKPT_HA_LYNC_ID
clientID = 20 clientSeq = 340 CHKPT_HA_TUNNEL_ID
clientID = 21 clientSeq = 342 CHKPT_HA_CTS_ID
clientID = 22 clientSeq = 343 CHKPT_HA_OPENDNS_ID
clientID = 65000 clientSeq = 344 RF_LAST_CLIENT
--------------------------------------------------------------
Sanity Counters..................
--------------------------------------------------------------
Sanity Messages succefully sent..........: 68621
Sanity Messages failed to send...........: 0
Sanity Messages received from peer.......: 136760
--------------------------------------------------------------
(Cisco Controller-Standby) >
04-17-2023 01:00 AM
(Cisco Controller-Standby) >show redundancy transport statistics
Transport Counters..................
--------------------------------------------------------------
Number of messages in the hold Queue..........: 0
Application mesage Max Size...................: 8840
IPC message Max Size..........................: 8976
Time to hold IPC messages.....................: 100
IPC sequence number in the TX side............: 39572
IPC sequence number mismatches(Low)...........: 0
IPC sequence number mismatched(High)..........: 0
--------------------------------------------------------------
IPC STATS...................
--------------------------------------------------------------
IPC_STATUS:Reliable MSG Send.................. 39572
IPC_STATUS:ACK Received....................... 39572
IPC_STATUS:ACK Received Invalid............... 0
IPC_STATUS:MSG Send Failures.................. 0
IPC_STATUS:MSG Send ERROR in IPCQ............. 0
IPC_STATUS:MSG Send ERROR in IPCSENDQ......... 0
IPC_STATUS:MSG Send Total..................... 0
IPC_STATUS: RsyncMgr socket reopen count...... 0
--------------------------------------------------------------
IPC_TIMER_STATS ...................
--------------------------------------------------------------
IPC_STATUS:No of times all retries exhaust.... 0
IPC_STATUS:No of times 1 retry is exhauste.... 0
IPC_STATUS:No of times 2 retries are exhau.... 0
IPC_STATUS:No of times retry is error......... 0
--------------------------------------------------------------
Q_STATS ...................
--------------------------------------------------------------
IPC_STATUS:No of messages in IPC_Q............ 0
IPC_STATUS:No of messages in IPC_SEND_LIST.... 0
IPC_STATUS:IPC_SEND_LIST Max Hit.............. 4
--------------------------------------------------------------
IPC_DELAY_STATS(Avg of 5000 RTs)...................
--------------------------------------------------------------
908911903886
--------------------------------------------------------------
(Cisco Controller-Standby) >
04-17-2023 01:01 AM
(Cisco Controller-Standby) >show redundancy keepalive statistics
Keepalive Counters........:
--------------------------------------------------------------
Keepalive requests sent.................................: 668915
Keepalive responses received............................: 668915
Keepalive requests received from peer...................: 334171
Keepalive responses sent to peer........................: 334171
Keepalive requests failed to send.......................: 0
Keepalive responses failed to send......................: 0
Number of times two Keepalives are lost consecutively...: 0
--------------------------------------------------------------
Network Latencies (RTT) for the Peer Reachability on the Redundancy Management Interface in micro seconds for the past 10 intervals
Peer Reachability Latency[ 1 ] : 1146 Micro Seconds
Peer Reachability Latency[ 2 ] : 1141 Micro Seconds
Peer Reachability Latency[ 3 ] : 1142 Micro Seconds
Peer Reachability Latency[ 4 ] : 1146 Micro Seconds
Peer Reachability Latency[ 5 ] : 1147 Micro Seconds
Peer Reachability Latency[ 6 ] : 1146 Micro Seconds
Peer Reachability Latency[ 7 ] : 1143 Micro Seconds
Peer Reachability Latency[ 8 ] : 1133 Micro Seconds
Peer Reachability Latency[ 9 ] : 1141 Micro Seconds
Peer Reachability Latency[ 10 ] : 1143 Micro Seconds
(Cisco Controller-Standby) >
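A quick way to sanity-check counters like these is to parse the "label....: number" lines and compare sent against received. The following is only a sketch under that assumption; parse_counters and keepalive_problems are hypothetical helper names, and the sample is the keepalive output pasted above.

```python
import re

SAMPLE = """\
Keepalive requests sent.................................: 668915
Keepalive responses received............................: 668915
Keepalive requests received from peer...................: 334171
Keepalive responses sent to peer........................: 334171
Keepalive requests failed to send.......................: 0
Keepalive responses failed to send......................: 0
Number of times two Keepalives are lost consecutively...: 0
"""

# Label, then a run of dots, a colon and an integer value.
COUNTER_RE = re.compile(r"^(.*?)\.{2,}:\s*(\d+)\s*$")


def parse_counters(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def keepalive_problems(c):
    """Return a list of human-readable findings; empty means nothing stands out."""
    problems = []
    if c["Keepalive requests sent"] != c["Keepalive responses received"]:
        problems.append("some keepalive requests were not answered")
    if c["Keepalive requests received from peer"] != c["Keepalive responses sent to peer"]:
        problems.append("some peer keepalives were not answered")
    if c["Keepalive requests failed to send"] or c["Keepalive responses failed to send"]:
        problems.append("send failures recorded")
    if c["Number of times two Keepalives are lost consecutively"]:
        problems.append("consecutive keepalives lost")
    return problems


counters = parse_counters(SAMPLE)
print(keepalive_problems(counters) or "keepalive counters look clean")
```

For the output above the list of findings is empty, which matches what the raw numbers show: the redundancy link itself does not look like the problem.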
04-17-2023 01:06 AM
- Post the AP boot process output from when it cannot join a controller (in whatever situation),
M.
04-17-2023 01:14 AM
Oha, that's not possible. I would have to disable the active controller and kill our Wi-Fi (FlexConnect).
BR
Siggi
04-17-2023 01:19 AM
With the working machine as active there is no problem:
CAPWAP State: Discovery
[*03/14/2023 09:21:17.1716] Discovery Response from 172.24.13.150
[*04/17/2023 08:17:43.0000]
[*04/17/2023 08:17:43.0000] CAPWAP State: DTLS Setup
[*04/17/2023 08:17:43.2299]
[*04/17/2023 08:17:43.2299] CAPWAP State: Join
[*04/17/2023 08:17:43.2299] Sending Join request to 172.24.13.150 through port 5248
[*04/17/2023 08:17:47.9485] Join Response from 172.24.13.150
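If a console log from a failing join ever becomes available, it could be compared with this good one by extracting the CAPWAP state sequence. A minimal Python sketch, assuming the log lines look like the ones above; capwap_states and got_join_response are hypothetical names, and the sample is the working log from this post.

```python
import re

SAMPLE = """\
CAPWAP State: Discovery
[*03/14/2023 09:21:17.1716] Discovery Response from 172.24.13.150
[*04/17/2023 08:17:43.0000]
[*04/17/2023 08:17:43.0000] CAPWAP State: DTLS Setup
[*04/17/2023 08:17:43.2299]
[*04/17/2023 08:17:43.2299] CAPWAP State: Join
[*04/17/2023 08:17:43.2299] Sending Join request to 172.24.13.150 through port 5248
[*04/17/2023 08:17:47.9485] Join Response from 172.24.13.150
"""

# Everything after "CAPWAP State:" up to the end of the line.
STATE_RE = re.compile(r"CAPWAP State:\s*(.+?)\s*$")


def capwap_states(log):
    """Return the CAPWAP states in the order they appear in the log."""
    states = []
    for line in log.splitlines():
        match = STATE_RE.search(line)
        if match:
            states.append(match.group(1))
    return states


def got_join_response(log):
    """True if the controller answered the AP's join request."""
    return any("Join Response from" in line for line in log.splitlines())


print(capwap_states(SAMPLE))
print("join response seen:", got_join_response(SAMPLE))
```

On a failing AP one would expect the sequence to stop early or loop back to Discovery, but that is an assumption until such a log is actually captured.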
04-17-2023 01:04 AM
(Cisco Controller-Standby) >show redundancy gw-reachability statistics
Gw Reachability Counters........
--------------------------------------------------------------
Gw pings succesfully sent...............................: 67014
Gw responses received...................................: 67014
Gw pings failed to send.................................: 0
Current consecutive Gw responses lost...................: 0
High water mark of Gw responses lost....................: 1
--------------------------------------------------------------
Network Latencies (RTT) for the Management Gateway Reachability in micro seconds for the past 10 intervals
Gateway Reachability Latency[ 1 ] : 697 Micro Seconds
Gateway Reachability Latency[ 2 ] : 515 Micro Seconds
Gateway Reachability Latency[ 3 ] : 488 Micro Seconds
Gateway Reachability Latency[ 4 ] : 549 Micro Seconds
Gateway Reachability Latency[ 5 ] : 622 Micro Seconds
Gateway Reachability Latency[ 6 ] : 533 Micro Seconds
Gateway Reachability Latency[ 7 ] : 668 Micro Seconds
Gateway Reachability Latency[ 8 ] : 631 Micro Seconds
Gateway Reachability Latency[ 9 ] : 667 Micro Seconds
Gateway Reachability Latency[ 10 ] : 605 Micro Seconds
--More-- or (q)uit
(Cisco Controller-Standby) >
04-17-2023 01:04 AM
(Cisco Controller-Standby) >show redundancy config-sync statistics
Config Sync Counters........
--------------------------------------------------------------
Usmdb syncs received........................................: 3409
Failed sync for usmdb sync..................................: 0
(Cisco Controller-Standby) >
04-17-2023 01:09 AM
show redundancy ap-sync statistics
show redundancy client-sync statistics
don't exist in this release, just the ap or client summary; both are 0