Our WLC is running 8.5.171 (recently upgraded) with 1852 APs. Every four or five hours my APs disconnect from the controller for an unknown reason. The switchport stays up and the AP uptime is good, so the AP is not rebooting, but it is unable to join the controller. All APs are in FlexConnect mode with FlexConnect local switching enabled.
Controller log: show ap join stats 00:00:00 shows the following:
Last AP connection failure: "retransmission to the AP has reached maximum"
Last error occurred: "AP got or has been disconnected"
Last AP disconnect reason: "Unknown failure reason"
Last join error summary
- Type of error that occurred last......................... AP got or has been disconnected
- Reason for error that occurred last...................... Timed out while waiting for ECHO repsonse from the AP
- Time at which the last join error occurred...............
AP disconnect details
- Reason for last AP connection failure.................... Timed out while waiting for ECHO repsonse from the AP
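When several APs fail the same way, it can help to pull the key fields out of the join statistics for each AP and compare them side by side. The sketch below assumes you have saved the text output (like the excerpt above) and relies only on the "- field....... value" layout visible in that excerpt; the function names are hypothetical and not part of any Cisco tool.

```python
from __future__ import annotations

import re

# One "- <field>......<value>" line from the join-stats output quoted above.
# The field name is taken lazily up to the first run of 3+ periods.
FIELD_RE = re.compile(r"^\s*-\s*(.+?)\.{3,}\s*(.*)$")


def parse_join_stats(text: str) -> dict[str, str]:
    """Map each dotted-leader field to its value (empty string if blank)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def is_echo_timeout(fields: dict[str, str]) -> bool:
    """True when the last join error was an ECHO timeout.

    Matches on keywords instead of the whole sentence, so it does not depend
    on the exact spelling in the quoted output (which reads "repsonse").
    """
    reason = fields.get("Reason for error that occurred last", "")
    return "Timed out" in reason and "ECHO" in reason


# Usage with the excerpt from this thread (time field left blank as posted).
SAMPLE = """\
- Type of error that occurred last......................... AP got or has been disconnected
- Reason for error that occurred last...................... Timed out while waiting for ECHO repsonse from the AP
- Time at which the last join error occurred...............
"""

stats = parse_join_stats(SAMPLE)
print(stats["Type of error that occurred last"])  # AP got or has been disconnected
print(is_echo_timeout(stats))  # True
```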
I have rebooted the controller and all APs.
We have 25 APs, and around 17 of them hit this issue each day.
Any immediate help would be appreciated.
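Since the drops are described as arriving every four or five hours, a first check is whether the gaps between one AP's disconnects really are that regular: a steady interval points toward something timer-driven, while scattered gaps point toward random loss on the path. A minimal sketch, assuming you already have disconnect timestamps for one AP from your own logs (the timestamps below are made up for illustration, and the 4-5 hour window is just the range reported above):

```python
from __future__ import annotations

from datetime import datetime, timedelta


def gaps_in_hours(timestamps: list[datetime]) -> list[float]:
    """Hours between consecutive disconnects, oldest first."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps: list[datetime],
                   low: float = 4.0, high: float = 5.0) -> bool:
    """True if every gap falls within [low, high] hours (needs 2+ events)."""
    gaps = gaps_in_hours(timestamps)
    return bool(gaps) and all(low <= gap <= high for gap in gaps)


# Hypothetical disconnect times for a single AP.
start = datetime(2024, 1, 1, 0, 0)
events = [start,
          start + timedelta(hours=4, minutes=30),
          start + timedelta(hours=9),
          start + timedelta(hours=13, minutes=15)]

print(gaps_in_hours(events))   # [4.5, 4.5, 4.25]
print(looks_periodic(events))  # True
```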
Which WLC IP is used for discovery (I assume via a DHCP option)? Is it the primary WLC of the HA pair, or the backup primary? 192.168.162.65 is the primary WLC, and 192.168.162.66 is the secondary.
Is the backup primary WLC IP configured on both WLCs? Yes, please refer to the attached image.
Is the debug from the primary or the backup primary WLC? The debug is from the primary WLC.
AP version? 8.5.151
Version on both WLCs? Both are on 8.5.171.
show ap image <- check that the WLC holds the AP's last image. Yes, all APs are on the same image.
FYI: last night I downgraded the WLC and APs from 8.5.171 to 8.5.151 to check whether it helps.
I tried 8.5.171 and 8.5.151 and am now trying 8.3.150, so far with no luck. I am also rechecking my entire network, from the access switches to the core, including their uplinks and any STP loops. If you have any other suggestions, please let me know.
Any STP loop must be resolved, and we would need the network topology to see why there is a loop.
Here is a long troubleshooting plan; hopefully it solves the issue in the end:
1- Configure a static IP address for each AP, or lengthen the DHCP lease time to as much as one week.
This applies only to the APs, not to clients.
2- Put the APs in local mode.
3- Check that the switch interface connected to each AP has the same native VLAN as the native VLAN ID pushed from the WLC.
4- ALL configuration on the primary and backup WLCs MUST be identical, except the redundancy settings.
Bonjour <- this must be identical: if it is OFF on one, it must be OFF on the other.
The primary and backup WLCs must run identical images.
5- Configure redundancy by entering the primary and backup WLC names and IPs on the primary WLC only; there is no need to configure them on the backup WLC.
6- Disable fast heartbeat for both FlexConnect and local mode.
7- In the global configuration of each WLC:
on the primary: configure the unit as primary and disable AP SSO
on the backup: configure the unit as secondary and disable AP SSO
8- The license on the primary must support the number of APs; use an HA SKU or HA license on the backup.
9- If DHCP is the discovery method, add the IP of the primary WLC ONLY.
10- If L3 discovery is used, the master controller flag must be enabled on the primary WLC.
11- Disconnect the backup WLC.
12- Connect the primary WLC, then bring up the APs and the primary WLC.
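Two of the DHCP-related steps above can be made concrete. For discovery via DHCP, Cisco lightweight APs (apart from a few older series that take a string instead) read the controller list from DHCP option 43 encoded as a TLV: type 0xF1, a length of 4 bytes per controller, then each management IP. For the lease-time suggestion, RFC 2131's default renewal (T1) and rebinding (T2) times are 50% and 87.5% of the lease. A sketch, assuming the 0xF1 TLV format applies to your AP model and that your DHCP server uses the RFC default timers; the helper names are hypothetical:

```python
from __future__ import annotations

import ipaddress

OPTION_43_TYPE = 0xF1  # TLV type Cisco lightweight APs use for WLC addresses


def cisco_option43_hex(controller_ips: list[str]) -> str:
    """Hex string for option 43: type, length (4 bytes per IP), then the IPs."""
    payload = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    if len(payload) > 255:
        raise ValueError("too many controllers for a one-byte TLV length")
    return bytes([OPTION_43_TYPE, len(payload)]).hex() + payload.hex()


def renewal_times(lease_seconds: int) -> tuple[int, int]:
    """RFC 2131 default T1 (50% of lease) and T2 (87.5% of lease), in seconds."""
    return lease_seconds // 2, lease_seconds * 7 // 8


# DHCP discovery note: list the primary WLC (192.168.162.65) only.
print(cisco_option43_hex(["192.168.162.65"]))  # f104c0a8a241

# Step 1: a one-week lease for the AP scope.
ONE_WEEK = 7 * 24 * 3600
print(ONE_WEEK, renewal_times(ONE_WEEK))  # 604800 (302400, 529200)
```

For comparison, renewal_times(8 * 3600) returns (14400, 25200), so a hypothetical 8-hour lease would put T1 at 4 hours, close to the interval reported in the question. Whether the AP lease is actually involved is not established in this thread, so check the real lease on the AP scope before drawing conclusions.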
Now monitor for 24 hours. If it is stable, connect the backup WLC, and after that keep monitoring AP stability.
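Before the backup WLC goes back in, the "identical configuration except redundancy" requirement can be checked mechanically. A minimal sketch using Python's difflib on two plain-text configuration exports, one per controller (how you export them is up to you); the sample lines and the keyword used to skip redundancy settings are hypothetical placeholders, not real AireOS output:

```python
from __future__ import annotations

import difflib

# Hypothetical keyword: lines mentioning it are expected to differ
# between the units (the redundancy exception), so they are ignored.
EXPECTED_TO_DIFFER = ("redundancy",)


def config_drift(primary_text: str, backup_text: str,
                 ignore: tuple[str, ...] = EXPECTED_TO_DIFFER) -> list[str]:
    """Lines present on only one controller, minus the ignored ones.

    '- ' marks a line only on the primary, '+ ' a line only on the backup.
    """
    drift = []
    for line in difflib.ndiff(primary_text.splitlines(),
                              backup_text.splitlines()):
        if line[:2] not in ("- ", "+ "):
            continue  # skip unchanged lines and ndiff's '? ' hint lines
        if any(word in line.lower() for word in ignore):
            continue
        drift.append(line)
    return drift


# Placeholder exports: Bonjour differs, which the checklist says must not.
primary = "bonjour: enabled\nredundancy mode: primary\ntimezone: UTC"
backup = "bonjour: disabled\nredundancy mode: secondary\ntimezone: UTC"

for line in config_drift(primary, backup):
    print(line)
```

An empty result means the two exports match apart from the redundancy lines.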
Have you tried setting one of the access points to local mode? Have you tried configuring high availability on the AP so that it uses the other controller, just to see whether the issue is with one controller or happens on both? You do have high availability set on the access points, correct?
Before the upgrade the APs were in local mode; after the upgrade we changed them to FlexConnect mode. And yes, HA is working fine: in the event of a failure, the APs switch to the other controller after a certain interval. I will move some access points to local mode; let's hope it works after the upgrade. I will let you know the results.
What I was trying to say is to put some APs on each controller; this way you can see whether the issue is happening on both controllers or just one. I want to make sure you don't have an issue with one of the controllers.
If the access points you moved are stable, that indicates an issue with the primary. You might first try rebooting it and then moving a few APs back to the primary to see what happens. I have run into issues in the past where a reboot fixed the problem, and others where a factory reset followed by manual reconfiguration fixed it.
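The split-and-compare test suggested in the last two replies comes down to comparing disconnect rates per controller. A minimal sketch, assuming you can list AP names and record one entry per observed disconnect; the AP names, the alternating split, and the counts are all hypothetical:

```python
from __future__ import annotations

from collections import Counter


def split_aps(ap_names: list[str],
              controllers: tuple[str, ...] = ("primary", "secondary")
              ) -> dict[str, str]:
    """Assign APs to controllers alternately, in sorted name order."""
    return {ap: controllers[i % len(controllers)]
            for i, ap in enumerate(sorted(ap_names))}


def disconnects_per_ap(events: list[str],
                       placement: dict[str, str]) -> dict[str, float]:
    """Average disconnects per AP for each controller.

    `events` holds one AP name per observed disconnect.
    """
    hits = Counter(placement[ap] for ap in events)
    aps_on = Counter(placement.values())
    return {ctrl: hits[ctrl] / count for ctrl, count in aps_on.items()}


placement = split_aps(["AP4", "AP2", "AP1", "AP3"])
# AP1 and AP3 land on "primary"; AP2 and AP4 on "secondary".
observed = ["AP1", "AP1", "AP3"]  # made-up disconnects over the test window
print(disconnects_per_ap(observed, placement))  # {'primary': 1.5, 'secondary': 0.0}
```

A clearly higher rate on one side would match the reading above that the problem lies with that controller; similar rates on both sides point away from a single controller.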