Cisco AIR-AP1852I Not Connecting to WLC

KGH0511 · 08-19-2023

I'm managing a network with 266 of the above WAPs and dual 5520 WLCs. Up until yesterday all WAPs were connected to the controller and everything was humming along nicely. The switch stack that the WLCs connect to had a wobbly and needed to be rebooted. After the reboot I noticed that about 10 WAPs did not come back online. I physically checked each unit, and the status LED is flashing red on all 10 of them.

I took one WAP to the workbench, connected it to the appropriate VLAN (power is via PoE from the switch) and connected to it over the console with a laptop to see what was going on. I could see that it would not connect to the controller: discovery failed and it was unable to open the SSH daemon. The WAP was not getting an IP address via DHCP, so I assigned a static IP and manually configured the primary controller. Below is a snapshot of what happened after that:

[*08/19/2023 09:37:33.6517]
[*08/19/2023 09:37:33.6517] CAPWAP State: Discovery
[*08/19/2023 09:37:33.6517] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:37:33.6617] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:37:33.6617] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)
[*08/19/2023 09:38:03.1125]
[*08/19/2023 09:38:03.1125] CAPWAP State: Discovery
[*08/19/2023 09:38:03.1125] Discovery failed 5 times. Check Release/Renew DHCP AP CAPWAP MODE:[1] controller previously connected:[0]
[*08/19/2023 09:38:03.1125] CAPWAPd forces DHCP restart.
[*08/19/2023 09:38:03.1225] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:38:03.1225] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)
[*08/19/2023 09:38:22.5664] WTP IP address changed from 0.0.0.0 to 10.61.XX.XX, restart CAPWAP.
[*08/19/2023 09:38:22.5664]
[*08/19/2023 09:38:22.5664]
[*08/19/2023 09:38:22.5664] Going to restart CAPWAP (reason : WTP IP address changed)...
[*08/19/2023 09:38:22.5664]
[*08/19/2023 09:38:22.5664] Restarting CAPWAP State Machine.
[*08/19/2023 09:38:22.5664] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: Discovery(2).
[*08/19/2023 09:38:22.6064]
[*08/19/2023 09:38:22.6064] CAPWAP State: DTLS Teardown
[*08/19/2023 09:38:22.7564] upgrade.sh: Script called with args:[ABORT]
[*08/19/2023 09:38:22.7964] do ABORT, part2 is active part
[*08/19/2023 09:38:22.8263] upgrade.sh: Cleanup tmp files ...
[*08/19/2023 09:38:22.8563] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Teardown(4).
[*08/19/2023 09:38:22.8563] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Teardown(4).
[*08/19/2023 09:38:37.4318]
[*08/19/2023 09:38:37.4318] CAPWAP State: Discovery
[*08/19/2023 09:38:37.4718] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:38:37.4718] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:38:37.4718] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)
[*08/19/2023 09:39:06.9825]
[*08/19/2023 09:39:06.9825] CAPWAP State: Discovery
[*08/19/2023 09:39:06.9925] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:39:06.9925] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:39:07.0225] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)

On the WLC, under Security > AP Policies, Accept Manufactured Installed Certificate is selected and nothing else. I have tried manually adding the MAC address of the AP's primary Ethernet interface to the AP Authorization list, and it made no difference.

I have noticed that the time on the AP is not the same as the time on the WLC. However, I do not know the command to set the time manually while consoled into the AP. I've used ? to search through all the menus and I don't see any option for adjusting the time. All the other APs in the plant have successfully come back except these 10 units, which are all behaving in the same manner. I have taken a spare AP from inventory and connected it in place of one of the misbehaving units, and the new unit connects to the WLC and comes online right away. I have also done a full factory reset on the unit I'm working with on the bench.
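
Whether the clock difference matters depends on how large it is. During the CAPWAP join, the AP and WLC authenticate each other with certificates over DTLS, and a certificate is only accepted if the checking device's current time falls inside the certificate's validity window. The sketch below illustrates just that comparison; the validity dates and clock values are hypothetical, not taken from this AP or WLC.

```python
from datetime import datetime, timezone


def within_validity(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """Return True if `now` falls inside the certificate validity window (inclusive)."""
    return not_before <= now <= not_after


# Hypothetical validity window; the real certificate dates are not given in the thread.
NOT_BEFORE = datetime(2016, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2099, 12, 31, tzinfo=timezone.utc)

# Hypothetical clocks: a few minutes of skew vs. a clock reset to a default date.
wlc_clock = datetime(2023, 8, 19, 9, 37, tzinfo=timezone.utc)
ap_clock_skewed = datetime(2023, 8, 19, 9, 12, tzinfo=timezone.utc)
ap_clock_reset = datetime(1970, 1, 1, tzinfo=timezone.utc)

for label, clock in (
    ("WLC", wlc_clock),
    ("AP, 25 min skew", ap_clock_skewed),
    ("AP, clock reset", ap_clock_reset),
):
    ok = within_validity(clock, NOT_BEFORE, NOT_AFTER)
    print(f"{label}: {clock.isoformat()} -> {'inside' if ok else 'OUTSIDE'} validity window")
```

A skew of minutes leaves the AP well inside a multi-year window, so on its own it would not break the join; a clock that has fallen back to a default date would.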

KGH0511 · 08-22-2023

Here is the switchport config for the test port I configured, with a working spare WAP connected:

Name: Gi4/0/28
Switchport: Enabled
Administrative Mode: static access
Operational Mode: static access
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: native
Negotiation of Trunking: Off
Access Mode VLAN: 18 (mgmt_ap)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: disabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan: none
Trunking VLANs Enabled: ALL
Pruning VLANs Enabled: 2-1001
Capture Mode Disabled
Capture VLANs Allowed: ALL

This currently has a new WAP connected and is operational. All other switchports with WAPs connected are configured in the same manner, including the ones that have stopped talking to the controller.

Sorry, I thought that you were referring to the DHCP option on the WLC, under Controller => Advanced => DHCP. That is set to option 82. On the DHCP server it is set to option 43.
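
For reference, Cisco documents option 43 for lightweight APs such as the 1852 as a TLV: type 0xF1, a length byte equal to 4 × the number of controllers, then each controller management IPv4 address as four bytes. (Option 82, the DHCP relay agent information option, is a separate setting.) A minimal sketch that builds the hex string used by, for example, the Cisco IOS DHCP server's `option 43 hex` command, assuming two hypothetical controller addresses since the real 10.61.x.x addresses are masked in the thread:

```python
import ipaddress


def option43_hex(controller_ips: list[str]) -> str:
    """Build the Cisco lightweight-AP option 43 TLV (type 0xF1) as a hex string."""
    if not controller_ips:
        raise ValueError("at least one controller IP is required")
    # Each controller management address contributes 4 bytes to the value field.
    value = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    # bytes() raises ValueError if the length does not fit in a single byte.
    tlv = bytes([0xF1, len(value)]) + value
    return tlv.hex()


# Hypothetical controller management addresses, not the real ones from the thread.
print(option43_hex(["192.0.2.10", "192.0.2.11"]))  # f108c000020ac000020b
```

Comparing the output against what is actually configured on the DHCP server is a quick way to rule out a typo in the option value.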

I still have 10 units which do not connect to the WLC. One or two I could accept as hardware issues, but 10 units failing right after a switch reboot seems like too many to have developed hardware faults simultaneously, especially when a single event caused all WAPs to disconnect from the WLC.

marce1000 · 08-22-2023

>...Still have 10 units which do not connect to the WLC
- Could you also try rebooting the controller and check whether that changes anything?

M.

-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

KGH0511 · 08-22-2023

It would be better to wait until the weekend, in case I lose any more APs. Presently I can get by with 10 out of action. The coverage is patchy in some locations, but it's not critical.

marce1000 · 08-22-2023

>It would be better to wait until the weekend, in case I lose any more APs. Presently I can get by with 10 out of action. The coverage is patchy in some locations, but it's not critical.
Good plan.

M.


Rich R · 08-22-2023

Agreed, 10 sounds like a lot, even for the 18xx series, although we do RMA around 20 x 1832 per month (from an installed base of many thousands). The failure rate is considerably higher than for any other AP model we use.

I meant the running-config on the port - can you do "sh run int Gi4/0/28"?

Your AP logs do not seem to show any indication of receiving option 43 from DHCP. If it received option 43 then I would expect to see "Got WLC address x.x.x.x from DHCP." in the logs.
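
Checking a long console capture for that line by eye is error-prone, so here is a small sketch that scans AP console output, collects the discovery types the AP used, and flags whether a DHCP-supplied WLC address was logged. The marker text comes from the expected message quoted above; the exact wording may differ between AP software releases, so treat it as an assumption. The sample lines are copied from the log in the first post.

```python
import re

# Matches lines like "Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)".
DISCOVERY_RE = re.compile(r"Discovery Request sent to (\S+), discovery type (\w+)\((\d+)\)")

SAMPLE_LOG = """\
[*08/19/2023 09:38:03.1225] Discovery Request sent to 10.61.XX.XX, discovery type STATIC_CONFIG(1)
[*08/19/2023 09:38:03.1225] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)
[*08/19/2023 09:38:22.5664] WTP IP address changed from 0.0.0.0 to 10.61.XX.XX, restart CAPWAP.
"""


def summarize_discovery(log_text: str) -> dict:
    """Collect the discovery types used and whether a DHCP-supplied WLC address was logged."""
    types = set()
    option43_seen = False
    for line in log_text.splitlines():
        match = DISCOVERY_RE.search(line)
        if match:
            types.add(match.group(2))
        # Assumed marker wording, based on "Got WLC address x.x.x.x from DHCP."
        if "Got WLC address" in line and "from DHCP" in line:
            option43_seen = True
    return {"discovery_types": sorted(types), "option43_seen": option43_seen}


print(summarize_discovery(SAMPLE_LOG))
# {'discovery_types': ['STATIC_CONFIG', 'UNKNOWN'], 'option43_seen': False}
```

On the posted log this reports only the static and broadcast discovery types and no DHCP-supplied controller address, which is consistent with the observation above.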

------------------------------
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's and TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's, Best Practices for 9800 WLC's and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390

KGH0511 · 08-23-2023

Normally I have a failure rate of 1 a month on average. I've had the controller offline on a number of occasions in the past and never experienced anything like this.

The running config on the port is below:

Current configuration : 309 bytes
!
interface GigabitEthernet4/0/28
description AV-OFFICE 70802-1
switchport access vlan 18
switchport mode access
authentication event fail action authorize vlan 888
authentication open
authentication order mab
authentication port-control auto
mab
dot1x pae authenticator
spanning-tree portfast
end

That's my test port. I have compared it with the ports that working APs are connected to and the ports that the failed APs are connected to, and they are all the same. Using one of the spare test APs (the new unit without a static address), I can ping the DHCP server while logged in via the console.

One thing I have noticed: of the 265 WAPs, the controller shows 255 at any given time. I put two units from spares inventory on the network and they connect fine, but the count is still 255 APs when it should be 257. When I take the two spares off, I still have 255. There may be something going on with the license.
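
A set difference between the expected inventory and what the controller reports (for example, the AP names from `show ap summary` on AireOS) shows which APs are missing, not just how many. Comparing two snapshots also shows whether adding spares pushes other APs off, which is what a hard cap at 255 would look like. All names and snapshots below are hypothetical.

```python
def missing_aps(inventory: set[str], joined: set[str]) -> set[str]:
    """APs that are expected (in inventory) but not currently joined to the controller."""
    return inventory - joined


# Hypothetical names: 265 installed APs, of which AP-256..AP-265 are the 10 that are down.
inventory = {f"AP-{n:03d}" for n in range(1, 266)}
before = {f"AP-{n:03d}" for n in range(1, 256)}

# Hypothetical snapshot after two spares join: if the total stays at 255,
# two previously joined APs must have dropped off to make room.
spares = {"SPARE-1", "SPARE-2"}
after = (before - {"AP-001", "AP-002"}) | spares

print(len(before), len(after))  # 255 255
newly_missing = missing_aps(inventory, after) - missing_aps(inventory, before)
print(sorted(newly_missing))  # ['AP-001', 'AP-002']
```

If real snapshots show a non-empty "newly missing" set every time spares join, that points toward a join limit rather than 10 independent hardware faults.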

marce1000 · 08-23-2023

>...Of the 265 WAP's the controller is showing 255 at any given time. I...
- This could also be a bug; in that context, consider https://www.cisco.com/c/en/us/support/docs/wireless/wireless-lan-controller-software/200046-tac-recommended-aireos.html , which refers to https://software.cisco.com/download/home/286284738/type/280926587/release/8.10.185.0
As the AireOS platforms age, it becomes increasingly advisable to use the latest recommended release for a particular model.

- You can also run a checkup of the controller configuration: collect the complete configuration as described in https://community.cisco.com/t5/networking-knowledge-base/show-the-complete-configuration-without-breaks-pauses-on-cisco/ta-p/3115114#toc-hId-1039672820
and have the output analyzed with: https://cway.cisco.com/wireless-config-analyzer/

You may get insights concerning current issues too.

M.


Rich R · 08-24-2023

Yes, a limit of 255 does sound suspicious!

That port config looks a whole lot more complicated than ours. Are you sure the switch is not putting the port into a blocking or err-disabled state? Maybe try a simpler config and see whether it helps:
interface GigabitEthernet4/0/28
switchport access vlan 18
switchport mode access
spanning-tree portfast
spanning-tree bpdufilter enable
spanning-tree bpduguard disable
speed auto
duplex auto
cdp enable
end
TAC advised us to use the bpduguard and bpdufilter settings because the APs randomly send BPDUs at boot time, which can cause the port to be shut down.
You might also want to take a look at https://bst.cisco.com/bugsearch/bug/CSCwf45495 , which is expected to be fixed in the forthcoming 8.10MR10 release.

> I've had the controller offline on a number of occasions in the past and never experienced anything like this.
We found the failures increased exponentially when the APs reached 2-3 years old.
