08-23-2024 01:25 AM - edited 08-31-2024 01:55 PM
So behind the "strange" title is something strange that happened when I upgraded my lab to 17.15.1.
All APs joined and work except a single 9166i (that was originally an -MR; I do not know if that is relevant to the "case").
The AP pre-downloaded the software and everything seemed fine, but after reloading everything the 9166i cannot join the WLC.
It gets an IP, CDP says it's running the new software, and I can ping the AP (until it reloads because it cannot join the WLC).
The disconnect reason shown on the WLC is: "DTLS close alert from peer".
From what I can tell from the radioactive trace I did, the AP joins, and then it seems like the DTLS phase completes ?!?! (key DTLS Sess: 300000000000003 Inserted successfully).
Then a few ms later : (note): MAC: 10a8.2931.0ee0 AP disconnect initiated. Reason: DTLS close alert from peer, Phase: Join
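For anyone digging through a long radioactive trace, here is a minimal Python sketch that pulls out the AP disconnect lines. The regular expression is modelled only on the single line quoted above, so other releases may word it differently; the function name `find_disconnects` is made up for illustration.

```python
import re

# Pattern modelled on the one trace line quoted in this post (an assumption,
# not an official log format definition).
DISCONNECT_RE = re.compile(
    r"MAC: (?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) AP disconnect initiated\. "
    r"Reason: (?P<reason>.+?), Phase: (?P<phase>\w+)"
)


def find_disconnects(lines):
    """Return (mac, reason, phase) for every AP disconnect line found."""
    hits = []
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            hits.append((match.group("mac"), match.group("reason"), match.group("phase")))
    return hits


sample = [
    "(note): MAC: 10a8.2931.0ee0 AP disconnect initiated. "
    "Reason: DTLS close alert from peer, Phase: Join",
]
print(find_disconnects(sample))
```

Feeding it the whole trace file line by line should make it easier to see whether the disconnect always happens in the same phase.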
I found out that the above errors occur because the AP boots into the old 17.12.3 software (backup image) after rebooting a few times on the 17.15.1 software, where it does not even seem to start its discovery process. It then tries to join the WLC that runs 17.15.1, closes the DTLS session, and reboots into 17.15.1 (primary image) because it already has this software. - See thread for more interesting feedback and logs from the AP(s).
The more appropriate headline for this should, at the moment, be: "Perhaps just a fluke. - 17.15.1 - 9166 model(s) do not send / do Discovery Request."
Since my other APs (9164, normal CAPWAP from "birth") are joined and work just fine, I have a suspicion that it might be because this AP originally was an MR? But that is of course just speculation.
I have attached the radioactive trace if anyone wants to have a look.
When I can physically get to the AP I might know more.
/Thomas
08-31-2024 03:55 AM
I took the two APs home with me. (I don't currently have a WLC booted at home, but let me just boot them and post the results here.)
10 - 15 mins. Hang on.
08-31-2024 04:18 AM - edited 08-31-2024 04:19 AM
08-31-2024 06:54 AM - edited 08-31-2024 07:00 AM
So I booted a 17.12.3 9800-CL because the AP (9166i) had 17.12.3 software in its backup.
It joined the WLC, as expected.
Now trying the same thing with the 9166D1.
If successful I will upgrade that WLC to 17.15.1 and see what happens.
Radioactive trace and console on the AP for, perhaps, interesting output.
PS: 9166D1 joined 17.12.3 as well.
08-31-2024 07:09 AM - edited 08-31-2024 07:13 AM
The APs of course already have the 17.15.1 software, so no real upgrade.
I could of course overwrite the 17.12.3 part, but right now I don't really want to.
[*08/31/2024 14:07:52.0135] Image pre-download request for version 17.15.1.6.
[*08/31/2024 14:07:52.0654] status 'upgrade.sh: Script called with args:[NO_UPGRADE]'
[*08/31/2024 14:07:52.1104] do NO_UPGRADE, part2 is active part
PS: Nothing interesting in the radioactive trace.
08-31-2024 07:14 AM
9800 activating "new" image on the AP:
[*08/31/2024 14:13:15.1444] status 'upgrade.sh: Script called with args:[ACTIVATE]'
[*08/31/2024 14:13:15.1796] do ACTIVATE, part2 is active part
[*08/31/2024 14:13:15.2506] status 'upgrade.sh: Verifying image signature in part1'
[*08/31/2024 14:13:28.4010] status 'upgrade.sh: status 'Successfully verified image in part1.''
[*08/31/2024 14:13:28.4263] status 'upgrade.sh: activate part1, set BOOT to part1'
[*08/31/2024 14:13:28.7150] status 'upgrade.sh: AP primary version after reload: 17.15.1.6'
[*08/31/2024 14:13:28.7327] status 'upgrade.sh: AP backup version after reload: 17.12.3.31'
[*08/31/2024 14:13:28.8844] status 'upgrade.sh: Create after-upgrade.log'
[*08/31/2024 14:13:28.8981] Ending image activate task
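To compare what ends up in the two partitions across several APs, a small Python sketch like the one below can pull the primary and backup versions out of the console output. It assumes the `upgrade.sh` lines look exactly like the ones pasted above; the helper name `image_versions` is hypothetical.

```python
import re

# Matches the two "version after reload" lines from the console log above.
VERSION_RE = re.compile(r"AP (primary|backup) version after reload: ([\d.]+)")


def image_versions(console_lines):
    """Map 'primary'/'backup' to the version string reported by upgrade.sh."""
    versions = {}
    for line in console_lines:
        match = VERSION_RE.search(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


console = [
    "[*08/31/2024 14:13:28.7150] status 'upgrade.sh: AP primary version after reload: 17.15.1.6'",
    "[*08/31/2024 14:13:28.7327] status 'upgrade.sh: AP backup version after reload: 17.12.3.31'",
]
print(image_versions(console))
```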
08-31-2024 07:18 AM
And I think we are back to infinite reload.
08-31-2024 07:25 AM
It actually never does any discovery with option 43, or DNS, or broadcast it seems. (full console output attached from after the upgrade above, and a couple of reboots) - I think CAPWAP is broken.
The AP's "boot life" ends with:
[*08/31/2024 14:18:41.0489] dtls_enable_sudi: Unable to load RSA SUDI certificate from ACT2, rc: 259
[*08/31/2024 14:18:41.0490] dtls_init: Unable to load SUDI certificate
[*08/31/2024 14:18:41.0491] dtls_init: MIC certificate not present
[*08/31/2024 14:18:41.0491] dtls_init: Unable to load device certificate
[*08/31/2024 14:18:41.0491] DTLS Initialization failed. Status (3)
[*08/31/2024 14:18:41.0744] AP Rebooting: Reset Reason - DTLS init failed
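If you want to check a saved console capture for the same failure, here is a rough Python sketch that collects the DTLS messages and the reset reason. It only assumes the `[*timestamp] message` layout seen in the lines above; `reboot_summary` is an illustrative name, not anything from the AP software.

```python
import re

# "[*08/31/2024 14:18:41.0489] message" as seen in the console output above.
LINE_RE = re.compile(r"^\[\*(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
RESET_RE = re.compile(r"AP Rebooting: Reset Reason - (.+)$")


def reboot_summary(console_lines):
    """Return (reset_reason, dtls_messages) from AP console lines."""
    reason = None
    dtls_messages = []
    for line in console_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        if msg.startswith(("dtls_", "DTLS")):
            dtls_messages.append(msg)
        reset = RESET_RE.search(msg)
        if reset:
            reason = reset.group(1)  # last reset reason seen wins
    return reason, dtls_messages


console = [
    "[*08/31/2024 14:18:41.0489] dtls_enable_sudi: Unable to load RSA SUDI certificate from ACT2, rc: 259",
    "[*08/31/2024 14:18:41.0490] dtls_init: Unable to load SUDI certificate",
    "[*08/31/2024 14:18:41.0491] dtls_init: MIC certificate not present",
    "[*08/31/2024 14:18:41.0491] dtls_init: Unable to load device certificate",
    "[*08/31/2024 14:18:41.0491] DTLS Initialization failed. Status (3)",
    "[*08/31/2024 14:18:41.0744] AP Rebooting: Reset Reason - DTLS init failed",
]
reason, messages = reboot_summary(console)
print(reason)
for msg in messages:
    print(" ", msg)
```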
08-31-2024 07:30 AM - edited 08-31-2024 07:45 AM
On the WLC this shows up in the radioactive trace I had running; now I wonder what LUID is.
But still, the AP does not seem to even try to discover the WLC, so I would think that this is something else (perhaps from the conditional global debug).
2024/08/31 15:14:00.454334457 {wncd_x_R0-0}{1}: [image-dwnld-mgr] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.461331105 {wncd_x_R0-0}{1}: [capwapac-smgr-srvr] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480765607 {wncd_x_R0-0}{1}: [ap-join-info-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480768512 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480768763 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480866037 {wncd_x_R0-0}{1}: [errmsg] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.534690341 {wncd_x_R0-0}{1}: [ewlc-dtls-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.548728313 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.622810296 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.622925214 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.627174269 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.635467380 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.682650223 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.850667892 {wncd_x_R0-0}{1}: [sanet-shim-miscellaneous] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.252595255 {wncd_x_R0-0}{1}: [apmgr-ap-global] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.260716979 {wncmgrd_R0-0}{1}: [ewlc-infra-evq] [13930]: (note): LUID Resolve Failed
2024/08/31 15:14:01.402285254 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.519258109 {wncd_x_R0-0}{1}: [tdllib] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535143549 {wncd_x_R0-0}{1}: [capwapac-smgr-srvr] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535161082 {wncd_x_R0-0}{1}: [ap-join-info-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535161964 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535162144 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535434141 {wncd_x_R0-0}{1}: [ewlc-dtls-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535479307 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535674338 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535717650 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535759369 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535817009 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535967234 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536067004 {wncd_x_R0-0}{1}: [sanet-shim-miscellaneous] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536410026 {wncd_x_R0-0}{1}: [apmgr-ap-global] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536807271 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.537855200 {wncmgrd_R0-0}{1}: [ewlc-infra-evq] [13930]: (note): LUID Resolve Failed
2024/08/31 15:14:03.499908636 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:03.499937530 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:03.747085742 {iosrp_R0-0}{1}: [bfd] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:04.452879283 {iosrp_R0-0}{1}: [bfd] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.301863354 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.301991998 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.302119420 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.302246793 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.430244530 {iosrp_R0-0}{1}: [parser_cmd] [24971]: (note): LUID Resolve Failed
PS: When I do not have conditional debug global enabled, nothing shows up in the radioactive trace for the AP, as expected.
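Since the block above is mostly the same message repeated, a quick Python sketch like this can show which processes and components log it most often. The regex is an assumption based only on the trace lines pasted here, and `count_by_component` is a made-up helper name.

```python
import re
from collections import Counter

# "{process}{n}: [component] [pid]: (level): message" as in the trace above.
TRACE_RE = re.compile(
    r"\{(?P<proc>[^}]+)\}\{\d+\}: \[(?P<component>[^\]]+)\] \[\d+\]: "
    r"\((?P<level>\w+)\): (?P<msg>.*)$"
)


def count_by_component(trace_lines, needle="LUID Resolve Failed"):
    """Count lines containing `needle`, keyed by (process, component, level)."""
    counts = Counter()
    for line in trace_lines:
        match = TRACE_RE.search(line)
        if match and needle in match.group("msg"):
            key = (match.group("proc"), match.group("component"), match.group("level"))
            counts[key] += 1
    return counts


trace = [
    "2024/08/31 15:14:00.454334457 {wncd_x_R0-0}{1}: [image-dwnld-mgr] [14321]: (ERR): LUID Resolve Failed",
    "2024/08/31 15:14:00.622810296 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed",
    "2024/08/31 15:14:00.622925214 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed",
]
for key, total in count_by_component(trace).most_common():
    print(total, key)
```

This does not explain what LUID is; it only makes it easier to see whether the message comes from AP-related components or from everything that the global conditional debug touches.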
08-31-2024 07:31 AM
RIP 9166 models on 17.15.1 is my current conclusion.
08-31-2024 07:47 AM
- I have enormous respect for your efforts, but that is not my conclusion. As an engineer I would like to see the issue confirmed from different sources and with different APs of the same model. From a 'world view', that conclusion comes too soon.
M.
08-31-2024 07:52 AM - edited 08-31-2024 07:56 AM
Yes I know, it is also my "current", I did not say that I cannot change my mind *smiley face*
... Would anyone else like to try some 9166 APs on 17.15.1 and do me that favour ?
08-31-2024 08:50 AM
Because the AP reverts back to 17.12.3 once in a while, it actually tries to join the WLC on this software, but then of course resets to its backup image that is 17.15.1 - but I can now compare the two start-ups (even though a lot has changed between versions in the output of the console of the AP it seems).
If anyone wants to see, I have attached them here.
08-31-2024 03:50 AM
I was also thinking corrupt image, but surely Cisco would have tested everything right .... RIGHT !!!
(All APs were not upgraded directly on the same LAN as the WLC, so no low MTU or packet-lossy WAN links in between.)
08-31-2024 03:56 AM
Might be a good time to start using https downloads because I think the "lossy links" story was just an excuse for the poor quality code which makes the 9800 CAPWAP downloads and image verification inherently unreliable. It's not rocket science but you'd think it was by how they've struggled to get something so basic working correctly which worked fine for years on the old code!
08-31-2024 04:05 AM
I really do like the TCP downloads (as I call them