
Perhaps just a fluke. - 17.15.1 - 9166i cannot join correctly anymore.

Behind the "strange" title is something strange that happened when I upgraded my lab to 17.15.1.

All APs joined and work except a single 9166i (which was originally an -MR; I do not know if that is relevant to the "case").

The AP pre-downloaded the software and everything seemed fine, but after reloading everything the 9166i cannot join the WLC.

It gets an IP, CDP says it is running the new software, and I can ping the AP (until it reloads because it cannot join the WLC).

The disconnect reason shown on the WLC is: "DTLS close alert from peer".

From what I can tell from the radioactive trace I did, the AP joins, and then it seems the DTLS phase even completes?! (key DTLS Sess: 300000000000003 Inserted successfully).

Then a few ms later : (note): MAC: 10a8.2931.0ee0 AP disconnect initiated. Reason: DTLS close alert from peer, Phase: Join

I found out that the above errors occur because the AP boots into the old 17.12.3 software (backup image) after a few reboots on 17.15.1, where it does not even seem to start its discovery process. Running 17.12.3, it then tries to join the WLC, which runs 17.15.1. It closes the DTLS session and reboots into 17.15.1 (primary image), because it already has this software. See the thread for more interesting feedback and logs from the AP(s).

The more appropriate headline for this should, at the moment, be: "Perhaps just a fluke. - 17.15.1 - 9166 model(s) do not send / do Discovery Request."

Since my other APs (9164, normal CAPWAP from "birth") joined and work just fine, I suspect it might be because this AP was originally an MR. But that is of course just speculation.

I have attached the radioactive trace if anyone wants to have a look.

When I can physically get to the AP I might know more.

/Thomas


I took the two APs home with me. (I don't currently have a WLC booted at home, but let me just boot them and post the results here.)

10 - 15 Mins. Hang on.

Here is the complete boot sequence of the 9166i (be aware no WLC is present and no options are given to the AP for finding a WLC).

PS: Currently setting up a VM so that I can do some proper tests.

 

So I booted a 17.12.3 9800-CL because the AP (9166i) had 17.12.3 software as its backup image.

It joined the WLC, as expected.

Now trying the same thing with the 9166D1.

If successful I will upgrade that WLC to 17.15.1 and see what happens.

Radioactive trace and console on the AP for, perhaps, interesting output.

 

PS: 9166D1 joined 17.12.3 as well.

The APs of course already have the 17.15.1 software, so there is no real upgrade.

I could of course overwrite the 17.12.3 part, but right now I don't really want to.

[*08/31/2024 14:07:52.0135] Image pre-download request for version 17.15.1.6.
[*08/31/2024 14:07:52.0654] status 'upgrade.sh: Script called with args:[NO_UPGRADE]'
[*08/31/2024 14:07:52.1104] do NO_UPGRADE, part2 is active part

 

PS: Nothing interesting in the radioactive trace.

9800 activating "new" image on the AP:

[*08/31/2024 14:13:15.1444] status 'upgrade.sh: Script called with args:[ACTIVATE]'
[*08/31/2024 14:13:15.1796] do ACTIVATE, part2 is active part
[*08/31/2024 14:13:15.2506] status 'upgrade.sh: Verifying image signature in part1'
[*08/31/2024 14:13:28.4010] status 'upgrade.sh: status 'Successfully verified image in part1.''
[*08/31/2024 14:13:28.4263] status 'upgrade.sh: activate part1, set BOOT to part1'
[*08/31/2024 14:13:28.7150] status 'upgrade.sh: AP primary version after reload: 17.15.1.6'
[*08/31/2024 14:13:28.7327] status 'upgrade.sh: AP backup version after reload: 17.12.3.31'
[*08/31/2024 14:13:28.8844] status 'upgrade.sh: Create after-upgrade.log'
[*08/31/2024 14:13:28.8981] Ending image activate task
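To keep track of which image ends up as primary and which as backup across these reboots, the two "version after reload" lines can be pulled out of a saved console log. This is only a minimal sketch: the function name `versions_after_reload` is made up for illustration, and it assumes the console output keeps the exact wording shown in the log above.

```python
import re

# Matches the two "version after reload" status lines printed by upgrade.sh
# in the console output above. The wording is assumed to stay the same.
VERSION_RE = re.compile(r"AP (primary|backup) version after reload: ([\d.]+)")


def versions_after_reload(console_text):
    """Return a dict such as {'primary': '...', 'backup': '...'} from console text."""
    found = {}
    for role, version in VERSION_RE.findall(console_text):
        found[role] = version  # a later ACTIVATE run overwrites an earlier one
    return found


# Two lines copied from the console output above.
sample = """\
[*08/31/2024 14:13:28.7150] status 'upgrade.sh: AP primary version after reload: 17.15.1.6'
[*08/31/2024 14:13:28.7327] status 'upgrade.sh: AP backup version after reload: 17.12.3.31'
"""

print(versions_after_reload(sample))
```

Run against the full console capture, this gives a quick check that part1 really holds 17.15.1.6 and part2 holds 17.12.3.31 after the ACTIVATE step.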

And I think we are back to infinite reload.

It actually never seems to do any discovery, whether via option 43, DNS, or broadcast (full console output attached, from after the upgrade above and a couple of reboots). I think CAPWAP is broken.
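For anyone who wants to rule out the DHCP side when reproducing this: for Cisco lightweight APs, option 43 is normally a small TLV, with type 0xf1, a length of 4 bytes per controller, and then the controller management IPs. A minimal sketch of building that hex string follows. The helper name `capwap_option43_hex` and the address 192.0.2.10 are placeholders of mine, not values from this lab.

```python
import ipaddress


def capwap_option43_hex(controller_ips):
    """Build the option 43 hex string: type 0xf1, length, then the IPv4 addresses."""
    payload = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    # The length byte is the number of address bytes (4 per controller).
    return (bytes([0xF1, len(payload)]) + payload).hex()


# 192.0.2.10 is a documentation address used as a placeholder WLC IP.
print(capwap_option43_hex(["192.0.2.10"]))  # prints f104c000020a
```

In this case it would not help much, since the AP never gets as far as sending a Discovery Request, but it removes one variable when testing with a fresh WLC.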

The AP's "boot life" ends with:

[*08/31/2024 14:18:41.0489] dtls_enable_sudi: Unable to load RSA SUDI certificate from ACT2, rc: 259
[*08/31/2024 14:18:41.0490] dtls_init: Unable to load SUDI certificate
[*08/31/2024 14:18:41.0491] dtls_init: MIC certificate not present
[*08/31/2024 14:18:41.0491] dtls_init: Unable to load device certificate
[*08/31/2024 14:18:41.0491] DTLS Initialization failed. Status (3)
[*08/31/2024 14:18:41.0744] AP Rebooting: Reset Reason - DTLS init failed
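Since the AP loops through several reboots, it can help to count the reset reasons in a saved console capture rather than scrolling through it. A minimal sketch, assuming the "AP Rebooting: Reset Reason - ..." wording shown above; the function name `reset_reason_counts` is hypothetical.

```python
import re
from collections import Counter

# Matches the reboot line seen at the end of the console output above.
RESET_RE = re.compile(r"AP Rebooting: Reset Reason - (.+)")


def reset_reason_counts(console_text):
    """Count how often each reset reason appears in a console capture."""
    return Counter(m.group(1).strip() for m in RESET_RE.finditer(console_text))


# Two lines copied from the console output above.
sample = """\
[*08/31/2024 14:18:41.0491] DTLS Initialization failed. Status (3)
[*08/31/2024 14:18:41.0744] AP Rebooting: Reset Reason - DTLS init failed
"""

print(reset_reason_counts(sample))
```

If every reboot on 17.15.1 shows the same "DTLS init failed" reason, that points at the certificate load failing before discovery even starts, rather than at anything on the WLC side.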

On the WLC, the following shows up in the radioactive trace I had running. Now I wonder what LUID is.

But still, the AP does not even seem to try to discover the WLC, so I would think that this is something else (perhaps from the conditional global debug).

2024/08/31 15:14:00.454334457 {wncd_x_R0-0}{1}: [image-dwnld-mgr] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.461331105 {wncd_x_R0-0}{1}: [capwapac-smgr-srvr] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480765607 {wncd_x_R0-0}{1}: [ap-join-info-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480768512 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480768763 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.480866037 {wncd_x_R0-0}{1}: [errmsg] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.534690341 {wncd_x_R0-0}{1}: [ewlc-dtls-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.548728313 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.622810296 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.622925214 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.627174269 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.635467380 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.682650223 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:00.850667892 {wncd_x_R0-0}{1}: [sanet-shim-miscellaneous] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.252595255 {wncd_x_R0-0}{1}: [apmgr-ap-global] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.260716979 {wncmgrd_R0-0}{1}: [ewlc-infra-evq] [13930]: (note): LUID Resolve Failed
2024/08/31 15:14:01.402285254 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.519258109 {wncd_x_R0-0}{1}: [tdllib] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535143549 {wncd_x_R0-0}{1}: [capwapac-smgr-srvr] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535161082 {wncd_x_R0-0}{1}: [ap-join-info-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535161964 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535162144 {wncd_x_R0-0}{1}: [capwapac-smgr-sess-fsm] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535434141 {wncd_x_R0-0}{1}: [ewlc-dtls-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535479307 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535674338 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535717650 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535759369 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.535817009 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.535967234 {wncd_x_R0-0}{1}: [capwapac-smgr-sess] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536067004 {wncd_x_R0-0}{1}: [sanet-shim-miscellaneous] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536410026 {wncd_x_R0-0}{1}: [apmgr-ap-global] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:01.536807271 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:01.537855200 {wncmgrd_R0-0}{1}: [ewlc-infra-evq] [13930]: (note): LUID Resolve Failed
2024/08/31 15:14:03.499908636 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:03.499937530 {wncd_x_R0-0}{1}: [apmgr-db] [14321]: (note): LUID Resolve Failed
2024/08/31 15:14:03.747085742 {iosrp_R0-0}{1}: [bfd] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:04.452879283 {iosrp_R0-0}{1}: [bfd] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.301863354 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.301991998 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.302119420 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.302246793 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
2024/08/31 15:14:09.430244530 {iosrp_R0-0}{1}: [parser_cmd] [24971]: (note): LUID Resolve Failed
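To see at a glance which processes and modules log "LUID Resolve Failed", and at which severity, the trace can be tallied. This is a minimal sketch that assumes the line layout shown above; the function name `tally_luid_failures` is made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the trace above:
# <timestamp> {process}{n}: [module] [pid]: (severity): LUID Resolve Failed
LUID_RE = re.compile(
    r"\{(\S+?)\}\{\d+\}: \[([^\]]+)\] \[\d+\]: \((\w+)\): LUID Resolve Failed"
)


def tally_luid_failures(trace_text):
    """Count LUID failures per (process, module, severity)."""
    return Counter(m.groups() for m in LUID_RE.finditer(trace_text))


# Four lines copied from the trace above.
sample = """\
2024/08/31 15:14:00.454334457 {wncd_x_R0-0}{1}: [image-dwnld-mgr] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.622810296 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:00.622925214 {wncd_x_R0-0}{1}: [rrm-client] [14321]: (ERR): LUID Resolve Failed
2024/08/31 15:14:09.301863354 {iosrp_R0-0}{1}: [pki] [24971]: (note): LUID Resolve Failed
"""

for key, count in tally_luid_failures(sample).most_common():
    print(count, key)
```

The same message coming from unrelated modules such as bfd, pki and parser_cmd supports the guess that it is a side effect of the global conditional debug rather than something specific to this AP.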

 

PS: When I do not have conditional debug global enabled, nothing shows up in the radioactive trace for the AP, as expected.

RIP 9166 models on 17.15.1 is my current conclusion.

 

I have enormous respect for your efforts, but that conclusion is not mine. As an engineer, I would like to see the issue confirmed by different sources and with different APs of the same model. From a 'world view', that conclusion comes too soon.

M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    The mirror then always responds with ' The only thing that exceeds your brilliance is your beauty! '

Yes, I know. It is also only my "current" conclusion; I did not say that I cannot change my mind *smiley face*

... Would anyone else like to try some 9166 APs on 17.15.1 and do me that favour?

Because the AP reverts to 17.12.3 once in a while, it actually tries to join the WLC on that software, but then of course resets to its other image, which is 17.15.1. At least I can now compare the two start-ups (even though the console output of the AP seems to have changed a lot between versions).

If anyone wants to see, I have attached them here.

I was also thinking corrupt image, but surely Cisco would have tested everything right .... RIGHT !!!

(All APs were upgraded directly on the same LAN as the WLC, so there were no low-MTU or lossy WAN links in between.)

Might be a good time to start using HTTPS downloads, because I think the "lossy links" story was just an excuse for the poor-quality code which makes the 9800 CAPWAP downloads and image verification inherently unreliable. It's not rocket science, but you'd think it was by how they've struggled to get something so basic working correctly, which worked fine for years on the old code!

I really do like the TCP downloads (as I call them); they are so much faster (and inherently more reliable) than the, ehh, "old ones".