06-04-2024 09:28 AM
I've got a mix of 9120 and 9130 APs. We run EWC controllers on a handful of them. I think they would all work fine with all using EWC images, but I thought that it may be best to only allow the EWC functionality on a small subset of APs. All are running in Flex mode. I have one AP that I use at my home office (remote worker) that is running in flex mode as well. Whenever I do IOS upgrades at the HQ, all the upgrades complete fine, but I have issues with my AP at home. Connected via VPN of course. I understand that with the mix of 9120/9130 APs that I need an external tftp server, and I do have one. It works well, and is accessible to all of the APs.
My AP at home also works well, and runs the current IOS software fine. I recently had it replaced on RMA because I managed to brick it trying to get this image on the unit with brute force. When I go to the Administraton - Software Management page on my EWC, it shows all of the other APs have these versions:
Primary Image (AP/Controller)
17.9.5.47 /17.09.05.0.6450
Backup Image (AP/Controller)
17.9.4.27 /17.09.04a.0.6
I have 40 APs total, and maybe 5 of them are running EWC images. The rest have been converted to CAPWAP. Mine at home also seemed to work fine when running an EWC image but this issue appears after converting to CAPWAP. I was worried that it would not be good to leave a remote AP that should never be the controller, as an EWC because I didn't want it to end up ever accidentally becoming the controller for the HQ network.
When I look at this AP in my list of software, it shows this:
Primary Image (AP/Controller)
17.9.5.47 /
Backup Image (AP/Controller)
17.9.4.27 /
Note that the controller image is missing completely.
When I boot up this AP, it boots up, connects and registers and works fine, but, it then tries to download an image that I assume is the controller images, and it never succeeds. I'll paste below some of the relevant log lines to see if that helps make sense. In this case, I see it join the controller, which is 10.111.50.10. Then, for some reason, it selects the lowest IP address AP and calls it the "primary server", and sets that to 10.111.50.20. Not sure why it doesn't point it to the tftp server?
[*06/04/2024 15:08:18.5151] OOBImageDnld: OOB Image Download in ap_cap_bitmask(2)
[*06/04/2024 15:08:18.5161] Sending Join request to 10.111.50.10 through port 5259, packet size 1376
[*06/04/2024 15:08:23.2671] OOBImageDnld: OOB Image Download in ap_cap_bitmask(2)
[*06/04/2024 15:08:23.2671] Sending Join request to 10.111.50.10 through port 5259, packet size 896
[*06/04/2024 15:08:23.2911] Join Response from 10.111.50.10, packet size 917
[*06/04/2024 15:08:23.2911] AC accepted previous sent request with result code: 0
[*06/04/2024 15:08:23.7731] Received wlcType 2, timer 30
[*06/04/2024 15:08:23.7771] Primary-Replica: role 3, primary server '10.111.50.20', old 0
[*06/04/2024 15:08:23.7861] OOBImageDnld: Do common error handler for OOB image download..
[*06/04/2024 15:08:23.8681] status 'upgrade.sh: Script called with args:[CANCEL]'
[*06/04/2024 15:08:23.9111] do CANCEL, part2 is active part
[*06/04/2024 15:08:23.9351] status 'upgrade.sh: Cleanup tmp files ...'
[*06/04/2024 15:08:23.9601] Primary-Replica: Set to Replica, retry 44, set_tftp_server 0, Primary server 10.111.50.20, Join role Replica, tftp_timeoutCnt 3
[*06/04/2024 15:08:23.9661] Image pre-download request for version 17.9.5.47.
[*06/04/2024 15:08:24.0061] status 'upgrade.sh: Script called with args:[NO_UPGRADE]'
[*06/04/2024 15:08:24.0491] do NO_UPGRADE, part2 is active part
[*06/04/2024 15:08:24.0541] Resetting replica AP upgrade mode to local mode
It runs a script and sees that the primary image needs no upgrade, and I see this:
[*06/04/2024 15:08:24.1141]
[*06/04/2024 15:08:24.1141] CAPWAP State: Image Data
[*06/04/2024 15:08:24.1151] AP image version 17.9.5.47 backup 17.9.4.27, Controller 17.9.5.47
[*06/04/2024 15:08:24.1151] Version is the same, do not need update.
[*06/04/2024 15:08:24.1551] status 'upgrade.sh: Script called with arg
Then it joins the controller:
[*06/04/2024 15:08:24.6041] CAPWAP State: Run
[*06/04/2024 15:08:24.6581] AP has joined controller MY-EWC-C9800-1
[*06/04/2024 15:08:24.6581] wtpHandleImageSwapEwlcmeTimer: time=0
[*06/04/2024 15:08:24.6601] Flexconnect Switching to Connected Mode!
[*06/04/2024 15:08:24.9321] Previous AP mode is 2, change to 2
Now it does some additional comparing of EWC images and starts to try to download it. Again, from 10.111.50.20, which is not the tftp server:
[*06/04/2024 15:08:25.1961] Recvd Image identifier Payload Image type is 1
[*06/04/2024 15:08:25.1961]
[*06/04/2024 15:08:25.1961] Recvd Image identifier Payload version len is 15
[*06/04/2024 15:08:25.1961]
[*06/04/2024 15:08:25.1961] Recvd Image identifier Payload Image version is [17.09.05.0.6450]
[*06/04/2024 15:08:25.1961]
[*06/04/2024 15:08:25.2011] EWLC image primary len=4 IP=10.111.50.20
[*06/04/2024 15:08:25.2011] Controller eWLC image version len: 15 val[17.09.05.0.6450]
[*06/04/2024 15:08:25.2411] status 'upgrade.sh: Script called with args:[COMPARE_MEWLC_VERSION 17.09.05.0.6450]'
[*06/04/2024 15:08:25.2861] do COMPARE_MEWLC_VERSION, part2 is active part
[*06/04/2024 15:08:25.3121] status 'upgrade.sh: MEWLC version: Ctrl 17.09.05.0.6450, AP , src='
[*06/04/2024 15:08:25.3171] eWLC image_download deleted /part1/wlc/shared/me_controller_image_upgrade /part2/mewlc_info.ver
[*06/04/2024 15:08:25.3171] Downloading EWC image from primary 10.111.50.20
[*06/04/2024 15:08:25.3171] EWC capable
[*06/04/2024 15:08:25.3481] /usr/bin/cli_scripts/image_download.sh called with args:[-c]
[*06/04/2024 15:08:25.3491] Cleanup eWLC files ...
[*06/04/2024 15:08:25.4211] Got WSA Server config TLVs
[*06/04/2024 15:08:27.6563]
[*06/04/2024 15:08:27.6563] *** Unable to connect to: 127.0.0.1:4040 - [Errno 111] Connection refused
[*06/04/2024 15:08:27.6563]
[*06/04/2024 15:08:30.1303]
[*06/04/2024 15:08:30.1303] *** Unable to connect to: 127.0.0.1:4040 - [Errno 111] Connection refused
[*06/04/2024 15:08:30.1303]
[*06/04/2024 15:08:30.3573] Efficient join download did not start
[*06/04/2024 15:08:30.3573] Efficient join download did not start
[*06/04/2024 15:08:30.3573] Efficient join download did not start
[*06/04/2024 15:08:30.3573] Efficient join download did not start
[*06/04/2024 15:08:30.3573] Efficient join download did not start
[*06/04/2024 15:08:32.1803]
[*06/04/2024 15:08:32.1803] *** Unable to connect to: 127.0.0.1:4040 - [Errno 111] Connection refused
[*06/04/2024 15:08:32.1803]
[*06/04/2024 15:08:32.3453] Re-Tx Count=1, Max Re-Tx Value=5, SendSeqNum=17, NumofPendingMsgs=12
[*06/04/2024 15:08:32.3453]
[*06/04/2024 15:08:32.4043] AP tag change to MY-REMOTE-PT
[*06/04/2024 15:08:32.4043] Recvd FQDN Config Payload
[*06/04/2024 15:08:32.4043]
[*06/04/2024 15:08:35.3573] Efficient join download did not start
[*06/04/2024 15:08:40.3573] Efficient join download did not start
[*06/04/2024 15:08:45.3573] Efficient join download did not start
[*06/04/2024 15:08:50.3573] Efficient join download did not start
[*06/04/2024 15:08:53.8933] set cleanair [slot0][band0] enabled
[*06/04/2024 15:08:53.9093] set cleanair [slot0][band1] enabled
[*06/04/2024 15:08:53.9413] set cleanair [slot1][band1] enabled
[*06/04/2024 15:08:55.3573] Efficient join download did not start
I see all the notes for "Efficient join download did not start". Not sure why/what that's about.
Then it actually starts to download the image:
[*06/04/2024 15:08:55.4343] % Total % Received % Xferd Average Speed Time Time Time Current
[*06/04/2024 15:08:55.4343] Dload Upload Total Spent Left Speed
36 288M 36 105M 0 0 62684 0 1:20:30 0:29:25 0:51:05 0
36 288M 36 105M 0 0 62684 0 1:20:30 0:29:25 0:51:05 62684
This paste may be hard to read because of the spacing not showing up, but basically it knows theres a 288M file to download. It starts downloading it, and goes pretty slowly. It gets up to about 105M or so and then dies. Generally it dies around the same time.
Then, it starts a countdown to try again, and does the whole thing over again.
[*06/04/2024 15:38:25.3996] Error in transferring file 28
[*06/04/2024 15:38:25.3996] Failed command [tftp://10.111.50.20/mewlc.bin]
[*06/04/2024 15:38:25.4176] Retrying EWC image in 65 seconds
[*06/04/2024 15:39:27.0305] Controller eWLC image version len: 15 val[17.09.05.0.6450]
[*06/04/2024 15:39:27.0705] status 'upgrade.sh: Script called with args:[COMPARE_MEWLC_VERSION 17.09.05.0.6450]'
[*06/04/2024 15:39:27.1165] do COMPARE_MEWLC_VERSION, part2 is active part
[*06/04/2024 15:39:27.1425] status 'upgrade.sh: MEWLC version: Ctrl 17.09.05.0.6450, AP , src='
[*06/04/2024 15:39:27.1475] eWLC image_download deleted /part1/wlc/shared/me_controller_image_upgrade /part2/mewlc_info.ver
[*06/04/2024 15:39:27.1485] Downloading EWC image from primary 10.111.50.20
[*06/04/2024 15:39:27.1485] EWC capable
[*06/04/2024 15:39:27.1755] /usr/bin/cli_scripts/image_download.sh called with args:[-c]
[*06/04/2024 15:39:27.1755] Cleanup eWLC files ...
[*06/04/2024 15:39:32.2685] Efficient join download did not start
[*06/04/2024 15:39:32.2685] Efficient join download did not start
[*06/04/2024 15:39:32.2685] Efficient join download did not start
[*06/04/2024 15:39:32.2685] Efficient join download did not start
[*06/04/2024 15:39:32.2685] Efficient join download did not start
[*06/04/2024 15:39:37.2685] Efficient join download did not start
[*06/04/2024 15:39:42.2685] Efficient join download did not start
[*06/04/2024 15:39:47.2685] Efficient join download did not start
[*06/04/2024 15:39:52.2685] Efficient join download did not start
[*06/04/2024 15:39:57.2685] Efficient join download did not start
[*06/04/2024 15:39:57.3405] % Total % Received % Xferd Average Speed Time Time Time Current
[*06/04/2024 15:39:57.3405] Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:14 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:19 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:24 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:29 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:34 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:39 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:44 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:49 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:54 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:00:59 --:--:-- 0Efficient join download did not start
0 0 0 0 0 0 0 0 --:--:-- 0:01:00 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:01:00 --:--:-- 0
[*06/04/2024 15:41:02.2704] Efficient join download did not start
[*06/04/2024 15:41:02.2704] Error in transferring file 28
[*06/04/2024 15:41:02.2704] Failed command [tftp://10.111.50.20/mewlc.bin]
[*06/04/2024 15:41:02.2724] Retrying EWC image in 65 seconds
I'm not sure what I can do about this. In an ideal world, I would log on to the AP or controller, and tell it to go get the EWC image off the tftp server and install it. This should work fine. The VPN generally works great for everything, and I'm connected through it all day long. It's a site-to-site hardware VPN and is real reliable. I don't know if I am just not finding a good command to use to force a single AP to download an image from the tftp server, but I'm just at a loss as to getting this AP to be fully upgraded.
The biggest issue isn't that the AP doesn't work. It's that it just continuously all day will keep retrying to download that image, never succeeding. At this point the only way I think I could surely get it upgraded is if I put together another local EWC, and temporarily join it to that one here at home, and then let it re-join the HQ controllers after that.
Any tips to help me overcome this? I'd love to hear that I can download the image direct to the AP from any tftp server, but I'm just not finding the magic combination of commands to get a local download completed.
09-01-2024 10:50 PM
1. Efficient image upgrade tries to copy the image from a neighbouring AP. Should only ever be used when your APs all sit next to each other on a LAN - not over a VPN! Disable it in the flex profile:
wireless profile flex default-flex-profile
no predownload
If you're using a custom flex profile then obviously do it in the custom flex profile.
The name of the command is completely misleading - it does not disable predownload, just efficient image upgrade. One of the many commands where the IOS-XE developer who wrote the code didn't understand what it was doing!
2. 9800 series IOS-XE has major problems with image downloads being corrupted:
https://www.cisco.com/c/en/us/support/docs/field-notices/741/fn74109.html
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220443-how-to-avoid-boot-loop-due-to-corrupted.html
https://www.cisco.com/c/en/us/support/docs/wireless/wireless-lan-controller-software/221869-safely-upgrade-access-points-avoiding-i.html
You'll see me and others talking about this at length in other threads.
3. Recovery: personally I have found the safest way to recover a 9120 in this state (I've had to do it a few times) is to completely re-flash it. This also removes all traces of EWC. https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9120axi-access-point/217537-repairing-c9120-c9115-access-points-from.html
That will leave it with 8.10.130.0 CAPWAP image installed.
Then upgrade it manually to 8.10.196.0 CAPWAP using archive download-sw /reload tftp://<local TFTP IP>/ap1g7-k9w8-tar.153-3.JK11.tar
Download from https://software.cisco.com/download/home/286322988/type/286288051/release/15.3.3-JK11
Then upgrade to 17.6.7 CAPWAP using archive download-sw /reload tftp://<local TFTP IP>/ap1g7-k9w8-tar.153-3.JPK8.tar
Download from https://software.cisco.com/download/home/286322988/type/286288051/release/15.3.3-JPK8
Then upgrade to version of your choice. The mapping for AP image version to IOS-XE is in the compatibility matrix (link below).
Once you have reached the correct version you could then convert to EWC if you wanted to but I recommend keeping it on CAPWAP only.
Make sure you use a local TFTP server (for example your PC) - do not try doing it over your VPN.
Once you're on recent code it also supports http and sftp for the download - use ? after /reload to see the options. So for a quicker download you could use those if you have an SFTP server (eg FileZilla) or a web server.
As for what version to go for - I'd usually say go for the TAC recommended version (link below) but in this case 17.15.1 should have all the latest fixes for the download and image corruption issues so you might want to go for that.
Good luck ...
Discover and save your favorite ideas. Come back to expert answers, step-by-step guides, recent topics, and more.
New here? Get started with these tips. How to use Community New member guide