3802i AP and Migration from 5508 to v9800-CL

russell.s
Level 1

Hi All,

Thank you for taking the time to look at this post.  I've got a bit of a situation: I am tasked with migrating all of our APs (3802i), site by site, from a pair of physical WLCs at the datacenter to a pair of virtual WLCs in the cloud.

So a bit about the setup: 

  • All sites are connected via Cisco SDWAN.  There are 2 Cloud vWLC 9800s running 17.9.4a (Cisco Recommended). 
  • At the Datacenter, there are 2x Physical 5508 WLCs.  These Physical WLCs are running 8.5.176.1
  • AP's at each site are Cisco 3802i running a primary version of 8.5.176.1

These sites on SDWAN are on 300+ Mbps links, and many are provided with 1 Gig DIA.

The challenge I am having is that the first 3 sites took 1 hour to download their image from the new vWLC in the cloud.  The last site I did had 2 APs and it took 2 hours 11 minutes per AP.  This is CRAZY!!  Tomorrow I have to migrate a site with 7 APs, and if the time is similar to yesterday, it could surpass 15 hours to migrate the 7 APs from the physical on-prem WLCs to the vWLCs.
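As a rough check on that estimate, here is a minimal sketch that assumes the 7 APs download one after another at the 2 hours 11 minutes per AP seen at the last site. That is a sequential worst case, not a measured result.

```python
from datetime import timedelta

per_ap = timedelta(hours=2, minutes=11)  # per-AP download time seen at the last site
ap_count = 7                             # APs at the next site

# Assumption: APs download strictly one after another at the same rate.
total = per_ap * ap_count
print(total)  # 15:17:00
```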

The upgrade path I am taking is as follows. (Had a tac case to get to this):

  1. Update DHCP Option 43 at Site L3 Switch
  2. Login to 5508 WLC, Navigate to AP, Click "Clear All Config"
  3. AP Reboots, Gets Option 43, Reaches out to Cloud vWLC
  4. Sits in "Downloading" state for 1-2 hours
  5. Reboots
  6. Registers awaiting assignment of Join Profile

In doing some googling, I found something about being able to use a TFTP server at the site to pre-stage/download the AP software ahead of moving to the new controller.  So, I've SSH'd into the AP and found the commands, but I cannot find the software that the AP should be running.  Looking at the Cisco site, I only find "ap3g3-k9w8-tar.153-3.JPS.tar".  Maybe I am missing something here, because 15 < 17.  APs that have been migrated show that they are running 17.9.4.27 code.  I've looked on the vWLC and I do not find this code.

I am kinda at a loss here as I feel that there has to be a better way to do this migration.  The 5508 WLCs are EoL and we are trying to get off of them before they die on us.  We are not intending to make any sort of configuration change on the 5508 ahead of powering these devices off.  During each change window, we are migrating ALL of the APs at the site to the new vWLC and not planning on having any sort of roaming between the APs during the migration. 

 

Any help, suggestions, comments would be greatly helpful.  I've seen a few other posts on here, but they were lacking information such as versions, model numbers, etc...  I've tried to paint a good picture to help understand the current situation and the end goal.

The TAC case mentioned above was opened because the AP was NOT registering with the cloud vWLC.  The details there are that the AP did not trust the new vWLC's certificate, and thus the AP (not the WLC) halted registration and went back and registered to the 5508 WLCs.  With TAC on the call, we did a factory reset of the AP and this allowed it to join and download (1hr) the upgrade.  This is what has led me to the procedure that I've outlined above.

[Attachments: Migrated-AP.png, Non-Migrated-AP.png]

1 Accepted Solution

@Leo Laohoo wrote:


@russell.s wrote:
What were you coming from?  Was the 9800 onsite?

4.  Instruct the WLC to "tell" the AP to download the 17.9.4a firmware:  debug ap command "archive download-sw /no-reload tftp://<IP ADDRESS>/ap3g3-k9w8-tar.153-3.JPN3.tar" <AP NAME>


I was able to successfully migrate all 7 APs in 2 hours.  It was a rocky start, but this command was pivotal in the process.

!=======================================
! SSH to AP, Tell AP to download from TFTP
!=======================================

  1. AP# archive download-sw /no-reload tftp://10.x.y.z/ap3g3-k9w8-tar.153-3.JPN3.tar  (took under 1 minute)
  2. AP# config boot path 2
  3. AP# capwap ap erase [all]

AP will reboot about 3 or 4 times, and then register with the WLC in the cloud via Option 43 configured in DHCP.

The reason I went with SSH'ing into the AP directly is that a few (3) of the APs failed to download the image properly.  SSH'ing into the AP didn't fix this, but it allowed me to see the error message.

On a whim, I rebooted 1 of the 3 APs that were failing to download the file from TFTP.  Upon reboot, they downloaded the file properly and worked perfectly.

Timing @ 6 minutes:

WAP 01 3:25pm - capwap ap erase all
WAP 01 3:31pm - CDP Neighbors Seen
WAP 01 3:31pm - In WLC for Configuration

-------


MY-FAVORITE-WAP01#archive download-sw /no-reload tftp://10.x.y.z/ap3g3-k9w8-tar.153-3.JPN3.tar

############################################################################################################################################################################ 100.0%
Image downloaded, writing to flash...
do PREDOWNLOAD, part1 is active part
Image signing verify success.

[5/15/2024 20:50:24] Programming master with bundle version 148
Programming master with bundle version 148
Magic : 62 6F 6F 74 69 6E 66 6F
Master:
Version : 0x0148
Source : bundle
Retry : 0x1
Timestamp : 1715806163
Shadow:
Version : 0x0124
Source : flash
Retry : 0x1
Timestamp : 1611273600
Blacklisted:
Version : 0x0000
Source : unknown

[5/15/2024 20:50:35] Master now upgraded
Master now upgraded
upgrade.sh: part to upgrade is part2
upgrade.sh: Writing image to disk...
sh: 0: unknown operand
upgrade.sh: AP backup version: 17.9.4.27
do ACTIVATE, part1 is active part
upgrade.sh: activate part2, set BOOT to part2
upgrade.sh: AP primary version: 17.9.4.27
upgrade.sh: AP backup version: 8.5.176.1
Archive done.

MY-FAVORITE-WAP01#sh ver | be 3802

MY-FAVORITE-WAP01 uptime is 39 days, 1 hours, 06 minutes
Last reload time : Fri Sep 1 00:39:09 UTC 2023
AP Running Image : 8.5.176.1
Primary Boot Image : 8.5.176.1
Backup Boot Image : 17.9.4.27
1 Multigigabit Ethernet interfaces
1 Gigabit Ethernet interfaces
2 802.11 Radios

MY-FAVORITE-WAP01#config boot path 2

MY-FAVORITE-WAP01#capwap ap erase all
This command will clear ap config and reboot the AP.
Are you sure you want continue? [confirm]
MY-FAVORITE-WAP01#
Remote side unexpectedly closed network connection
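Before running "config boot path 2", it is worth confirming that the backup slot really holds the new image, as the "sh ver" output above shows. The following is a minimal sketch of that check. The helper name boot_images is hypothetical, and it only parses text in the format shown in this transcript.

```python
def boot_images(show_version_text):
    """Collect the 'AP Running / Primary Boot / Backup Boot Image' fields."""
    images = {}
    for line in show_version_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "Image" in key:
            images[key.strip()] = value.strip()
    return images

# Lines copied from the transcript above.
sample = """AP Running Image : 8.5.176.1
Primary Boot Image : 8.5.176.1
Backup Boot Image : 17.9.4.27
"""

images = boot_images(sample)
ready_to_switch = images.get("Backup Boot Image") == "17.9.4.27"
print(images, ready_to_switch)
```

If ready_to_switch is False, the pre-download has probably not completed and switching the boot path would not bring up the new image.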

 

 

 

 


11 Replies

Leo Laohoo
Hall of Fame

Yeah, f*ck no.  I migrated about 2000 x 4800 APs (from 8.10.13X.0) to 9800 and the outage (per AP) was 4 minutes and 45 seconds flat.  And, here's the fun bit, I did this during office & clinical hours, across sites (plural) with different buildings in a 24x7 hospital campus.

But, sadly, I "lost" 2 of the APs (unrecovered and boot into u-boot).  However, the APs were easily replaced.

Leo, Sorry for the delay in responding... SOMEONE marked my post as SPAM.  WHOEVER THAT WAS SHOULD HAVE THEIR ACCESS REVOKED.  READ WHAT CISCO DEFINES SPAM AS BEFORE YOU MARK IT AS SPAM!!!🤬 I'm just a bit pissed.

As John Oliver would say, Moving On.... Could you expand on your topology?  I've run into MTU issues in the past with SDWAN and WLCs.  I don't know that this is the issue currently, as there is no feedback indicating that this might be an issue.

In your post, you said you went to a 9800.  What were you coming from?  Was the 9800 onsite?  Needless to say, I am envious of your sub 5 minute time.


@russell.s wrote:
Leo, Sorry for the delay in responding... SOMEONE marked my post as SPAM.  WHOEVER THAT WAS SHOULD HAVE THEIR ACCESS REVOKED.  READ WHAT CISCO DEFINES SPAM AS BEFORE YOU MARK IT AS SPAM!!!🤬 I'm just a bit pissed.

As John Oliver would say, Moving On.... Could you expand on your topology?  I've run into MTU issues in the past with SDWAN and WLCs.  I don't know that this is the issue currently, as there is no feedback indicating that this might be an issue.


All good.  TLDR:  I'm sure this was all an honest mistake.  

 


@russell.s wrote:
What were you coming from?  Was the 9800 onsite?

The APs were coming from 8.10.13X.0 to 9800 on 17.6.5.  

This is the process:  
NOTE:  Destination firmware is 17.9.4a. 

1.  Download the 17.9.4a firmware for the 2800/3800/4800/1560 family of APs (filename:  ap3g3-k9w8-tar.153-3.JPN3.tar) and put the file on a TFTP server.  
NOTE:  Depending on the size of the WAN link, congestion, and utilization, consider putting the file "closer" to the site.  

2.  Get a list of APs (WLC Command:  sh ap summary)

3.  Get the AireOS WLC to prepare to send a "remote command" to the AP:  debug ap enable <AP NAME>

4.  Instruct the WLC to "tell" the AP to download the 17.9.4a firmware:  debug ap command "archive download-sw /no-reload tftp://<IP ADDRESS>/ap3g3-k9w8-tar.153-3.JPN3.tar" <AP NAME>

5.  Depending on the speed of the link of the AP and the TFTP server (where the firmware file is stored), it will take 60 seconds for the AP to download the firmware and another 30 seconds to extract the files into the AP flash.  So wait for 2 minutes before proceeding to Step 6.  

6.  Configure the AP to immediately move to the new controller:  config ap primary-base <9800 NAME> <AP NAME> <9800 IP ADDRESS>

7.  Forcefully reboot the AP:  config ap reset <AP NAME>

IMPORTANT:  When entering the command in Step 6, the command to forcefully reboot the AP (Step 7) must follow immediately. 
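For a site with several APs, the per-AP commands from steps 3, 4, 6 and 7 can be generated rather than typed. This is a minimal sketch that only fills in the command strings quoted above. The function name, AP names, controller name and IP addresses are all hypothetical placeholders.

```python
def aireos_migration_commands(ap_name, tftp_ip, wlc_name, wlc_ip,
                              image="ap3g3-k9w8-tar.153-3.JPN3.tar"):
    """Return (pre-download, move) AireOS WLC command lists for one AP."""
    predownload = [
        f"debug ap enable {ap_name}",                                   # step 3
        f'debug ap command "archive download-sw /no-reload '
        f'tftp://{tftp_ip}/{image}" {ap_name}',                         # step 4
    ]
    move = [
        f"config ap primary-base {wlc_name} {ap_name} {wlc_ip}",        # step 6
        f"config ap reset {ap_name}",                                   # step 7
    ]
    return predownload, move

# Hypothetical values for illustration only.
pre, move = aireos_migration_commands("WAP01", "192.0.2.10", "CLOUD-9800", "192.0.2.20")
for cmd in pre + move:
    print(cmd)
```

Keeping the two lists separate mirrors the wait in step 5: paste the pre-download commands, wait for the download to finish, then paste the move commands back-to-back.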

 

I downloaded the files and placed them on a TFTP server local to the site.  The commands above worked to download the file from the TFTP server in under 1 minute.  I executed Steps 6 and 7 back-to-back.  The AP rebooted, but never came up on the new controller.  I SSH'd into the AP and found that it had reverted back to the old controller.

I grabbed the following from the AP.  10.x.x.x is the old controllers, and 10.y.y.y is the new controller.

May 15 17:12:53 kernel: [*05/15/2024 17:12:53.8768] Discovery Response from 10.x.x.x
May 15 17:12:53 kernel: [*05/15/2024 17:12:53.8861] Discovery Response from 10.x.x.x
May 15 17:12:53 kernel: [*05/15/2024 17:12:53.8938] Discovery Response from 10.y.y.y
May 15 17:12:53 kernel: [*05/15/2024 17:12:53.9016] Discovery Response from 10.x.x.x
May 15 17:12:53 kernel: [*05/15/2024 17:12:53.9094] Discovery Response from 10.x.x.x
May 15 17:12:53 kernel: [*05/15/2024 17:12:53.9351] Discovery Response from 10.y.y.y
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.0000]
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.0000] CAPWAP State: DTLS Setup
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5417] spamCheck_valid_vWLC_X509: SSC Hash not allowed
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5417]
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5528] display_verify_cert_status: Verify Cert: FAILED at 1 depth: self signed certificate in certificate chain
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5535] dtls_verify_con_cert: Controller certificate verification error
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5535] dtls_process_packet: Controller certificate verification failed
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5556] sendPacketToDtls: DTLS: Closing connection 0xbb6c00.
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.5557] Restarting CAPWAP State Machine.
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.6600]
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.6600] CAPWAP State: DTLS Teardown
May 15 17:12:54 kernel: [*05/15/2024 17:12:54.0000]

After this, it rejoins to the OLD controller.
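When checking several APs, the relevant lines can be pulled out of a saved log automatically. This is a minimal sketch that assumes the log has been saved as plain text. The marker strings are copied from the output above, and the function name is hypothetical.

```python
# Substrings taken from the AP log in this post.
FAILURE_MARKERS = (
    "SSC Hash not allowed",
    "Controller certificate verification failed",
)

def dtls_cert_failures(log_text):
    """Return the log lines that indicate the AP rejected the controller certificate."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]

sample_log = """[*05/15/2024 17:12:54.0000] CAPWAP State: DTLS Setup
[*05/15/2024 17:12:54.5417] spamCheck_valid_vWLC_X509: SSC Hash not allowed
[*05/15/2024 17:12:54.5535] dtls_process_packet: Controller certificate verification failed
[*05/15/2024 17:12:54.6600] CAPWAP State: DTLS Teardown
"""

hits = dtls_cert_failures(sample_log)
print(len(hits))  # 2
```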

I saw something about disabling the SSC check, but this looks to be a controller-based configuration.  I am not sure if this will have impacts on other APs that are not in the change window.  I don't want to bring the other sites down.

The difference between the TAC procedure and Leo's is that TAC's does a factory reset.
So I suggest instead of Leo's 6 & 7 do:
clear ap config <Cisco AP>

Then it will reboot and use your option 43 and should hopefully join the 9800.
You could still do that on those now.


All's well that ends well - glad to hear you've found the winning formula for you!

I am by no means a wireless expert in any regard.  I was just handed the project lol.  So I really appreciate @Rich R and @Leo Laohoo for quickly chiming in here and giving great suggestions!  I am grateful that Cisco Community has great individuals who are willing to help fellow users overcome challenges such as this.  If y'all are at Cisco Live in Vegas this year, we can connect.

Thanks again for your assistance on this problem.

Russell

Rich R
VIP

@russell.s for reference you can find the AP version to WLC version mapping in the Compatibility Matrix (link below).

Note that the current recommended version is 17.9.5 (always refer to the TAC recommended link below).  You might also want to consider 17.12.3 as there seem to be a few issues with 17.9.5.  CAPWAP downloads are slow because they work like TFTP, so transfer speed drops directly as round-trip delay increases.  
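To illustrate why round-trip delay dominates a lock-step (TFTP-style) transfer: if the sender waits for an acknowledgement before sending the next window of blocks, the transfer takes at least one round trip per window, whatever the link bandwidth. The sketch below is a simplified model, not the actual CAPWAP implementation. The image size, block size and RTT values are illustrative assumptions, not measurements from this thread.

```python
def lockstep_transfer_seconds(image_bytes, block_bytes, rtt_seconds, window=1):
    """Lower bound on transfer time when each window of blocks costs one round trip."""
    blocks = -(-image_bytes // block_bytes)   # ceiling division
    round_trips = -(-blocks // window)        # ceiling division
    return round_trips * rtt_seconds

IMAGE = 100_000_000   # hypothetical image size in bytes
BLOCK = 1_000         # hypothetical payload per block in bytes

local = lockstep_transfer_seconds(IMAGE, BLOCK, 0.002)        # ~2 ms RTT: about 200 s
cloud = lockstep_transfer_seconds(IMAGE, BLOCK, 0.060)        # ~60 ms RTT: about 6000 s
cloud_w4 = lockstep_transfer_seconds(IMAGE, BLOCK, 0.060, 4)  # window of 4: about 1500 s
print(round(local), round(cloud), round(cloud_w4))
```

In this model the same image takes 30 times longer at 60 ms than at 2 ms, and a larger window divides the time accordingly, which is why a local TFTP server or a bigger CAPWAP window helps far more than extra bandwidth.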

Another thing to watch out for with downloads - presume your APs are in flexconnect mode?
In 9800 Cisco changed the default from Efficient Image download OFF on AireOS to ON on IOS-XE.  It's also very badly named (because it doesn't turn predownload off/on, it turns Efficient Image upgrade off/on) - dev obviously didn't even understand what it does!  It's in the flex profile:
wireless profile flex <your-flex-profile>
 no predownload
end
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-9/config-guide/b_wl_17_9_cg/m_eff_image_upgrade_ewlc.html
The reason this could catch you out (we were caught when we first started using 9800) is that it tries to download to one of each AP type at the site and then the other APs download locally from that designated AP.  But if you have APs at different sites using the same site tag and flex profile then it may choose AP1 at site A for all APs at site B to download from (and likewise if you just have them all using default).  So if you want to use the feature then make sure the APs at each site have unique site tag and flex profile.  Otherwise turn the feature OFF as per config above.  If you're just using default (Registers awaiting assignment of Join Profile suggests you are) then:
wireless profile flex default-flex-profile
 no predownload
end
Then every AP will try to download direct from the WLC (which may still be very slow if RTD is high and/or you have rate limits applied in the cloud).
Also see https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-9/config-guide/b_wl_17_9_cg/m_predwnld_image_ap_ewlc.html#config-ap-image-download-time-enhancement for increasing capwap window size.

For future upgrades you might want to look at the new https download feature called Out-of-Band Image download - you'll need 17.12 for that.  That makes the downloads very much more efficient because it's done over https.
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-12/config-guide/b_wl_17_12_cg/m_eff_image_upgrade_ewlc.html#fht-oob

Recently we have seen the spammers furiously posting prolific amounts of complete junk seemingly in retaliation for us efficiently removing their SPAM.  I wonder if false SPAM reports are another tactic to try to clog the system and allow their SPAM to stay up for longer.

@Rich R Thank you for the information.  I was really interested in the CAPWAP Window Size.  The link you provided gave caution around its use.  "Configure the window size only for AP profiles that are exclusively used for teleworker or OEAP."  

Secondly, your information about one AP downloading the software and then acting as what sounds like a distribution point is promising.  Each site is configured with its own AP Join Profile, Site Tag, RF Tag, Flex Profile, etc... So there is no mixing between sites.  However, I've been following the process of:

  1. Migrating 1 AP at a time. 
  2. Then configuring that AP to be in a different profile
  3. Testing 
  4. Return to Step 1.

It sounds like, if I leave the AP in the default-policy-tag, default-site-tag, default-rf-tag, and default location... perhaps the second AP will download its software from the 1st AP instead of reaching back to the controller.  Does that sound right?  Looks like this is called Efficient Image Upgrade:

Efficient Image upgrade is an optimized method of predownloading images to FlexConnect APs. For each Site Tag with FlexConnect APs joined, one AP per model in that Site Tag is selected as the primary AP, and downloads its image from the controller through the WAN link. Once the primary AP has the downloaded image, the APs in that Site Tag start downloading the image from the primary AP, via TFTP. At most three subordinate APs can download simultaneously from the primary. This reduces load on the WAN link.
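The behavior described in that paragraph can be sketched as a small planning function: group APs by site tag and model, take one AP per group as primary, and let the rest download from it at most three at a time. This is only an illustration of the description. Picking the first AP in the list as primary is an assumption (the controller's real selection logic is not described here), and the AP names and site tag are hypothetical.

```python
from collections import defaultdict

def plan_efficient_upgrade(aps, max_parallel=3):
    """aps: list of (name, site_tag, model) -> {(site_tag, model): (primary, batches)}."""
    groups = defaultdict(list)
    for name, site_tag, model in aps:
        groups[(site_tag, model)].append(name)

    plan = {}
    for key, names in groups.items():
        primary, rest = names[0], names[1:]   # assumption: first AP listed becomes primary
        batches = [rest[i:i + max_parallel] for i in range(0, len(rest), max_parallel)]
        plan[key] = (primary, batches)
    return plan

# Hypothetical 7-AP site, all the same model and site tag.
site = [(f"WAP{n:02d}", "site-a-tag", "3802i") for n in range(1, 8)]
plan = plan_efficient_upgrade(site)
primary, batches = plan[("site-a-tag", "3802i")]
print(primary, batches)
```

With 7 APs in one site tag, only the primary crosses the WAN and the other six download locally in two batches of three. If APs from different sites share a site tag, they fall into the same group, which is the pitfall Rich R describes above.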

[Attachment: Efficient Image Upgrade.png]

I migrated the TOP AP today and it took about 1 hr 15 minutes.  I'll leave it in the defaults and migrate the second of the 7 total APs to see if it goes tremendously faster.  I'm wondering if there is a way to identify which AP is designated as the Primary AP?  

[Attachment: All APs.png]

Re window size - yes, but on the other hand TAC have advised us to use it before <smile> so give it a try and see.

Correct - it should do that.

> I'm wondering if there is a way to identify which AP is designated as the Primary AP?  
The command is in the documentation we've linked above: 

show ap master list

 
