04-19-2023 12:02 PM
I have had a case open on this for about a month, with at least 3 different TAC engineers. I have a 9800L-F set up in HA-SSO (RP+RMI). We have tried 17.9.x and 17.10.1, and are now on 17.11.1.
I have set the AP Mgt as VLAN 2 (10.0.0.0 /22) and the 9800L Management Interface at 10.0.3.253. I have a 3560X switch set up in my office "lab" with 3 APs (two 9115AXi and one 2802i) connected, as well as the Primary 9800L chassis. The APs will not connect -- or if they do, it takes days. We have done numerous traces / debugs / packet captures and TAC still cannot explain why. I am hoping a fresh set of eyes can see what the problem is and how to fix it.
The AP console session repeats this:
[*04/19/2023 14:36:12.8578] CAPWAP State: Discovery
[*04/19/2023 14:36:12.8808] Discovery Request sent to 10.0.3.253, discovery type STATIC_CONFIG(1)
[*04/19/2023 14:36:12.8818] Discovery Request sent to 10.0.3.253, discovery type STATIC_CONFIG(1)
[*04/19/2023 14:36:12.8828] Discovery Request sent to 255.255.255.255, discovery type UNKNOWN(0)
[*04/19/2023 14:36:12.9098] Discovery Response from 10.0.3.253
[*04/19/2023 14:36:22.2708] Started wait dtls timer (60 sec)
[*04/19/2023 14:36:22.2778]
[*04/19/2023 14:36:22.2778] CAPWAP State: DTLS Setup
[*04/19/2023 14:36:37.3428] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Setup(3).
[*04/19/2023 14:37:19.3018] OOBImageDnld: OOBImageDownloadTimer expired for image download..
[*04/19/2023 14:37:19.3018] OOBImageDnld: Do common error handler for OOB image download..
[*04/19/2023 14:37:19.3288]
[*04/19/2023 14:37:19.3288] CAPWAP State: DTLS Teardown
[*04/19/2023 14:37:19.3778] OOBImageDnld: Do common error handler for OOB image download..
[*04/19/2023 14:37:19.4628] status 'upgrade.sh: Script called with args:[CANCEL]'
[*04/19/2023 14:37:19.5058] do CANCEL, part1 is active part
[*04/19/2023 14:37:19.5228] status 'upgrade.sh: Cleanup tmp files ...'
[*04/19/2023 14:37:19.5488] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Teardown(4).
[*04/19/2023 14:37:19.5488] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Teardown(4).
[*04/19/2023 14:37:24.0528] OOBImageDnld: OOBImageDownloadTimer expired for image download..
[*04/19/2023 14:37:24.0528] OOBImageDnld: Do common error handler for OOB image download..
[*04/19/2023 14:37:24.0728] No more AP manager addresses remain..
[*04/19/2023 14:37:24.0728] No valid AP manager found for controller 'CUN-WLC-9800LF' (ip: 10.0.3.253)
[*04/19/2023 14:37:24.0728] Failed to join controller CUN-WLC-9800LF.
[*04/19/2023 14:37:24.0728] Failed to join controller.
(TAC set a static IP on this AP of 10.0.2.1 /22 for a test. The other test APs use DHCP and have the same console messages)
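For anyone comparing several of these console captures, here is a minimal Python sketch that pulls the "CAPWAP State:" lines out of the output and reports how long the AP stayed in each state. The function names (parse_states, state_durations) are made up for illustration, and the timestamp format is assumed to match the excerpt above.

```python
import re
from datetime import datetime

# Excerpt copied from the AP console output quoted in this post.
LOG = """\
[*04/19/2023 14:36:12.8578] CAPWAP State: Discovery
[*04/19/2023 14:36:12.9098] Discovery Response from 10.0.3.253
[*04/19/2023 14:36:22.2708] Started wait dtls timer (60 sec)
[*04/19/2023 14:36:22.2778] CAPWAP State: DTLS Setup
[*04/19/2023 14:37:19.3288] CAPWAP State: DTLS Teardown
"""

# Matches lines such as "[*04/19/2023 14:36:22.2778] CAPWAP State: DTLS Setup".
STATE_RE = re.compile(
    r"^\[\*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)\] CAPWAP State: (.+)$"
)


def parse_states(text):
    """Return a list of (timestamp, state name) for every state-change line."""
    states = []
    for line in text.splitlines():
        match = STATE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S.%f")
            states.append((ts, match.group(2).strip()))
    return states


def state_durations(states):
    """Return (state name, seconds spent) for each state that has a successor."""
    return [
        (name, (next_ts - ts).total_seconds())
        for (ts, name), (next_ts, _) in zip(states, states[1:])
    ]


if __name__ == "__main__":
    for name, seconds in state_durations(parse_states(LOG)):
        print(f"{name}: {seconds:.3f} s")
```

On this excerpt it reports roughly 9.4 seconds in Discovery and about 57 seconds in DTLS Setup before the teardown, with no Join state in between.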
The core where VLAN 2 is defined:
ip dhcp pool 9800_WLC_MGT
network 10.0.0.0 255.255.252.0
default-router 10.0.3.254
domain-name xxxxx.yyy
option 43 hex f104.0a00.03fd
dns-server 192.168.8.1 192.168.8.2
!
interface Vlan2
description 9800-WiFi_Mgt Subnet
ip address 10.0.3.254 255.255.252.0
no ip redirects
no ip unreachables
no ip proxy-arp
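As a sanity check on the option 43 value and the addressing above, here is a small Python sketch. It assumes the usual layout for lightweight AP option 43 (type 0xf1, a length byte equal to 4 times the number of controllers, then each controller IPv4 address in hex) and the dotted four-character grouping shown in the pool. The helper names are hypothetical.

```python
import ipaddress


def encode_option43(controller_ips):
    """Build the option 43 hex string (type 0xf1, length, IPv4 addresses)."""
    payload = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    raw = bytes([0xF1, len(payload)]) + payload
    hex_text = raw.hex()
    # Group as four hex characters per block, the way the pool shows it.
    return ".".join(hex_text[i:i + 4] for i in range(0, len(hex_text), 4))


def decode_option43(hex_text):
    """Return the controller addresses encoded in an option 43 hex string."""
    raw = bytes.fromhex(hex_text.replace(".", ""))
    if len(raw) < 2 or raw[0] != 0xF1:
        raise ValueError("expected sub-option type 0xf1")
    length = raw[1]
    body = raw[2:2 + length]
    if length % 4 != 0 or len(body) != length:
        raise ValueError("length byte does not match the address payload")
    return [str(ipaddress.IPv4Address(body[i:i + 4])) for i in range(0, length, 4)]


if __name__ == "__main__":
    print(decode_option43("f104.0a00.03fd"))

    # Pool network and the addresses mentioned in this post.
    pool = ipaddress.ip_network("10.0.0.0/255.255.252.0")
    for ip in ("10.0.3.253", "10.0.3.254", "10.0.2.1"):
        print(ip, ipaddress.ip_address(ip) in pool)
```

The value in the pool decodes to 10.0.3.253, which is the 9800L management address, and the controller, the gateway and the statically addressed test AP all fall inside 10.0.0.0/22. So the option 43 value and the subnetting appear consistent with the rest of the configuration.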
The 9800L Interfaces:
!
interface Port-channel10
description WLC AP MGMT PORTS
switchport mode trunk
!
interface TenGigabitEthernet0/1/0
description WLC AP MGMT PORT
switchport mode trunk
no negotiation auto
channel-group 10 mode on
service-policy output AutoQos-4.0-wlan-Port-Output-Policy
!
interface TenGigabitEthernet0/1/1
description WLC AP MGMT PORT
switchport mode trunk
no negotiation auto
channel-group 10 mode on
service-policy output AutoQos-4.0-wlan-Port-Output-Policy
CUN-WLC-9800LF#show wireless interface summ
Wireless Interface Summary
Interface Name Interface Type VLAN ID IP Address IP Netmask NAT-IP Address MAC Address
--------------------------------------------------------------------------------------------------
Vlan2 Management 2 10.0.3.253 255.255.252.0 0.0.0.0 8c1e.xxxx.yyyy
CUN-WLC-9800LF#show wireless management trustpoint
Trustpoint Name : ewlc-tp1
Certificate Info : Available
Certificate Type : SSC
Certificate Hash : 9a80a68f45b442770a4567d4xxxxxxxxxxxxxx
Private key Info : Available
FIPS suitability : Not Applicable
The 3560X switch that the 3 APs and Primary 9800L are connected to:
interface Port-channel10
description ** EtherChan to CUN-WLC-9800LF **
switchport trunk encapsulation dot1q
switchport mode trunk
spanning-tree portfast edge trunk
!
interface GigabitEthernet0/47
description CUN-WLC-9800LF LAG
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 10 mode on
!
interface GigabitEthernet0/48
description CUN-WLC-9800LF LAG
switchport trunk encapsulation dot1q
switchport mode trunk
channel-group 10 mode on
The AP connected ports:
interface GigabitEthernet0/1
description ** TEST AP PORTS **
switchport access vlan 2
switchport mode access
spanning-tree portfast edge
There is no VLAN pruning in place on trunk ports.
TIA - Perry
04-19-2023 12:56 PM - edited 04-19-2023 12:57 PM
I have a 9800-L-C in my home lab using two 10G ports in a LAG. The only difference is that I set up the management VLAN as the native VLAN. You might as well give that a try, and make sure your switch has the native VLAN defined:
interface Port-channel10
description ** EtherChan to CUN-WLC-9800LF **
switchport trunk encapsulation dot1q
switchport trunk native vlan 2
switchport mode trunk
spanning-tree portfast edge trunk
!
interface GigabitEthernet0/47
description CUN-WLC-9800LF LAG
switchport trunk encapsulation dot1q
switchport trunk native vlan 2
switchport mode trunk
channel-group 10 mode on
!
interface GigabitEthernet0/48
description CUN-WLC-9800LF LAG
switchport trunk encapsulation dot1q
switchport trunk native vlan 2
switchport mode trunk
channel-group 10 mode on
04-20-2023 03:46 AM
I actually had the Native VLAN 2 specified on the Port-Channels and the Interfaces on the 9800L and the 3560X switch. The current TAC Sr Engineer on the case had me remove them. It has not helped at all.
04-19-2023 01:00 PM - edited 04-19-2023 01:03 PM
Why are you not using the Gigabit ports? The fiber is for 10gig... just noticed that. You probably should try using these interfaces.
interface TwoGigabitEthernet0/0/0
interface TwoGigabitEthernet0/0/1
If the ports are not auto-negotiating, that could be your issue with those ports. Here are the recommended SFPs as well:
04-20-2023 03:50 AM
The SFPs are Cisco GLC-TE and are listed as supported. TAC has not stated that I can't use Te0/1/0 & Te0/1/1 with 1 Gig copper. The 3560X in my office does not support 10G. When I do move the 9800L to the datacenter, it will be connected to 2 Te ports in our 4500X L3 switch.
04-20-2023 06:12 AM - edited 04-20-2023 06:50 AM
Use the gigabit ports, since you are not connecting it to a 4500X yet. What you need to validate is whether that is the issue or not. You should really start with a basic config, even using only 1 port. Then build it up and see when it breaks. You are also only testing with one controller, right?
Since this has been an issue for a while, what I would do is wipe the controllers, use only one, and configure just the basics for network connectivity. Use only one gigabit port and see if the AP joins or not. Then I would do the same for the other controller. This rules out a problem with one controller or the other and gives you a good starting point. Also pick a code version that you will use in production. Only once you get the APs joined should you go ahead and make small changes and/or set up SSO, if that is what you will be implementing.
04-19-2023 04:34 PM
@perrymcgrew wrote:
[*04/19/2023 14:36:22.2778] CAPWAP State: DTLS Setup
[*04/19/2023 14:36:37.3428] Discarding msg CAPWAP_WTP_EVENT_REQUEST(type 9) in CAPWAP state: DTLS Setup(3).
[*04/19/2023 14:37:19.3018] OOBImageDnld: OOBImageDownloadTimer expired for image download..
[*04/19/2023 14:37:19.3018] OOBImageDnld: Do common error handler for OOB image download..
Has the AP been factory-reset?
04-20-2023 03:55 AM
Yes. The APs have been factory reset as one of the tests. This AP's show ver displays: AP Running Image : 17.11.0.155 which matches the 9800L 17.11.1
04-20-2023 04:25 AM
I'm going to suggest downgrading to 17.9.3 to see if it makes any difference.
04-20-2023 05:12 AM
We upgraded to 17.9.3, then upgraded to 17.10.1 and then 17.11.1 per troubleshooting recommendation.
04-20-2023 05:21 AM
It is counter-productive to use Early Deployment (ED) releases like 17.10.1 or 17.11.1.
No idea what TAC's thinking was in recommending an upgrade to ED just to troubleshoot an AP joining issue.
I would strongly recommend re-queuing the case to EMEA or NAM/LATAM.
04-20-2023 06:00 AM
17.11.1 provided some debugging enhancements. Probably not fair to say it was an official Cisco TAC suggestion -- more like trying to see if we may be hitting a new bug. We have had the same problems on all releases, including 17.9.3. I have asked the current TAC engineer if we should revert from 17.11.1, and so far they have not said we need to do that.
04-20-2023 04:45 PM - edited 04-20-2023 04:46 PM
@perrymcgrew wrote:
17.11.1 provided some debugging enhancements. Probably not fair to say it was an official Cisco TAC suggestion
If this was the recommendation provided by TAC, I'd be reaching for the phone and getting the case re-queued to EMEA or NAM/LATAM. If the case goes back to the same TAC "zone", get it re-queued to anywhere but that zone.
I suspect TAC is "taking you for a ride" and playing games.
04-20-2023 06:02 AM
04-20-2023 06:24 AM
>...I just factory reset a 9115AXi. Console output attached. Does not resolve the issue.
I was wondering if there is full CAPWAP connectivity from the access points' subnet to the controller. Put a machine with nmap installed in that subnet and try:
nmap -sU -p5246-5247 controller-hostname
M.