
Client connection issues in a simple multi-site environment

Heiko Kelling
Level 1

Hi all,

I manage a school environment with 4 locations. Until last year, each site had one Cisco 3504 and 20-30 1832 APs. Then a central vWLC was introduced to manage all 4 locations; it runs on Hyper-V at one of the 4 sites. With the move to the central controller, the WLANs were switched to FlexConnect, and the four 3504 controllers were kept at the sites as backup controllers (which would not actually be necessary with FlexConnect, but that's what we have).

Three SSIDs are configured for each location on the central controller, and the respective VLANs (which are identical at each location) are assigned via a FlexConnect Group. Every SSID simply authenticates with a WPA2 password.

Across the entire environment, clients now frequently fail to join the SSIDs despite multiple connection attempts. After debugging several clients on the central controller, I increasingly suspect a problem related to WPA2 authentication. If I set the authentication to open or WPA(1) as a test, clients join without problems, and a client can switch quickly between the SSIDs without any issues. I have attached the logs from failed connection attempts. Maybe someone can help me with the problem.

att1.txt -> An Android phone tries 2 or 3 times to connect to an SSID and fails. The device then switches back to a different SSID to which it was previously connected.

att2.txt -> Here the client is not able to connect at all and is rejected.

att3.txt -> Same behavior as att1.

 

13 Replies

Mark Elsen
Hall of Fame

 

- Have the attachments analyzed with the Wireless Debug Analyzer: https://cway.cisco.com/tools/WirelessDebugAnalyzer/

 M.



-- Let everything happen to you  
       Beauty and terror
      Just keep going    
       No feeling is final
Rainer Maria Rilke (1899)

JPavonM
VIP

In att1 I see the client receive a valid DHCP address and go into RUN state:

*DHCP Socket Task: Nov 30 13:06:32.339: e4:84:d3:65:b2:0d 10.4.50.36 RUN (20) NO release MSCB
*DHCP Socket Task: Nov 30 13:06:32.339: e4:84:d3:65:b2:0d Assigning Address 10.4.50.36 to mobile
*DHCP Socket Task: Nov 30 13:06:32.339: e4:84:d3:65:b2:0d DHCP success event for client. Clearing dhcp failure count for interface management.

In att2, by contrast, I see the client disconnected because of a change of WLAN:

*apfMsConnTask_7: Nov 30 13:18:31.754: e4:84:d3:65:b2:0d Deleting client immediately since WLAN has changed

And att3 shows the same as att2:

*apfMsConnTask_7: Nov 30 13:18:31.754: e4:84:d3:65:b2:0d Deleting client immediately since WLAN has changed
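For longer debug captures, the quoted lines can be filtered mechanically. The following is a minimal sketch that assumes every line follows the shape seen in the excerpts above (`*task: timestamp: client-MAC message`); the function names `parse_debug_line` and `find_events` are made up for illustration and are not part of any Cisco tool.

```python
import re

# Assumed line shape, taken from the excerpts quoted in this thread:
# *<task>: <Mon> <day> <HH:MM:SS.mmm>: <client MAC> <message>
LINE_RE = re.compile(
    r"^\*(?P<task>[^:]+): "
    r"(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"(?P<msg>.*)$"
)


def parse_debug_line(line):
    """Return a dict with task, stamp, mac and msg, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def find_events(lines, needle):
    """Collect parsed lines whose message contains the given text."""
    events = []
    for line in lines:
        record = parse_debug_line(line)
        if record and needle in record["msg"]:
            events.append(record)
    return events


# Two lines copied from the excerpts above.
sample = [
    "*DHCP Socket Task: Nov 30 13:06:32.339: e4:84:d3:65:b2:0d Assigning Address 10.4.50.36 to mobile",
    "*apfMsConnTask_7: Nov 30 13:18:31.754: e4:84:d3:65:b2:0d Deleting client immediately since WLAN has changed",
]

deleted = find_events(sample, "WLAN has changed")
for event in deleted:
    print(event["stamp"], event["mac"], event["msg"])
```

Running the same filter over a whole attachment would show how often, and at what times, the controller deleted the client because of a WLAN change.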

Heiko Kelling
Level 1

Thanks for both answers. I had to re-upload att3; it was the wrong file. I didn't know the "Wireless Debug Analyzer" yet, only the Config Analyzer. I have now looked at the 3 logs in this tool, but I'm not really any wiser than before. @JPavonM : When the client ends up in RUN state, it is in the wrong SSID. It is supposed to connect to the SSID where it would get 10.4.55.x, and that fails. The client then switches to the SSID to which it was previously connected and receives a 10.4.50.x address.
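A quick way to tell from a leased address which SSID a client actually landed on is a subnet lookup. This is only a sketch: the /24 prefix lengths and the labels "previous SSID" and "intended SSID" are assumptions based on the 10.4.50.x and 10.4.55.x addresses mentioned above, not values taken from the controller configuration.

```python
import ipaddress

# Assumption: each SSID's client VLAN is a /24. The labels are placeholders.
SUBNETS = {
    "previous SSID": ipaddress.ip_network("10.4.50.0/24"),
    "intended SSID": ipaddress.ip_network("10.4.55.0/24"),
}


def classify(address):
    """Return the label of the subnet containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in SUBNETS.items():
        if ip in network:
            return label
    return None


# 10.4.50.36 is the address assigned in att1.
print(classify("10.4.50.36"))
```

For the address from att1 this reports the previous SSID, which matches the description that the client fell back instead of joining the SSID it was aiming for.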

Heiko Kelling
Level 1

In my opinion, the error always occurs in the same way. But I still don't know what the problem is.

2023-12-01 10_54_34-Wireless Debug Analyzer.png

 

>...In my opinion, the error always occurs in the same way. But I still don't know what the problem is.
- Note that for the Wireless Debug Analyzer it is advised to input a longer debug session (from a particular client) to get a better overview of what is happening.

 M.




JPavonM
VIP

Check both the Policy profile forwarding VLAN and the Flex profile's vlan-2-wlan mapping.

Heiko Kelling
Level 1

What do you mean specifically by “Policy profile forwarding VLAN”?

2023-11-29 23_36_59-GRS-HE-ZWLC – Mozilla Firefox.png

2023-12-01 11_21_30-KELLING-PC - TeamViewer.png

 

JPavonM
VIP

Your WLAN-VLAN map associates GRS-HE-UT and GRS-BEB-UT with VLAN55, and that VLAN is not allowed on the trunk port:

*apfMsConnTask_7: Nov 30 13:02:10.865: e4:84:d3:65:b2:0d Processing assoc-req station:e4:84:d3:65:b2:0d AP:dc:f7:19:45:a0:e0-00 ssid : GRS-BEB-UT thread:dcef819d30
*

 

Heiko Kelling
Level 1

switchport trunk allowed vlan remove 1-49,51-54,56-59,61-249,251-4094

-> In the switch configuration you see all the VLANs that are not allowed (removed). It is a Cisco Small Business switch with an unusual way of displaying the VLANs on the trunk.

 

JPavonM
VIP

@Heiko Kelling as previously said, you are missing VLAN55 on the trunk port of the switch, so any device connecting to those SSIDs won't join.

JPavonM_0-1701434966501.png

 

@JPavonM you're misreading the switchport config.  It's REMOVED VLANs not ALLOWED VLANS.  So 55 is allowed - between 54 and 56.
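The reading of the removed list can be double-checked with a few lines of Python. This is a rough sketch that assumes the trunk started with all VLANs 1-4094 allowed before the `remove` statement was applied; the helper names are made up for illustration.

```python
def parse_vlan_ranges(spec):
    """Expand a string such as '1-49,51-54' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def allowed_after_remove(removed_spec, first=1, last=4094):
    """VLANs left on the trunk, assuming first..last were allowed beforehand."""
    everything = set(range(first, last + 1))
    return sorted(everything - parse_vlan_ranges(removed_spec))


# The list from the quoted "switchport trunk allowed vlan remove" line.
removed = "1-49,51-54,56-59,61-249,251-4094"
print(allowed_after_remove(removed))  # [50, 55, 60, 250]
```

Under that assumption, VLANs 50, 55, 60 and 250 remain allowed on the trunk, so VLAN 55 is not missing.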

@Heiko Kelling 
1. What version of software are you running on the vWLC?
2. What version of software are you running on the 3504 WLCs? (should be same version as central)
(hint: refer to TAC recommended versions link below - currently should be 8.10.190.0)
3. Do you have mobility configured between the vWLC and the site WLCs?
4. You say "WLANs were switched to FlexConnect" but it's APs not WLANs which are configured for FlexConnect.  WLANs on a FlexConnect AP can be configured for central or local switching and authentication so it's the specifics of the WLAN configuration which matter.
5. Do you have Fast SSID changing enabled?  Who knows why Cisco did not make this default but you have to enable it explicitly otherwise any client switching between SSIDs will get access denied (your mention of switching SSIDs hints that this could be an issue)
https://www.cisco.com/c/en/us/td/docs/wireless/controller/8-10/config-guide/b_cg810/client_roaming.html#fast-ssid-changing
6.  Make sure your AP groups and Flexconnect groups on the local and central WLCs are identical (same number of identically named and numbered WLANs and in the same order).  There have been a number of bugs around this fixed over the years but we've still seen occasional problems if they are not the same - the AP gets "confused" sometimes.
7.  Obviously you should have a separate flexconnect group for each site.
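The check in point 6 (same WLANs, same names, same numbering, same order) can be sketched as a position-by-position comparison of two exported lists. The WLAN IDs and the ordering below are hypothetical sample data; only the SSID names GRS-HE-UT and GRS-BEB-UT come from this thread, and the function name is made up for illustration.

```python
def wlan_mismatches(central, local):
    """Compare two ordered lists of (wlan_id, ssid) tuples position by position.

    Returns a list of (position, central_entry, local_entry) for every
    position where the two controllers differ; None marks a missing entry.
    """
    problems = []
    for pos in range(max(len(central), len(local))):
        c_entry = central[pos] if pos < len(central) else None
        l_entry = local[pos] if pos < len(local) else None
        if c_entry != l_entry:
            problems.append((pos, c_entry, l_entry))
    return problems


# Hypothetical example: same SSIDs on both controllers, but in a different order.
central_wlans = [(1, "GRS-HE-UT"), (2, "GRS-BEB-UT")]
local_wlans = [(1, "GRS-BEB-UT"), (2, "GRS-HE-UT")]

for pos, c_entry, l_entry in wlan_mismatches(central_wlans, local_wlans):
    print(f"position {pos}: central={c_entry} local={l_entry}")
```

An empty result would mean the two lists are identical in content and order; anything else points at the entries to align before relying on failover between the controllers.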

Going back to your design - if you have on-site and central WLCs why not make the on-site WLC the primary with the central WLC as backup?

Hello @Rich R , thank you very much for your detailed answer and your effort to help me!

to 1. Version 8.8.130.0 is currently running on the central controller and version 8.10.190.0 is running on the local controller at one of the schools. The reason I moved the access point at one school to the local controller was to be able to recreate the problem without affecting the other schools. First I had 8.8.130.0 locally and centrally and as a test I later updated to 8.10.190.0.

to 2. see answer 1.

3. No, I haven't done that at the moment because either all access points are always on the local controller or all access points are always on the central controller. There is no case where both controllers have active access points.

4. OK, sorry. You're right. The access points are all in FlexConnect mode. The SSIDs are all configured to use local switching. This is what the settings look like for each SSID.

2023-12-04 12_15_39-KELLING-PC - TeamViewer.png

5. I currently do not have Fast SSID change activated. However, this is the first setting I will try the next time I am at the school. I came across this again and again during my research and when evaluating my logs. In general, however, you can say that there are problems not only when switching between SSIDs, but also when devices completely reconnect without first being in one of the other SSIDs. I was still thinking about unchecking “Client Exclusion” because clients always seem to end up on the dynamic blacklist. The third point I kept finding was changing the “Management Frame Protection”. But I'm not sure if a change will make a difference. But these are all things that I would like to change and test step by step at the next on-site appointment.

6. I'm relatively sure that there are at least minor differences, but I've always been of the opinion that as soon as the APs switch to a different controller they reload all their settings...or is that not the case?

7. I've already thought about it. I'll plan that for the next appointment too.

"Going back to your design - if you have on-site and central WLCs why not make the on-site WLC the primary with the central WLC as backup?"

This was done at the request of the project performance back then. From my point of view, that makes sense, because the goal is to have central management in one place.

 

 

The trouble with having APs switch between WLCs on different software versions is that every time they move (even for a few minutes), they must download the software (if they don't already have it on flash) and then reload. If you're using N+1 then all your WLCs should run the same version.

Again I will say if you have N+1 then AP groups and WLANs should be identical - APs do get confused.  Don't risk differences, it will catch you out eventually.

Just enable Fast SSID switching - I've never heard a reason for disabling it.

Although your WLAN is using flex local switching it's still using central auth.  What authentication method are you using?  You might want to consider using flex local auth too.
