Jeal Jimenez
Cisco Employee

I think the best way to troubleshoot a WLAN authentication issue is to first understand clearly how the implemented security method works, so I will start by summarizing how 802.1X/EAP works, as this is basically the secure Layer-2 authentication framework implemented on WLANs:

 

*** Components ***

Authentication Server: For WLANs this is a RADIUS Server where the authentication of the wireless clients actually takes place (ACS, ISE, Windows NPS, etc.). Here you define the specific EAP method that you want to allow and its settings (certificates, policies, etc.). The server either holds a local database with the clients' credentials to be checked, or is linked to an external database (AD, for example).

Authenticator: This is the WLC/AP, and the role is basically to act as a "proxy" between the wireless client to be authenticated and the RADIUS Server that performs the authentication. Here you configure the SSID to force dot1X/EAP negotiation with the wireless clients trying to associate to this WLAN (using a specific key management and encryption method, like WPA/TKIP or WPA2/AES), and you define the RADIUS Server that will be used for this authentication.

Supplicant: The wireless client trying to connect to the secure WLAN. Here you configure the SSID profile that will be used for the association to this WLAN. You define the specific EAP method, and the client security credentials (username/password, certificates, tokens, etc.).

 

*** Authentication Process ***

- The wireless client starts the association to the WLAN using the specific SSID and selecting an AP to connect.

- Once basic wireless association is successful, the WLC/AP sends an EAP identity request to the client in order to start 802.1X/EAP. (The client could also start the EAP process by sending an EAPOL-Start to the AP.)

- The wireless client should answer back with an EAP identity response.

- This response is sent from the WLC/AP to the RADIUS Server on a RADIUS Access-Request packet.

- RADIUS Server processes the request and it replies back with a RADIUS Access-Challenge to start the EAP negotiation with the client (basically offering the client to start doing the EAP method that was configured for the wireless security policy).

- WLC/AP receives this Challenge, and forwards it to the client as an EAP request.

- Client answers back with an EAP response, which is forwarded by the WLC/AP to the RADIUS Server on another Access-Request. From here, the same exchange will happen multiple times between the server and the client (using the WLC/AP as intermediary) while the EAP method is negotiated and credentials are validated/authenticated. Depending on the EAP method we will see more/less exchanges, but in the end, the server should send to the WLC/AP a RADIUS Access-Accept if it passes (which contains the Master Session Key that will be used as the seed for the encryption cipher deployed, as well as the attributes that should be applied to this client -if used-) or a RADIUS Access-Reject if it fails.

- The WLC/AP informs the client about this result. If authentication failed, it deauthenticates the client from the WLAN; if it passed, it allows the client access to the network so it can start passing data traffic. If using WEP encryption, there is basically nothing else to do, as the Master Session Key derived during this authentication process is used as the seed for the encryption of the data frames that can now start passing between the client and the AP. If using WPA/WPA2, a key management process known as the 4-Way handshake should happen after the EAP-Success is sent to the client, so the Supplicant and Authenticator can first derive the encryption keys that will be used for the data frames that follow.

[Image: EAP Process.png]
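
As a rough illustration of the sequence described in this section, here is a small Python sketch that lists the messages in order. It is only a model of the steps above, not something taken from a WLC or a RADIUS Server: the `Message` class, the `dot1x_flow` function and the `method_round_trips` parameter are names I made up for the example, and the number of Challenge/Response rounds really depends on the EAP method in use. The 4-Way handshake is shown as a single entry even though it is an exchange in both directions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    sender: str      # "Supplicant", "Authenticator" or "Server"
    receiver: str
    transport: str   # "802.11", "EAPOL" or "RADIUS"
    name: str


def dot1x_flow(method_round_trips: int = 2, accepted: bool = True) -> List[Message]:
    """Return the ordered message list for one 802.1X/EAP attempt (illustrative)."""
    sup, auth, srv = "Supplicant", "Authenticator", "Server"
    flow = [
        Message(sup, auth, "802.11", "Association Request"),
        Message(auth, sup, "802.11", "Association Response"),
        Message(auth, sup, "EAPOL", "EAP-Request/Identity"),
        Message(sup, auth, "EAPOL", "EAP-Response/Identity"),
        Message(auth, srv, "RADIUS", "Access-Request"),
    ]
    # The EAP method itself: the authenticator only relays between both sides.
    for _ in range(method_round_trips):
        flow += [
            Message(srv, auth, "RADIUS", "Access-Challenge"),
            Message(auth, sup, "EAPOL", "EAP-Request"),
            Message(sup, auth, "EAPOL", "EAP-Response"),
            Message(auth, srv, "RADIUS", "Access-Request"),
        ]
    if accepted:
        flow += [
            Message(srv, auth, "RADIUS", "Access-Accept"),
            Message(auth, sup, "EAPOL", "EAP-Success"),
            Message(auth, sup, "EAPOL", "4-Way Handshake (WPA/WPA2 only)"),
        ]
    else:
        flow += [
            Message(srv, auth, "RADIUS", "Access-Reject"),
            Message(auth, sup, "EAPOL", "EAP-Failure"),
            Message(auth, sup, "802.11", "Deauthentication"),
        ]
    return flow


if __name__ == "__main__":
    for step, m in enumerate(dot1x_flow(), start=1):
        print(f"{step:2d}. {m.sender:13s} -> {m.receiver:13s} [{m.transport}] {m.name}")
```

Printing the list for a passing and a failing attempt side by side is a quick way to remember which leg (EAPOL or RADIUS) each message travels on when you later read debugs or captures.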

 

The Enterprise Mobility Design Guide is a very good public document we have with some of this information and more, so you can analyze this further:

https://www.cisco.com/c/en/us/td/docs/wireless/controller/8-5/Enterprise-Mobility-8-5-Design-Guide/Enterprise_Mobility_8-5_Deployment_Guide/MobilityGuide_Security.html

 

*** Important Facts ***

- EAP is not a specific authentication method, but just the name of the base protocol. Based on this, a specific EAP method is actually configured for the secure authentication, which is selected depending on our needs, requirements, and what the devices support (such as LEAP, PEAP/EAP-MSCHAPv2, PEAP/EAP-GTC, EAP-TLS, EAP-FAST/MSCHAPv2, EAP-FAST/TLS, EAP-FAST/GTC, etc.). The following post can help as a quick summary about the EAP methods most commonly used for WLANs:

https://supportforums.cisco.com/document/12572951/eap-methods-summary

- This EAP method is only defined on the client (supplicant) and RADIUS Server (Authentication Server). Not on the WLC/AP (Authenticator).

- EAPOL is the protocol used between the Supplicant and the Authenticator to transport the authentication frames between them (as 802.11 wireless Data Frames over the air), while RADIUS is the protocol used between the Authenticator and the Authentication Server to transport the authentication frames of this process (as UDP RADIUS packets over the wired infrastructure).
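
To make the "EAP is only the base protocol" point more concrete, the sketch below decodes the fixed EAP header (Code, Identifier, Length, and the Type byte on Requests/Responses) from raw bytes, for example from a payload you exported from a packet capture. The `parse_eap` function and the two lookup tables are my own names for the example, and the Type table only lists a few common values; anything else is reported as unknown.

```python
import struct

# EAP Code field values.
EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}

# A few common EAP Type values (not an exhaustive list).
EAP_TYPES = {
    1: "Identity",
    3: "Nak",
    4: "MD5-Challenge",
    6: "GTC",
    13: "EAP-TLS",
    17: "LEAP",
    25: "PEAP",
    26: "EAP-MSCHAPv2",
    43: "EAP-FAST",
}


def parse_eap(packet: bytes) -> dict:
    """Decode the EAP header; Type and data exist only on Request/Response."""
    if len(packet) < 4:
        raise ValueError("EAP packet shorter than the 4-byte header")
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    if length < 4 or length > len(packet):
        raise ValueError("EAP Length field does not match the packet")
    info = {
        "code": EAP_CODES.get(code, f"Unknown({code})"),
        "id": identifier,
        "length": length,
        "type": None,
        "data": b"",
    }
    if code in (1, 2) and length >= 5:
        eap_type = packet[4]
        info["type"] = EAP_TYPES.get(eap_type, f"Unknown({eap_type})")
        info["data"] = packet[5:length]
    return info


if __name__ == "__main__":
    # Hand-built EAP-Response/Identity carrying the made-up identity "alice".
    sample = struct.pack("!BBHB", 2, 1, 10, 1) + b"alice"
    print(parse_eap(sample))
```

The same EAP bytes travel inside EAPOL frames on the wireless side and inside the RADIUS EAP-Message attribute on the wired side, which is why the WLC/AP does not need to know the EAP method.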

 

*** Troubleshooting ***

Knowing this, you can troubleshoot a WLAN authentication issue as follows, without needing to understand every detail of the specific EAP method deployed or of WiFi:

1) First of all, check if the issue is happening to multiple clients/devices. If it is only happening to one, then the issue is most likely with the credentials of that client, or with the supplicant SSID setup on that client for the specific EAP method (or with the device itself, hardware or software).

2) If you confirm this is happening to multiple clients, then select one or two to troubleshoot. Check if the client is trying to connect to the SSID. If it is not even trying, the SSID is most likely not properly configured on the WLC/LAP.

3) Check the WLC/AP client association table to see whether the client is even trying to connect and what its status is. For example, on a Cisco WLC a client that is stuck in the authentication process shows the client status 8021X_REQD.

4) If you noticed that the client is trying but failing or stuck on the authentication phase, then immediately check the failed attempts (failure authentication logs) on the RADIUS Server.

5) The server logs should normally tell you the reason for the authentication failure, but if you don't see any attempts at all, then there could be an issue between the WLC/AP and the server. If there are failures on the server, the reasons are normally: EAP method not defined or not properly configured on the server or the client, wrong credentials, request not matching the security policy defined, RADIUS Server having an issue with the external database (if used), RADIUS Server not knowing the WLC/AP trying to talk to it (it should be configured as AAA client -NAS- with a valid shared secret), or the EAP method defined requiring certificates from a CA (at least on the Server) that are not properly set up. The server logs should normally be clear on the specific reason.

6) If you can't find any logs at all on the server for attempts from this client and you know the WLC/AP is properly configured to communicate with this specific RADIUS Server, then there could be an issue between the WLC/AP and the server, so you might want to check network connectivity between them (proper routing, no ACLs/firewall blocking RADIUS traffic, no fragmentation issues, etc.). At this point, you could start this network troubleshooting between the Authenticator and the Authentication Server, but you could also start some debugging on the Authenticator, as this will clearly confirm if the problem is on the WLC/AP itself, on the client side, or on the server side, as you should see:

- If the WLC/AP even started with the EAP identity request or not.

- If the client is responding with the EAP identity response or not.

- If the WLC/AP is sending that response as a RADIUS Access-Request to the proper server or not.

- If the server is responding (and how) or not.

- If this response is forwarded to the client, if the client is responding, and if that response is then forwarded back to the server, and so on until the point where you notice a failure due to a lack of response from the client, from the server, or because the WLC/AP is not forwarding what it should where it should. You could even check if the failure is happening during the key management negotiation (which normally happens due to software issues if the client actually supports the one configured on the WLC/AP).

7) After this, you will definitely focus your troubleshooting on the WLC/AP, the client side, or the server side, checking deeper connectivity between them (packet captures if needed), the settings of the specific EAP method defined on the client and the server, or looking for software issues due to bad behaviors on any of the three components.
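
The steps above can also be written down as a simple decision function. This is only a sketch of the reasoning in this section, under the assumption that you have already collected four yes/no answers; the `next_step` function and its parameter names are invented for the example and do not correspond to any Cisco tool.

```python
def next_step(multiple_clients: bool,
              client_is_trying: bool,
              server_sees_attempts: bool,
              server_logs_failure: bool) -> str:
    """Suggest where to focus next, following steps 1) to 7) above."""
    if not multiple_clients:
        # Step 1: a single affected client usually points at that client.
        return ("client: check its credentials, its supplicant SSID profile "
                "for the EAP method, or the device itself")
    if not client_is_trying:
        # Step 2: nobody even attempts to join the SSID.
        return "wlc/ap: check the SSID configuration"
    if not server_sees_attempts:
        # Steps 5-6: nothing reaches the RADIUS Server.
        return ("wlc/ap-to-server: check the RADIUS Server definition on the "
                "WLC/AP, routing, ACLs/firewalls and fragmentation, and run "
                "debugs on the authenticator")
    if server_logs_failure:
        # Steps 4-5: the server explains why it rejected the attempt.
        return ("server: read the failure reason in the RADIUS logs (EAP "
                "method, credentials, policy, external database, AAA client, "
                "certificates)")
    # Attempts arrive and are not rejected, yet the client is still stuck.
    return ("authenticator: debug the exchange to find where it stops, "
            "including the key management negotiation")


if __name__ == "__main__":
    print(next_step(multiple_clients=True, client_is_trying=True,
                    server_sees_attempts=False, server_logs_failure=False))
```

The order of the checks mirrors the order of the steps, so the first condition that matches is the first thing worth investigating; it does not replace reading the actual logs and debugs.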

 

*** Debug Notes ***

CUWN

On a Cisco WLC, you could use the following debugs:

debug client <MAC-ADDRESS of the client you want to troubleshoot>

debug aaa all enable

 

Some good material to understand the debugs:

https://www.cisco.com/c/en/us/support/docs/wireless/aironet-1200-series/100260-wlc-debug-client.html

https://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/116493-technote-technology-00.html

 

Autonomous IOS

On Autonomous IOS APs, you could use the following debugs:

debug radius authentication

debug dot11 aaa authenticator process

debug dot11 aaa authenticator state-machine

https://www.cisco.com/c/en/us/support/docs/wireless/aironet-1200-series/50843-debug-authen.html

 

NGWC

For NGWC controllers, developers have confirmed that running debugs on these platforms is processor intensive, so they want us to run traces instead, which are still very helpful to analyze the behavior. Therefore, we could use ALL of the following traces here to troubleshoot the issue with one specific client:

set trace wcm-dot1x eaptrace level debug 
set trace wcm-dot1x detail level debug 
set trace access-session method dot1x level debug 
set trace group-wireless-client level debug 
set trace wcm-dot1x event level debug 
set trace wcm-dot1x aaa level debug 
set trace aaa wireless events level debug 
set trace access-session core sm level debug 

set trace group-wireless-client filter mac xxxx.xxxx.xxxx 
set trace wcm-dot1x event filter mac xxxx.xxxx.xxxx 
set trace wcm-dot1x aaa filter mac xxxx.xxxx.xxxx 
set trace aaa wireless events filter mac xxxx.xxxx.xxxx 
set trace access-session core sm filter mac xxxx.xxxx.xxxx 
set trace access-session method dot1x filter mac xxxx.xxxx.xxxx
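
Since the filter commands above expect the MAC address in the xxxx.xxxx.xxxx format, a small helper can save some typing when the address comes from a client in another notation. This is just a convenience sketch: `cisco_mac` and `trace_filter_commands` are names made up for the example, and the command list is copied from the six filter lines above, nothing more.

```python
import re

# The six trace targets that take a MAC filter in the list above.
TRACE_FILTER_TARGETS = [
    "group-wireless-client",
    "wcm-dot1x event",
    "wcm-dot1x aaa",
    "aaa wireless events",
    "access-session core sm",
    "access-session method dot1x",
]


def cisco_mac(mac: str) -> str:
    """Convert aa:bb:cc:dd:ee:ff, aa-bb-..., or aabb.ccdd.eeff to xxxx.xxxx.xxxx."""
    digits = re.sub(r"[:.\-\s]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def trace_filter_commands(mac: str) -> list:
    """Build the 'set trace ... filter mac' lines for one client."""
    dotted = cisco_mac(mac)
    return [f"set trace {target} filter mac {dotted}"
            for target in TRACE_FILTER_TARGETS]


if __name__ == "__main__":
    for line in trace_filter_commands("3C:41:0E:FF:FA:AE"):
        print(line)
```

The output can be pasted into the CLI session after the "level debug" commands have been entered.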

 

- To clear the Traces from the log buffer use the following command (this will clear configured TRACE):
set trace control sys-filtered-traces clear

- To clear the trace Filter use the below commands:
set trace group-wireless-client filter none
set trace wcm-dot1x event filter none 
set trace wcm-dot1x aaa filter none
set trace aaa wireless events filter none
set trace access-session core sm filter none
set trace access-session method dot1x filter none
set trace access-session core sm client dot11 filter none
set trace access-session core sm client spi filter none
set trace access-session core sm client filter none
set trace access-session core sm feature filter none

 

More information about NGWC traces and debugs for EAP authentication troubleshooting is documented on the following configuration examples:

http://www.cisco.com/c/en/us/support/docs/wireless/5700-series-wireless-lan-controllers/116600-config-eap-radius-00.html

http://www.cisco.com/c/en/us/support/docs/wireless/5700-series-wireless-lan-controllers/117684-configure-WLAN-00.html

http://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/116532-configure-technology-00.html

http://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/117664-configure-WLAN-00.html

 

Comments
Pritesh.Patel41
Community Member

Question: We have 802.11r FT enabled on the Cisco 5508 WLC, and all the APs are in the same mobility group. When the mobile devices move out of Wi-Fi coverage and return to it (connecting to the same SSID, which uses EAP authentication), they undergo FULL RADIUS authentication instead of just reassociating. Is this expected behavior? I think per 802.11r the device should only be performing re-association and NOT full association when it leaves Wi-Fi coverage and returns to it with the same AP or another AP under the same mobility group. Please help me understand the behavior here.

Thanks,

Pritesh

Jeal Jimenez
Cisco Employee

Hi Pritesh,

When a wireless client moves out of the AP's coverage cell and then returns, Fast Secure Roaming (with 802.11r FT or any other Fast-Secure Roaming method) will happen ONLY if the client comes back with a Reassociation Request and providing proper fast roaming information elements and key material trying to perform Fast Roaming.

If the client comes back sending an Association Request (like starting a brand new association to the WLAN/SSID), then it will undergo FULL EAP authentication as you said, but only because the client didn't provide the information needed to perform a fast roam.

Keep in mind that, most of the time, when the client moves out of the AP's coverage cell and then returns, the client itself has timed-out the association and hence decides to start a brand new association, OR, the WLC could have deauthenticated the client if the user idle timeout already expired (WLC needs to clear its client entries over time, so if a client goes away for a long time -longer than idle timeout-, then that client will be removed from the association table, and even if the client comes back with a proper Reassociation, it will definitely need to start a brand new Association with the WiFi infrastructure).

You can check the document I posted about Fast-Secure Roaming and how all this works, with examples showing Over-The-Air packet captures and WLC debugs, so you can check what you are supposed to see if the client comes back with a proper Reassociation:

802.11 WLAN Roaming and Fast-Secure Roaming on CUWN

And where I actually made some reference to this specific concern you have:

"If you expect a roaming event, but the client sends an Association Request instead of a Reassociation Request (which you can confirm from some captures and debugs similar to those explained earlier in this document), then the client is not really roaming. The client begins a new association to the WLAN as if a disconnection took place, and tries to reconnect from scratch. This can happen for multiple reasons, such as when a client moves away from the coverage areas and then finds an AP with enough signal quality to start an association, but it normally indicates a client issue where the client does not initiate a roaming event due to drivers, firmware, or software issues."

Hope this helps!

Jeal

aluft0001
Level 1

I am having troubles with an RVS4000 and 802.1x wired authentication.

Can you comment on techniques to investigate the authenticator to authentication server connection on that device?

I can observe a supplicant request going to the RVS4000 but I see no response at the supplicant. Many previous requests work to deny MD5 and select TLS, but eventually a packet is sent that gets no response.

mimohideen
Level 1

Good doc. Thanks.

smalkawi
Level 1

Thnx for the Info

Carlos Souza
Level 1

Jimenez help me please.


We have this scenario in logs, where 1 single access point is disconnected from the network and consequently disconnected from users, based on this log, where do I start to analyze it?

 

2023/02/14 13:53:04.850 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 3c41.0eff.faae, WLAN WEBWIFI, Slot 1 AP 3c41.0eff.faa0, SP-AP01-TR03-11
2023/02/14 13:53:04.851 dot11 Association success for client, assigned AID is: 9
2023/02/14 13:53:05.794 client-auth Starting EAPOL 4-Way Handshake
2023/02/14 13:53:05.804 client-keymgmt Negotiated the following encryption mechanism: AKM:DOT1X Cipher:CCMP WPA Version: WPA2
2023/02/14 13:53:05.805 client-orch-state Starting Mobility Anchor discovery for client
2023/02/14 13:53:05.807 client-orch-state Entering IP learn state
2023/02/14 13:53:07.797 client-iplearn Client got IP: 10.11.167.70, discovered through: DHCP
2023/02/14 13:53:07.798 client-orch-state Client reached RUN state, connection completed.
Connection attempt #2
2023/02/14 13:53:37.422 client-orch-sm Client roamed to a new AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 3c41.0eff.faae, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:53:37.422 dot11-validate Controller could not validate PMKID for fast roaming
2023/02/14 13:53:37.422 dot11 Association success for client, assigned AID is: 8
Connection attempt #3
2023/02/14 13:54:08.787 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:54:08.787 dot11 Association success for client, assigned AID is: 8
Connection attempt #4
2023/02/14 13:54:20.937 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:54:20.937 dot11 Association success for client, assigned AID is: 8
Connection attempt #5
2023/02/14 13:55:22.669 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:55:22.669 dot11 Association success for client, assigned AID is: 8
Connection attempt #6
2023/02/14 13:55:27.879 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:55:27.879 dot11 Association success for client, assigned AID is: 8
2023/02/14 13:56:57.882 errmsg Client failed EAP authentication with following reason: Timeout
2023/02/14 13:58:27.883 errmsg Client failed EAP authentication with following reason: Timeout
Connection attempt #7
2023/02/14 13:58:38.238 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:58:38.238 dot11 Association success for client, assigned AID is: 8
Connection attempt #8
2023/02/14 13:58:58.636 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, old BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 13:58:58.637 dot11 Association success for client, assigned AID is: 8
2023/02/14 14:00:28.643 errmsg Client failed EAP authentication with following reason: Timeout
2023/02/14 14:01:58.643 errmsg Client failed EAP authentication with following reason: Timeout
2023/02/14 14:03:28.643 errmsg Client failed EAP authentication with following reason: Timeout
2023/02/14 14:03:33.636 client-orch-sm Controller initiated client deletion with code: CO_CLIENT_DELETE_REASON_L2AUTH_CONNECT_TIMEOUT. Code means: Client did not complete L2 authentication before timeout
Connection attempt #9
2023/02/14 14:03:42.921 client-orch-sm Client made a new Association to an AP/BSSID: BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 14:03:42.921 dot11 Association success for client, assigned AID is: 8
2023/02/14 14:04:13.351 client-auth Starting EAPOL 4-Way Handshake
2023/02/14 14:04:13.408 client-keymgmt Negotiated the following encryption mechanism: AKM:DOT1X Cipher:CCMP WPA Version: WPA2
2023/02/14 14:04:13.409 client-orch-state Starting Mobility Anchor discovery for client
2023/02/14 14:04:13.410 client-orch-state Entering IP learn state
2023/02/14 14:04:17.255 client-iplearn Client got IP: 10.11.167.70, discovered through: DHCP
2023/02/14 14:04:17.255 client-orch-state Client reached RUN state, connection completed.
2023/02/14 14:05:27.319 client-orch-sm Controller initiated client deletion with code: CO_CLIENT_DELETE_REASON_L2AUTH_CONNECT_TIMEOUT. Code means: Client did not complete L2 authentication before timeout
Connection attempt #10
2023/02/14 14:05:27.785 client-orch-sm Client roamed to a new AP/BSSID: BSSID 889c.ade8.31ce, WLAN WEBWIFI, Slot 1 AP 889c.ade8.31c0, SP-AP02-TR03-11
2023/02/14 14:05:27.785 dot11 Association success for client, assigned AID is: 8