I think the best way to troubleshoot a WLAN authentication issue is to first understand clearly how the implemented security method works, so I will start by summarizing how 802.1X/EAP works, as this is basically the secure Layer-2 authentication framework implemented on WLANs:
*** Components ***
- Authentication Server: For WLANs this is a RADIUS Server where the authentication of the wireless clients actually takes place (ACS, ISE, Windows NPS, etc.). Here you define the specific EAP method that you want to allow and its settings (certificates, policies, etc.). The server also checks the clients' credentials against a local database, or against an external database it is linked to (AD for example).
- Authenticator: This is the WLC/AP, and the role is basically to act as a "proxy" between the wireless client to be authenticated and the RADIUS Server that performs the authentication. Here you configure the SSID to force dot1X/EAP negotiation with the wireless clients trying to associate to this WLAN (using a specific key management and encryption method, like WPA/TKIP or WPA2/AES), and you define the RADIUS Server that will be used for this authentication.
- Supplicant: The wireless client trying to connect to the secure WLAN. Here you configure the SSID profile that will be used for the association to this WLAN. You define the specific EAP method, and the client security credentials (username/password, certificates, tokens, etc.).
*** Authentication Process ***
- The wireless client starts the association to the WLAN using the specific SSID and selecting an AP to connect.
- Once basic wireless association is successful, the WLC/AP sends an EAP identity request to the client in order to start 802.1X/EAP. (The client could also start the EAP process by sending an EAPOL-Start to the AP.)
- The wireless client should answer back with an EAP identity response.
- This response is sent from the WLC/AP to the RADIUS Server on a RADIUS Access-Request packet.
- RADIUS Server processes the request and it replies back with a RADIUS Access-Challenge to start the EAP negotiation with the client (basically offering the client to start doing the EAP method that was configured for the wireless security policy).
- WLC/AP receives this Challenge, and forwards it to the client as an EAP request.
- Client answers back with an EAP response, which is forwarded by the WLC/AP to the RADIUS Server on another Access-Request. From here, the same exchange will happen multiple times between the server and the client (using the WLC/AP as intermediary) while the EAP method is negotiated and credentials are validated/authenticated. Depending on the EAP method we will see more/less exchanges, but in the end, the server should send to the WLC/AP a RADIUS Access-Accept if it passes (which contains the Master Session Key that will be used as the seed for the encryption cipher deployed, as well as the attributes that should be applied to this client -if used-) or a RADIUS Access-Reject if it fails.
- The WLC/AP will inform the client about this result, and either deauthenticate it from the WLAN if it failed or allow it access to the network so it can start passing data traffic. If using WEP encryption, there is basically nothing else to do, as the Master Session Key derived during this authentication process is used as the seed for the encryption of the data frames that can now start passing between the client and the AP. If using WPA/WPA2, a key management process known as the 4-Way handshake should happen after the EAP-Success is sent to the client, so the Supplicant and Authenticator can first derive the encryption keys that will be used for the data frames that follow.
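To make the sequence above easier to follow, here is a minimal Python sketch that just prints the expected order of messages. It is only an illustration, not how any WLC or RADIUS Server is implemented: the function name run_8021x, its parameters, and the number of EAP rounds are all made up for this example, and a real EAP method will show more or fewer exchanges.

```python
def run_8021x(credentials_ok, method_rounds=2, key_mgmt="WPA2"):
    """Return the ordered list of messages for one authentication attempt.

    Hypothetical helper: method_rounds is an arbitrary number of
    Challenge/Response exchanges, not a value taken from a real EAP method.
    """
    steps = []

    def log(src, dst, msg):
        steps.append(f"{src} -> {dst}: {msg}")

    # Association and identity exchange
    log("Client", "WLC/AP", "802.11 association")
    log("WLC/AP", "Client", "EAP-Request/Identity")
    log("Client", "WLC/AP", "EAP-Response/Identity")
    log("WLC/AP", "RADIUS", "Access-Request")

    # EAP method negotiation, relayed by the authenticator
    for _ in range(method_rounds):
        log("RADIUS", "WLC/AP", "Access-Challenge")
        log("WLC/AP", "Client", "EAP-Request")
        log("Client", "WLC/AP", "EAP-Response")
        log("WLC/AP", "RADIUS", "Access-Request")

    if credentials_ok:
        log("RADIUS", "WLC/AP", "Access-Accept (carries the MSK)")
        log("WLC/AP", "Client", "EAP-Success")
        if key_mgmt in ("WPA", "WPA2"):
            # Key management only applies to WPA/WPA2, not to WEP
            log("WLC/AP", "Client", "4-Way handshake")
    else:
        log("RADIUS", "WLC/AP", "Access-Reject")
        log("WLC/AP", "Client", "EAP-Failure")
        log("WLC/AP", "Client", "Deauthentication")
    return steps


if __name__ == "__main__":
    for line in run_8021x(credentials_ok=True):
        print(line)
```

Comparing this expected order against what you see in a debug or packet capture is a quick way to spot which side stopped answering.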
The Enterprise Mobility Design Guide is a very good public document we have with some of this information and more, so you can analyze this further:
*** Important Facts ***
- EAP is not a specific authentication method, but just the name of the base protocol. Based on this, a specific EAP method is actually configured for the secure authentication, which is selected depending on our needs, requirements, and what the devices support (such as LEAP, PEAP/EAP-MSCHAPv2, PEAP/EAP-GTC, EAP-TLS, EAP-FAST/MSCHAPv2, EAP-FAST/TLS, EAP-FAST/GTC, etc.). The following post can help as a quick summary about the EAP methods most commonly used for WLANs:
- This EAP method is only defined on the client (supplicant) and RADIUS Server (Authentication Server). Not on the WLC/AP (Authenticator).
- EAPOL is the protocol used between the Supplicant and the Authenticator to transport the authentication frames between them (as 802.11 wireless Data Frames over the air), while RADIUS is the protocol used between the Authenticator and the Authentication Server to transport the authentication frames of this process (as UDP RADIUS packets over the wired infrastructure).
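The point about EAPOL versus RADIUS can be illustrated with a small Python sketch: the same EAP packet (here an identity response) is wrapped in an EAPOL header on the wireless side and in a RADIUS EAP-Message attribute (type 79) on the wired side. The function names are made up for this example, the EAPOL version value is an assumption, and the sketch only builds the payloads; it does not build the 802.11 frame or the full RADIUS packet.

```python
import struct

EAP_CODE_RESPONSE = 2         # EAP Code field: Response
EAP_TYPE_IDENTITY = 1         # EAP Type field: Identity
EAPOL_TYPE_EAP_PACKET = 0     # EAPOL Packet Type: EAP-Packet
RADIUS_ATTR_EAP_MESSAGE = 79  # RADIUS attribute type: EAP-Message
MAX_ATTR_VALUE = 253          # max value bytes in one RADIUS attribute


def eap_identity_response(identifier, identity):
    """Build an EAP-Response/Identity: Code, Identifier, Length, Type, Data."""
    data = identity.encode("utf-8")
    length = 5 + len(data)  # 4-byte EAP header + 1-byte Type + data
    header = struct.pack("!BBHB", EAP_CODE_RESPONSE, identifier,
                         length, EAP_TYPE_IDENTITY)
    return header + data


def wrap_eapol(eap_packet, version=2):
    """Supplicant <-> Authenticator: prepend the EAPOL header.

    version=2 is an assumption for this sketch; some clients send 1.
    """
    return struct.pack("!BBH", version, EAPOL_TYPE_EAP_PACKET,
                       len(eap_packet)) + eap_packet


def to_radius_eap_message(eap_packet):
    """Authenticator <-> Server: carry the EAP packet in EAP-Message
    attributes, splitting it when it exceeds one attribute."""
    attrs = b""
    for i in range(0, len(eap_packet), MAX_ATTR_VALUE):
        chunk = eap_packet[i:i + MAX_ATTR_VALUE]
        attrs += struct.pack("!BB", RADIUS_ATTR_EAP_MESSAGE,
                             len(chunk) + 2) + chunk
    return attrs


if __name__ == "__main__":
    eap = eap_identity_response(1, "alice")
    print(wrap_eapol(eap).hex())
    print(to_radius_eap_message(eap).hex())
```

Note that the WLC/AP never needs to understand the EAP method inside the packet; it only changes the wrapper, which is why the method is not configured on the Authenticator.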
*** Troubleshooting ***
Knowing that, and without needing to understand all the details about the specific EAP method deployed or about WiFi, you can troubleshoot a WLAN authentication issue as follows:
1) First of all, check if the issue is happening to multiple clients/devices. If this is only happening to one, then most likely the issue is with the credentials of that client, or with the supplicant SSID setup on that client for the specific EAP method (or with the device itself, hardware or software).
2) If you confirm this is happening to multiple clients, then select one or two to troubleshoot. Check if the client is trying to connect to the SSID. If it is not even trying, the SSID is most likely not properly configured on the WLC/LAP.
3) Check the WLC/AP client association table to see whether the client is even trying to connect and what its status is. For example, on a Cisco WLC a client that is stuck in the authentication process will show the status 8021X_REQD.
4) If you notice that the client is trying but failing or stuck on the authentication phase, then immediately check the failed attempts (authentication failure logs) on the RADIUS Server.
5) The server logs should normally tell you the reason for the authentication failure, but if you don't see any attempts at all, then there could be an issue between the WLC/AP and the server. If there are failures on the server, the reasons are normally: EAP method not defined or not properly configured on the server or the client, wrong credentials, request not matching the security policy defined, RADIUS Server having an issue with the external database (if used), RADIUS Server not knowing the WLC/AP trying to talk to it (it should be configured as AAA client -NAS- with a valid shared secret), or the EAP method defined requiring certificates from a CA (at least on the Server) that are not properly set up... The server logs should normally be clear on the specific reason.
6) If you can't find any logs at all on the server for attempts from this client and you know the WLC/AP is properly configured to communicate with this specific RADIUS Server, then there could be an issue between the WLC/AP and the server, so you might want to check network connectivity between them (proper routing, no ACLs/firewall blocking RADIUS traffic, no fragmentation issues, etc.). At this point, you could start this network troubleshooting between the Authenticator and the Authentication Server, but you could also start some debugging on the Authenticator, as this will clearly confirm if the problem is on the WLC/AP itself, on the client side, or on the server side. In the debugs you should be able to see:
- Whether the WLC/AP even sent the EAP identity request.
- Whether the client is responding with the EAP identity response.
- Whether the WLC/AP is sending that response as a RADIUS Access-Request to the proper server.
- Whether the server is responding (and how).
- Whether this response is forwarded to the client, whether the client is responding, and whether that is then forwarded back to the server... until the point where you notice a failure due to a lack of response from the client, from the server, or because the WLC/AP is not forwarding what it should where it should. You could even check if the failure is happening during the key management negotiation (which normally happens due to software issues if the client actually supports the one configured on the WLC/AP).
7) After this, you can focus your troubleshooting on the WLC/AP, the client side, or the server side, checking connectivity between them in more depth (packet captures if needed), the settings of the specific EAP method defined on the client and the server, or looking for software issues due to bad behaviors on any of the three components.
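The steps above are really a small decision tree, so as a rough summary they can be sketched in Python like this. The function name triage and its parameters are made up for this example, and the branch for a client that is not in 8021X_REQD is my own simplification rather than one of the steps; treat the output as a hint on where to look first, not as a diagnosis.

```python
from typing import Optional


def triage(multiple_clients: bool,
           client_tries_ssid: bool,
           client_state: str,
           server_has_logs: bool,
           server_failure_reason: Optional[str] = None) -> str:
    """Map a few observations to the component to look at first."""
    # Step 1: a single affected client usually points at that client
    if not multiple_clients:
        return ("client: check credentials, supplicant SSID profile, "
                "device hardware/software")
    # Step 2: nobody even tries to join the SSID
    if not client_tries_ssid:
        return "WLC/AP: check the SSID configuration"
    # Step 3: assumption of this sketch, only 8021X_REQD means
    # "stuck in 802.1X/EAP"
    if client_state != "8021X_REQD":
        return "WLC/AP: client is not stuck in 802.1X, check its state"
    # Steps 4-5: the server saw the attempt and logged a reason
    if server_has_logs:
        reason = server_failure_reason or "read the failure reason in the logs"
        return f"server/client EAP settings: {reason}"
    # Step 6: the server never saw the attempt
    return ("WLC/AP <-> RADIUS path: check routing, ACLs/firewall, "
            "fragmentation, then debug the authenticator")


if __name__ == "__main__":
    print(triage(True, True, "8021X_REQD", False))
```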
*** Debug Notes ***
On a Cisco WLC, you could use the following debugs:
debug client <MAC-ADDRESS of the client you want to troubleshoot>
debug aaa all enable
Some good material to understand the debugs:
On Autonomous IOS APs, you could use the following debugs:
debug radius authentication
debug dot11 aaa authenticator process
debug dot11 aaa authenticator state-machine
For NGWC controllers, developers have confirmed that running debugs on these platforms is processor intensive, so they recommend running traces instead, which are still very helpful for analyzing the behavior. Therefore, we can use ALL of the following traces to troubleshoot the issue with one specific client:
set trace wcm-dot1x eaptrace level debug
set trace wcm-dot1x detail level debug
set trace access-session method dot1x level debug
set trace group-wireless-client level debug
set trace wcm-dot1x event level debug
set trace wcm-dot1x aaa level debug
set trace aaa wireless events level debug
set trace access-session core sm level debug
set trace group-wireless-client filter mac xxxx.xxxx.xxxx
set trace wcm-dot1x event filter mac xxxx.xxxx.xxxx
set trace wcm-dot1x aaa filter mac xxxx.xxxx.xxxx
set trace aaa wireless events filter mac xxxx.xxxx.xxxx
set trace access-session core sm filter mac xxxx.xxxx.xxxx
set trace access-session method dot1x filter mac xxxx.xxxx.xxxx
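Since the filter commands above expect the MAC address in the xxxx.xxxx.xxxx format, a small Python helper can save some typing when the address is copied from a client in another notation. This is only a convenience sketch: the function names are made up, and the command list is simply the filter lines shown above with the MAC filled in.

```python
import re

# Trace targets taken from the "filter mac" commands listed above
FILTER_TARGETS = [
    "group-wireless-client",
    "wcm-dot1x event",
    "wcm-dot1x aaa",
    "aaa wireless events",
    "access-session core sm",
    "access-session method dot1x",
]


def cisco_mac(mac):
    """Convert aa:bb:cc:dd:ee:ff, AA-BB-..., etc. to xxxx.xxxx.xxxx."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def filter_commands(mac):
    """Return the 'set trace ... filter mac' lines for one client."""
    dotted = cisco_mac(mac)
    return [f"set trace {target} filter mac {dotted}"
            for target in FILTER_TARGETS]


if __name__ == "__main__":
    for cmd in filter_commands("AA:BB:CC:DD:EE:FF"):
        print(cmd)
```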
- To clear the traces from the log buffer, use the following command (this will clear the configured traces):
set trace control sys-filtered-traces clear
- To clear the trace filters, use the following commands:
set trace group-wireless-client filter none
set trace wcm-dot1x event filter none
set trace wcm-dot1x aaa filter none
set trace aaa wireless events filter none
set trace access-session core sm filter none
set trace access-session method dot1x filter none
set trace access-session core sm client dot11 filter none
set trace access-session core sm client spi filter none
set trace access-session core sm client filter none
set trace access-session core sm feature filter none
More information about NGWC traces and debugs for EAP authentication troubleshooting is documented on the following configuration examples: