cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
394
Views
25
Helpful
5
Replies
Highlighted
Beginner

Analyzing Roaming Problem (WLC 5508) - debug included

I'm trying to troubleshoot a roaming problem, where some of our client sometimes experience a 20 second interruption of connectivity. The problem occurs about once in every 50 times the clients roams.

All involved access-points are connected to the same, local WLC 5508 (SW Version 8.5.161.0).

 

I have recorded debug logs of good and bad roamings to compare ("debug client xxxx.xxxx.xxxx") and have identified this section as the culprit:

 

*spamApTask6: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Setting ADD_MOBILE (idx 2, seqno 243, action 6, count 243, last count 0) ack state for STA on AP cc:d5:39:ba:18:00
*spamApTask6: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Delete Mobile request on slot 0 sent to the AP cc:d5:39:ba:18:00 IP: 10.58.44.121:53857

*spamApTask6: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Received add/del ack packet with sequence number: got 243 expected 243 action = 6
*spamApTask6: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 apfUpdateDeleteAckInMscb (apf_api.c:54425) Expiring Mobile!
*spamApTask7: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Add SGT:0 to AP 24:01:c7:15:01:40
*spamApTask7: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Add CTS mobile SGT - Encoded the capwap payload for the mobile with SGT 0
*spamApTask7: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Successful transmission of LWAPP Add-Mobile to AP 24:01:c7:15:01:40
*spamApTask7: Oct 23 14:10:50.518: 8c:27:8a:05:6b:99 Setting ADD_MOBILE (idx 17, seqno 222, action 1, count 222, last count 232) ack state for STA on AP 24:01:c7:15:01:40
*spamApTask7: Oct 23 14:10:50.520: 8c:27:8a:05:6b:99 Received add/del ack packet with sequence number: got 222 expected 222 action = 1
*spamApTask7: Oct 23 14:10:50.520: 8c:27:8a:05:6b:99 Received ADD_MOBILE ack - Initiating 1x to STA 8c:27:8a:05:6b:99 (idx 17)
*spamApTask7: Oct 23 14:10:50.520: 8c:27:8a:05:6b:99 APF Initiating 1x to STA 8c:27:8a:05:6b:99
*spamApTask7: Oct 23 14:10:50.520: 8c:27:8a:05:6b:99 Sent dot1x auth initiate message for mobile 8c:27:8a:05:6b:99

In this example, everything works as expected. After this follows a section of "Dot1x_NW_MsgTask"-Messages and the connection is established.

 

Here's an example of a bad roaming to compare:

 

*spamApTask6: Oct 23 15:55:34.539: 8c:27:8a:08:10:4b Add SGT:0 to AP cc:d5:39:ba:18:00
*spamApTask6: Oct 23 15:55:34.539: 8c:27:8a:08:10:4b Add CTS mobile SGT - Encoded the capwap payload for the mobile with SGT 0
*spamApTask6: Oct 23 15:55:34.540: 8c:27:8a:08:10:4b Successful transmission of LWAPP Add-Mobile to AP cc:d5:39:ba:18:00
*spamApTask6: Oct 23 15:55:34.540: 8c:27:8a:08:10:4b Setting ADD_MOBILE (idx 68, seqno 141, action 1, count 141, last count 206) ack state for STA on AP cc:d5:39:ba:18:00
*spamApTask7: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b Setting DEL_MOBILE (seqno 0, action 6) ack state for STA on AP 24:01:c7:15:01:40
*spamApTask7: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b Setting ADD_MOBILE (idx 44, seqno 81, action 6, count 81, last count 0) ack state for STA on AP 24:01:c7:15:01:40
*spamApTask7: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b Delete Mobile request on slot 0 sent to the AP 24:01:c7:15:01:40 IP: 10.58.44.116:53973

*spamApTask6: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b Received add/del ack packet with sequence number: got 141 expected 141 action = 1
*spamApTask6: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b Received ADD_MOBILE ack - Initiating 1x to STA 8c:27:8a:08:10:4b (idx 68)
*spamApTask6: Oct 23 15:55:34.541: 8c:27:8a:08:10:4b expected addMobileAckCount 81, got 141
*spamApTask7: Oct 23 15:55:34.542: 8c:27:8a:08:10:4b Received add/del ack packet with sequence number: got 81 expected 81 action = 6
*spamApTask7: Oct 23 15:55:34.542: 8c:27:8a:08:10:4b apfUpdateDeleteAckInMscb (apf_api.c:54425) Expiring Mobile!

Differences:

 

  • These two messages don't appear:
    APF Initiating 1x to STA xxxx.xxxx.xxxx
    Sent dot1x auth initiate message for mobile xxxx.xxxx.xxxx
  • After this section, there are no debug messages until the client performs another "REASSOCIATION REQUEST" on another AP (so there are no "Dot1x_NW_MsgTask"-Messages).
  • The "spamApTask#" Numbers appear out of order:
    spamApTask6 is for the new AP and appears first
    spamApTask7 is for the AP the client was originally connected to and appears second
  • The last message is the "Expiring Mobile!" on the AP the client was originally connected.
    apfUpdateDeleteAckInMscb (apf_api.c:54425) Expiring Mobile!

My questions:

  • Could the problem simply be, that the "Expiring Mobile!"-message appears last without any spamApTask left running to add the client to the new AP? If this is the case, what could cause the spamApTasks to be out of order?
  • The first message that is missing in the bad cases, is "APF Initiating 1x to STA xxxx.xxxx.xxxx". Is this action initiated by the WLC or the client? And thus would you suspect the problem to be with the WLC or the client?

I would appreciate any insights you can provide!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Enthusiast

I think you are victim of the bug CSCvq55777. A solution would be to upgrade your controller to the 8.5.164.0 version that correct the problem.

Extract of the bug description:

 

Symptom:

[...]

Symptom 2: a client roams to a new AP but is unable to pass traffic. "debug client" shows this message immediately
after the client associates:
*spamApTask4: Mar 06 12:40:24.312: 00:01:02:cd:00:6b apfUpdateDeleteAckInMscb (apf_api.c:54425) Expiring Mobile!
rather than:
*spamApTask4: Mar 06 12:40:01.262: 00:01:02:cd:00:6b Sent dot1x auth initiate message for mobile 00:01:02:cd:00:6b

[...]

View solution in original post

5 REPLIES 5
Highlighted
Enthusiast

I think you are victim of the bug CSCvq55777. A solution would be to upgrade your controller to the 8.5.164.0 version that correct the problem.

Extract of the bug description:

 

Symptom:

[...]

Symptom 2: a client roams to a new AP but is unable to pass traffic. "debug client" shows this message immediately
after the client associates:
*spamApTask4: Mar 06 12:40:24.312: 00:01:02:cd:00:6b apfUpdateDeleteAckInMscb (apf_api.c:54425) Expiring Mobile!
rather than:
*spamApTask4: Mar 06 12:40:01.262: 00:01:02:cd:00:6b Sent dot1x auth initiate message for mobile 00:01:02:cd:00:6b

[...]

View solution in original post

Highlighted

Thanks a lot! The description does perfectly match what I see and our release is listed as affected, so I'm pretty sure this must be it.

Highlighted
VIP Advocate

 

   - Besides other reply , you can also have your debugs analyzed with this tool :

                      https://cway.cisco.com/tools/WirelessDebugAnalyzer/

 M.

Highlighted
Cisco Employee

8.5.164.0 is for the IRCM deployment. As per the TAC recommended AireOS document, this image is not supported for other platforms, nor is it supported for non-IRCM deployments. The CSCvq55777 is fixed in 8.5.161.x escalation image. If you are not running IRCM deployment you should reach out to TAC and they can provide you escalation image with the fix.

Highlighted

Thanks for the input. After headquarters consulted with Cisco, we went with 8.5.161.7.