
Controller 5500 + LAP roaming problem

telecom2012
Level 1

Hi, we are experiencing a LAP roaming problem.
We are using 4 Cisco LAP1042N APs (remote office) with a Wireless LAN Controller 5500 (main office), secured with WPA2 + 802.1X. The mobile devices are Intermec CN3 handhelds used to read price labels.
When a device stays on the same LAP the application works well, but when it roams from one LAP to another it loses 5 or 6 ping packets. At that point the application keeps "waiting" and the user goes crazy.
The mobile devices always use the same SSID/VLAN on all LAPs.
With a test SSID using WPA2 + PSK, roaming from one LAP to another loses only 1 ping packet, but we have security requirements and need to use 802.1X.

1) When using WPA2 + 802.1X, does every roam send the client all the way to the RADIUS server, or does it go only to the controller and back to the client?


2) Which protocols should be prioritized in QoS (RADIUS, CAPWAP)?

3) Is there a best practice guide for this?

8 Replies

Justin Kurynny
Level 4

Waldir,

UPDATED: See Wesley Terry's post later in this thread.

As long as your 4 APs are all associated to the same controller, there should be no involvement with the AAA server when the client roams from one AP to the next. When more than one controller is involved and you don't have mobility groups set up properly between controllers, roams can take longer because reauth has to happen. Your setup is one of the most basic examples of L2 roaming and you shouldn't be seeing this kind of latency.

Some questions for you:

  • What version of code is your 5508 running?

  • What is the bandwidth between your remote office and your main office? Is it a congested or delayed link (do you see packet drops on any of the WAN/tunnel interfaces)?

  • Do you have client load balancing enabled on the WLAN? If so, turn it off. You don't want your APs sending association reject messages if quick roaming between APs is important.

  • Where is your AAA server in relation to the WLC? Is there high latency between the controller and the AAA server?

  • What is your session timeout on the WLAN? If it's short, it will force client reauths frequently, and roaming or no roaming, this can delay packets.

This is the mobility design guide. The whole document is probably overkill for your purposes, so start out with the section on roaming. It is basically a best practices manual for delay-sensitive traffic in a multi-AP, multi-WLC environment.

http://www.cisco.com/en/US/docs/solutions/Enterprise/Mobility/emob41dg/emob41dg-wrapper.html

QoS is definitely a consideration for low-latency applications, and now for CAPWAP with high-bandwidth APs, especially over bottlenecked links like WAN connections. How much it matters depends on how much total client traffic is coming and going over wireless, how much other non-wireless traffic is on the link, and how much that is congesting your end-to-end connection between APs and the controller. If CAPWAP control traffic is getting starved out, then you're going to have problems with APs staying joined to the WLC. For RADIUS, it's not as important if you have plenty of pipe between your WLC and AAA server, especially if they are attached to the same switch or live in the same datacenter.

One final note--if you are running your AAA server on an old TRS-80, don't expect it to pump out access-accept messages too quickly. The RADIUS protocol is by nature unreliable (UDP), so you'll want to make up for it (at least a little) by giving your AAA server lots of horsepower.

Justin

weterry
Level 4

Unfortunately my response is going to differ from Justin's in some points.

If you are doing 802.1x, then the expectation is that every roam is going to involve your AAA server unless your client is doing some method of Fast Secure Roaming. This is true for APs on the same WLC or APs on different WLCs within the same mobility group. Typically this shouldn't take more than a few hundred milliseconds, so if your dot1x is taking several seconds to complete then I'd be questioning that process in general.

Anyhow, WPA2 defines methods of Fast Secure Roaming that remove the need to re-authenticate through a AAA server on every roam. Most clients do one of two methods of FSR with WPA2, but the two work differently, and prior to 7.2 code only Opportunistic Key Caching (OKC) is supported. In 7.2 I believe we now have limited support for Sticky-Key caching, but this method still isn't quite as good as OKC (in my opinion).
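For readers who want to see why the two caching methods behave differently, here is a minimal Python sketch of the PMKID calculation defined by 802.11i: the first 128 bits of HMAC-SHA1 over the label "PMK Name", the AP's MAC address, and the client's MAC address, keyed with the PMK. The PMK value below is a made-up placeholder, not a real key, and the MAC addresses are the ones posted later in this thread. It only illustrates the identifier math, not how the WLC or the CN3 actually implements key caching.

```python
import hashlib
import hmac


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def pmkid(pmk: bytes, ap_mac: str, client_mac: str) -> bytes:
    """PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA)."""
    data = b"PMK Name" + mac_bytes(ap_mac) + mac_bytes(client_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


# Hypothetical 32-byte PMK, for illustration only.
pmk = bytes(range(32))

client = "00:0b:6b:b4:e7:0a"   # client MAC from the debug in this thread
ap01 = "d0:c2:82:f7:56:e0"     # AP01 as listed in this thread
ap02 = "d0:c2:82:f1:89:80"     # AP02 as listed in this thread

id_ap01 = pmkid(pmk, ap01, client)
id_ap02 = pmkid(pmk, ap02, client)

print("PMKID toward AP01:", id_ap01.hex())
print("PMKID toward AP02:", id_ap02.hex())
```

With OKC, the client keeps the one PMK from its original full authentication and computes a fresh PMKID for each new AP, as above, so a roam needs only the 4-way handshake. With sticky key caching, the client only offers PMKIDs for APs it has already fully authenticated to, so the first roam to each AP still costs a full EAP exchange.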

What you really should do is a "debug client " on the WLC.

You should do this for a client on the PSK WLAN and then for a client on the 802.1x WLAN (catch several roams), as this is going to tell you how long the association/authentication is taking.

If your client associates and goes into a RUN state in 2 seconds, but your client is down for 10 seconds, it's something else going on....

If it is taking 10 seconds for the association to complete per the debug, then you should be able to see where the holdup is occurring.

Capture the debug and post it here if you want opinions on how long the association is taking....

By the way, how does this work for other 802.1x clients on this network? Exact same results or better?

Thank You,

Wesley Terry

Wesley,

Nothing unfortunate about it. You are absolutely right about 802.1x re-auth on every roam, regardless of controller configuration, inter-controller relationship, etc. There have been some great discussions in this forum, and on the web generally, about dot1x roaming. Thanks for chiming in and getting me motivated to do the deep reading!

Justin

Wesley,

As indicated earlier, the client will always have to re-authenticate when roaming. That should not be noticeable for data but could have an impact on voice. Have you tried creating HREAP groups to see if it improves performance?

Thanks for the updates. Here are some other info:

  

WLC ver is 7.0.116.0
Client load balancing is not enabled
AAA server is behind the WLC, both at the main office. There is no latency between them. There may be some from the remote office to the WLC (ping avg 35 ms, with some at 400 ms).
Session timeout on the WLAN is default = 1800
The debugs were done for the 802.1X client.
WPA2-PSK client debug to come.

AP01 d0:c2:82:f7:56:e0

AP02 d0:c2:82:f1:89:80

AP03 d0:c2:82:f7:3e:20

AP04 d0:c2:82:f7:4b:20

502-coletor2.txt.zip is an interesting file.

12:15:24.039: 00:0b:6b:b4:e7:0a Reassociation        

12:15:24.140: 00:0b:6b:b4:e7:0a Sending EAP-Request/Identity  <-- sticky key client,  no fast roam, we send ID REQ

12:15:25.183: 00:0b:6b:b4:e7:0a Sending EAP Request from AAA to mobile

12:15:25.502: 00:0b:6b:b4:e7:0a Received EAPOL EAPPKT from mobile   <-- 400ms delay during part of EAP

12:15:25.857: 00:0b:6b:b4:e7:0a Sending EAPOL-Key Message to mobile 00:0b:6b:b4:e7:0a

                    state PTKINITNEGOTIATING (message 3),

12:15:26.115: 00:0b:6b:b4:e7:0a Reassociation   <-- client didn't respond to M3, it did a new association.

12:15:27.791: 00:0b:6b:b4:e7:0a 10.31.32.11 L2AUTHCOMPLETE (4) Change state to RUN (20) last state RUN (20)

So just looking at the above, the client roams to an AP and does sticky-key caching, which we can't compute a fast roam from, so we do a full EAP authentication (with a little more delay than usual). Then, after EAP is done, the client reassociates before the rest of the encryption was set up...

So there are almost 4 full seconds of no connectivity here.....
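To put numbers on the timeline above, here is a minimal Python sketch that computes the gap between each step and the total outage. The timestamps are copied from the debug excerpt in this post; the event labels are shortened by hand, and the variable and function names are illustrative only.

```python
from datetime import datetime

# (timestamp, shortened label) pairs copied from the debug excerpt above.
events = [
    ("12:15:24.039", "Reassociation"),
    ("12:15:24.140", "Sending EAP-Request/Identity"),
    ("12:15:25.183", "Sending EAP Request from AAA to mobile"),
    ("12:15:25.502", "Received EAPOL EAPPKT from mobile"),
    ("12:15:25.857", "Sending EAPOL-Key (message 3)"),
    ("12:15:26.115", "Reassociation (client started over)"),
    ("12:15:27.791", "Change state to RUN"),
]


def parse(ts: str) -> datetime:
    """Parse an HH:MM:SS.mmm timestamp as it appears in the excerpt."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def step_deltas(evts):
    """Return (label, seconds since the previous event) for each later event."""
    times = [parse(ts) for ts, _ in evts]
    return [
        (label, (cur - prev).total_seconds())
        for (_, label), prev, cur in zip(evts[1:], times, times[1:])
    ]


deltas = step_deltas(events)
total = (parse(events[-1][0]) - parse(events[0][0])).total_seconds()
# Time from the first Reassociation until the client reassociated again.
wasted = (parse("12:15:26.115") - parse(events[0][0])).total_seconds()

for label, seconds in deltas:
    print(f"{seconds:6.3f}s  {label}")
print(f"total outage: {total:.3f}s, spent before the restart: {wasted:.3f}s")
```

This works out to about 3.75 seconds from the first Reassociation to RUN, of which roughly 2.08 seconds were spent on the attempt the client abandoned.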

Is that what matches what the client saw?

You could test 7.2 and see if sticky-key support helps out, but the fact of the matter here is that your client is roaming while in the middle of association/authentication which is basically wasting 2 seconds and having to start all over again..

I would ask the client vendor why this occurs.....

Hi, we have limited control over QoS due to the ISP contract in use today, so QoS changes on the main office LAN made no big difference. We have now created a specific SSID with 802.1X + CCKM and enabled CCKM on the mobile device. The user said performance is not bad, but this was just a first quick test. I just want to confirm (from this log) that the full authentication is now partial when going from one AP to another.

ap1 d0:c2:82:f7:56:e0

ap2 d0:c2:82:f1:89:80

ap3 d0:c2:82:f7:3e:20

ap4 d0:c2:82:f7:4b:20

Thanks

Client does not appear to be doing CCKM. I see it doing WPA2 FSR, but sometimes it times out on the 4-way handshake and then tries to start a full EAP exchange. (I guess this errors out. I don't like that "Unable find a valid PMK" message, because it appears related to a different WLC software issue. Either way, the client is being stupid for starting EAP when it should have just done a handshake.)

00:0b:6b:ad:c6:38 Retransmit 1 of EAPOL-Key M1 (length 121) for mobile 00:0b:6b:ad:c6:38

00:0b:6b:ad:c6:38 Received EAPOL START from mobile 00:0b:6b:ad:c6:38

00:0b:6b:ad:c6:38 Disable re-auth, use PMK lifetime.

00:0b:6b:ad:c6:38 dot1x - moving mobile 00:0b:6b:ad:c6:38 into Connecting state

00:0b:6b:ad:c6:38 Sending EAP-Request/Identity to mobile 00:0b:6b:ad:c6:38 (EAP Id 1)

00:0b:6b:ad:c6:38 Received EAPOL-Key from mobile 00:0b:6b:ad:c6:38

00:0b:6b:ad:c6:38 Unable find a valid PMK for mobile 00:0b:6b:ad:c6:38

00:0b:6b:ad:c6:38 Received EAPOL-Key from mobile 00:0b:6b:ad:c6:38

00:0b:6b:ad:c6:38 Unable find a valid PMK for mobile 00:0b:6b:ad:c6:38

You say these clients support CCKM?  What is it?

Maybe it doesn't do CCKM with WPA2... as a lot of devices (like Cisco Phones) originally only did CCKM with WPA-TKIP.

If it were doing CCKM, we shouldn't see the 4-way handshake, which is what makes CCKM even faster and more reliable (no chance for key timeouts).
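As a rough way to apply that reasoning to a captured debug, here is a small Python sketch that labels one roam by which messages show up in it. The matched strings are taken only from the excerpts posted in this thread, the function name and labels are made up for illustration, and real "debug client" output may be worded differently on other releases, so treat this as a starting point rather than a reliable detector.

```python
def classify_roam(lines):
    """Label one roam's debug lines by the kind of exchange they show.

    The substrings below come from the excerpts in this thread and are
    assumptions about what the debug prints, not a documented format.
    """
    saw_identity_request = any("Sending EAP-Request/Identity" in l for l in lines)
    saw_eapol_key = any("EAPOL-Key" in l for l in lines)

    if saw_identity_request:
        return "full-eap"        # the WLC started a full 802.1X exchange
    if saw_eapol_key:
        return "handshake-only"  # cached PMK, 4-way handshake only
    return "no-handshake"        # what a CCKM roam should look like, per the note above


# Lines taken from the excerpt earlier in this post.
sample = [
    "00:0b:6b:ad:c6:38 Retransmit 1 of EAPOL-Key M1 (length 121) for mobile 00:0b:6b:ad:c6:38",
    "00:0b:6b:ad:c6:38 Received EAPOL START from mobile 00:0b:6b:ad:c6:38",
    "00:0b:6b:ad:c6:38 Sending EAP-Request/Identity to mobile 00:0b:6b:ad:c6:38 (EAP Id 1)",
    "00:0b:6b:ad:c6:38 Received EAPOL-Key from mobile 00:0b:6b:ad:c6:38",
]

print(classify_roam(sample))
```

A "no-handshake" result can also simply mean the capture was incomplete, so it is worth checking that the roam ended in a RUN state before reading it as CCKM.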
