
WLC Fail-Back (Cisco drops the ball?)

Hello,

We've recently sold and implemented a wireless solution using a WLC, WCS and 1140 APs.

There is an HQ site where the WLC, WCS, DNS and DHCP reside. Active Directory and a RADIUS server are also located there. There is then a WAN link to remote sites which sometimes fails. At the remote sites you'll just have a router, switches and the APs. The intention is for the APs to work in lightweight mode, falling back to H-REAP when the WAN link fails.

That works fine, but what doesn't work fine is the APs rejoining once the WAN link is restored.

They just don't. Even days later, the APs are still all disassociated from the controller despite the WAN link being up. I've 'hardcoded' the controller IP into the AP configuration, while the APs initially get the IP for the WLC from DHCP using Option 43. Despite the APs therefore knowing where the WLC is, once they're disassociated from it (WAN link failed) they will not reassociate by themselves. Restarting the APs is the only way to get them to rejoin.

With hundreds of APs and in excess of 30 switches, restarting all the APs each time the WAN link fails is pretty ridiculous.

I've logged a TAC case and gone through the whole rigmarole. This is an official bug, and Cisco have informed us that it's due to be fixed sometime in early 2011, but besides that there is nothing they can do to help me. So to be perfectly clear, Cisco have sold and shipped a product that doesn't work as advertised, and the best they can offer us is a promise to fix it soon. I'm pretty shocked; I've never had this experience with Cisco in the past.

OK, so now I've got to come up with a decent workaround until we get a firmware release where this is fixed. I'm looking at using CNA to automate the reloading of all the switches. I guess I'll just write a procedure for the client to follow to reload all their APs when an outage is reported.

Has anyone here come up with something more elegant? A script that can query the associated status of APs and reload them as needed, automatically, would be pretty cool. Perhaps that can be done with SNMP.
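As a rough illustration of the script idea above, here is a minimal Python sketch that works out which APs have not rejoined. It assumes the text of the WLC's `show ap summary` output has already been captured somehow (fetching it over SSH or SNMP is not shown), and that each AP row starts with the AP name and contains the AP's Ethernet MAC address. The function names and the AP names in the test data are hypothetical, and the reload step itself is left out.

```python
import re

# Matches a MAC address such as 00:11:22:33:44:55; used to spot AP rows.
MAC_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\b")


def joined_ap_names(summary_text):
    """Return the AP names found in text shaped like 'show ap summary' output.

    Assumption: each AP row starts with the AP name and contains the AP's
    Ethernet MAC address; header and separator lines contain no MAC.
    """
    names = set()
    for line in summary_text.splitlines():
        if MAC_RE.search(line):
            names.add(line.split()[0])
    return names


def aps_to_reload(expected, summary_text):
    """Return the expected AP names missing from the controller, sorted."""
    return sorted(set(expected) - joined_ap_names(summary_text))
```

Feeding `aps_to_reload` the list of APs that should be joined plus the captured summary text gives the names that still need attention. Those could then be handed to whatever reload method is available, for example the written procedure mentioned above.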

Cisconians, tell me your thoughts!


Darren Ramsey
Level 4

Can you post your WLC code version and the BugId? I have seen this in WAN situations after upgrading to 6.0.199.4. You might try an earlier version 6.0.196.159 or go up to 7.0.98.0. TAC should be able to tell you what versions the bug exists in.

Hi Darren,

Well that sounds exactly like us. We upgraded to the 199 firmware to address a bug with APs dropping their VLAN membership settings.

Specifically the version of firmware on our WLC is 6.0.199.157

I might let the TAC know about this thread and ask for their advice based on the information presented here.

Cheers

weterry
Level 4

Were you confirmed to be hitting: CSCtj95360    Single radio h-reap ap not joining back to WLC.

So all your APs are single-radio that are affected, right?

If so, as far as I know this issue is resolved in both the Escalation build 6.0.199.159 (fully TAC supported) as well as the 6MR4 beta candidate. You should be able to get either from TAC. I am frustrated to hear that neither was offered, if this is in fact your problem.

Hello weterry,

We were told we were hitting:

- CSCtj95360: single radio H-REAP AP not joining back to WLC

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtj95360

- CSCtd50133: AP taking long time to move from standalone to connected, repeat.

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtd50133


The 'single radio' thing has me wondering. I've said in my OP (and in the ticket to the TAC) that the APs are 1140s. They have two radio interfaces, one for a/n and the other for b/g. Are these single-radio APs? They don't sound like it. So why do people keep referring to this bug? Since I've stated that the APs are 1140s, I would have thought at least the TAC would have picked up whether this bug is relevant or not.

Anyways, you can see the version of firmware we're running above.

If you have an 1142 (a/g/n) then you have a dual-radio AP which should rule out CSCtj95360.

I just looked over the TAC case and I don't see any reference to data that actually shows you are hitting that bug.

Were any debugs actually collected that prove these bugs?

Either way, my suggestion would be that you re-open the TAC case and provide a "show run-config" from WLC and the output of "debug capwap client event" from an AP before/during/after the event in question....

I also need to confirm one thing real quick.  Are we talking about AP Fallback (the act of an AP going from Secondary WLC to Primary WLC) or are we talking about HREAP recovery from Standalone mode to Connected Mode?  I assume the latter.

For the record, both of these are core features. If there was an outstanding bug here, it would be high priority and an image correcting the problem would be made available to you.

Let me know if/when you have a TAC case open and I'll monitor it from my side.

Thank You,

Wesley Terry

You took a look in my TAC case? Wow, that's a step beyond the help I'd usually expect from an online forum!

They're 1142 APs (just checked the boxes). So that means CSCtj95360 isn't relevant. Hrm.

I can't look up the second bug right now (the Cisco site doesn't seem to be working?) but it says in the title 'AP taking long time...'. Well, it doesn't take a long time, it takes infinite time (it never rejoins).

So now you’ve got me wondering if either of the bugs we’ve been told we’re affected by is relevant to us and our case.

We’re talking about recovery from Standalone mode to Connected mode. There is only one WLC.

I sent a show-run and some AP debug to the TAC along with my case. I have to say though, I don’t even think they were looked at. The reason I say this is because I mailed the two AP debugs to the engineer assigned to my case, but when I contacted Cisco next the case was recycled to another engineer that knew nothing about the debugs. They didn’t even appear in the case notes.

I’ve got it all still saved locally, so I can produce the debugs easily without having to return to the site and plug in a console cable. The debugs show the error message ‘I don’t have an IP!’, which apparently means that the AP doesn’t know where the WLC is (it can’t mean the AP’s own address, because the AP has still got an IP from DHCP).

I hate to say it, but I really feel like relogging this TAC case but specifically asking for a team other than what I got. Maybe I should relog and point the engineer to this thread.

I’ve already spent a stack of time on this, so when I get authority to do some more on it I’ll relog the TAC case and let you know.

Taking this offline.

weterry

I think this should be online as it's relevant.

I have also had several less than brilliant experiences with bugs reported for wireless and support.

Pete,

I'll post a follow-up response (with details) once a corrected solution has been provided.

Well, Cisco fixed it. There is another engineering pre-release of the firmware (200) that resolves this issue. I've put it on my WLC and it's working fine, with APs rejoining on their own after a WAN outage. Haven't seen any new bugs yet, so I've closed my TAC case and this thread can be closed.

I couldn't believe the mighty Cisco would leave us in the lurch like that, and they didn't. Good show!

Vinay Sharma
Level 7

Hello Lord,

Please mark the question as answered if the information provided is correct and it helped. That way others can benefit as well.

Thanks,

Vinay Sharma

Community Manager – Wireless

Thanks & Regards