
9800 Serious Problems

neteng1
Level 3

We spent several months designing a move to the 9800 WLC according to Cisco best practices. We currently have over 4000 APs on a 9800-80 controller. The recommendation is fewer than 500 APs per site tag, which required us to reuse the same site tag configuration for many sites.

 

Shortly after a large migration, APs within several sites stopped allowing client connections. Clients were stuck in 'Associating' state. The workaround we discovered was to change site tags. There is no configuration difference between sites.

 

This is a reproducible problem, and we opened a Sev 1 case with Cisco for it. They observed the problem and found associated logs. However, we are going on two weeks without a resolution and are faced with migrating back to 8540 controllers. Does anyone have similar experience, or an environment of this size deployed on 9800?

 

Edit: The latest recommendation from Cisco is to try reducing to fewer than 200 APs per site tag.
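
To put the two recommendations in perspective, here is a rough sketch of the minimum number of site tags each one implies. It assumes exactly 4000 APs (the post says "over 4000") and reads "fewer than 500" and "fewer than 200" as ceilings of 499 and 199 APs per tag. The function name is made up for illustration.

```python
def min_site_tags(total_aps: int, max_aps_per_tag: int) -> int:
    """Smallest number of site tags so that no tag holds more than max_aps_per_tag APs."""
    if total_aps < 0 or max_aps_per_tag < 1:
        raise ValueError("total_aps must be >= 0 and max_aps_per_tag must be >= 1")
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_aps // max_aps_per_tag)


# Assumed AP count of 4000; the real figure in the post is "over 4000".
print(min_site_tags(4000, 499))  # fewer than 500 APs per tag
print(min_site_tags(4000, 199))  # fewer than 200 APs per tag
```

Under those assumptions the first call gives 9 and the second gives 21, so the tighter recommendation more than doubles the number of site tags to maintain.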


Arshad Safrulla
VIP Alumni

Assuming your RF environment is sound, could you share the following?

Which IOS-XE release is running on the 9800-80?

Which AP models are impacted?

Which clients are impacted?

What is the client driver version?

Which mode are the APs deployed in?

Are all 4000 APs in the same campus, or do you have remote locations as well?

Do you have FT enabled?

Are you using WPA3?

What authentication mechanism do the impacted SSIDs use? (EAP-TLS, EAP-PEAP, PSK, Open, etc.)

What does the Radioactive Trace say?

Hi, we're running 17.3.4. This problem is not isolated to specific APs, clients, or WLANs. It affects all connections assigned to a given site tag. Changing the site tag temporarily resolves the problem, even when the AP Join Profile is the same. We have determined it is not related to any configuration from the Policy Tag or RF Tag.

We haven't done anything at that scale yet.

The guidelines included in the 9800 migration webinar series a few months back are pages 51-54 of the Session 1 presentation at https://web.cvent.com/event/bcba04b5-6a9b-4a17-ac1e-ae718fd184bd/websitePage:332afdf8-3ce9-492a-bc88-102ec737bf1e

There's more info in the Session 5 presentation, pages 9-15, and as per @Arshad Safrulla's question above, your exact design and architecture matter a LOT. For example, page 14:
- Don't use the same site tag across multiple Flex sites

- If support for Fast Seamless Roaming (802.11r, CCKM, OKC) is needed, then the max number of APs per site-tag for a Flex site is 100

So that limit of 500 is a general number for a basic local mode deployment but you've really not given any proper details of your design/deployment.  You mentioned using the same site tag for multiple sites so you may already have breached the design best practice guidelines.  Suggest you have a good read through those documents and see whether you need to revise your design.
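
A quick way to check an existing design against those numbers is to count APs per site tag from an inventory export. The snippet below is only a sketch: the two ceilings are the figures quoted in this thread (not a product specification), and the AP names, site tag names, and the AP-to-tag dictionary are hypothetical stand-ins for whatever your own export looks like.

```python
from collections import Counter

# Per-tag ceilings taken from the figures quoted in this thread (guidance, not a spec).
MAX_APS_LOCAL_MODE = 499         # "fewer than 500" for a basic local mode deployment
MAX_APS_FLEX_FAST_ROAMING = 100  # Flex site that needs 802.11r, CCKM, or OKC


def oversized_site_tags(ap_to_site_tag, max_aps):
    """Return {site_tag: ap_count} for every tag holding more than max_aps APs."""
    counts = Counter(ap_to_site_tag.values())
    return {tag: n for tag, n in counts.items() if n > max_aps}


# Hypothetical inventory: AP name -> site tag.
inventory = {f"ap-a-{i}": "ST-BLDG-A" for i in range(120)}
inventory.update({f"ap-b-{i}": "ST-BLDG-B" for i in range(80)})

print(oversized_site_tags(inventory, MAX_APS_FLEX_FAST_ROAMING))
print(oversized_site_tags(inventory, MAX_APS_LOCAL_MODE))
```

With this made-up inventory, only ST-BLDG-A (120 APs) exceeds the Flex fast-roaming ceiling, and nothing exceeds the local mode one.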

 

------------------------------
Please click Helpful if this post helped you and Accept as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's   and   TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's,   Best Practices for 9800 WLC's   and   Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
Field Notice: FN74383 APs Running 17.12.4/5/6/6a May Run Out of Flash Space Preventing Upgrades
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390

Thank you. The document you provided references a syslog about WNCD overload, which we have no history of. To clarify, we are using the same Join Profile but different site tags across our environment. All APs are in local mode. We have adjusted all site tags to fewer than 200 APs per tag based on TAC's recommendation. TAC did find the following log, for which they have an internal bug.

 

2021/07/30 13:30:06.694250 {wncd_x_R0-3}{1}: [radius] [20560]: (ERR): RSPE- Crete New Socket Data : Dynamic socket pool limit reaced Max : 96
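
For anyone who wants to check their own trace output for the same message, here is a minimal sketch that picks out the timestamp, process name, and pool maximum from lines like the one above and counts hits per process. The pattern is inferred from that single line only, so other releases or log sources may format it differently; the function names are illustrative.

```python
import re
from collections import Counter

# The log line quoted above, copied verbatim (including its spelling).
SAMPLE = (
    "2021/07/30 13:30:06.694250 {wncd_x_R0-3}{1}: [radius] [20560]: (ERR): "
    "RSPE- Crete New Socket Data : Dynamic socket pool limit reaced Max : 96"
)

# Assumed layout: timestamp {process}{n}: [module] [id]: (LEVEL): message
POOL_LIMIT = re.compile(
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\{(?P<process>[^}]+)\}\{\d+\}: "
    r"\[(?P<module>[^\]]+)\] \[\d+\]: \((?P<level>\w+)\): "
    r".*Dynamic socket pool limit .*Max : (?P<max>\d+)"
)


def parse_pool_limit(line):
    """Return the parsed fields if the line is a socket pool limit error, else None."""
    m = POOL_LIMIT.search(line)
    if m is None:
        return None
    return {
        "timestamp": m.group("timestamp"),
        "process": m.group("process"),
        "max_sockets": int(m.group("max")),
    }


def hits_per_process(lines):
    """Count socket pool limit errors per process across an iterable of log lines."""
    parsed = (parse_pool_limit(line) for line in lines)
    return Counter(p["process"] for p in parsed if p is not None)


print(parse_pool_limit(SAMPLE))
print(hits_per_process([SAMPLE, "some unrelated line", SAMPLE]))
```

Counting per process is a guess at what is useful here: if the hits cluster on one process, that may help narrow down which site tags are affected, but confirm the interpretation with TAC.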

 

We still have an open case and I will post if a fix is discovered.

OK, thanks. It will be interesting to hear the outcome, as we'll be looking to scale up at some point.


This is the bug Cisco provided. It is marked as Catastrophic. The status says fixed, but I have not been provided a service pack yet. The workaround is to disable AAA Accounting.

 

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvz30708
