9800 CLI/SSH wireless show command output slow
09-15-2022 05:33 AM
Output of wireless-related show commands such as 'show wireless client summary detail', 'show ap dot11 5ghz summary', or 'show ap summary' is very slow on the 9800 CLI. With paging disabled, it takes ~42 seconds to display data for ~2700 clients on a 9800-40. This has been the case in 17.3, 17.6, and 17.9.
On the 5520 controller, on any AireOS version >= 8.5, 'show ap summary', 'show client summary', or 'show advanced 802.11a summary' takes a couple of seconds at most to output around the same number of clients and APs. I'm only noticing this because we're running wireless controllers with >=1000 APs and >=2700 clients.
Something on the 9800, in all IOS XE versions, is causing a big performance hit to SSH output of wireless data, which makes it very annoying to verify operational parameters, for example.
I don't really have time to open a TAC case on this one, but the issue should be extremely easy to reproduce in a lab environment. Someone at Cisco with a lab should really spin up some virtual APs and clients and test it. It's a pretty big performance decrease going from the 5520 to the 9800, which is not what I would expect from a new flagship product.
Labels: Catalyst Wireless Controllers
09-15-2022 09:45 AM
>...Someone at Cisco with a lab environment should really spin up some virtual APs and clients and test it
This forum is populated by customers, not by Cisco employees; to get that kind of commitment from Cisco you need to open a ticket (TAC). For the time being, run a health check of the 9800 configuration with the CLI command show tech wireless and have the output analyzed by https://cway.cisco.com/
M.
-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '
09-15-2022 10:51 PM
By any chance, are all those 1000 APs managed by a single WNCd on that controller, or are they distributed among the 5x WNCds using different site tag values?
I have 300 APs (9800-80) and did not see any such delays.
HTH
Rasika
09-16-2022 04:50 AM
There are 1000 APs across 66 site tags; it's set up with one site tag for the set of APs belonging to each building.
I also have a 9800-80 that we're in the process of migrating to, with 8 site tags, 132 APs, and 302 clients (at this moment). If I do "term len 0" on that one and run "show wireless client summary", it takes ~4 sec to output the list of 302 clients. On AireOS that would take something like 100ms.
It's the kind of thing that's more annoying than impeding, so that's the reason why I've just posted here for now. I'm fine with just waiting for successive IOS XE releases to fix it at some point.
09-16-2022 05:27 AM
- Remember to use WirelessAnalyzer (as per my initial reply); it is very informative.
M.
09-16-2022 07:56 AM
If this is a problem and others are not seeing it, you should open a TAC case. I don't think future versions will fix your issue if it's not a reported issue/bug. It could also be something else that is causing this for you.
*** Please rate helpful posts ***
09-17-2022 05:22 AM
9800-80 with 861 APs and 1573 clients - just tested:
APs about 9-10 seconds
Clients about 6 seconds.
So agreed, it may be slightly slower than AireOS, but it's not a problem for us.
Is it worth checking that you have TCP tuned properly in IOS? What works best will depend on your environment, but look at ip tcp mss <> (sized so that no fragmentation is required), ip tcp selective-ack, ip tcp window-size 65535 and ip ssh window-size 65535. Note there seems to be a bug with scp downloads of large files *from* the device, which hang at 99% when the window is 65535; in that case reducing it to 32768 works. I've never got round to testing the breakpoint or opening a TAC case for it. Uploading files to the device is not affected.
09-19-2022 08:48 AM
I think you actually reproduced the problem if you were testing with "show ap summary" and "show wireless client summary". Our results are almost the same in terms of output time per AP or output time per client, if those were the commands used.
I just tested again and the output of "show ap summary" is ~14sec for 999 APs and the output of "show wireless client summary" is ~18sec for 4086 clients.
The ~40 sec or so I gave above was for "show wireless client summary detail", and indeed if I run that command on my WLC with the current client count of ~4086 it takes ~65sec.
"show ap dot11 5ghz summary" for 999 APs is ~27sec for me.
All non-wireless-related SSH output from the WLC is not affected. The problem looks to be a performance issue internal to the WLC, especially since wireless show commands that return more data seem to take a lot more time for the same number of APs and clients.
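For anyone who wants to repeat these timings, a minimal sketch of a timing harness follows. It assumes key-based SSH login and that the WLC accepts a single command passed on the ssh command line; the host name and helper names are placeholders, not anything from this thread. Each ssh invocation also includes connection setup time, so timing a non-wireless command the same way gives a baseline to subtract.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Tuple


def make_ssh_runner(host: str) -> Callable[[str], str]:
    """Build a runner that executes one command on `host` with the local ssh client.

    Assumption: key-based login works and the device accepts a command given
    on the ssh command line. `host` is a placeholder supplied by the caller.
    """
    def run(command: str) -> str:
        completed = subprocess.run(
            ["ssh", host, command],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout
    return run


def time_commands(
    run: Callable[[str], str], commands: Iterable[str]
) -> List[Tuple[str, float, int]]:
    """Return (command, elapsed seconds, output line count) for each command."""
    results = []
    for command in commands:
        start = time.perf_counter()
        output = run(command)
        elapsed = time.perf_counter() - start
        results.append((command, elapsed, len(output.splitlines())))
    return results


def fake_runner(command: str) -> str:
    """Stand-in for a real device so the sketch can be run anywhere."""
    return "line 1\nline 2\nline 3\n"


if __name__ == "__main__":
    # Swap fake_runner for make_ssh_runner("wlc.example.net") to hit a real WLC.
    commands = ["show ap summary", "show wireless client summary"]
    for command, seconds, lines in time_commands(fake_runner, commands):
        print(f"{command}: {seconds:.3f} s for {lines} lines")
```

Dividing the elapsed time by the line count gives a rough per-row cost that can be compared between commands and between controllers.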
03-27-2023 06:21 AM
This is still an issue in 17.9.3, and it gets worse the more clients and APs you have on a WLC. APs are load balanced pretty well across the wncds. CPU utilization on the wncds is low, ~12% on the highest wncd over 1 min. "show wireless client summary detail" takes about 1m 22s to complete for 5824 clients, and "show ap summary" takes about 31 seconds for 1439 APs. SSH output itself is fine: I can spam "show config", for example, and it doesn't suffer from this output latency. It looks like some kind of database query or IPC slowness behind the SSH session, but I'm not sure. Anyway, the commands that fetch more data from the back end(s) are slower, so the slowness appears related to the amount of data you're asking for.
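One rough way to check the scaling claim is to divide each reported "show wireless client summary detail" time by the client count at that moment. A minimal sketch using only the three figures quoted in this thread; they are approximate, were taken on different dates and software versions, and the variable names are illustrative:

```python
# (clients, seconds) for "show wireless client summary detail", as quoted in
# this thread. All values are approximate ("~") in the original posts.
measurements = [
    (2700, 42.0),  # first post, 9800-40
    (4086, 65.0),  # follow-up test
    (5824, 82.0),  # 17.9.3 post, 1m 22s
]


def seconds_per_client(clients: int, seconds: float) -> float:
    """Average time spent per client row for one measurement."""
    return seconds / clients


per_client = [seconds_per_client(c, s) for c, s in measurements]

for (clients, seconds), rate in zip(measurements, per_client):
    print(f"{clients:>5} clients: {seconds:>5.1f} s total, {rate * 1000:.1f} ms per client")
```

If the per-client cost stays roughly flat as the client count grows, total time grows about linearly with the number of clients, which would fit the observation that larger controllers feel the delay more.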
Unfortunately TAC is borderline useless to me these days. In the past (Around 2016-2018) they would actually investigate issues and file bug reports but those days seem to be gone except for maybe the high profile/very important customers. I am not one of those so for me TAC is not really much more than a glorified RMA engine.
03-27-2023 07:10 AM
- This might be related to a more general issue concerning available resources. The following commands may be useful to investigate:
show platform resources
show processes cpu platform sorted | ex 0% 0% 0%
show processes memory platform sorted
show processes memory platform accounting
show int po1 | i line protocol|put rate|drops|broadcast (check the volume of traffic received and transmitted by the WLC, for example)
show platform hardware chassis active qfp statistics drop (check for packet drops)
show platform hardware chassis active qfp feature wireless punt statistics (check for packets punted to the CPU)
show buffers | i buffers|failures (check for buffer failures)
show platform hardware chassis active qfp datapath utilization | i Load (check Processing Load (pct) to see the utilization; it should not exceed 92 %)
>....TAC is not really much more than a glorified RMA engine.
That's why we are here (LOL!)
M.
03-27-2023 07:36 AM
It's a 9800-80, and everything is very far below the limits, including the wncd CPU utilization and the overall CPU utilization. The QFPs are at 2% utilization over 1 min and 60 min. There aren't any drops at the front panel port/PHY level.
On 9-17-22 Rich R noted above
"9800-80 with 861 APs and 1573 clients - just tested:
APs about 9-10 seconds
Clients about 6 seconds."
Assuming that was just for "show ap summary" and "show wireless client summary" (which would have to retrieve less data from the wncds, I'd imagine), for him it took roughly 0.011 sec per AP and 0.004 sec per client. If I repeat the test with my values I get 20.00 sec / 1439 APs = 0.014 sec per AP, and 29.72 sec / 7477 clients = 0.004 sec per client. The values are very close, so it appears that the same issue is happening for both of us; it just affects me more because I have more APs and more clients.
Also, commands which return more data from the underlying DBs, like "show ap dot11 5ghz summary" and "show wireless client summary detail", take a lot longer to output per line than the more basic commands. Feels like a database/IPC performance issue to me.
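The per-AP and per-client comparison above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming 9.5 s as the midpoint of the "9-10 seconds" AP figure; that midpoint and the dictionary labels are illustrative choices, not values from the posts. Results are printed unrounded to four decimals, so the last digit can differ from figures quoted to three decimals.

```python
def per_item(seconds: float, count: int) -> float:
    """Average output time per AP or per client."""
    return seconds / count


# Figures as quoted in the thread; 9.5 s is an assumed midpoint of "9-10 seconds".
reports = {
    "861 APs / 1573 clients": {
        "ap": per_item(9.5, 861),
        "client": per_item(6.0, 1573),
    },
    "1439 APs / 7477 clients": {
        "ap": per_item(20.00, 1439),
        "client": per_item(29.72, 7477),
    },
}

for label, rates in reports.items():
    print(f"{label}: {rates['ap']:.4f} s per AP, {rates['client']:.4f} s per client")
```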
03-27-2023 07:26 AM
I will be honest, I don't have many issues with TAC at all. If I don't get traction on a case because the problem is not very obvious, I usually escalate to have the BE help out. From what you are seeing, TAC would need to gather the data in order to escalate and get support from the BE.
03-27-2023 08:26 AM
TAC used to be able to check internally if behavior was expected or not and file bug reports if not but they seem to have lost this capability as far as I can tell.
For example I had another case open with them about a client RA trace showing the client being deleted for IP theft when we have IP theft disabled as an exclusion reason on the WLC. As far as I can tell that's a pretty obvious bug but TAC was just totally ineffective even after being shown the RA trace of it happening and the WLC config with ip theft disabled as an exclusion condition. What was happening was: Android phone, user disabled MAC randomization, reassociates, and gets excluded for IP theft because the phone kept the same IPv6 link-local addr. I have "no wireless wps client-exclusion ip-theft" configured, but WLC still deletes IPv6 thieves regardless of that. I'd rather just use dhcp-required and IPSG to handle that instead of excluding clients because even with a short timer some clients don't handle being deauth'd well and they'll do things like switch to mobile data.
Case is still open because technically it's still a problem, but the problem is rare enough that both TAC and I have given up on it now. It's probably a lot more common in environments that force users to switch off randomized MAC, which we don't, but one user happened to and claimed it wasn't working, which is how I found the issue. I assume a lot of helpdesks would just tell people to reboot and hope the exclusion timer ran out by then.
03-27-2023 08:56 AM
- Perhaps this one is related: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwc31406
M.
03-27-2023 09:52 AM
Saw that one, but that bug looked more related to the scenario where you have IP theft exclusion enabled and it's just not working the way it's intended. In my case, I have it disabled but it's only really disabled for IPv4 apparently, IPv6 devices still cause client deletes. Anyway, getting kind of off topic for this thread I guess, sorry.
