
9800 CLI/SSH wireless show command output slow

jasonm002
Level 1

Output of wireless related show commands like 'show wireless client summary detail', 'show ap dot11 5ghz summary', or 'show ap summary' on the 9800 CLI is very slow. It takes ~42 seconds to display data on ~2700 clients on a 9800-40 if you disable paging. This has been the case in 17.3, 17.6, and 17.9.

On the 5520 controller on any AireOS version >= 8.5, 'show ap summary', 'show client summary', or 'show advanced 802.11a summary' takes a couple of seconds at most to output around the same number of clients and APs. I'm only noticing this because we're running wireless controllers with >=1000 APs and >=2700 clients.

Something is happening on the 9800 in all IOS XE versions that is causing a big performance hit to SSH output of wireless data which makes it very annoying to verify operational parameters for example.

I don't really have time to open a TAC case on this one, but this issue should be extremely easy to reproduce in a lab environment. Someone at Cisco with a lab environment should really spin up some virtual APs and clients and test it - it's a pretty big performance decrease going from 5520 to 9800 which is not what we want to see for a new flagship product I would expect.

 

 

21 Replies

marce1000
VIP

 

> ...Someone at Cisco with a lab environment should really spin up some virtual APs and clients and test it

This forum is not populated by Cisco employees but by customers; to get that kind of commitment from Cisco you need to open a TAC case. For the time being, run a health check of the 9800 configuration with the CLI command show tech wireless and have the output analyzed by https://cway.cisco.com/tools/WirelessAnalyzer/ . Please note: do not use the classic show tech-support (short version); use the command denoted in green for Wireless Analyzer. Check out all advisories!

 M.



-- ' 'Good body every evening' ' this sentence was once spotted on a logo at the entrance of a Weight Watchers Club !

By any chance, are all those 1000 APs managed by a single WNCd on that controller, or are they distributed among the 5 WNCds using different site tag values?
I have 300 APs (9800-80) and did not see any such delays.

HTH
Rasika

There are 1000 APs across 66 site tags, it's set up with a site tag being used for the set of APs belonging to each building.

I also have a 9800-80 that we're in the process of migrating to, with 8 site tags, 132 APs, and 302 clients (at this moment). If I do "term len 0" on that one and "show wireless client summary", it takes ~4 sec to output the list of 302 clients. On AireOS that would take something like 100ms.
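
For anyone who wants to repeat this kind of measurement without a stopwatch, here is a minimal Python sketch. It assumes the system OpenSSH client is installed and that key-based login to the controller works; the hostname in the commented usage is made up, and the measured time includes SSH session setup, so it is only a rough number.

```python
import subprocess
import time


def time_call(run):
    """Return (elapsed_seconds, result) for a zero-argument callable."""
    start = time.perf_counter()
    result = run()
    return time.perf_counter() - start, result


def ssh_runner(host, command):
    """Build a callable that runs one command through the system ssh client.

    Both arguments are placeholders; key-based login is assumed.
    """
    def run():
        done = subprocess.run(
            ["ssh", host, command],
            capture_output=True,
            text=True,
            check=True,
        )
        return done.stdout

    return run


# Example usage (hypothetical hostname, not executed here):
# elapsed, output = time_call(
#     ssh_runner("wlc.example.net", "show wireless client summary"))
# print(f"{len(output.splitlines())} lines in {elapsed:.1f} s")
```

Running the same wrapper against a non-wireless command such as "show running-config" gives a baseline for the SSH overhead itself.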

It's the kind of thing that's more annoying than impeding, so that's the reason why I've just posted here for now. I'm fine with just waiting for successive IOS XE releases to fix it at some point.

 

 

 - Remember to use WirelessAnalyzer (as per my initial reply), very informative

 M.




Scott Fella
Hall of Fame

If this is a problem and others are not seeing this, you should open a TAC case.  I don't think future versions will fix your issue if it's not a reported issue/bug.  It could be something else that is causing this for you.

-Scott
*** Please rate helpful posts ***

Rich R
VIP

9800-80 with 861 APs and 1573 clients - just tested:
APs about 9-10 seconds
Clients about 6 seconds.
So agreed it may be slightly slower than AireOS but not a problem for us.
Worth checking you have TCP tuned properly in IOS?  What works best will depend on your environment, but look at ip tcp mss <> (sized to ensure no fragmentation will be required), ip tcp selective-ack, ip tcp window-size 65535 and ip ssh window-size 65535.  Note there seems to be a bug with scp download of large files *from* the device, which hangs at 99% when the window is 65535; in that case reducing to 32768 works.  I've never got round to testing the breakpoint or opening a TAC case for it though.  Uploading files to the device is not affected.
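
As a rough guide to sizing the MSS so that no fragmentation is needed, the usual arithmetic is path MTU minus the IP and TCP headers. The sketch below assumes headers without options and that you already know the smallest MTU on the path; any tunnel or encapsulation overhead has to be subtracted from the MTU first.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header only
TCP_HEADER = 20   # bytes, TCP header without options


def tcp_mss(path_mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of size path_mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER


print(tcp_mss(1500))             # 1460
print(tcp_mss(1500, ipv6=True))  # 1440
```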

I think you actually reproduced the problem if you were testing with "show ap summary" and "show wireless client summary". Our results are almost the same when looking at them in terms of ratios of output time per AP or output time per client, if those commands were used.

I just tested again and the output of "show ap summary" is ~14sec for 999 APs and the output of "show wireless client summary" is ~18sec for 4086 clients.

The ~40 sec or so time I gave above was for "show wireless client summary detail", and indeed if I run that command on my WLC with the current client count of ~4086 it takes ~65sec.

"show ap dot11 5ghz summary" for 999 APs is ~27sec for me.

All non-wireless-related SSH output from the WLC is not affected. The problem looks to be a performance issue internal to the WLC, especially since wireless show commands that return more data seem to take a lot more time for the same number of APs and clients.
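
To put numbers on that, here is a small Python sketch that turns the timings quoted in this post into a cost per output row. The figures are the approximate ones given above; the function and variable names are just for illustration.

```python
# (command, seconds, rows) as quoted in this post; all values are approximate.
timings = [
    ("show ap summary", 14, 999),
    ("show ap dot11 5ghz summary", 27, 999),
    ("show wireless client summary", 18, 4086),
    ("show wireless client summary detail", 65, 4086),
]


def ms_per_row(seconds, rows):
    """Average time spent per AP or client row, in milliseconds."""
    return 1000 * seconds / rows


for command, seconds, rows in timings:
    print(f"{command:<40} {ms_per_row(seconds, rows):6.1f} ms/row")
```

That works out to roughly 14 vs 27 ms per AP and 4.4 vs 15.9 ms per client, so the commands that return more data cost noticeably more per row for the same number of APs and clients.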

 

This is still an issue in 17.9.3, and it gets worse the more clients and APs you have on a WLC. APs are load balanced pretty well across the wncds. CPU utilization on the wncds is low, ~12% on the highest wncd over 1 min. "show wireless client summary detail" takes about 1m 22s to complete for 5824 clients, and "show ap summary" takes about 31 seconds for 1439 APs. SSH output itself is fine; I can spam "show config", for example, and it doesn't suffer from this output latency issue. It looks like some kind of database query slowness or IPC slowness behind the wireless show commands, but I'm not sure. Anyway, the commands that fetch more data from the back end(s) are slower, so the slowness appears related to the amount of data you're asking the thing for.

Unfortunately TAC is borderline useless to me these days. In the past (Around 2016-2018) they would actually investigate issues and file bug reports but those days seem to be gone except for maybe the high profile/very important customers. I am not one of those so for me TAC is not really much more than a glorified RMA engine.

 

 

 - Might be related to a more general issue concerning available resources; the following commands may be useful to investigate:

     show platform resources
     show processes cpu platform sorted | ex 0%      0%      0%
     show processes memory platform sorted
     show processes memory platform accounting

     show int po1 | i line protocol|put rate|drops|broadcast
       (e.g. check the volume of traffic received and transmitted by the WLC)
     show platform hardware chassis active qfp statistics drop
       (check for packet drops)
     show platform hardware chassis active qfp feature wireless punt statistics
       (check for packets punted to CPU)

     show buffers | i buffers|failures
       (check for buffer failures)
     show platform hardware chassis active qfp datapath utilization | i Load
       (check Processing Load (pct) to see the utilization; it should not exceed 92%)

 > ....TAC is not really much more than a glorified RMA engine.

 That's why we are here (LOL!)

 M.




It's a 9800-80, and everything is well below the limits, including the wncd CPU util and overall CPU util. The QFPs are at 2% util over 1 min and 60 min. There aren't any drops at the front panel ports/PHY level going on.

On 9-17-22 Rich R noted above

"9800-80 with 861 APs and 1573 clients - just tested:
APs about 9-10 seconds
Clients about 6 seconds."

Assuming that was just for a "show ap summary" and "show wireless client summary" (which I'd imagine have to retrieve less data from the wncds), for him it took about 0.011 sec per AP for "show ap summary" and about 0.004 sec per client. If I repeat the test with my values I get 20.00 sec / 1439 APs = 0.014 sec per AP, and 29.72 sec / 7477 clients = 0.004 sec per client. The values are very close, so it appears that the same issue is happening for both of us, but it just affects me more because I have more APs and more clients.
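
The comparison can be redone with a few lines of Python. This is only a sketch: it takes 9.5 sec as the midpoint of the "9-10 seconds" quoted above (my assumption), and the dictionary names are just labels for the two controllers.

```python
def per_item(seconds, count):
    """Average output time per AP or per client, in seconds."""
    return seconds / count


# Figures quoted above; 9.5 s is an assumed midpoint of "9-10 seconds".
rich = {"ap": per_item(9.5, 861), "client": per_item(6, 1573)}
mine = {"ap": per_item(20.00, 1439), "client": per_item(29.72, 7477)}

for kind in ("ap", "client"):
    ratio = mine[kind] / rich[kind]
    print(f"{kind:<6} {rich[kind]:.4f} s vs {mine[kind]:.4f} s (ratio {ratio:.2f})")
```

A ratio close to 1 on both lines means the per-item cost is about the same on both controllers, so total time scales with the number of APs and clients.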

Also, commands which return more data from the underlying DBs, like "show ap dot11 5ghz summary" and "show wireless client summary detail", take a lot longer to output per line than the more basic commands. Feels like a database/IPC performance issue to me.

 

 

 

 

I will be honest, I don't have many issues with TAC at all.  If I don't get traction on a case because the problem is not very obvious, I usually escalate to have the BE help out.  From what you are seeing, TAC would need to gather the data in order to escalate/get support up to the BE.

-Scott

TAC used to be able to check internally if behavior was expected or not and file bug reports if not but they seem to have lost this capability as far as I can tell.

For example I had another case open with them about a client RA trace showing the client being deleted for IP theft when we have IP theft disabled as an exclusion reason on the WLC. As far as I can tell that's a pretty obvious bug but TAC was just totally ineffective even after being shown the RA trace of it happening and the WLC config with ip theft disabled as an exclusion condition. What was happening was: Android phone, user disabled MAC randomization, reassociates, and gets excluded for IP theft because the phone kept the same IPv6 link-local addr. I have "no wireless wps client-exclusion ip-theft" configured, but WLC still deletes IPv6 thieves regardless of that. I'd rather just use dhcp-required and IPSG to handle that instead of excluding clients because even with a short timer some clients don't handle being deauth'd well and they'll do things like switch to mobile data. 

Case is still open because technically it's still a problem, but the problem is rare enough that both TAC and I have given up on it now. It's probably a lot more common in environments that force users to switch off randomized MAC, which we don't, but one user happened to and claimed it wasn't working, which is how I found the issue. I assume a lot of helpdesks would just tell people to reboot and hope the exclusion timer had run out by then.

 

 

 

 

 

                - Perhaps this one is related : https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwc31406

 M.




Saw that one, but that bug looked more related to the scenario where you have IP theft exclusion enabled and it's just not working the way it's intended. In my case, I have it disabled but it's only really disabled for IPv4 apparently, IPv6 devices still cause client deletes. Anyway, getting kind of off topic for this thread I guess, sorry.

 

 
