Poor 5Ghz throughput - 9800 / 17.9.4a / 1800-series APs

Stuart Patton
Level 1

Hi,

Per the title, I've got an issue with 5 GHz clients that have good signal strength because they are sitting almost directly underneath the AP (1832 APs), yet they struggle to get more than 1 Mbps throughput. I can see the channel utilisation is over 50% but zero TX/RX, which I assume is a measure of Wi-Fi frames in the channel? We have sub-1 ms latency from the WLC to the AP on the wired network.

I'm not sure if this has only happened since upgrading from 17.9.3 to 17.9.4a (done only to close the IOS XE HTTPS vulnerability). Out of curiosity, would upgrading the WLC and APs cause all APs to change channel, or do they remember their last channel and reuse it post-upgrade? The reason for asking is that I have the same issue on multiple APs, which we managed to fix by manually changing the channel.
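For reference, a static channel override on the 9800 is an exec-mode command. The syntax below is from memory, so verify it against your release; the AP name and channel are only examples taken from this thread:

MA-WLC-HA#ap name MGH.1.27.AP24 dot11 5ghz channel 132
MA-WLC-HA#ap name MGH.1.27.AP24 dot11 5ghz channel auto

The second line hands the radio back to DCA once you're done testing.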

From what I can understand from the output below, there are no 5 GHz Wi-Fi interferers, RSSI is good and SNR is excellent. Does this mean it's likely to be a non-Wi-Fi interferer?

We do have Ekahau Pro and a Sidekick, so we can try to get to this location and scan the 5 GHz bands to see if we can identify an interferer.
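If the APs support CleanAir (the 1832, if I remember correctly, only does CleanAir Express, so its detection is limited), a quick controller-side check before the site visit might be something along these lines; I'm not certain of the exact keyword order on 17.9, so treat it as a pointer rather than a guaranteed command:

MA-WLC-HA#show ap dot11 5ghz cleanair air-quality summary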

 

MA-WLC-HA#show ap name MGH.1.27.AP24 auto-rf dot11 5ghz
###################################################################

Number of Slots : 2
AP Name : MGH.1.27.AP24
MAC Address : 501c.b0b1.b3a0
Ethernet MAC Address : 501c.b0b0.5368
Slot ID : 1
Radio Type : 802.11ac
Subband Type : All

Noise Information
Noise Profile : Passed
Channel 36 : -102 dBm
Channel 40 : -102 dBm
Channel 44 : -102 dBm
Channel 48 : -102 dBm
Channel 52 : -103 dBm
Channel 56 : -102 dBm
Channel 60 : -103 dBm
Channel 64 : -102 dBm
Channel 100 : -102 dBm
Channel 104 : -102 dBm
Channel 108 : -102 dBm
Channel 112 : -102 dBm
Channel 116 : -100 dBm
Channel 132 : -100 dBm
Channel 136 : -101 dBm
Channel 140 : -101 dBm

Interference Information
Interference Profile : Passed
Channel 36 : -128 dBm @ 0% busy
Channel 40 : -128 dBm @ 0% busy
Channel 44 : -128 dBm @ 0% busy
Channel 48 : -128 dBm @ 0% busy
Channel 52 : -128 dBm @ 0% busy
Channel 56 : -128 dBm @ 0% busy
Channel 60 : -128 dBm @ 0% busy
Channel 64 : -128 dBm @ 0% busy
Channel 100 : -128 dBm @ 0% busy
Channel 104 : -128 dBm @ 0% busy
Channel 108 : -128 dBm @ 0% busy
Channel 112 : -128 dBm @ 0% busy
Channel 116 : -128 dBm @ 0% busy
Channel 132 : -128 dBm @ 0% busy
Channel 136 : -128 dBm @ 0% busy
Channel 140 : -128 dBm @ 0% busy

Rogue Histogram (20/40/80)
Channel 36 : 0/ 0/ 0
Channel 40 : 0/ 0/ 0
Channel 44 : 0/ 0/ 0
Channel 48 : 0/ 0/ 0
Channel 52 : 0/ 0/ 0
Channel 56 : 0/ 0/ 0
Channel 60 : 0/ 0/ 0
Channel 64 : 0/ 0/ 0
Channel 100 : 0/ 0/ 0
Channel 104 : 0/ 0/ 0
Channel 108 : 0/ 0/ 0
Channel 112 : 0/ 0/ 0
Channel 116 : 0/ 0/ 0
Channel 132 : 0/ 0/ 0
Channel 136 : 0/ 0/ 0
Channel 140 : 0/ 0/ 0

Load Information
Load Profile : Passed
Receive Utilization : 0%
Transmit Utilization : 0%
Channel Utilization : 55%
Attached Clients : 7 clients

Coverage Information
Coverage Profile : Passed
Failed Clients : 0 clients

Client Signal Strengths
RSSI -100 dBm : 0 clients
RSSI -92 dBm : 0 clients
RSSI -84 dBm : 0 clients
RSSI -76 dBm : 0 clients
RSSI -68 dBm : 0 clients
RSSI -60 dBm : 2 clients
RSSI -52 dBm : 5 clients

Client Signal to Noise Ratios
SNR 0 dB : 0 clients
SNR 5 dB : 0 clients
SNR 10 dB : 0 clients
SNR 15 dB : 0 clients
SNR 20 dB : 0 clients
SNR 25 dB : 0 clients
SNR 30 dB : 0 clients
SNR 35 dB : 0 clients
SNR 40 dB : 1 clients
SNR 45 dB : 6 clients

Nearby APs
AP 501c.b0b1.9daf slot 1 : -34 dBm on ( 64, 20 MHz) (10.90.160.116)
AP 501c.b0b1.a2ef slot 1 : -53 dBm on (140, 20 MHz) (10.90.160.116)
AP 501c.b0b1.bd2f slot 1 : -54 dBm on ( 36, 20 MHz) (10.90.160.116)
AP 501c.b0b1.9d4f slot 1 : -76 dBm on ( 64, 20 MHz) (10.90.160.116)
AP 501c.b0b1.b78f slot 1 : -79 dBm on (100, 20 MHz) (10.90.160.116)
AP 501c.b0b1.ba0f slot 1 : -82 dBm on ( 48, 20 MHz) (10.90.160.116)
AP 501c.b0b1.bc2f slot 1 : -82 dBm on ( 64, 20 MHz) (10.90.160.116)

Radar Information
Channel changes due to radar : 0

Channel Assignment Information via DCA
Current Channel Average Energy : -86 dBm
Previous Channel Average Energy : -86 dBm
Channel Change Count : 0
Last Channel Change Time : 11/02/2023 23:14:35
Recommended Best Channel : 132

RF Parameter Recommendations
Power Level : 6
RTS/CTS Threshold : 2347
Fragmentation Threshold : 2346
Antenna Pattern : 0

Persistent Interference Devices
Class Type Channel DC (%%) RSSI (dBm) Last Update Time
------------------------- ------- ------ --------- ----------------

MA-WLC-HA#show ap name MGH.1.27.AP24 neighbor summary
BSSID Channel Channel-width Slot RSSI Last-Heard SSID Neighbour
--------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------
b095.7582.a5ca 3 40 Mhz 0 -88 11/16/2023 09:32:27 TP-Link_A5CA FALSE

MA-WLC-HA#

Thanks,

Stuart

104 Replies

Actual soft reboots did not work for us. We successfully tested two approaches: denying power to the AP's switch port for five minutes, and rebooting the associated switches.
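On the Catalyst switch side, that power-cycle looks roughly like this (the interface name is just a placeholder for the AP's port):

Switch(config)#interface GigabitEthernet1/0/10
Switch(config-if)#power inline never
! wait about five minutes, then restore power
Switch(config-if)#power inline auto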

@casanavep, thank you for the reminder on that. For some reason, I didn't get an email when you replied... if I had, I would have saved myself a trip to the res hall to address a complaint and a lot of (continued) anger towards Cisco, haha. I have the switches in that building scheduled to reboot tomorrow morning and I'll see how things look after.

@eglinsky2012 a lot of community notifications are going down a black hole at the moment. We have raised an issue for it, and the Cisco community team is investigating with the software vendor they use for the community boards, but most of Cisco US is shut down this week, so it's unlikely we'll get any update before next week.

@casanavep, no such luck after the switch reboot in the first building early this morning. Today I shut off PoE to all the APs in a different, empty building (0 wireless clients) twice and still see high interference/utilization, even after a DCA restart. Guess I'll open that TAC case after all.
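For anyone following along, the DCA restart mentioned above is a 9800 exec-mode command; syntax from memory, so verify on your release:

WLC#ap dot11 5ghz rrm dca restart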

Leo Laohoo
Hall of Fame

We've been using 17.12.3 for the last 12 weeks (or so). During that time we've found several "discrepancies":

Firstly, my favourite subject:  AP load and memory leak.  

9800-80, 17.12.3, 3050 x APs, Uptime: 6 weeks

To anyone on a 9800-40 or 9800-80: do not load more than 50% of the rated AP count or the WLC will leak memory. For a 9800-80, that means no more than a 3,000 AP count. In the graph (above), 16 May 2024 is when we added APs, taking the total from 2,900 to 3,060.
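A quick way to check where you sit against that 50% guideline is to compare the joined AP count with the platform's rated maximum (6,000 APs for a 9800-80, 2,000 for a 9800-40). Something like the following should do it; the pipe filter is only illustrative:

WLC#show ap summary | include Number of APs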

Next, with an uptime of 4 weeks, all our 2800/3800/4800/1560 APs started spamming our switches with "duplex mismatch" errors. The good news is that TAC was able to reproduce this in their lab, and it took only 4 calendar days of uptime for the bug to kick in.

The next bug we've seen is with DCA/RRM. We believe that DCA/RRM silently crashes and stops processing. All our APs, for example, will join the controller on channel 36. Read it and weep. All. APs. Channel. 36. The workaround to this bug, you guessed it: reboot the controller every four weeks. We've reported this DCA/RRM issue to TAC.
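If you suspect DCA has stopped processing, one quick sanity check is the per-radio summary; if every 5 GHz radio is parked on the same channel, that's the red flag described above. Roughly:

WLC#show ap dot11 5ghz summary

and eyeball the channel column (or pipe it through an include filter for the channel you're worried about).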

Just for hahas, here are the CPU and memory graphs of our two 9800-80 pairs since upgrading from 17.9.4a to 17.9.5 in late June. The number of APs has been fluctuating due to new installs, but stays between 1,500 and 2,000. Note the big dip in memory after the upgrade.

9800-1 CPU:

eglinsky2012_0-1721944360445.png

9800-1 memory:

eglinsky2012_1-1721944394614.png

9800-2 CPU:

eglinsky2012_2-1721944419889.png

9800-2 memory:

eglinsky2012_3-1721944440077.png

 

 

Is this the control-plane or the data-plane memory?

My graph is taken from the control-plane memory.  

@Leo Laohoo Good question; I'm not sure of the answer. The memory graphs above are from the "Cisco CPU" datasource, CPU R0/0. The graphing software is LogicMonitor.

Below are graphs from the "Cisco Memory Pools" datasource. There's a top-10 graph by % with lines for "Reserve Processor" and "Processor", and also a "Total Memory" graph, which shows a similar trend to the top-10 graph. Here are the "Total Memory" graphs. Maybe you can tell which is which based on the ~9.3 GB total available?

9800-1:

eglinsky2012_0-1722006511994.png

9800-2:

eglinsky2012_1-1722006537027.png

 

Please compare the value to the control-plane memory &/or CPU utilization: 

sh platform resource
sh platform software status con brief

I suspect the graph is the data-plane.  


@Leo Laohoo 
@Leo Laohoo wrote:

Please compare the value to the control-plane memory &/or CPU utilization: 

sh platform resource
sh platform software status con brief

I suspect the graph is the data-plane.  


Here are the outputs:

9800-Pair1#sh platform resource
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage                 Max             Warning         Critical        State
----------------------------------------------------------------------------------------------------
RP0 (ok, active)                                                                               H
 Control Processor       4.31%                 100%            80%             90%             H
  DRAM                   9816MB(15%)           62891MB         88%             93%             H
  harddisk               0MB(0%)               0MB             80%             85%             H
ESP0(ok, active)                                                                               H
 QFP                                                                                           H
  TCAM                   78cells(0%)           1048576cells    65%             85%             H
  DRAM                   615698KB(14%)         4194304KB       85%             95%             H
  IRAM                   14764KB(11%)          131072KB        85%             95%             H
  CPU Utilization        1.00%                 100%            90%             95%             H



9800-Pair1#sh platform software status con brief
Load Average
 Slot  Status  1-Min  5-Min 15-Min
1-RP0 Healthy   0.76   0.97   1.06
2-RP0 Healthy   1.11   0.69   0.70

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
1-RP0 Healthy 64400888 10052484 (16%) 54348404 (84%)  18474780 (29%)
2-RP0 Healthy 64400888  7204396 (11%) 57196492 (89%)  15250308 (24%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
1-RP0    0   1.90   0.70   0.00  97.40   0.00   0.00   0.00
         1   2.50   1.10   0.00  96.40   0.00   0.00   0.00
         2   2.60   0.60   0.00  96.80   0.00   0.00   0.00
         3   1.60   1.10   0.00  97.10   0.00   0.20   0.00
         4   3.59   1.19   0.00  95.20   0.00   0.00   0.00
         5   5.59   1.79   0.00  92.60   0.00   0.00   0.00
         6   4.89   1.29   0.00  93.80   0.00   0.00   0.00
         7   4.00   1.10   0.00  94.90   0.00   0.00   0.00
         8   4.30   1.00   0.00  94.70   0.00   0.00   0.00
         9   2.70   1.50   0.00  95.80   0.00   0.00   0.00
        10   3.20   1.10   0.00  95.70   0.00   0.00   0.00
        11   2.60   0.70   0.00  96.70   0.00   0.00   0.00
        12   0.99   0.99   0.00  96.40   0.00   1.59   0.00
        13   2.09   0.99   0.00  96.70   0.00   0.19   0.00
        14   6.09   1.49   0.00  92.40   0.00   0.00   0.00
        15   3.90   4.90   0.00  91.20   0.00   0.00   0.00
        16   2.60   0.90   0.00  96.49   0.00   0.00   0.00
        17   2.70   0.70   0.00  96.59   0.00   0.00   0.00
        18   2.40   0.80   0.00  96.79   0.00   0.00   0.00
        19   1.40   0.70   0.00  97.39   0.00   0.50   0.00
        20   5.09   0.69   0.00  94.10   0.00   0.09   0.00
        21   2.29   0.79   0.00  96.90   0.00   0.00   0.00
        22   2.89   0.79   0.00  96.20   0.00   0.09   0.00
        23   1.19   0.39   0.00  98.20   0.00   0.19   0.00
2-RP0    0   0.30   0.60   0.00  99.10   0.00   0.00   0.00
         1   1.69   0.59   0.00  97.70   0.00   0.00   0.00
         2   0.79   0.29   0.00  98.90   0.00   0.00   0.00
         3   0.49   0.39   0.00  99.10   0.00   0.00   0.00
         4   0.39   0.49   0.00  99.10   0.00   0.00   0.00
         5   2.80   1.50   0.00  95.70   0.00   0.00   0.00
         6   1.10   0.40   0.00  98.50   0.00   0.00   0.00
         7   0.49  12.28   0.00  87.21   0.00   0.00   0.00
         8   2.00   0.80   0.00  97.20   0.00   0.00   0.00
         9   2.00   3.20   0.00  94.80   0.00   0.00   0.00
        10   3.10   0.60   0.00  96.30   0.00   0.00   0.00
        11   1.40   0.40   0.00  98.20   0.00   0.00   0.00
        12   1.50   0.60   0.00  97.90   0.00   0.00   0.00
        13   1.40   0.50   0.00  98.10   0.00   0.00   0.00
        14   2.39   1.49   0.00  96.10   0.00   0.00   0.00
        15   1.19   0.49   0.00  98.30   0.00   0.00   0.00
        16   0.60   0.40   0.00  98.99   0.00   0.00   0.00
        17   1.00   0.80   0.00  98.20   0.00   0.00   0.00
        18   1.80   0.80   0.00  97.40   0.00   0.00   0.00
        19   1.70   0.30   0.00  98.00   0.00   0.00   0.00
        20   0.49   0.49   0.00  99.00   0.00   0.00   0.00
        21   1.00   0.30   0.00  98.70   0.00   0.00   0.00
        22   5.39   1.99   0.00  91.30   0.00   1.29   0.00
        23   1.80   1.00   0.00  97.19   0.00   0.00   0.00

 

 

Here are the corresponding graphs -

Cisco CPU datasource, CPU (R0/0) busy %:

eglinsky2012_1-1722630783372.png

Cisco CPU datasource, top 10 by memory used (R0/0):

eglinsky2012_4-1722631267304.png

 

Cisco CPU datasource, memory usage:

eglinsky2012_2-1722630849360.png

The above three seem to show the same information, just presented as % and GB in different formats. Below is where it differs:

Cisco Memory Pools datasource, processor:

eglinsky2012_3-1722631004013.png

 

 

eglinsky2012
Level 4

Ouch. Guess 17.12.3 isn't the holy grail version after all, which is disappointing given the positive feedback some of you gave it early on.

Yup, I'm friggin' "disappointed" with 17.12.3 right now, and there are no other options on the table either.

patoberli
VIP Alumni

17.12.4 has now seen the light of day:

https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-12/release-notes/rn-17-12-9800.html#resolved-caveats-for-cisco-ios-xe-dublin-17.12.3

It contains fixes for various high channel load bugs, and some memory leaks should be fixed as well. There's nothing about the redundancy-no-configuration bug CSCwj73634 in the release notes, even though the bug itself lists 17.12.4 as the fixed release.


@patoberli wrote:
17.12.4 has now seen the light of day:

I count three wncd/wncmgrd bugs in the Open Caveats that are very concerning: CSCwj93153, CSCwj30587 & CSCwj93876

Risk assessment:
- CSCwj93153 is a typo in the release notes. Wrong description, and it is actually fixed in 17.12.4. It's been there since at least 17.9. It's a crash and reload, so we would have noticed it if it happened often, and with only 4 cases attached I'd say it's quite rare. Probably a victim of their own slow updating of the bug details: the bug was only updated on 26th July, so when they wrote the release notes it probably still looked like it hadn't been fixed <laugh>
5th August 2024: they've updated the release notes to show this is fixed, not open.
- CSCwj30587 makes me laugh - the conditions say "WLC with CAPWAP enabled with constant messaging" - in other words, normal WLC operation! But we can tell from the workaround section that the conditions should actually say "with mesh APs". So it's only going to affect you if you have mesh APs, and it sounds like the more mesh APs you have, the higher the risk of hitting it. But with only 2 cases since it was opened in March, it's still not very common. Medium risk if you use mesh APs.
- CSCwj93876 has been open since May with only 3 cases, so it's fairly rare - I'd say low risk.

I'm more concerned about CSCwk03445, CSCwj39057, CSCwj42305, CSCwj80614 (IP overlap is essential for us), and CSCwi04855 (open since last year, 14 cases). CSCwk05030 is another that's actually fixed. CSCwi42059 has been open since last year with 7 cases but is now closed as Unreproducible - meaning the software devs don't know how to fix it, even though the bug notes show TAC have observed it happening repeatedly and know how to clear it!
5th August 2024: they've updated the release notes to show CSCwk05030 is fixed, not open.
